Debugging timeouts: make tests verbose in CI #31087
Conversation
Timeouts never come along when you want them!
Here we have a timeout for one test in particular. That test is relatively new, so it could fit with when we started to see CI timeouts.
Very aware that I may be clutching at straws / on a wild goose chase / insert idiom of choice here.
Is there a way we could trigger verbose output via commit message? Basically I know this is a draft, but would it be useful to have a permanent flavor of this PR?
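For illustration, a toggle like that could look roughly like the sketch below in a GitHub Actions workflow; the `[verbose-ci]` marker, step names, and pytest arguments are hypothetical and not part of this PR.

```yaml
# Hypothetical sketch of a commit-message toggle (not part of this PR).
# Note: github.event.head_commit.message is only populated on push events,
# so a workflow triggered by pull_request would need another message source.
- name: Run tests (verbose when requested)
  if: contains(github.event.head_commit.message, '[verbose-ci]')
  run: python -m pytest -v -ra -n auto

- name: Run tests (default verbosity)
  if: ${{ !contains(github.event.head_commit.message, '[verbose-ci]') }}
  run: python -m pytest -ra -n auto
```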
At this point I don't even know if any of this is useful! 🫣
What's the purpose? Either we want to generally collect information, in which case we just switch on verbose on main. Or we only want to do targeted experiments, in which case a branch in a PR is sufficient.
For the reason I think Ruth opened this PR: something is going wonky, it seems CI-specific, and it'd be helpful to have the verbose output here specifically for debugging that one thing.
Yes, but what do you need the "trigger from commit message" for? We can configure this once and commit it. Either here if we want to limit changes to a specific experimental environment, or on main if we want to generally collect data.
I don't know what you mean by this.
I'm guessing we don't want this all the time, only when a PR is breaking in ways where the short messages are unhelpful.
Ah, I think I understand our mutual misunderstanding: I'm focussed on the flaky timeout issue (where toggling verbosity does not help). You are discussing whether toggling verbosity would be a generally desirable debugging tool. Let's move that discussion out and focus here on the timeouts. You are welcome to open an issue for the general solution if you are interested.
@rcomer I believe your ideas and approach are valuable. Obviously, removing that single test didn't cut it. I noticed here and in the run before that, before the timeout, many (>10) tests from the other worker completed. So it's likely not a single long-running blocking task in the other worker. Ideas for further investigation:
Running in one worker: the first attempt gave no timeouts, but there is pretty big variation in how long tests take on Azure py312 and py313, and on MacOS 14 with both Python versions. So maybe the concurrency is a red herring. Trying again to see what turns up...
This time MacOS 15 instead of MacOS 14 has the long-running test. Edit: just realised this test is anyway xfailed, so maybe not one to focus on.

matplotlib/lib/matplotlib/tests/test_backends_interactive.py, lines 789 to 795 in d68c7e3
MacOS 14 and 15 runners have less capacity (CPU and memory) than any of the other runners. In GitHub Actions we set the number of pytest workers to "auto". In this recent PR, pytest chose 3 workers for MacOS and only 2 for Ubuntu. That seems the wrong way around! Maybe we should just fix it at 2?
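If we did pin it, the change would be small; a sketch of the idea (the real workflow step and pytest arguments in this repo differ):

```yaml
# Sketch only: the actual step and arguments in the matplotlib workflows differ.
- name: Run pytest
  run: |
    # "-n auto" lets pytest-xdist pick a worker count from the runner's CPU count,
    # which is how the small MacOS runners ended up with 3 workers; "-n 2" pins it.
    python -m pytest -ra -n 2 --timeout=300
```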
On Ubuntu-arm at the top of this PR, pytest chose 4 workers. There was 1 timeout and 1 UnraisableException.
Well that gave me a failure, but it wasn't a timeout. It does have something to do with subprocesses though. The successful runs completed in 16-19 minutes, which I think is pretty consistent with what we get in general. So I don't think fixing at two runners will lose us anything. Re-spinning...
Gah! Webagg timeout.
Possibly related, but when I run locally in parallel, then there are also some knock-on effects of running this test with Qt, i.e., #31049; perhaps something is crashing, but not correctly raising due to it?
Regardless of the time issue, perhaps increasing the density of subprocess-calling tests might trigger more clues.
Huh. Azure only runs 28 tests this way with 165 skips.
In a sample of 5 tries, I don't get any timeouts on Azure when limiting to the subprocess tests.
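The exact selection used for those Azure runs isn't shown in this thread; one way a subprocess-focused run can be expressed with pytest is a `-k` name filter, sketched below with an illustrative expression.

```yaml
# Illustrative only: the -k expression is a guess at the kind of filter used,
# not the actual one from the Azure runs discussed above.
- name: Run subprocess-heavy tests only
  run: |
    python -m pytest -v -ra -n 2 \
      -k "subprocess or webagg or backends_interactive" \
      lib/matplotlib/tests
```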
With the current setup, let's see if it comes up a third time.
This PR is not for merging, just using the CI.
Made pytest verbose in the hope of generating clues for diagnosing the subprocess timeout problem (#30851). Removed the standard Ubuntu tests because I don't think we ever see timeouts there.
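In practice that amounts to adding pytest's verbosity and reporting flags to the CI invocations; a sketch of the kind of command line involved (not the literal diff in this PR):

```yaml
# Sketch of the sort of invocation this PR moves towards (not the literal diff):
# "-v" prints each test name and outcome as it runs, and "-ra" summarises
# skips/xfails/errors at the end, which helps narrow down where a hang occurs.
- name: Run pytest (verbose, for timeout debugging)
  run: python -m pytest -v -ra -n 2 --timeout=300 lib/matplotlib/tests
```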