https://integration.wikimedia.org/zuul/ is currently showing puppet CRs queued for up to 18 minutes on the test-prio queue.
This performance issue is blocking the normal workflow of the SRE team.
Details
Project | Branch | Lines +/- | Subject |
---|---|---|---|
mediawiki/core | master | +311 -142 | Make LocalisationCache a service |
mediawiki/core | REL1_34 | +311 -142 | Make LocalisationCache a service |
integration/config | master | +1 -1 | Move puppet jobs to dedicated small node |
mediawiki/core | master | +404 -1 K | Revert "Make LocalisationCache a service" |
Event Timeline
For context, the actual time to run the tests for operations/puppet is under one minute for most patches.
Either Zuul or Jenkins is broken, and this has been a constant pain over the last few weeks for everyone involved.
Triaging to UBN! because this is effectively an outage of the service.
Change 532399 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/core@master] Revert "Make LocalisationCache a service"
Change 532437 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[integration/config@master] Move puppet jobs to dedicated small node
Change 532399 merged by jenkins-bot:
[mediawiki/core@master] Revert "Make LocalisationCache a service"
The root cause was a faulty patch merged in mediawiki/core on Friday. It roughly doubled the time it takes to run the tests, so that, for example, Wikibase changes occupied an execution slot for up to an hour. That in turn starved the very thin pool of executors we currently have, which delayed the execution of jobs for non-MediaWiki repositories.
Anyway, that has been fixed by reverting the faulty code.
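To illustrate the starvation effect described above, here is a minimal sketch of a shared executor pool. It assumes a plain FIFO queue and ignores Zuul's pipeline priorities and node labels; the pool size, job counts, and durations are made-up numbers, and `queue_waits` and `puppet_wait` are hypothetical helper names, not part of Zuul or Jenkins.

```python
import heapq


def queue_waits(jobs, executors):
    """Return per-job wait times (minutes) for a FIFO queue on a fixed pool.

    jobs: list of (arrival_minute, duration_minutes), sorted by arrival.
    """
    # Min-heap of the times at which each executor next becomes free.
    free_at = [0.0] * executors
    heapq.heapify(free_at)
    waits = []
    for arrival, duration in jobs:
        earliest = heapq.heappop(free_at)
        start = max(arrival, earliest)  # wait until an executor is free
        waits.append(start - arrival)
        heapq.heappush(free_at, start + duration)
    return waits


def puppet_wait(mediawiki_minutes, executors=4):
    """Wait of one short puppet job queued behind long MediaWiki jobs.

    Hypothetical load: 8 MediaWiki jobs arrive at minute 0, then a single
    1-minute puppet job arrives at minute 1.
    """
    jobs = [(0, mediawiki_minutes)] * 8 + [(1, 1)]
    return queue_waits(jobs, executors)[-1]
```

Under these assumptions, `puppet_wait(30)` gives a 59 minute wait and `puppet_wait(60)` gives 119 minutes: doubling the long jobs' duration doubles the delay for a job that itself takes one minute. Giving the short job its own executor, as in `queue_waits([(1, 1)], 1)`, removes the wait entirely in this model.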
Change 532679 had a related patch set uploaded (by simetrical; owner: simetrical):
[mediawiki/core@master] Make LocalisationCache a service
Change 532437 merged by jenkins-bot:
[integration/config@master] Move puppet jobs to dedicated small node
Change 532679 merged by jenkins-bot:
[mediawiki/core@master] Make LocalisationCache a service
Change 541624 had a related patch set uploaded (by Jforrester; owner: simetrical):
[mediawiki/core@REL1_34] Make LocalisationCache a service
Change 541624 merged by jenkins-bot:
[mediawiki/core@REL1_34] Make LocalisationCache a service