Page MenuHomePhabricator

"MediaWiki branch and publish WMF single-version image" job on releases-jenkins seems to pause for ~20 minutes during each run
Closed, InvalidPublicBUG REPORT

Description

I was checking the log of today's build with timestamps set to "elapsed time" and noticed:

00:08:34.388  create mode 160000 extensions/TitleBlacklist
00:08:37.093  create mode 160000 extensions/TocTree
00:28:32.110  create mode 160000 extensions/TorBlock
00:28:34.796  create mode 160000 extensions/Translate

That's an unexplained 20 minute pause between operations.

I zoomed back out to the build time trend graph and I can see this build was pretty typical in runtime.

Screenshot 2025-02-26 at 18.53.10.png (924×1 px, 115 KB)

Now I'm spot checking randomly and see that there is a similar pause every day. It happens in the same general area, but not exactly the same each time.

00:08:28.848  create mode 160000 extensions/TheWikipediaLibrary
00:29:44.476  create mode 160000 extensions/TimedMediaHandler
00:08:35.364  create mode 160000 extensions/TheWikipediaLibrary
00:28:40.741  create mode 160000 extensions/TimedMediaHandler
00:08:26.688  create mode 160000 extensions/SiteMatrix
00:29:28.436  create mode 160000 extensions/SpamBlacklist
00:08:37.232  create mode 160000 extensions/TimedMediaHandler
00:28:42.600  create mode 160000 extensions/TitleBlacklist

More investigation is needed. Can we reproduce this in an interactive shell where we might see what is going on? Is git doing something unexpectred? Is the instance running out of ram and swapping? Are we swamping out the ganetti hypervisor itself somehow?

Event Timeline

Another notable bit: The timestamps stop appearing in the log well before the last line of output:

17:28:35  create mode 160000 extensions/UniversalLanguageSelector
17:28:35  create mode 160000 extensions/UploadWizard
 create mode 160000 extensions/UploadsLink
 create mode 160000 extensions/UrlShortener
<many more lines without timestamps>

I'm inclined to blame the timestamper since the wait for merge was 19m55s. Suspiciously similar to the delay in the timestamp value:

Waiting up to 3600 seconds for https://gerrit.wikimedia.org/r/c/1123064 to merge
https://gerrit.wikimedia.org/r/c/1123064 has been merged after 19m55s

I'm inclined to blame the timestamper since the wait for merge was 19m55s. Suspiciously similar to the delay in the timestamp value:

Waiting up to 3600 seconds for https://gerrit.wikimedia.org/r/c/1123064 to merge
https://gerrit.wikimedia.org/r/c/1123064 has been merged after 19m55s

I was really hoping to find a magic time savings here, but I think you are right that this is really just a bug in the elapsed time display mode. The timestamps in the raw log view attribute the delay to things running through gate-and-submit: https://releases-jenkins.wikimedia.org/job/MediaWiki%20branch%20and%20publish%20WMF%20single-version%20image/176/timestamps/?elapsed=HH:mm:ss.S&appendLog

00:08:34.388   create mode 160000 skins/WikimediaApiPortal
00:08:34.388   create mode 160000 vendor
00:08:37.093  Waiting up to 3600 seconds for https://gerrit.wikimedia.org/r/c/1123064 to merge
00:28:32.110  https://gerrit.wikimedia.org/r/c/1123064 has been merged after 19m55s
00:28:34.796  Email was triggered for: Fixed

Waiting up to 3600 seconds seems to be a lie, but likely just a log comment out of sync with the actual wait loop timeout.

It doesn't seem highly valuable to turn this into tracking bug for some upstream issue with timestamper plugin.