
Continuous breakages of apt-staging
Closed, Resolved, Public

Description

apt-staging is broken yet again.

The packages generated in GitLab are now being imported:

apt-staging2001:~# reprepro list bookworm-wikimedia
bookworm-wikimedia|main|amd64: python3-conftool 6.0.1+deb12u1
...

but the Packages file still lists the previous version, 5.3.0:

curl -s 'https://apt-staging.wikimedia.org/wikimedia-staging/dists/bookworm-wikimedia/main/binary-amd64/Packages?cache=busted' | grep -a3 -F  'Package: python3-conftool'
Package: python3-conftool
Source: conftool
Version: 5.3.0+deb12u1
Architecture: all
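The mismatch above (database at 6.0.1, published index at 5.3.0) is the same condition reprepro later warns about ("database ... was modified but no index file was exported"). A minimal Python sketch for spotting it, assuming "reprepro list" prints one "codename|component|arch: package version" line per entry as in the output above. The function names are made up for illustration, the parser ignores the case of several versions of one package (the last one wins), and regenerating the indices is still reprepro's own "export" job.

```python
# Sketch: spot packages whose version in reprepro's database differs from the
# version published in the exported Packages index (e.g. after a failed export).


def parse_reprepro_list(text: str) -> dict:
    """Map package -> version from `reprepro list` lines such as
    'bookworm-wikimedia|main|amd64: python3-conftool 6.0.1+deb12u1'."""
    versions = {}
    for line in text.splitlines():
        _, sep, rest = line.partition(": ")
        parts = rest.split()
        if sep and len(parts) == 2:  # skips blank lines and '...' elisions
            versions[parts[0]] = parts[1]
    return versions


def parse_packages_index(text: str) -> dict:
    """Map package -> version from a Packages file (blank-line separated stanzas)."""
    versions = {}
    for stanza in text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") or ":" not in line:
                continue  # continuation lines of multi-line fields (Description)
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        if "Package" in fields and "Version" in fields:
            versions[fields["Package"]] = fields["Version"]
    return versions


def stale_entries(db_text: str, index_text: str) -> dict:
    """Return {package: (database_version, published_version_or_None)} for mismatches."""
    database = parse_reprepro_list(db_text)
    published = parse_packages_index(index_text)
    return {
        name: (version, published.get(name))
        for name, version in database.items()
        if published.get(name) != version
    }


if __name__ == "__main__":
    db = "bookworm-wikimedia|main|amd64: python3-conftool 6.0.1+deb12u1\n"
    index = ("Package: python3-conftool\nSource: conftool\n"
             "Version: 5.3.0+deb12u1\nArchitecture: all\n")
    print(stale_entries(db, index))
    # {'python3-conftool': ('6.0.1+deb12u1', '5.3.0+deb12u1')}
```

The database side would come from "reprepro list bookworm-wikimedia" and the index side from the curl command above.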

Running reprepro update shows that the configuration is completely broken, because it looks like we're importing from the main reprepro:

# reprepro --noskipold update bookworm-wikimedia
Error parsing /srv/aptrepo/wikimedia-staging/conf/updates, line 5, column 7: Not previously seen component 'thirdparty/ci' within 'Flat' field.
There have been errors!

This issue is blocking important work from multiple teams, and I have had to unbreak some of it just to get the packages imported. This happens virtually every time I have to do an "automated" import.

This time the amount of breakage is more than I'm willing to fix myself given my time constraints. Please unbreak this ASAP.

Event Timeline

Joe triaged this task as Unbreak Now! priority. Nov 5 2025, 5:43 AM

Triaging to UBN! as this is blocking activity for at least two hypotheses.

It might still be caching, but https://apt-staging.wikimedia.org/wikimedia-staging/dists/trixie-wikimedia/main/binary-amd64/ shows that the Packages file hasn't been updated since 28 Oct (and it lacks e.g. python3-conftool, which should be there by now).

Yesterday I looked at journalctl -u gitlab-package-puller.service -S 2025-11-03 and found:

Nov 04 10:10:38 apt-staging2001 gitlab-package-puller[2330852]: Error: trying to put version '0.12.1~wmf6' of 'python3-wmfmariadbpy-remote' in 'trixie-wikimedia|main|amd64',
Nov 04 10:10:38 apt-staging2001 gitlab-package-puller[2330852]: while there already is the stricly newer '0.12.2~wmf1' in there.
Nov 04 10:10:38 apt-staging2001 gitlab-package-puller[2330852]: (To ignore this error add Permit: older_version.)
Nov 04 10:10:38 apt-staging2001 gitlab-package-puller[2330852]: Not deleting possibly left over files due to previous errors.
Nov 04 10:10:38 apt-staging2001 gitlab-package-puller[2330852]: (To keep the files in the still existing index files from vanishing)
Nov 04 10:10:38 apt-staging2001 gitlab-package-puller[2330852]: Use dumpunreferenced/deleteunreferenced to show/delete files without references.
Nov 04 10:10:38 apt-staging2001 gitlab-package-puller[2330852]: Warning: database 'trixie-wikimedia|main|amd64' was modified but no index file was exported.
Nov 04 10:10:38 apt-staging2001 gitlab-package-puller[2330852]: Changes will only be visible after the next 'export'!
Nov 04 10:10:38 apt-staging2001 gitlab-package-puller[2330852]: There have been errors!
Nov 04 10:10:38 apt-staging2001 gitlab-package-puller[2330834]: ERROR:root:Couldn't import packages to apt-staging. Error seen is: None
Nov 04 10:29:37 apt-staging2001 gitlab-package-puller[2331423]: Error: trying to put version '0.12.1~wmf6' of 'python3-wmfmariadbpy-remote' in 'trixie-wikimedia|main|amd64',
Nov 04 10:29:37 apt-staging2001 gitlab-package-puller[2331423]: while there already is the stricly newer '0.12.2-1' in there.
Nov 04 10:29:37 apt-staging2001 gitlab-package-puller[2331423]: (To ignore this error add Permit: older_version.)
Nov 04 10:29:37 apt-staging2001 gitlab-package-puller[2331423]: There have been errors!
Nov 04 10:29:37 apt-staging2001 gitlab-package-puller[2331423]: Skipping conftool_6.0.1+deb13u1_amd64.changes because all packages are skipped!
Nov 04 10:29:37 apt-staging2001 gitlab-package-puller[2331413]: ERROR:root:Couldn't import packages to apt-staging. Error seen is: None
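reprepro rejects an upload that would replace a newer version unless "Permit: older_version" is set. The ordering it applies is dpkg's: a version splits into epoch, upstream part and revision; digit runs compare numerically, '~' sorts before anything (even the end of the string), and other symbols such as '+' sort after letters. So '0.12.1~wmf6' is older than '0.12.2~wmf1', '0.12.2-1' and '0.12.1~wmf6+deb13u1' alike, which is why the un-suffixed trixie build could never go in once a newer build was already there. The Python port below is my own sketch of that algorithm for illustration, not code taken from reprepro or dpkg.

```python
# Sketch: a Python port of dpkg's version comparison (epoch:upstream-revision),
# to show why reprepro refuses '0.12.1~wmf6' when '0.12.1~wmf6+deb13u1' exists.
import string

DIGITS = set(string.digits)
LETTERS = set(string.ascii_letters)


def _order(c: str) -> int:
    """Sort weight of one character in a non-digit run ('' = end of string)."""
    if not c or c in DIGITS:
        return 0
    if c in LETTERS:
        return ord(c)
    if c == "~":
        return -1  # '~' sorts before everything, even the end of the string
    return ord(c) + 256  # other symbols (e.g. '+', '.') sort after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare two upstream or revision strings; result <0, 0 or >0."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit runs character by character.
        while (i < len(a) and a[i] not in DIGITS) or (j < len(b) and b[j] not in DIGITS):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the digit runs numerically, ignoring leading zeros.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and j < len(b) and a[i] in DIGITS and b[j] in DIGITS:
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i] in DIGITS:
            return 1  # a's number has more digits, so it is larger
        if j < len(b) and b[j] in DIGITS:
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str):
    """Split 'epoch:upstream-revision' into (int epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")  # revision follows the LAST '-'
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as version a is older than, equal to or newer than b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    result = (ea > eb) - (ea < eb)
    if result == 0:
        result = _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
    return (result > 0) - (result < 0)


if __name__ == "__main__":
    print(compare_versions("0.12.1~wmf6", "0.12.1~wmf6+deb13u1"))  # -1
    print(compare_versions("0.12.1~wmf6", "0.12.2~wmf1"))          # -1
```

For real decisions the authoritative check is dpkg itself, e.g. "dpkg --compare-versions 0.12.1~wmf6 lt 0.12.1~wmf6+deb13u1 && echo older".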

Looking in /srv/aptrepo/wikimedia-staging/incoming/, I still see wmfmariadbpy_0.12.1~wmf6_amd64.changes and its associated .debs. So maybe (at least) a dcmd rm wmfmariadbpy_0.12.1~wmf6_amd64.changes followed by a re-export, or similar, is necessary to unwedge this? I think wmfmariadbpy has been fixed to stop trying to build more than one version targeted at trixie.
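dcmd expands a .changes file into the file itself plus every file listed in its Files field. A minimal sketch of the same idea in Python that moves the upload into a holding directory instead of deleting it, so it can still be inspected afterwards. The quarantine directory and both function names are hypothetical, and the parser assumes the usual "md5 size section priority filename" layout of Files entries.

```python
# Sketch: find the files a .changes file references (what `dcmd` expands to)
# and move them, together with the .changes itself, out of the incoming queue.
import shutil
from pathlib import Path


def files_in_changes(changes_text: str) -> list:
    """Return filenames listed in the 'Files:' field of a .changes file.

    Each entry line looks like ' <md5> <size> <section> <priority> <filename>';
    the field ends at the first line that is not indented.
    """
    names = []
    in_files = False
    for line in changes_text.splitlines():
        if line.startswith("Files:"):
            in_files = True
        elif in_files and line[:1] in (" ", "\t") and line.strip():
            names.append(line.split()[-1])  # filename is the last column
        else:
            in_files = False
    return names


def quarantine_changes(changes_path, quarantine_dir) -> list:
    """Move a .changes file and the files it lists into quarantine_dir.

    quarantine_dir is a hypothetical holding area; moving rather than deleting
    keeps the upload around for debugging. Returns the names actually moved.
    """
    changes_path = Path(changes_path)
    target = Path(quarantine_dir)
    target.mkdir(parents=True, exist_ok=True)
    text = changes_path.read_text(encoding="utf-8", errors="replace")
    moved = []
    for name in files_in_changes(text) + [changes_path.name]:
        if Path(name).name != name:
            continue  # refuse entries containing path components
        source = changes_path.parent / name
        if source.exists():  # some files may already have been cleaned up
            shutil.move(str(source), str(target / name))
            moved.append(name)
    return moved
```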

I came to the same conclusion: I've moved the botched wmfmariadbpy changes file away, and now we at least have up-to-date Packages files again, i.e. gitlab-package-puller can run "reprepro processincoming" again.
I'll also file a task to fix up its logging: it currently logs a _lot_ of meaningless updates, which makes it really hard to spot errors like the import failure in the first place.

But the original conftool upload is lost by now; looking at the logs, it in fact failed because of the wmfmariadbpy upload:

Nov 01 12:34:18 apt-staging2001 gitlab-package-puller[2034073]: Skipping conftool_6.0.0+deb13u1_amd64.changes because all packages are skipped!
Nov 01 12:34:19 apt-staging2001 gitlab-package-puller[2034084]: Skipping conftool_6.0.0+deb12u1_amd64.changes because all packages are skipped!
Nov 01 12:34:19 apt-staging2001 gitlab-package-puller[2034095]: Skipping conftool_6.0.0_amd64.changes because all packages are skipped!
Nov 01 12:34:29 apt-staging2001 gitlab-package-puller[2034110]: Skipping vopsbot_0.3.10-1+deb12u1_amd64.changes because all packages are skipped!
Nov 01 12:34:30 apt-staging2001 gitlab-package-puller[2034121]: Skipping wmfmariadbpy_0.12.1~wmf6+deb13u1_amd64.changes because all packages are skipped!
Nov 01 12:34:30 apt-staging2001 gitlab-package-puller[2034132]: Skipping wmfmariadbpy_0.12.1~wmf6+deb12u1_amd64.changes because all packages are skipped!
Nov 01 12:34:30 apt-staging2001 gitlab-package-puller[2034143]: Skipping wmfmariadbpy_0.12.1~wmf6+deb11u1_amd64.changes because all packages are skipped!
Nov 01 12:34:30 apt-staging2001 gitlab-package-puller[2034154]: Error: trying to put version '0.12.1~wmf6' of 'python3-wmfmariadbpy-remote' in 'trixie-wikimedia|main|amd64',
Nov 01 12:34:30 apt-staging2001 gitlab-package-puller[2034154]: while there already is the stricly newer '0.12.1~wmf6+deb13u1' in there.
Nov 01 12:34:30 apt-staging2001 gitlab-package-puller[2034154]: (To ignore this error add Permit: older_version.)
Nov 01 12:34:30 apt-staging2001 gitlab-package-puller[2034154]: There have been errors!
Nov 01 12:34:30 apt-staging2001 gitlab-package-puller[2034068]: ERROR:root:Couldn't import packages to apt-staging. Error seen is: None

But the conftool files uploaded on Nov 1 are no longer around since they were cleaned up by the timer added here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1199243
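The cleanup removed the conftool files before anyone noticed that the import had failed, so the retention window has to be longer than the time it takes to spot a failure. A minimal sketch of an age-based cleanup, assuming files are selected purely by modification time; RETENTION_DAYS and both function names are hypothetical, and the actual timer from the change above (and its later relaxation) may use different criteria.

```python
# Sketch of an age-based cleanup for the incoming queue. RETENTION_DAYS is a
# hypothetical value, not the one used by the actual puppet timer.
import time
from pathlib import Path

RETENTION_DAYS = 14  # hypothetical; long enough to notice and debug a failed import


def expired(entries, now: float, max_age_seconds: float) -> list:
    """Return the names from (name, mtime) pairs older than max_age_seconds."""
    return sorted(name for name, mtime in entries if now - mtime > max_age_seconds)


def files_to_clean(incoming_dir, retention_days: float = RETENTION_DAYS, now=None) -> list:
    """List regular files in incoming_dir that exceed the retention window."""
    now = time.time() if now is None else now
    entries = [(p.name, p.stat().st_mtime)
               for p in Path(incoming_dir).iterdir() if p.is_file()]
    return expired(entries, now, retention_days * 86400)
```

Keeping the selection (expired) separate from the filesystem scan makes it easy to check what a given retention would remove before deleting anything.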

So if we trigger a no-change version bump for conftool, I'm optimistic it will properly show up in https://apt-staging.wikimedia.org/wikimedia-staging/dists/bookworm-wikimedia/main/binary-amd64/Packages now.

I'll check the import step into the main repo next.

Change #1202064 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Don't configure a repo sync for the staging repo

https://gerrit.wikimedia.org/r/1202064

LSobanski lowered the priority of this task from Unbreak Now! to High. Nov 5 2025, 9:59 AM
LSobanski subscribed.

As an aside, this task doesn't meet Unbreak Now criteria as defined in https://www.mediawiki.org/wiki/Phabricator/Project_management#Priority_levels, which are focused on the MW train right now. This possibly warrants a conversation on whether the document needs an update.

cc @Aklapper

Change #1202064 merged by Muehlenhoff:

[operations/puppet@production] Don't configure a repo sync for the staging repo

https://gerrit.wikimedia.org/r/1202064

Change #1202113 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] apt-staging: Relax the cleanup for the incoming queue

https://gerrit.wikimedia.org/r/1202113

Change #1202113 merged by Dzahn:

[operations/puppet@production] apt-staging: Relax the cleanup for the incoming queue

https://gerrit.wikimedia.org/r/1202113

> As an aside, this task doesn't meet Unbreak Now criteria as defined in https://www.mediawiki.org/wiki/Phabricator/Project_management#Priority_levels, which are focused on the MW train right now. This possibly warrants a conversation on whether the document needs an update.
>
> cc @Aklapper

I disagree. The sentence there says:

> Unbreak Now! – Something is broken and needs to be fixed immediately, setting anything else aside. This should meet the requirements for issues that hold the train.

It's clearly very generic; it just states that train blockers are UBN!, mostly because they can either take the site down or block multiple teams.

If the site is down or there's a user-facing bug, it's a UBN!; if the problem blocks two teams and their KR work (this case), it's also a UBN!.

So I don't think any change of wording is needed. I should also add that I've never considered whatever is written on a page about Phabricator project management to override common sense. Once I realized I had spent 4 hours over an entire day fixing various problems with apt-staging, and that both WE5 work and essential DP work were blocked, I decided it was something that someone from the team owning the service should look at with urgency.

I didn't intend to get other senior engineers from yet another team to drop what they were doing and fix it.

As my closing 2 cents about this whole system:

  • A system should correctly report what is blocking it and what is erroring out. The apt-staging intake is extremely confusing and there's zero indication of what actually doesn't work. I suspect there are a few bugs, given how many "Error seen is: None" messages I've seen.
  • A failure to upload one package must not block the whole upload process. Better isolation is definitely needed.
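One way to get that isolation is to run reprepro once per .changes file and record each failure with its actual output, rather than doing one run for the whole queue. A minimal sketch, assuming "reprepro processincoming" accepts an optional .changes filename to limit processing to that upload (worth checking against reprepro(1) on the host) and that the ruleset name is whatever conf/incoming defines. ImportReport, import_each and reprepro_process_one are hypothetical names, not part of gitlab_package_puller.py.

```python
# Sketch: import each .changes file in its own reprepro run so one bad upload
# cannot block the rest, and keep the real error text instead of "None".
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ImportReport:
    imported: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # .changes name -> error text


def reprepro_process_one(basedir: str, ruleset: str) -> Callable[[str], Tuple[bool, str]]:
    """Return a function running `reprepro processincoming` on one .changes file.

    Assumes processincoming takes an optional .changes filename to restrict
    processing to that upload; verify against the installed reprepro.
    """
    def run(changes_name: str) -> Tuple[bool, str]:
        proc = subprocess.run(
            ["reprepro", "-b", basedir, "processincoming", ruleset, changes_name],
            capture_output=True, text=True, check=False,
        )
        return proc.returncode == 0, (proc.stdout + proc.stderr).strip()
    return run


def import_each(changes_names, process: Callable[[str], Tuple[bool, str]]) -> ImportReport:
    """Process uploads one by one, recording failures instead of stopping."""
    report = ImportReport()
    for name in sorted(changes_names):
        try:
            ok, output = process(name)
        except Exception as exc:  # e.g. the reprepro binary is missing
            ok, output = False, repr(exc)
        if ok:
            report.imported.append(name)
        else:
            report.failed[name] = output or "no output captured"
    return report
```

Passing the runner in as a function keeps the loop testable without reprepro installed; in production it would be called as import_each(names, reprepro_process_one("/srv/aptrepo/wikimedia-staging", RULESET)), with RULESET as a placeholder.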

In addition, I think the way we've structured this is too kludgy. Adding a new package building pipeline should take at most one change in git; instead we need multiple changes. I guess this comes from the fact that we didn't really design apt-staging with automation in mind, but with the goal of minimizing changes to our existing processes. It might be worth revisiting how the whole system works to make it more automatable.

I merged two fixes for things which confused me while poking at and unbreaking this: the incorrect import configuration (fixing it prevents misleading merges into apt-staging) and relaxing the cleanup timer.

Two issues which should still be fixed as followups are:

  • gitlab_package_puller.py has no proper error handling: while the return code of "reprepro processincoming" is checked, a failure only produces a log entry, and since a single failure is fatal for the whole queue in the current model, there's no cleanup. One way to flag this would be to remove the offending file and send an email to the address listed in the changes file (that's how the Debian archive software does it, for example), or alternatively some other form of feedback.
  • The logging is far too noisy: the timer kicks in every five minutes and a clean run logs three lines which carry no meaningful information. I think it would be best if a run which didn't import anything simply logged nothing at all. Whether the timer ran is recorded by systemd anyway.
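A minimal sketch of both follow-ups, assuming the puller already knows which uploads were imported and which failed (for example from a per-upload loop): it builds a rejection mail addressed to the Changed-By (or, failing that, Maintainer) field of the .changes file, in the spirit of the Debian archive, and only logs when something was actually imported or rejected. The function names and sender address are hypothetical.

```python
# Sketch of the two follow-ups: a Debian-archive-style rejection notice built
# from the .changes file, and logging that stays silent on no-op runs.
import logging
from email.message import EmailMessage
from typing import Dict, List, Optional


def changes_field(changes_text: str, name: str) -> Optional[str]:
    """Return the value of a single-line field of a .changes file, or None."""
    prefix = name + ":"
    for line in changes_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip() or None
    return None


def rejection_mail(changes_name: str, changes_text: str, error_text: str,
                   sender: str) -> Optional[EmailMessage]:
    """Build (not send) a rejection mail for the uploader, or None if no address.

    Prefers Changed-By over Maintainer; the sender address is a placeholder.
    """
    recipient = (changes_field(changes_text, "Changed-By")
                 or changes_field(changes_text, "Maintainer"))
    if recipient is None:
        return None
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"apt-staging: {changes_name} REJECTED"
    msg.set_content(f"Importing {changes_name} into apt-staging failed:\n\n{error_text}\n")
    return msg


def log_run(logger: logging.Logger, imported: List[str], failed: Dict[str, str]) -> int:
    """Log imports and failures; a run that did nothing logs nothing.

    Returns the number of records emitted.
    """
    for name in imported:
        logger.info("imported %s", name)
    for name, error in failed.items():
        logger.error("failed to import %s: %s", name, error)
    return len(imported) + len(failed)
```

Actually delivering the mail, e.g. with smtplib.SMTP("localhost").send_message(msg), depends on the host's mail setup, so the sketch stops at building the message.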
LSobanski claimed this task.

I created follow-up tasks to address the problems called out above.