
Replace deployment-mx03 with a bookworm-based instance (was Puppet failure: "Unable to locate package spamd")
Closed, Resolved, Public

Description

The following Puppet error started happening today:

dancy@deployment-mx03:~$ hostname -f
deployment-mx03.deployment-prep.eqiad1.wikimedia.cloud
dancy@deployment-mx03:~$ sudo run-puppet-agent
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-mx03.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(36a9001440) gitpuppet - varnish: Move error message from footer to body for HTTP 4xx responses'
Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install spamd' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package spamd
Error: /Stage[main]/Spamassassin/Package[spamd]/ensure: change from 'purged' to 'present' failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install spamd' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package spamd
Notice: /Stage[main]/Spamassassin/File[/etc/spamassassin/local.cf]: Dependency Package[spamd] has failures: true
Warning: /Stage[main]/Spamassassin/File[/etc/spamassassin/local.cf]: Skipping because of failed dependencies
Warning: /Stage[main]/Spamassassin/File[/etc/default/spamassassin]: Skipping because of failed dependencies
Warning: /Stage[main]/Spamassassin/Service[spamd]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/Package[exim4-config]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/Package[exim4-daemon-heavy]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/File[/var/spool/exim4]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/Exec[mkdir /var/spool/exim4/scan]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/Mount[/var/spool/exim4/scan]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/Mount[/var/spool/exim4/db]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/File[/var/spool/exim4/scan]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/File[/var/spool/exim4/db]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/File[/etc/exim4/update-exim4.conf.conf]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/File[/etc/default/exim4]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/File[/etc/exim4/aliases]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/File[/etc/exim4/dkim]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/File[/etc/exim4/system_filter]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/File[/etc/exim4/exim4.conf]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/Logrotate::Conf[exim4-paniclog]/File[/etc/logrotate.d/exim4-paniclog]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Mail::Mx/Exim4::Dkim[wikimedia.org]/File[/etc/exim4/dkim/beta.wmflabs.org-wikimedia.key]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Mail::Mx/Exim4::Dkim[wiki-mail]/File[/etc/exim4/dkim/beta.wmflabs.org-wiki-mail.key]: Skipping because of failed dependencies
Warning: /Stage[main]/Exim4/Service[exim4]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Mail::Mx/File[/etc/exim4/defer_domains]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Mail::Mx/File[/etc/exim4/wikimedia_domains]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Mail::Mx/File[/etc/exim4/legacy_mailing_lists]: Skipping because of failed dependencies
Notice: Applied catalog in 7.96 seconds

Event Timeline

dancy@deployment-mx03:~$ lsb_release  -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
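The output above confirms the host is on Debian 11 (bullseye). As far as we can tell, Debian ships spamd as a separate binary package only from Debian 12 (bookworm) onward. On bullseye the daemon is bundled inside the spamassassin package, which would explain why apt cannot locate spamd. Below is a hypothetical sketch of the kind of release-based package selection that the "OS check" mentioned later in this thread presumably performed. The function name spamd_package is made up for illustration and is not taken from operations/puppet.

```shell
#!/bin/sh
# Hypothetical helper, not from operations/puppet: print the name of the
# Debian package that provides the spamd daemon for a release codename.
# Assumption: buster (10) and bullseye (11) bundle spamd in "spamassassin",
# while bookworm (12) and later ship it as a separate "spamd" package.
# Releases older than buster are not handled.
spamd_package() {
  case "$1" in
    buster|bullseye)
      echo "spamassassin" ;;
    *)
      # bookworm, trixie and (presumably) later releases
      echo "spamd" ;;
  esac
}
```

On a host it could be called as spamd_package "$(lsb_release -cs)", which on deployment-mx03 would print spamassassin. Running apt-cache policy spamd on the host is a direct way to see what, if anything, the configured repositories offer under that name.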

Why on earth does deployment-prep need an MX host? We could revert the original patch, but having cleanups held back by random unmaintained Cloud VPS nodes is an unhealthy pattern.

This kind of thing is just a pain for all parties involved :/

How about this:

We add a lookup for the package name with a default value of spamd, and then override it in web-Hiera on Horizon for this instance.

That way we don't have to revert it all and can get Puppet fixed.

Change #1219180 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] mx/spamassassin: allow overriding sa daemon package name in Hiera

https://gerrit.wikimedia.org/r/1219180

like this? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1219180

That would mean no more OS / distro version check while still allowing to override it.

This adds complexity for no purpose. There is no need for an MX reference in deployment-prep to begin with, but if anyone wants to see one, it should follow what is actually used in prod. The MXes in prod have been on Bookworm for a long time and also run a totally different stack than the abandoned deployment-mx03 (we've been on Postfix instead of Exim for a long time now).

I was trying to be pragmatic and just fix it without getting into the deeper problems with beta ownership.

> This adds complexity for no purpose.

The purpose would have been to be able to move on without having to revert the OS check.

> no need for an MX reference in deployment-prep to begin with,
> if anyone wants to see one

No idea. Not going to fix ownership issues. Explicitly tried to avoid this.

Change #1219180 abandoned by Dzahn:

[operations/puppet@production] mx/spamassassin: allow overriding sa daemon package name in Hiera

Reason:

https://phabricator.wikimedia.org/T412975#11473246

https://gerrit.wikimedia.org/r/1219180

@dancy do you have any clues as to the need for this MX server? Also cc'ing @taavi as it was suggested to me that you may have knowledge on the topic. Knowing this would help to fix it the right way.

There's been an MX in beta for over a decade, and it's there because beta strives to be "production-like" for users. deployment-mx03 is not abandoned: Developer Experience has been the steward of beta since Nov 2024, so we strive to at least maintain the status quo, which is why we filed this task.

We definitely don't (and can't) hold details for every service running in prod, but a Puppet failure is a good signal that something is wrong. It sounds like there's more wrong here than the Puppet failure (which is, unfortunately, not uncommon for Beta).

@MoritzMuehlenhoff if @Dzahn 's workaround is less than ideal to you as someone who knows how the MX hosts work in prod, then how should we proceed to get beta back on its feet?

It should be as simple as re-creating a new node with bookworm as deployment-mx04. The old stack on Bullseye used Exim, while we've migrated to Postfix on the current mx* nodes in prod.

Even though it may have been around for a decade, I would still argue that an MX is not really in scope for deployment-prep: if the purpose is to be a test ground for MX-related changes, then it fails that objective, since running an MX in Cloud VPS is simply too different in terms of external connectivity to be useful. For the purpose of sending out mail for beta, it would be better to simply use the default MXes for Cloud VPS.

If anyone wants to merge Daniel's patch as a workaround over the holiday period, that's fine with me. But the underlying issue is that mx03 is fundamentally outdated and needs to be replaced.

Change #1219180 restored by Dzahn:

[operations/puppet@production] mx/spamassassin: allow overriding sa daemon package name in Hiera

https://gerrit.wikimedia.org/r/1219180

> @MoritzMuehlenhoff if @Dzahn 's workaround is less than ideal to you as someone who knows how the MX hosts work in prod, then how should we proceed to get beta back on its feet?

> It should be as simple as re-creating a new node with bookworm as deployment-mx04. The old stack on Bullseye used Exim, while we've migrated to Postfix on the current mx* nodes in prod.

Okie doke, there is now a deployment-mx04 running bullseye. After fighting Puppet SSL certificate signing and making a small regex tweak, Puppet stopped yelling at me.

Using mail on the command line works. Are there other bits I should check here?

> Even though it may have been around for a decade, I would still argue that an MX is not really in scope for deployment-prep: if the purpose is to be a test ground for MX-related changes, then it fails that objective, since running an MX in Cloud VPS is simply too different in terms of external connectivity to be useful. For the purpose of sending out mail for beta, it would be better to simply use the default MXes for Cloud VPS.

That's fair: it's possible others know better than I do why it was done this way. But I could also believe it's here simply because prod has one and therefore beta has one. In practice, it seems to be used for https://www.mediawiki.org/wiki/Extension:BounceHandler and as the smart_host (I'm guessing for password resets and the like). Why not use the cloud MX? Dunno.

Change #1219939 had a related patch set uploaded (by Thcipriani; author: Thcipriani):

[operations/mediawiki-config@master] Beta: update mx host ip

https://gerrit.wikimedia.org/r/1219939

> Why not use the cloud MX? Dunno.

I'm pretty sure that having an MX in deployment-prep predates having a shared MX for all of Cloud VPS. And the answer to the expected follow-up of "why not use a prod MX then?" is that Faidon would have skinned me alive for doing so.

Assigning to Tyler after checking to see that he would be ok with it.

bd808 renamed this task from Puppet failure: "Unable to locate package spamd" on deployment-mx03.deployment-prep to Replace deployment-mx03 with a bullseye-based instance (was Puppet failure: "Unable to locate package spamd"). Jan 6 2026, 7:14 PM
bd808 renamed this task from Replace deployment-mx03 with a bullseye-based instance (was Puppet failure: "Unable to locate package spamd") to Replace deployment-mx03 with a bookworm-based instance (was Puppet failure: "Unable to locate package spamd").

You can abandon it. Thanks again for stepping in quickly to try to help.

dancy triaged this task as Medium priority.

Change #1219939 merged by jenkins-bot:

[operations/mediawiki-config@master] Beta: update mx host ip

https://gerrit.wikimedia.org/r/1219939

Change #1225620 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] /home/dancy/src/wmf/operations/puppet/hieradata/cloud/eqiad1/deployment-prep/common.yaml

https://gerrit.wikimedia.org/r/1225620

Change #1219180 abandoned by Dzahn:

[operations/puppet@production] mx/spamassassin: allow overriding sa daemon package name in Hiera

Reason:

https://phabricator.wikimedia.org/T412975#11513241

https://gerrit.wikimedia.org/r/1219180

Change #1225620 merged by JHathaway:

[operations/puppet@production] deployment-prep common.yaml: Update mediawiki_smarthosts

https://gerrit.wikimedia.org/r/1225620

Mentioned in SAL (#wikimedia-releng) [2026-01-16T16:33:16Z] <dancy> Deleting deployment-mx03.deployment-prep (T412975)

Thanks for the work on this @thcipriani !

I confirmed that I did receive an email when I created a beta wiki account.