
No Puppet resources found on instance deployment-mx04 on project deployment-prep
Closed, Resolved (Public)

Description

Common information

  • summary: No Puppet resources found on instance deployment-mx04 on project deployment-prep
  • alertname: PuppetAgentNoResources
  • instance: deployment-mx04
  • job: node
  • project: deployment-prep
  • severity: warning

Firing alerts


  • summary: No Puppet resources found on instance deployment-mx04 on project deployment-prep
  • alertname: PuppetAgentNoResources
  • instance: deployment-mx04
  • job: node
  • project: deployment-prep
  • severity: warning

Event Timeline

$ sudo run-puppet-agent
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Could not find class role::mail::mx for deployment-mx04.deployment-prep.eqiad1.wikimedia.cloud on node deployment-mx04.deployment-prep.eqiad1.wikimedia.cloud
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
bd808@deployment-mx04:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Could not find class role::mail::mx for deployment-mx04.deployment-prep.eqiad1.wikimedia.cloud on node deployment-mx04.deployment-prep.eqiad1.wikimedia.cloud
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
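
Both runs fail on the same catalog error. As a minimal sketch, assuming the error wording stays exactly as in the output above (other Puppet versions may phrase it differently), the missing class and the affected node can be pulled out of the agent output like this. The function name `find_missing_class` and the `sample` string are illustrative only and are not part of any Puppet tooling.

```python
import re

# Pattern for the catalog failure seen in the runs above. The wording comes
# from this one log only; treat it as an assumption, not a stable format.
MISSING_CLASS_RE = re.compile(
    r"Could not find class (?P<cls>\S+) for (?P<certname>\S+) on node (?P<node>\S+)"
)


def find_missing_class(agent_output):
    """Return (class_name, node) for the first missing-class error, or None."""
    for line in agent_output.splitlines():
        match = MISSING_CLASS_RE.search(line)
        if match:
            return match.group("cls"), match.group("node")
    return None


# Hypothetical input: the error line copied from the failed run.
sample = (
    "Error: Could not retrieve catalog from remote server: Error 500 on SERVER: "
    "Server Error: Could not find class role::mail::mx for "
    "deployment-mx04.deployment-prep.eqiad1.wikimedia.cloud on node "
    "deployment-mx04.deployment-prep.eqiad1.wikimedia.cloud"
)
print(find_missing_class(sample))
```

A result of None means the output contains no missing-class error, so a failing run would have some other cause.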
bd808 added a subscriber: MoritzMuehlenhoff.

@MoritzMuehlenhoff removed the role::mail::mx role from Puppet. It sounds like prod changed how MX servers are built (the role was "obsoleted by the Postfix migration"), and nobody kept Beta Cluster up to date as those changes happened.

One way forward would be to remove the now-absent role from the instance, then start building a replacement MX service using whatever roles prod currently uses, so that we can decommission this instance.

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/580c6c098acbc7c67d5d2405048f36099480c39c%5E%21/#F0

diff --git a/deployment-prep/deployment-mx04.deployment-prep.eqiad1.wikimedia.cloud.roles b/deployment-prep/deployment-mx04.deployment-prep.eqiad1.wikimedia.cloud.roles
deleted file mode 100644
index 5871d97..0000000
--- a/deployment-prep/deployment-mx04.deployment-prep.eqiad1.wikimedia.cloud.roles
+++ /dev/null
@@ -1 +0,0 @@
-- role::mail::mx
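
The change in the diff above amounts to dropping one list entry and deleting the file once nothing is left. Here is a rough sketch of that logic, assuming the .roles file is a plain list with one "- role::name" entry per line, which is all the diff shows. The helper name `remove_role` is hypothetical, and the real file format may allow more than this handles.

```python
def remove_role(roles_text, role):
    """Drop one role entry from the body of a .roles file.

    Returns the remaining text, or None when no entries are left, which
    corresponds to deleting the file as the diff above does.
    """
    # Keep every line except the exact list entry for the given role.
    kept = [
        line for line in roles_text.splitlines()
        if line.strip() != f"- {role}"
    ]
    # No non-blank lines left: signal that the file should be removed.
    if not any(line.strip() for line in kept):
        return None
    return "\n".join(kept) + "\n"


# The file in the diff held a single entry, so nothing remains afterwards.
print(remove_role("- role::mail::mx\n", "role::mail::mx"))
```

Returning None rather than an empty string keeps "delete the file" distinct from "write an empty file".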

Puppet works again, but it no longer manages the still-installed MX service stack.

bd808@deployment-mx04:~$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-mx04.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(25c2c9859b) gitpuppet - beta: Add a wmf-beta-update-all timer and script'
Notice: /Stage[main]/Apt/File[/etc/apt/sources.list]/ensure: removed (corrective)
Notice: /Stage[main]/Ssh::Server/Ssh::Server::Ca_signed_hostkey[/etc/ssh/ssh_host_rsa_key-cert.pub]/File[/etc/ssh/ssh_host_rsa_key-cert.pub]/content: content changed '{sha256}33051a0ee55edd8a28200d106a44eabf0f6dae5be6b714b0230934b695a08814' to '{sha256}09ced7c40e81cb5c629cc4e66535aa7ba096045321829416dfb8e4895a5a0041'
Info: Ssh::Server::Ca_signed_hostkey[/etc/ssh/ssh_host_rsa_key-cert.pub]: Scheduling refresh of Service[ssh]
Notice: /Stage[main]/Ssh::Server/Ssh::Server::Ca_signed_hostkey[/etc/ssh/ssh_host_ecdsa_key-cert.pub]/File[/etc/ssh/ssh_host_ecdsa_key-cert.pub]/content: content changed '{sha256}24ea9c4a1deebd3bf520605d66cad14c3329f5fc0b49c063549c481e73dcaaf8' to '{sha256}45ea6425f9970b9aaf661ad75829b37e7efab5424d1af45ea77b7ff4c5d1edef'
Info: Ssh::Server::Ca_signed_hostkey[/etc/ssh/ssh_host_ecdsa_key-cert.pub]: Scheduling refresh of Service[ssh]
Notice: /Stage[main]/Ssh::Server/Ssh::Server::Ca_signed_hostkey[/etc/ssh/ssh_host_ed25519_key-cert.pub]/File[/etc/ssh/ssh_host_ed25519_key-cert.pub]/content: content changed '{sha256}df9e55f9bfc2b2abc759170e8f108beaa61390c4eb305a79d6ac44432fdc3b06' to '{sha256}4f26feb9fdafbef3dde86a54a119dc43f088b7575159abe80a660cbfd1186a92'
Info: Ssh::Server::Ca_signed_hostkey[/etc/ssh/ssh_host_ed25519_key-cert.pub]: Scheduling refresh of Service[ssh]
Notice: /Stage[main]/Base::Kernel/Kmod::Blacklist[wmf]/File[/etc/modprobe.d/blacklist-wmf.conf]/content:
--- /etc/modprobe.d/blacklist-wmf.conf  2026-04-29 20:29:39.239717223 +0000
+++ /tmp/puppet-file20260514-1089747-2yk4hr     2026-05-14 15:35:21.286861195 +0000
@@ -5,8 +5,12 @@
 install acpi_power_meter /bin/true
 blacklist algif_aead
 install algif_aead /bin/true
+blacklist appletalk
+install appletalk /bin/true
 blacklist asn1_decoder
 install asn1_decoder /bin/true
+blacklist atm
+install atm /bin/true
 blacklist aufs
 install aufs /bin/true
 blacklist binder_linux
@@ -25,6 +29,10 @@
 install dccp_ipv6 /bin/true
 blacklist dccp_probe
 install dccp_probe /bin/true
+blacklist esp4
+install esp4 /bin/true
+blacklist esp6
+install esp6 /bin/true
 blacklist floppy
 install floppy /bin/true
 blacklist intel_cstate
@@ -39,12 +47,18 @@
 install n_gsm /bin/true
 blacklist n_hdlc
 install n_hdlc /bin/true
+blacklist nfc
+install nfc /bin/true
 blacklist parport
 install parport /bin/true
 blacklist parport_pc
 install parport_pc /bin/true
 blacklist ppdev
 install ppdev /bin/true
+blacklist rxrpc
+install rxrpc /bin/true
+blacklist tipc
+install tipc /bin/true
 blacklist usbip-core
 install usbip-core /bin/true
 blacklist usbip-host

Notice: /Stage[main]/Base::Kernel/Kmod::Blacklist[wmf]/File[/etc/modprobe.d/blacklist-wmf.conf]/content: content changed '{sha256}092e3f06b6487928c02117f5f9cdc3e4d5399209b21c2ca2de0364f24a49bfa0' to '{sha256}63ce10f105a42314ab08479bfe009cdcb8d98f0fff6e5b8778af01d19385bcba'
Info: /Stage[main]/Base::Kernel/Kmod::Blacklist[wmf]/File[/etc/modprobe.d/blacklist-wmf.conf]: Scheduling refresh of Exec[update-initramfs]
Notice: /Stage[main]/Profile::Rsyslog::Kafka_shipper/File[/etc/rsyslog.lookup.d/lookup_table_output.json]/content:
--- /etc/rsyslog.lookup.d/lookup_table_output.json      2026-04-13 15:59:54.474066993 +0000
+++ /tmp/puppet-file20260514-1089747-arx393     2026-05-14 15:35:21.418861896 +0000
@@ -132,6 +132,7 @@
     {"index" : "docker-striker", "value" : "kafka" },
     {"index" : "tcpircbot-logmsgbot", "value" : "kafka" },
     {"index" : "thanos-query", "value" : "kafka local" },
+    {"index" : "thanos-query-frontend", "value" : "kafka local" },
     {"index" : "ipmiseld", "value": "kafka local" }
   ]
 }

Notice: /Stage[main]/Profile::Rsyslog::Kafka_shipper/File[/etc/rsyslog.lookup.d/lookup_table_output.json]/content: content changed '{sha256}57eb0c238fdb91818bbf63b37dda1694b8434263dd4019bd63a3cf0cfe82251c' to '{sha256}11c95a1e0a4adf4d0238980ee1a0ae652ec6ec19bc4c028acdda2ef3c3ba79f1'
Info: /Stage[main]/Profile::Rsyslog::Kafka_shipper/File[/etc/rsyslog.lookup.d/lookup_table_output.json]: Scheduling refresh of Service[rsyslog]
Notice: /Stage[main]/Rsyslog/File[/etc/rsyslog.d/20-confd.conf]/ensure: removed
Notice: /Stage[main]/Rsyslog/File[/etc/rsyslog.d/25-nrpe2nodexp-ferm-active.conf]/ensure: removed
Notice: /Stage[main]/Rsyslog/File[/etc/rsyslog.d/40-clean-confd-rundir.conf]/ensure: removed
Notice: /Stage[main]/Rsyslog/File[/etc/rsyslog.d/40-confd-prometheus-metrics.conf]/ensure: removed
Notice: /Stage[main]/Rsyslog/File[/etc/rsyslog.d/40-generate-vrts-aliases.conf]/ensure: removed
Notice: /Stage[main]/Rsyslog/File[/etc/rsyslog.d/40-prometheus-node-exim-queue.conf]/ensure: removed
Notice: /Stage[main]/Rsyslog/File[/etc/rsyslog.d/40-wmf-auto-restart-spamd.conf]/ensure: removed
Notice: /Stage[main]/Rsyslog/File[/etc/rsyslog.d/40-wmf-auto-restart-ulogd2.conf]/ensure: removed
Info: /etc/rsyslog.d: Scheduling refresh of Service[rsyslog]
Notice: /Stage[main]/Ssh::Server/Service[ssh]: Triggered 'refresh' from 3 events
Notice: /Stage[main]/Initramfs/Exec[update-initramfs]: Triggered 'refresh' from 1 event
Notice: /Stage[main]/Sysctl/File[/etc/sysctl.d/70-ferm_conntrack.conf]/ensure: removed
Notice: /Stage[main]/Rsyslog/Service[rsyslog]: Triggered 'refresh' from 2 events
Notice: Applied catalog in 29.09 seconds
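
A run this long is easier to review as counts. The sketch below tallies the kinds of Notice lines that appear in the output above, assuming the line wording matches this log. The function name `summarize_run` and the category labels are made up for illustration.

```python
from collections import Counter


def summarize_run(agent_output):
    """Count removed files, content changes, and refreshes in agent output."""
    counts = Counter()
    for line in agent_output.splitlines():
        # Only Notice lines report applied changes; skip Info and diff lines.
        if not line.startswith("Notice: "):
            continue
        if "/ensure: removed" in line:
            counts["removed"] += 1
        elif "content changed" in line:
            counts["content_changed"] += 1
        elif "Triggered 'refresh'" in line:
            counts["refreshed"] += 1
    return counts


# Hypothetical input: a few lines copied from the successful run.
sample = "\n".join([
    "Notice: /Stage[main]/Apt/File[/etc/apt/sources.list]/ensure: removed (corrective)",
    "Notice: /Stage[main]/Rsyslog/File[/etc/rsyslog.d/20-confd.conf]/ensure: removed",
    "Info: /etc/rsyslog.d: Scheduling refresh of Service[rsyslog]",
    "Notice: /Stage[main]/Ssh::Server/Service[ssh]: Triggered 'refresh' from 3 events",
    "Notice: Applied catalog in 29.09 seconds",
])
print(dict(summarize_run(sample)))
```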
bd808 claimed this task.

The alert has cleared, so I'm going to resolve this. The instance will remain in a sketchy state until T426326 (Replace deployment-mx04 with a newer OS and MX service stack) is completed.