
Re-image remaining full graph hosts to post-graph-split roles
Closed, Resolved (Public)

Description

Per parent task, we have removed the WDQS full graph LVS (load balancer) pools. The hosts formerly belonging to these pools† need to be reimaged so they can serve the split graphs.

Creating this ticket to:

  • Reimage the hosts to their new roles as described here
  • Transfer the data to the new hosts
  • Deploy the appropriate graph on the new hosts
  • Add the new hosts to load balancer rotation
  • Verify operation

† wdqs10[18-20].eqiad.wmnet, wdqs201[6-7].codfw.wmnet

Check the progress of individual hosts in this Etherpad.
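The bracketed host ranges in the description are shorthand for five individual hosts. As a rough illustration only, the sketch below expands that notation in plain Python. The function name `expand_hosts` is hypothetical, and this is a simplified stand-in that handles only numeric `[start-end]` ranges of equal width, not the host-selection syntax of any real tooling.

```python
import re
from itertools import product


def expand_hosts(pattern: str) -> list[str]:
    """Expand numeric bracket ranges, e.g. 'wdqs10[18-20].eqiad.wmnet'."""
    # With two capture groups, re.split yields:
    # literal, start, end, literal, start, end, ..., literal
    parts = re.split(r"\[(\d+)-(\d+)\]", pattern)
    literals = parts[0::3]
    ranges = [
        range(int(start), int(end) + 1)
        for start, end in zip(parts[1::3], parts[2::3])
    ]
    hosts = []
    for combo in product(*ranges):
        name = literals[0]
        for number, literal in zip(combo, literals[1:]):
            name += f"{number}{literal}"
        hosts.append(name)
    return hosts


for pattern in ("wdqs10[18-20].eqiad.wmnet", "wdqs201[6-7].codfw.wmnet"):
    for host in expand_hosts(pattern):
        print(host)
```

A pattern without brackets is returned unchanged as a single-element list, so the same helper can be applied to a mixed list of hostnames and ranges.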

Event Timeline


Change #1192626 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs: add newly-reimaged hosts as scap targets

https://gerrit.wikimedia.org/r/1192626

Change #1192626 merged by Bking:

[operations/puppet@production] wdqs: add newly-reimaged hosts as scap targets

https://gerrit.wikimedia.org/r/1192626

Mentioned in SAL (#wikimedia-operations) [2025-09-30T20:33:44Z] <bking@deploy2002> Started deploy [wdqs/wdqs@fea7794]: T405978

Mentioned in SAL (#wikimedia-operations) [2025-09-30T20:33:58Z] <bking@deploy2002> Finished deploy [wdqs/wdqs@fea7794]: T405978 (duration: 00m 20s)

Mentioned in SAL (#wikimedia-operations) [2025-09-30T20:35:40Z] <bking@deploy2002> Started deploy [wdqs/wdqs@fea7794]: T405978

Mentioned in SAL (#wikimedia-operations) [2025-09-30T20:35:45Z] <bking@deploy2002> Finished deploy [wdqs/wdqs@fea7794]: T405978 (duration: 00m 10s)

bking changed the task status from Open to In Progress. (Sep 30 2025, 8:45 PM)
bking triaged this task as Medium priority.
bking updated the task description.

Change #1192890 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs-scholarly: Add wdqs2016 to load balancer pool

https://gerrit.wikimedia.org/r/1192890

Change #1192890 merged by Bking:

[operations/puppet@production] wdqs-scholarly: Add wdqs2016 to load balancer pool

https://gerrit.wikimedia.org/r/1192890

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host wdqs1018.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host wdqs2017.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host wdqs1018.eqiad.wmnet with OS bullseye executed with errors:

  • wdqs1018 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs1018.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host wdqs2017.codfw.wmnet with OS bullseye executed with errors:

  • wdqs2017 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs2017.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Change #1193943 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs: Add soon-to-be-reimaged hosts to insetup role

https://gerrit.wikimedia.org/r/1193943

Change #1193943 merged by Bking:

[operations/puppet@production] wdqs: Add soon-to-be-reimaged hosts to insetup role

https://gerrit.wikimedia.org/r/1193943

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host wdqs2017.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host wdqs1018.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host wdqs1020.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host wdqs1020.eqiad.wmnet with OS bullseye executed with errors:

  • wdqs1020 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs1020.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host wdqs1020.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host wdqs2017.codfw.wmnet with OS bullseye executed with errors:

  • wdqs2017 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs2017.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host wdqs1018.eqiad.wmnet with OS bullseye executed with errors:

  • wdqs1018 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs1018.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host wdqs1020.eqiad.wmnet with OS bullseye executed with errors:

  • wdqs1020 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs1020.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host wdqs1020.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host wdqs1020.eqiad.wmnet with OS bullseye executed with errors:

  • wdqs1020 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs1020.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host wdqs1020.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host wdqs1020.eqiad.wmnet with OS bullseye completed:

  • wdqs1020 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202510071947_bking_554703_wdqs1020.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1194286 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs1020: move back to production role

https://gerrit.wikimedia.org/r/1194286

Change #1194286 merged by Bking:

[operations/puppet@production] wdqs1020: move back to production role

https://gerrit.wikimedia.org/r/1194286

Mentioned in SAL (#wikimedia-operations) [2025-10-07T20:48:39Z] <bking@cumin2002> START - Cookbook sre.wdqs.data-transfer (T405978, transfer main graph to newly-reimaged host) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards

Mentioned in SAL (#wikimedia-operations) [2025-10-07T21:41:47Z] <bking@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer main graph to newly-reimaged host) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
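For planning the remaining transfers, it can help to know how long this one took. The sketch below computes the elapsed time between the START and END timestamps of the two SAL entries above. The helper name `elapsed` is hypothetical, and the timestamps are copied by hand rather than parsed from the log.

```python
from datetime import datetime

SAL_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp layout used in the SAL entries


def elapsed(start: str, end: str) -> str:
    """Return the time between two SAL timestamps as 'Mm SSs'."""
    delta = datetime.strptime(end, SAL_FORMAT) - datetime.strptime(start, SAL_FORMAT)
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes}m {seconds:02d}s"


# wikidata_main transfer, wdqs1011 -> wdqs1020
print(elapsed("2025-10-07T20:48:39Z", "2025-10-07T21:41:47Z"))  # prints 53m 08s
```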

Icinga downtime and Alertmanager silence (ID=6af48cfa-98de-4c84-a5e9-d78b15b819ee) set by bking@cumin2002 for 20:00:00 on 1 host(s) and their services with reason: finish getting host ready for production

wdqs1020.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2025-10-07T21:49:45Z] <bking@deploy2002> Started deploy [wdqs/wdqs@fea7794]: T405978

Mentioned in SAL (#wikimedia-operations) [2025-10-07T21:50:29Z] <bking@deploy2002> Finished deploy [wdqs/wdqs@fea7794]: T405978 (duration: 00m 45s)

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host wdqs1018.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host wdqs1018.eqiad.wmnet with OS bullseye completed:

  • wdqs1018 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202510080154_ryankemper_722343_wdqs1018.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2025-10-08T03:37:01Z] <ryankemper@deploy2002> Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978

wdqs1018 has been reimaged and scap-deployed. Data transfer in progress.

Change #1194336 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: provision wdqs1018 for wdqs-main

https://gerrit.wikimedia.org/r/1194336

Change #1194336 merged by Ryan Kemper:

[operations/puppet@production] wdqs: provision wdqs1018 for wdqs-main

https://gerrit.wikimedia.org/r/1194336

Mentioned in SAL (#wikimedia-operations) [2025-10-08T03:53:12Z] <ryankemper@deploy2002> Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978 (duration: 16m 11s)

Mentioned in SAL (#wikimedia-operations) [2025-10-08T03:53:15Z] <ryankemper@deploy2002> Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978

^ oops that should say wdqs-main host not wdqs-internal-main

Mentioned in SAL (#wikimedia-operations) [2025-10-08T03:55:17Z] <ryankemper@deploy2002> Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978 (duration: 02m 01s)

Mentioned in SAL (#wikimedia-operations) [2025-10-08T04:37:19Z] <ryankemper@deploy2002> Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host T405978

Mentioned in SAL (#wikimedia-operations) [2025-10-08T04:37:33Z] <ryankemper@deploy2002> Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host T405978 (duration: 00m 14s)

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host wdqs1018.eqiad.wmnet with OS bullseye

Sigh, had host all ready for the data-transfer and then ran the reimage by mistake. Probably my sign to log off for the night :) This host will need a scap deploy and data transfer when done

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host wdqs1018.eqiad.wmnet with OS bullseye executed with errors:

  • wdqs1018 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs1018.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host wdqs2017.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host wdqs2017.codfw.wmnet with OS bullseye completed:

  • wdqs2017 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202510081613_bking_1286123_wdqs2017.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1194696 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs-internal-scholarly: add wdqs2017

https://gerrit.wikimedia.org/r/1194696

Change #1194696 merged by Bking:

[operations/puppet@production] wdqs-internal-scholarly: add wdqs2017

https://gerrit.wikimedia.org/r/1194696

Icinga downtime and Alertmanager silence (ID=c04b2c62-8b17-4f9a-b915-612722761c30) set by bking@cumin2002 for 20:00:00 on 1 host(s) and their services with reason: finish getting host ready for production

wdqs2017.codfw.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host wdqs1018.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-operations) [2025-10-08T21:13:10Z] <ryankemper@deploy2002> Started deploy [wdqs/wdqs@fea7794]: deploy to fresh internal-scholarly host T405978

Mentioned in SAL (#wikimedia-operations) [2025-10-08T21:13:22Z] <ryankemper@deploy2002> Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh internal-scholarly host T405978 (duration: 00m 12s)

Mentioned in SAL (#wikimedia-operations) [2025-10-08T21:18:54Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards

Mentioned in SAL (#wikimedia-operations) [2025-10-08T21:19:21Z] <ryankemper@cumin2002> END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards

Mentioned in SAL (#wikimedia-operations) [2025-10-08T21:19:28Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host wdqs1019.eqiad.wmnet with OS bullseye

Mentioned in SAL (#wikimedia-operations) [2025-10-08T22:09:54Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards

Change #1194785 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: bring wdqs101[8-9] into svc

https://gerrit.wikimedia.org/r/1194785

Change #1194785 merged by Ryan Kemper:

[operations/puppet@production] wdqs: bring wdqs101[8-9] into svc

https://gerrit.wikimedia.org/r/1194785

Mentioned in SAL (#wikimedia-operations) [2025-10-09T06:27:23Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards

Mentioned in SAL (#wikimedia-operations) [2025-10-09T06:27:27Z] <ryankemper@cumin2002> END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards

Mentioned in SAL (#wikimedia-operations) [2025-10-09T06:27:35Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards

Mentioned in SAL (#wikimedia-operations) [2025-10-09T06:28:36Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1020.eqiad.wmnet -> wdqs1019.eqiad.wmnet w/ force delete existing files, repooling both afterwards

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host wdqs1019.eqiad.wmnet with OS bullseye completed:

  • wdqs1019 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202510082358_ryankemper_1455130_wdqs1019.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-operations) [2025-10-09T07:20:10Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards

Mentioned in SAL (#wikimedia-operations) [2025-10-09T07:20:54Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1020.eqiad.wmnet -> wdqs1019.eqiad.wmnet w/ force delete existing files, repooling both afterwards

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host wdqs1018.eqiad.wmnet with OS bullseye completed:

  • wdqs1018 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202510082354_ryankemper_1448924_wdqs1018.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
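Several hosts needed more than one reimage attempt, so a per-host tally of cookbook outcomes can be useful for a retrospective. The sketch below counts result lines of the form `• host (STATUS)` as they appear in this timeline. The function name `tally_results` is hypothetical, and the sample text is an abbreviated excerpt, not the full log.

```python
import re
from collections import Counter, defaultdict

# Matches top-level result lines such as "  • wdqs1018 (FAIL)"
RESULT_LINE = re.compile(r"^\s*•\s+(\w+)\s+\((PASS|FAIL|WARN)\)\s*$")


def tally_results(log_text: str) -> dict[str, Counter]:
    """Count PASS/FAIL/WARN outcomes per host in pasted timeline text."""
    results: dict[str, Counter] = defaultdict(Counter)
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            host, status = match.groups()
            results[host][status] += 1
    return dict(results)


sample = """\
  • wdqs1018 (FAIL)
    • Downtimed on Icinga/Alertmanager
  • wdqs1018 (PASS)
    • Icinga status is optimal
  • wdqs1019 (WARN)
    • Icinga status is not optimal, downtime not removed
"""

for host, counts in tally_results(sample).items():
    print(host, dict(counts))
```

Step lines such as "Downtimed on Icinga/Alertmanager" are ignored because they do not end in a parenthesised status.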

Change #1195063 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: bring internal-scholarly wdqs2017 into svc

https://gerrit.wikimedia.org/r/1195063

Change #1195063 merged by Ryan Kemper:

[operations/puppet@production] wdqs: bring internal-scholarly wdqs2017 into svc

https://gerrit.wikimedia.org/r/1195063

Mentioned in SAL (#wikimedia-operations) [2025-10-09T22:11:22Z] <inflatador> bking@wdqs10(18|19|20) systemctl start load-categories-daily.service T405978

Change #1197723 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs: add newly-reimaged hosts to nodes hieradata

https://gerrit.wikimedia.org/r/1197723

Change #1197723 merged by Bking:

[operations/puppet@production] wdqs: add newly-reimaged hosts to nodes hieradata

https://gerrit.wikimedia.org/r/1197723