Dedicated puppet role to support testing of alternatives to Blazegraph
Open, Needs Triage, Public

Description

New test servers for WDQS were set up in T410406 by reusing role::wdqs::test. A dedicated role will give us more flexibility to apply the changes needed specifically for testing Blazegraph alternatives.

Event Timeline

Change #1217238 had a related patch set uploaded (by Gehel; author: Gehel):

[operations/puppet@production] WDQS: introduce a new role to test Blazegraph alternatives

https://gerrit.wikimedia.org/r/1217238

Change #1217238 merged by Gehel:

[operations/puppet@production] WDQS: introduce a new role to test Blazegraph alternatives

https://gerrit.wikimedia.org/r/1217238

The new role is merged and applied to the relevant servers. We should still reimage those servers to ensure they are in a clean state.

Cookbook cookbooks.sre.hosts.reimage was started by gehel@cumin1003 for host wdqs1028.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by gehel@cumin1003 for host wdqs1028.eqiad.wmnet with OS trixie completed:

  • wdqs1028 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512111454_gehel_3523474_wdqs1028.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by gehel@cumin1003 for host wdqs1029.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by gehel@cumin1003 for host wdqs1030.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by gehel@cumin1003 for host wdqs1029.eqiad.wmnet with OS trixie executed with errors:

  • wdqs1029 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512111543_gehel_3531760_wdqs1029.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs1029.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage started by gehel@cumin1003 for host wdqs1030.eqiad.wmnet with OS trixie executed with errors:

  • wdqs1030 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512111547_gehel_3532056_wdqs1030.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs1030.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by gehel@cumin1003 for host wdqs1031.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by gehel@cumin1003 for host wdqs1031.eqiad.wmnet with OS trixie executed with errors:

  • wdqs1031 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512112034_gehel_3605396_wdqs1031.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs1031.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by gehel@cumin1003 for host wdqs1032.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by gehel@cumin1003 for host wdqs1032.eqiad.wmnet with OS trixie executed with errors:

  • wdqs1032 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512120914_gehel_3706738_wdqs1032.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs1032.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by gehel@cumin1003 for host wdqs1029.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by gehel@cumin1003 for host wdqs1029.eqiad.wmnet with OS trixie executed with errors:

  • wdqs1029 (FAIL)
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs1029.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
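With several reimage attempts per host in the timeline above, it can help to reduce the cookbook output to the latest result for each host. The following is a minimal sketch, not part of the cookbook itself: it assumes the summary lines keep the `• host (PASS)` / `• host (FAIL)` shape shown in this task, and the function name `latest_reimage_status` is hypothetical.

```python
import re

# Matches summary lines such as "  • wdqs1029 (FAIL)" as they appear in this
# timeline. Step lines like "    • Disabled Puppet" do not match because they
# lack the trailing "(PASS)" / "(FAIL)" marker.
SUMMARY_RE = re.compile(r"^\s*•\s+(?P<host>\S+)\s+\((?P<status>PASS|FAIL)\)\s*$")


def latest_reimage_status(log_text: str) -> dict[str, str]:
    """Return the most recent PASS/FAIL status seen for each host (hypothetical helper)."""
    status: dict[str, str] = {}
    for line in log_text.splitlines():
        match = SUMMARY_RE.match(line)
        if match:
            # Later entries overwrite earlier ones, so the last attempt wins.
            status[match.group("host")] = match.group("status")
    return status


if __name__ == "__main__":
    # Sample input copied from the shape of the cookbook summaries in this task.
    sample = """
  • wdqs1028 (PASS)
    • Disabled Puppet
  • wdqs1029 (FAIL)
  • wdqs1030 (FAIL)
  • wdqs1029 (FAIL)
"""
    for host, result in sorted(latest_reimage_status(sample).items()):
        print(f"{host}: {result}")
```

This only reflects what the cookbook reported. A host marked FAIL may still end up usable after manual follow-up, so the result is a starting point for checking, not a statement of the host's current health.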

Change #1227726 had a related patch set uploaded (by Gehel; author: Gehel):

[operations/puppet@production] wdqs: setup new test servers for Blazegraph alternatives

https://gerrit.wikimedia.org/r/1227726

Change #1227726 merged by Gehel:

[operations/puppet@production] wdqs: setup new test servers for Blazegraph alternatives

https://gerrit.wikimedia.org/r/1227726

wdqs1028-1031 are working, including:

  • user and permissions in place for the Wikidata Platform team
  • dumps NFS shares mounted under /mnt/nfs

There are still issues with wdqs1032 (I can't SSH into it at the moment).