Site: eqiad, codfw 2 VM request for postfix mx-out
Closed, Resolved (Public)

Description

Cloud VPS Project Tested: ldap-dev
Site/Location: eqiad, codfw
Number of systems: 2
Service: email
Networking Requirements: external IP
Processor Requirements: 8
Memory: 8GiB
Disks: 50GiB

Event Timeline

jhathaway renamed this task from Site: eqiad, codfw 1 VM request for postfix mta-out to Site: eqiad, codfw 2 VM request for postfix mta-out. Apr 3 2024, 8:58 PM
jhathaway claimed this task.
jhathaway updated the task description.
jhathaway triaged this task as Medium priority. Apr 3 2024, 9:01 PM
jhathaway renamed this task from Site: eqiad, codfw 2 VM request for postfix mta-out to Site: eqiad, codfw 2 VM request for postfix mx-out. Apr 3 2024, 9:03 PM

LGTM (we probably don't need as much CPU capacity, but also fine to overcommit a little, we can easily adjust later)

Change #1017318 had a related patch set uploaded (by JHathaway; author: JHathaway):

[operations/puppet@production] email: add node definitions for mx-out boxes

https://gerrit.wikimedia.org/r/1017318

Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin1002 for host mx-out1001.wikimedia.org with OS bookworm

Change #1017318 merged by JHathaway:

[operations/puppet@production] email: add node definitions for mx-out boxes

https://gerrit.wikimedia.org/r/1017318

Cookbook cookbooks.sre.hosts.reimage started by jhathaway@cumin1002 for host mx-out1001.wikimedia.org with OS bookworm executed with errors:

  • mx-out1001 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404051714_jhathaway_872998_mx-out1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" mx-out1001.wikimedia.org to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin1002 for host mx-out1001.wikimedia.org with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhathaway@cumin1002 for host mx-out1001.wikimedia.org with OS bookworm completed:

  • mx-out1001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404052002_jhathaway_901337_mx-out1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
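
The failed and passing summaries for mx-out1001 list the same early steps, so the quickest way to see where the first run stopped is to diff the two step lists. The following is a minimal sketch of that comparison, not part of spicerack or the reimage cookbook: the function names (`parse_run`, `normalize`, `missing_steps`) are hypothetical, and the two sample strings are shortened excerpts of the summaries above.

```python
import re

# Matches the host line, e.g. "  • mx-out1001 (FAIL)".
STATUS_RE = re.compile(r"^\s*•\s+(\S+)\s+\((PASS|FAIL)\)\s*$")
# Matches any other bulleted step line.
STEP_RE = re.compile(r"^\s*•\s+(.*\S)\s*$")


def normalize(step):
    """Replace per-run log paths so otherwise identical steps compare equal."""
    return re.sub(r"/var/log/\S+", "<log path>", step)


def parse_run(text):
    """Return (host, status, steps) for one pasted cookbook summary."""
    host = status = None
    steps = []
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            host, status = match.groups()
            continue
        match = STEP_RE.match(line)
        if match and host is not None:
            steps.append(normalize(match.group(1)))
    return host, status, steps


def missing_steps(failed_steps, passed_steps):
    """Steps present in the passing run but absent from the failed one."""
    seen = set(failed_steps)
    return [step for step in passed_steps if step not in seen]


# Shortened excerpts of the two summaries above (assumption: the full
# lists would be pasted in the same bullet format).
FAILED = """
  • mx-out1001 (FAIL)
    • Host up (new fresh bookworm OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404051714_jhathaway_872998_mx-out1001.out
    • Rebooted
"""

PASSED = """
  • mx-out1001 (PASS)
    • Host up (new fresh bookworm OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404052002_jhathaway_901337_mx-out1001.out
    • Rebooted
    • Automatic Puppet run was successful
    • Icinga status is optimal
"""

if __name__ == "__main__":
    _, _, failed = parse_run(FAILED)
    _, _, passed = parse_run(PASSED)
    for step in missing_steps(failed, passed):
        print(step)
```

Applied to the full lists, this comparison would show that the failed run ends at "Rebooted" and never reports "Automatic Puppet run was successful" or the Icinga and Netbox steps that follow it. The passing run also lists "Downtimed on Icinga/Alertmanager" and "Disabled Puppet" at the start, which the first run does not.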

Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin2002 for host mx-out2001.wikimedia.org with OS bookworm