Cloud VPS Project Tested: ldap-dev
Site/Location: eqiad, codfw
Number of systems: 2
Service: email
Networking Requirements: external IP
Processor Requirements: 8 vCPUs
Memory: 8 GiB
Disks: 50 GiB
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| email: add node definitions for mx-out boxes | operations/puppet | production | +4 -0 |
| Status | Assigned | Task |
|---|---|---|
| Resolved | jhathaway | T232343 Consider Postfix as MTA for our MXes (and OTRS/Mailman/Phab) |
| Open | None | T325394 Replace Exim with Postfix on mail servers |
| Resolved | jhathaway | T325403 MTA provisioning |
| Resolved | jhathaway | T325407 Provision mx-out |
| Resolved | jhathaway | T361750 Site: eqiad, codfw 2 VM request for postfix mx-out |
Event Timeline
LGTM. We probably don't need this much CPU capacity, but it's fine to overcommit a little; we can easily adjust it later.
Change #1017318 had a related patch set uploaded (by JHathaway; author: JHathaway):
[operations/puppet@production] email: add node definitions for mx-out boxes
Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin1002 for host mx-out1001.wikimedia.org with OS bookworm
Change #1017318 merged by JHathaway:
[operations/puppet@production] email: add node definitions for mx-out boxes
Cookbook cookbooks.sre.hosts.reimage started by jhathaway@cumin1002 for host mx-out1001.wikimedia.org with OS bookworm executed with errors:
- mx-out1001 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404051714_jhathaway_872998_mx-out1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" mx-out1001.wikimedia.org to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin1002 for host mx-out1001.wikimedia.org with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jhathaway@cumin1002 for host mx-out1001.wikimedia.org with OS bookworm completed:
- mx-out1001 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404052002_jhathaway_901337_mx-out1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by jhathaway@cumin2002 for host mx-out2001.wikimedia.org with OS bookworm