Bitu should be deployed to production using deb packages.
Staging will remain on Git deployments.
Description
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T189531 All Wikimedia developer services should use single sign-on |
| Resolved | | None | T161859 Make Wikitech an SUL wiki |
| Duplicate | | None | T367287 Update Wikitech's LDAP credentials to be read-only |
| Resolved | PRODUCTION ERROR | Tgr | T195253 Special:Notifications gives a consistent PHP exception on load ("The trash icon is not registered") for users with OpenStackManager notifications |
| Open | | None | T106123 Extensions needing to be removed from Wikimedia wikis |
| Resolved | Request | None | T367220 Archive the OpenStackManager extension |
| Resolved | | taavi | T161553 Remove OpenStackManager from Wikitech |
| Resolved | | taavi | T196171 Developer account creation without OpenStackManager |
| Declined | | None | T179463 Create a single application to provision and manage developer (LDAP) accounts |
| Resolved | | None | T319405 Create an IDM for Wikimedia developer accounts |
| Resolved | | SLyngshede-WMF | T320603 IDM milestone 2 "Initial limited deployment" |
| Resolved | | SLyngshede-WMF | T340721 Build Debian packages for Bookworm |
Event Timeline
Change 956836 had a related patch set uploaded (by Slyngshede; author: Slyngshede):
[operations/puppet@production] P:idm allow for installation via Debian packages.
Change 957669 had a related patch set uploaded (by Slyngshede; author: Slyngshede):
[operations/puppet@production] WIP: P:idm switch idm2001 to Debian package
Plan for testing rollout of Debian packages:
Upgrade test to Bookworm:
Pre-update:
- Set idm-test1001 in maintenance mode
- Merge patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/956836
- Verify that production remains running as expected
Reimage idm-test1001
ssh cumin1001.eqiad.wmnet sudo cookbook sre.hosts.reimage --os bookworm -t T340721 idm-test1001
IDM2001 upgrade
- Disable Puppet on idm2001.wikimedia.org
- Reimage idm2001.wikimedia.org
- Merge patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/957669
ssh cumin1001.eqiad.wmnet
sudo cumin 'idm2001.wikimedia.org' "disable-puppet 'bitu deb install - slyngshede'"
sudo cookbook sre.hosts.reimage --os bookworm -t T340721 idm2001
Switch over to IDM2001
- Merge: https://gerrit.wikimedia.org/r/c/operations/dns/+/957674
- Follow DNS deployment: https://wikitech.wikimedia.org/wiki/DNS#Changing_records_in_a_zonefile
WAIT AND ALLOW ANY BUGS TO REVEAL THEMSELVES
IDM1001 Upgrade
- Disable Puppet on idm1001.wikimedia.org
- Merge: https://gerrit.wikimedia.org/r/c/operations/puppet/+/957676
- Reimage idm1001.wikimedia.org
ssh cumin1001.eqiad.wmnet
sudo cumin 'idm1001.wikimedia.org' "disable-puppet 'bitu deb install - slyngshede'"
sudo cookbook sre.hosts.reimage --os bookworm -t T340721 idm1001
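The per-host steps in the plan above follow one pattern: disable Puppet with a reason, then reimage to Bookworm. A minimal Python sketch that only builds and prints those command lines for review might look like this. The function name `reimage_commands` and the idea of generating the commands from a script are assumptions for illustration; the command strings are copied from the plan itself, not from the cumin or cookbook documentation, and nothing is executed.

```python
"""Dry-run helper: print the commands from the rollout plan for one host.

Hypothetical illustration only; nothing here is run against cumin.
"""

CUMIN_HOST = "cumin1001.eqiad.wmnet"      # cumin host named in the plan
TASK = "T340721"                          # task referenced in the plan
REASON = "bitu deb install - slyngshede"  # disable-puppet reason from the plan


def reimage_commands(short_name: str, disable_puppet: bool = True) -> list[str]:
    """Return the command lines the plan lists for `short_name`.

    `disable_puppet=False` mirrors the idm-test1001 step, where the plan
    lists only the reimage cookbook.
    """
    fqdn = f"{short_name}.wikimedia.org"
    commands = []
    if disable_puppet:
        commands.append(f"sudo cumin '{fqdn}' \"disable-puppet '{REASON}'\"")
    commands.append(
        f"sudo cookbook sre.hosts.reimage --os bookworm -t {TASK} {short_name}"
    )
    return commands


if __name__ == "__main__":
    # Same order as the plan: idm2001 first, then idm1001 after the switchover.
    for host in ("idm2001", "idm1001"):
        print(f"# on {CUMIN_HOST}, for {host}:")
        for line in reimage_commands(host):
            print(line)
```

Printing rather than executing keeps the DNS switchover and the waiting period between the two hosts as manual steps, as the plan describes.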
Change 957674 had a related patch set uploaded (by Slyngshede; author: Slyngshede):
[operations/dns@master] IDM Switchover
Change 957676 had a related patch set uploaded (by Slyngshede; author: Slyngshede):
[operations/puppet@production] IDM: Deploy deb to idm1001.
Change 956836 merged by Slyngshede:
[operations/puppet@production] P:idm allow for installation via Debian packages.
Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm completed:
- idm-test1001 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309141210_slyngshede_31369_idm-test1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm completed:
- idm-test1001 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309151319_slyngshede_376144_idm-test1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Change 957669 merged by Slyngshede:
[operations/puppet@production] P:idm switch idm2001 to Debian package
Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1001 for host idm2001.wikimedia.org with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1001 for host idm2001.wikimedia.org with OS bookworm completed:
- idm2001 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202309190829_slyngshede_3921709_idm2001.out, asking the operator what to do
- First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202309190831_slyngshede_3921709_idm2001.out, asking the operator what to do
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309190845_slyngshede_3921709_idm2001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Change 957676 merged by Slyngshede:
[operations/puppet@production] P:IDM: Failover Redis
Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1001 for host idm1001.wikimedia.org with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1001 for host idm1001.wikimedia.org with OS bookworm completed:
- idm1001 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309200724_slyngshede_4193245_idm1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by slyngshede@cumin1001 for host idm-test1001.wikimedia.org with OS bookworm completed:
- idm-test1001 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot media to disk
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309201331_slyngshede_78977_idm-test1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
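The reimage reports above share a fixed shape: a `- <host> (<STATUS>)` line followed by one line per step. As a rough sketch, assuming only the line formats visible in this timeline, a report can be reduced to a few fields for comparison across runs. The function name `summarize` and the chosen fields are hypothetical, and the sample is an excerpt of the idm2001 report, not the full output.

```python
import re

# Matches the per-host result line, e.g. "- idm2001 (WARN)".
HEADER = re.compile(r"^- (?P<host>\S+) \((?P<status>[A-Z]+)\)$")


def summarize(report: str) -> dict:
    """Extract host, overall status, and a few step outcomes from one report."""
    summary = {
        "host": None,
        "status": None,
        "failed_first_runs": 0,
        "downtime_removed": False,
    }
    for raw in report.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match and summary["host"] is None:
            summary["host"] = match["host"]
            summary["status"] = match["status"]
        elif line.startswith("- First Puppet run failed"):
            summary["failed_first_runs"] += 1
        elif line == "- Icinga downtime removed":
            summary["downtime_removed"] = True
    return summary


# Excerpt of the idm2001 report from this timeline.
SAMPLE = """\
- idm2001 (WARN)
- Host up (new fresh bookworm OS)
- First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202309190829_slyngshede_3921709_idm2001.out, asking the operator what to do
- First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202309190831_slyngshede_3921709_idm2001.out, asking the operator what to do
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202309190845_slyngshede_3921709_idm2001.out
- Icinga status is optimal
- Icinga downtime removed
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```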