
upgrade people VMs to bookworm
Closed, Resolved (Public)

Description

Debian bookworm is now the stable Debian and the installer is ready to use (T330495#8922009) :)

The people VMs have historically been an early adopter of new distro releases since they are real production machines but with low complexity and risk.

Test whether the "makevm" cookbook works with --os=bookworm, now that it takes a --os parameter.

If it works, migrate people1003 to people1004 and people2002 to people2003: copy user data, apply puppet roles, and finally shut down the bullseye VMs.

If problems come up, report them to infra foundations as evaluation results. This also serves as a test of the installer and of bookworm in general, using a puppetized httpd setup like this one.

Requesting new VMs in T338998.

Event Timeline

17:03 < mutante> !log creating ganeti VM people1004 with os==bookworm passed to makevm cookbook to test bookworm and because this is traditionally an early adoptor of new distro releases

Dzahn triaged this task as Medium priority. Jun 12 2023, 5:08 PM

Change 929776 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: add people1004/people2003 with insetup role

https://gerrit.wikimedia.org/r/929776

Change 929776 merged by Dzahn:

[operations/puppet@production] site: add people1004/people2003 with insetup role

https://gerrit.wikimedia.org/r/929776

dzahn@cumin1001:~$ sudo cookbook sre.ganeti.makevm --vcpus 1 --memory 2 --disk 80 --network private --cluster codfw --group D people2003 --os bookworm
Ready to create Ganeti VM people2003.codfw.wmnet in the codfw cluster on group D with 1 vCPUs, 2.0GB of RAM, 80GB of disk in the private network.
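
The makevm invocation above can be captured as a small helper so that the same flags are reused for each new VM. This is only a sketch: the function name `makevm_command` and its parameter names are hypothetical, and it uses only the flags that appear in the command pasted above. It assembles the command line and does not run the cookbook.

```python
import shlex


def makevm_command(hostname, cluster, group, os_release,
                   vcpus=1, memory_gb=2, disk_gb=80, network="private"):
    """Assemble the argv for the makevm cookbook call shown in this task.

    Only flags seen in the pasted command are used; the defaults mirror
    the values used for people2003.
    """
    return [
        "sudo", "cookbook", "sre.ganeti.makevm",
        "--vcpus", str(vcpus),
        "--memory", str(memory_gb),
        "--disk", str(disk_gb),
        "--network", network,
        "--cluster", cluster,
        "--group", group,
        hostname,
        "--os", os_release,
    ]


# Reproduce the command line used for people2003.
print(shlex.join(makevm_command("people2003", "codfw", "D", "bookworm")))
```

The row group for other hosts is not recorded in this task, so it stays a required argument rather than a default.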

Dzahn changed the task status from Open to Stalled. Jun 13 2023, 7:38 PM
Dzahn changed the status of subtask T338998: Site: 2 VMs request for people from Open to Stalled.

stalled on T338998

Change 929788 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: add microsites::peopleweb to new people VMs

https://gerrit.wikimedia.org/r/929788

Change 929789 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] peopleweb: make people1004 a new destination sync host

https://gerrit.wikimedia.org/r/929789

Dzahn changed the task status from Stalled to In Progress. Jun 13 2023, 8:48 PM

Change 929788 merged by Dzahn:

[operations/puppet@production] site: add microsites::peopleweb to new people VMs

https://gerrit.wikimedia.org/r/929788

Change 929789 merged by Dzahn:

[operations/puppet@production] peopleweb: make people1004 a new destination sync host

https://gerrit.wikimedia.org/r/929789

Change 929799 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] peopleweb: allow rsyncing user data from people2002 to people2003

https://gerrit.wikimedia.org/r/929799

Change 929799 merged by Dzahn:

[operations/puppet@production] peopleweb: allow rsyncing user data from people2002 to people2003

https://gerrit.wikimedia.org/r/929799

Change 930257 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] peopleweb: sync home dirs from people1003 to people1004

https://gerrit.wikimedia.org/r/930257

Change 930257 merged by Dzahn:

[operations/puppet@production] peopleweb: sync home dirs from people1003 to people1004

https://gerrit.wikimedia.org/r/930257

Change 930272 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] peoplweb: sync home dirs from people2002 to people1004

https://gerrit.wikimedia.org/r/930272

Change 930272 merged by Dzahn:

[operations/puppet@production] peopleweb: sync home dirs from people2002 to people1004

https://gerrit.wikimedia.org/r/930272

Change 930274 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] people.wikimedia.org: switch backend from people2002 to people1004

https://gerrit.wikimedia.org/r/930274

Change 930274 merged by Dzahn:

[operations/dns@master] people.wikimedia.org: switch backend from people2002 to people1004

https://gerrit.wikimedia.org/r/930274

Change 930275 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] peopleweb: make people1004 new source, people2003 new destination

https://gerrit.wikimedia.org/r/930275

Change 930275 merged by Dzahn:

[operations/puppet@production] peopleweb: make people1004 new source, people2003 new destination

https://gerrit.wikimedia.org/r/930275

https://people.wikimedia.org/ is now running on bookworm backends.

The DNS name peopleweb.discovery.wmnet has been switched to point to people1004.

This means bullseye -> bookworm and also codfw -> eqiad, so that we are back in the currently active main DC.
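
One way to double-check a switch like this is to compare what the discovery name resolves to with the address of the intended backend. The following is a minimal sketch, not the procedure used in this task: `points_to` is a hypothetical helper, and the internal names only resolve from inside the production network. The resolver is injectable so the logic can be exercised without DNS access.

```python
import socket


def points_to(name, backend, resolve=socket.gethostbyname):
    """Return True if `name` resolves to the same IPv4 address as `backend`.

    `resolve` defaults to a real DNS lookup; pass any callable that maps a
    hostname to an address to check the logic offline.
    """
    return resolve(name) == resolve(backend)


# Offline illustration with placeholder addresses from the documentation
# range 192.0.2.0/24 (these are NOT the real addresses of the hosts).
fake_dns = {
    "peopleweb.discovery.wmnet": "192.0.2.4",
    "people1004.eqiad.wmnet": "192.0.2.4",
}
print(points_to("peopleweb.discovery.wmnet", "people1004.eqiad.wmnet",
                resolve=fake_dns.__getitem__))
```

From a production host, calling `points_to("peopleweb.discovery.wmnet", "people1004.eqiad.wmnet")` without the `resolve` argument would perform the real lookups.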

Tests pass on both the old and the new set of VMs:

[deploy1002:~] $ httpbb /srv/deployment/httpbb-tests/miscweb/test_people.yaml --hosts=people2002.codfw.wmnet,people2003.codfw.wmnet,people1003.eqiad.wmnet,people1004.eqiad.wmnet
Sending to 4 hosts...
PASS: 5 requests sent to each of 4 hosts. All assertions passed.
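
If this check were scripted, the summary line could be parsed to confirm that every listed host was actually tested. This is a sketch under assumptions: the pattern is derived solely from the single PASS line shown above, the function names are hypothetical, and any other output (including failure output, whose format is not shown in this task) is simply treated as "not passed".

```python
import re

# Pattern taken from the one summary line pasted in this task.
SUMMARY = re.compile(
    r"^PASS: (\d+) requests? sent to each of (\d+) hosts?\. "
    r"All assertions passed\.$"
)


def parse_httpbb_pass(output):
    """Return (requests_per_host, host_count) for a PASS summary, else None."""
    for line in output.splitlines():
        match = SUMMARY.match(line.strip())
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


def all_hosts_covered(output, hosts):
    """True only if the run passed and reported exactly len(hosts) hosts."""
    result = parse_httpbb_pass(output)
    return result is not None and result[1] == len(hosts)


hosts = [
    "people2002.codfw.wmnet", "people2003.codfw.wmnet",
    "people1003.eqiad.wmnet", "people1004.eqiad.wmnet",
]
output = (
    "Sending to 4 hosts...\n"
    "PASS: 5 requests sent to each of 4 hosts. All assertions passed.\n"
)
print(all_hosts_covered(output, hosts))
```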

Oh, thank you, @mpopov. I had "update wikitech" on my list. Let me create the server pages that are red links. I will also add the fingerprints from there.

Change 931699 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: add buster people VMs to insetup role for decom

https://gerrit.wikimedia.org/r/931699

Change 931699 merged by Dzahn:

[operations/puppet@production] site: add buster people VMs to insetup role for decom

https://gerrit.wikimedia.org/r/931699

Mentioned in SAL (#wikimedia-operations) [2023-06-21T19:59:04Z] <mutante> people.wikimedia.org - disabling shell access to people1003/people2002 (bullseye), use people1004/people2002 (bookworm) or people.eqiad.wmnet / people.codfw.wmnet in your configs if you have something automated or git repos - T338827

Mentioned in SAL (#wikimedia-operations) [2023-06-21T20:04:25Z] <mutante> deleting VMs people1003.eqiad.wmnet and people2002.codfw.wmnet T338827

cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: people1003.eqiad.wmnet

  • people1003.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox

Change 931999 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: remove decom'ed people.wikimedia.org backends

https://gerrit.wikimedia.org/r/931999

cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: people2002.codfw.wmnet

  • people2002.codfw.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox

Change 931999 merged by Dzahn:

[operations/puppet@production] site: remove decom'ed people.wikimedia.org backends

https://gerrit.wikimedia.org/r/931999

This is now fully completed: the previous VMs have been destroyed (by the decom cookbook) and removed from site.pp.