Convert mwdebug VMs to debian buster
Closed, Resolved (Public)

Event Timeline

Change 662037 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add mwdebug servers on buster

https://gerrit.wikimedia.org/r/662037

Change 662038 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] trafficserver: add new debug servers to debug routing

https://gerrit.wikimedia.org/r/662038

Why create new ones? This only creates needless churn, since the Debug Extension has to be updated twice to add and then drop mwdebug servers. These can simply be reimaged in place like all the main app servers. We only added 1003 as a new testing ground for early tests, but at this point running the app servers on Buster simply works fine.

Change 662038 abandoned by Dzahn:
[operations/puppet@production] trafficserver: add new debug servers to debug routing

Reason:
multiple others have said we don't need new servers and should reimage the existing VMs

https://gerrit.wikimedia.org/r/662038

Change 662037 abandoned by Dzahn:
[operations/puppet@production] site: add mwdebug servers on buster

Reason:
Thanks Effie, other people have agreed with you on that, so I am abandoning this.

https://gerrit.wikimedia.org/r/662037

Change 662798 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mwdebug: allow rsyncing home dirs from any mwdebug* to mwdebug1003

https://gerrit.wikimedia.org/r/662798

Joe added a subscriber: Joe.

Why create new ones? This only creates needless churn, since the Debug Extension has to be updated twice to add and then drop mwdebug servers. These can simply be reimaged in place like all the main app servers. We only added 1003 as a new testing ground for early tests, but at this point running the app servers on Buster simply works fine.

I agree, and commented to that effect in the serviceops meeting on Monday.

Joe renamed this task from "create mwdebug1004, mwdebug2003 and mwdebug2004 as mediawiki::canary_appserver" to "Convert mwdebug VMs to debian buster". Wed, Feb 10, 10:00 AM
Joe updated the task description.

Yep, others agreed as well and the plan has already been adjusted, also per my mail to the ops mailing list about home dirs. Thanks for renaming, that is accurate. I am starting on this tomorrow or so.

Change 662798 merged by Dzahn:
[operations/puppet@production] mwdebug: allow rsyncing home dirs from any mwdebug* to a backup host

https://gerrit.wikimedia.org/r/662798

Change 663669 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] DHCP: switch mwdebug hosts from stretch to buster installer

https://gerrit.wikimedia.org/r/663669

Change 663669 merged by Dzahn:
[operations/puppet@production] DHCP: switch mwdebug hosts buster installer, mwdebug1003 to stretch

https://gerrit.wikimedia.org/r/663669

Mentioned in SAL (#wikimedia-operations) [2021-02-11T23:47:44Z] <mutante> reimaged mwdebug2002 with buster - since this is a VM: manually cleaned puppet cert on puppetmaster1001, signed new cert for same hostname, initial puppet run etc (T274023)

Mentioned in SAL (#wikimedia-operations) [2021-02-12T19:02:35Z] <mutante> rebooting and reimaging mwdebug2001 to buster T274023

cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: mwdebug1002.eqiad.wmnet

  • mwdebug1002.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
  • COMMON_STEPS (FAIL)
    • Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)

ERROR: some step on some host failed, check the bolded items above

Mentioned in SAL (#wikimedia-operations) [2021-02-13T00:08:32Z] <mutante> ganeti1011 - manually deleting VM mwdebug1002 - T274689 T274023

Mentioned in SAL (#wikimedia-operations) [2021-02-13T00:26:49Z] <mutante> ganeti - attempting to recreate VM mwdebug1002 with cookbook that was previously deleted manually (T274689 T274023)

===== NODE GROUP =====                                                                                                                         
(1) mwdebug2002.codfw.wmnet                                                                                                                    
----- OUTPUT of 'gen_fingerprints | head -10' -----                                                                                            
 +---------+---------+-----------------------------------------------------+                                                                   
 | Cipher  | Algo    | Fingerprint                                         |                                                                   
 +---------+---------+-----------------------------------------------------+
 | RSA     | SHA-256 | SHA256:Spm6GgVNLnfy+YOvVW/7NRdwYvM9Abu9UG4EMnSuOwc  |
 +---------+---------+-----------------------------------------------------+
 | ECDSA   | SHA-256 | SHA256:zFbBLpoX0jSieLhKMc8jSmlcYCGOQGDbKebqrbh9GnM  |
 +---------+---------+-----------------------------------------------------+
 | ED25519 | SHA-256 | SHA256:QvCqeqcwZNW7mc3o9Z6xNIcLyiKq/RwmsSQa9ciyudM  |
 +---------+---------+-----------------------------------------------------+

===== NODE GROUP =====                                                                                                                         
(1) mwdebug2001.codfw.wmnet                                                                                                                    
----- OUTPUT of 'gen_fingerprints | head -10' -----                                                                                            
 +---------+---------+-----------------------------------------------------+                                                                   
 | Cipher  | Algo    | Fingerprint                                         |                                                                   
 +---------+---------+-----------------------------------------------------+
 | RSA     | SHA-256 | SHA256:YPXvGbyUMAojIAetmjOLEjqlrxu/N2NWHGHzXgZCFng  |
 +---------+---------+-----------------------------------------------------+
 | ECDSA   | SHA-256 | SHA256:wU+j5Mxd+JB5KZlJb25JJn/Un+BlLJ7KhXakFW0l3jY  |
 +---------+---------+-----------------------------------------------------+
 | ED25519 | SHA-256 | SHA256:FaMIiIbj20w6P3PyzqwracGqNJWSXBfMId/Z4KksP0w  |
 +---------+---------+-----------------------------------------------------+

===== NODE GROUP =====                                                                                                                         
(1) mwdebug1001.eqiad.wmnet                                                                                                                    
----- OUTPUT of 'gen_fingerprints | head -10' -----                                                                                            
 +---------+---------+-----------------------------------------------------+                                                                   
 | Cipher  | Algo    | Fingerprint                                         |                                                                   
 +---------+---------+-----------------------------------------------------+
 | RSA     | SHA-256 | SHA256:hElw8wqZyVZDo6fcRYhirsjjDlzv9CG1EfZUShk3dcE  |
 +---------+---------+-----------------------------------------------------+
 | ECDSA   | SHA-256 | SHA256:IhAp3mazHV8KY5N8DS8PN1ujBeXveyzl5BsX18BoLD0  |
 +---------+---------+-----------------------------------------------------+
 | ED25519 | SHA-256 | SHA256:B1+ingHtJgziwG+F6nsjd1JPZuaj8nkFw7qjtCna6lI  |
 +---------+---------+-----------------------------------------------------+

===== NODE GROUP =====                                                                                                                         
(1) mwdebug1003.eqiad.wmnet                                                                                                                    
----- OUTPUT of 'gen_fingerprints | head -10' -----                                                                                            
 +---------+---------+-----------------------------------------------------+                                                                   
 | Cipher  | Algo    | Fingerprint                                         |                                                                   
 +---------+---------+-----------------------------------------------------+
 | RSA     | SHA-256 | SHA256:jVDCim1iiKGeej886y0j/DjHqGzWaE0ZGDtPHwYR78Y  |
 +---------+---------+-----------------------------------------------------+
 | ECDSA   | SHA-256 | SHA256:qvZd2EfsDgOZnXIiRDkS2QvOivf0rssTsMkfNkQRCBw  |
 +---------+---------+-----------------------------------------------------+
 | ED25519 | SHA-256 | SHA256:we6nf+KA+YnO8P1EqdvtgACjl3/gLEjV3m9LTBPHzSs  |
 +---------+---------+-----------------------------------------------------+

I've set mwdebug1002 to "inactive" in conftool to unblock deployments.

Change 664625 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] DHCP: update MAC address for new mwdebug1002

https://gerrit.wikimedia.org/r/664625

Change 664625 merged by Dzahn:
[operations/puppet@production] DHCP: update MAC address for new mwdebug1002

https://gerrit.wikimedia.org/r/664625

Mentioned in SAL (#wikimedia-operations) [2021-02-16T18:59:31Z] <mutante> puppetmaster1002 - puppet cert clean mwdebug1002.eqiad.wmnet, sign new request, initial puppet run (T274023)

I've set mwdebug1002 to "inactive" in conftool to unblock deployments.

Thanks! mwdebug1002 has been recreated now and is also repooled after scap pull and checking Icinga was all green.

[mwdebug1002:~] $ gen_fingerprints
 +---------+---------+-----------------------------------------------------+
 | Cipher  | Algo    | Fingerprint                                         |
 +---------+---------+-----------------------------------------------------+
 | RSA     | SHA-256 | SHA256:YsjRX9Y4373VevRH9HU/g5I+ThWVPBrxtDH7Z9+H/wQ  |
 +---------+---------+-----------------------------------------------------+
 | ECDSA   | SHA-256 | SHA256:g8FdC/++YoLdeccpSLbZ4cRq2WdB3az3CSUsUPyDPHE  |
 +---------+---------+-----------------------------------------------------+
 | ED25519 | SHA-256 | SHA256:ZBvjHjViBArmAA+6o+b0x6GRs96X59s61FNFXIfreF0  |
 +---------+---------+-----------------------------------------------------+
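For anyone who wants to check one of these values independently: an OpenSSH `SHA256:` fingerprint is the SHA-256 digest of the decoded public key blob, base64-encoded with the trailing `=` padding removed. The sketch below shows that calculation. The function name `ssh_sha256_fingerprint` is hypothetical and this is not how `gen_fingerprints` itself is implemented (the thread does not show that); it only assumes a public key line in the usual `<type> <base64 blob> [comment]` form, such as a line from a host's `.pub` key file.

```python
import base64
import hashlib


def ssh_sha256_fingerprint(public_key_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint for one public key line.

    Hypothetical helper for illustration; not part of gen_fingerprints.
    """
    # A public key line has the form: "<type> <base64 blob> [comment]".
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    # Decode the key blob; validate=True rejects non-base64 characters.
    blob = base64.b64decode(fields[1], validate=True)
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 without the trailing '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

The result can be compared character by character with the matching row of the table, or with what the SSH client shows on first connect to the reimaged host.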

Mentioned in SAL (#wikimedia-operations) [2021-02-16T20:20:47Z] <mutante> mwdebug1002 has been recreated on buster and has been repooled after scap pull - you can find a .tar.gz in your home with the contents of your home before reimaging, fingerprint at T274023#6835116

Mentioned in SAL (#wikimedia-operations) [2021-02-16T23:54:30Z] <mutante> puppetmaster1001 - puppet cert clean mwdebug1001, sign new request, initial puppet run, now on buster (T274023)

All users can find a .tar.gz in their home dir on each host that contains what was in their home before they were reimaged.
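The thread does not say which tool produced these archives (only that home dirs were rsynced to a backup host beforehand), so the following is just a minimal sketch of the idea: pack one home directory into a `.tar.gz` before a reimage so it can be dropped back into the new home afterwards. The function name `archive_home` and the directory layout are assumptions made for illustration.

```python
import tarfile
from pathlib import Path


def archive_home(home: Path, dest_dir: Path) -> Path:
    """Pack one home directory into <dest_dir>/<user>.tar.gz and return that path.

    Hypothetical helper; the actual backup procedure is not shown in the thread.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"{home.name}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname stores paths relative to the user name, not the absolute path,
        # so the archive unpacks cleanly wherever the user extracts it.
        tar.add(str(home), arcname=home.name)
    return archive
```

A user would then unpack it with any tar implementation and copy back only the files they still need.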

Fingerprints can be found on the new wiki pages linked above.

All mwdebug hosts are pooled except mwdebug1003, which remains on stretch and depooled because there was a request to keep one host in case users want to compare something against stretch. It will simply be deleted.