
replace buster machines in devtools project
Closed, Resolved (Public)

Description

The devtools project in WMCS still has some remaining buster machines.

These should all be deleted after replacing them with bullseye or bookworm machines.

This also shows as "This project contains Buster VMs. See https://wikitech.wikimedia.org/wiki/News/Buster_deprecation and https://os-deprecation.toolforge.org/ for details" in the 2024 purge wikitech page:

https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2024_Purge#traffic


  • jenkins-releases.devtools.eqiad1.wikimedia.cloud a6b8c13b-0545-48eb-8eb3-d6487ee48289 172.16.0.235 g3.cores2.ram4.disk20 Debian Buster
  • devtools-puppetdb1001.devtools.eqiad1.wikimedia.cloud 141ac13c-f0fa-46d3-9d2a-cede8bc854c6 172.16.2.201 g3.cores2.ram4.disk20 Debian Buster
  • deploy-1004.devtools.eqiad1.wikimedia.cloud 78f652ff-ae4b-4b0d-9c6c-1097efd258d2 172.16.6.54 g3.cores2.ram4.disk20 Debian Buster
  • puppetmaster-1001.devtools.eqiad1.wikimedia.cloud 76db63cd-0538-4015-8e9c-564d82107044 172.16.0.187 g2.cores2.ram4.disk40 Debian Buster
  • gerrit-prod-1001.devtools.eqiad1.wikimedia.cloud 48c5ff1c-2885-410d-beeb-d5a57a0a91c7 172.16.0.148 g2.cores2.ram4.disk40 Debian Buster
    (Note on gerrit-prod-1001: was shut down because Gerrit used T330312. I used that for Scap development and testing Gerrit upgrade. Nothing needed there.)

The puppetmaster part of this is T360470.

Event Timeline

LSobanski triaged this task as Medium priority.
LSobanski moved this task from Incoming to Backlog on the collaboration-services board.

gerrit-prod-1001 wasn't reachable via SSH. I soft-rebooted it and still couldn't SSH in as a regular user, but I could get in with my separate global root access, which is independent of the project. Found this:

The last Puppet run was at Mon Feb 13 22:02:09 UTC 2023 (606106 minutes ago). Puppet is disabled. Scap is upgrading Gerrit - gerrit-deploy
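
As a sanity check on that banner, the minutes counter can be converted back into a timestamp. This is only a sketch: the two values are copied from the message above, and the variable names are illustrative. The result works out to roughly 421 days without a Puppet run.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the Puppet status banner on gerrit-prod-1001.
last_run = datetime(2023, 2, 13, 22, 2, 9, tzinfo=timezone.utc)
minutes_ago = 606106

# Time at which the banner was rendered (precision limited to whole minutes).
banner_time = last_run + timedelta(minutes=minutes_ago)

# Age of the last Puppet run expressed in days.
age_days = minutes_ago / (60 * 24)

print(banner_time.isoformat())
print(f"{age_days:.1f} days without a Puppet run")
```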

Regarding the puppetdb server: it was created in 2022 by @jbond and I can't remember exactly why we did it. Maybe to be able to run cumin in Cloud VPS?

Very few people have ever logged in, and I see basically nothing in the instance log besides one crash and one reboot. I asked around a bit and others weren't sure either.

I shut it down just now, basically to see what happens and whether anything breaks or anyone complains.

Mentioned in SAL (#wikimedia-cloud) [2024-04-11T18:23:00Z] <mutante> - shutting down puppetmaster-1001 on buster - should now be replaced by puppetmaster-1003 on bookworm (thanks brennen) T360964 T360470

Change #1019109 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] cloud/devtools: switch default puppetmaster from 1001 to 1003

https://gerrit.wikimedia.org/r/1019109

Change #1019109 merged by Dzahn:

[operations/puppet@production] cloud/devtools: switch default puppetmaster from 1001 to 1003

https://gerrit.wikimedia.org/r/1019109

jenkins-releases.devtools was deleted by jnuche after he confirmed it was once his test instance and not used anymore

gerrit-prod-1001.devtools.eqiad1.wikimedia.cloud can be deleted: was shut down because Gerrit used T330312. I used that for Scap development and testing Gerrit upgrade. Nothing needed there.
gerrit-bullseye-test.devtools.eqiad1.wikimedia.cloud was created by @Dzahn in April 2023. I guess that was to reimage the Gerrit servers. We can get rid of it.

Mentioned in SAL (#wikimedia-releng) [2024-04-16T15:30:02Z] <hashar> devtools: deleted obsolete Buster instances gerrit-bullseye-test and gerrit-prod-1001 # T360964

After chatting about the puppetdb server with John et al., I shut it down and ran puppet on all agents to confirm there were no errors along the lines of "could not upload facts" that would indicate puppet is configured to use puppetdb. I saw none.

So I'm going to delete that instance.
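
The check described above (run puppet everywhere, then look for fact-upload errors) could be sketched roughly like this. Everything here is an assumption for illustration: the function name, the way run output is collected per host, and the exact error wording, which the comment above only paraphrases.

```python
import re

# Hypothetical pattern: the ticket only paraphrases the error message,
# so the real wording emitted by Puppet may differ.
FACT_UPLOAD_ERROR = re.compile(r"could not upload facts", re.IGNORECASE)


def agents_with_puppetdb_errors(run_output_by_host):
    """Return the sorted hostnames whose Puppet run output matches the pattern."""
    flagged = []
    for host, output in run_output_by_host.items():
        if FACT_UPLOAD_ERROR.search(output):
            flagged.append(host)
    return sorted(flagged)


# Made-up placeholder output, not real Puppet logs.
sample = {
    "deploy-1006": "run finished without errors (placeholder text)",
    "example-host": "Could not upload facts (placeholder text)",
}
print(agents_with_puppetdb_errors(sample))
```

An empty result across all agents would correspond to the "saw none" outcome reported here.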

Mentioned in SAL (#wikimedia-cloud) [2024-04-16T17:09:33Z] <mutante> - deleting devtools-puppetdb1001 instance (T360964)

Mentioned in SAL (#wikimedia-operations) [2024-04-16T20:22:48Z] <mutante> CI - re-enabled jenkins and zuul-merged on contint1002 after distro upgrade - T360964

puppetmaster-1001 is shut down but not deleted just yet.

gerrit-prod-1001.devtools.eqiad1.wikimedia.cloud can be deleted: was shut down because Gerrit used T330312. I used that for Scap development and testing Gerrit upgrade. Nothing needed there.

Thanks for doing this!

gerrit-bullseye-test.devtools.eqiad1.wikimedia.cloud was created by @Dzahn in April 2023. I guess that was to reimage the Gerrit servers. We can get rid of it.

This was not my original intention for this ticket, which was only about buster. I think we needed to keep that instance. It only took days to get a request for the missing Gerrit test instance: T363196

Mentioned in SAL (#wikimedia-cloud) [2024-04-24T19:41:21Z] <mutante> deleting instance puppetmaster-1001 that was > 4 years old, on buster and I had shutdown a couple days ago. replaced by puppetmaster-1003 (bookworm, puppetserver) T360964 T360470

Created a new "puppet prefix" in Horizon called "deploy", then used it to apply the role and the Hiera keys needed for a deployment server to all hosts starting with deploy*.

Dzahn changed the task status from Open to Stalled. (Edited Wed, Apr 24, 7:59 PM)

deploy-1006 is on bullseye and its current status is "puppet error with duplicate declaration related to initializing scap". It would be the first deployment_server on bullseye, so issues are expected.

deploy-1004 is still up and still on buster. Its current status is "puppet error related to geoip and volatile which is related to puppetmaster5->puppetserver7 migration".

So both are kind of broken, and neither is a simple fix.

For now I shut the new instance down again so that we don't get puppet nag emails every day.

Per comments on https://gerrit.wikimedia.org/r/c/operations/puppet/+/820749/14/modules/scap/manifests/master.pp#69 (thanks Jaime Nuche!), I added profile::mediawiki::scap_client::is_master: true to the deploy-prefix Hiera, which fixes the duplicate declaration related to scap init on deploy-1006. Machine started.
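
To illustrate how a prefix-scoped Hiera key like this reaches hosts, here is a small model of the "deploy" prefix. It is a sketch under assumptions, not how Horizon implements it: the dictionary layout, the function name, and the shorter-prefix-first precedence are all hypothetical. Only the prefix name and the Hiera key come from this task.

```python
# Hypothetical in-memory model of prefix Hiera: prefix -> Hiera keys.
# The "deploy" prefix and the key below are taken from this task.
PREFIX_HIERA = {
    "deploy": {"profile::mediawiki::scap_client::is_master": True},
}


def hiera_for_host(hostname, prefix_hiera=PREFIX_HIERA):
    """Merge the keys of every prefix that the hostname starts with."""
    merged = {}
    # Shorter prefixes are applied first so longer ones can override them.
    # This precedence is an assumption, not confirmed Horizon behaviour.
    for prefix in sorted(prefix_hiera, key=len):
        if hostname.startswith(prefix):
            merged.update(prefix_hiera[prefix])
    return merged


for host in ["deploy-1004", "deploy-1006", "puppetmaster-1003"]:
    print(host, hiera_for_host(host))
```

In this model both deploy-1004 and deploy-1006 pick up the key, while puppetmaster-1003 gets nothing from the prefix.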

The new bullseye deployment server runs into known issue T257317 with scap init again.

Change #1026197 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] devtools: update gerrit and phab instance names in default Hiera

https://gerrit.wikimedia.org/r/1026197

Change #1026193 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] mediawiki/geoip: make loading geoip data from puppetserver optional

https://gerrit.wikimedia.org/r/1026193

Change #1026197 merged by Dzahn:

[operations/puppet@production] devtools: update gerrit and phab instance names in default Hiera

https://gerrit.wikimedia.org/r/1026197

Change #1026698 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] cloud/devtools: replace deploy-1004 with deploy-1006

https://gerrit.wikimedia.org/r/1026698

Change #1026698 merged by Dzahn:

[operations/puppet@production] cloud/devtools: replace deploy-1004 with deploy-1006

https://gerrit.wikimedia.org/r/1026698

Mentioned in SAL (#wikimedia-cloud) [2024-05-02T23:10:29Z] <mutante> replacing deploy-1004 (buster) with deploy-1006 (bullseye) as new deployment server in both repo and Horizon hiera T360964 T363415

Dzahn changed the task status from Stalled to In Progress. (Mon, May 6, 8:07 PM)

Most issues with deploy-1006 are now fixed. The remaining one is related to T360470 and is the same on both the old and the new deployment server.

A change for that is still in code review.

But there is no reason to keep the old deployment server around anymore.

It's also already shut down, just not deleted. If we keep it up, it just generates more nagging mails.

Mentioned in SAL (#wikimedia-cloud) [2024-05-06T20:37:05Z] <mutante> deleting buster deployment server deploy-1004, replaced by deploy-1006 - T360964

Dzahn updated the task description.

No more buster machines in devtools.