
Upgrade deployment-prep deploy* hosts to Buster
Open, Stalled, Needs Triage, Public

Description

According to T265653, deploy* hosts work fine on Buster, so replace the deployment-deploy[01-02] hosts with Buster hosts.

Tagging RelEng since your help is needed to make the required changes in Jenkins.

deployment-deploy01 also hosts some sort of apt repo. I'm not sure how that works or what is needed to move it.

TODO: are two hosts needed?

Event Timeline

Mentioned in SAL (#wikimedia-releng) [2021-03-30T13:35:54Z] <Majavah> create and install deployment-deploy03 T278689

Mentioned in SAL (#wikimedia-releng) [2021-03-30T14:50:03Z] <Majavah> cherry pick 675807 675814 and 675815 to deployment-puppetmaster to unblock work on deployment-deploy03 until sre has merged those T278689

deployment-deploy03 now has scap set up. I have not done anything to the aptly repo yet. My understanding is that we can just switch the publish Jenkins jobs to that host, have the first scap debs deployed to it, and then switch all clients over.
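Before switching clients over, one way to confirm that the first scap debs actually landed in the new repo is to read its Debian `Packages` index. This is a minimal sketch, assuming aptly publishes the uncompressed `Packages` file. The URL below (host FQDN, `/debian/` prefix, `buster` suite, `main` component) is a hypothetical placeholder, because the real publish path on the deploy host isn't recorded in this task.

```python
import urllib.request

# Hypothetical URL: the actual aptly publish path is not documented here;
# adjust the host, prefix, suite and component to match the real repo.
PACKAGES_URL = (
    "http://deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud"
    "/debian/dists/buster/main/binary-amd64/Packages"
)


def parse_packages(text: str) -> dict[str, list[str]]:
    """Map package name -> list of versions from a Debian Packages index."""
    versions: dict[str, list[str]] = {}
    # Stanzas are separated by blank lines; fields look like "Key: value".
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") or ":" not in line:
                continue  # skip continuation lines, e.g. a long Description
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Package" in fields:
            versions.setdefault(fields["Package"], []).append(fields.get("Version", ""))
    return versions


def fetch_packages(url: str = PACKAGES_URL) -> str:
    """Download the uncompressed Packages index (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, `"scap" in parse_packages(fetch_packages())` should be true once the publish job has run against the new host. Only then would it make sense to repoint clients.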

The next step would be to switch the Jenkins jobs to it and change the hiera values to make deploy03 the primary deployment server. I don't have enough access in Jenkins to do that.
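The hiera half of that switch amounts to repointing the "primary deployment server" key at the new host. The following is a rough sketch only. The key names `deployment_server` and `deployment_hosts` are hypothetical stand-ins for whatever keys the deployment-prep project hiera actually uses. Keeping the old host listed during the transition is an assumption, not something this task specifies.

```python
import json


def promote_deploy_server(hiera: dict, new_host: str) -> dict:
    """Return a copy of a hiera mapping with new_host as the primary deploy server.

    Key names are hypothetical placeholders for the real project hiera keys.
    """
    updated = dict(hiera)  # shallow copy; the caller's mapping is left unchanged
    updated["deployment_server"] = new_host
    hosts = list(updated.get("deployment_hosts", []))  # new list, not an alias
    # Assumption: keep the old host listed until the cutover is finished.
    if new_host not in hosts:
        hosts.insert(0, new_host)
    updated["deployment_hosts"] = hosts
    return updated


if __name__ == "__main__":
    current = {
        "deployment_server": "deployment-deploy01.deployment-prep.eqiad1.wikimedia.cloud",
        "deployment_hosts": ["deployment-deploy01.deployment-prep.eqiad1.wikimedia.cloud"],
    }
    new = promote_deploy_server(
        current, "deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud"
    )
    # JSON of plain strings and lists is also valid YAML, so it can be pasted as hiera.
    print(json.dumps(new, indent=2))
```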

Everyone in the https://ldap.toolforge.org/group/ciadmin group, i.e., everyone on Release-Engineering-Team, should be able to help.

The steps here will be:

  1. Ensure that deployment-deploy03 is set up and ready
  2. Add deployment-deploy03 as a Jenkins node: https://integration.wikimedia.org/ci/computer/new
  3. Apply the Jenkins label BetaClusterBastion to deployment-deploy03 and remove it from deployment-deploy01: https://integration.wikimedia.org/ci/computer/deployment-deploy01/configure and https://integration.wikimedia.org/ci/computer/deployment-deploy03/configure
  4. Test the jobs tied to that label: https://integration.wikimedia.org/ci/label/BetaClusterBastion/ (5 jobs: beta-code-update-eqiad, beta-mediawiki-config-update-eqiad, beta-publish-deb, beta-scap-eqiad, beta-update-databases-eqiad). beta-publish-deb last ran 9 months ago, so maybe it's not used anymore?
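After step 3, the label move and the list of jobs to test in step 4 can both be read from the Jenkins remote API. This is a minimal sketch, assuming the label's `api/json` exposes `nodes[].nodeName` and `tiedJobs[].name`, as Jenkins' remote API normally does. Check against the live endpoint, which may also require authentication. `fetch_label` and `check_label_move` are hypothetical helper names.

```python
import json
import urllib.request

JENKINS_URL = "https://integration.wikimedia.org/ci"


def fetch_label(label: str, base: str = JENKINS_URL) -> dict:
    """Fetch a label's JSON from the Jenkins remote API (needs network access)."""
    url = f"{base}/label/{label}/api/json?tree=nodes[nodeName],tiedJobs[name]"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def check_label_move(info: dict, new_node: str, old_node: str) -> tuple[list[str], list[str]]:
    """Return (sorted tied job names, problems) after moving a label between nodes."""
    nodes = {n.get("nodeName", "") for n in info.get("nodes", [])}
    jobs = sorted(j["name"] for j in info.get("tiedJobs", []))
    problems = []
    if new_node not in nodes:
        problems.append(f"{new_node} does not carry the label yet")
    if old_node in nodes:
        problems.append(f"{old_node} still carries the label")
    return jobs, problems


if __name__ == "__main__":
    jobs, problems = check_label_move(
        fetch_label("BetaClusterBastion"), "deployment-deploy03", "deployment-deploy01"
    )
    print("jobs to test:", ", ".join(jobs))
    for p in problems:
        print("problem:", p)
```

An empty problem list means only deployment-deploy03 carries BetaClusterBastion. The printed job names are the ones to trigger by hand in step 4.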


beta-publish-deb's last failure was 9 months ago; its last run was ~2.5h ago, since it runs after every commit merged to scap.

Majavah changed the task status from Open to Stalled. Mar 30 2021, 5:47 PM
Majavah added a subscriber: hashar.

This is basically blocked on T277078: the new host is on Cinder, so profile::ci::slave::labs::common first needs to stop requiring LVM storage.

Change 700426 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] beta: remove deployment-deploy02

https://gerrit.wikimedia.org/r/700426

Mentioned in SAL (#wikimedia-releng) [2021-06-19T13:44:46Z] <majavah> remove deployment-deploy02 T278689

Change 700426 merged by RLazarus:

[operations/puppet@production] beta: remove deployment-deploy02

https://gerrit.wikimedia.org/r/700426

Majavah removed a project: User-Majavah.