Migrate contint* hosts to Buster
Open, Medium, Public

Description

These are currently running jessie:

  • contint1001.wikimedia.org
  • contint2001.wikimedia.org

Hosts:

Details

Due Date
Mar 29 2020, 10:00 PM

Event Timeline

Dzahn added a subscriber: Dzahn. (May 29 2019, 11:50 PM)
ArielGlenn triaged this task as Medium priority. (Jun 11 2019, 7:51 AM)

From a quick chat with @MoritzMuehlenhoff :

Python 2.7 is still packaged in Stretch and Buster and will apparently keep being supported there even after the upstream end-of-life deadline in 2020. So we can carry the current outdated Zuul (Python 2.7 based) over to a new distribution.

Jessie reaches EOL in June 2020. The SRE internal deadline for the Wikimedia infrastructure is March 2020.

The CI service is sensitive to latency with Gerrit. I am pretty sure we had issues when Gerrit temporarily ran in codfw. So we probably do not want to migrate via contint2001.codfw.wmnet.
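
To put a rough number on that latency concern, the connect time from each candidate host to Gerrit could be compared before choosing a path. The following is only a minimal sketch: the helper name `connect_latency_ms` is hypothetical, it measures the TCP handshake only (not Zuul's actual Gerrit traffic), and the host and port to probe are assumptions to be supplied by whoever runs it.

```python
import socket
import statistics
import time


def connect_latency_ms(host, port, samples=5, timeout=2.0):
    """Return the median TCP connect time to host:port, in milliseconds.

    Hypothetical helper for illustration; it times the TCP handshake only.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

Run from contint1001 and contint2001 in turn, for instance as `connect_latency_ms("gerrit.wikimedia.org", 29418)` (29418 being Gerrit's default SSH port, assumed here to be the one in use), this would give a comparable median for each datacenter.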

Buster seems to be the best target: if we moved to Stretch we would have to migrate off it again at the end of 2020, so going straight to Buster saves us a migration.

Most probably we can take the opportunity to migrate some of the services to Ganeti VMs (https://wikitech.wikimedia.org/wiki/SRE_Team_requests#Virtual_machine_requests_(Production)), just like we had doc.wikimedia.org migrated off the box.

Ultimately each service could live on its own little VM and we could dispose of the contint machine. Left to be determined is whether we will need the horsepower of those production machines to build containers (docker-pkg, service pipeline) or for a hypothetical data store/search system.

I have formally added this task to the Release-Engineering-Team offsite agenda in November 2019.

hashar renamed this task from Migrate contint* hosts to Stretch/Buster to Migrate contint* hosts to Buster. (Sep 20 2019, 1:46 PM)
hashar set Due Date to Mar 29 2020, 10:00 PM.
hashar updated the task description. (Sep 20 2019, 1:49 PM)

@hashar, rather than blocking the migration to buster on new hardware, couldn't we just build out contint2001 on buster and then migrate prod services to point to that, rebuild contint1001 into buster, and then point back? We could then do the hardware migration later, rather than hold everything up for that.

AFAIAA we've never switched master for CI… I know we're not meant to have no fail-over, but in practice we don't anyway, right? A backup you don't test doesn't exist.

Indeed that is a very good point @Jdforrester-WMF, which if feasible would mean we can drop the hardware request at T239880.

We will still want the new hardware when we reach hardware EOL, but that's not very soon, yeah.

Change 560365 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: switch contint2001 from jessie to buster

https://gerrit.wikimedia.org/r/560365

Change 560365 merged by Dzahn:
[operations/puppet@production] install_server: switch contint2001 from jessie to buster

https://gerrit.wikimedia.org/r/560365