The CI production servers are running Debian Jessie and have to be upgraded. We will upgrade them to Buster. The hosts are:
| Host | Role
|--|--
| contint1001.wikimedia.org | Primary
| contint2001.wikimedia.org | Spare
The services to be migrated are:
* #zuul and https://integration.wikimedia.org/zuul/
** Needs to be ported to scap, with wheels built for Buster: T215458
** Needs additional patches to use the scap-deployed repository on Buster.
* #jenkins https://integration.wikimedia.org/ci/
** A lot of configuration and data to rsync
* Website https://integration.wikimedia.org/
** Served through the WMF cache layer (ATS)
** Might have issues with the Apache upgrade.
* `docker-pkg`
** Buster is supported since [[ https://gerrit.wikimedia.org/r/#/c/operations/docker-images/docker-pkg/deploy/+/585451/ | Gerrit 585451 ]]
** Updated on April 2nd; it will need to be redeployed after the upgrade: `scap deploy --limit contint.*`
* Docker
** We can afford to lose the containers. They will be downloaded again from the registry if needed.
* Building of the pipeline containers
Migration
=========
The overall sequence is to upgrade contint2001, migrate the services to it, upgrade contint1001, move the services back to contint1001.
Zuul to scap
-------------
Zuul has been deployed using a Debian package, but that method is painful for everyone. On Buster we will deploy it using scap. All the scap-related work can be done before upgrading from Jessie to Buster; we need a feature switch based on the target distribution so that a host keeps relying on the Debian package for as long as it still runs Jessie.
[X] Craft a scap deployment repository for Zuul
[X] Get Puppet patches to vary based on the distribution
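The actual switch lives in Puppet. The decision it encodes can be illustrated with a minimal Python sketch: Jessie hosts keep the Debian package, Buster hosts get the scap deployment, and any other codename fails loudly instead of silently picking a method. The function names and the returned labels are hypothetical and only illustrate the logic; they are not the Puppet code.

```python
import subprocess

# Hypothetical mapping: distributions that keep the legacy Zuul Debian package.
PACKAGE_DISTROS = {"jessie"}
# Hypothetical mapping: distributions on which Zuul is deployed with scap.
SCAP_DISTROS = {"buster"}


def zuul_install_method(codename: str) -> str:
    """Return the (hypothetical) Zuul install method for a Debian codename."""
    codename = codename.strip().lower()
    if codename in PACKAGE_DISTROS:
        return "debian-package"
    if codename in SCAP_DISTROS:
        return "scap"
    # Refuse to guess for distributions this migration does not cover.
    raise ValueError(f"no Zuul install method defined for {codename!r}")


def local_codename() -> str:
    """Read the codename of the local host; `lsb_release -cs` prints e.g. "buster"."""
    result = subprocess.run(
        ["lsb_release", "-cs"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Raising on an unknown codename mirrors the intent of the switch: a host only moves to scap once it has explicitly been upgraded to Buster.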
The deployment on a host can only be done after it has been upgraded to Buster.
[ ] Update Zuul deployment process at https://www.mediawiki.org/wiki/Continuous_integration/Zuul
{icon check color=green} contint2001 upgrade
--------
{icon check color=green} zuul-merger
~~~~~~~~~
The sole production service being run on contint2001 is `zuul-merger`:
[X] Use puppet to disable zuul-merger
[X] reinstall with Buster
[X] deploy the zuul scap to the machine: `scap deploy --limit contint2001`
[X] Use puppet to enable zuul-merger https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/589013
[X] Start the service and verify
{icon check color=green} docker-pkg
~~~~~~~~
For contint2001:
[x] redeploy `docker-pkg` `scap deploy --limit contint2001`
[x] update the fabfile.py for `deploy_docker` https://gerrit.wikimedia.org/r/#/c/integration/config/+/587963/
[x] try a rebuild of the containers; that should download them.
** Images are only downloaded when one actually has to be built.
Get Puppet to run on contint2001 without errors, and build the missing packages:
[x] E: Unable to locate package blubber
[x] E: Unable to locate package zuul
[x] E: Unable to locate package helm
[x] E: Unable to locate package helmfile
[x] E: Unable to locate package helm-diff
[x] E: Unable to locate package kubernetes-client
[x] mod_php_7.3 - ERROR: Module mpm_event is enabled - cannot proceed due to conflicts. It needs to be disabled first (known issue -> https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206)
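The list above was collected from Puppet run output. A small helper can extract the missing package names from a saved copy of that output, so the list stays complete when several runs are combined. This is a minimal sketch; the helper name is hypothetical, and it only relies on the `E: Unable to locate package <name>` message shown above.

```python
import re

# Matches apt's error for a package absent from the configured repositories.
MISSING_RE = re.compile(r"E: Unable to locate package (\S+)")


def missing_packages(log: str) -> list:
    """Return missing package names in first-seen order, without duplicates."""
    seen = []
    for name in MISSING_RE.findall(log):
        if name not in seen:
            seen.append(name)
    return seen
```

For example, feeding it the concatenated output of two Puppet runs that both fail on `blubber` reports `blubber` only once.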
{icon check color=green} Jenkins agent
~~~~~~~~~~~
[x] Add contint2001 as a Jenkins agent (copying `contint1001`)
[x] Disable contint1001 agent
[x] Update jobs in integration/config to point to `contint2001`
{icon check color=green} Migrate
------
[x] Stop zuul, zuul-merger, jenkins on `contint1001`
[x] **rsync data**
[x] change DNS backend for `contint.wikimedia.org`
[x] update fabfile to have zuul reloaded on contint2001 instead of contint1001
[x] Set contint2001 as master in Puppet / Hiera
[x] Start Jenkins, verify that agents are connected and jobs set
[x] Start Zuul scheduler
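The **rsync data** step can be sketched in Python as one `rsync` invocation per directory, run on the destination host once the services are stopped on the source host. This is a minimal sketch under stated assumptions: the directory list is hypothetical (the actual Jenkins and Zuul data paths are not recorded in this task), it assumes remote-shell access between the hosts, and it defaults to `--dry-run` so nothing is copied until the output has been reviewed.

```python
import shlex
from typing import List

# Hypothetical directory list; check the real Jenkins/Zuul data paths first.
DATA_DIRS = ["/var/lib/jenkins/", "/var/lib/zuul/"]


def rsync_commands(source_host: str, dirs: List[str], dry_run: bool = True) -> List[List[str]]:
    """Build one rsync command per directory, pulling from source_host.

    Trailing slashes make rsync copy the directory contents in place.
    """
    commands = []
    for path in dirs:
        # --archive keeps permissions, ownership, times and symlinks;
        # --delete makes the destination an exact mirror of the source.
        cmd = ["rsync", "--archive", "--hard-links", "--delete"]
        if dry_run:
            cmd.append("--dry-run")  # only report what would change
        cmd += [f"{source_host}:{path}", path]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in rsync_commands("contint1001.wikimedia.org", DATA_DIRS):
        print(shlex.join(cmd))
```

The same helper applies to the move back to contint1001 by passing `contint2001.wikimedia.org` as the source host.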
contint1001 upgrade
-----------------------
[x] reinstall with Buster
[ ] deploy the zuul scap to the machine: `scap deploy --limit contint1001`
[ ] redeploy `docker-pkg` `scap deploy --limit contint1001`
[ ] ~~update the fabfile.py for `deploy_docker`~~ We now use `contint.wikimedia.org`
[ ] Stop zuul, zuul-merger, jenkins on `contint2001`
[ ] **rsync data**
[ ] change DNS backend for `contint.wikimedia.org`
[ ] ~~update fabfile to have Zuul reloaded on contint1001 instead of contint2001~~ We now use `contint.wikimedia.org`
[ ] Set contint1001 as master in Puppet / Hiera
[ ] Start Jenkins, verify that agents are connected and jobs set
[ ] Start Zuul scheduler