The CI production servers run Debian Jessie and have to be upgraded. We will go for Buster. The hosts are:
Host | Role |
---|---|
contint1001.wikimedia.org | Primary |
contint2001.wikimedia.org | Spare |
The services to be migrated are:
- Zuul and https://integration.wikimedia.org/zuul/
  - Needs to be ported to scap, with wheels built for Buster: T215458
  - Additional patches to use the scap-deployed repo when using Buster.
- Jenkins https://integration.wikimedia.org/ci/
  - Lots of configuration and data to rsync
- Website https://integration.wikimedia.org/
  - Served through the WMF cache layer (ATS)
  - Might have issues with the Apache upgrade.
- docker-pkg
  - Buster is supported since Gerrit 585451
  - Updated on April 2nd. Will need to be redeployed after the upgrade: scap deploy --limit contint.*
- Docker
  - We can afford to lose the containers. They will be redownloaded from the registry if need be.
- Pipeline containers building
Migration
The overall sequence is to upgrade contint2001, migrate the services to it, upgrade contint1001, move the services back to contint1001.
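The ordering constraints in that sequence can be sketched as a small model. This is only an illustration, under the assumption that a host must be free of services before it is reinstalled and that services only move to a host already on Buster. The `Host` class and the function names are hypothetical, and the zuul-merger running on the spare is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    distro: str = "jessie"
    services: set = field(default_factory=set)


def upgrade(host):
    # A host can only be reinstalled once nothing is running on it.
    if host.services:
        raise RuntimeError(f"{host.name} still runs {sorted(host.services)}")
    host.distro = "buster"


def migrate(source, target):
    # Services only move to a host that has already been upgraded.
    if target.distro != "buster":
        raise RuntimeError(f"{target.name} is not on Buster yet")
    target.services |= source.services
    source.services = set()


def run_migration():
    primary = Host("contint1001", services={"zuul", "jenkins"})
    spare = Host("contint2001")
    upgrade(spare)            # 1. upgrade contint2001
    migrate(primary, spare)   # 2. migrate the services to it
    upgrade(primary)          # 3. upgrade contint1001
    migrate(spare, primary)   # 4. move the services back
    return primary, spare
```

Swapping any two steps in `run_migration` raises an error, which is the point of the sketch: the order is forced by the two constraints.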
Zuul to scap
Zuul has been deployed using a Debian package, but that method is painful for everyone. On Buster we will deploy it using scap. We can do all the scap-related work before upgrading from Jessie to Buster. We need a feature switch based on the target distribution, so that a host still relies on the Debian package as long as it is still running Jessie.
- Craft a scap deployment repository for Zuul
- Get Puppet patches to vary based on the distribution
The deployment on a host can only be done after it has been upgraded to Buster.
- Update Zuul deployment process at https://www.mediawiki.org/wiki/Continuous_integration/Zuul
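The feature switch itself belongs in Puppet. Purely to illustrate the decision it encodes, here is a minimal Python sketch. The function name and the returned labels are hypothetical and are not taken from the actual Puppet patches.

```python
def zuul_deploy_method(codename):
    """Return how Zuul should be installed on a host running `codename`."""
    # Jessie hosts keep the Debian package; Buster hosts switch to scap.
    methods = {"jessie": "debian-package", "buster": "scap"}
    try:
        return methods[codename.lower()]
    except KeyError:
        # Fail loudly for any distribution the migration does not cover.
        raise ValueError(f"unsupported distribution: {codename}") from None
```

Rejecting unknown distributions, rather than defaulting to one method, keeps a host from silently ending up with the wrong deployment.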
contint2001 upgrade
zuul-merger
~~~~~~~~~
The sole production service being run on contint2001 is zuul-merger:
- Use puppet to disable zuul-merger
- reinstall with Buster
- deploy the zuul scap to the machine: scap deploy --limit contint2001
- Use puppet to enable zuul-merger https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/589013
- Start the service and verify
docker-pkg
~~~~~~~~
For contint2001:
- redeploy docker-pkg: scap deploy --limit contint2001
- update the fabfile.py for deploy_docker https://gerrit.wikimedia.org/r/#/c/integration/config/+/587963/
- try a rebuild of the containers. That should download them.
  - They are only downloaded when one actually has to be built.
Puppet run on contint2001 without errors, with the missing packages built. Errors encountered:
- E: Unable to locate package blubber
- E: Unable to locate package zuul
- E: Unable to locate package helm
- E: Unable to locate package helmfile
- E: Unable to locate package helm-diff
- E: Unable to locate package kubernetes-client
- mod_php_7.3 - ERROR: Module mpm_event is enabled - cannot proceed due to conflicts. It needs to be disabled first (known issue -> https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206)
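To turn such a Puppet or apt log into a list of packages that still need to be built for Buster, a small parser is enough. This is a sketch that assumes the log contains the `E: Unable to locate package` lines exactly as quoted above. The function name is hypothetical.

```python
import re

# Matches the apt error line and captures the package name.
MISSING_PACKAGE = re.compile(r"E: Unable to locate package (\S+)")


def missing_packages(log_text):
    """Return the package names apt could not locate, in first-seen order."""
    seen = []
    for name in MISSING_PACKAGE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample = """\
E: Unable to locate package blubber
E: Unable to locate package zuul
E: Unable to locate package helm
E: Unable to locate package helmfile
E: Unable to locate package helm-diff
E: Unable to locate package kubernetes-client
"""
print(missing_packages(sample))
# ['blubber', 'zuul', 'helm', 'helmfile', 'helm-diff', 'kubernetes-client']
```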
Jenkins agent
~~~~~~~~~~~
- Add contint2001 as a Jenkins agent (copying contint1001)
- Disable contint1001 agent
- Update jobs in integration/config to point to contint2001
Migrate
- Stop zuul, zuul-merger, jenkins on contint1001
- rsync data
- change DNS backend for contint.wikimedia.org
- update fabfile to have zuul reloaded on contint2001 instead of contint1001
- Set contint2001 as master in Puppet / Hiera
- Start Jenkins, verify that agents are connected and jobs set
- Start Zuul scheduler
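For the rsync step, it can help to build the command first and review it as a dry run before copying anything. The sketch below only assembles the argument list. It assumes the command is run on the target host with SSH access to the source. The path `/var/lib/jenkins` used in the example is an assumption, not taken from this page.

```python
def rsync_command(source_host, path, dry_run=True):
    """Build an rsync invocation that pulls `path` from `source_host`."""
    # --archive preserves permissions, ownership and timestamps;
    # --delete removes files on the target that no longer exist on the source.
    cmd = ["rsync", "--archive", "--delete"]
    if dry_run:
        # Show what would be transferred without changing anything.
        cmd.append("--dry-run")
    # Trailing slashes copy the directory contents rather than the directory itself.
    cmd += [f"{source_host}:{path}/", f"{path}/"]
    return cmd


# Hypothetical data directory; adjust to the real Jenkins and Zuul paths.
print(" ".join(rsync_command("contint1001.wikimedia.org", "/var/lib/jenkins")))
```

Because `--delete` is destructive on the target, defaulting to `dry_run=True` makes the real copy an explicit choice.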
contint1001 upgrade
- reinstall with Buster
- deploy the zuul scap to the machine: scap deploy --limit contint1001 - deployed as part of
- redeploy docker-pkg: scap deploy --limit contint1001
- update the fabfile.py for deploy_docker. We now use contint.wikimedia.org
- Stop zuul, zuul-merger, jenkins on contint2001
- rsync data
- change DNS backend for contint.wikimedia.org
- update fabfile to have Zuul reloaded on contint1001 instead of contint2001. We now use contint.wikimedia.org
- Set contint1001 as master in Puppet / Hiera
- Start Jenkins, verify that agents are connected and jobs set
- Start Zuul scheduler