Move CI instances to use ceph in WMCS
Closed, Resolved, Public

Description

WMCS has added Ceph to hold the images. It is not generalized yet, but we can be among the first adopters. Their high-level task for now is T253365.

In order to do so, we would simply need to use the flavor mediumram-ceph when creating a new instance.
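As a rough sketch of what that could look like from the command line, the snippet below assembles an `openstack server create` invocation that selects the mediumram-ceph flavor. It only builds and prints the command and does not run anything. The helper name `build_server_create_command` and the image name `debian-10.0-buster` are assumptions made for illustration; the task does not say which image or tooling (Horizon or the CLI) would actually be used.

```python
import shlex


def build_server_create_command(name, image, flavor="mediumram-ceph"):
    """Return the argv list for an `openstack server create` call.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "openstack", "server", "create",
        "--flavor", flavor,  # the Ceph-backed flavor named in the task
        "--image", image,    # image name is an assumption, not from the task
        name,
    ]


if __name__ == "__main__":
    argv = build_server_create_command(
        "integration-agent-docker-1020",  # instance name from the event timeline
        "debian-10.0-buster",             # hypothetical image name
    )
    # Print a copy-pasteable shell command line.
    print(shlex.join(argv))
```

Keeping the flavor as a default argument means an existing provisioning script would only need to change one value to opt new instances into Ceph.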

It might not be doable to migrate on a per-instance basis, so we might have to reprovision each instance entirely. That in turn might conflict with rebuilding the fleet to use Buster (T252071: Move all Wikimedia CI (WMCS integration project) instances from stretch to buster/bullseye), because surely we do not want to make both changes at the same time.

Event Timeline

Mentioned in SAL (#wikimedia-releng) [2020-08-20T15:40:57Z] <hashar> Created dummy instance integration-agent-docker-1020 using a Ceph backed hypervisor # T260916

Mentioned in SAL (#wikimedia-releng) [2020-08-20T16:02:51Z] <hashar> Added integration-agent-docker-1020 to Jenkins # T260916

Mentioned in SAL (#wikimedia-releng) [2020-08-20T16:07:13Z] <hashar> Depooled integration-agent-docker-1020 to Jenkins can't connect to /var/run/docker.sock # T260916

hashar triaged this task as Medium priority. Aug 20 2020, 6:52 PM

Mentioned in SAL (#wikimedia-releng) [2020-10-16T07:48:42Z] <hashar> Disabling integration-agent-docker-1020 (the sole agent using Ceph): it is too slow # T260916 T265615

Mentioned in SAL (#wikimedia-releng) [2020-10-26T11:03:32Z] <hashar> Bring back integration-agent-docker-1020. It is not the only one affected by T265615 which probably rules out Ceph as the slowness root cause (T260916)

Turns out Ceph adds a lot of latency and affects the CI jobs. Filed T266777

I guess we can probably decline this task since the migration happened anyway :-\

Andrew claimed this task.

All VMs have been moved to Ceph so this is done.