
GitLab Runners in WMCS are offline
Closed, Resolved (Public)


GitLab Runners in WMCS are showing as offline in the GitLab UI and are also unreachable via SSH. Timing indicates that this is related to T342621: eqiad1: cloudlb: transition DNS clients (VMs) to the new BGP-based recursor VIP.

Firewall rules have been updated in but applying them on the hosts is failing.

Event Timeline

Change 956784 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab_runner: change docker_subnet in WMCS

Change 956784 merged by Jelto:

[operations/puppet@production] gitlab_runner: change docker_subnet in WMCS

I hadn't realised we had a potential clash here. Unsure exactly what the answer is.

Assuming the affected machines running docker containers are VMs on you can potentially add a work-around to improve the situation until Jelto's above patch is rolled out / working everywhere, by adding static routes for the unreachable IPs via the gateway, i.e.

ip route add via

The more-specific mask on that route would take precedence over the range assigned to the local docker0 bridge and traffic should get to the affected (non-docker) hosts using 172.20.x.x addressing.
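To illustrate why the more-specific route wins, here is a minimal bash sketch of longest-prefix matching, the rule the Linux routing table applies when several routes cover the same destination. The function names and all addresses are hypothetical: the task does not state the actual docker0 range or the unreachable IPs, so 172.20.0.0/16 and 172.20.5.10 are assumed purely for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and example addresses are assumptions,
# not taken from the affected hosts.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# Return success if address $1 falls inside CIDR prefix $2.
in_prefix() {
  local addr net len mask
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  len=${2#*/}
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( (addr & mask) == (net & mask) ))
}

# Print the matching prefix with the longest mask, as the kernel would pick.
best_route() {
  local addr=$1 best="" best_len=-1 p len
  shift
  for p in "$@"; do
    len=${p#*/}
    if in_prefix "$addr" "$p" && (( len > best_len )); then
      best=$p
      best_len=$len
    fi
  done
  echo "$best"
}

# Assumed docker0 range plus one static host route for an unreachable IP.
best_route 172.20.5.10 172.20.0.0/16 172.20.5.10/32   # prints 172.20.5.10/32
best_route 172.20.5.11 172.20.0.0/16 172.20.5.10/32   # prints 172.20.0.0/16
```

In this sketch only the address covered by the /32 leaves via the static route; every other address in the assumed range still resolves to the bridge, which is why the workaround needs one route per unreachable host.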

aborrero claimed this task.

Solved with:

user@laptop:~$ ssh -o StrictHostKeyChecking=no "ip route delete ; run-puppet-agent"

on the bunch of affected hosts.

Thanks @aborrero for fixing all WMCS runners!

In addition to the workaround, it was necessary to delete and re-create the gitlab-runner docker network. The following command was used:

systemctl stop docker-resource-monitor.service ; systemctl stop buildkitd.service ; docker network rm gitlab-runner ; run-puppet-agent

The network looks good on all WMCS runners now:

docker network inspect gitlab-runner