Communicate wikireplicas outage and healthcheck the system after Eqiad Row C network changes
Closed, Resolved, Public

Description

Both dbproxy1018 and dbproxy1019 are in row C and apparently in the same rack, C5. When the row is down for maintenance, the wikireplicas will be inaccessible to cloud users of any kind (VPS, Toolforge or PAWS).

Event Timeline

aborrero triaged this task as High priority.
aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

I'll leave this ping test running for the duration of the operation:

aborrero@clouddb-wikireplicas-proxy-1:~$ ping wikireplicas-b.wikimedia.org
PING wikireplicas-b.wikimedia.org (208.80.154.243) 56(84) bytes of data.
64 bytes from wikireplicas-b.wikimedia.org (208.80.154.243): icmp_seq=1 ttl=60 time=0.442 ms
64 bytes from wikireplicas-b.wikimedia.org (208.80.154.243): icmp_seq=2 ttl=60 time=0.411 ms
[..]
aborrero@clouddb-wikireplicas-proxy-2:~$ ping wikireplicas-a.wikimedia.org
PING wikireplicas-a.wikimedia.org (208.80.154.242) 56(84) bytes of data.
64 bytes from wikireplicas-a.wikimedia.org (208.80.154.242): icmp_seq=1 ttl=60 time=0.446 ms
64 bytes from wikireplicas-a.wikimedia.org (208.80.154.242): icmp_seq=2 ttl=60 time=0.439 ms
[..]

I'll report back what I see (packet loss, etc.).

Done. No packet loss detected in this test.
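
For future maintenance windows, the manual ping check above could be scripted. The following is only a minimal sketch in Python, assuming the summary line that iputils `ping` prints on exit ("N packets transmitted, M received, X% packet loss"). The function name `parse_packet_loss` and the sample text are hypothetical and illustrative; they are not taken from the actual test run in this task.

```python
import re
from typing import Optional

# Summary line printed by iputils ping when it exits, for example:
#   "2 packets transmitted, 2 received, 0% packet loss, time 1001ms"
# An optional "+N errors," field may appear before the loss percentage.
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received,"
    r".*?([\d.]+)% packet loss"
)


def parse_packet_loss(ping_output: str) -> Optional[float]:
    """Return the packet loss percentage, or None if no summary line is found."""
    match = _SUMMARY.search(ping_output)
    if match is None:
        return None
    return float(match.group(3))


# Illustrative sample only (not output captured during the maintenance).
SAMPLE = """\
--- wikireplicas-b.wikimedia.org ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
"""

if __name__ == "__main__":
    loss = parse_packet_loss(SAMPLE)
    print("no summary found" if loss is None else f"packet loss: {loss}%")
```

Saving the output of each `ping` session to a file and passing its contents to `parse_packet_loss` would give one number per proxy to report back on the task.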