dbproxy1018 network interface down
Closed, Resolved, Public

Description

jclark-ctr: XioNoX: dbproxy1018 is in C5 and went down at the same time, could you double-check you didn't accidentally do something to it?

Replaced the cable, rebooted the server, and tested the network port with a laptop.

Tested a new port on the switch, and the link returned.

Please configure the switch to use interface 6.

Event Timeline

{master:2}[edit]
papaul@asw2-c-eqiad# run show interfaces ge-5/0/6 descriptions
Interface       Admin Link Description
ge-5/0/6        up    up   dbproxy1018

{master:2}[edit]
papaul@asw2-c-eqiad# run show interfaces ge-5/0/1 descriptions
Interface       Admin Link Description
ge-5/0/1        down  down DISABLED

This is complete

Kappakayala subscribed.

Removing the serviceops tag, as I believe no action is required from the serviceops team. Please feel free to re-tag and comment if anything is needed from the serviceops team.

Marostegui added subscribers: nskaggs, fnegri, bd808 and 2 others.

This host belongs to cloud-services-team (or Data-Engineering?).
The proxy needed a reload because it was showing clouddb1018 as down when it was not.

I have issued the reload and everything is back up.

@Andrew depooled dbproxy1018 on Friday when it went down, then repooled it one hour later when it was back online, and traffic seemed to flow correctly through it based on the haproxy logs (but maybe not to all destinations?).

haproxy on dbproxy1018 was automatically reloaded by Puppet when the server was repooled, so I'm struggling to understand why that reload wasn't enough.

Where was clouddb1018 showing down? Was it an icinga alert?

It was showing down on haproxy itself. So it wasn't reloaded until I did it manually.

@Marostegui you're right, I was confusing dbproxy with clouddb* proxies. Puppet only reloaded haproxy on clouddb-wikireplicas-proxy-2.clouddb-services.eqiad1.wikimedia.cloud, but not on dbproxy1018 because no Puppet change was made there.

My guess, then, is that when dbproxy1018 was rebooted, haproxy failed to connect to clouddb1018 at the moment it started after the reboot, and did not reconnect until the manual reload a day later.
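
For anyone wanting to check what haproxy itself reports for each backend server (the state discussed in the comments above), here is a minimal Python sketch that reads the CSV returned by the runtime API command `show stat` and lists servers marked DOWN. It is not taken from this task: the socket path `/run/haproxy/admin.sock` and the function names are assumptions for illustration, and the real path is whatever the `stats socket` line in the host's haproxy.cfg says.

```python
import csv
import io
import socket


def query_haproxy(command, socket_path="/run/haproxy/admin.sock"):
    """Send one command to the HAProxy runtime API and return the raw reply.

    socket_path is an assumed default; use the path configured by the
    `stats socket` directive on the host being checked.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HAProxy closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def servers_down(show_stat_csv):
    """Return (proxy, server, status) for every server row reported DOWN."""
    # The CSV header starts with "# "; strip it so the column names are clean.
    text = show_stat_csv.lstrip("# ")
    down = []
    for row in csv.DictReader(io.StringIO(text)):
        # FRONTEND and BACKEND rows are aggregates, not individual servers.
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        # Status can carry a check counter, e.g. "DOWN 1/2".
        if row["status"].startswith("DOWN"):
            down.append((row["pxname"], row["svname"], row["status"]))
    return down
```

On the proxy host this would be called as `servers_down(query_haproxy("show stat"))`, run as a user with read access to the socket. An empty list means haproxy considers no backend server down, which is a quick way to confirm whether a reload is still needed.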