
restbase.svc.eqiad.wmnet directs requests to staging if the origin is staging too
Closed, Resolved | Public

Description

restbase.svc.eqiad.wmnet is used for distributing the load between the restbase10* hosts. However, if requests for restbase.svc.eqiad.wmnet:7231 are issued from staging (xenon, cerium, praseodymium), LVS directs these requests back to staging instead of targeting the production cluster. Requests to the service address should reach production regardless of where they originate.

Event Timeline

The staging hosts have the LVS service IP (restbase.svc.eqiad.wmnet, 10.2.2.17) bound to their loopback interface, as every LVS backend needs to.

The consequence is that if an LVS real server (backend server) tries to talk to the LVS service, it is actually talking to itself, equivalent to 127.0.0.1.

I don't think they're currently puppetized for lvs::realserver, but it looks like the machines had such a configuration in the past, and removing it from puppet doesn't remove the effects from the host. Probably need to remove the wikimedia-lvs-realserver package from the host, and/or remove /etc/default/wikimedia-lvs-realserver, and/or manually remove the IP from the loopback?
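To confirm whether a host still carries the service IP on its loopback, something along these lines could be used. It is only a sketch: the helper name loopback_has_address and the sample text are made up for illustration (the sample is not captured from any staging host), and it assumes the input is the one-line output of `ip -o -4 addr show dev lo`.

```python
import ipaddress


def loopback_has_address(ip_addr_output: str, service_ip: str) -> bool:
    """Return True if service_ip is listed as an inet address in the
    given `ip -o -4 addr show dev lo` output (hypothetical helper)."""
    target = ipaddress.ip_address(service_ip)
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # One-line format: "<index>: <ifname> inet <addr>/<prefix> ..."
        if "inet" not in fields:
            continue
        idx = fields.index("inet")
        if idx + 1 >= len(fields):
            continue
        # ip_interface accepts "addr/prefix"; .ip drops the prefix length
        if ipaddress.ip_interface(fields[idx + 1]).ip == target:
            return True
    return False


# Illustrative input only, shortened to the fields the parser needs.
SAMPLE = "\n".join([
    "1: lo    inet 127.0.0.1/8 scope host lo",
    "1: lo    inet 10.2.2.17/32 scope global lo",
])

if __name__ == "__main__":
    print(loopback_has_address(SAMPLE, "10.2.2.17"))
```

On a real host the text would come from something like subprocess.run(["ip", "-o", "-4", "addr", "show", "dev", "lo"], capture_output=True, text=True).stdout. If the function returns True on a machine that is not supposed to be an LVS backend, requests from that machine to the service IP will never leave the host.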

Mentioned in SAL (#wikimedia-operations) [2017-11-01T15:00:58Z] <mobrovac> restbase: removing wikimedia-lvs-realserver from staging hosts T179494

mobrovac claimed this task.
mobrovac edited projects, added Services (done); removed Services (watching).

OK, after a round of apt-get remove --purge wikimedia-lvs-realserver && ip addr del 10.2.X.17/32 dev lo in both DCs, LVS no longer points to any of the staging hosts. Thanks @mark and @BBlack for the swift help in diagnosing the issue!