
Some mw-api-int traffic is going cross-DC
Open, Needs Triage, Public

Description

With things configured as they are, for many services we end up seeing both 'sides' of the service mesh -- for instance https://trace.wikimedia.org/trace/985e9d1153fa00ca178442cd37b5876c

If you drill into the tags for the two inner GET http://commons.wikimedia... spans, you can see from the k8s.node.name values that the outer one is the 'local' listener for the original query, and the inner (and delayed!) one is on another host, in another DC:

[attached screenshot: image.png, 230 KB]

Event Timeline

From discussion on IRC:

  • MediaWiki likely needs to know to use the ro endpoint for internal self-GETs, but the rw/existing endpoint for self-POSTs (probably not presently possible; see the sketch after this list)
  • Envoy service mesh configuration needs support for overriding the ro endpoint to localhost
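
A minimal sketch of what that first bullet could look like in practice, assuming two hypothetical local proxy endpoints (the real listener names and ports would come from the mesh configuration; only localhost:6501 appears in our actual config, the second port is made up):

```php
<?php
// Sketch only, not actual MediaWiki code: pick a loopback proxy by
// HTTP method. Both endpoints below are hypothetical placeholders.
function selectLoopbackProxy( string $method ): string {
	$roProxy = 'http://localhost:6501'; // e.g. a listener routed to an -ro service
	$rwProxy = 'http://localhost:6502'; // e.g. a listener routed to the rw service

	// GET/HEAD can safely be served by the local (possibly secondary) DC;
	// anything else may write and should go to the primary DC.
	return in_array( $method, [ 'GET', 'HEAD' ], true ) ? $roProxy : $rwProxy;
}
```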

This ticket got me thinking about how we set up MW loopbacks originally, and why we did it this way.

Background:

In short:

  • Prior to 2021, MediaWiki used $wgLocalVirtualHosts to discover the host names that its own server is responsible for. Thus, in the rare case of two parts of MW talking to each other over HTTP, it would send internal requests for host names on this list to localhost:80, instead of actually attempting to connect to the canonical hostname.
  • In 2021, Kunal added $wgLocalHTTPProxy so that this target could be set to something else. We set it to localhost:6501, which is envoy:mwapi, which routes to api-rw.discovery.wmnet. (Both settings are illustrated after this list.)
  • In k8s, we map envoy:mwapi to mw-api-int, which, while not named "rw", is implicitly rw given the absence of an -ro suffix, and thus goes to the primary DC, akin to api-rw.discovery.wmnet in the past.
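
For illustration, here is roughly what those two settings look like in a LocalSettings.php-style config (the host names are examples, not the exact production values):

```php
<?php
// Pre-2021 mechanism: MW treats requests to these host names as requests
// to itself, and used to rewrite them to localhost:80.
$wgLocalVirtualHosts = [ 'commons.wikimedia.org', 'en.wikipedia.org' ];

// Since 2021: $wgLocalHTTPProxy overrides that rewrite target. We point
// it at the local envoy listener (envoy:mwapi), which routes to
// mw-api-int (k8s) / api-rw.discovery.wmnet (bare metal).
$wgLocalHTTPProxy = 'http://localhost:6501';
```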

I guess we picked that because:

  • 2021 was before multi-DC, so maybe we wanted to play it safe by some measure.
  • or, we knew of cases where MW would make writes and wanted to make sure those worked (I can't think of any).
  • or, we didn't know of any cases, but wanted to play it safe and, as a first step, not rule anything out. I doubt this, because at the time there was afaik no distinction between -ro and -rw on bare metal within the primary dc (certainly no enforcement?), and we didn't have secondary DC traffic yet.
  • or, appservers-rw was simply the default we used for most stuff and just used that out of habit.

In any event, it's 2024 now and we're multi-DC. One of the things we changed for multi-DC is that the secondary DC isn't sitting passive with locked-down configuration to prevent accidental writes. MediaWiki is configured to read from local replicas and use local services, but if a DB write does happen, it goes to the correct primary DB. This is allowed and enabled, and we have a trickle of post-send deferred updates on some GET requests that make such writes.

I'm making these assumptions, which I haven't recently verified:

  • mw-api-int-ro and -rw are identical within the primary DC.
  • there is no enforcement or technical barrier that prevents MW, when addressed as -ro, from making writes, other than that writes from the secondary DC would be slow and (if not post-send) get logged as violations in Logstash.
  • there are no known cases of MW using POST requests over loopback.
  • there are definitely no known cases of MW using a POST loopback during a GET request.

If the above are correct, I suggest changing the envoy:mwapi upstream from mw-api-int.discovery.wmnet to mw-api-int-ro.discovery.wmnet. In the rare case that POST loopbacks exist, and the even rarer case of POST-during-GET, these would still work fine.

I don't recommend changing it yet, though. Someone should verify whether either of these cases actually exists, whether they're used pre-send, whether the response is user-facing, whether the request in question is influenced by user input or otherwise likely to have high tail latency, and how common these requests are.

I expect the outcome to be either: nothing impactful, and fine to change; or bad enough to fix first, and then fine to change after that. In terms of telemetry, the DB monitoring code in MW would not have been able to see this, since on bare metal the HTTP layer basically laundered the HTTP verb: the parent GET has no DB writes under it in Logstash, and the child is a clean POST with DB-write allowance.

Older task: T347781: MWHttpRequest should not route read requests to the primary DC

There's also some related discussion in T342201#9169413 and following comments. As noted there, we could easily make MWHttpRequest select which proxy to use, and add a flag for the caller to declare GET requests as needing to go to the primary DC, if we have a use case for that.
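
That could look something like the following sketch, extending the earlier method-based selection. The option name and the second endpoint are hypothetical, and this is not the actual MWHttpRequest API:

```php
<?php
// Sketch: method-based proxy selection plus an opt-in flag for callers
// whose GETs need read-after-write consistency from the primary DC.
function selectProxyFor( string $method, array $options = [] ): string {
	$roProxy = 'http://localhost:6501'; // hypothetical: routed to mw-api-int-ro
	$rwProxy = 'http://localhost:6502'; // hypothetical: routed to mw-api-int

	// Hypothetical flag: caller declares this GET must reach the primary DC.
	if ( $options['forcePrimaryDc'] ?? false ) {
		return $rwProxy;
	}
	return in_array( $method, [ 'GET', 'HEAD' ], true ) ? $roProxy : $rwProxy;
}
```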

There's quite a bit of incorrect information in the wall of text above, but to actually keep it short:
We went with just pointing to the read-write API because there's no system within MediaWiki to split requests between writes and reads, and we didn't want to add ad-hoc logic to the service mesh just for that.

I personally don't think this is a big issue, and we can probably also measure how often this happens and what the impact of the additional 30 ms RTT is on average performance in the secondary datacenter.

I should add that wgLocalHTTPProxy and all those mechanisms are hacks. If we want to do such things the right way, we should have a proper configuration table of read-only and read-write paths for different domains, and then instruct application writers to use one or the other. I'd avoid replicating the hack we have at the traffic layer, because if one request gets erroneously sent to the wrong datacenter, then instead of paying 30 ms of penalty in total, we'll pay 30 ms per query to the MySQL master.
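
A sketch of what such a configuration table might look like; the structure, names, and values are purely hypothetical:

```php
<?php
// Hypothetical table of read-only and read-write paths per domain.
$internalEndpoints = [
	'commons.wikimedia.org' => [
		'ro' => 'http://localhost:6501', // local DC, replicas
		'rw' => 'http://localhost:6502', // primary DC
	],
];

// Application code picks 'ro' or 'rw' explicitly; no guessing from the verb.
function internalEndpoint( string $domain, string $mode ): string {
	global $internalEndpoints;
	if ( !isset( $internalEndpoints[$domain][$mode] ) ) {
		throw new InvalidArgumentException( "No '$mode' endpoint for $domain" );
	}
	return $internalEndpoints[$domain][$mode];
}
```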