Page MenuHomePhabricator

Propagate x-request-id header from MultiHttpClient in MediaWiki (e.g. SessionStore)
Closed, ResolvedPublic

Description

I observed this in production while trying to verify the functionality of T320559.

When MediaWiki is making its HTTP call to SessionStore, the only headers included in the request are Host, Accept, content-length, and user-agent. A sample request (captured from a host in production) is below

GET /sessions/v1/enwikibooks%3AMWSession[redacted] HTTP/1.1
Host: localhost:6006
Accept: */*
content-length: 0
user-agent: wikimedia/multi-http-client v1.0

Port 6006 is where Envoy service proxy is listening for sessionstore queries, per configuration.

I would expect x-request-id to be propagated, as it is definitely set on the incoming requests. I'd also expect tracestate/traceparent to be propagated if present

Event Timeline

CDanis updated the task description. (Show Details)
CDanis added a subscriber: pmiazga.
Krinkle renamed this task from MediaWiki not propagating x-request-id header to calls to SessionStore to Propagate x-request-id header from MultiHttpClient in MediaWiki (e.g. SessionStore).Aug 28 2023, 1:14 PM

From a quick check, sessions can use RESTBagOStuff to store/retrieve session data. When setting up a RESTBagOStuff we can specify a custom HTTP client, or it will use MultiHttpClient by default.

It looks like it's a third HTTP client. And from quick skim, it doesn't look like it supports any kind of telemetry atm. […] I also noticed that there is a plenty of services that use this MultiHttpClient that most likely are affected too:

  • EventBus
  • Etcd Configs
  • SwiftFileBackend
  • VirtualRestServiceClient

The class doc of MultiHttpClient (which is a generic PHP lib that should be independent of MW) says that HttpRequestFactory::createMultiClient should be injected. I think that might not be the case yet, although even if it were, it looks like that doesn't propagate it either.

Cause

This is caused by MultiHttpClient using raw curl_*() calls instead HttpRequestFactory to get the HTTP client. On Prod env sessions use kask-session object cache, which uses RESTBagOStuff. The RESTBagOStuff could use a specific HTTP client - but we do not specify one via config (https://github.com/wikimedia/operations-mediawiki-config/blob/811b3dad11ea1cb629e06f71d94ab4e7208ccdfd/wmf-config/CommonSettings.php#L596C18-L596C3), therefore it fallback to default MultiHttpClient library to send requests (https://github.com/wikimedia/mediawiki/blob/f071c22a9a3e7e399dcf3256c96a952f15291a69/includes/libs/objectcache/RESTBagOStuff.php#L149). I also noticed it can support only MultiHttpClient and we cannot pass anything that extends MWHttpRequest which supports telemetry headers.

When curl extension is present, the MultiHttpClient will use curl to send requests (instead of Guzzle). That's why we don't see x-request-id nor tracestate/traceparent headers.

Possible Solutions

  1. Rewrite the ObjectCache to pass the HttpRequestFactory (from MediaWikiServices) to RESTBagOStuff to initialize the default HTTP client. From what I heard from @BPirkle, the curl implementation was a bit faster than our default HTTP Client (which is Guzzle atm). This solution may cause problems with performance. Also, it would require a little bit of rework in the RESTBagOStuff class to make it support the PSR HTTP Client. From what we see - the RESTBagOStuff never uses MultiCalls, it always sends a single request to GET/SET - therefore we can easily pass our own HTTP client instead of using MultiHttpClient.
  1. In CommonSettings.php object cache definition for sessions we could inject the Guzzle client or the http factory. This is most likely not an option as we would need to use MediaWIki services in config - to get the HTTPRequestFactory, plus most of the points from solution 1 still apply as the RESTBagOStuff only calls $this->client->run($req) - it doesn't know how to interact with Guzzle.
  1. Disable curl on production - when curl doesn't exist, the MultiHttpClient will fallback to MediaWikiServices::getInstance()->getHttpRequestFactory()->create() which will include telemetry headers. But most likely curl is required by other things, therefore most likely also not an option - especially because Guzzle might require curl to work.
  1. In ObjectCache, when we initialize the RESTBagOStuff we could inject the Telemetry class. Then in the RESTBagOStuff before it sends a request we could add telemetry headers. From what I see the MultiHttpClient will respect $req['headers'] array and add those to the request.
  1. We can inject Telemetry headers directly in the MultiHttpClient when settings curl_setopt($ch, CURLOPT_HTTPHEADER) but I rejected this solution immediately as MultiHttpClient should be a generic PHP lib.

I'm leaning towards solution #4 - as it's one of the cleanest and fastest. It doesn't require changing the MultiHTTPClient, and it additionally would fix passing telemetry headers in other places (EventBus, SwiftFileBackend, EtcdConfigs and VirtualRestServiceClient).
The second bet would be to go with #1, to simplify our code and stick to our own HTTP factory everywhere. This would also require us to fix other places that use RESTBagOStuff, or make some kind of adapter to allow RESTBagOStuff to work on both MWHttpClient and MultiHttpClient.

@Krinkle @daniel @DAlangi_WMF - Do you have any thoughts about it?

After a pairing session with @DAlangi_WMF we decided that the best way to tackle this :

  1. ObjectCache would pass Telemetry in the $params array when initializing the RESTBagOStuff, then RESTBagOStuff would add telemetry headers in all doSet, doGet, and doDelete methods. This change is small and well-isolated. It would also solve this problem.
  1. Make MultiHttpClient handle default headers. At this moment there is no possibility to pass the default set of headers. When we want to pass additional headers, we need to pass them when calling run() or runMulti().
  1. Make the HTTPRequestFactory pass default Telemetry headers when initializing MultiHttpClient()
  1. Refactor RESTBagOStuff to use the HTTPRequestFactory to initialize the HTTP client. This would require refactoring tests, as those pass client to inject the MultiHttpClient. After a quick search - looks like nothing else passes the client.
  1. Clean up the work done in 1. as it's not required any more (thanks to steps 2-4)
  1. Think and create tickets to make our HTTPRequestFactory and Request classes PSR compatible.

@CDanis do you have any timeline requirements for this bug? If it's not required to be solved immediately we could skip 1 and focus on the proper solution without doing a quick temporary fix to make it work in the meantime of bigger changes (fixing the HTTPFactory and MultiHttpClient).

In the meantime @DAlangi_WMF will research the possibility of using the Guzzle client in RESTBagOStuff as this class is not performing multiple requests - it always performs a single request.

Change 953689 had a related patch set uploaded (by Pmiazga; author: Pmiazga):

[mediawiki/core@master] DNM: Inject Telemetry into RESTBagOStuff

https://gerrit.wikimedia.org/r/953689

After a pairing session with @DAlangi_WMF we decided that the best way to tackle this :

  1. ObjectCache would pass Telemetry in the $params array when initializing the RESTBagOStuff, then RESTBagOStuff would add telemetry headers in all doSet, doGet, and doDelete methods. This change is small and well-isolated. It would also solve this problem.

This introduces a layering issue, as RESTBagOStuff is in the library layer (includes/libs) while Telemetry is currently in - whatever we call the layer it is in :) Would that be the behavior layer? Whatever we call it, RESTBagOStuff shouldn't know about Telemetry. Unless, that is, Telemetry is moved to the library layer. Is that a possibility? I didn't notice any problematic dependencies in Telemetry.

  1. Make MultiHttpClient handle default headers. At this moment there is no possibility to pass the default set of headers. When we want to pass additional headers, we need to pass them when calling run() or runMulti().

MultiHttpClient is also in the library layer, so we have to be careful here as well. But the suggestion is just to make it aware of the default set of headers, not to make it actually reference Telemetry, right?

It is worth mentioning that MultiHttpClient already breaks the rules. And worth mentioning that I authored the offending commit. FWIW, there is a comment for correcting that. We did make one refactoring attempt, which was reverted, but I have not really been back in that part of the code since. Also, if memory serves, this use of MultiHttpClient is executed very early in the MW startup process, which is (at least part of) why the change had to be reverted - it caused errors in that context.

None of this is a reason to not do the thing you suggested, I'm just adding some context and gotchas to watch out for, having stumbled over them myself in the past when working with this class.

Adding default headers means a few things will need to be refactored, and we'll need to be sensible about what happens if a header is included in both the constructor and the run/runMulti call. But that can all be done.

  1. Make the HTTPRequestFactory pass default Telemetry headers when initializing MultiHttpClient()

That part seems reasonable...

  1. Refactor RESTBagOStuff to use the HTTPRequestFactory to initialize the HTTP client. This would require refactoring tests, as those pass client to inject the MultiHttpClient. After a quick search - looks like nothing else passes the client.

...but this part has the layering issue.

  1. Clean up the work done in 1. as it's not required any more (thanks to steps 2-4)
  1. Think and create tickets to make our HTTPRequestFactory and Request classes PSR compatible.

@CDanis do you have any timeline requirements for this bug? If it's not required to be solved immediately we could skip 1 and focus on the proper solution without doing a quick temporary fix to make it work in the meantime of bigger changes (fixing the HTTPFactory and MultiHttpClient).

Unless CDanis says there's urgency, I vote for skipping the temporary fix, as I'm not convinced the temporary fix is as simple as we'd hoped.

In the meantime @DAlangi_WMF will research the possibility of using the Guzzle client in RESTBagOStuff as this class is not performing multiple requests - it always performs a single request.

I wonder if that's for the layering reason. We have class GuzzleHttpRequest, but that's in the includes/http directory, which RESTBagOStuff shouldn't reference. We could, of course, use the Guzzle client directly in RESTBagOStuff, but it is a bit annoying to not be able to use our own wrapper. Something to think about, I suppose.

Change 953729 had a related patch set uploaded (by Pmiazga; author: Pmiazga):

[mediawiki/core@master] WIP: Let RESTBagOStuff use HTTPFactory

https://gerrit.wikimedia.org/r/953729

Unless CDanis says there's urgency, I vote for skipping the temporary fix, as I'm not convinced the temporary fix is as simple as we'd hoped.

It's pretty straightforward: https://gerrit.wikimedia.org/r/953689 but I agree - I don't like it as it introduces the layering issue - the RESTBagOStuff shouldn't know anything about Telemetry/Tracing - it's the HTTPClient (whatever client we use) responsibility, not the RESTBagOStuff.

I came up with a small fix to handle default headers passed to MultiHttpClient (T345295), and then I'll work on next bit - use the Factory to inject those.

About the layering issues, I think, only if our HTTPRequestFactory implemented the PSR-17 HTTP Factory it could stay in the libs folder and it wouldn't break layers. The RESTBagOStuff would expect a well-defined PSR interface, therefore we wouldn't have to worry about RESTBagOStuff directly depending upon MediaWiki. I would take a PSR HTTPFactory to make a request. This way we would also decouple it from MultiHttpClient as users wouldn't have to worry about passing specific $client.

Edit: And one more comment - I wonder if is it worth keeping the hassle around MultiHttpClient as fully generic/isolated library. The lib is inside mediawiki/includes/libs folder - not a library loaded via composer - therefore to use it someone needs to get whole MediaWiki. It's not PSR compatible which in its current state makes it less tempting to use as more and more projects depend on PSR compatible libs.

Change 954058 had a related patch set uploaded (by Pmiazga; author: Pmiazga):

[mediawiki/core@master] http: Inject telemtry headers when initializing MultiHttpClient

https://gerrit.wikimedia.org/r/954058

We spoke about those changes today on Codemob and we agreed on:

  • I'll abandon https://gerrit.wikimedia.org/r/c/mediawiki/core/+/953689 as we agreed it's not RESTBagOStuff responsibility to handle Telelemtry. Its main responsibility is provide a transparent key-value store with http calls under the hood. It shouldn't worry about HTTP internals (like timeouts, telemetry) as it's HTTP client responsibility.

Long term todos/consideration:

  • Timeouts for RESTBagOStuff are hardcoded in the class itself. Most likely it would be great for those to be passed in via config (CommonSettings.php in mediawiki-operations repo)
  • Most likely Telemetry class should land in includes/libs/http folder as it's a standalone library that at current state doesn't depend upon MediaWiki.
pmiazga changed the task status from Open to In Progress.Aug 31 2023, 5:16 PM

Change 953689 abandoned by Pmiazga:

[mediawiki/core@master] DNM: Inject Telemetry into RESTBagOStuff

Reason:

Telemtry,Headers,HTTP internals shouldn't be RESTBagOStuff responsibility. It's main responsibility is to provide key-value store. The HTTP internals should be HTTP CLient responsibility.

https://gerrit.wikimedia.org/r/953689

Change 954058 abandoned by Pmiazga:

[mediawiki/core@master] http: Inject telemetry headers when initializing MultiHttpClient

Reason:

We decided to move Telemetry singleton to libs/http and pass the singleton instead of specific headers.

https://gerrit.wikimedia.org/r/954058

Change 953729 abandoned by Pmiazga:

[mediawiki/core@master] objectache: Let RESTBagOStuff use HTTPFactory

Reason:

We decided that ObjectCache will inject `Telemetry` as to RESTBagOStuff and let RESTBagOStuff keep initialising MultiHttpClient.

https://gerrit.wikimedia.org/r/953729

Today on codemob we decided to move Telemetry class to includes/libs folder and make it generic. But when doing that, I found that the Telemetry cannot be global singleton, as it uses the MediaWiki config: https://github.com/wikimedia/mediawiki/blob/5405fdccfcb7fec8c0fd9b9c9c2a73ee18d98127/includes/http/Telemetry.php#L60

Depends on global $wgAllowExternalReqID; we use different headers to provide the X-Request-Id, therefore this lib cannot go into includes/libs. We need this as a singleton, as some parts of code execution can override the requestId so we need a single source of truth that can store this.

Another option would be that all callers would have to always pass the config when interacting with Telemetry class but this makes API config more difficult, as it would always require the boolean. What we could have

Create an interface Wikimedia/Http/Telemetry in includes/libs/http/Telemetry -> an interface that would have getRequestId() and maybe getRequestHeaders() that would be used by HTTP clients.
includes/http/Telemetry -> current class, leave it as it is, bound to MediaWIki - but implement the WIkimedia/Http/Telemetry.

This way all HTTP Clients could take Wikimedia/Http/Telemetry interface which would be generic interface, and then ObjectCache/HttpRequestFactoru would inject the concrete class MediaWiki/Http/Telemetry that would implement the generic interface.

@Krinkle @DAlangi_WMF @BPirkle - what do you think?

Change 955823 had a related patch set uploaded (by Pmiazga; author: Pmiazga):

[mediawiki/core@master] http: MultiHttpClient supports RequestTelemetry

https://gerrit.wikimedia.org/r/955823

We went through a set of reviews and I'm expecting this work to be merged and this ticket to be closed this week.

pmiazga triaged this task as Medium priority.Sep 12 2023, 5:19 PM

Change 955823 merged by jenkins-bot:

[mediawiki/core@master] http: MultiHttpClient supports TelemetryHeadersInterface

https://gerrit.wikimedia.org/r/955823