Page MenuHomePhabricator

Only serve debug HTTP headers when x-wikimedia-debug is present
Open, LowestPublic

Description

We currently serve a bunch of debug headers in all our responses, that are added or let through by Varnish and aren't necessary for page functionality.

While compressed, all these headers have a cost that could be avoided. I propose that we instead have the edge use a whitelist of acceptable headers. And if the x-wikimedia-debug header is present in the request, all response headers are let through.

I think it would be only a very small inconvenience for any developer trying to debug something to have to pass the x-wikimedia-debug header, either via the browser extension or the command line.

The edge cache can still cache custom headers, they would just be filtered out by the whitelist before being transmitted to the client.

Specifically, looking at a response from the text cache, I think the following could be removed:

backend-timing (could be passed via server-timing instead, which the client/NavTiming can collect)
server
via
x-analytics (possibly, it can't be read by the client, I assume it's only there for debugging purposes @analytics may be able to shed some light on that one)
x-cache
x-cache-status (already available in the client-readable server-timing)
x-client-ip
x-powered-by
x-varnish

And for the upload cache, those additional ones:
x-timestamp
x-trans-id

If any of these are considered to be critically needed for all requests, I would suggest moving them to the server-timing header(s), which unlike other custom headers, a growing number of clients can read: https://www.chromestatus.com/feature/5695708376072192

Details

Related Gerrit Patches:
operations/puppet : productioncache: stop sending X-Varnish
operations/puppet : productionATS: unset debug HTTP headers for normal requests
operations/puppet : productionHide debugging headers

Event Timeline

Gilles created this task.Nov 27 2018, 9:53 AM
Restricted Application added a project: Operations. · View Herald TranscriptNov 27 2018, 9:53 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Peter added a subscriber: Peter.Nov 27 2018, 10:27 AM

See also T194814, which this task could resolve.

x-analytics

As I understand this, this field mainly exists to transmit data between Varnish and varnishkafka consumers (via varnishlog's shared memory buffer). It's only a header because that seemed to be the simplest way to implement that need. See also T196558, where I proposed moving this value out of the headers in favour of an internal Varnish object field only visible to varnishlog/varnishkafka (something that T196558#4261915 shows we can do, but we haven't used previously at WMF).

If we instead strip the headers from Varnish frontend VCL, that might actually preferable as it keeps things easier to reason about, and has the benefit of still being able to access it in debug mode, which can be useful.

TheDJ added a subscriber: TheDJ.Nov 27 2018, 8:56 PM

what about ?debug=true ? We already vary on that right ? might as well vary which set of headers is let true...

fdans added a subscriber: fdans.Dec 3 2018, 5:29 PM

Analytics needs x-analytics in every request, not only in debugging ones but we don't need to include it in the response headers. We don't know if downstream that header is needed.

fdans moved this task from Incoming to Radar on the Analytics board.Dec 3 2018, 5:30 PM
jijiki triaged this task as Medium priority.Dec 4 2018, 10:05 PM
ema moved this task from Triage to Caching on the Traffic board.Dec 5 2018, 10:17 AM
Anomie added a subscriber: Anomie.Dec 10 2018, 5:16 PM

server

I note that with X-Wikimedia-Debug it seems you have to specify a backend, so this wouldn't be terribly useful there either.

The header can sometimes be useful in seeing whether an intermittent problem depends on the server hit.

a whitelist of acceptable headers

Note you'll have to audit the code for every header it can output to make your whitelist, and ideally also make it clear to developers how to add a header to this whitelist when something needs to output a new header.

The API, for example, uses several headers for various purposes that clients are probably depending on.

server

I note that with X-Wikimedia-Debug it seems you have to specify a backend, [..]

Yeah, it's currently only for connecting with mwdebug hosts. The VCL behaviour and mwdebug connection/configuration behaviour are currently controlled at once. Perhaps we should separate these.

For example, we could remove the legacy handling for X-Wikimedia-Debug: 1 (currently mapped to mwdebug1001), and instead utilise that as the way to trigger the VCL behaviour only, without changing Varnish's backend host handling (e.g. keep regular cache try and app-server routing).

Gilles claimed this task.Dec 10 2018, 9:07 PM
Imarlier moved this task from Inbox to Doing on the Performance-Team board.Dec 10 2018, 9:07 PM

@Anomie very good point, I think it will be very hard for someone to find out about such a whitelist. Things will work for them on Vagrant and possibly on Beta, leading to unexpected breakage upon production deployment when their new header is just missing.

I think it'll have to be a blacklist/filtering approach which will require keeping an eye on it. I think we can probably come up with some tests hitting production (eg. going to Grafana with strict alerts in place) to verify that the amount of expected headers on something like an article doesn't grow again after the cleanup.

I think the Server header is still useful in X-Wikimedia-Debug context because it's confirmation that your request did go to the debug server you asked for. Otherwise you have to trust that the right thing is happening, and just like anything the X-Wikimedia-Debug pipeline could break. With Varnish giving you extra headers using separate logic than the routing, you could be misled by the presence of extra headers that things are working as expected while maybe the routing broke and you actually hit a production web server.

I agree with Timo that it would be nice to be able to use the header to *only* get extra headers. Instead of changing legacy behaviour that might break things for people with older versions of the browser extension or scripts, maybe we can introduce X-Wikimedia-Debug: * or X-Wikimedia-Debug: production which would be backwards-compatible ?

what about ?debug=true ? We already vary on that right ? might as well vary which set of headers is let true...

I think it's more interesting to be able to toggle headers without varying, so that caching works as it normally would. Extra headers can still be cached by Varnish, and only served to the client when the header is present. And you can still combine both debug=true and the header if you like.

Analytics needs x-analytics in every request, not only in debugging ones but we don't need to include it in the response headers. We don't know if downstream that header is needed.

You need it to appear in the Varnish logs, but it doesn't need to be served to the actual client (user browser), right? Can you point me to the ingestion code, so I can make sure that they still appear the way you need them to? Thanks.

Analytics needs x-analytics in every request, not only in debugging ones but we don't need to include it in the response headers. We don't know if downstream that header is needed.

You need it to appear in the Varnish logs, but it doesn't need to be served to the actual client (user browser), right? Can you point me to the ingestion code, so I can make sure that they still appear the way you need them to? Thanks.

As far as we know, no client-side code is using X-Analytics to, for example, read it and set it. If anyone knows otherwise, please let us know.

So yes, we just need it to appear in the Varnish logs. We read it via varnishkafka, which is configured to read X-Analytics along with all the other data that ends up in HDFS here: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/manifests/cache/kafka/webrequest.pp$141

The path is roughly varnish logs -> varnishkafka -> kafka -> camus buckets it -> hdfs -> Hive table on top of the files in hdfs is updated to know about the new partition

After brainstorming this more, since Nginx TLS termination is going to remain for the foreseeable future, even after we move backend caches to ATS, it makes this effort simpler to blacklist specific debugging header at the Nginx level and have the debug "gate" there. This way the list of filtered headers is defined in one central location and is guaranteed not to mess with Varnish log data collection.

Gilles lowered the priority of this task from Medium to Low.Dec 19 2018, 3:03 PM

Plain nginx config has the ability to remove the headers, but it can't do so conditionally...

Gilles added a subscriber: BBlack.Dec 20 2018, 3:24 PM

@BBlack would you miss x-cache, x-cache-status and x-varnish if those were completely removed at the TLS termination level? Some of that information is now exposed as part of the Server-Timing header (which unlike others, can be parsed by JS).

Change 480977 had a related patch set uploaded (by Gilles; owner: Gilles):
[operations/puppet@production] Hide debugging header

https://gerrit.wikimedia.org/r/480977

I don't know off-hand if we can live without them all for manual debugging and such, or if nginx is the best place to remove them. There might be other ways to suppress them in Varnish without affecting analytics, but that needs some digging and testing. We can't universally strip them in tlsproxy's nginx.conf like the patch above, though, because that config is also re-used for various applayer TLS termination behind varnish as well, and thus would hide any of those headers that are being sent from the applayer up to varnish itself.

Is there an nginx "site" or config specific to varnish termination?

Could be a puppet variable too, to make the filtering block conditional.

BBlack added a comment.EditedDec 20 2018, 3:52 PM

localssl.erb would probably be more appropriate and is the site file, but it's a generic TLS reverse proxy setup with a few parameters, and there's not yet any "this is for Traffic termination" flag (classparam?) puppetized, but you could make one.

Really, varnish-frontend is the "right" place to solve this rather than in the generic TLS terminator. If we're just looking at the analytics use-case, you could strip all of them trivially in Varnish at vcl_deliver time except X-Analytics without affecting that use-case. We'd need to audit which of these headers might be watched (and how, e.g. mtail) by other tooling too though. I think even the ones we do need to output on the side (like X-Analytics), there's probably a way to suppress them in Varnish without killing our output, it just needs investigating.

As far as the life expectancy of all such things: varnish-frontend will be the last part of the existing stack to be replaced during the ATS transition, but TLS termination could switch to ATS fairly quickly after the backends are switched, at which point we'd have an ATS[TLS]->Varnish[front-cache]->ATS[back-cache] stack. The earliest that (ATS for TLS) could happen would be ~Q4 for cache_upload, and ~Q1-2 of next FY for cache_text.

Gilles added a comment.EditedDec 20 2018, 4:03 PM

I wasn't aware that the latest plan was to use ATS for TLS termination.

There might be a way to do this in Varnish, but for X-Analytics it's going to involve some semi-complex VCL and require changing the way this particular header get ingested into the analytics pipeline. Doing this at the termination layer is less prone to mistakes and is much simpler code.

I agree that ideally this would be done at the edge cache and not in the TLS termination service... if the edge cache wasn't Varnish. The fact that headers actually sent to the client and logging we consume are conflated is really an obstacle to doing this cleanly.

Backend-Timing is another header being ingested from Varnish logs, btw. This is a pattern that's likely to increase, not decrease, imho. And if it requires a bunch of custom VCL for each of them, it gets pretty ugly.

I'm unfamiliar with the complexity needed in VCL to make this work, but if at all feasible, I think we should follow the approach of VCL so that our traffic routing remains centralised in VCL. The Nginx layer is currently quite application agnostic and being able to ignore it mentally as such I think is important and beneficial to being able to reason about how things work.

Note that topic of X-Analytics has already been discussed previously. At T196558 and T194814#4262008. As I understand it we found that 1: Stripping the header at the edge with VCL code from varnish-frontend would not affect the varnishlog entries that Analytics use, and 2: It would be even better for the communication between VCL and Varnishlog to not use temporary headers but instead use VCL_Log directly.

Noting that a significant portion of the header value is not coming from MediaWiki or backends, it is actually computed within VCL itself.

Change 480977 abandoned by Gilles:
Hide debugging headers

https://gerrit.wikimedia.org/r/480977

Gilles lowered the priority of this task from Low to Lowest.May 27 2019, 5:05 AM

Change 583570 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] ATS: remove debug HTTP headers if X-Wikimedia-Debug is absent

https://gerrit.wikimedia.org/r/583570

ema added a subscriber: ema.Fri, Mar 27, 10:44 AM

@Gilles: I see that X-Varnish is used by the MultimediaViewer extension, though I don't really understand how/why. Is the header actually needed by MultimediaViewer?

No, that's some ancient performance logging that dates back to when Media Viewer was launched and we needed to determine which thumbnail sizes were the best (in terms of cache hits, etc.). Removing the header won't break MultimediaViewer.

Change 583942 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache: stop sending X-Varnish

https://gerrit.wikimedia.org/r/583942

Change 583570 merged by Ema:
[operations/puppet@production] ATS: unset debug HTTP headers for normal requests

https://gerrit.wikimedia.org/r/583570

Change 583942 merged by Ema:
[operations/puppet@production] cache: stop sending X-Varnish

https://gerrit.wikimedia.org/r/583942