Page MenuHomePhabricator

Enable verbose logging without installing the WikimediaDebug extension
Closed, ResolvedPublic

Description

WikimediaDebug allows the user to set verbose logging mode, in which case all logs are recorded regardless of the levels configured in $wmgMonologChannels. This is very useful but has some limitations:

  • Installing a browser extension is fine for developers, but might be too much effort for users who report a hard-to-reproduce bug and are asked to help diagnose it.
  • The browser extension is only supported for Firefox and Chrome.
  • Mobile Chrome does not allow installing extensions.
  • The extension routes all traffic to a debug host. This is useful for debugging, but not always good for diagnosing problems (which might be related to cross-DC requests etc).

We should have a lightweight way to trigger verbose logging, probably via a cookie.

Event Timeline

Given potential for abuse, and for skewing server latency and perf metrics (due to log overhead), I think it might be a feature to limit this to mwdebug hosts. However, that isn't mutually exclusive with the goal to have this work with (only) a cookie.

The routing logic for mwdebug is a Lua script at the CDN layer (ATS), and we could read cookies from there as well (we do this already for multi-dc, so you can review that as example of how to do something similar). E.g. a certain cookie could serve as alternative way to enable a (subset) of WikimediaDebug features.

One thing I would recommend is to make this cookie really hard to set in a way that would apply for longer periods of time, for the same reasons we automatically stop WikimediaDebug after 15min (namely, to protect user privacy, and to avoid creating noise for devs doing other work on the same server). For example, it should not be possible to lazlyy set this cookie by hand via $.cookie and then have it stay for 24 hours.

We could document a copy-pastable 3-line JS snippet that encodes some kind of timestamp in the cookie name, so that the routing logic can automatically ignore ones that within ± 15min from the current time (or 16min, if you go for comparing without the 3 digits from epoch).

The main thing I want to avoid is some kind of snippet that if executed results in a cookie that remains applicable for a long duration of time. For the mobile use case, I imagine people will need a gadget or user script of some sort, so we could provide one for them that adds a button, and when clicked runs those 3 lines.

I was thinking of creating a special page where users can set/unset the cookie (maybe it can be shoehorned into WikimediaEvents). I agree proliferation is a potential concern; making the cookie name dynamic is one way to handle it, or we could require the expiry to be part of the cookie value and MediaWiki could unset the cookie if it's expired or is too long-lived.

Cookies can only be set for the registrable domain, so for testing cross-wiki features users would have to set the cookie separately on each wiki. That's annoying, but there is not much we can do about it.

[…] I agree proliferation is a potential concern; making the cookie name dynamic is one way to handle it, or we could require the expiry to be part of the cookie value and MediaWiki could unset the cookie if it's expired or is too long-lived.

That sounds good to me also.

Change 1000306 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/WikimediaEvents@master] Add X-Wikimedia-Debug special page

https://gerrit.wikimedia.org/r/1000306

Change 1000307 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Support cookies in XWikimediaDebug

https://gerrit.wikimedia.org/r/1000307

Change 1000307 merged by jenkins-bot:

[operations/mediawiki-config@master] Support cookies in XWikimediaDebug

https://gerrit.wikimedia.org/r/1000307

Mentioned in SAL (#wikimedia-operations) [2024-03-14T21:34:21Z] <krinkle@deploy2002> Synchronized src/XWikimediaDebug.php: Support cookies in XWikimediaDebug, I5e33e90fd, T350094 (duration: 12m 08s)

Change 1000306 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/WikimediaEvents@master] Add X-Wikimedia-Debug special page

https://gerrit.wikimedia.org/r/1000306

Current draft:

LandingActive
Screenshot 2024-03-14 at 16.59.45.png (1×2 px, 340 KB)
Screenshot 2024-03-14 at 16.59.47.png (1×2 px, 319 KB)

Proposed improvement:

LandingActive
Screenshot 2024-03-14 at 16.59.53.png (1×2 px, 346 KB)
Screenshot 2024-03-14 at 16.59.56.png (1×2 px, 324 KB)

Change 1000306 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Add X-Wikimedia-Debug special page

https://gerrit.wikimedia.org/r/1000306

Krinkle triaged this task as Medium priority.Apr 15 2024, 3:27 PM

Change #1024932 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Profiler: Handle X-Wikimedia-Debug cookie

https://gerrit.wikimedia.org/r/1024932

Change #1024932 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Profiler: Handle X-Wikimedia-Debug cookie

https://gerrit.wikimedia.org/r/1024932

@Tgr This patch is making me wonder... Would the cookie actually exclusively reach appservers without the header? If Varnish, ATS, and the debug director need to all recognise this, that seems more risky and fragile than is needed. It might be simpler to attach one to the other much earlier in the stack. My reasoning is not about reducing time and effort (since time saving measures can often introduce debt/bugs), but rather my reasoning is as a way to reduce risk and potential for bugs.

We don't want some part of the stack to, if it forgets to check both, mistake it for a cacheble object or reconsile the two in an incorrect way. Having Varnish copy the cookie to the header as soon as it sees it, keeps the contract and implementation for Varnish and ATS simpler (route this header, don't cache this header).

The special page would be unchanged. The wmf-config code would be reduced to checking the header only. The expiry option would work unchanged and work (needlessly) in both header and cookie.

Thoughts?

Yeah I have been pondering that. The disadvantage is that (AFAIK) Varnish is pretty bad at cookie handling, so parsing the Cookie header and decoding the cookie value would have to be done manually, and if it doesn't fully match how PHP does it at the other end, weird things could happen. But it would avoid a lot of complexity - the X-Wikimedia-Debug header is handled in Varnish, ATS, haproxy and envoy, some of those are also used on beta, altogether it's over a dozen different places in the code that would have to be updated.

The disadvantage is that (AFAIK) Varnish is pretty bad at cookie handling, so parsing the Cookie header and decoding the cookie value would have to be done manually, and if it doesn't fully match how PHP does it at the other end, weird things could happen. […]

What do you mean by "fully matching how PHP does it?". Do you foresee a need to parse the value itself? I think we can avoid doing that at the Varnish level.

W onlyh need to extract the value req.http.Cookie between X-Wikimedia-Debug= and the next ;. This is something we already do in VCL for several other cookies (https://codesearch.wmcloud.org/search/?q=%5BCc%5Dookie&files=vcl&excludeFiles=test&repos=operations%2Fpuppet). For XWD, could we get away with only decoding the percent-encoded ; character within the value? I don't think we need to support anything else in the intermediary layers. In PHP itself, if/when there is need for other values, then that can be implemented only once there in PHP.
We would not need that at multiple levels, assuming that the intermediary levels are only interested in the header's existence and in the host option.

Change #1030590 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] debug: Enable Special:WikimediaDebug

https://gerrit.wikimedia.org/r/1030590

The current code uses WebResponse::setCookie() which will result in cookie values like backend%3Dmwdebug1001.eqiad.wmnet%3Blog%3Bexpire%3D1715464158. Varnish itself does not need to know the values of the various flags within the cookie, but it does need to decode them because ATS is expecting unencoded plain text in the X-Wikimedia-Debug header (the logic is here). I think we are fine decoding only ; and =, as long as we are careful not to make changes on the PHP side.

The other question is where in Varnish we need changes. There are two places where the header is read:

The call chain for the first is vcl_recv -> cluster_fe_recv and the second from vcl_deliver or vcl_synth -> deliver_synth_ -> analytics_deliver_post. AIUI vcl_recv is called towards the beginning of request handling and vcl_deliver / vcl_synth towards the end, so if we set the header in vcl_recv, we don't need any cookie handling in the other place. I am not sure about it though.

Change #1030591 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/puppet@production] varnish: Copy value of X-Wikimedia-Debug cookie to header

https://gerrit.wikimedia.org/r/1030591

Change #1030592 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/WikimediaEvents@master] WikimediaDebug: Add a note about having matching code in puppet

https://gerrit.wikimedia.org/r/1030592

Change #1024932 abandoned by Gergő Tisza:

[operations/mediawiki-config@master] Profiler: Handle X-Wikimedia-Debug cookie

Reason:

https://gerrit.wikimedia.org/r/1024932

Change #1030591 merged by JHathaway:

[operations/puppet@production] varnish: Copy value of X-Wikimedia-Debug cookie to header

https://gerrit.wikimedia.org/r/1030591

Change #1030592 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] WikimediaDebug: Add a note about having matching code in puppet

https://gerrit.wikimedia.org/r/1030592

Change #1030590 merged by jenkins-bot:

[operations/mediawiki-config@master] debug: Enable Special:WikimediaDebug

https://gerrit.wikimedia.org/r/1030590

Mentioned in SAL (#wikimedia-operations) [2024-07-23T20:12:24Z] <tgr@deploy1002> Started scap sync-world: Backport for [[gerrit:1030590|debug: Enable Special:WikimediaDebug (T350094)]]

Mentioned in SAL (#wikimedia-operations) [2024-07-23T20:14:56Z] <tgr@deploy1002> tgr: Backport for [[gerrit:1030590|debug: Enable Special:WikimediaDebug (T350094)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-07-23T20:21:53Z] <tgr@deploy1002> Finished scap: Backport for [[gerrit:1030590|debug: Enable Special:WikimediaDebug (T350094)]] (duration: 09m 28s)

Verified that the request routing and the setting of the X-Analytics header both work as expected.