T399200: Update existing cookie-based sessions to include JWT cookie will add JWT session cookies behind a feature flag. The flag is for third parties' benefit, and it'd be fine to deploy the changes with the train; but since we have the flag, we could use it to measure the performance impact of the change: we'll add a few hundred bytes to every request with a session.
Description
Details
| Status | Assigned | Task |
|---|---|---|
| Resolved | JTweed-WMF | T398815 WE5.1.2 Verifiable MediaWiki sessions |
| Resolved | Tgr | T399631 Deploy JWT cookies to production |
| Resolved | Tgr | T404889 Add anon/authenticated label to NavigationTiming Prometheus data |
| Resolved | Tgr | T406621 Session cookie JWTs of SUL and non-SUL wikis conflict |
| Resolved | Tgr | T409018 JWT cookie causing anonymous session writes |
Event Timeline
Change #1186592 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[operations/mediawiki-config@master] Add $wgJwtPrivateKey / $wgJwtPublicKey in the fake privatre repo
Change #1186593 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[operations/mediawiki-config@master] Enable JWT session cookies on testwiki and beta
Change #1186592 merged by jenkins-bot:
[operations/mediawiki-config@master] Add $wgJwtPrivateKey / $wgJwtPublicKey in the fake privatre repo
Mentioned in SAL (#wikimedia-operations) [2025-09-09T20:48:27Z] <tgr@deploy1003> Started scap sync-world: Backport for [[gerrit:1186592|Add $wgJwtPrivateKey / $wgJwtPublicKey in the fake privatre repo (T399631)]]
Mentioned in SAL (#wikimedia-operations) [2025-09-09T20:54:34Z] <tgr@deploy1003> tgr: Backport for [[gerrit:1186592|Add $wgJwtPrivateKey / $wgJwtPublicKey in the fake privatre repo (T399631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2025-09-09T21:04:43Z] <tgr@deploy1003> Finished scap sync-world: Backport for [[gerrit:1186592|Add $wgJwtPrivateKey / $wgJwtPublicKey in the fake privatre repo (T399631)]] (duration: 16m 16s)
Change #1186593 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable JWT session cookies on testwiki and beta
Mentioned in SAL (#wikimedia-operations) [2025-09-16T13:52:11Z] <tgr@deploy1003> Started scap sync-world: Backport for [[gerrit:1186593|Enable JWT session cookies on testwiki and beta (T399631)]]
Mentioned in SAL (#wikimedia-operations) [2025-09-16T13:57:40Z] <tgr@deploy1003> tgr: Backport for [[gerrit:1186593|Enable JWT session cookies on testwiki and beta (T399631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2025-09-16T14:09:15Z] <tgr@deploy1003> Finished scap sync-world: Backport for [[gerrit:1186593|Enable JWT session cookies on testwiki and beta (T399631)]] (duration: 17m 04s)
JWT cookies are around 900 bytes, which is a nontrivial amount of overhead. They are added to most requests (not upload.wikimedia.org images, but the top-level navigation, API queries, most static assets, and ResourceLoader assets). AIUI HTTP/2 header compression will deduplicate that (although the spec gives browser implementors a lot of freedom), but it's still added to the initial (top-level navigation) request. Maybe we can switch to a more compact algorithm for the JWT signature (currently we use 4096-bit RSA with SHA-256; EdDSA is considered more secure, cuts the JWT size down to around 300 bytes, and is supported by lcobucci/jwt, even if it is mildly annoying to use), but for now we should probably monitor the effect on users. For that, we'll need timing metrics filtered to logged-in users (since the overwhelming majority of requests don't have a JWT cookie). Filed T404889: Add anon/authenticated label to NavigationTiming Prometheus data about that.
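The ~300-byte EdDSA figure follows almost entirely from the signature size. A rough sketch of the arithmetic, assuming the cookie value is a compact-serialized JWS whose third segment is the unpadded base64url signature, that a 4096-bit RSA signature is 512 raw bytes, and that "EdDSA" here means Ed25519 (64-byte signatures). The 900-byte total is the approximate figure above, not a measured constant.

```python
import base64


def b64url_len(n_raw_bytes: int) -> int:
    """Length of the unpadded base64url encoding of n_raw_bytes bytes."""
    return len(base64.urlsafe_b64encode(bytes(n_raw_bytes)).rstrip(b"="))


# Raw signature sizes (assumption: RSA-SHA256 with a 4096-bit key vs. Ed25519).
RSA_4096_SIG_BYTES = 4096 // 8  # an RSA signature is as long as the modulus
ED25519_SIG_BYTES = 64

rsa_sig_len = b64url_len(RSA_4096_SIG_BYTES)
eddsa_sig_len = b64url_len(ED25519_SIG_BYTES)

APPROX_COOKIE_LEN = 900  # rough current size quoted in the comment above
# Header, payload and the two dots stay the same; only the signature changes.
rest_len = APPROX_COOKIE_LEN - rsa_sig_len
estimated_eddsa_cookie_len = rest_len + eddsa_sig_len

print(rsa_sig_len, eddsa_sig_len, estimated_eddsa_cookie_len)  # 683 86 303
```

Since only the signature segment shrinks (683 to 86 characters), the estimate lands close to the ~300 bytes mentioned above.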
Ops were interested in the time-to-first-byte impact of the change on more remote CDNs / users, so once the new label is in place, we could add a context filter to some of the NavigationTiming dashboards (or clone them if that would make them too cluttered) which show TTFB metrics and have some sort of location-based breakdown. Candidates: responseStart by CDN host, Navigation Timing by country, Navigation Timing by continent, Navigation Timing breakdown, RUM.
@Krinkle further recommended looking at the userJourneyLogin synthetic performance tests, which include some logged-in pageviews. They only run on enwiki, so they are only useful after full deployment, but unlike real-user metrics they are not sensitive to organic changes in traffic.
We should also keep an eye on the session writes dashboard (specifically, CentralAuth and core cookie-based sessions) and the authentication metrics just in case, although I don't expect changes there.
There have been six JWT validation errors in the 10 days since JWT cookies were enabled on beta:
- 2x JWT error: wrong subject with anon expected but a logged-in user in the JWT. Both happened on the login page. I think this is normal: the user got a valid JWT, but then the session got invalidated in some way that didn't involve a session persist/unpersist call, so the cookie was not updated (e.g. the user token was changed due to a logout on another device). The cookies will get cleared now, so there's no need to worry. (But we'll probably need to reduce the logging severity for this situation.)
- 1x JWT error: wrong subject with a logged-in user expected (the test user IOS-Eng-Test, in case that matters) but anon in the JWT. We are not supposed to create JWTs for anons, ever. I could imagine a user creation / autocreation edge case where we end up on a code path for logged-in users but the User object ends up being anonymous, but this test user has existed for a long time. So this is somewhat concerning.
- 2x JWT error: wrong user ID. In one case, a user and their bot; in the other, a temp user and a named user. So this probably happens when switching from one account to the other, and is an artifact of JWT cookies being set on the second-level domain but not enabled on some of the subdomains, so cookie consistency is not ensured. If that's the case, it will go away once JWT cookies are enabled everywhere.
- 1x JWT validation failed: The token violates some mandatory constraints, details: - Token signature mismatch. Happened on the debug host, so this was probably me messing around. Maybe we should log the JWT content when this happens.
There are tens of thousands of Soft-expired JWT cookie messages, coming in blocks, from a single IP with a bot UA. Probably the bot is not respecting cookie expiry? We might need to lower the log level for this, and again it would be good to log the JWT content (at least the subject) even when the JWT is invalid.
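Logging the subject of an invalid JWT only requires decoding the payload segment, not verifying the signature. A minimal sketch of that idea in Python (the real code is PHP on top of lcobucci/jwt; `unverified_subject` is a hypothetical helper), assuming compact JWS serialization (`header.payload.signature`) and a standard `sub` claim. The result is attacker-controlled and suitable for log context only.

```python
import base64
import json
from typing import Optional


def unverified_subject(token: str) -> Optional[str]:
    """Return the `sub` claim of a compact JWT WITHOUT checking its signature.

    Diagnostics only: anyone can forge these claims. Returns None if the
    token cannot be decoded structurally.
    """
    parts = token.split(".")
    if len(parts) != 3:  # compact JWS has exactly two dots
        return None
    payload_b64 = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    try:
        payload = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        # binascii.Error, UnicodeDecodeError and JSONDecodeError all
        # subclass ValueError.
        return None
    if not isinstance(payload, dict):
        return None
    sub = payload.get("sub")
    return sub if isinstance(sub, str) else None
```

Because nothing is verified, the extracted subject can only serve as a hint when reading logs; authorization decisions still have to go through full validation.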
Dashboard copies with logged-in filter (mw_context=authenticated_mainspace_view): Navigation Timing by country, Navigation Timing by continent, Navigation Timing breakdown, Real user monitoring.
Especially the RUM TTFB panel should be the thing to watch.
Change #1192613 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[operations/mediawiki-config@master] Enable JWT session cookies on group0
JWT-related "session" channel logs: https://logstash.wikimedia.org/goto/a989950fe9b2a7dbad66b54b63cbfbc8
Change #1192613 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable JWT session cookies on group0
Mentioned in SAL (#wikimedia-operations) [2025-09-30T20:32:19Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1192613|Enable JWT session cookies on group0 (T399631)]]
Mentioned in SAL (#wikimedia-operations) [2025-09-30T20:39:28Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1192613|Enable JWT session cookies on group0 (T399631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2025-09-30T20:47:46Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1192613|Enable JWT session cookies on group0 (T399631)]] (duration: 15m 27s)
Change #1192857 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[operations/mediawiki-config@master] Enable JWT session cookies on group1
Change #1192949 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/core@master] session: Move cookie JWT soft-expiry logging to sampled channel
After a day on group 0:
- 3x JWT error: wrong subject with anon expected but a logged-in user in the JWT
- 181x JWT error: wrong subject with a logged-in user expected but anon in the JWT
- 16x JWT error: wrong user ID. Still seems to be related to account switching.
The rest is unchanged.
Change #1192949 merged by jenkins-bot:
[mediawiki/core@master] session: Move cookie JWT soft-expiry logging to sampled channel
Change #1192963 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/core@master] session: Log subject for soft-expired JWT cookie
Change #1192963 merged by jenkins-bot:
[mediawiki/core@master] session: Log subject for soft-expired JWT cookie
Change #1192857 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable JWT session cookies on group1
Mentioned in SAL (#wikimedia-operations) [2025-10-02T13:50:58Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1192857|Enable JWT session cookies on group1 (T399631)]]
Mentioned in SAL (#wikimedia-operations) [2025-10-02T13:58:36Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1192857|Enable JWT session cookies on group1 (T399631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2025-10-02T14:08:38Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1192857|Enable JWT session cookies on group1 (T399631)]] (duration: 17m 41s)
Most of the soft-expired JWT cookies come from a bot called CeL/4.5.11 (about 20K per day, now that we are on group 1). The rest come from a bot pretending to be Firefox (~1K per day), go-mwclient (~200 per week) and the Android app (hundreds, maybe thousands; it uses many different UAs, so it's hard to aggregate).
A soft-expired JWT will be silently ignored, so this is fine for now.
JWT-related log messages after half a week on group 1 (excluding Soft-expired JWT cookie):
- 30K - wrong subject with user expected but anon found
These logs only include the user in its raw form (CentralAuth::<id>), which is not super helpful. We should probably look up the username even though the JWT is invalid.
Also we should log the provider as a separate field so it's filterable; that makes issues much easier to understand.
Also we should probably have separate normalized messages for the "anon expected, user found" and "user expected, anon found" cases as they are quite different in terms of what might have happened. And the message should make it more obvious whether the "expected" or the "actual" part is the one from the JWT.
I said earlier that we don't create JWTs for anons but actually we do. I think that was an oversight and should be fixed. Not necessarily a deploy blocker though.
All that said, I think this warning is a normal consequence of JWT and other cookies getting out of sync (especially while JWTs are only refreshed on some domains) and shouldn't be a blocker.
- 500 - wrong subject with anon expected but user found
Like above, needs human-readable usernames.
Like above, I don't think this is a blocker.
- 500 - wrong user ID
Like above, needs human-readable usernames. Per T399631#11218629, not a problem otherwise.
- 16 - JWT error: wrong provider
Happens on non-SUL wikipedia.org wikis. Not super sure what to do about this one. It will cause issues with users trying to use a SUL wiki and a same-domain non-SUL wiki at the same time.
Maybe we should log the security context for these errors?
I don't think this is fixable while maintaining SREs' preference of having a consistent cookie name. We'll need to prefix the cookie for non-SUL wikis.
This is a blocker for group 2 deployment.
- 10 - The token violates some mandatory constraints, details: - Token signature mismatch
- 1 - Error while decoding from Base64Url, invalid base64 characters detected
Someone manually messing around with the cookie (probably me).
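The logging improvements suggested for the subject/user mismatches (separate normalized messages per direction, wording that makes clear which side comes from the JWT, a filterable provider field, human-readable usernames) could look roughly like this. A sketch in Python with hypothetical names (`describe_subject_mismatch`, `lookup_username`, the context field names); it is not MediaWiki's actual logging code, where this would be a PSR-3 message plus context array.

```python
from typing import Callable, Optional


def describe_subject_mismatch(
    provider: str,
    jwt_user_id: Optional[int],      # None: the JWT is for an anonymous user
    session_user_id: Optional[int],  # None: the session expects an anonymous user
    lookup_username: Callable[[int], Optional[str]],
) -> tuple[str, dict]:
    """Return a constant (groupable) log message plus structured context."""
    if jwt_user_id is None and session_user_id is not None:
        message = "JWT error: JWT is anonymous but session is logged in"
    elif jwt_user_id is not None and session_user_id is None:
        message = "JWT error: JWT is logged in but session is anonymous"
    elif jwt_user_id != session_user_id:
        message = "JWT error: JWT and session belong to different users"
    else:
        raise ValueError("subjects match; nothing to log")

    def name(uid: Optional[int]) -> Optional[str]:
        return None if uid is None else lookup_username(uid)

    context = {
        "provider": provider,  # separate field so dashboards can filter on it
        "jwt_user": name(jwt_user_id),
        "session_user": name(session_user_id),
    }
    return message, context
```

Keeping the message text constant and moving the variable parts into context fields lets the log aggregator count each direction separately.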
This is a significant enough change that there should be a decision record that's easy to come back to later, rather than a bunch of comments in a deploy task, so I'm filing this as a separate subtask: T406621: Session cookie JWTs of SUL and non-SUL wikis conflict
Change #1194583 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[mediawiki/core@master] session: Do not set JWT cookies for anonymous users
Change #1194605 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[operations/mediawiki-config@master] Temporarily undeploy JWT session cookies
Change #1194622 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):
[operations/mediawiki-config@master] Deploy JWT session cookies to group2
Change #1194605 merged by jenkins-bot:
[operations/mediawiki-config@master] Temporarily undeploy JWT session cookies
Mentioned in SAL (#wikimedia-operations) [2025-10-08T13:56:11Z] <lucaswerkmeister-wmde@deploy2002> Started scap sync-world: Backport for [[gerrit:1194605|Temporarily undeploy JWT session cookies (T399631)]], [[gerrit:1194603|jwt: Use core cookie settings (T406621)]], [[gerrit:1194604|jwt: Use core cookie settings (T406621)]], [[gerrit:1194607|Force OATHManage to be on central domain (T401773)]], [[gerrit:1194150|Force OATHManage to be on central domain (T401773)]]
Mentioned in SAL (#wikimedia-operations) [2025-10-08T14:01:12Z] <lucaswerkmeister-wmde@deploy2002> d3r1ck01, lucaswerkmeister-wmde, reedy, tgr: Backport for [[gerrit:1194605|Temporarily undeploy JWT session cookies (T399631)]], [[gerrit:1194603|jwt: Use core cookie settings (T406621)]], [[gerrit:1194604|jwt: Use core cookie settings (T406621)]], [[gerrit:1194607|Force OATHManage to be on central domain (T401773)]], [[gerrit:1194150|Force OATHManage to be on central domain (T401773)
Mentioned in SAL (#wikimedia-operations) [2025-10-08T14:10:11Z] <lucaswerkmeister-wmde@deploy2002> Finished scap sync-world: Backport for [[gerrit:1194605|Temporarily undeploy JWT session cookies (T399631)]], [[gerrit:1194603|jwt: Use core cookie settings (T406621)]], [[gerrit:1194604|jwt: Use core cookie settings (T406621)]], [[gerrit:1194607|Force OATHManage to be on central domain (T401773)]], [[gerrit:1194150|Force OATHManage to be on central domain (T401773)]] (duration: 14m 0
Change #1194622 merged by jenkins-bot:
[operations/mediawiki-config@master] Deploy JWT session cookies to group2
Mentioned in SAL (#wikimedia-operations) [2025-10-08T20:21:17Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1194622|Deploy JWT session cookies to group2 (T399631)]]
Mentioned in SAL (#wikimedia-operations) [2025-10-08T20:26:43Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1194622|Deploy JWT session cookies to group2 (T399631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
Mentioned in SAL (#wikimedia-operations) [2025-10-08T20:35:11Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1194622|Deploy JWT session cookies to group2 (T399631)]] (duration: 13m 53s)
There was a slight drop in the higher percentiles of time-to-first-byte around the time of enabling JWT session cookies:
It doesn't make any sense for extra cookies to result in a smaller TTFB so that was probably just random noise. But I think it's fair to say there was no significant negative performance impact.
Change #1194583 merged by jenkins-bot:
[mediawiki/core@master] session: Do not set JWT cookies for anonymous users
Filed T407194: Consider using EdDSA rather than RSA for MediaWiki session tokens about the RSA vs. EdDSA question, but per the above, I don't think there's any pressure to look into it.
JWT related logs in the last 5 days:
- Soft-expired JWT cookie is gone (whichever bot generated it is no longer active, or maybe it eventually refreshed its cookies)
- wrong subject: 32,779
- The JWT string must have two dots: 8,122
- Error while decoding from Base64Url, invalid base64 characters detected: 7,627
- JWT error: wrong user ID: 1,892
- The token violates some mandatory constraints, details: - Token signature mismatch: 208
Two follow-ups left here:
- the minor logging improvements mentioned in the comments above
- updating the filtered sessions dashboard to exclude the log lines that are assumed to be normal
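The dashboard exclusion in the second follow-up amounts to a filter on normalized log messages. A sketch of that logic in Python, assuming the messages quoted in this task are the exact normalized strings; the real filter would live in the Logstash/OpenSearch dashboard query, and the set of "normal" messages below reflects the judgments in the comments above (soft expiry, wrong subject, wrong user ID), not a settled list.

```python
# Message prefixes the comments above judged to be expected noise
# (assumption: these match the normalized log messages exactly).
EXPECTED_PREFIXES = (
    "Soft-expired JWT cookie",
    "JWT error: wrong subject",
    "JWT error: wrong user ID",
)


def needs_attention(message: str) -> bool:
    """True if a JWT-related session log message is not one of the expected kinds."""
    return not message.startswith(EXPECTED_PREFIXES)
```

Everything else (wrong provider, signature mismatches, malformed tokens) stays visible on the filtered dashboard.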
Were these follow-ups done? Maybe they should be a separate task so we can close this?



