
Deploy separate anonymous session backend to Wikimedia production, in log-only mode
Status: Closed, Resolved (Public)

Description

Set the anonymous and authenticated session backends to the same service, but use two separate BagOStuff representations for them. On every read and write, check that the handle being used matches the content being read or written.
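Here is a minimal Python sketch of what "log-only mode" means in practice. It is not MediaWiki code. The class and method names are hypothetical, and it assumes session data carries a numeric user ID where 0 means anonymous. There are two logical handles ("anon" and "auth") over one shared dict that stands in for the single real service. The handle never changes where data is written or read from; a mismatch between the handle and the data's content is only logged. For simplicity, the sketch uses the two Logstash messages quoted later in this task for both reads and writes.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger("session-sketch")

ANON_IN_AUTH = "Anonymous data should not be in the authenticated store"
AUTH_IN_ANON = "Authenticated data should not be in the anonymous store"


class LogOnlyMultiBackendStore:
    """Hypothetical sketch of log-only mode: two logical handles ("anon" and
    "auth") over ONE shared backend. The chosen handle never changes where
    data goes; a mismatch with the data's content is only logged."""

    def __init__(self) -> None:
        self._shared: Dict[str, Dict[str, Any]] = {}  # the single real service
        self.mismatches: List[str] = []

    @staticmethod
    def _is_authenticated(data: Dict[str, Any]) -> bool:
        # Assumption: session data carries a numeric "user_id"; 0 = anonymous.
        return data.get("user_id", 0) != 0

    def _check(self, handle: str, data: Dict[str, Any]) -> None:
        authenticated = self._is_authenticated(data)
        if handle == "anon" and authenticated:
            message = AUTH_IN_ANON
        elif handle == "auth" and not authenticated:
            message = ANON_IN_AUTH
        else:
            return  # handle matches the content, nothing to report
        self.mismatches.append(message)
        logger.warning("%s (handle=%s)", message, handle)

    def set(self, handle: str, key: str, data: Dict[str, Any]) -> None:
        self._check(handle, data)
        self._shared[key] = data  # same write regardless of handle

    def get(self, handle: str, key: str) -> Optional[Dict[str, Any]]:
        data = self._shared.get(key)
        if data is not None:
            self._check(handle, data)
        return data


store = LogOnlyMultiBackendStore()
store.set("anon", "s1", {"user_id": 0})   # consistent: nothing logged
store.set("anon", "s2", {"user_id": 42})  # mismatch logged, write still happens
print(store.get("auth", "s2"))            # {'user_id': 42}
print(store.mismatches)                   # ['Authenticated data should not be in the anonymous store']
```

Because both handles point at the same storage, a wrong routing decision only produces a log entry. This is what makes it safe to deploy before the backends are actually split.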

Depends on the patches in T394075: Investigate using different stores for different kinds of sessions, and on T399195: Update logging and monitoring for multiple session storage backends.

Details

Related Changes in Gerrit:
Repo                         Branch               Lines +/-
mediawiki/core               master               +32 -23
operations/mediawiki-config  master               +1 -3
mediawiki/core               wmf/1.45.0-wmf.22    +17 -18
mediawiki/core               wmf/1.45.0-wmf.21    +17 -18
mediawiki/core               master               +17 -18
mediawiki/core               wmf/1.45.0-wmf.20    +15 -8
mediawiki/core               wmf/1.45.0-wmf.21    +15 -8
mediawiki/core               wmf/1.45.0-wmf.21    +15 -0
mediawiki/core               wmf/1.45.0-wmf.20    +15 -0
mediawiki/core               master               +82 -0
mediawiki/core               master               +15 -8
mediawiki/core               master               +15 -0
operations/mediawiki-config  master               +1 -0
operations/mediawiki-config  master               +1 -1
mediawiki/core               wmf/1.45.0-wmf.20    +97 -38
mediawiki/core               wmf/1.45.0-wmf.19    +97 -38
operations/mediawiki-config  master               +8 -0
mediawiki/core               master               +97 -38
mediawiki/core               master               +683 -52
mediawiki/core               master               +361 -32


Event Timeline


Change #1178870 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Segregate anonymous sessions from authenticated sessions (p2)

https://gerrit.wikimedia.org/r/1178870

Change #1183132 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[operations/mediawiki-config@master] session: Enable MultiBackendSessionStore on `group0` wikis only

https://gerrit.wikimedia.org/r/1183132

Change #1178870 merged by jenkins-bot:

[mediawiki/core@master] session: Segregate anonymous sessions from authenticated sessions (p2)

https://gerrit.wikimedia.org/r/1178870

Change #1187513 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Follow-up for I10101c8b928a12 (3fde556f95b4b1ce43)

https://gerrit.wikimedia.org/r/1187513

Change #1187779 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[operations/mediawiki-config@master] session: Enable MultiBackendSessionStore on `group1` wikis

https://gerrit.wikimedia.org/r/1187779

Change #1187781 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[operations/mediawiki-config@master] session: Enable MultiBackendSessionStore on `group2` wikis

https://gerrit.wikimedia.org/r/1187781

Change #1187513 merged by jenkins-bot:

[mediawiki/core@master] session: Follow-up on I10101c8b928a12 (3fde556f95b4b1ce43)

https://gerrit.wikimedia.org/r/1187513

Change #1188733 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Improve logging and monitoring in SessionStore implementations

https://gerrit.wikimedia.org/r/1188733

Change #1188733 merged by jenkins-bot:

[mediawiki/core@master] session: Improve logging and monitoring in SessionStore implementations

https://gerrit.wikimedia.org/r/1188733

Change #1191351 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@wmf/1.45.0-wmf.20] session: Improve logging and monitoring in SessionStore implementations

https://gerrit.wikimedia.org/r/1191351

Change #1191360 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@wmf/1.45.0-wmf.19] session: Improve logging and monitoring in SessionStore implementations

https://gerrit.wikimedia.org/r/1191360

Change #1191378 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[operations/mediawiki-config@master] Enable multibackend session store on beta and testwiki

https://gerrit.wikimedia.org/r/1191378

Change #1191360 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.19] session: Improve logging and monitoring in SessionStore implementations

https://gerrit.wikimedia.org/r/1191360

Change #1191351 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.20] session: Improve logging and monitoring in SessionStore implementations

https://gerrit.wikimedia.org/r/1191351

Mentioned in SAL (#wikimedia-operations) [2025-09-25T13:27:03Z] <tgr@deploy1003> Started scap sync-world: Backport for [[gerrit:1191359|objectcache: Add a hit/miss flag to CachedBagOStuff]], [[gerrit:1191360|session: Improve logging and monitoring in SessionStore implementations (T399195 T402808)]], [[gerrit:1191361|hCaptcha: Fix mock for StatsFactory]], [[gerrit:1191362|NewcomerTasks: Use StatsFactory unit test helper]], [[gerrit:1191350|objectcache: Add a hit/miss flag to CachedB

Mentioned in SAL (#wikimedia-operations) [2025-09-25T13:33:33Z] <tgr@deploy1003> d3r1ck01, wmde-fisch, tgr: Backport for [[gerrit:1191359|objectcache: Add a hit/miss flag to CachedBagOStuff]], [[gerrit:1191360|session: Improve logging and monitoring in SessionStore implementations (T399195 T402808)]], [[gerrit:1191361|hCaptcha: Fix mock for StatsFactory]], [[gerrit:1191362|NewcomerTasks: Use StatsFactory unit test helper]], [[gerrit:1191350|objectcache: Add a hit/miss flag to Cache

Mentioned in SAL (#wikimedia-operations) [2025-09-25T13:43:12Z] <tgr@deploy1003> Finished scap sync-world: Backport for [[gerrit:1191359|objectcache: Add a hit/miss flag to CachedBagOStuff]], [[gerrit:1191360|session: Improve logging and monitoring in SessionStore implementations (T399195 T402808)]], [[gerrit:1191361|hCaptcha: Fix mock for StatsFactory]], [[gerrit:1191362|NewcomerTasks: Use StatsFactory unit test helper]], [[gerrit:1191350|objectcache: Add a hit/miss flag to Cached

Change #1191378 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable multibackend session store on beta and testwiki

https://gerrit.wikimedia.org/r/1191378

Mentioned in SAL (#wikimedia-operations) [2025-09-25T13:48:12Z] <tgr@deploy1003> Started scap sync-world: Backport for [[gerrit:1191378|Enable multibackend session store on beta and testwiki (T402808)]], [[gerrit:1191370|Pre-deploy Design Research participant recruitment survey on jawiki (T405577)]]

Mentioned in SAL (#wikimedia-operations) [2025-09-25T13:54:44Z] <tgr@deploy1003> tgr, d3r1ck01, dani: Backport for [[gerrit:1191378|Enable multibackend session store on beta and testwiki (T402808)]], [[gerrit:1191370|Pre-deploy Design Research participant recruitment survey on jawiki (T405577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-09-25T14:04:23Z] <tgr@deploy1003> Finished scap sync-world: Backport for [[gerrit:1191378|Enable multibackend session store on beta and testwiki (T402808)]], [[gerrit:1191370|Pre-deploy Design Research participant recruitment survey on jawiki (T405577)]] (duration: 16m 11s)

DAlangi_WMF changed the task status from Open to In Progress (Sep 26 2025, 10:53 AM).

Rough plan

Thursday, September 25th -> rollout to beta cluster and testwiki ✅
Monday, September 29th -> rollout to group0 wikis
Tuesday, September 30th -> rollout to group1 wikis
Wednesday, October 1st -> rollout to group2 wikis

After each day's rollout, we'll monitor to make sure everything works as expected. If it doesn't, we'll pause the rollout, fix the issues, and then either resume or, if necessary, roll back.

Change #1192051 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] tests: Expand tests for MultiBackendSessionStore

https://gerrit.wikimedia.org/r/1192051

Change #1183132 merged by jenkins-bot:

[operations/mediawiki-config@master] session: Enable MultiBackendSessionStore on `group0` wikis

https://gerrit.wikimedia.org/r/1183132

Mentioned in SAL (#wikimedia-operations) [2025-09-29T13:03:53Z] <lucaswerkmeister-wmde@deploy2002> Started scap sync-world: Backport for [[gerrit:1183132|session: Enable MultiBackendSessionStore on group0 wikis (T402808)]]

Mentioned in SAL (#wikimedia-operations) [2025-09-29T13:09:43Z] <lucaswerkmeister-wmde@deploy2002> d3r1ck01, lucaswerkmeister-wmde: Backport for [[gerrit:1183132|session: Enable MultiBackendSessionStore on group0 wikis (T402808)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-09-29T13:21:45Z] <lucaswerkmeister-wmde@deploy2002> Finished scap sync-world: Backport for [[gerrit:1183132|session: Enable MultiBackendSessionStore on group0 wikis (T402808)]] (duration: 17m 52s)

Change #1187779 merged by jenkins-bot:

[operations/mediawiki-config@master] session: Enable MultiBackendSessionStore on `group1` wikis

https://gerrit.wikimedia.org/r/1187779

Mentioned in SAL (#wikimedia-operations) [2025-09-30T13:38:41Z] <lucaswerkmeister-wmde@deploy2002> Started scap sync-world: Backport for [[gerrit:1187779|session: Enable MultiBackendSessionStore on group1 wikis (T402808)]]

Mentioned in SAL (#wikimedia-operations) [2025-09-30T13:45:07Z] <lucaswerkmeister-wmde@deploy2002> lucaswerkmeister-wmde, d3r1ck01: Backport for [[gerrit:1187779|session: Enable MultiBackendSessionStore on group1 wikis (T402808)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-09-30T13:53:21Z] <lucaswerkmeister-wmde@deploy2002> Finished scap sync-world: Backport for [[gerrit:1187779|session: Enable MultiBackendSessionStore on group1 wikis (T402808)]] (duration: 14m 40s)

Change #1192585 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Attempt to fetch authentication store first

https://gerrit.wikimedia.org/r/1192585

Change #1192876 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Handle `set()` edge-case with `false` as value to in-process cache

https://gerrit.wikimedia.org/r/1192876

Change #1192876 merged by jenkins-bot:

[mediawiki/core@master] session: Handle an edge-case in MultiBackendSessionStore::set()

https://gerrit.wikimedia.org/r/1192876

Change #1192884 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@wmf/1.45.0-wmf.20] session: Handle an edge-case in MultiBackendSessionStore::set()

https://gerrit.wikimedia.org/r/1192884

Change #1192885 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@wmf/1.45.0-wmf.21] session: Handle an edge-case in MultiBackendSessionStore::set()

https://gerrit.wikimedia.org/r/1192885

Change #1192585 merged by jenkins-bot:

[mediawiki/core@master] session: Lookup authenticated store first before anon store

https://gerrit.wikimedia.org/r/1192585

Change #1192884 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.20] session: Handle an edge-case in MultiBackendSessionStore::set()

https://gerrit.wikimedia.org/r/1192884

Change #1192885 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.21] session: Handle an edge-case in MultiBackendSessionStore::set()

https://gerrit.wikimedia.org/r/1192885

Mentioned in SAL (#wikimedia-operations) [2025-10-01T20:21:31Z] <derick@deploy2002> Started scap sync-world: Backport for [[gerrit:1192884|session: Handle an edge-case in MultiBackendSessionStore::set() (T402808)]], [[gerrit:1192885|session: Handle an edge-case in MultiBackendSessionStore::set() (T402808)]]

Mentioned in SAL (#wikimedia-operations) [2025-10-01T20:27:48Z] <derick@deploy2002> derick, d3r1ck01: Backport for [[gerrit:1192884|session: Handle an edge-case in MultiBackendSessionStore::set() (T402808)]], [[gerrit:1192885|session: Handle an edge-case in MultiBackendSessionStore::set() (T402808)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-01T20:34:28Z] <derick@deploy2002> Finished scap sync-world: Backport for [[gerrit:1192884|session: Handle an edge-case in MultiBackendSessionStore::set() (T402808)]], [[gerrit:1192885|session: Handle an edge-case in MultiBackendSessionStore::set() (T402808)]] (duration: 12m 57s)

Change #1193067 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@wmf/1.45.0-wmf.21] session: Lookup authenticated store first before anon store

https://gerrit.wikimedia.org/r/1193067

Change #1193069 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@wmf/1.45.0-wmf.20] session: Lookup authenticated store first before anon store

https://gerrit.wikimedia.org/r/1193069

For the record, here is a written summary of the IRC discussions:

  1. The group1 rollout caused a serious increase in Cassandra reads and in Logstash entries of the form 'Duplicate get(): "{key}" fetched {count} times', and had to be rolled back temporarily. This is because SessionManager::generateSessionId() warms the cache (it marks the newly generated ID as having no data, to avoid a lookup). However, MultiBackendSessionStore checks both stores when it is unsure which one to use, so generating a session ID still resulted in a real (uncached) store lookup. generateSessionId() is called a lot, even when the user doesn't actually have a session, so session store load went way up even at the group1 stage.

Fixed by rMWc44e3ed7499c: session: Handle an edge-case in MultiBackendSessionStore::set().
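To make the failure mode above concrete, here is a rough Python sketch. It is an illustration only, not the code from rMWc44e3ed7499c, and it rests on hypothetical assumptions:

- Each of the two store handles keeps its own in-process cache.
- Cache warming stores the value false to mean "known to have no data".
- Before the fix, warming reached only one handle. A later read that consults both handles therefore still fell through to a real store on the other one.

All class and function names (CachedHandle, MultiBackendSketch, set_false, get_unsure, real_reads) are made up.

```python
from typing import Any, Dict

MISSING = object()  # distinguishes "not cached" from a cached False


class CountingBackend:
    """Stands in for a real remote store; counts how often it is read."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.reads = 0

    def get(self, key: str) -> Any:
        self.reads += 1
        return self.data.get(key, False)  # False = no data, BagOStuff-style


class CachedHandle:
    """One store handle with its own in-process cache (hypothetical)."""

    def __init__(self) -> None:
        self.backend = CountingBackend()
        self.cache: Dict[str, Any] = {}

    def warm(self, key: str) -> None:
        self.cache[key] = False  # "known to have no data": skip the real lookup

    def get(self, key: str) -> Any:
        cached = self.cache.get(key, MISSING)
        if cached is not MISSING:
            return cached
        value = self.backend.get(key)
        self.cache[key] = value
        return value


class MultiBackendSketch:
    def __init__(self, warm_both: bool) -> None:
        self.anon = CachedHandle()
        self.auth = CachedHandle()
        self.warm_both = warm_both

    def set_false(self, key: str) -> None:
        # Cache warming for a freshly generated session ID. The value (False)
        # says nothing about anon vs. authenticated, so the handle to warm is
        # ambiguous; the "buggy" variant here warms only the anon handle.
        self.anon.warm(key)
        if self.warm_both:
            self.auth.warm(key)

    def get_unsure(self, key: str) -> Any:
        # When unsure which store applies, consult both handles.
        auth_value = self.auth.get(key)
        anon_value = self.anon.get(key)
        return auth_value if auth_value is not False else anon_value


def real_reads(warm_both: bool, n: int) -> int:
    store = MultiBackendSketch(warm_both)
    for i in range(n):
        key = f"new-session-{i}"
        store.set_false(key)   # like generateSessionId() warming the cache
        store.get_unsure(key)  # a later read of the same fresh ID
    return store.anon.backend.reads + store.auth.backend.reads


print(real_reads(warm_both=False, n=1000))  # 1000: every fresh ID hits a real store
print(real_reads(warm_both=True, n=1000))   # 0: warmed IDs never reach a store
```

In this model, one real read per generated ID is enough to multiply store load, because IDs are generated far more often than real sessions exist.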

  2. There's a smaller but still quite significant increase in duplicate gets.

This is probably because MultiBackendSessionStore checks both backends when in doubt, even when it already found data in the store it checked first. That should be fixed by rMWf5e029dea188: session: Lookup authenticated store first before anon store.
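A minimal Python sketch of why the lookup order matters for duplicate gets. It assumes a read that doesn't know which store applies, with dicts wrapped in a counter standing in for the two backends. The function names are hypothetical, and this is not the code from rMWf5e029dea188.

```python
from typing import Any, Callable, Dict, List, Optional


class CountingStore:
    """Stands in for one session backend; counts get() calls."""

    def __init__(self, data: Optional[Dict[str, Any]] = None) -> None:
        self.data = dict(data or {})
        self.gets = 0

    def get(self, key: str) -> Any:
        self.gets += 1
        return self.data.get(key)


def lookup_both(auth: CountingStore, anon: CountingStore, key: str) -> Any:
    """Pre-fix behaviour as described in this task: when in doubt, always
    consult both stores, even after a hit in the first one."""
    auth_value = auth.get(key)
    anon_value = anon.get(key)
    return auth_value if auth_value is not None else anon_value


def lookup_auth_first(auth: CountingStore, anon: CountingStore, key: str) -> Any:
    """Authenticated store first; fall back to the anon store only on a miss."""
    value = auth.get(key)
    if value is not None:
        return value
    return anon.get(key)


def total_gets(lookup: Callable[[CountingStore, CountingStore, str], Any],
               keys: List[str]) -> int:
    auth = CountingStore({"alice-session": {"user_id": 7}})
    anon = CountingStore({"anon-session": {"user_id": 0}})
    for key in keys:
        lookup(auth, anon, key)
    return auth.gets + anon.gets


keys = ["alice-session"] * 3 + ["anon-session"] * 2
print(total_gets(lookup_both, keys))        # 10
print(total_gets(lookup_auth_first, keys))  # 7
```

With the authenticated-first order, an authenticated session costs one get instead of two. An anonymous session still costs two when the caller is unsure, so the reduction depends on the share of authenticated reads.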

Change #1193067 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.21] session: Lookup authenticated store first before anon store

https://gerrit.wikimedia.org/r/1193067

Change #1193069 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.20] session: Lookup authenticated store first before anon store

https://gerrit.wikimedia.org/r/1193069

Mentioned in SAL (#wikimedia-operations) [2025-10-02T13:21:07Z] <lucaswerkmeister-wmde@deploy2002> Started scap sync-world: Backport for [[gerrit:1193067|session: Lookup authenticated store first before anon store (T402808)]], [[gerrit:1193069|session: Lookup authenticated store first before anon store (T402808)]]

Mentioned in SAL (#wikimedia-operations) [2025-10-02T13:27:14Z] <lucaswerkmeister-wmde@deploy2002> d3r1ck01, lucaswerkmeister-wmde: Backport for [[gerrit:1193067|session: Lookup authenticated store first before anon store (T402808)]], [[gerrit:1193069|session: Lookup authenticated store first before anon store (T402808)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-02T13:34:03Z] <lucaswerkmeister-wmde@deploy2002> Finished scap sync-world: Backport for [[gerrit:1193067|session: Lookup authenticated store first before anon store (T402808)]], [[gerrit:1193069|session: Lookup authenticated store first before anon store (T402808)]] (duration: 12m 56s)

There are 20K "Authenticated data should not be in the anonymous store" log entries from the last 7 days: https://logstash.wikimedia.org/goto/201c27467af11091c6ce70064ec2b682
This is the same error condition as T405633: Session data is authenticated, should not be an anonymous user (we use the authenticated store, but the user ID in the data is 0), except that it happens during get() rather than set(). So maybe it's the same issue? Maybe it happens when we read back those set()s? It happens about 10x less often, though.

We also have the opposite, "Anonymous data should not be in the authenticated store", although another order of magnitude less often: https://logstash.wikimedia.org/goto/eaf7ca64cd616f3ff220d029b62effec

Change #1194663 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Improve logging for MultiBackendSessionStore

https://gerrit.wikimedia.org/r/1194663

Change #1194663 merged by jenkins-bot:

[mediawiki/core@master] session: Improve logging for MultiBackendSessionStore

https://gerrit.wikimedia.org/r/1194663

Change #1194963 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@wmf/1.45.0-wmf.21] session: Improve logging for MultiBackendSessionStore

https://gerrit.wikimedia.org/r/1194963

Change #1194964 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@wmf/1.45.0-wmf.22] session: Improve logging for MultiBackendSessionStore

https://gerrit.wikimedia.org/r/1194964

Change #1194963 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.21] session: Improve logging for MultiBackendSessionStore

https://gerrit.wikimedia.org/r/1194963

Change #1194964 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.22] session: Improve logging for MultiBackendSessionStore

https://gerrit.wikimedia.org/r/1194964

Mentioned in SAL (#wikimedia-operations) [2025-10-09T16:10:27Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1194963|session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634)]], [[gerrit:1194964|session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634)]]

Mentioned in SAL (#wikimedia-operations) [2025-10-09T16:14:09Z] <tgr@deploy2002> tgr, d3r1ck01: Backport for [[gerrit:1194963|session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634)]], [[gerrit:1194964|session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-09T16:30:34Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1194963|session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634)]], [[gerrit:1194964|session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634)]] (duration: 20m 07s)

Change #1195029 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Avoid logging in session store if UserInfo is available

https://gerrit.wikimedia.org/r/1195029

Change #1187781 merged by jenkins-bot:

[operations/mediawiki-config@master] session: Enable MultiBackendSessionStore on `group2` wikis

https://gerrit.wikimedia.org/r/1187781

Mentioned in SAL (#wikimedia-operations) [2025-10-13T13:03:45Z] <derick@deploy2002> Started scap sync-world: Backport for [[gerrit:1187781|session: Enable MultiBackendSessionStore on group2 wikis (T402808)]]

Mentioned in SAL (#wikimedia-operations) [2025-10-13T13:08:03Z] <derick@deploy2002> derick, d3r1ck01: Backport for [[gerrit:1187781|session: Enable MultiBackendSessionStore on group2 wikis (T402808)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-13T13:15:23Z] <derick@deploy2002> Finished scap sync-world: Backport for [[gerrit:1187781|session: Enable MultiBackendSessionStore on group2 wikis (T402808)]] (duration: 11m 39s)

Deployed to group2 today. I'll file a task to monitor the logs for about a week. Resolving this now.

Change #1195029 abandoned by D3r1ck01:

[mediawiki/core@master] session: Avoid logging in session store if UserInfo is available

https://gerrit.wikimedia.org/r/1195029