
sessionstorage namespacing
Open, Medium, Public

Description

We are storing local sessions for each project, as well as central auth sessions, together in a single instance of Kask that we refer to as sessionstore. Each of these session types is distinct and should be stored in its own namespace. Namespacing does occur, but it is managed at the application level by encoding names into formatted keys. For example:

image.png (253×747 px, 32 KB)

If this namespacing were happening at the storage layer, then the database would be able to provide a breakdown by operation type (read, write, and delete), rates, throughput, and storage utilization, on a per namespace basis.

Additionally, an argument could be made for using a separate storage namespace for each wiki group (e.g. group0, group1, group2, etc.), or even on a per-wiki basis (e.g. enwiki, dewiki, commons, etc.). Both of these are in fact examples of namespaces, whether modeled in storage or not.

/sessions/mwsession/{session_id}  ...or
/sessions/group0/{session_id} ...or
/sessions/enwiki/{session_id}

/sessions/centralauth/{session_id}
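
As a minimal sketch of what the client side of this could look like, the helper below builds a path of the shape shown above from a namespace name and a session ID. The function name `session_path` and the validation rules are assumptions made for illustration; they are not taken from Kask or MediaWiki.

```python
from urllib.parse import quote


def session_path(namespace: str, session_id: str) -> str:
    """Build a /sessions/{namespace}/{session_id} path (hypothetical helper)."""
    # Assumption: a namespace is a single, non-empty path segment.
    if not namespace or "/" in namespace:
        raise ValueError(f"invalid namespace: {namespace!r}")
    if not session_id:
        raise ValueError("session_id must not be empty")
    # Percent-encode both segments so that neither can add extra path parts.
    return f"/sessions/{quote(namespace, safe='')}/{quote(session_id, safe='')}"


if __name__ == "__main__":
    print(session_path("centralauth", "abc123"))
    print(session_path("enwiki", "abc123"))
```

With the namespace carried in the path rather than inside the key, the storage layer can tell the session types apart without parsing keys.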

Observability concerns aside, there are also advantages to being able to operate each namespace independently. For example, each has its own workload and might benefit operationally from being separated (e.g. compaction, compression, caching, etc.). We would also have the ability to truncate an individual namespace without impacting the others, say if storage were corrupted during a partial rollout.


See also:

{T390514}

Event Timeline

Pre-Cassandra, local sessions had a one-hour expiry, and central sessions had a 24-hour expiry. Since Kask's expiry settings are per-namespace, we ended up with a 24-hour expiry for everything. That is nice for UX, I guess (longer sessions mean fewer session loss errors if e.g. you take more than an hour to edit an article), but the gains are minimal, so if we are looking for space reduction, giving local sessions a shorter expiry again seems like a good way to do it (the overwhelming majority of sessions are local ones, local session blobs are larger, and I'd expect the majority of sessions to expire and not get refreshed, so assuming fast enough garbage collection, that seems like a 90-95% space reduction).
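
A rough way to sanity-check that estimate, as a sketch under simplifying assumptions: if sessions are never refreshed and expired data is collected immediately, steady-state storage for a session type scales linearly with its TTL. The function name and the `local_share` input (the fraction of stored bytes that belong to local sessions) are hypothetical; no measured values are used here.

```python
def space_reduction(local_share: float,
                    old_ttl_hours: float = 24.0,
                    new_ttl_hours: float = 1.0) -> float:
    """Estimated fraction of total storage freed by shortening the local TTL.

    Assumptions (not measurements): sessions are never refreshed, garbage
    collection is immediate, and central session storage is unchanged.
    """
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    if old_ttl_hours <= 0 or new_ttl_hours < 0:
        raise ValueError("TTLs must be positive")
    # Local storage shrinks in proportion to the TTL; the rest is untouched.
    return local_share * (1.0 - new_ttl_hours / old_ttl_hours)


if __name__ == "__main__":
    # Illustrative input only: local sessions as 95% of stored bytes.
    print(f"{space_reduction(0.95):.1%}")
```

Under this model the upper bound is 23/24, or roughly 96%, reached only if local sessions account for all stored bytes. Refreshed sessions and slower garbage collection would both lower the real figure.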

I don't see an obvious benefit to separating the various local sessions, but no harm either; on the MediaWiki side, it seems easy enough to do.

Would renamespacing be transparent (i.e. the old and new namespace would still resolve to the same Cassandra key) or would there be data loss? In the latter case, do you need some kind of gradual rollout support? The way it works on the MediaWiki side is that there is a semi-permanent authentication cookie that's used to automatically create a new session on session loss (as long as the user clicked the "keep me logged in" checkbox during their last login), so for most users it should be barely perceivable, but for a while every new request to the appservers would create a new session. (So if the central sessions get migrated, a number of sessions equal to the number of active users will be created in a short amount of time; if it's the local sessions, then active users times the number of wikis they are active on.) Is there a risk of that overwhelming the service, especially since it's already having capacity issues?
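
The parenthetical above can be written out as a small calculation. This is only a sketch: the function name and the `wikis_per_user` mapping (active user to the number of wikis they are active on) are hypothetical inputs, and it assumes every active user has the "keep me logged in" cookie so that every lost session gets recreated.

```python
def recreated_sessions(wikis_per_user: dict[str, int],
                       migrate_central: bool = False,
                       migrate_local: bool = False) -> int:
    """Count sessions recreated shortly after a non-transparent migration."""
    total = 0
    if migrate_central:
        # One central session per active user.
        total += len(wikis_per_user)
    if migrate_local:
        # One local session per (user, wiki) pair the user is active on.
        total += sum(wikis_per_user.values())
    return total


if __name__ == "__main__":
    # Made-up example data: three users active on 1, 3 and 2 wikis.
    activity = {"user_a": 1, "user_b": 3, "user_c": 2}
    print(recreated_sessions(activity, migrate_central=True))
    print(recreated_sessions(activity, migrate_local=True))
```

The count says nothing about how quickly those writes arrive, which is what matters for the capacity question.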

Eevans added a parent task: Restricted Task. May 2 2025, 9:14 PM
Eevans triaged this task as Medium priority. Oct 31 2025, 6:01 PM

@Tgr at this point, are there any obstacles or objections to separating storage of central auth sessions? Using a separate store/namespace for them is perhaps even more interesting if it means we could set the TTLs accordingly.

No. It's not related to the WE 5.1.1 work (core sessions and CentralAuth sessions are completely different codebases) but AFAICS it should be doable with a straightforward configuration change.