
Investigate using different stores for different kinds of sessions
Closed, ResolvedPublic

Description

SessionManager provides an abstraction layer for the session frontend (ie. for the question of how do we read/write session-related information from/to the web request), but is tightly coupled with the implementation of the session backend (ie. how do we read/write session-related information from/to our servers). The session backend is a BagOStuff, so there is some amount of abstraction, but once you have determined what object cache backend is used for sessions, there's not much further flexibility.

There are a couple of reasons we might want to change that, mostly related to how anonymous sessions differ from authenticated sessions. Anonymous sessions are hard to plan for because they can be triggered by simple scraping; they are relatively low-risk (an attacker who is able to manipulate an anonymous session wouldn't be able to achieve much); and they are low-value (session loss is not very disruptive). Authenticated sessions are the opposite. So we might want to handle the two kinds of session differently.

This might be tricky because a session can go from anonymous to authenticated or vice versa, and we nevertheless want to preserve its contents (this is especially relevant during login). But we can probably handle that the way we handle session ID resets. We also need to make sure there is no situation where we don't know beforehand whether a session is authenticated, and it is instead determined by the data we read from the store (which obviously wouldn't work if authenticated-ness determines which store to use).

Event Timeline


Investigation summary

NOTE: Please correct me if you find holes in my write-up.

Current state of affairs

Today, two config settings determine which backends sessions are written to (MediaWiki core/local sessions, e.g., enwiki_session; CentralAuth/central sessions, e.g., centralauth_Session; etc.). These settings are $wgSessionCacheType[1] and $wgCentralAuthSessionCacheType[2], both of which point to Kask[3][4] in our production config. Inside MediaWiki, Kask is accessed via RESTBagOStuff, and it is backed by Apache Cassandra.

Main types of sessions we support

We currently support two types of sessions: anonymous sessions and authenticated sessions. You get an authenticated session by either logging in or creating an account, and there is a central version of this type of session that is used to create the local versions.

Anonymous sessions are obtained by logged-out users (readers) who want to log in or create an account, while authenticated sessions are obtained by logged-in users: mostly editors, or users performing actions that need elevated access.

@note: @Tgr made a visualization for getting insights into session writes [5]. Note that it samples 1 in 1000 requests that write a local session (central sessions are not included). The visualization shows various data points about sessions, and the "Top user" table shows that anonymous sessions have the highest count. Could this suggest that about half of all session writes are for anonymous sessions?

Backend for session storage at WMF

We use Apache Cassandra (via Kask + RESTBagOStuff) as our current backend for session storage[3].

Possible stores

Apache Cassandra: This is what we use today, but we're running into issues like {T390514} due to high-volume session writes with a high retention period (which is not needed for anonymous sessions). When Apache Cassandra is under high volume writes with relatively short periods of GC (data eviction on expiry), we could potentially run out of disk space, and if that happens, the service becomes unavailable (outage). This can lead to subsequent data loss and unreliability of the storage facility, thereby affecting users of our sites. Since anonymous sessions can be triggered just by visiting a login/account creation special page, this can cause high write volumes during scenarios like scraping, automated bot traffic, etc. Due to this, the store and how it's configured today are not suitable specifically for such traffic.

As a first approximation, we could address the issue above within Cassandra by using a different key group/namespace for anonymous sessions and configuring it for more frequent eviction. This way, even with high write rates, we evict more quickly to free up space and keep accepting writes, which reduces the probability of running out of disk space.

That said, I do not see how this would reduce high traffic spikes, since increasing eviction frequency also increases traffic to the Cassandra clusters (to delete expired items), even though it frees up capacity. Frequent deletes also mean more synchronization/replication across all clusters and possibly data centers. Could this increase the probability of replication lag?

A slight downside for Cassandra is that it stores data on disk (compared to Memcached), which is a bit slower, but I do not think that it's a concern for this use case.

Memcached (proposed): We support Memcached via Mcrouter today [6][7], and it could be used for anonymous sessions due to its high performance and in-memory object caching. Fundamentally, the way Memcached works is that when things are popular, they stay in memory, but when they're unpopular, they're evicted (LRU). Correlating this idea with anonymous sessions, which we don't persist permanently until the user is logged in, anonymous sessions can stay in Memcached. Under heavy load, Memcached will naturally start deleting data it thinks is not needed (unpopular), rather than a complete outage as in the case with Cassandra above.

The above paragraph can also mean anonymous sessions get evicted more frequently under high traffic, but the worst that can happen is an anonymous session loss and the user will need another click (resubmit the form) to trigger a login, and since anonymous sessions are not guaranteed to stay for a long time, losing it is not that terrible.
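To make the eviction behaviour described above concrete, here is a toy LRU cache. It is only a sketch: real Memcached uses slab allocation and a segmented LRU, and the class name TinyLruCache and the 'anon:' key prefix are hypothetical. The point it illustrates is that a full cache drops the least recently used session instead of refusing writes.

```php
<?php
// Toy LRU cache (hypothetical helper, not Memcached's actual algorithm).
final class TinyLruCache {
	/** @var array<string,mixed> Entries ordered from least to most recently used. */
	private array $items = [];

	public function __construct( private int $capacity ) {
	}

	/** Return the value and mark it as recently used, or null on a miss. */
	public function get( string $key ): mixed {
		if ( !array_key_exists( $key, $this->items ) ) {
			return null;
		}
		$value = $this->items[$key];
		// Re-insert so the key moves to the "most recently used" end.
		unset( $this->items[$key] );
		$this->items[$key] = $value;
		return $value;
	}

	/** Store a value; if the cache is over capacity, evict the oldest entry. */
	public function set( string $key, mixed $value ): void {
		unset( $this->items[$key] );
		$this->items[$key] = $value;
		if ( count( $this->items ) > $this->capacity ) {
			unset( $this->items[array_key_first( $this->items )] );
		}
	}
}

$cache = new TinyLruCache( 2 );
$cache->set( 'anon:a', [ 'token' => 'x' ] );
$cache->set( 'anon:b', [ 'token' => 'y' ] );
$cache->get( 'anon:a' );                     // 'anon:a' is now the most recently used
$cache->set( 'anon:c', [ 'token' => 'z' ] ); // cache is full, so 'anon:b' is evicted
```

In this sketch the write of 'anon:c' always succeeds; the cost is that the idle session 'anon:b' is gone, which matches the "one more click" failure mode described above.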

When it comes to speed, Memcached is highly performant compared to Cassandra since Memcached is in-memory while Cassandra is on disk (making it relatively slower). Also, Memcached is designed for high traffic, meaning it might be suitable for the issue we're currently facing.

We do not need to replicate anonymous sessions across DCs at all, so keeping them in Cassandra (which will eventually replicate to all servers and clusters) is not very suitable in this case. Memcached at WMF is configured to keep data within the current DC's app servers, making it suitable for data that doesn't need to be replicated across DCs. It's also cheap to construct a new anonymous session object and give it to the user, because Memcached will throw it away soon if it's no longer in use.

One downside with Memcached is that after restarting, we lose all data, which means all anonymous sessions get deleted, and this might impact lots of users (in the process of logging in). But there is a feature from Memcached 1.5.18 that advertises restarting Memcached with a warm cache [8]. Maybe we can use that feature if we have it available to minimize the impact?

MySQL DB (proposed): We currently use this for storing parsed content with a high retention period. Like Cassandra, we could potentially run into disk issues even though in terms of capacity, I think our MySQL capacity is larger [citation needed], but that doesn't mean we can just put things in there. We can configure MySQL (SQLBagOStuff / CACHE_DB) to evict anonymous sessions more quickly (maybe in 1 hour max or so). But since this is also a database (though relational instead of NoSQL like Cassandra above), we don't want to expose it to very high volume traffic either. Maybe it's not such a pretty idea to move from Cassandra to MySQL?

Like Cassandra, for anonymous sessions, we don't need full replication IMHO, and MySQL will perform multi-directional replication across data centers, and we don't want to keep anonymous sessions everywhere.


Proposal 1: Use Memcached for anonymous sessions and Cassandra for authenticated sessions, switching stores on the fly depending on which kind of session we're working with. The downside is that under high traffic the eviction rate can become really high, so sessions may be deleted very quickly, whether or not they are still in use.

Proposal 2: Use a different keyspace in Cassandra for anonymous sessions with a low TTL (like 1 hour). The benefit of this is guaranteed persistence for the set TTL (which is better than unexpected eviction by Memcached above), but session cookies are supposed to be active for the entire duration of the user's browsing (meaning after the TTL expires, the session is deleted even if the user is still actively browsing).
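The TTL trade-off in Proposal 2 can be sketched with a tiny in-memory store that takes the current time as a parameter. This is an illustration only: TtlStore and its method signatures are hypothetical, and whether a real Cassandra/Kask keyspace would refresh the TTL on each session write is an assumption that would need checking. The sketch shows that reads alone do not keep a key alive, while a rewrite does.

```php
<?php
// Hypothetical fixed-TTL store; the clock is passed in so the example is deterministic.
final class TtlStore {
	/** @var array<string,array{value:mixed,expires:int}> */
	private array $rows = [];

	/** Write a value that expires $ttl seconds after $now. */
	public function set( string $key, mixed $value, int $ttl, int $now ): void {
		$this->rows[$key] = [ 'value' => $value, 'expires' => $now + $ttl ];
	}

	/** Read a value; reading does NOT extend the expiry. */
	public function get( string $key, int $now ): mixed {
		$row = $this->rows[$key] ?? null;
		if ( $row === null || $row['expires'] <= $now ) {
			unset( $this->rows[$key] );
			return null;
		}
		return $row['value'];
	}
}

$ttl = 3600; // the "like 1 hour" TTL from Proposal 2
$store = new TtlStore();

// Session that is only read: gone once the TTL elapses, even though it was in use.
$store->set( 'anon:read-only', [ 'step' => 1 ], $ttl, 0 );
$halfway = $store->get( 'anon:read-only', 1800 );
$atExpiry = $store->get( 'anon:read-only', 3600 );

// Session that is rewritten at t=3000: the new write pushes expiry to t=6600.
$store->set( 'anon:rewritten', [ 'step' => 1 ], $ttl, 0 );
$store->set( 'anon:rewritten', [ 'step' => 2 ], $ttl, 3000 );
$afterRewrite = $store->get( 'anon:rewritten', 3600 );
```

If sessions are rewritten often enough, a short TTL would rarely bite an active user; if they are mostly read, the concern raised in Proposal 2 applies.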


Selecting stores on the fly (based on session type)

We could introduce a config setting, say wgAnonSessionCacheType, to configure which backend to use for anonymous sessions (see PoC patch). wgAnonSessionCacheType could be set to CACHE_MEMCACHED, while wgSessionCacheType keeps whatever value it has in production (for auth sessions).

If we finally agree to use a different store for anonymous and authenticated sessions, we will need to have a mechanism to decide, just by looking at the session, which store should be used. A few ways we could explore may be (looking at the session-related code):

  • making use of cookie headers from a web request (core cookie headers, central auth cookies, etc)
  • inspecting the user making a request and seeing if they're named or not
  • inspecting the session object if it's mutable or immutable (canChangeUser()), based on the session provider
  • maybe the request has a valid OAuth token in the request headers (?)
  • ...

The idea here is that we should have a way to predict which session we're currently using (anon / auth). The tricky part here is when a session changes from an anonymous session to an authenticated session, we need to change stores on the fly and potentially copy data to the new store.
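A minimal, self-contained sketch of the selection and hand-over described above follows. Everything in it is hypothetical and deliberately avoids MediaWiki classes: SimpleStore, ArrayStore, StoreSelector and the 'UserID' cookie name are stand-ins, and using a cookie as the signal is just one of the options listed. The important property is that the decision is made from the request, not from data read out of a store.

```php
<?php
// Hypothetical minimal store interface (stand-in for a real session store).
interface SimpleStore {
	public function get( string $key ): ?array;
	public function set( string $key, array $data ): void;
	public function delete( string $key ): void;
}

// In-memory implementation used only for this illustration.
final class ArrayStore implements SimpleStore {
	private array $data = [];

	public function get( string $key ): ?array {
		return $this->data[$key] ?? null;
	}

	public function set( string $key, array $data ): void {
		$this->data[$key] = $data;
	}

	public function delete( string $key ): void {
		unset( $this->data[$key] );
	}
}

final class StoreSelector {
	public function __construct(
		private SimpleStore $anonStore,
		private SimpleStore $authStore
	) {
	}

	/** Decide from the request alone; 'UserID' is an assumed cookie name. */
	public function isAuthenticated( array $cookies ): bool {
		return isset( $cookies['UserID'] );
	}

	public function select( array $cookies ): SimpleStore {
		return $this->isAuthenticated( $cookies ) ? $this->authStore : $this->anonStore;
	}

	/** On login, copy the session data to the auth store and drop the anon copy. */
	public function promote( string $sessionId ): void {
		$data = $this->anonStore->get( $sessionId );
		if ( $data !== null ) {
			$this->authStore->set( $sessionId, $data );
			$this->anonStore->delete( $sessionId );
		}
	}
}

$anon = new ArrayStore();
$auth = new ArrayStore();
$selector = new StoreSelector( $anon, $auth );

// Logged-out request: the session is written to the anonymous store.
$selector->select( [] )->set( 'sess1', [ 'returnTo' => 'Main_Page' ] );
// The user logs in: move the existing data so nothing is lost.
$selector->promote( 'sess1' );
```

If the anonymous copy has already been evicted when promote() runs, the sketch simply does nothing, which corresponds to the user having to resubmit the form.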

PoC: Crude proof of concept patch for store selection

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1162377

LocalSettings.php

$wgAnonSessionCacheType = CACHE_MEMCACHED;
$wgSessionCacheType = /* Maybe */ CACHE_DB;

InitialiseSettings.php

...
'wgSessionCacheType' => [
	'default' => 'kask-session',
],
'wgAnonSessionCacheType' => [
	'default' => 'mcrouter',
],

SessionCache.php

class SessionCache {

	// class properties.
	// ...

	public function __construct( ... ) {
		// Inject various dependencies.
	}

	/**
 	 * Get the store based on the type of session
 	 */
	public function getStore(): BagOStuff {
		// detect store based on various axioms
	}

	private function checkAuthCookies( WebRequest $request ): bool {
		// check if authentication cookies are set.
	}

	private function canChangeUser( SessionBackend $backend ): bool {
		// check if this session object is mutable or not
	}
}

Migration

The idea proposed above would not need any migration path for authenticated sessions; I think they should just work, since the correct store will be selected. But for anonymous sessions that are already in progress, users will briefly encounter a "session hijack" error when they submit the form after we deploy. They'll need to resubmit the form so that a new session is generated and cached in Memcached, which will then be used to authenticate them.

Questions
  • Which do we prefer: Apache Cassandra key group/namespace for anon sessions, or Memcached for anon sessions?
  • Does Memcached at WMF have the capacity to handle anonymous sessions?
  • If we choose Memcached for anon sessions, how do we deal with store selections made on the fly regarding the session in question?
  • I didn't think about Redis caching. Should we explore that?
  • Does the proposal above make sense? Or am I completely off track?

Refs

[1] https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/1a6044c1b44d7f1fb5639687d235178c342ebde1/wmf-config/InitialiseSettings.php#11117
[2] https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/1a6044c1b44d7f1fb5639687d235178c342ebde1/wmf-config/InitialiseSettings.php#11123
[3] https://www.mediawiki.org/wiki/Kask
[4] https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/1a6044c1b44d7f1fb5639687d235178c342ebde1/wmf-config/CommonSettings.php#713
[5] https://logstash.wikimedia.org/app/dashboards#/view/174d68c0-2c45-11f0-a1ab-6f85ab09a17d?_g=h@d01ba18&_a=h@a4d8ee7
[6] https://wikitech.wikimedia.org/wiki/Mw-mcrouter
[7] https://wikitech.wikimedia.org/wiki/Memcached_for_MediaWiki
[8] https://github.com/memcached/memcached/wiki/ReleaseNotes1518

In practice I think this task is going to overlap T399192: Create new session store abstraction to replace BagOStuff in SessionManager since the best way to investigate it is to actually try doing it.

Change #1162377 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] [PoC]session: Add support for caching anonymous sessions

https://gerrit.wikimedia.org/r/1162377

Change #1166479 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Introduce helper utility to help migration (step 1)

https://gerrit.wikimedia.org/r/1166479

Change #1166492 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Make SessionManager use DI (step 2)

https://gerrit.wikimedia.org/r/1166492

Notes per conversation with @Krinkle

Questions from Krinkle on IRC

<Krinkle> > How do we deal with frequent anonymous session loss if we use Memcached and it’s under high traffic?

<Krinkle> What would a reasonable Memcached cluster look like (i.e. how many servers, how much RAM per server) > How much space would we have > How much space does Cassandra reliably have before it hard-fails > How much space does Memcached have before it soft-fails > What does Memcached soft-fail mean, which ones will it be evicting and would this affect the most common 99% of login interactions? > What is "high traffic" > How many login attempts would 
<Krinkle> we need before one or the other starts failing > Is this above or below per-ip rate limits > Have we ever seen more concurrent login traffic from more IPs such that Memcached would need to fail?

<Krinkle> > Maybe using a [Cassandra] is a better idea? This guarantees persistence […] thereby circumventing rapid session loss with Memcached […]
<Krinkle> If I understand correctly, this assumes Cassandra can fit more sessions than Memcached, and assumes a full-Memcached would fail in a worse way for end-users than a full-Cassandra. While I see a theoretical case where this can happen, I believe in practice the opposite happens on both points.

Attempted responses

- What would a reasonable Memcached cluster look like (i.e., how many servers, how much RAM per server)?
      **I thought we would use what we have today. Does this question suggest we will need a dedicated cluster for this?
         “RE: Memcached cluster. Yes, I would expect a dedicated small cluster for this, because remember that the main Memcached cluster is intentionally full (imagine an open water above a small fixed size glass, with a small hole in the bottom, the glass is always full with stuff falling out the bottom and top; except where Memcached keeps hold of the most popular/important stuff and ignores everything else). You don't want sessions to compete with this. Memcached keeps separate slabs by value size, which means very small values (MicroStash, rate limit counters) we can almost guarantee will stay for at least 1-2 minutes, so long as we don't create too many keys, because that value-size slab is separate from the other slabs. Sessions are a bit larger and would compete with the general popularity stuff.”
   - How much space would we have
      ** I do not know.
         “RE: Space. You can find the size of memory allocated to Memcached in Grafana.” - Confirmed: 115GB per shard for 18 shards, making a total of 2,070 GB (about 2.07 TB of memory allocation).
   - How much space does Cassandra reliably have before it hard-fails
      ** This may be documented somewhere, but the best I can find is: https://grafana.wikimedia.org/d/kUVKEvaWz/cassandra-storage?orgId=1&from=now-30d&to=now&timezone=utc&var-datasource=000000014&var-cluster=sessionstore, so I’m thinking maybe 1TB for session store?
          “RE: Cassandra size. There is a difference between how much disk space something uses (i.e. administration, replication, history, redundancy, compression, etc) and how much logical space you have for values from the MW side. Cassandra keeps all previous versions of all values, a bit like MediaWiki edit history, although not forever, it keeps it for 7 days I believe. This means even if from the MW side sessions are changed, expired, and deleted; and you can't read the data anymore (as if the key is gone), it is still there on the server in the replication logs which powers consensus algos. Note that Cassandra is a decentralized store, unlike MySQL. This means you don't write to a master and then have it replicate. Instead, you write to a handful of random servers, and then the servers talk to each other to agree on what the eventual state should be, and they also take turns being the internal leader/master. This kind of setup means they all need a record of history so that they can catch up and replay changes in order to arrive at the same conclusion. This means the sum() of the values we need/want/can see at any given moment is almost 100x smaller than the size taken up on disk. About 100G for every 1G for mw sessions.”
   - How much space does Memcached have before it soft-fails?
      ** Per https://grafana.wikimedia.org/d/000000316/memcache?orgId=1&from=now-30d&to=now&timezone=utc&var-datasource=000000006&var-cluster=memcached&var-instance=$__all&viewPanel=panel-57, is it 115GB for each shard for all 18 shards, making it 115GB x 18 (per DC)?
   - What does Memcached soft-fail mean, which ones will it be evicting, and would this affect the most common 99% of login interactions?
      ** I think soft-fail (which I wouldn’t say fail) would refer to memcached experiencing high traffic on the shards, thereby evicting data more frequently to make room for more data. If the evicted data is still needed, it’ll be recomputed and the cache repopulated, causing more evictions, and the cycle continues. I don’t think it’ll affect the most common login cases, because if someone wants to log in, they’ll do so within a short time, I suppose.
   - What is "high traffic"
      ** A lot of HTTP traffic for logged-in accounts or temporary accounts, i.e., hit-pass.
   - How many login attempts would we need before one or the other starts failing?
      ** I’m not sure here, I can find out.
   - Is this above or below per-ip rate limits?
      ** I don’t understand this question.
   - Have we ever seen more concurrent login traffic from more IPs such that Memcached would need to fail?
      ** No, since we are not currently using Memcached as our session storage. Today, we use memcached only for login tokens (token store), and the failure we’ve seen is for a different use case (see: https://phabricator.wikimedia.org/T390784)
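The capacity figures from this exchange can be put side by side with a bit of arithmetic. The inputs are taken from the answers above and are rough: 115 GB × 18 shards is the main, shared Memcached cluster (which is intentionally full, so a dedicated session cluster would presumably be much smaller), 1 TB is only a guess at the Cassandra session store's disk size, and the 100:1 disk-to-logical ratio is an estimate. Variable names are made up for this sketch.

```php
<?php
// Memcached: figures confirmed from Grafana in the thread.
$memcachedShardGb = 115;
$memcachedShards = 18;
$memcachedTotalGb = $memcachedShardGb * $memcachedShards;

// Cassandra: disk size is a guess ("maybe 1TB"), ratio is an estimate
// ("about 100G for every 1G for mw sessions").
$cassandraDiskGb = 1000;
$diskGbPerLogicalGb = 100;
$cassandraLogicalGb = intdiv( $cassandraDiskGb, $diskGbPerLogicalGb );

printf( "Memcached (main cluster, shared): %d GB of memory\n", $memcachedTotalGb );
printf( "Cassandra session store: ~%d GB of logical session data\n", $cassandraLogicalGb );
```

Under these assumptions, roughly 10 GB of live session values would account for about 1 TB of Cassandra disk, which is the sense in which Cassandra does not necessarily fit more sessions than Memcached.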

Change #1166479 merged by jenkins-bot:

[mediawiki/core@master] session: Introduce helper utility to help migration (step 1)

https://gerrit.wikimedia.org/r/1166479

Change #1166492 merged by jenkins-bot:

[mediawiki/core@master] session: Make SessionManager use DI (step 2)

https://gerrit.wikimedia.org/r/1166492

Change #1178870 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] [POC]session: Separate anon sessions from authenticated sessions (p2)

https://gerrit.wikimedia.org/r/1178870

Change #1162377 merged by jenkins-bot:

[mediawiki/core@master] session: Introduce session store abstraction interface (p1)

https://gerrit.wikimedia.org/r/1162377

Change #1180834 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Follow-up session store protection (part 1.1)

https://gerrit.wikimedia.org/r/1180834

Change #1180834 merged by jenkins-bot:

[mediawiki/core@master] session: Follow-up session store protection (part 1.1)

https://gerrit.wikimedia.org/r/1180834

Change #1181269 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Delete "old session data" if validation fails (p1.2)

https://gerrit.wikimedia.org/r/1181269

This is also done. I think work will continue under other tasks in the task tree.

Change #1181269 merged by jenkins-bot:

[mediawiki/core@master] session: Delete "old session data" if validation fails (p1.2)

https://gerrit.wikimedia.org/r/1181269

Change #1178870 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] session: Segregate anonymous sessions from authenticated sessions (p2)

https://gerrit.wikimedia.org/r/1178870