Decide on anonymous session backend
Closed, Resolved (Public)

Description

Work with ops to decide the backend to use for anonymous sessions. (A separate Cassandra cluster, or Memcached? Maybe even T394076: Investigate storing anonymous sessions client-side if that seems viable, although that would probably be a bigger project.)

Depends on T402808: Deploy separate anonymous session backend to Wikimedia production, in log-only mode.

See T394075: Investigate using different stores for different kinds of sessions for some past discussion.

Event Timeline

@Eevans we are now very close to wrapping up the coding part of T400372: Separate storage backend for anonymous sessions. That will allow for separate Cassandra namespaces for anonymous and authenticated sessions (also e.g. per-wiki as proposed in T392170: sessionstorage namespacing if that's deemed useful), but also something more aggressive like using Cassandra for authenticated sessions but Memcached for anonymous sessions. (Using Memcached was proposed in T362335: Simplify MediaWiki session store at WMF but rejected because routine Memcached maintenance would then result in users getting logged out. With anonymous users only, that's not really a problem.)

What would be the best way to determine what store to use?

(cc @Krinkle @DAlangi_WMF)
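For readers less familiar with the setup, the split described above can be pictured roughly as follows. This is a minimal Python sketch, not MediaWiki or Kask code: `SessionStore`, `InMemoryStore`, `SessionRouter` and the `is_authenticated` flag are all hypothetical names made up for illustration, and the in-memory dict merely stands in for whatever real backend ends up being chosen.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


class SessionStore(Protocol):
    """Hypothetical minimal interface a session backend would expose."""

    def get(self, key: str) -> Optional[dict]: ...

    def set(self, key: str, value: dict) -> None: ...


@dataclass
class InMemoryStore:
    """Stand-in for a real backend (a Cassandra namespace, Memcached, ...)."""

    name: str
    data: dict = field(default_factory=dict)

    def get(self, key: str) -> Optional[dict]:
        return self.data.get(key)

    def set(self, key: str, value: dict) -> None:
        self.data[key] = value


@dataclass
class SessionRouter:
    """Sends each session to a backend chosen by session kind."""

    authenticated: SessionStore
    anonymous: SessionStore

    def store_for(self, is_authenticated: bool) -> SessionStore:
        return self.authenticated if is_authenticated else self.anonymous

    def save(self, session_id: str, data: dict, is_authenticated: bool) -> None:
        self.store_for(is_authenticated).set(session_id, data)

    def load(self, session_id: str, is_authenticated: bool) -> Optional[dict]:
        return self.store_for(is_authenticated).get(session_id)


# Two independent backends; swapping one out does not touch the other.
auth_store = InMemoryStore("authenticated")
anon_store = InMemoryStore("anonymous")
router = SessionRouter(authenticated=auth_store, anonymous=anon_store)

router.save("abc", {"user": "Example"}, is_authenticated=True)
router.save("xyz", {"csrf": "token"}, is_authenticated=False)
```

The point of the indirection is that the anonymous backend can be replaced (another namespace, another technology) without the authenticated path noticing.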

Thanks @Tgr, this is great news!

I don't have very strong feelings about using Memcached, so long as we are explicitly acknowledging the different guarantees (or lack thereof). That said, I'd suggest we keep it all under Kask/sessionstore (a different instance), at least for now. When it comes to reasoning about how all of this works, we're adding a bit of complexity: folks will now need to understand that authenticated sessions go in one bucket and anon sessions in another, including any differences in config (TTL, for example; are we using a different TTL?). If we use Memcached for anon sessions, not only will people need to know that, but they may also have to factor in the different semantics. I'm not sure it's worth this. And if we were to later determine that it is worth it, it should be very easy to change.
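To make the "different config, different semantics" concern concrete, here is a rough Python sketch of two buckets with different TTLs, plus a cache-style wipe. Everything in it is an assumption for illustration: `TtlStore`, `flush()` and the TTL values are invented, and nothing here reflects the actual production settings or the Kask or Memcached APIs.

```python
import time
from typing import Callable, Optional


class TtlStore:
    """Toy key-value store with a per-instance TTL (hypothetical)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def set(self, key: str, value: dict) -> None:
        self._items[key] = (self._clock() + self.ttl_seconds, value)

    def get(self, key: str) -> Optional[dict]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries behave exactly like missing ones.
            del self._items[key]
            return None
        return value

    def flush(self) -> None:
        """Simulates a cache restart: every entry is lost at once."""
        self._items.clear()


# A controllable clock so the example does not need to sleep.
now = [0.0]


def fake_clock() -> float:
    return now[0]


# TTL values are made up for the example, not production settings.
auth = TtlStore(ttl_seconds=3600, clock=fake_clock)
anon = TtlStore(ttl_seconds=600, clock=fake_clock)

auth.set("a", {"user": "Example"})
anon.set("b", {"csrf": "token"})

now[0] = 900.0  # 15 minutes later: past the anon TTL, within the auth TTL
auth_after_15_min = auth.get("a")
anon_after_15_min = anon.get("b")

anon.set("c", {"csrf": "token2"})
anon.flush()  # e.g. routine maintenance on a cache-like backend
anon_after_flush = anon.get("c")
```

Either way, callers have to treat a missing anonymous session as normal and start a fresh one; what changes between backends is how often and why that happens.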

If no one disagrees with this, I can open a ticket and set things in motion.

@Eevans good point - if it's not too much effort to set up a new Kask instance, that definitely reduces the risk of surprises, compared to bringing in a whole new technology. So it makes a lot of sense to go with Kask, wait a month or so to make sure we didn't cause any bugs, and once we are sure the MediaWiki layer is working as intended, we can consider whether we want to use Memcached (or can just leave that option open in case scraper traffic becomes more problematic in the future).

Tgr claimed this task.

I think we can close this task, as we have decided on the backend for now, and we won't include investigating Memcached in WE 5.1.1. If we get back to it later, we can file a new task.

I think we can close this task, as we have decided on the backend for now

Ack! The backend for both anonymous sessions and authenticated sessions is in Cassandra. See wgSessionCacheType (ref) and wgAnonSessionCacheType (ref) in our production config and this comment: T402850#11246764.