
MigrationBagOStuff
Closed, Resolved · Public

Description

For session handling we need some code that will manage a migration period, so that we can switch over to the new session storage service without losing current users' sessions.

A MigrationBagOStuff should:

  • Have attributes for the old store and the new store ($oldStore, $newStore)
  • Take params that initialize $oldStore and $newStore, preferably agnostic about the classes of the old and new stores
  • For read methods (like get()), should check the new store first, and fall back to the old store on a miss
  • For write methods (like set()), should write only to the new store, and maybe delete from the old store
  • For delete methods (like delete()), should delete from the old store and then the new store
  • For atomic read-and-write methods (like add() or incr()), I'm not sure. Probably unwrap the read-and-write, using locks.
  • For locking methods (lock(), unlock()), I'm not sure. Probably use locks for the new store, only.
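The read/write/delete rules above can be sketched as a small wrapper over two stores. This is a toy model in Python, not MediaWiki's PHP BagOStuff API: DictStore and MigrationStore are hypothetical names, dicts stand in for the real backends, and a process-local threading.Lock stands in for the new store's lock() when unwrapping add() into a read followed by a write.

```python
import threading
from typing import Any, Dict, Optional


class DictStore:
    """Hypothetical in-memory stand-in for a BagOStuff backend (not MediaWiki's API)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> bool:
        self._data[key] = value
        return True

    def delete(self, key: str) -> bool:
        self._data.pop(key, None)
        return True


class MigrationStore:
    """Sketch of the proposed MigrationBagOStuff semantics over two stores."""

    def __init__(self, old_store: DictStore, new_store: DictStore) -> None:
        self.old_store = old_store
        self.new_store = new_store
        # Assumption: a process-local lock stands in for the new store's lock().
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[Any]:
        # Read the new store first; fall back to the old store on a miss.
        value = self.new_store.get(key)
        if value is None:
            value = self.old_store.get(key)
        return value

    def set(self, key: str, value: Any) -> bool:
        # Write only to the new store, then drop the now-stale old copy.
        ok = self.new_store.set(key, value)
        self.old_store.delete(key)
        return ok

    def delete(self, key: str) -> bool:
        # Delete from the old store first, then the new store (both always run).
        old_ok = self.old_store.delete(key)
        new_ok = self.new_store.delete(key)
        return old_ok and new_ok

    def add(self, key: str, value: Any) -> bool:
        # Unwrapped read-and-write: check both stores, then set, under one lock.
        with self._lock:
            if self.get(key) is not None:
                return False
            return self.set(key, value)
```

Removing the old copy on set() means a later fallback read can never return a value older than the one just written, at the cost of an extra delete per write.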

Similar classes:

  • MultiWriteBagOStuff: similar, but without the read fallback
  • ReplicatedBagOStuff: similar, but writes to one store and reads from another
  • CachedBagOStuff: similar

Ideally we'd just inherit from one of these, or something similar.

Event Timeline

Restricted Application removed a project: Patch-For-Review. · May 7 2019, 4:48 PM

I think this should work for the migration period. If we use it for the session TTL duration (or, to be safe, 2x the TTL), we could then swap it out and use only the new storage.
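The TTL reasoning can be made concrete with a small check. A minimal sketch, assuming that once every server runs the migration config, every session write also reaches the new store, so any session present only in the old store was last written before that moment and expires within one TTL of it. The function name and parameters are hypothetical, not MediaWiki configuration.

```python
from datetime import datetime, timedelta


def old_store_can_be_removed(migration_start: datetime, session_ttl: timedelta,
                             now: datetime, safety_factor: int = 2) -> bool:
    """Hypothetical check: have all sessions that live only in the old store expired?

    Assumes migration_start is when the last server picked up the migration config.
    Old-only sessions expire no later than migration_start + session_ttl; the
    safety factor (2x, as suggested above) adds margin for rollout and clock skew.
    """
    return now >= migration_start + safety_factor * session_ttl
```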

I have to admit that I'm not sure how our roll-out process works, but I assume there's some period of time in which some servers use the old config, and others use the new config. So I think we should be ready for the following states:

  • Old session storage (this is now)
  • Mixed old session storage, migration storage (rollout period)
  • Migration storage (all servers have migration config)
  • Mixed migration storage, new storage (rollout period 2)
  • New storage (where we want to go)

There might be race conditions during the rollout periods that we'd want to be prepared for.
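One such race during the "mixed old session storage, migration storage" state can be shown with a toy simulation. This assumes the write-new-only semantics from the description (write to the new store, delete from the old one), uses plain dicts for the two shared backends, and the helper names are hypothetical. A user's requests bounce between an old-config server and a migrated server.

```python
from typing import Dict, Optional

# Two shared stores; plain dicts are stand-ins for the real backends (assumption).
old_store: Dict[str, str] = {}
new_store: Dict[str, str] = {}


def old_config_read(key: str) -> Optional[str]:
    # A server still on the old config reads only the old store.
    return old_store.get(key)


def old_config_write(key: str, value: str) -> None:
    old_store[key] = value


def migration_read(key: str) -> Optional[str]:
    # A migrated server reads the new store first, then falls back to the old one.
    if key in new_store:
        return new_store[key]
    return old_store.get(key)


def migration_write(key: str, value: str) -> None:
    # Write-new-only semantics: write the new store, delete from the old store.
    new_store[key] = value
    old_store.pop(key, None)


old_config_write("sess", "logged-out")           # session predates the rollout
migration_write("sess", "logged-in")             # user logs in via a migrated server
seen_by_old_server = old_config_read("sess")     # next request hits an old server
old_config_write("sess", "anonymous")            # old server starts a fresh session
seen_by_migrated_server = migration_read("sess")
```

Here the old-config server sees no session at all (None), and the migrated server later returns the stale "logged-in" value because the new store shadows the newer write made to the old store.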

Anomie added a comment. · May 7 2019, 5:56 PM
> • MultiWriteBagOStuff: similar, but without the read fallback

The class comment says "Reads are implemented by reading from the caches in the order they are given in the configuration until a cache gives a positive result." The code seems to match.

It has different write semantics than you ask for though, as it will write to both the new and old stores.
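The quoted read behaviour and the write-to-all difference can be modelled in a few lines. This is a toy Python sketch, not MultiWriteBagOStuff's PHP implementation: MultiWriteSketch is a hypothetical name, dicts stand in for the configured caches, and it models only the two behaviours described here, omitting anything else the real class does.

```python
from typing import Any, Dict, List, Optional


class MultiWriteSketch:
    """Toy model of the described MultiWriteBagOStuff behaviour (not its PHP API)."""

    def __init__(self, caches: List[Dict[str, Any]]) -> None:
        # Caches are tried in configuration order, e.g. [new, old].
        self.caches = caches

    def get(self, key: str) -> Optional[Any]:
        # Return the first positive result, in configuration order.
        for cache in self.caches:
            if key in cache:
                return cache[key]
        return None

    def set(self, key: str, value: Any) -> None:
        # Unlike the proposed MigrationBagOStuff, every cache receives the write.
        for cache in self.caches:
            cache[key] = value

    def delete(self, key: str) -> None:
        # Remove the key from every cache.
        for cache in self.caches:
            cache.pop(key, None)
```

With the order [new, old], a miss in the new store falls through to the old one, which is the read fallback the description asks for.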

> I have to admit that I'm not sure how our roll-out process works, but I assume there's some period of time in which some servers use the old config, and others use the new config.

There are three answers to that question:

  • For a few seconds or minutes during the actual scap, you may have processes running with the old code/config and the new code/config at the same time. Some of that time is intentional: a few "canary" hosts are updated first, then it waits to see if error rates immediately spike on those hosts before doing the rest.
  • For code changes riding the train, the wikis are divided into three groups (0, 1, and 2) which are normally deployed on Tuesday, Wednesday, and Thursday each week. Thus, different wikis will be running different code as the train rolls out.
    • Note that configuration changes (meaning the operations/mediawiki-config repository) do not ride the train.
  • Configuration can vary by wiki, with these differences persisting until someone changes them manually. This is often used to roll out changes to a smaller set of wikis for live testing and performance monitoring under load before giving it to the big wikis like enwiki.

> The class comment says "Reads are implemented by reading from the caches in the order they are given in the configuration until a cache gives a positive result." The code seems to match.
> It has different write semantics than you ask for though, as it will write to both the new and old stores.

So, that's good! As long as it always reads in order, configuring the order as [new, old] should work for this case...?

> I have to admit that I'm not sure how our roll-out process works, but I assume there's some period of time in which some servers use the old config, and others use the new config.

> • For a few seconds or minutes during the actual scap, you may have processes running with the old code/config and the new code/config at the same time. Some of that time is intentional: a few "canary" hosts are updated first, then it waits to see if error rates immediately spike on those hosts before doing the rest.

This is the rollout period I was concerned about. I estimate that we write a lot of sessions during a rollout period (O(10^3)? O(10^4)?), so we should plan for that.

I think the multi-write configuration makes sense in this situation.

So, it seems like MultiWriteBagOStuff does what we need. It will be slightly inefficient, since it makes unnecessary writes to the old store, but that is probably acceptable, and it is actually more robust during the rollout period.
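The robustness argument can be checked with a toy simulation of the first rollout period, where some servers still use the old config and others use a multi-write config ordered [new, old]. This is a sketch under the same simplifications as a plain dict model: the helper names are hypothetical and the dicts stand in for the real backends.

```python
from typing import Dict, Optional

# Shared stand-ins for the two backends (assumption: plain dicts).
old_store: Dict[str, str] = {}
new_store: Dict[str, str] = {}


def old_config_read(key: str) -> Optional[str]:
    # A server still on the old config reads only the old store.
    return old_store.get(key)


def old_config_write(key: str, value: str) -> None:
    old_store[key] = value


def multiwrite_read(key: str) -> Optional[str]:
    # Order [new, old]: the first positive result wins.
    for store in (new_store, old_store):
        if key in store:
            return store[key]
    return None


def multiwrite_write(key: str, value: str) -> None:
    # Writes go to both stores, so old-config servers still see them.
    for store in (new_store, old_store):
        store[key] = value


old_config_write("sess", "pre-migration")       # session created before the rollout
seen_before_login = multiwrite_read("sess")     # migrated server falls back to old
multiwrite_write("sess", "logged-in")           # user logs in via a migrated server
seen_by_old_server = old_config_read("sess")    # old server still finds the session
```

The extra write to the old store is what lets the old-config server find "logged-in" here. The simulation does not cover writes made by old-config servers, which still reach only the old store.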

@BPirkle, could you confirm that this class works for session storage with RESTBagOStuff as the "new" backend and, say, Redis as the "old" backend? If so, I think we can close this ticket.

I've created a ticket for doing the configuration options for RESTBagOStuff at T224993, so I'm closing this down.

EvanProdromou closed this task as Resolved. · Jun 4 2019, 2:34 PM