Create SUL 3 rollout plan for Wikimedia production
Open, Needs Triage, Public

Description

We want to deploy T348388: Use central login wiki for login (SUL3) gradually, like any other major change, so that problems can be identified before they affect many users. The cross-domain nature of authentication makes this trickier than usual.

We also want to make it easy to identify whether the rollout affects authentication metrics or other products.

Event Timeline

There are basically three functions that we can enable/disable SUL3 for:

  • login
  • signup (or autocreation, in the case of temp users)
  • autologin

Signup is relatively straightforward: it happens only once per user, so it cannot be inconsistent. We can roll out per wiki, or per IP hash, or some combination of the two. (We cannot use username / user ID hashes since the SUL2/SUL3 workflows diverge well before the username is provided.) We can use user preferences to tag the account for analytics.
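A combined per-wiki and per-IP-hash signup rollout could look roughly like the sketch below. It is a Python illustration only, not CentralAuth code: the function name, the salt, the `enabled_wikis` set and the `percent` parameter are all assumptions made for the example.

```python
import hashlib


def in_sul3_signup_cohort(ip, wiki, enabled_wikis, percent, salt="sul3-signup"):
    """Return True if a signup from `ip` on `wiki` should use SUL3.

    Hypothetical helper: every name and parameter here is illustrative.
    """
    # Per-wiki gate: wikis outside the rollout set always stay on SUL2.
    if wiki not in enabled_wikis:
        return False
    # Per-IP gate: hash the salted IP into one of 100 stable buckets.
    digest = hashlib.sha256(f"{salt}:{ip}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    # Buckets below the threshold are in the cohort, so raising `percent`
    # only adds IPs and never removes ones that were already selected.
    return bucket < percent
```

Because the bucket depends only on the salt and the IP, the same IP gets the same answer on every enabled wiki; changing the salt would reshuffle the cohort.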

For login, we'd ideally want to make sure that the same user gets the same login experience for all login attempts, at the very least on the same wiki. There are some imperfect approaches to that:

  1. Tag the browser with a long-lasting cookie when the user first gets selected/not selected to be in the SUL3 cohort. (The cookie would probably be a combination of cohort ID and SUL3 flag, so we can easily change cohorts over time.) We can set the cookie on the CentralAuth cookie domain. Login behavior will not be consistent between devices, and will not be consistent between wiki families.
  2. Tag the user account with a preference, and then rely on the username cookie which MediaWiki sets after a successful login and doesn't automatically unset or expire. This is similar to the previous option, except that (a) the cookies are mostly pre-populated, so the device and domain inconsistencies will be less frequent; and (b) a user can be placed in a different cohort immediately, instead of only once they are in the middle of authentication.
  3. Use the shared login domain as a bottleneck: change SUL2 login so it redirects to the shared domain to check the cookie. This is in theory consistent between domains, but it requires modifying the logic for both SUL2 and SUL3 logins at the same time, which seems risky; it adds lag to SUL2 logins; and it might be affected by third-party cookie blocking (although browsers mostly don't do that for top-level redirects today).
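To make the "combination of cohort ID and SUL3 flag" cookie from approach #1 concrete, here is a rough Python sketch. The `cohort_id:flag` value format, the function names and the `select` callback are all hypothetical; nothing here reflects an existing CentralAuth cookie.

```python
def encode_cohort_cookie(cohort_id, sul3):
    """Serialize the cohort ID and SUL3 flag, e.g. (3, True) -> '3:1'."""
    return f"{cohort_id}:{1 if sul3 else 0}"


def decode_cohort_cookie(value):
    """Parse 'cohort_id:flag'; return (cohort_id, sul3) or None if malformed."""
    if not value:
        return None
    cohort, sep, flag = value.partition(":")
    if not sep or not (cohort.isascii() and cohort.isdigit()):
        return None
    if flag not in ("0", "1"):
        return None
    return int(cohort), flag == "1"


def resolve_sul3_flag(cookie_value, current_cohort_id, select):
    """Return (sul3, cookie_to_set).

    `select` is a hypothetical zero-argument callable that makes a fresh
    cohort decision. `cookie_to_set` is None when the existing cookie
    already belongs to the current cohort and can be reused as-is.
    """
    decoded = decode_cohort_cookie(cookie_value)
    if decoded is not None and decoded[0] == current_cohort_id:
        return decoded[1], None
    # Missing, malformed, or stale cohort: decide again and re-tag the browser.
    sul3 = bool(select())
    return sul3, encode_cohort_cookie(current_cohort_id, sul3)
```

Bumping the cohort ID invalidates every existing tag at once, which is what would let cohorts change over time without waiting for cookies to expire.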

Approach #2 seems superior here.
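The decision logic for approach #2 could be sketched like this in Python. The `get_preference` callback (username to `True`/`False`/`None`) and the `default` fallback are assumptions for illustration; how the preference would actually be stored and read is not decided here.

```python
def sul3_enabled_for_login(username_cookie, get_preference, default=False):
    """Decide SUL3 vs SUL2 before the login form is shown.

    `username_cookie` is the value of the username cookie left behind by a
    previous successful login, or None if the browser has none.
    `get_preference` is a hypothetical lookup returning True, False, or
    None (no cohort preference recorded for that account).
    """
    # No username cookie on this device/domain: nothing to key on yet.
    if not username_cookie:
        return default
    pref = get_preference(username_cookie)
    # Account was never tagged (or does not exist): use the default.
    if pref is None:
        return default
    return bool(pref)
```

Moving a user to a different cohort then only means changing the stored preference; the next login attempt from any browser that has the username cookie picks it up.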

For autologin, we want the SUL3 flag to be sticky through a single autologin process, so we need to add a query flag to SpecialCentralAutologin (and SpecialCentralLogin if we end up using that somehow for SUL3). Other than that it seems identical to the login situation. Since the cookie would only have to be checked on the page where autologin is initiated, which is always a top-level document, there wouldn't be any extra cookie blocking considerations here. For any wikis other than the central one, there is no difference between SUL2 and SUL3 autologin, so there is no need to worry about which wikis should be included in the edge login set. The only potential complication is that we might want to keep the SUL2 and SUL3 central domains in sync: see T375796: Synchronize SUL2 and SUL3 central browser state.
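A sticky query flag for autologin could work roughly as in the Python sketch below. The parameter name `usesul3`, the helper names and the example URL are hypothetical and only illustrate the idea of deciding once and carrying the decision in the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SUL3_PARAM = "usesul3"  # hypothetical query parameter name


def with_sul3_flag(url, sul3):
    """Return `url` with the SUL3 decision attached as a query parameter."""
    parts = urlsplit(url)
    # Drop any earlier copy of the flag so the URL carries exactly one value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != SUL3_PARAM]
    query.append((SUL3_PARAM, "1" if sul3 else "0"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def sul3_flag_for_step(url, cookie_flag):
    """Pick the flag for one autologin step.

    The query flag wins when present, so later steps follow the decision
    made on the initiating page instead of re-reading cookies.
    """
    params = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    if SUL3_PARAM in params:
        return params[SUL3_PARAM] == "1"
    return cookie_flag
```

The initiating page would read the cookie once, call `with_sul3_flag` on the first autologin URL, and every later redirect would copy the flag forward rather than evaluating the cohort again.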

We might also want to exempt temp users initially, because errors that result in session loss are unrecoverable for them.