
CentralAuth unable to complete account creation with lagged slave
Open, Medium, Public

Description

When the MySQL slave in beta is lagged, CentralAuth produces an error and is unable to complete the account creation request path via login-wiki.

Steps to reproduce:

  1. Run STOP SLAVE SQL_THREAD; on deployment-db2 as mysql root.
  2. Apply https://gerrit.wikimedia.org/r/#/c/244087/ to php-master on deployment-bastion and sync-dir php-master/includes/db/.
  3. Create account at http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:UserLogin&type=signup.

Expected behaviour:

  • Form is submitted without errors.
  • User is redirected back to en.wikipedia via login.wikimedia.

Actual behaviour:

  • Form is submitted.
  • User is redirected to login.wikimedia, where an error is displayed and the redirect chain is aborted early.

Screen Shot 2015-10-06 at 17.13.28.png (718×1 px, 226 KB)

Event Timeline

Krinkle raised the priority of this task to Unbreak Now!.
Krinkle updated the task description.
Krinkle added subscribers: Krinkle, aaron.

This task has had "Unbreak now" priority for two weeks now. Does anybody plan to work on this?

The testing in labs was a bit harsh, since it involved disabling the slave threads and thus stopped ChronologyProtector from working at all. ChronologyProtector now works across domains as of 85c0f85e925e206 and will wait for the slave to catch up to the point where any local/global user rows were added. A more meaningful test would be proper delayed replication, which requires MySQL 5.6.

This problem could still happen if lag is so high that ChronologyProtector gives up (10 sec timeout). Fixing T95501 would help here. I wonder if this error happens during high lag spikes now (the few we have). Would be nice to get logging in place.
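To make the failure mode concrete, here is a rough sketch of a "wait for the slave, then give up" loop. It is written in Python for illustration only and is not the actual ChronologyProtector code (which is PHP, inside MediaWiki). The function name, the integer replication positions, and the `get_replica_pos` callback are all hypothetical; only the 10 second timeout comes from the comment above.

```python
import time
from typing import Callable


def wait_for_replica(
    get_replica_pos: Callable[[], int],
    target_pos: int,
    timeout: float = 10.0,        # give-up threshold mentioned in this task
    poll_interval: float = 0.1,   # assumed polling granularity
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the replica reaches target_pos or the timeout expires.

    Hypothetical helper: returns True if the replica caught up, False if
    the caller has to proceed against possibly stale data.
    """
    deadline = clock() + timeout
    while True:
        if get_replica_pos() >= target_pos:
            return True   # the rows written on the master are now visible
        if clock() >= deadline:
            return False  # lag exceeded the timeout; reads may miss the new user
        sleep(poll_interval)
```

With the SQL thread stopped, as in the reproduction steps, the position never advances, so a loop like this always returns False after the timeout. With merely high lag it returns False only when the lag exceeds the timeout, which is the case where logging would be useful.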

aaron lowered the priority of this task from Unbreak Now! to Medium. (Nov 26 2015, 5:13 AM)
aaron set Security to None.