
Incident documentation for T119736: "could not find local user data for" exceptions
Closed, Resolved, Public

Description

I'm splitting this task off from T119736#2451232 because I think T119736 deserves a full incident report. In my opinion, T119736 showcases a number of troubling patterns that we ought to identify and address.

  • Task was filed in November 2015.
  • Priority was raised to "unbreak now!" in December 2015.
  • It's now July 2016 and the task remains unresolved and in a state of "unbreak now!"

A task sitting in the "unbreak now!" status for this long is, on its own, enough to warrant further investigation.

  • Code gets deployed around July 7, 2016 and the number of exceptions jumps up considerably.
  • It takes a while for people to notice.
  • Nobody reverts until July 12, 2016.

Throughout T119736, a worryingly high number of tech-savvy Wikimedia Foundation users commented that they had personally encountered this issue (James F., Mukunda, Jaime). One way to spot a serious issue is when several experienced users take the time to point out that they've recently hit it themselves. Watching this unrepresentative sample grow should have been a larger, redder flag.

Meanwhile, the responses on T119736 have largely consisted of running maintenance scripts to fix individual cases as affected users report them (thank you for fixing, but ewwwwww), or of suggesting bad workarounds such as locally creating accounts on new wikis (T119736#2333011). It's really not acceptable to tell users that they're responsible for logging in to other wikis in order to resolve the exceptions that we're throwing at them.

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper. Jul 12 2016, 5:05 AM
Base added a subscriber: Base. Jul 12 2016, 6:16 AM
JJMC89 added a subscriber: JJMC89. Jul 12 2016, 6:27 AM

I've written an incident report just for the spike in this caused by Echo: https://wikitech.wikimedia.org/wiki/Incident_documentation/20160712-EchoCentralAuth .

Excellent, thank you! I noticed that you also filed T140207: Consider alternative processes for Unbreak Now bugs, especially those which cross-cut components, which looks promising.

Aklapper closed this task as Resolved. Aug 11 2016, 11:33 AM
Aklapper claimed this task.