Splitting this task off from T119736#2451232: I think T119736 deserves a full incident report. It showcases a number of troubling patterns that we ought to identify and address.
- Task was filed in November 2015.
- Priority was raised to "unbreak now!" in December 2015.
- It's now July 2016 and the task remains unresolved and in a state of "unbreak now!"
A task sitting in "unbreak now!" status for this long is, on its own, enough to warrant further investigation.
- Code gets deployed around July 7, 2016, and the number of exceptions jumps considerably.
- It takes a while for people to notice.
- Nobody reverts until July 12, 2016.
Throughout T119736, a worryingly high number of tech-savvy Wikimedia Foundation users comment that they've personally encountered this issue (James F., Mukunda, Jaime). One sign of a serious issue is when several experienced users take the time to point out that they've recently hit it themselves. Watching this unrepresentative sample grow should have been a larger, redder flag.
Meanwhile, the responses on T119736 have largely been either to run maintenance scripts that fix each case as affected users individually report it (thank you for fixing, but ewwwwww), or to suggest bad workarounds such as locally creating accounts on new wikis (T119736#2333011). It's really not acceptable to tell users that they're responsible for logging in to other wikis to resolve the exceptions we're throwing at them.