Steps to replicate the issue (include links if applicable):
- Become a trusted volunteer
- See reports of X happening (or having happened yesterday)
- Be unable to have any visible insight into the production incident process
What happens?:
https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/GUGCSE7JKQNSXWKCFHPCXEL7BLAVRTD6/
What should have happened instead?:
Not sure what the solution is here, but we need to be more open, especially to trusted volunteers who clearly are not adversaries here. At the very minimum:
- Volunteers should be able to confirm or deny the presence of production incidents (to give an example, I've had reports of people seeing a "too many requests" screen; is that collateral damage from a misconfiguration or just anti-abuse working as intended? Should I file a bug?)
- Volunteers should be able to see why a specific incident occurred and whether it was resolved. We (the community) are stakeholders in the process and deserve to know whether major outages are being caused by scrapers, by faulty code, etc.
Besides transparency, there are sometimes significant upsides to talking to volunteers about incidents, since a community perspective can surface problems before they manifest. To give a recent example, T261752 was discovered during a discussion on Discord about how anti-abuse measures were affecting users. This kind of discussion should have occurred on Phabricator when or before the rate limits were finalized.