20200207-mediawiki API down
Closed, Resolved (Public)

Description

doc: https://wikitech.wikimedia.org/wiki/Incident_documentation/20200207-mediawiki_API_down

author: effie

comments from jbond (not sure if delivered):

  • Summary:
    • an example of the request with the headers, specifically the agent would be useful (or a logstash link)
  • Timeline:
    • was the only fix “Amir Emptified the templates” or did we also add a VCL rate limit/block?
  • impact:
    • could be more precise i.e. expand on almost in “API became almost unresponsive”
  • wonder if we could have more actions here
    • identify problematic templates
    • is there an action we can have to resolve “It is hard to pinpoint when an issue is due to a template”

Event Timeline

@jijiki are you able to take a look at the comments above? If it's too far in the past, I suggest we just make this one final. Please assign back to me once the comments are addressed.

an example of the request with the headers, specifically the agent would be useful (or a logstash link)

Updated the UA on the IR.

was the only fix “Amir Emptified the templates” or did we also add a VCL rate limit/block?

Amir's actions resolved the issue

could be more precise i.e. expand on almost in “API became almost unresponsive”

Our p95 latency got to ~20s, but I didn't gather more data at the time to find out how many requests we lost due to this latency :(

wonder if we could have more actions here

  • identify problematic templates
  • is there an action we can have to resolve “It is hard to pinpoint when an issue is due to a template”

The problem with problematic templates is that we can't spot them until they cause trouble. In this specific incident the template was slow, but that wasn't an issue until pages that included it started being scraped by the bot. Not much we could have done, to my knowledge.

John and I decided we can mark this as resolved :)