At 0730 UTC on July 25th, ORES began to be hammered by ~200 requests per second from an IP address within the University of Washington. This resulted in the complete overload of CODFW and a spike in the Overload Error rate graph. See https://grafana.wikimedia.org/dashboard/db/ores?panelId=9&fullscreen&orgId=1&from=1532493325359&to=1532528355313
Just a minor note: in the past, overload events like this have caused ORES worker nodes to collapse, but in this case the workers continued to serve results at their maximum capacity. I'm not sure there's anything actually pathological about what happened; this is exactly the behavior we hope the service exhibits.
An alert still makes sense, because the overload makes the service less available to other clients.
The main follow-up work I'd like to see is a way to hard-throttle good-faith users like this researcher, perhaps by limiting the number of parallel connections from a single IP at the network layer.
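To make the per-IP limit concrete, here is a minimal sketch of the idea in Python. It is an illustration only, not how ORES is implemented: it enforces the cap in application code rather than at the network layer, and the class name `PerClientLimiter`, the `max_parallel` parameter, and the example address are all hypothetical.

```python
import threading


class PerClientLimiter:
    """Caps the number of in-flight requests per client IP (hypothetical helper)."""

    def __init__(self, max_parallel):
        self.max_parallel = max_parallel
        self._in_flight = {}            # client IP -> current in-flight count
        self._lock = threading.Lock()   # guards _in_flight across worker threads

    def try_acquire(self, ip):
        """Reserve a slot for this IP; return False if it is already at the cap."""
        with self._lock:
            current = self._in_flight.get(ip, 0)
            if current >= self.max_parallel:
                return False
            self._in_flight[ip] = current + 1
            return True

    def release(self, ip):
        """Free a slot once the request has finished."""
        with self._lock:
            current = self._in_flight.get(ip, 0)
            if current <= 1:
                # Drop idle clients so the table does not grow without bound.
                self._in_flight.pop(ip, None)
            else:
                self._in_flight[ip] = current - 1


# Example: allow at most 2 parallel requests per IP (assumed value).
limiter = PerClientLimiter(max_parallel=2)
client = "192.0.2.1"  # documentation-range address, not the real client
results = [limiter.try_acquire(client) for _ in range(3)]
```

In this sketch a request handler would call `try_acquire` before doing any work, reject the request when it returns `False` (for example with HTTP 429 Too Many Requests), and call `release` in a `finally` block. One heavy client then saturates only its own slots instead of the whole worker pool.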