In the parent ticket we are debugging missing events from edit requests. The cause appears to be that a request ran for multiple hours in post-send, and many (all?) of its deferreds timed out.
To see how common this is, I wrote a script that queries the reqIds that logged EmergencyTimeoutException. It then runs an aggregation query that filters logs to the same (host, reqId) combination (to exclude jobs that reuse a reqId) and reports the delta between the earliest and latest log message. The output lists requests with more than 10 minutes between start and end.
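The two-step aggregation above can be sketched roughly as follows. This is not the script in P69109, which queries the log backend directly. It is a minimal in-memory illustration that assumes a hypothetical record shape (`host`, `reqId`, `timestamp`, `message`) and identifies timeouts by the exception name appearing in the message. The helper names `timed_out_keys` and `long_running_requests` are made up for the example.

```python
from datetime import datetime, timedelta

# Hypothetical log record shape (an assumption, not the real log schema):
# {"host": str, "reqId": str, "timestamp": datetime, "message": str}


def timed_out_keys(records):
    """Step 1: the (host, reqId) pairs that logged an EmergencyTimeoutException."""
    return {
        (r["host"], r["reqId"])
        for r in records
        if "EmergencyTimeoutException" in r["message"]
    }


def long_running_requests(records, threshold=timedelta(minutes=10)):
    """Step 2: for each timed-out (host, reqId) pair, compute the delta between
    its earliest and latest log message and keep pairs exceeding `threshold`."""
    keys = timed_out_keys(records)
    bounds = {}  # (host, reqId) -> (earliest, latest)
    for r in records:
        # Keying on host as well as reqId keeps unrelated jobs that reuse
        # the same reqId on another host out of the aggregation.
        key = (r["host"], r["reqId"])
        if key not in keys:
            continue
        ts = r["timestamp"]
        first, last = bounds.get(key, (ts, ts))
        bounds[key] = (min(first, ts), max(last, ts))
    return {
        key: last - first
        for key, (first, last) in bounds.items()
        if last - first > threshold
    }


if __name__ == "__main__":
    t0 = datetime(2000, 1, 1, 12, 0)  # made-up sample data
    logs = [
        {"host": "mw1", "reqId": "abc", "timestamp": t0, "message": "edit started"},
        {"host": "mw1", "reqId": "abc", "timestamp": t0 + timedelta(minutes=173),
         "message": "EmergencyTimeoutException: ..."},
        # Another job on a different host reusing the same reqId.
        {"host": "mw2", "reqId": "abc", "timestamp": t0 + timedelta(minutes=5),
         "message": "job ran"},
    ]
    for (host, req_id), delta in long_running_requests(logs).items():
        print(host, req_id, delta)
```

Note that a request which logged nothing before its first timeout only contributes messages from the timeout onward, so its measured span understates the real runtime. This is the same blind spot as in the real query.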
Script: P69109
Results for Sep 1 - 11: P69110
It's not a crazy number of requests per day (fewer than 10 on average, with 20 on the worst day), but we have multiple requests per day that run for 2+ hours. The longest request ran for 173 minutes.
I should note that this depends on the initial request logging something. If a request didn't log anything before the timeout, its logs start at the timeout and it is likely not included here. Perhaps EmergencyTimeoutException could be adjusted to report the current request runtime in its error message, which would give more concrete information.