In T204624 (and possibly T198421), we saw a failure scenario in which, under a higher-than-expected request rate, Parsoid continued to fulfill some requests but did not respond with a 503 when its queues were presumably full. Instead, clients saw socket timeouts.
That task was closed after tuning ChangeProp for the beta cluster, but without addressing the underlying issue in Parsoid.
Yesterday's production incident looked somewhat similar, so here we are again.