For roughly the last 5-6 hours (since around 6-7am CT, Dec 9), Parsoid cluster load has been at 80% or higher, with a lot of CPU timeouts. This turns out to be T119883 in full glory: a bunch of bots are making multiple edits per minute (each under 10 bytes) to a really large page. So, every minute, multiple parse requests for this large page are queued via RESTBase, and each of them is going to time out anyway. Here is another specimen.
I think it is time to institute various parsing limits within Parsoid until we are able to handle these pages. Here are some possible limit features to consider:
- Size of wikitext
- Size of an individual list
- Size of an individual table
- Number of transclusions
- Number of images
- Expected size of the DOM (based on the number of tokens constructed that would be fed into the HTML tree builder)
Given that these pathological pages will never yield a result and will only make the cluster sluggish, it makes sense to detect these failure scenarios early and return an HTTP 500. As Parsoid develops stronger muscles to deal with these "use wikitext as a database" scenarios, we can progressively relax the limits.
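
To make the proposal concrete, here is a rough sketch of how such limits could be checked early and turned into an HTTP 500. It is written in Python purely for illustration and is not Parsoid code: `ParseLimits`, `LimitTracker`, `ParseLimitExceeded`, `handle_parse`, and every threshold value are hypothetical names and numbers. The idea is that the parser bumps a counter as it encounters each construct and bails out as soon as any counter crosses its limit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class ParseLimits:
    # Hypothetical thresholds; real values would need tuning against
    # production data and could be relaxed over time.
    max_wikitext_bytes: int = 1_000_000
    max_list_items: int = 10_000       # per individual list
    max_table_rows: int = 10_000       # per individual table
    max_transclusions: int = 5_000
    max_images: int = 1_000
    max_tokens: int = 2_000_000        # proxy for expected DOM size


class ParseLimitExceeded(Exception):
    def __init__(self, name: str, value: int, limit: int) -> None:
        super().__init__(f"{name} = {value} exceeds limit {limit}")


class LimitTracker:
    """Counts constructs during a parse and fails fast on a violation."""

    def __init__(self, limits: ParseLimits) -> None:
        self.limits = limits
        self.counts: Dict[str, int] = {}

    def add(self, name: str, amount: int = 1) -> None:
        total = self.counts.get(name, 0) + amount
        self.counts[name] = total
        limit = getattr(self.limits, "max_" + name)
        if total > limit:
            raise ParseLimitExceeded(name, total, limit)

    def reset(self, name: str) -> None:
        # For per-construct limits, e.g. call at the start of each list.
        self.counts[name] = 0


def handle_parse(
    wikitext: str,
    parse: Callable[[str, LimitTracker], str],
    limits: Optional[ParseLimits] = None,
) -> Tuple[int, str]:
    """Return (HTTP status, body); 500 as soon as any limit is exceeded."""
    tracker = LimitTracker(limits or ParseLimits())
    try:
        # Cheapest check first: reject oversized input before parsing.
        tracker.add("wikitext_bytes", len(wikitext.encode("utf-8")))
        html = parse(wikitext, tracker)
    except ParseLimitExceeded as exc:
        return 500, str(exc)
    return 200, html
```

In this sketch the wikitext size check runs before any parsing work, while the other limits would be enforced by the parser calling `tracker.add("images")`, `tracker.add("transclusions")`, and so on as it goes. Relaxing a limit later is then a matter of changing one value in `ParseLimits`.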