
Introduce various limits during parsing to deal with pathological page scenarios
Closed, ResolvedPublic

Description

For about the last 5-6 hours (since around 6-7am CT, Dec 9), Parsoid cluster load has been at 80% and higher, and there have also been a lot of CPU timeouts. It turns out this is T119883 in full glory: a bunch of bots are making multiple edits per minute (each in the <10 byte range) on a really large page. So, every minute, multiple parse requests are queued via RESTBase for this large page, which is going to time out anyway. Here is another specimen.

I think it is time to institute various parsing limits within Parsoid until we get around to being able to deal with these pages. Here are some possible limit features to consider:

  • Size of wikitext
  • Size of an individual list
  • Size of an individual table
  • Number of transclusions
  • Number of images
  • Expected size of the DOM (based on the number of tokens constructed, which would be fed into the HTML tree builder)

Given that these pathological pages will never yield a result and are only going to make the cluster sluggish, it makes sense to detect these failure scenarios early and return an HTTP 500. As Parsoid gets stronger muscles to deal with these "use wikitext as a database" scenarios, we can progressively relax the limits.
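A rough sketch of what such configurable limits and an early bail-out could look like is below. The config shape, default values, and error type are illustrative assumptions for this task, not Parsoid's actual code.

```typescript
// Illustrative sketch only: names, thresholds, and the error type are
// assumptions, not Parsoid's real implementation.
interface ParseLimits {
    wtSizeBytes: number;        // total wikitext size
    listItemCount: number;      // items in a single list
    tableCellCount: number;     // cells in a single table
    transclusionCount: number;  // transclusions on the page
    imageCount: number;         // images on the page
    tokenCount: number;         // proxy for the expected DOM size
}

const defaultLimits: ParseLimits = {
    wtSizeBytes: 1_000_000,
    listItemCount: 30_000,
    tableCellCount: 30_000,
    transclusionCount: 10_000,
    imageCount: 10_000,
    tokenCount: 1_000_000,
};

class LimitExceededError extends Error {
    constructor(readonly resource: keyof ParseLimits, readonly actual: number, readonly limit: number) {
        super(`${resource} limit exceeded: ${actual} > ${limit}`);
    }
}

// Cheap checks run up front (or as counts accumulate during tokenization)
// so a doomed parse fails fast instead of running until the CPU timeout.
function checkLimit(limits: ParseLimits, resource: keyof ParseLimits, actual: number): void {
    if (actual > limits[resource]) {
        throw new LimitExceededError(resource, actual, limits[resource]);
    }
}

// Example: reject oversized wikitext before any parsing happens.
function assertWikitextSize(wikitext: string, limits: ParseLimits = defaultLimits): void {
    checkLimit(limits, 'wtSizeBytes', Buffer.byteLength(wikitext, 'utf8'));
}
```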

Event Timeline

ssastry raised the priority of this task to High.
ssastry updated the task description. (Show Details)
ssastry added a project: Parsoid.
ssastry added subscribers: ssastry, Services, tstarling.

Change 257944 had a related patch set uploaded (by Subramanya Sastry):
WIP: T120972: Introduce configurable wt2html/html2wt limits

https://gerrit.wikimedia.org/r/257944

Change 257944 merged by jenkins-bot:
T120972: Introduce configurable wt2html/html2wt limits

https://gerrit.wikimedia.org/r/257944

ssastry closed this task as Resolved. Dec 13 2015, 4:18 PM
ssastry claimed this task.

This is now deployed. We return an HTTP 413 (Payload Too Large) error for these requests.

By cutting out all requests with wikitext > 1M, list items > 30K, or table cells > 30K, there have been zero request timeouts (as logged in Kibana) and about 12 CPU timeouts in the 40+ hours since this was deployed. This has also kept the Ganglia load graph almost flat.
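For illustration, the request-side behavior described above could be sketched roughly as follows. This is an Express-style handler with the deployed thresholds hard-coded; the route, helper names, and overall shape are assumptions, not Parsoid's actual service code.

```typescript
import express from 'express';

// Thresholds as described above (hard-coded here for illustration;
// in practice they would come from configuration).
const WT_SIZE_LIMIT_BYTES = 1_000_000;   // ~1M of wikitext
const LIST_ITEM_LIMIT = 30_000;          // items per list
const TABLE_CELL_LIMIT = 30_000;         // cells per table

const app = express();

// Accept raw wikitext; the body-parser cap is deliberately generous so
// that the explicit check below, not the parser, produces the 413.
app.post('/wikitext/to/html', express.text({ type: '*/*', limit: '10mb' }), (req, res) => {
    const wikitext = typeof req.body === 'string' ? req.body : '';
    if (Buffer.byteLength(wikitext, 'utf8') > WT_SIZE_LIMIT_BYTES) {
        // Fail fast: a page this large would only tie up a worker until
        // the CPU timeout, so reject it immediately.
        res.status(413).send('Payload Too Large: wikitext exceeds the configured limit');
        return;
    }
    // The list-item (LIST_ITEM_LIMIT) and table-cell (TABLE_CELL_LIMIT)
    // checks can only run once tokenization has counted those structures,
    // but a violation there would abort the parse and map to the same 413.
    res.status(200).type('text/html').send(renderHtml(wikitext));
});

// Stand-in for the real wt2html pipeline.
function renderHtml(wikitext: string): string {
    return `<!-- parsed ${wikitext.length} chars of wikitext -->`;
}

app.listen(8000);
```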

The urwiki bot-edited pages that caused the severe load spikes that prompted this task are covered by the list-item limit. Looking at Parsoid logs on various nodes, besides those urwiki pages, there have been a handful of pages that hit the table-cell limit. The other big source of HTTP 413s is T75412: OCG Attribution request times out regularly -- about 50 of those requests an hour exceed the 1M wikitext size limit.

ssastry set Security to None.
ssastry removed a subscriber: gerritbot.

Addressing T119883: Investigate inefficiencies in DOM construction and passes for large wikitext pages should help us increase these limits, maybe to 50K list items and table cells and a 1.5M wikitext size.