Slow parsing / huge pages from HTML dump error logs: timeouts, crashers
Closed, Resolved · Public

Assigned To

Authored By

	• GWicke
	Feb 8 2015, 2:18 AM

Description

The following pages are reported as timing out the Parsoid backend requests in the restbase logstash report (search for ETIMEDOUT pagebundle in https://logstash.wikimedia.org/#/dashboard/elasticsearch/restbase):

https://en.wikipedia.org/wiki/List_of_20th-century_classical_composers
https://en.wikipedia.org/wiki/List_of_Detroit_Red_Wings_players
https://en.wikipedia.org/wiki/List_of_Detroit_Red_Wings_draft_picks
https://en.wikipedia.org/wiki/List_of_Los_Angeles_Kings_players
List_of_law_clerks_of_the_Supreme_Court_of_the_United_States
List_of_auxiliaries_of_the_United_States_Navy
List_of_best-selling_music_artists
List_of_airline_codes
List_of_Unicode_characters
List_of_PlayStation_3_games
List_of_PlayStation_2_games
List_of_Pacific_Coast_Conference_football_standings
List_of_members_of_the_upper_house_of_the_Riksdag
List_of_songs_in_SingStar_games_(PlayStation_2)
List_of_Xbox_360_games

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		ssastry	T88915 Slow parsing / huge pages from HTML dump error logs: timeouts, crashers
		Resolved		Arlolra	T104523 Parsoid infinite recursion due to template loop involving <ref>

Event Timeline

• GWicke created this task. · Feb 8 2015, 2:18 AM

• GWicke raised the priority of this task to Needs Triage.

• GWicke updated the task description.

• GWicke added a project: Parsoid.

• GWicke subscribed.

Restricted Application added a subscriber: Aklapper. · Feb 8 2015, 2:18 AM

• GWicke updated the task description. · Feb 8 2015, 2:20 AM

• GWicke set Security to None.

• GWicke updated the task description.

• GWicke updated the task description. · Feb 8 2015, 2:23 AM

Liuxinyu970226 subscribed. · Feb 8 2015, 2:32 AM

• GWicke updated the task description. · Feb 8 2015, 3:36 AM

• GWicke updated the task description.

Quick note I am recording here as I glance at this ticket on the weekend:

https://en.wikipedia.org/wiki/List_of_20th-century_classical_composers has a ~6500 line table with 2500+ transclusions of {{sort name|....}}

• GWicke updated the task description. · Feb 8 2015, 6:24 PM

Arlolra triaged this task as Medium priority. · Feb 10 2015, 2:20 AM

Arlolra raised the priority of this task from Medium to High.

Arlolra moved this task from Needs Triage to Performance on the Parsoid board.

Arlolra subscribed.

• marcoil added a project: Performance Issue. · Feb 10 2015, 3:10 PM

• marcoil moved this task from Performance to Needs Triage on the Parsoid board. · Feb 13 2015, 12:51 PM

ssastry moved this task from Needs Triage to In Progress on the Parsoid board. · Mar 3 2015, 9:10 PM

• GWicke renamed this task from "Slow parsing / huge pages from HTML dump error logs" to "Slow parsing / huge pages from HTML dump error logs: timeouts, crashers". · Mar 5 2015, 9:31 PM

• GWicke mentioned this in T76518: Errors in Parsoid v2 entry point. · Mar 5 2015, 9:34 PM

ssastry mentioned this in T92643: Parsoid Roadmap April - June 2015 (Q4 2014/2015). · Apr 13 2015, 5:33 PM

Trying some of these locally now, I get:

[info][enwiki/List_of_20th-century_classical_composers?oldid=656432750] completed parsing in 53805 ms
[info][enwiki/List_of_Detroit_Red_Wings_players?oldid=648754782] completed parsing in 49185 ms
[info][enwiki/List_of_Los_Angeles_Kings_players?oldid=648542956] completed parsing in 63519 ms

So it looks like some of these may have been transient errors. Alternatively, they may be timing out because of CPU starvation when lots of big pages are parsed around the same time. In any case, we should still check whether processing of any of these humongous pages can be sped up.

A few more candidates from the latest dump run:

Ding, ding ding .. http://localhost:8000/sqwiki/Marsi is a winner .... we have a candidate to debug.

@ssastry, you are welcome ;)

Looks like this could be another instance of an infinite loop like the one we fixed in https://gerrit.wikimedia.org/r/#/c/189036/ .. to be continued later.

This snippet here is sufficient to send this into a tailspin .. Try it with "node parse --trace peg --prefix sqwiki < /tmp/wt"

foo <ref name="Marsi">{{cite article
| author = NASA
| title = The Lure of Hematite
| quote = bar
| newspaper =
| date = March 28, 2001  
| pages = 1
| url = http://science.nasa.gov/science-news/science-at-nasa/2001/ast28mar_1/
| accessdate = October 6, 2014
}}</ref>

Aha .. what fun! Check this out: https://sq.wikipedia.org/w/api.php?action=expandtemplates&text={{cite%20article|%20author%20=%20NASA|%20title%20=%20The%20Lure%20of%20Hematite|%20quote%20=%20Marsi%20%28greqisht:%20Ares%29%20eshte%20Zoti%20i%20luftes.%20Planeti%20ndoshta%20e%20mori%20kete%20emer%20per%20shkak%20te%20ngjyres%20se%20kuqe,%20dhe%20gjithashtu%20Marsi%20eshte%20referuar%20disa%20here%20si%20Planeti%20i%20Kuq.|%20newspaper%20=%20|%20date%20=%20March%2028,%202001|%20pages%20=%201|%20url%20=%20http://science.nasa.gov/science-news/science-at-nasa/2001/ast28mar_1/|%20accessdate%20=%20October%206,%202014}}

I'll save you the suspense. Expanding that transclusion gives you ....

=================================
Stampa:Cite_article
---------------------------------
<ref>{{cite article
| author = NASA
| title = The Lure of Hematite
| quote = Marsi (greqisht: Ares) eshte Zoti i luftes. Planeti ndoshta e mori kete emer per shkak te ngjyres se kuqe, dhe gjithashtu Marsi eshte referuar disa here si Planeti i Kuq.
| newspaper = 
| date = March 28, 2001
| pages = 1
| url = http://science.nasa.gov/science-news/science-at-nasa/2001/ast28mar_1/
| accessdate = October 6, 2014
}}</ref>
---------------------------------

and thus, you have the infinite loop! The output above was produced by passing the "--dump tplsrc" flag to parse.js, but it is also what you get from the expandtemplates API call above.

Not sure what the other pages will yield .. but, something to check later is if this expandtemplates behavior is broken or not.
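The loop arises because expanding {{cite article}} yields wikitext that contains {{cite article}} again. As a rough illustration only, the sketch below guards a toy recursive expander with the set of template names currently being expanded. The TEMPLATES dict, the CALL regex and the expand() function are all hypothetical; this is not Parsoid's or MediaWiki's code, and it is not the fix that was eventually applied.

```python
import re

# Hypothetical template store: name -> wikitext body. The "cite article" entry
# mimics the sqwiki behavior above, where the expansion contains the same call.
TEMPLATES = {
    "cite article": "<ref>{{cite article}}</ref>",
    "sort name": "Doe, John",
}

# Matches a simple, non-nested {{name|args}} call (illustrative, not full wikitext).
CALL = re.compile(r"\{\{([^{}|]+)(?:\|[^{}]*)?\}\}")


def expand(text, active=()):
    """Recursively expand {{name|...}} calls, refusing to re-enter a template
    that is already being expanded further up the call stack."""

    def replace(match):
        name = match.group(1).strip().lower()
        if name in active:
            # Re-entering a template that is still being expanded: stop here.
            return f"[template loop detected: {name}]"
        body = TEMPLATES.get(name)
        if body is None:
            return match.group(0)  # unknown template: leave untouched
        return expand(body, active + (name,))

    return CALL.sub(replace, text)


print(expand("foo <ref>{{cite article| author = NASA}}</ref>"))
```

Without the `active` check, the same input would recurse until the interpreter's recursion limit is hit, which is the toy analogue of the tailspin described above.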

ssastry added a subscriber: tstarling. · Jun 3 2015, 3:49 PM

• GWicke updated the task description. · Jun 4 2015, 11:04 AM

• GWicke updated the task description.

tstarling mentioned this in T104523: Parsoid infinite recursion due to template loop involving <ref>. · Jul 2 2015, 2:03 AM

Created subtask for the Marsi infinite loop. Summary: MW tosses Parsoid a few bullets, Parsoid shoots itself in the foot. The fix will be in Parsoid.

As for [[List of 20th-century classical composers]], running it on my laptop gives 214 seconds of CPU usage in HHVM and 47 seconds in node. But running MW's expandtemplates on the whole article text only takes 10 seconds, which suggests that batching (T45888) could be a big win.

Note that this is 214s on a minimal test wiki with just Cite and ParserFunctions, I am not adding a lot of unnecessary startup overhead.

I'd also like to explore the 47s spent in node to see if there is any low-hanging fruit. For reference, MW transforms this article to HTML in 18s.

In T88915#1423980, @tstarling wrote:

As for [[List of 20th-century classical composers]], running it on my laptop gives 214 seconds of CPU usage in HHVM and 47 seconds in node. But running MW's expandtemplates on the whole article text only takes 10 seconds, which suggests that batching (T45888) could be a big win.

Definitely appears to be the case. If this is only for expandtemplates (and not parse which runs into extension state issues), this might be doable.

Note that this is 214s on a minimal test wiki with just Cite and ParserFunctions, I am not adding a lot of unnecessary startup overhead.

I'd also like to explore the 47s spent in node to see if there is any low-hanging fruit. For reference, MW transforms this article to HTML in 18s.

I am curious to see how much memory usage and GC pressure there is. I imagine tokens and array creation contribute a lot to it -- I do remember the token transformation managers create a lot of arrays.

Another interesting experiment would be to see how node time changes if you had all the expandtemplates output cached in memory and available "instantly", i.e. how much of the 47 sec is i/o wait time vs. cpu time.

In T88915#1423980, @tstarling wrote:

I'd also like to explore the 47s spent in node to see if there is any low-hanging fruit. For reference, MW transforms this article to HTML in 18s.

Also, another useful number to get would be to look at the total time spent in DOM transformations (which there are a *lot* of) and that should give us a good first cut sense of where that low hanging fruit might be.

In T88915#1424895, @ssastry wrote:

In T88915#1423980, @tstarling wrote:

I'd also like to explore the 47s spent in node to see if there is any low-hanging fruit. For reference, MW transforms this article to HTML in 18s.

Also, another useful number to get would be to look at the total time spent in DOM transformations (which there are a *lot* of) and that should give us a good first cut sense of where that low hanging fruit might be.

I just did some basic instrumentation in mediawiki.DOMPostProcessors.js and it looks like < 10% time is spent in DOM transforms across all invocations for this page. Same with the BO page.

In T88915#1424962, @ssastry wrote:

In T88915#1424895, @ssastry wrote:

In T88915#1423980, @tstarling wrote:

I'd also like to explore the 47s spent in node to see if there is any low-hanging fruit. For reference, MW transforms this article to HTML in 18s.

Also, another useful number to get would be to look at the total time spent in DOM transformations (which there are a *lot* of) and that should give us a good first cut sense of where that low hanging fruit might be.

I just did some basic instrumentation in mediawiki.DOMPostProcessors.js and it looks like < 10% time is spent in DOM transforms across all invocations for this page. Same with the BO page.

Scratch that. I will have to redo the tests at a different time with a faster / reliable internet connection. I don't think the BO page takes 40s to parse and so it might have been a bad internet connection that is skewing the percentages for me.

• MZMcBride subscribed. · Jul 4 2015, 4:32 PM

In T88915#1424878, @ssastry wrote:

Another interesting experiment would be to see how node time changes if you had all the expandtemplates output cached in memory and available "instantly", i.e. how much of the 47 sec is i/o wait time vs. cpu time.

That was 47s of CPU time, measured by taking a delta of the "time" column in ps before and after test execution. I didn't measure wall clock time.
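The distinction matters because time spent waiting on HTTP responses shows up in wall-clock time but not in CPU time. A minimal sketch of the difference, using Python's standard timers on the current process rather than the ps delta described above; measure() and simulated_io_wait() are illustrative names, and the sleep is only a stand-in for I/O wait:

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn() in this process."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def simulated_io_wait():
    # Stand-in for waiting on an HTTP response: sleeping consumes no CPU.
    time.sleep(0.2)


wall, cpu = measure(simulated_io_wait)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

A CPU-time figure such as the 47s above therefore already excludes time spent idle waiting for responses.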

In T88915#1424895, @ssastry wrote:

Also, another useful number to get would be to look at the total time spent in DOM transformations (which there are a *lot* of) and that should give us a good first cut sense of where that low hanging fruit might be.

I'm not entirely sure if I'm abusing perf in the correct way. The following figures were generated by running perf report with the relevant function in -p, which selects all samples with the relevant function anywhere in the stack trace. Then I used the -x flag to exclude non-matching samples, and took the sample count from the resulting header.

Function	Event count	Relative
prepareDOM	7,792,579,881	5.52%
wrapTemplates	4,729,666,559	3.35%
markTreeBuilderFixups	3,888,261,053	2.76%
computeDSR	1,645,748,774	1.17%
migrateTemplateMarkerMetas	1,456,118,058	1.03%
cleanupAndSaveDataParsoid	1,205,436,698	0.85%
handleUnbalancedTables	471,130,011	0.33%
migrateTrailingNLs	385,305,060	0.27%
handlePres	192,709,197	0.14%
processRefs	111,065,891	0.08%
unpackDOMFragments	38,803,374	0.03%
stripDoubleTDs	2,239,852	0.00%
stripEmptyElements	643,888	0.00%
stripMarkerMetas	577,334	0.00%
handleLIHack	0	0.00%
handleLinkNeighbours	0	0.00%
handleTableCellTemplates	0	0.00%
logWikitextFixup	0	0.00%
total of above	21,920,285,630	15.54%
doPostProcess	20,060,665,055	14.22%
total	141,072,432,443	100.00%
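For anyone re-checking the table, the "Relative" column is each event count divided by the overall total. A small sketch that recomputes it from the figures above; the dictionary and the helper name relative_share() are just for illustration:

```python
# Event counts copied from the table above.
TOTAL_EVENTS = 141_072_432_443

EVENT_COUNTS = {
    "prepareDOM": 7_792_579_881,
    "wrapTemplates": 4_729_666_559,
    "markTreeBuilderFixups": 3_888_261_053,
    "computeDSR": 1_645_748_774,
    "migrateTemplateMarkerMetas": 1_456_118_058,
    "cleanupAndSaveDataParsoid": 1_205_436_698,
    "handleUnbalancedTables": 471_130_011,
    "migrateTrailingNLs": 385_305_060,
    "handlePres": 192_709_197,
    "processRefs": 111_065_891,
    "unpackDOMFragments": 38_803_374,
    "stripDoubleTDs": 2_239_852,
    "stripEmptyElements": 643_888,
    "stripMarkerMetas": 577_334,
}


def relative_share(event_count, total=TOTAL_EVENTS):
    """Percentage of all sampled events attributed to one function."""
    return round(100.0 * event_count / total, 2)


listed_total = sum(EVENT_COUNTS.values())
for name, count in EVENT_COUNTS.items():
    print(f"{name}\t{count:,}\t{relative_share(count):.2f}%")
print(f"total of above\t{listed_total:,}\t{relative_share(listed_total):.2f}%")
```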

In T88915#1424878, @ssastry wrote:

I am curious to see how much memory usage and GC pressure there is. I imagine tokens and array creation contribute a lot to it -- I do remember the token transformation managers create a lot of arrays.

By the same method as above, v8::internal::Heap::AllocateRaw() takes 4.6% of CPU.

I've been running a few tests with templates that expand to an empty string, to quantify the overhead of invoking a template. With MW involved as per normal, it is about 5ms of client-side CPU per template. Extrapolating, that explains about 16s of the previous 47s test case.

With the HTTP request stubbed out, replaced with a setImmediate() callback, it is about 2.6ms per template. With an immediate synchronous callback, it is about 2.5ms.

So the client-side overhead of making an async HTTP request is about 2.5ms, and that's how much we can expect to save by batching -- maybe 8s of client-side CPU in the motivating test case. So if the batched request is efficient enough to take less than 8s in HHVM, we may be able to improve overall latency, even if you assume an infinitely large HHVM cluster with no concurrency limit. We would also get a ~17% reduction in Parsoid CPU usage and ~95% reduction in MW CPU usage.
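The arithmetic behind these estimates can be laid out explicitly. In the sketch below, the implied number of template invocations (about 3200) is derived from the 16s and 5ms figures and is not stated in the thread. The ~95% MW figure is assumed to come from comparing 214s of per-template HHVM CPU with 10s for expandtemplates on the whole article, which is consistent with the numbers but is an inference.

```python
# Figures quoted in the comments above.
NODE_CPU_S = 47.0                  # Parsoid (node) CPU for the test article
MS_PER_TEMPLATE_WITH_MW = 5.0      # client-side CPU per template, MW involved
MS_PER_TEMPLATE_STUBBED = 2.5      # same, with the HTTP request stubbed out
TEMPLATE_CPU_S = 16.0              # share of the 47 s attributed to templates
HHVM_CPU_UNBATCHED_S = 214.0       # HHVM CPU with one request per template
HHVM_CPU_WHOLE_ARTICLE_S = 10.0    # expandtemplates on the whole article text

# Derived, not stated in the thread: implied number of template invocations.
template_calls = TEMPLATE_CPU_S * 1000 / MS_PER_TEMPLATE_WITH_MW

# Client-side cost attributable to the async HTTP request itself.
http_overhead_ms = MS_PER_TEMPLATE_WITH_MW - MS_PER_TEMPLATE_STUBBED

# Upper bound on client-side CPU saved if every request were batched away.
batching_saving_s = template_calls * http_overhead_ms / 1000
parsoid_reduction_pct = 100 * batching_saving_s / NODE_CPU_S
mw_reduction_pct = 100 * (1 - HHVM_CPU_WHOLE_ARTICLE_S / HHVM_CPU_UNBATCHED_S)

print(f"template calls (derived): {template_calls:.0f}")
print(f"client-side saving: {batching_saving_s:.1f} s "
      f"({parsoid_reduction_pct:.0f}% of Parsoid CPU)")
print(f"MW CPU reduction: {mw_reduction_pct:.0f}%")
```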

@GWicke suggested that my overhead calculations above might be biased due to HHVM causing bus contention and context switching in the normal case but not in the stubbed-out case. This seems to be the case. I re-ran the test with HTTP concurrency limited to 1, to eliminate context switching and reduce bus contention. The result was 0.9ms per template, down from 2.6ms.

In T88915#1436696, @tstarling wrote:

@GWicke suggested that my overhead calculations above might be biased due to HHVM causing bus contention and context switching in the normal case but not in the stubbed-out case. This seems to be the case. I re-ran the test with HTTP concurrency limited to 1, to eliminate context switching and reduce bus contention. The result was 0.9ms per template, down from 2.6ms.

Sorry, I screwed up that one, ab was timing out and not doing as many iterations as I thought. Actually, with HTTP concurrency reduced to 1, the per-template client-side overhead hardly changes, it is still around 2.5ms.

@tstarling, could you check the throughput with https://github.com/jeffbski/bench-rest, to establish a baseline for a simple http request from node on your hardware?

With bench-rest, 100,000 iterations against an 11KB static file served by Apache on the same host:

concurrency=10: 1582 req/s, 64.525s total client CPU time, implying 0.64ms client-side CPU overhead per request, 0.63ms mean time between responses
concurrency=1 : 1365 req/s, 65.017s total client CPU time, implying 0.65ms client-side CPU overhead per request, 0.73ms mean time between responses

Apache was observed to be using about 20% of one CPU during the c=10 test, node about 105%.

Full results: P975

@tstarling, that's very close to the ~0.4ms I saw for small responses (150 bytes or so).

I do wonder where the other 2ms or so are coming from in the Parsoid test.
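For reference, the per-request figures and the unexplained gap follow directly from the bench-rest numbers above. This is a small sketch; the helper names are illustrative, and the 2.5ms value is the per-template HTTP overhead estimated earlier in the thread.

```python
ITERATIONS = 100_000  # bench-rest iterations in both runs


def cpu_ms_per_request(total_cpu_s, iterations=ITERATIONS):
    """Client-side CPU per request, in milliseconds."""
    return total_cpu_s * 1000 / iterations


def ms_between_responses(requests_per_second):
    """Mean time between responses, in milliseconds."""
    return 1000 / requests_per_second


# Per-template HTTP overhead estimated earlier for the Parsoid test.
PARSOID_HTTP_OVERHEAD_MS = 2.5

c10_cpu_ms = cpu_ms_per_request(64.525)   # concurrency=10 run
c1_cpu_ms = cpu_ms_per_request(65.017)    # concurrency=1 run
unexplained_ms = PARSOID_HTTP_OVERHEAD_MS - c1_cpu_ms

print(f"c=10: {c10_cpu_ms:.3f} ms CPU/request, "
      f"{ms_between_responses(1582):.3f} ms between responses")
print(f"c=1:  {c1_cpu_ms:.3f} ms CPU/request, "
      f"{ms_between_responses(1365):.3f} ms between responses")
print(f"unexplained per-template overhead: {unexplained_ms:.2f} ms")
```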

tstarling mentioned this in T45888: Batch Parsoid's API requests. · Jul 20 2015, 5:07 AM

• GWicke mentioned this in T114225: Some tokenizations crash the HTML tree builder. · Sep 30 2015, 3:32 PM

With https://gerrit.wikimedia.org/r/#/c/244588/ all the pages listed here (except for the Marsi page which has a bug filed against it for detecting <ref> loops) parse to completion on my laptop .. almost all of them in < 40 sec. A couple take over 60 sec.

@ssastry: Wow, that's an impressive result for a one-line change!

I should also try without that patch, just to see how many of them were fixed by it vs. how many were already parsing successfully .. but anywhere there is a node with 1000s of children, the patch could potentially improve latency, since it eliminates the O(n^2) part.
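The thread does not say what the one-line change was, so the following is only a generic illustration of how a node with thousands of children can turn quadratic: if every append walks the sibling list to find the end, building n children costs on the order of n^2 steps, whereas keeping a pointer to the last child makes each append constant time. The Node class and both append functions are hypothetical and unrelated to Parsoid's actual DOM code.

```python
class Node:
    """Toy tree node with a singly linked list of children."""

    def __init__(self):
        self.first_child = None
        self.last_child = None
        self.next_sibling = None


def append_by_walking(parent, child):
    """Append by walking from the first child to the end; returns steps walked."""
    steps = 0
    if parent.first_child is None:
        parent.first_child = child
    else:
        cur = parent.first_child
        while cur.next_sibling is not None:
            cur = cur.next_sibling
            steps += 1
        cur.next_sibling = child
    parent.last_child = child
    return steps


def append_with_tail(parent, child):
    """Append using the stored tail pointer; no walking needed."""
    if parent.last_child is None:
        parent.first_child = child
    else:
        parent.last_child.next_sibling = child
    parent.last_child = child
    return 0


def build(n, append):
    """Attach n children to a fresh parent and return the total steps walked."""
    parent = Node()
    return sum(append(parent, Node()) for _ in range(n))


for n in (1000, 2000):
    print(n, build(n, append_by_walking), build(n, append_with_tail))
```

Doubling n roughly quadruples the steps walked in the first variant, which is the kind of growth that hurts pages with very large tables or lists.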

Yup, all of them have similar timings on master as well. I am going to close this ticket. Let us open new tickets for any individual pages whose poor performance is reproducible locally.

Liuxinyu970226 unsubscribed. · Oct 17 2015, 4:19 AM

Arlolra closed subtask T104523: Parsoid infinite recursion due to template loop involving <ref> as Resolved. · Dec 12 2016, 11:32 PM
