
OCG Attribution request times out regularly
Closed, Declined · Public

Description

enwiki:MediaWiki:Coll-attribution-page seems to be the title timing out the most.


Version: unspecified
Severity: normal

Details

Reference
bz73412

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 3:57 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz73412.

Ah, so, we have a handful of pages that are timing out.

enwiki:MediaWiki:Coll-attribution-page is some sort of special interface page. The source of the timeouts is probably the OCG bundler's calls at https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler/blob/master/lib/attribution.js#L233

I think this is a different incarnation of T85744, which caused https://wikitech.wikimedia.org/wiki/Incident_documentation/20150103-Parsoid.

Looking at the code here ( https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler/blob/45e8a2fdf96fac53ab5b5e6e9b03460173edf25a/lib/attribution.js#L224-L229 ), it is effectively generating this wikitext:

{{int:Coll-attribution-page|<ul><li>item-1</li>...<li>item-n</li></ul>|..another list..|..another list..}}

If you parse this wikitext, the tokenizer holds onto the entire list in memory (for all 3 lists), since each of them is embedded as an argument of the transclusion. A very similar scenario led to T85744 and the Jan 03 2015 outage, which I fixed by making the tokenizer process lists one item at a time.
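Based on the code linked above, the attribution wikitext is effectively assembled like this (a hypothetical sketch; the function and parameter names are illustrative, not the actual OCG bundler API):

```javascript
// Illustrative sketch of what lib/attribution.js effectively produces:
// each attribution list is rendered as one big <ul> and passed as a
// single argument of the {{int:Coll-attribution-page|..}} transclusion.
function buildAttributionWikitext(articleAuthors, imageCredits, sourceLinks) {
	// Render an array of strings as a single HTML list literal.
	function toList(items) {
		return '<ul>' + items.map(function (i) {
			return '<li>' + i + '</li>';
		}).join('') + '</ul>';
	}
	// All three lists end up embedded as template arguments, so the
	// tokenizer must buffer each of them in full before emitting tokens.
	return '{{int:Coll-attribution-page|' +
		toList(articleAuthors) + '|' +
		toList(imageCredits) + '|' +
		toList(sourceLinks) + '}}';
}
```

With thousands of contributors per list, each argument becomes a multi-megabyte string inside a single transclusion token.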

A simplified version of this can be simulated with {{echo|x\n*a\n*b\n*c}}.

[subbu@earth lib] echo '{{echo|x\n*a\n*b\n*c\n*d}}' | node parse --trace peg
0-[peg]        | ---->   [{"type":"SelfclosingTagTk","name":"template","attribs":[{"k":"echo","v":"","srcOffsets":[2,6]},{"k":"","v":["x",{"type":"NlTk","dataAttribs":{"tsr":[8,9]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[9,10]},"bullets":["*"]},"a",{"type":"NlTk","dataAttribs":{"tsr":[11,12]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[12,13]},"bullets":["*"]},"b",{"type":"NlTk","dataAttribs":{"tsr":[14,15]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[15,16]},"bullets":["*"]},"c",{"type":"NlTk","dataAttribs":{"tsr":[17,18]}},"*d"],"srcOffsets":[7,7,7,20]}],"dataAttribs":{"tsr":[0,22],"src":"{{echo|x\n*a\n*b\n*c\n*d}}"}}]
...
...
[subbu@earth lib] echo 'x\n*a\n*b\n*c\n*d' | node parse --trace peg
0-[peg]        | ---->   ["x"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[1,2]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[2,3]},"bullets":["*"]},"a"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[4,5]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[5,6]},"bullets":["*"]},"b"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[7,8]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[8,9]},"bullets":["*"]},"c"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[10,11]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[11,12]},"bullets":["*"]},"d"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[13,14]}}]
0-[peg]        | ---->   [{"type":"EOFTk"}]
...
...

Notice how in the first case, the first chunk returned by the tokenizer includes the entire list embedded in the template argument. So, this is back to the scenario that was fixed as part of T85744.

This is the crux of the problem that I believe is responsible for T114558 as well.

I think this humongous-template-arguments scenario is the reason why large tables are composed with multiple templates ({{table-start}}\n.. table rows ..\n{{table-end}}). I think OCG should find a different way of generating attribution text instead of embedding entire lists in the {{int:Coll-attribution-page|..}} transclusion.
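One possible alternative shape for OCG (a sketch under my own assumptions, not the actual fix or the OCG API): keep the transclusion tiny by passing short placeholder tokens as the template arguments, then splice the pre-rendered lists into the parsed output afterwards, so the tokenizer never sees the huge lists.

```javascript
// Hypothetical sketch: small transclusion with placeholder arguments,
// plus a post-parse step that substitutes the real lists into the HTML.
function buildAttributionWithPlaceholders(lists) {
	var args = lists.map(function (_, i) { return '@@LIST' + i + '@@'; });
	return {
		// This is what would be sent to the parser: no embedded lists.
		wikitext: '{{int:Coll-attribution-page|' + args.join('|') + '}}',
		// After parsing, replace each placeholder with its rendered <ul>.
		splice: function (parsedHtml) {
			return lists.reduce(function (html, items, i) {
				var ul = '<ul>' + items.map(function (x) {
					return '<li>' + x + '</li>';
				}).join('') + '</ul>';
				return html.replace('@@LIST' + i + '@@', ul);
			}, parsedHtml);
		}
	};
}
```

The parser's work then stays proportional to the message text, not to the number of contributors.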

The attribs in the first example don't look right.

"attribs":[{"k":"echo","v":"","srcOffsets":[2,6]},{"k":"","v":["x",{"type":"NlTk","dataAttribs":{"tsr":[8,9]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[9,10]},"bullets":["*"]},"a",{"type":"NlTk","dataAttribs":{"tsr":[11,12]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[12,13]},"bullets":["*"]},"b",{"type":"NlTk","dataAttribs":{"tsr":[14,15]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[15,16]},"bullets":["*"]},"c",{"type":"NlTk","dataAttribs":{"tsr":[17,18]}},"*d"]

The last item, *d, is wrong. I've written a fix: https://gerrit.wikimedia.org/r/244413

I think this humongous-template-arguments scenario is the reason why large tables are composed with multiple templates ({{table-start}}\n.. table rows ..\n{{table-end}}).

I'd like to hear @tstarling's thoughts here, too. I seem to recall he mentioned huge template arguments as an issue with the hygienic-argument proposal, and then perhaps muttered something about how it could be resolved (from the PHP side). I don't remember the details well enough to know whether PHP and Parsoid have similar issues (or whether Parsoid might even be triggering an issue on the PHP side when it passes along these large arguments).

ssastry renamed this task from Title timing out to OCG Attribution request times out regularly.Oct 16 2015, 2:46 PM
ssastry edited projects, added OfflineContentGenerator; removed Parsoid.

As already announced in Tech News, OfflineContentGenerator (OCG) will no longer be used on Wikimedia sites after October 1st, 2017. OCG will be replaced by Electron. You can read more on mediawiki.org.

Declining, since OCG has been turned off.