enwiki:MediaWiki:Coll-attribution-page seems to be the title timing out the most.
Version: unspecified
Severity: normal
Request timeouts at:
https://logstash.wikimedia.org/#/dashboard/elasticsearch/parsoid-req-timeouts
We can generate a CPU profile page once this change goes out:
https://github.com/wikimedia/parsoid/commit/29a17b13fd50d15754008296d9b72be56ba53751
enwiki:MediaWiki:Coll-attribution-page is some sort of special interface page. The source of the timeouts is probably the OCG bundler's calls at, https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler/blob/master/lib/attribution.js#L233
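For context, here is a simplified sketch (my reading of the linked attribution.js, with illustrative function names, not the actual code) of how the bundler ends up embedding whole lists as transclusion arguments:

```javascript
// Simplified sketch -- not the actual attribution.js code. Each credit list
// is rendered as one big HTML <ul> string and passed inline as an argument
// of a single {{int:Coll-attribution-page|..}} transclusion. The function
// names here are illustrative, not taken from the repo.
function buildList(items) {
	// Join every item into a single <ul>...</ul> string.
	return '<ul>' + items.map(function(i) {
		return '<li>' + i + '</li>';
	}).join('') + '</ul>';
}

function buildAttributionWikitext(textCredits, imageCredits, sourceCredits) {
	// All three lists end up embedded as arguments of one transclusion,
	// so the tokenizer has to buffer them in full before it can proceed.
	return '{{int:Coll-attribution-page|' +
		buildList(textCredits) + '|' +
		buildList(imageCredits) + '|' +
		buildList(sourceCredits) + '}}';
}

console.log(buildAttributionWikitext(
	['Article A', 'Article B'], ['File:X.jpg'], ['Source 1']));
```

On a real attribution page each list can contain thousands of items, so the single transclusion argument becomes enormous.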
I think this is a different incarnation of T85744, which caused https://wikitech.wikimedia.org/wiki/Incident_documentation/20150103-Parsoid.
Looking at the code here ( https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler/blob/45e8a2fdf96fac53ab5b5e6e9b03460173edf25a/lib/attribution.js#L224-L229 ), it is effectively generating this wikitext:
{{int:Coll-attribution-page|<ul><li>item-1</li>...<li>item-n</li></ul>|..another list..|..another list..}}
If you parse this wikitext, the tokenizer holds onto the memory of the entire list (for all 3 lists) since each of them is embedded as arguments of the transclusion. This was a very similar scenario that led to T85744 and that Jan 03 2015 outage which I fixed by making the tokenizer process lists one item at a time.
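To illustrate the buffering difference abstractly (this is not Parsoid's PEG tokenizer, just a sketch of the two emission strategies):

```javascript
// Illustrative sketch only -- not Parsoid's actual tokenizer. It contrasts
// buffering a whole list into one token chunk (what happens when the list
// is embedded in a template argument) with yielding one item per chunk
// (the top-level behavior the T85744 fix introduced).

// Buffered: the entire list is accumulated into a single token chunk.
function tokenizeBuffered(listSrc) {
	var tokens = listSrc.split('\n').filter(Boolean).map(function(line) {
		return { type: 'listItem', text: line.replace(/^\*/, '') };
	});
	return [tokens]; // one chunk holding every item
}

// Streaming: each item is emitted as its own chunk, so earlier chunks
// can be processed (and freed) before later ones are tokenized.
function* tokenizeStreaming(listSrc) {
	var lines = listSrc.split('\n').filter(Boolean);
	for (var i = 0; i < lines.length; i++) {
		yield [{ type: 'listItem', text: lines[i].replace(/^\*/, '') }];
	}
}

var src = '*a\n*b\n*c';
console.log(tokenizeBuffered(src).length);              // 1 chunk
console.log(Array.from(tokenizeStreaming(src)).length); // 3 chunks
```

With a transcluded list, the whole thing lands in the first (buffered) shape, which is exactly what the trace below shows.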
A simplified version of this can be simulated with {{echo|x\n*a\n*b\n*c}}.
[subbu@earth lib] echo '{{echo|x\n*a\n*b\n*c\n*d}}' | node parse --trace peg
0-[peg] | ----> [{"type":"SelfclosingTagTk","name":"template","attribs":[{"k":"echo","v":"","srcOffsets":[2,6]},{"k":"","v":["x",{"type":"NlTk","dataAttribs":{"tsr":[8,9]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[9,10]},"bullets":["*"]},"a",{"type":"NlTk","dataAttribs":{"tsr":[11,12]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[12,13]},"bullets":["*"]},"b",{"type":"NlTk","dataAttribs":{"tsr":[14,15]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[15,16]},"bullets":["*"]},"c",{"type":"NlTk","dataAttribs":{"tsr":[17,18]}},"*d"],"srcOffsets":[7,7,7,20]}],"dataAttribs":{"tsr":[0,22],"src":"{{echo|x\n*a\n*b\n*c\n*d}}"}}]
...

[subbu@earth lib] echo 'x\n*a\n*b\n*c\n*d' | node parse --trace peg
0-[peg] | ----> ["x"]
0-[peg] | ----> [{"type":"NlTk","dataAttribs":{"tsr":[1,2]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[2,3]},"bullets":["*"]},"a"]
0-[peg] | ----> [{"type":"NlTk","dataAttribs":{"tsr":[4,5]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[5,6]},"bullets":["*"]},"b"]
0-[peg] | ----> [{"type":"NlTk","dataAttribs":{"tsr":[7,8]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[8,9]},"bullets":["*"]},"c"]
0-[peg] | ----> [{"type":"NlTk","dataAttribs":{"tsr":[10,11]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[11,12]},"bullets":["*"]},"d"]
0-[peg] | ----> [{"type":"NlTk","dataAttribs":{"tsr":[13,14]}}]
0-[peg] | ----> [{"type":"EOFTk"}]
...
Notice how in the first case the very first chunk returned by the tokenizer includes the entire list embedded in the template argument, while in the second case each list item arrives in its own chunk. So this is back to the scenario that was fixed as part of T85744.
This is the crux of the problem that I believe is responsible for T114558 as well.
I think this humongous-template-arguments scenario is the reason why large tables are composed with multiple templates ({{table-start}}\n.. table rows ..\n{{table-end}}). I think OCG should find a different way of generating attribution text instead of embedding entire lists in the {{int:Coll-attribution-page|..}} transclusion.
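One possible restructuring (an assumption on my part, not a vetted design; the int: message and its parameter handling would need to change accordingly) is to keep the transclusion itself small and emit the credit lists as top-level wikitext, which the tokenizer can stream one item at a time:

```javascript
// Hypothetical alternative (function name and approach are illustrative,
// not from the OCG codebase): instead of embedding all three lists as
// arguments of one {{int:..}} transclusion, emit the lists as top-level
// wikitext list items. Top-level lists are tokenized one item per chunk,
// avoiding the buffering problem described above.
function buildAttributionWikitextStreamed(creditLists) {
	var out = ['{{int:Coll-attribution-page}}'];
	creditLists.forEach(function(list) {
		list.forEach(function(item) {
			out.push('* ' + item); // top-level list item, streamed per-item
		});
		out.push(''); // blank line terminates each list
	});
	return out.join('\n');
}

console.log(buildAttributionWikitextStreamed(
	[['Article A', 'Article B'], ['File:X.jpg']]));
```

The point of the sketch is only the shape of the output: the heavyweight content sits outside any template argument.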
The attribs in the first example don't look right.
"attribs":[{"k":"echo","v":"","srcOffsets":[2,6]},{"k":"","v":["x",{"type":"NlTk","dataAttribs":{"tsr":[8,9]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[9,10]},"bullets":["*"]},"a",{"type":"NlTk","dataAttribs":{"tsr":[11,12]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[12,13]},"bullets":["*"]},"b",{"type":"NlTk","dataAttribs":{"tsr":[14,15]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[15,16]},"bullets":["*"]},"c",{"type":"NlTk","dataAttribs":{"tsr":[17,18]}},"*d"]
The last item, *d, is wrong. I've written: https://gerrit.wikimedia.org/r/244413
I'd like to hear @tstarling's thoughts here, too. I seem to recall he mentioned something about huge template arguments as an issue with the hygienic-argument proposal, and then perhaps muttered something about how it could be resolved (from the PHP side). I don't remember the details well enough to know whether PHP and Parsoid have similar issues (or whether Parsoid might even be triggering an issue on the PHP side when it passes along these large arguments).
As already announced in Tech News, OfflineContentGenerator (OCG) will no longer be used on Wikimedia sites after October 1st, 2017. OCG will be replaced by Electron. You can read more on mediawiki.org.