
OCG Attribution request times out regularly
Closed, Declined · Public

Description

enwiki:MediaWiki:Coll-attribution-page seems to be the title timing out the most.


Version: unspecified
Severity: normal

Details

Reference
bz73412

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 3:57 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz73412.

Ah, so, we have a handful of pages that are timing out.

enwiki:MediaWiki:Coll-attribution-page is some sort of special interface page. The source of the timeouts is probably the OCG bundler's calls at https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler/blob/master/lib/attribution.js#L233

I think this is a different incarnation of T85744, which caused https://wikitech.wikimedia.org/wiki/Incident_documentation/20150103-Parsoid.

Looking at the code here ( https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler/blob/45e8a2fdf96fac53ab5b5e6e9b03460173edf25a/lib/attribution.js#L224-L229 ), it is effectively generating this wikitext:

{{int:Coll-attribution-page|<ul><li>item-1</li>...<li>item-n</li></ul>|..another list..|..another list..}}

If you parse this wikitext, the tokenizer holds onto the entire list in memory (for all 3 lists), since each of them is embedded as an argument of the transclusion. A very similar scenario led to T85744 and the Jan 03 2015 outage, which I fixed by making the tokenizer process lists one item at a time.
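Based on the code linked above, the attribution wikitext is effectively assembled like this (a hypothetical sketch; the function and parameter names are illustrative, not the actual OCG bundler API):

```javascript
// Illustrative sketch of what lib/attribution.js effectively produces:
// each attribution list is rendered as one big <ul> and passed as a
// single argument of the {{int:Coll-attribution-page|..}} transclusion.
function buildAttributionWikitext(articleAuthors, imageCredits, sourceLinks) {
	// Render an array of strings as a single HTML list literal.
	function toList(items) {
		return '<ul>' + items.map(function (i) {
			return '<li>' + i + '</li>';
		}).join('') + '</ul>';
	}
	// All three lists end up embedded as template arguments, so the
	// tokenizer must buffer each of them in full before emitting tokens.
	return '{{int:Coll-attribution-page|' +
		toList(articleAuthors) + '|' +
		toList(imageCredits) + '|' +
		toList(sourceLinks) + '}}';
}
```

With thousands of contributors per list, each argument becomes a multi-megabyte string inside a single transclusion token.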

A simplified version of this can be simulated with {{echo|x\n*a\n*b\n*c}}.

[subbu@earth lib] echo '{{echo|x\n*a\n*b\n*c\n*d}}' | node parse --trace peg
0-[peg]        | ---->   [{"type":"SelfclosingTagTk","name":"template","attribs":[{"k":"echo","v":"","srcOffsets":[2,6]},{"k":"","v":["x",{"type":"NlTk","dataAttribs":{"tsr":[8,9]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[9,10]},"bullets":["*"]},"a",{"type":"NlTk","dataAttribs":{"tsr":[11,12]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[12,13]},"bullets":["*"]},"b",{"type":"NlTk","dataAttribs":{"tsr":[14,15]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[15,16]},"bullets":["*"]},"c",{"type":"NlTk","dataAttribs":{"tsr":[17,18]}},"*d"],"srcOffsets":[7,7,7,20]}],"dataAttribs":{"tsr":[0,22],"src":"{{echo|x\n*a\n*b\n*c\n*d}}"}}]
...
...
[subbu@earth lib] echo 'x\n*a\n*b\n*c\n*d' | node parse --trace peg
0-[peg]        | ---->   ["x"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[1,2]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[2,3]},"bullets":["*"]},"a"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[4,5]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[5,6]},"bullets":["*"]},"b"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[7,8]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[8,9]},"bullets":["*"]},"c"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[10,11]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[11,12]},"bullets":["*"]},"d"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[13,14]}}]
0-[peg]        | ---->   [{"type":"EOFTk"}]
...
...

Notice how in the first case, the first chunk returned by the tokenizer includes the entire list embedded in the template argument. So, this is back to the scenario that was fixed as part of T85744.

This is the crux of the problem that I believe is responsible for T114558 as well.

I think this humongous-template-arguments scenario is the reason why large tables are composed with multiple templates ({{table-start}}\n.. table rows ..\n{{table-end}}). I think OCG should find a different way of generating attribution text instead of embedding entire lists in the {{int:Coll-attribution-page|..}} transclusion.
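One possible alternative shape for OCG (a sketch under my own assumptions, not the actual fix or the OCG API): keep the transclusion tiny by passing short placeholder tokens as the template arguments, then splice the pre-rendered lists into the parsed output afterwards, so the tokenizer never sees the huge lists.

```javascript
// Hypothetical sketch: small transclusion with placeholder arguments,
// plus a post-parse step that substitutes the real lists into the HTML.
function buildAttributionWithPlaceholders(lists) {
	var args = lists.map(function (_, i) { return '@@LIST' + i + '@@'; });
	return {
		// This is what would be sent to the parser: no embedded lists.
		wikitext: '{{int:Coll-attribution-page|' + args.join('|') + '}}',
		// After parsing, replace each placeholder with its rendered <ul>.
		splice: function (parsedHtml) {
			return lists.reduce(function (html, items, i) {
				var ul = '<ul>' + items.map(function (x) {
					return '<li>' + x + '</li>';
				}).join('') + '</ul>';
				return html.replace('@@LIST' + i + '@@', ul);
			}, parsedHtml);
		}
	};
}
```

The parser's work then stays proportional to the message text, not to the number of contributors.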

The attribs in the first example don't look right.

"attribs":[{"k":"echo","v":"","srcOffsets":[2,6]},{"k":"","v":["x",{"type":"NlTk","dataAttribs":{"tsr":[8,9]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[9,10]},"bullets":["*"]},"a",{"type":"NlTk","dataAttribs":{"tsr":[11,12]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[12,13]},"bullets":["*"]},"b",{"type":"NlTk","dataAttribs":{"tsr":[14,15]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[15,16]},"bullets":["*"]},"c",{"type":"NlTk","dataAttribs":{"tsr":[17,18]}},"*d"]

The last item, *d, is wrong. I've written a fix: https://gerrit.wikimedia.org/r/244413

I think this humongous-template-arguments scenario is the reason why large tables are composed with multiple templates ({{table-start}}\n.. table rows ..\n{{table-end}}).

I'd like to hear @tstarling's thoughts here, too. I seem to recall he mentioned huge template arguments as an issue with the hygienic-argument proposal, and then perhaps muttered something about how it could be resolved (from the PHP side). I don't remember the details well enough to know whether PHP and Parsoid have similar issues (or whether Parsoid might even be triggering an issue on the PHP side when it passes along these large arguments).

ssastry renamed this task from Title timing out to OCG Attribution request times out regularly.Oct 16 2015, 2:46 PM
ssastry edited projects, added OfflineContentGenerator; removed Parsoid.

As already announced in Tech News, OfflineContentGenerator (OCG) will no longer be used on Wikimedia sites after October 1st, 2017. OCG will be replaced by Electron. You can read more on mediawiki.org.

Declining, since OCG has been turned off.