OCG Attribution request times out regularly
Open, High, Public


enwiki:MediaWiki:Coll-attribution-page seems to be the title timing out the most.

Version: unspecified
Severity: normal


bzimport raised the priority of this task to High.
bzimport set Reference to bz73412.
ssastry created this task. Nov 14 2014, 5:11 AM

Ah, so, we have a handful of pages that are timing out.

ssastry moved this task from Backlog to Performance on the Parsoid board. Dec 20 2014, 12:53 AM

enwiki:MediaWiki:Coll-attribution-page is some sort of special interface page. The source of the timeouts is probably the OCG bundler's calls at https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler/blob/master/lib/attribution.js#L233

marcoil set Security to None.
marcoil moved this task from Performance to Backlog on the Parsoid board. Feb 13 2015, 12:51 PM
Arlolra moved this task from Backlog to In Progress on the Parsoid board. Apr 2 2015, 5:13 PM
ssastry added a comment (edited). Oct 8 2015, 4:30 AM

I think this is a different incarnation of T85744, which caused https://wikitech.wikimedia.org/wiki/Incident_documentation/20150103-Parsoid.

Looking at the code here ( https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler/blob/45e8a2fdf96fac53ab5b5e6e9b03460173edf25a/lib/attribution.js#L224-L229 ), it is effectively generating this wikitext:

{{int:Coll-attribution-page|<ul><li>item-1</li>...<li>item-n</li></ul>|..another list..|..another list..}}

If you parse this wikitext, the tokenizer holds the entire contents of all three lists in memory, since each list is embedded as an argument of the transclusion. A very similar scenario led to T85744 and the Jan 03 2015 outage, which I fixed by making the tokenizer process lists one item at a time.
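To make the memory issue concrete, here is a minimal sketch of the pattern the bundler effectively follows. The function name and shapes are assumed for illustration and are not the real attribution.js code; the point is that every list is rendered in full and concatenated into one transclusion string, so the tokenizer must buffer everything before it can emit a single template token.

```javascript
// Hypothetical sketch (assumed names, not the actual attribution.js code):
// each attribution list is rendered as a complete <ul> and passed as one
// template argument, so the whole thing becomes a single transclusion string.
function buildAttributionWikitext(lists) {
  const args = lists.map(
    (items) => '<ul>' + items.map((i) => `<li>${i}</li>`).join('') + '</ul>'
  );
  return `{{int:Coll-attribution-page|${args.join('|')}}}`;
}

// With thousands of credits per list, this string (and the template token
// the tokenizer builds for it) grows without bound.
const wt = buildAttributionWikitext([
  ['image-credit-1', 'image-credit-2'],
  ['text-author-1'],
  ['source-1'],
]);
console.log(wt);
```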

A simplified version of this can be simulated with {{echo|x\n*a\n*b\n*c}}.

[subbu@earth lib] echo '{{echo|x\n*a\n*b\n*c\n*d}}' | node parse --trace peg
0-[peg]        | ---->   [{"type":"SelfclosingTagTk","name":"template","attribs":[{"k":"echo","v":"","srcOffsets":[2,6]},{"k":"","v":["x",{"type":"NlTk","dataAttribs":{"tsr":[8,9]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[9,10]},"bullets":["*"]},"a",{"type":"NlTk","dataAttribs":{"tsr":[11,12]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[12,13]},"bullets":["*"]},"b",{"type":"NlTk","dataAttribs":{"tsr":[14,15]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[15,16]},"bullets":["*"]},"c",{"type":"NlTk","dataAttribs":{"tsr":[17,18]}},"*d"],"srcOffsets":[7,7,7,20]}],"dataAttribs":{"tsr":[0,22],"src":"{{echo|x\n*a\n*b\n*c\n*d}}"}}]
[subbu@earth lib] echo 'x\n*a\n*b\n*c\n*d' | node parse --trace peg
0-[peg]        | ---->   ["x"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[1,2]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[2,3]},"bullets":["*"]},"a"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[4,5]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[5,6]},"bullets":["*"]},"b"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[7,8]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[8,9]},"bullets":["*"]},"c"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[10,11]}},{"type":"TagTk","name":"listItem","attribs":[],"dataAttribs":{"tsr":[11,12]},"bullets":["*"]},"d"]
0-[peg]        | ---->   [{"type":"NlTk","dataAttribs":{"tsr":[13,14]}}]
0-[peg]        | ---->   [{"type":"EOFTk"}]

Notice how in the first case, the first chunk returned by the tokenizer includes the entire list embedded in the argument of the template. So, this is back to the scenario that was fixed as part of T85744.

This is the crux of the problem that I believe is responsible for T114558 as well.

I think this humongous-template-arguments scenario is the reason why large tables are composed with multiple templates ({{table-start}}\n.. table rows ..\n{{table-end}}). I think OCG should find a different way of generating attribution text instead of embedding entire lists in the {{int:Coll-attribution-page|..}} transclusion.
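One possible alternative, sketched below under assumed names (this is not an existing Collection API): parse the {{int:...}} message with small placeholder arguments, parse each list as standalone top-level wikitext (so a streaming tokenizer can emit one listItem token at a time), and only afterwards substitute the rendered lists into the message HTML.

```javascript
// Hypothetical alternative (assumed function names, not real Collection code).
// Step 1: build the message with tiny placeholder arguments plus one small
// standalone wikitext document per list.
function buildAttributionPieces(lists) {
  const placeholders = lists.map((_, i) => `$ATTRIB_LIST_${i}$`);
  const message = `{{int:Coll-attribution-page|${placeholders.join('|')}}}`;
  // Each list is its own top-level document, so list items can stream
  // through the tokenizer one line at a time instead of arriving inside
  // one giant template token.
  const listWikitext = lists.map((items) => items.map((i) => `*${i}`).join('\n'));
  return { message, listWikitext, placeholders };
}

// Step 2: after both the message and the lists have been rendered to HTML
// separately, splice the list HTML back in. split/join avoids the special
// meaning of "$" in String.prototype.replace replacement strings.
function substituteLists(renderedMessage, placeholders, renderedLists) {
  return placeholders.reduce(
    (html, ph, i) => html.split(ph).join(renderedLists[i]),
    renderedMessage
  );
}
```

This keeps each parse call small and bounded; the only unbounded work left is plain string substitution on already-rendered HTML.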

Restricted Application added a subscriber: Aklapper. Oct 8 2015, 4:30 AM
ssastry assigned this task to cscott. Oct 8 2015, 4:46 AM

The attribs in the first example don't look right.


The last item, *d, is wrong. I've written: https://gerrit.wikimedia.org/r/244413

I think this humongous-template-arguments scenario is the reason why large tables are composed with multiple templates ({{table-start}}\n.. table rows ..\n{{table-end}}).

I'd like to hear @tstarling's thoughts here, too. I seem to recall he mentioned something about huge template arguments as an issue with the hygienic-argument proposal, and then perhaps muttered something about how it could be resolved (from the PHP side). I don't remember the details well enough to know whether PHP and Parsoid have similar issues (or whether Parsoid might even be triggering an issue on the PHP side when it passes along these large arguments).

ssastry renamed this task from "Title timing out" to "OCG Attribution request times out regularly". Oct 16 2015, 2:46 PM
ssastry edited projects, added OfflineContentGenerator; removed Parsoid.

As already announced in Tech News, OfflineContentGenerator (OCG) will not be used anymore after October 1st, 2017 on Wikimedia sites. OCG will be replaced by Electron. You can read more on mediawiki.org.