
Redlinks DOM pass adds mw api + network latency to parse time
Closed, Declined, Public

Description

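Currently, redlinks are added during the page parse by a final, serial DOM post-processing pass, because it reuses the facilities that were added to update redlinks in the html2html endpoint. The pass has to wait for its MediaWiki API request, so the full API and network latency is added on top of parse time.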

https://github.com/wikimedia/parsoid/blob/master/lib/wt2html/DOMPostProcessor.js#L323-L329
https://github.com/wikimedia/parsoid/blob/master/lib/api/routes.js#L604
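A rough, idealized model of the cost follows. It uses no measurements and assumes a single batched request. Let T_parse be the time for the rest of the parse and T_api the MediaWiki API plus network round trip for the redlinks lookup. Let t_0 be the point during the parse at which that request could first be issued. The redlinks pass's own local DOM work is left out, because it is the same in both cases:

```latex
\begin{aligned}
T_{\text{serial}}   &= T_{\text{parse}} + T_{\text{api}} \\
T_{\text{prefetch}} &\approx \max\left(T_{\text{parse}},\; t_0 + T_{\text{api}}\right), \qquad 0 \le t_0 \le T_{\text{parse}} \\
T_{\text{serial}} - T_{\text{prefetch}} &\approx \min\left(T_{\text{api}},\; T_{\text{parse}} - t_0\right)
\end{aligned}
```

Under this model, the latency is fully hidden only if the request goes out at least T_api before the parse would otherwise finish. When there is no other work to overlap with (T_parse - t_0 = 0), nothing is saved.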

We could probably fire off pageprop requests earlier and cache the results while the parse is still running; see:
https://github.com/wikimedia/mediawiki-extensions-ParsoidBatchAPI/commit/57fdabb2007437bef4e3f8b03e4593372d7d9974#diff-b841ebdae64b2bed8c4e668239c8ee3e

A similar mechanism could be useful for T153080, so that all the imageinfo is already present when that DOM pass runs.
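Below is a minimal sketch of the prefetch-and-cache idea. It is written in JavaScript because that is what Parsoid/JS uses. Everything in it is hypothetical. `RedlinkPrefetcher` is not a Parsoid class. `fetchPageProps` is an assumed stand-in for whatever issues the batched pageprops request, and it is assumed to resolve to a `Map` from title to page info. The idea is that link targets are noted as the parse encounters them and sent off in batches. By the time the final redlinks pass calls `get()`, most lookups are then already in flight or finished.

```javascript
'use strict';

// Hypothetical sketch, not Parsoid code: prefetch page properties for link
// targets while the parse is still running, so the final redlinks DOM pass
// mostly reads already-started results instead of starting a new API request.
class RedlinkPrefetcher {
  // fetchPageProps(titles) is an assumed stand-in for the batched pageprops
  // request; it must return a Promise of a Map from title to page info.
  constructor(fetchPageProps, batchSize = 50) {
    this.fetchPageProps = fetchPageProps;
    this.batchSize = batchSize;
    this.pending = new Set(); // titles seen but not yet requested
    this.results = new Map(); // title -> Promise of that title's page info
  }

  // Call as each link target is encountered during the parse.
  note(title) {
    if (this.results.has(title) || this.pending.has(title)) {
      return;
    }
    this.pending.add(title);
    if (this.pending.size >= this.batchSize) {
      this.flush();
    }
  }

  // Send one batched request covering every pending title.
  flush() {
    if (this.pending.size === 0) {
      return;
    }
    const titles = [...this.pending];
    this.pending.clear();
    const batch = this.fetchPageProps(titles);
    for (const title of titles) {
      const info = batch.then((byTitle) => byTitle.get(title));
      // Mark the promise as handled so a failed batch does not raise an
      // unhandled-rejection error for titles nobody asks about; callers of
      // get() still see the rejection.
      info.catch(() => {});
      this.results.set(title, info);
    }
  }

  // Call from the final redlinks pass. Anything still pending is flushed,
  // so only titles first seen late in the parse pay the full round trip.
  get(title) {
    this.note(title);
    this.flush();
    return this.results.get(title);
  }
}
```

Batching keeps the number of API round trips small while still starting requests early. The same shape could plausibly hold imageinfo for T153080 by passing in a different fetch function.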

Event Timeline

ssastry renamed this task from "Using batch API for redlinks adds full network i/o latency to parse time" to "Using batch API for redlinks DOM pass adds its full mw api + network latency to parse time" (Oct 17 2018, 2:34 PM).
ssastry triaged this task as Medium priority.
ssastry renamed this task from "Using batch API for redlinks DOM pass adds its full mw api + network latency to parse time" to "Using batch API for redlinks DOM pass adds mw api + network latency to parse time" (Oct 17 2018, 2:45 PM).

Adding redlinks is only enabled when batching is in use, so you're comparing against a no-op:
https://github.com/wikimedia/parsoid/blob/master/lib/wt2html/DOMPostProcessor.js#L323-L329

It currently reuses the facilities added for updating redlinks on a page with the html2html endpoint,
https://github.com/wikimedia/parsoid/blob/master/lib/api/routes.js#L604
where there is nowhere to hide the latency.

We could probably fire off pageprop requests earlier and cache the results during a parse, though:
https://github.com/wikimedia/mediawiki-extensions-ParsoidBatchAPI/commit/57fdabb2007437bef4e3f8b03e4593372d7d9974#diff-b841ebdae64b2bed8c4e668239c8ee3e

A similar mechanism will be needed for T153080, so that all the imageinfo is present when the DOM pass runs.

Arlolra renamed this task from "Using batch API for redlinks DOM pass adds mw api + network latency to parse time" to "Redlinks DOM pass adds mw api + network latency to parse time" (Oct 17 2018, 7:35 PM).
Arlolra updated the task description.
Arlolra lowered the priority of this task from Medium to Low (Oct 18 2018, 5:50 PM).

12:02 <+subbu> arlolra, reg. the redlinks and media passes and the serial processing for those 
               ... that probably doesn't matter for the PHP port, does it? unless there are 
               ways of structuring those requests that benefits from mysql or other caching 
               across page parse requests.
12:05 <+arlolra> yeah, I wasn't planning on doing it ... but the imageinfo requests would be a 
                 short term regression
12:08 <+subbu> have you looked at how the php parser does the media and redlink processing .. 
               i.e. are they also post-processing passes or are they looked up one at a time .. 
               iirc, tim had mentioned that it does batching too.
12:09 <+arlolra> i have not
12:09 <+arlolra> but i'll check it out
12:09 <+subbu> k. asking in case the restructuring should be done in a particular way to 
               benefit form whatever existing code there is.
12:09 <+subbu> *from
Arlolra subscribed.

Parsoid/JS is not in use any more.