Extension and transclusion content is no longer being reused from cache for parse jobs from the job queue
Open, Medium, Public

Description

While working to reduce DOM size to improve VE load performance, we removed a lot of information that was lingering in the HTML unnecessarily. One thing we removed is the "data-parsoid.src" attribute for extensions and transclusions, since it is no longer used during serialization and was considered useless baggage.

However, it turns out that the template/extension reuse cache is keyed on the "data-parsoid.src" attribute. As a result, ever since we deployed the code to reduce DOM size, we have not been reusing these expansions from the cache.

See https://github.com/wikimedia/parsoid/blob/10392facfc68bd821273892507343af7f42c4844/lib/mediawiki.DOMUtils.js#L1569-L1591

We need to fix that code to build the cache key from other available information (or, if that is not possible, revive dp.src, provided reuse is considered critical for performance).
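To illustrate the failure mode described above, here is a minimal sketch, not Parsoid's actual code. The function `buildExpansionCache` and the node shape (`dataParsoid`, `html`) are hypothetical stand-ins. It assumes a cache that is populated only for nodes carrying a `src` string, so stripping `src` silently leaves the cache empty.

```javascript
// Hypothetical sketch of a reuse cache keyed on dataParsoid.src.
// Names and node shape are assumptions for illustration only.
function buildExpansionCache(nodes) {
  const cache = new Map();
  for (const node of nodes) {
    const src = node.dataParsoid && node.dataParsoid.src;
    if (typeof src === 'string') {
      cache.set(src, node.html);
    }
    // Nodes without src are skipped, so nothing is cached for them.
  }
  return cache;
}

// Old HTML: src is still present, so the expansion can be found again.
const before = buildExpansionCache([
  { dataParsoid: { src: '{{echo|foo}}' }, html: '<span>foo</span>' },
]);

// New HTML: src was stripped to reduce DOM size, so the lookup misses.
const after = buildExpansionCache([
  { dataParsoid: {}, html: '<span>foo</span>' },
]);

console.log(before.has('{{echo|foo}}')); // true
console.log(after.has('{{echo|foo}}'));  // false
```

No error is raised in the second case, which would be consistent with the regression going unnoticed after deployment.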

Event Timeline

ssastry raised the priority of this task from to Medium.
ssastry updated the task description.
ssastry added a project: Parsoid.
ssastry added subscribers: ssastry, GWicke.

We could just concatenate the target and the arguments as wikitext, or even serialize the transclusion back to wikitext at that point (which would take more time).

One reasonably space-efficient option could be to store the SHA-1 of the original source. All of this lives in data-parsoid anyway, so size isn't so critical.
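The two ideas above (key on target plus arguments, and store only a SHA-1) could be combined roughly as follows. This is a sketch under stated assumptions: `expansionCacheKey` is a hypothetical helper, and the argument list is assumed to be available as `[name, value]` pairs in source order. It uses `JSON.stringify` instead of plain concatenation so that `("a", "bc")` and `("ab", "c")` cannot collide.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a fixed-size cache key from a transclusion's
// target and its arguments, without needing the original source string.
// args is assumed to be an array of [name, value] pairs in source order.
function expansionCacheKey(target, args) {
  // JSON encoding keeps field boundaries unambiguous.
  const canonical = JSON.stringify([target, args]);
  // SHA-1 is used here only as a compact fingerprint, not for security.
  return crypto.createHash('sha1').update(canonical, 'utf8').digest('hex');
}

const keyA = expansionCacheKey('echo', [['1', 'foo']]);
const keyB = expansionCacheKey('echo', [['1', 'foo']]);
const keyC = expansionCacheKey('echo', [['1', 'bar']]);

console.log(keyA === keyB); // true
console.log(keyA === keyC); // false
```

The hex digest is always 40 characters, so the stored key stays small regardless of how long the template call is.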

On IRC, @ssastry voiced concerns about the interaction of reuse with older, stored content. I think this can be alleviated by handling upgrades separately, via T114413: Support various conversions in Parsoid's pb2pb endpoint. With this in place, the reuse part shouldn't need to worry about out-of-date content.

This is probably a good thing, since RESTBase wasn't communicating the right mode to us: T114413#2290995

Pchelolo subscribed.

Currently we also supply the original HTML and data-parsoid for wikitext/to/html transformations. This caused a bug in RESTBase: the transformation failed when the original was not present and the attempt to fetch it returned a 404. Since the optimization is currently disabled in Parsoid, we will stop fetching the original to save some round trips to Cassandra. If and when the optimization is reintroduced, we will need to start fetching the original again, but make sure we gracefully handle the case where it is not there.
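The graceful handling mentioned above might look something like the sketch below. It is not RESTBase code: `fetchOriginal`, `getOriginalOrNull`, `buildTransformRequest`, and the request shape are all hypothetical, and the fetcher is assumed to reject with an error carrying `status: 404` when no original is stored.

```javascript
// Hypothetical: fetchOriginal(title, revision) resolves with the stored
// original ({ html, dataParsoid }) or rejects with an error whose
// status property is 404 when nothing is stored.
async function getOriginalOrNull(fetchOriginal, title, revision) {
  try {
    return await fetchOriginal(title, revision);
  } catch (err) {
    if (err && err.status === 404) {
      // No stored original: fall back to a parse without reuse.
      return null;
    }
    // Any other failure is a real error and should still surface.
    throw err;
  }
}

// Build the transform request, attaching the original only if it exists.
async function buildTransformRequest(fetchOriginal, title, revision, wikitext) {
  const original = await getOriginalOrNull(fetchOriginal, title, revision);
  const request = { wikitext };
  if (original !== null) {
    request.original = original;
  }
  return request;
}
```

With this shape, a missing original only costs the reuse optimization, while other storage errors still fail the request.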