Lightweight parse mode where roundtripping is not required
Open, Low, Public

Description

In scenarios where Parsoid's output will not be used for roundtripping (e.g., previews in the 2017 wikitext editor), Parsoid should be able to skip some of the work it does to ensure roundtrippability. At the very least, we can skip the passes related to data-parsoid computation, cleanup, save, DSR computation, and template wrapping.
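
A rough sketch of what pass-skipping could look like, assuming a pipeline in which each DOM pass is tagged with whether it exists only for roundtripping. Every name here (`DomPass`, `rtOnly`, `runPasses`, and the pass names) is hypothetical and not Parsoid's actual API; the "document" is just a string array standing in for a DOM.

```typescript
// Hypothetical sketch: none of these names come from Parsoid itself.
interface DomPass {
  name: string;
  rtOnly: boolean; // true if the pass only exists to support roundtripping
  run: (doc: string[]) => void;
}

interface ParseOptions {
  roundtrip: boolean; // false selects the lightweight mode
}

// Run the passes in order, skipping roundtrip-only passes in lightweight mode.
// Returns the names of the passes that actually ran.
function runPasses(passes: DomPass[], doc: string[], opts: ParseOptions): string[] {
  const executed: string[] = [];
  for (const pass of passes) {
    if (pass.rtOnly && !opts.roundtrip) {
      continue;
    }
    pass.run(doc);
    executed.push(pass.name);
  }
  return executed;
}

// Stand-in passes: each one just records that it touched the document.
const examplePasses: DomPass[] = [
  { name: "dsr", rtOnly: true, run: (doc) => { doc.push("dsr"); } },
  { name: "template-wrapping", rtOnly: true, run: (doc) => { doc.push("tpl"); } },
  { name: "data-parsoid-save", rtOnly: true, run: (doc) => { doc.push("dp"); } },
  { name: "render-fixups", rtOnly: false, run: (doc) => { doc.push("fix"); } },
];

console.log(runPasses(examplePasses, [], { roundtrip: false })); // [ 'render-fixups' ]
```

A per-pass flag like this keeps the skip list in one place, but it does not address the testing concern: the lightweight path would still need its own expected output.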

Event Timeline

Restricted Application added a subscriber: Aklapper.
ssastry triaged this task as Medium priority. Mar 8 2018, 10:39 PM

I've got no problem with skipping passes; that seems like a fairly coarse-grained tool, but it does potentially open up a new way for preview bugs to creep in. It would be interesting to explore the idea enough to determine how much time we are talking about here, though.

ssastry lowered the priority of this task from Medium to Low. Mar 20 2018, 3:54 PM

Ya, I had the same concerns after I posted this. Testability is a concern, since all our test infrastructure is set up for the full HTML. But I am just deprioritizing this and letting it sit here in case we ever have smarter ideas.

Just for kicks, I turned off the DSR computation, template wrapping, and a couple of other DOM passes, and on a sample page perf improved by about 5%. So more substantial perf changes would require more radical code surgery.

@ssastry I wonder if removing rt-related fields from the token objects would also help, for not too much more surgery? I suspect memory allocation is a surprisingly large fraction of our runtime costs, and I'm guessing that turning off the computations you mention didn't actually remove the related fields from the tokens? But maybe your costs already included slimming down the token objects....

No, it doesn't remove them, but that is what I meant by more surgery, since the tsr code is all over the tokenizer and some token transformers and utils.

OK. We should probably do a memory audit at some point; we'll probably be forced to do it by the PHP port, since we'll have to pre-declare all our token fields instead of just adding them on-the-fly like JavaScript allows. Paying attention to removing unused fields should help with memory usage, and reducing memory usage should help with performance ... my intuition is that this is another 10-20% though, not like a 2x speedup or something.
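
To make the "pre-declared fields" idea concrete, here is a minimal sketch, assuming a token class whose roundtrip-only data lives in one optional sub-object that is simply not allocated in lightweight mode. The names `Token`, `RtInfo`, and `makeToken` are hypothetical; Parsoid's real token classes are shaped differently, and treating `tsr` as a start/end source range is an assumption for illustration.

```typescript
// Hypothetical token shapes, for illustration only.
interface RtInfo {
  tsr: [number, number]; // assumed: start/end offsets of the token in the source
}

class Token {
  // All fields are declared up front rather than attached on the fly.
  readonly name: string;
  readonly attribs: Map<string, string>;
  readonly rt: RtInfo | null; // null when roundtripping is not needed

  constructor(name: string, attribs: Map<string, string>, rt: RtInfo | null) {
    this.name = name;
    this.attribs = attribs;
    this.rt = rt;
  }
}

// Build a token, allocating the roundtrip-only data only when it is wanted.
function makeToken(name: string, start: number, end: number, roundtrip: boolean): Token {
  const rt: RtInfo | null = roundtrip ? { tsr: [start, end] } : null;
  return new Token(name, new Map<string, string>(), rt);
}

const light = makeToken("a", 0, 3, false);
const full = makeToken("a", 0, 3, true);
console.log(light.rt === null, full.rt !== null);
```

Grouping the rt-only fields this way would make an audit easier, since anything reachable only through `rt` is a candidate for removal in the lightweight path. Whether it saves meaningful time is exactly what the audit would have to measure.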