Lightweight parse mode where roundtripping is not required
Open, Low, Public

Assigned To

None

Authored By

	ssastry
	Mar 8 2018, 10:39 PM

Description

In scenarios where Parsoid's output is not going to be used for roundtripping (e.g., previews in the 2017 wikitext editor), Parsoid should be able to skip some of the work it does to ensure roundtrippability. At the very least, we can skip the passes related to data-parsoid computation, cleanup, and saving, as well as DSR computation and template wrapping.
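One way to picture the idea is a pass list where each pass declares whether it exists only to support roundtripping, so a lightweight mode can filter those out. This is only a minimal sketch: the pass names, the `Doc` and `Pass` shapes, and `runPasses` are all hypothetical and are not Parsoid's actual pipeline or API.

```typescript
// Hypothetical sketch: none of these names come from Parsoid itself.
interface Doc {
  html: string;
  log: string[]; // names of the passes that ran, kept only for illustration
}

interface Pass {
  name: string;
  roundtripOnly: boolean; // true if the pass is needed only for roundtripping
  run: (doc: Doc) => void;
}

// Build a stand-in pass that just records that it ran.
function makePass(name: string, roundtripOnly: boolean): Pass {
  return {
    name,
    roundtripOnly,
    run: (doc: Doc): void => {
      doc.log.push(name);
    },
  };
}

// Illustrative pass list; real passes would transform the DOM.
const passes: Pass[] = [
  makePass("buildDOM", false),
  makePass("computeDSR", true),
  makePass("wrapTemplates", true),
  makePass("saveDataParsoid", true),
  makePass("addLinkClasses", false),
];

// Run every pass, or skip the roundtrip-only ones in lightweight mode.
function runPasses(doc: Doc, lightweight: boolean): Doc {
  for (const pass of passes) {
    if (lightweight && pass.roundtripOnly) {
      continue;
    }
    pass.run(doc);
  }
  return doc;
}
```

Calling `runPasses(doc, true)` for a preview would run only the passes not marked `roundtripOnly`. A per-pass flag keeps the decision next to the pass definition, but it also means the lightweight output is a second code path that needs its own tests.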

Event Timeline

ssastry created this task. Mar 8 2018, 10:39 PM

Restricted Application added a project: VisualEditor. Mar 8 2018, 10:39 PM

Restricted Application added a subscriber: Aklapper.

ssastry triaged this task as Medium priority. Mar 8 2018, 10:39 PM

ssastry moved this task from Needs Triage to Performance on the Parsoid board. Mar 15 2018, 8:48 PM

I've got no problem with skipping passes; that seems like a fairly coarse-grained tool, but it does potentially open up a new way for preview bugs to creep in. It would be interesting to explore the idea enough to determine how much time we are talking about here, though.

Ya, I have / had the same concerns after I posted this ... testability is a concern since all our test infra is set up for the full HTML. But I am just deprioritizing this and letting it sit here in case we ever have smarter ideas.

Dvorapa subscribed. Mar 20 2018, 8:05 PM

Just for kicks, I turned off the DSR computation, template wrapping, and a couple of other DOM passes, and on a sample page perf improved by about 5%. So more substantial perf changes would require more radical code surgery.

@ssastry I wonder if removing rt-related fields from the token objects would also help, for not too much more surgery? I suspect memory allocation is a surprisingly large fraction of our runtime costs, and I'm guessing that turning off the computations you mention didn't actually remove the related fields from the tokens? But maybe your costs already included slimming down the token objects....

In T189261#4591077, @cscott wrote:

@ssastry I wonder if removing rt-related fields from the token objects would also help, for not too much more surgery? I suspect memory allocation is a surprisingly large fraction of our runtime costs, and I'm guessing that turning off the computations you mention didn't actually remove the related fields from the tokens? But maybe your costs already included slimming down the token objects....

No, it doesn't remove them, but that is what I meant by more surgery, since the tsr code is all over the tokenizer, plus some token transformers and utils.

OK. We should probably do a memory audit at some point; we'll probably be forced to do it by the PHP port, since we'll have to pre-declare all our token fields instead of just adding them on the fly as JavaScript allows. Paying attention to removing unused fields should help with memory usage, and reducing memory usage should help with performance ... my intuition is that this is another 10-20% though, not a 2x speedup or anything like that.
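To make the "pre-declared fields" point concrete, here is a rough sketch of what a slimmed-down token could look like next to a roundtrip-capable one. Everything here is an assumption for illustration: the `Token` and `RoundtripToken` shapes, `makeToken`, and `hasTsr` are hypothetical, and treating `tsr` as a pair of source offsets is a guess at its layout rather than Parsoid's actual token definition.

```typescript
// Hypothetical token shapes; not Parsoid's real token classes.
interface Token {
  name: string;
  attribs: Record<string, string>;
}

// Roundtrip-capable tokens additionally carry source offsets.
// Assumption: tsr is a [start, end] pair of offsets into the wikitext.
interface RoundtripToken extends Token {
  tsr: [number, number];
}

// Build a token, leaving the roundtrip-only field off entirely in
// lightweight mode instead of allocating it and ignoring it later.
function makeToken(
  name: string,
  attribs: Record<string, string>,
  tsr: [number, number],
  lightweight: boolean
): Token | RoundtripToken {
  if (lightweight) {
    return { name, attribs };
  }
  return { name, attribs, tsr };
}

// Type guard so callers must check before touching tsr.
function hasTsr(token: Token): token is RoundtripToken {
  return "tsr" in token;
}
```

With declared shapes like these, any code that reads `tsr` has to go through `hasTsr` first, which would also surface how widely the field is used across the tokenizer and token transformers.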
