
Use one ownerDocument for the entire parse
Closed, Resolved · Public

Description

Currently, each pipeline creates its own document and, at various stages, the results of those parses need to be adopted by the main document.

We expect some performance gains by eliminating that work.

See https://gerrit.wikimedia.org/r/#/c/385312/

Also, there are a few places where we create a dummyDoc to avoid string concatenation; those should use the shared document as well.
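
To make the cost concrete, here is a minimal sketch of the kind of per-pipeline adoption work this task aims to remove. `adoptChildren` and its parameters are hypothetical names used for illustration, not Parsoid's code; only `Document.adoptNode` and `Node.appendChild` are standard DOM calls.

```javascript
// Hypothetical helper (illustration only): move every child of a body that
// was parsed in a separate document into a node of the main document.
function adoptChildren(mainDoc, sourceBody, target) {
  let adopted = 0;
  while (sourceBody.firstChild) {
    // adoptNode detaches the node from its old parent and changes its
    // ownerDocument (and that of its descendants) to mainDoc.
    const node = mainDoc.adoptNode(sourceBody.firstChild);
    target.appendChild(node);
    adopted++;
  }
  return adopted;
}
```

If every pipeline parses into the same document from the start, this walk over the results is unnecessary.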

Event Timeline

Arlolra created this task. · Oct 26 2017, 4:28 PM
Restricted Application added a subscriber: Aklapper. · Oct 26 2017, 4:28 PM
Arlolra triaged this task as Medium priority. · Oct 26 2017, 4:32 PM
cscott added a subscriber: cscott. · Oct 26 2017, 4:37 PM

I think generally the pattern should be:

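// Create a detached fragment owned by the shared document.
var df = env.ownerDocument.createDocumentFragment();
var tempBody = env.ownerDocument.createElement('body');
// appendChild is used because not every DOM implementation provides append().
df.appendChild(tempBody);
// Parse the string in place; the new nodes already belong to env.ownerDocument.
tempBody.innerHTML = "some string to parse";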

as opposed to:

var newDoc = domino.createDOMImplementation().createHTMLDocument();
newDoc.documentElement.innerHTML = "some string to parse";

This uses Document.createDocumentFragment to hold a DOM tree that is owned by the document but not attached to the document's own tree, so the parsed nodes never need to be adopted later.
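
The pattern can be wrapped in a small helper. The sketch below is an assumption for illustration: `parseToFragment` is a hypothetical name, not an existing Parsoid or domino function, and it uses only the standard DOM calls `createDocumentFragment`, `createElement`, `appendChild`, and the `innerHTML` setter.

```javascript
// Hypothetical helper (not part of Parsoid): parse an HTML string into a
// detached subtree that is owned by an existing document.
function parseToFragment(ownerDocument, html) {
  // The fragment keeps the temporary body out of the document's own tree.
  const fragment = ownerDocument.createDocumentFragment();
  const tempBody = ownerDocument.createElement('body');
  fragment.appendChild(tempBody);
  // Nodes created by the innerHTML setter share tempBody's ownerDocument.
  tempBody.innerHTML = html;
  return { fragment, tempBody };
}
```

Called as `parseToFragment(env.ownerDocument, "some string to parse")`, the children of the returned `tempBody` should report `env.ownerDocument` as their `ownerDocument`, so they can be inserted into the main tree directly.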

Arlolra claimed this task. · Jul 17 2020, 6:39 PM

Change 617282 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] One document to rule them all

https://gerrit.wikimedia.org/r/617282

Change 622425 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Remove special case for the html extension

https://gerrit.wikimedia.org/r/622425

Change 622425 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Remove special case for the html extension when unpacking

https://gerrit.wikimedia.org/r/622425

Change 625641 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a8

https://gerrit.wikimedia.org/r/625641

Change 625641 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a8

https://gerrit.wikimedia.org/r/625641

Change 617282 merged by jenkins-bot:
[mediawiki/services/parsoid@master] One document to rule them all

https://gerrit.wikimedia.org/r/617282

Change 635100 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a12

https://gerrit.wikimedia.org/r/635100

Change 635100 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a12

https://gerrit.wikimedia.org/r/635100

Change 662672 had a related patch set uploaded (by Paladox; owner: Arlolra):
[mediawiki/services/parsoid@REL1_35] Remove special case for the html extension when unpacking

https://gerrit.wikimedia.org/r/662672