
Prototype selective HTML updates in Parsoid
Closed, Resolved, Public

Description

From the WE5.3 draft:

Draft hypothesis: when a template is edited, if we can implement an algorithm in Parsoid that reuses the existing HTML of a page that depends on the edited template, instead of processing the page from scratch, and demonstrate a processing speedup of 1.5x or more, we will have a candidate incremental-parsing solution for efficient page updates after template edits.

NOTE: We are only planning to implement this in the Parsoid library and test it from the command line. Integrating it with the actual processing pipeline will be follow-up work and will be more involved. In this prototype, we will start with templates that produce well-balanced DOM fragments.
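The core idea of reusing a page's HTML and redoing only the parts that came from the edited template can be sketched as follows. This is a minimal illustration, not Parsoid's implementation. It assumes each transclusion is a single, well-balanced element marked `typeof="mw:Transclusion"` with a `data-mw` attribute naming the template (modeled loosely on Parsoid's HTML output), ignores title normalization and multi-node transclusions, and uses a hypothetical `expand` callback in place of real template expansion.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable


def template_targets(node: ET.Element) -> list[str]:
    """Template names recorded in a transclusion node's data-mw, or [] for other nodes."""
    if "mw:Transclusion" not in (node.get("typeof") or "").split():
        return []
    data_mw = json.loads(node.get("data-mw", "{}"))
    return [part["template"]["target"]["wt"].strip()
            for part in data_mw.get("parts", [])
            if isinstance(part, dict) and "template" in part]


def selective_update(html: str, edited: str,
                     expand: Callable[[dict], ET.Element]) -> tuple[str, int]:
    """Re-expand only the transclusions of `edited`; every other node is reused as-is.

    `expand` is a hypothetical stand-in for template expansion: it receives the
    old node's data-mw and returns the freshly rendered replacement element.
    Returns the updated HTML and the number of transclusions replaced.
    """
    root = ET.fromstring(html)
    replaced = 0
    # Snapshot all parents first so splicing new nodes in does not disturb the walk.
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if edited in template_targets(child):
                new_node = expand(json.loads(child.get("data-mw")))
                new_node.tail = child.tail  # keep the text that followed the old node
                parent[i] = new_node        # same index, so sibling positions stay valid
                replaced += 1
    return ET.tostring(root, encoding="unicode"), replaced
```

A caller would pass the previous revision's HTML and the edited template's name (for example "About", as in the `--editedtemplatetitle "About"` run on Hampi); everything outside the matching transclusions is copied through untouched, which is where the speedup would come from.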

Event Timeline

ABreault-WMF renamed this task from Prototype incremental parsing in Parsoid to Prototype selective HTML updates in Parsoid. (May 3 2024, 2:59 PM)

Change #1026985 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] [WIP] Selective HTML Updates

https://gerrit.wikimedia.org/r/1026985

@ssastry reports on Slack:

Very early initial results show a 5x-10x speedup when reparsing pages after template edits (as reproduced on our local wikis).

$ time MW_INSTALL_PATH=../core php bin/parse.php --integrated --pageName "Hampi"  < /dev/null > /tmp/hampi.html
 MW_INSTALL_PATH=../core php bin/parse.php --integrated --pageName "Hampi" <    3.07s user 0.24s system 71% cpu 4.618 total
$ time MW_INSTALL_PATH=../core php bin/parse.php --integrated --pageName "Hampi" --selpar --revtextfile /tmp/hampi.wt --revhtmlfile /tmp/hampi.html --editedtemplatetitle "About" < /dev/null > /tmp/hampi.new.html
 MW_INSTALL_PATH=../core php bin/parse.php --integrated --pageName "Hampi"      0.36s user 0.05s system 89% cpu 0.459 total
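To relate the Hampi timings to the 1.5x target in the hypothesis, the speedup can be computed directly from the two zsh `time` summaries: wall-clock total is 4.618s vs 0.459s, and CPU time (user + system) is 3.31s vs 0.41s. The small helper below is only an illustration for this one page; the regular expression assumes the zsh `time` output format shown in the transcript.

```python
import re

# Matches zsh's `time` summary, e.g. "3.07s user 0.24s system 71% cpu 4.618 total".
TIME_RE = re.compile(r"([\d.]+)s user ([\d.]+)s system (\d+)% cpu ([\d.]+) total")


def parse_time(line: str) -> dict[str, float]:
    """Extract CPU time (user + system) and wall-clock time from one summary line."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError(f"no zsh time summary in: {line!r}")
    user, system, _cpu_pct, total = m.groups()
    return {"cpu": float(user) + float(system), "wall": float(total)}


def speedup(full_line: str, selective_line: str) -> dict[str, float]:
    """Ratio of full-parse time to selective-update time, per metric."""
    full, sel = parse_time(full_line), parse_time(selective_line)
    return {k: full[k] / sel[k] for k in ("cpu", "wall")}


full = 'MW_INSTALL_PATH=../core php bin/parse.php --integrated --pageName "Hampi" <    3.07s user 0.24s system 71% cpu 4.618 total'
selective = 'MW_INSTALL_PATH=../core php bin/parse.php --integrated --pageName "Hampi"      0.36s user 0.05s system 89% cpu 0.459 total'
s = speedup(full, selective)
print(f"wall-clock {s['wall']:.1f}x, CPU {s['cpu']:.1f}x")  # prints: wall-clock 10.1x, CPU 8.1x
```

Both ratios sit inside the reported 5x-10x range and well above 1.5x. Note that the selective run reuses `/tmp/hampi.html`, the output of the full parse, as its previous HTML.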

The rest of this work will happen in Q1. Finishing up, addressing FIXMEs, polishing, and adding tests will probably take at least a week of full-time work.

Change #1059404 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Provide previous parse results to parser when rendering

https://gerrit.wikimedia.org/r/1059404
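Selective updates need the output of an earlier render: the command-line prototype takes the previous wikitext (`--revtextfile`), the previous HTML (`--revhtmlfile`) and the edited template's title (`--editedtemplatetitle`). The sketch below shows, in a deliberately simplified model, how a renderer that has access to a previous result might choose between a full parse and a selective update. `PreviousRender` and `choose_strategy` are hypothetical names, not part of MediaWiki or Parsoid, and the model ignores other render dependencies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreviousRender:
    """Hypothetical record of an earlier render of the same page."""
    wikitext: str
    html: str


def choose_strategy(prev: Optional[PreviousRender], wikitext: str,
                    edited_templates: set[str]) -> str:
    """Return 'full', 'reuse', or 'selective' for the next render of a page."""
    if prev is None:
        return "full"       # no earlier output to reuse
    if prev.wikitext != wikitext:
        return "full"       # the page itself changed; the prototype targets template edits only
    if not edited_templates:
        return "reuse"      # in this simplified model, nothing it depends on changed
    return "selective"      # same page text, some transcluded templates changed


prev = PreviousRender(wikitext="{{About|...}} Hampi ...", html="<p>...</p>")
print(choose_strategy(prev, prev.wikitext, {"About"}))  # prints: selective
```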

Change #1026985 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Selective HTML Updates

https://gerrit.wikimedia.org/r/1026985

Change #1065296 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Add DataAccess::fetchTemplateTouched() for Parsoid dependency tracking

https://gerrit.wikimedia.org/r/1065296

Change #1065297 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] Record the last modification time (page_touched) for transclusions

https://gerrit.wikimedia.org/r/1065297
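Recording each transclusion's last-modification time (`page_touched`) at parse time makes it possible to tell later which templates have changed since the page was rendered. The sketch below assumes the recorded values are MediaWiki-style 14-digit timestamps (YYYYMMDDHHMMSS), which sort correctly as strings. `changed_templates` and the `fetch_touched` callback are hypothetical; `fetch_touched` only stands in for the kind of lookup `DataAccess::fetchTemplateTouched()` provides, whose real signature is not shown here.

```python
from typing import Callable, Mapping, Optional


def changed_templates(recorded: Mapping[str, str],
                      fetch_touched: Callable[[str], Optional[str]]) -> set[str]:
    """Titles whose current page_touched is newer than the value recorded at parse time.

    `fetch_touched` (hypothetical) returns the current 14-digit timestamp for a
    title, or None if the page no longer exists.
    """
    changed = set()
    for title, touched_then in recorded.items():
        touched_now = fetch_touched(title)
        # A missing page counts as changed: its transclusions must be redone.
        if touched_now is None or touched_now > touched_then:
            changed.add(title)
    return changed


recorded = {"Template:About": "20240501120000", "Template:Infobox": "20240401000000"}
current = {"Template:About": "20240503145900", "Template:Infobox": "20240401000000"}
print(sorted(changed_templates(recorded, current.get)))  # prints: ['Template:About']
```

Only the titles returned here would need their transclusions re-expanded; the rest of the previous HTML could be reused.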

Change #1059404 merged by jenkins-bot:

[mediawiki/core@master] Provide previous parse results to parser when rendering

https://gerrit.wikimedia.org/r/1059404

Change #1066683 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a18

https://gerrit.wikimedia.org/r/1066683

Change #1066683 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a18

https://gerrit.wikimedia.org/r/1066683

Change #1082112 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Add a test for selective updates

https://gerrit.wikimedia.org/r/1082112

Change #1082112 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Add a test for selective updates

https://gerrit.wikimedia.org/r/1082112

Change #1083283 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] More selective update testing

https://gerrit.wikimedia.org/r/1083283

Change #1083283 merged by jenkins-bot:

[mediawiki/services/parsoid@master] More selective update testing

https://gerrit.wikimedia.org/r/1083283

Change #1083864 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a2

https://gerrit.wikimedia.org/r/1083864

Change #1083864 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a2

https://gerrit.wikimedia.org/r/1083864

Change #1087149 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a3

https://gerrit.wikimedia.org/r/1087149

Change #1087149 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a3

https://gerrit.wikimedia.org/r/1087149