
Prototype selective HTML updates in Parsoid
Open, High, Public

Description

From WE5.3 Draft,

Draft hypothesis idea: On template edits, if we can implement an algorithm in Parsoid to reuse HTML of a page that depends on the edited template without processing the page from scratch and demonstrate 1.5x or higher processing speedup, we will have a potential incremental parsing solution for efficient page updates on template edits.

NOTE: We are only planning to implement this in the Parsoid library and test it on the command line. The actual integration with the processing pipeline will be followup work and will be more involved. In this prototype, we will start with templates that produce well-balanced DOM fragments.
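As a rough illustration of the idea only (this is not Parsoid's actual algorithm or data model), the sketch below treats a cached page as a list of fragments, each either plain HTML or the output of a named template, and re-renders only the fragments produced by the edited template. All names here (`Fragment`, `selective_update`, `render_template`) are hypothetical, and the model assumes every template yields a well-balanced, self-contained fragment, as the note above restricts the prototype to.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Fragment:
    """One piece of cached page HTML (hypothetical toy model)."""
    html: str
    template: Optional[str] = None  # None means the HTML did not come from a template
    args: str = ""                  # template arguments, kept so the fragment can be re-rendered


def selective_update(
    fragments: List[Fragment],
    edited_template: str,
    render_template: Callable[[str, str], str],
) -> Tuple[List[Fragment], int]:
    """Reuse every cached fragment except those produced by the edited template."""
    updated: List[Fragment] = []
    rerendered = 0
    for frag in fragments:
        if frag.template == edited_template:
            # Only this fragment depends on the edit, so only it is re-rendered.
            new_html = render_template(frag.template, frag.args)
            updated.append(Fragment(new_html, frag.template, frag.args))
            rerendered += 1
        else:
            # Untouched fragments are reused as-is, without reprocessing.
            updated.append(frag)
    return updated, rerendered


# Toy usage: a page with one "About" hatnote and one unrelated infobox.
cached = [
    Fragment("<p>Intro</p>"),
    Fragment("<div>old about</div>", "About", "x"),
    Fragment("<table>old</table>", "Infobox", "y"),
]
new_page, count = selective_update(
    cached, "About", lambda name, args: f"<div>new {name} {args}</div>"
)
print(count, "".join(f.html for f in new_page))
```

The cost of an update in this model scales with the number of fragments that depend on the edited template rather than with the size of the page, which is the property the hypothesis relies on.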

Event Timeline

ABreault-WMF renamed this task from Prototype incremental parsing in Parsoid to Prototype selective HTML updates in Parsoid. May 3 2024, 2:59 PM

Change #1026985 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] [WIP] Selective HTML Updates

https://gerrit.wikimedia.org/r/1026985

@ssastry reports on Slack,

very early initial results show 5x-10x speedup on reparsing of pages after template edits (as reproduced on our local wikis).

╰─➤  time MW_INSTALL_PATH=../core php bin/parse.php --integrated --pageName "Hampi"  < /dev/null > /tmp/hampi.html
 MW_INSTALL_PATH=../core php bin/parse.php --integrated --pageName "Hampi" <    3.07s user 0.24s system 71% cpu 4.618 total
╭─subbu@earth ~/work/wmf/parsoid  ‹T363421*› 
╰─➤  time MW_INSTALL_PATH=../core php bin/parse.php --integrated --pageName "Hampi" --selpar --revtextfile /tmp/hampi.wt --revhtmlfile /tmp/hampi.html --editedtemplatetitle "About" < /dev/null > /tmp/hampi.new.html
 MW_INSTALL_PATH=../core php bin/parse.php --integrated --pageName "Hampi"      0.36s user 0.05s system 89% cpu 0.459 total
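
For reference, the two runs above can be checked against the 1.5x threshold in the draft hypothesis by dividing the reported timings. This is a minimal sketch using only the numbers printed by `time`; the function name `speedup` is illustrative, and a single run on a local wiki is not a benchmark.

```python
def speedup(baseline_seconds: float, selective_seconds: float) -> float:
    """Ratio of full-parse time to selective-update time."""
    return baseline_seconds / selective_seconds


# Timings copied from the two `time` outputs for the "Hampi" page.
full_total, selpar_total = 4.618, 0.459   # wall-clock seconds ("total")
full_user, selpar_user = 3.07, 0.36       # user CPU seconds

wall = speedup(full_total, selpar_total)
cpu = speedup(full_user, selpar_user)

print(f"wall-clock speedup: {wall:.1f}x")  # 10.1x
print(f"user CPU speedup:   {cpu:.1f}x")   # 8.5x
print("meets 1.5x target:", min(wall, cpu) >= 1.5)
```

Both ratios fall within or just above the 5x-10x range reported on Slack.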

The rest of this work will happen in Q1. Finishing up, addressing FIXMEs, polishing, and adding tests will probably take at least a week of full-time work.