
Purge all old html version code from Parsoid repo once we switch VE from RESTBase to ParserCache
Closed, Resolved, Public

Description

RESTBase is a store, not a cache. As such, there was no reliable way of knowing when old HTML versions would clear out of it, and we also didn't have a mechanism to purge old versions. Consequently, Parsoid's codebase has a number of FIXMEs / TODOs to get rid of some code once the HTML for a version turns over in RESTBase.

But we are now moving away from RESTBase to ParserCache, where all entries turn over after X days (currently X = 21). So, once we fully switch all wikis over to ParserCache and that X-day period has passed, we should be able to get rid of all the b/c code in Parsoid that exists to deal with older HTML versions.
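The timing argument above reduces to simple date arithmetic. Below is a minimal sketch. It assumes that every stored copy of old HTML lives in ParserCache and that X stays at 21 days. The constant and function names are hypothetical and are not part of Parsoid or MediaWiki.

```python
from datetime import date, timedelta

# Hypothetical constant: the task description gives X = 21 days for ParserCache turnover.
PARSER_CACHE_TTL_DAYS = 21


def earliest_bc_removal(last_wiki_switch: date, ttl_days: int = PARSER_CACHE_TTL_DAYS) -> date:
    """Return the first date on which no ParserCache entry can predate the switch.

    Assumes (hypothetically) that any entry rendered before `last_wiki_switch`
    expires no later than `ttl_days` after it, and that no other store holds old HTML.
    """
    return last_wiki_switch + timedelta(days=ttl_days)


if __name__ == "__main__":
    # Illustrative date only, not the real switchover date.
    print(earliest_bc_removal(date(2024, 1, 1)))  # 2024-01-22
```

The clock starts at the last wiki's switch, not the first one, because any wiki still served from RESTBase can keep old HTML versions around indefinitely.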

Event Timeline

daniel triaged this task as Low priority. Jun 5 2023, 6:17 PM
daniel moved this task from Unsorted to Parsoid pile on the RESTBase Sunsetting board.

Change 930649 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] WIP: Purge b/c compatibility code for old HTML versions

https://gerrit.wikimedia.org/r/930649

Change 930649 abandoned by Subramanya Sastry:

[mediawiki/services/parsoid@master] WIP: Purge b/c compatibility code for old HTML versions

Reason:

Arlo pointed out that this is also blocked on https://phabricator.wikimedia.org/T174372: our dear friend Flow!

https://gerrit.wikimedia.org/r/930649

Change #1102282 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] [WIP] Remove a bunch of backwards compatibility

https://gerrit.wikimedia.org/r/1102282

In T209114, Flow started storing Parsoid's content version number, which was 2.0.0 at the time. The code that's gated on content versions before that seems like it would be a no-op now. That's mainly:

  • Whitespace heuristics, from before 1.7.0, which might mean superfluous whitespace when serializing wikitext characters

Otherwise, in T335843#10396771, we're dropping:

  • Old LST serialization, from before 1.3.0, which doesn't seem like it would be used on Flow boards
  • stx_v, from before 1.5.0, which at worst would mean serializing table rows to a newline
  • mw:DisplaySpace no longer sets mw:Placeholder as well. This was to avoid selser diffs during a two-phase deployment. Selser shouldn't be relevant for Flow, regardless.
  • A briefly deployed AddRedLinks bug that is unlikely to have affected many posts
  • Support for the li hack, which at worst would drop the liHackSrc when serializing
  • Old rendering of mw:PageProp/defaultsort and mw:PageProp/displaytitle, which don't seem like they'd be used on Flow boards
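Each version-gated b/c path compares the content version stored with a piece of HTML against a threshold. Below is a minimal sketch that uses the thresholds named in the lists above (1.3.0, 1.5.0 and 1.7.0). The `parse_version`, `BC_GATES` and `live_bc_paths` names are hypothetical. Parsoid's actual gating code is PHP and is not reproduced here. The sketch shows that content stamped with version 2.0.0 or later never reaches any of these paths.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as "1.7.0" into a comparable tuple of ints.

    Assumes three-part versions, so tuple comparison matches version ordering.
    """
    return tuple(int(part) for part in version.split("."))


# b/c paths and the content version below which each one ran (taken from the lists above).
BC_GATES = {
    "old LST serialization": "1.3.0",
    "stx_v": "1.5.0",
    "whitespace heuristics": "1.7.0",
}


def live_bc_paths(stored_version: str) -> list[str]:
    """Return the b/c paths that content at `stored_version` would still trigger."""
    v = parse_version(stored_version)
    return [name for name, gate in BC_GATES.items() if v < parse_version(gate)]


if __name__ == "__main__":
    print(live_bc_paths("2.0.0"))  # []
    print(live_bc_paths("1.4.0"))  # ['stx_v', 'whitespace heuristics']
```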

The necessary substitutions when updating the HTML before serializing would be:

  • s/mw:(Image|Video|Audio)/mw:File/; the type may also include a format suffix, /(Frameless|Frame|Thumb)
  • s/figure-inline/span/ will be needed
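The two substitutions could be applied as a pre-pass over the HTML before serializing. Below is a minimal sketch. It assumes the fragment is well-formed XML so that Python's standard `xml.etree.ElementTree` can parse it. Real Parsoid HTML would need an HTML5 parser. The names `upgrade_typeof` and `upgrade_fragment` are hypothetical. The typeof rewrite works token by token, so an optional format suffix such as /Thumb carries over to mw:File.

```python
import re
import xml.etree.ElementTree as ET

# Legacy media typeof, optionally followed by a format suffix such as /Thumb.
LEGACY_MEDIA = re.compile(r"mw:(?:Image|Video|Audio)(/(?:Frameless|Frame|Thumb))?")


def upgrade_typeof(typeof: str) -> str:
    """Apply s/mw:(Image|Video|Audio)/mw:File/ to each space-separated typeof token."""
    upgraded = []
    for token in typeof.split():
        match = LEGACY_MEDIA.fullmatch(token)
        # Keep the format suffix (group 1) if present; leave other tokens untouched.
        upgraded.append("mw:File" + (match.group(1) or "") if match else token)
    return " ".join(upgraded)


def upgrade_fragment(fragment: str) -> str:
    """Rename <figure-inline> to <span> and upgrade legacy media typeofs."""
    root = ET.fromstring(fragment)
    for element in root.iter():
        if element.tag == "figure-inline":
            element.tag = "span"
        typeof = element.get("typeof")
        if typeof is not None:
            element.set("typeof", upgrade_typeof(typeof))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    old = '<p><figure-inline typeof="mw:Image/Thumb"><img src="x.png"/></figure-inline></p>'
    print(upgrade_fragment(old))
```

Matching whole tokens rather than substrings keeps a longer typeof token that merely starts with mw:Image from being rewritten.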

Change #1102282 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Remove a bunch of backwards compatibility

https://gerrit.wikimedia.org/r/1102282

Change #1116870 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a14

https://gerrit.wikimedia.org/r/1116870

Change #1116870 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a14

https://gerrit.wikimedia.org/r/1116870

Change #1123696 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Remove html pre b/c

https://gerrit.wikimedia.org/r/1123696

Change #1123696 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Remove html pre b/c

https://gerrit.wikimedia.org/r/1123696

Change #1124186 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump wikimedia/parsoid to V0.21.0-a19

https://gerrit.wikimedia.org/r/1124186

Change #1124186 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a19

https://gerrit.wikimedia.org/r/1124186

Change #1142029 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Remove inHTMLPre

https://gerrit.wikimedia.org/r/1142029

Change #1142029 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Remove inHTMLPre

https://gerrit.wikimedia.org/r/1142029

Change #1144664 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.22.0-a2

https://gerrit.wikimedia.org/r/1144664

Change #1144664 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.22.0-a2

https://gerrit.wikimedia.org/r/1144664