Update lint tables independently of changeprop/restbase
Open, Needs Triage, Public

Description

Currently, lint tables get updated when we use parsoid to parse page content via the parsoid extension endpoints. These get called from restbase when processing a cache pregeneration request coming from changeprop.

Since we want to remove parsoid code from restbase, we need an alternative mechanism.

current proposal

What we want is really similar to what RefreshLinksJob does. So the most straightforward approach is to allow the Linter extension to hook into RefreshLinksJob and perform a parsoid parse to update the linter tables. However, it should skip the parse if there is already up-to-date output in the ParserCache, to avoid duplicating the parses that we still trigger through RESTbase/changeprop, and the ones we will continue to do via the ParsoidCachePrewarmJob. It should also skip the additional parse if the canonical rendering of the page was already done with parsoid.
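The skip conditions can be sketched roughly as follows. This is an illustrative Python model of the decision logic only, not MediaWiki code: `PageState`, `needs_lint_parse`, `on_refresh_links` and both boolean fields are hypothetical names, and "up-to-date" is reduced to a single flag.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PageState:
    """Hypothetical summary of what the job knows about a page (not a real MediaWiki class)."""
    parsoid_cache_is_fresh: bool         # assumption: ParserCache already holds up-to-date parsoid output
    canonical_render_used_parsoid: bool  # assumption: the canonical rendering was already done with parsoid


def needs_lint_parse(state: PageState) -> bool:
    """Return True only if an additional parsoid parse is needed to update lint data."""
    if state.canonical_render_used_parsoid:
        # The canonical rendering already went through parsoid; no second parse.
        return False
    if state.parsoid_cache_is_fresh:
        # Another path (RESTbase/changeprop or the prewarm job) already parsed this content.
        return False
    return True


def on_refresh_links(state: PageState, parse_and_record_lints: Callable[[], None]) -> bool:
    """Hypothetical hook body: run the extra parse if needed, report whether it ran."""
    if not needs_lint_parse(state):
        return False
    parse_and_record_lints()
    return True
```

The extra parse runs only when neither skip condition holds, so the hook never duplicates work done by another path.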

Idea: In the future, linter data should be added to ParserOutput, like any other metadata.

original proposal

We could update the lint tables from ParsoidCachePrewarmJob. However, these jobs are not scheduled when pages get invalidated due to template updates, since in that case, we don't want to update the parser cache proactively.

To solve this, we should generalize the job to be a generic "ParsoidUpdateJob", which will parse page content and then optionally update the parser cache, links tables, lint tables, etc.

The job would be scheduled...

  • when a page is edited, with both parser cache and lint table update enabled
  • when a page is invalidated due to a template change, with only links/lint table update enabled.
  • when a page without a parser cache entry is visited, with both parser cache and lint table update enabled
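The scheduling rules above can be sketched as a mapping from trigger to enabled updates. This is an illustrative Python model only, with hypothetical names (`Trigger`, `Update`, `updates_for`). It reads the template-change case as enabling the links and lint table updates but not the parser cache, and it encodes only what the list states. Whether links tables are also updated on edit or on a cache-miss view is not specified here and is left out.

```python
from enum import Enum
from typing import FrozenSet


class Trigger(Enum):
    """Hypothetical names for the three scheduling situations listed above."""
    EDIT = "edit"
    TEMPLATE_CHANGE = "template_change"
    CACHE_MISS_VIEW = "cache_miss_view"


class Update(Enum):
    """Hypothetical names for the optional updates a generic job could perform."""
    PARSER_CACHE = "parser_cache"
    LINKS_TABLES = "links_tables"
    LINT_TABLES = "lint_tables"


def updates_for(trigger: Trigger) -> FrozenSet[Update]:
    """Return the updates enabled for a trigger, as read from the proposal."""
    if trigger is Trigger.TEMPLATE_CHANGE:
        # Assumption: links and lint tables are refreshed, but the parser
        # cache is not updated proactively after a template change.
        return frozenset({Update.LINKS_TABLES, Update.LINT_TABLES})
    # Page edit, or a visit to a page with no parser cache entry.
    return frozenset({Update.PARSER_CACHE, Update.LINT_TABLES})
```

In this reading, lint tables are updated in all three cases, and the parser cache is the update that varies by trigger.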

Event Timeline

Restricted Application added a subscriber: Aklapper.

Dupe of T359964 or at least a child of it, since this answers some of the questions there.

cscott removed a subtask: Restricted Task.
cscott added a parent task: Restricted Task.

Change #1021879 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] RefreshLinksJob: add hooks for use by the Linter extension

https://gerrit.wikimedia.org/r/1021879

Change #1021882 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/extensions/Linter@master] Trigger parsoid run in RefreshLinksJob

https://gerrit.wikimedia.org/r/1021882