
Update lint tables independently of changeprop/restbase
Closed, Resolved (Public)

Description

Currently, lint tables are updated when Parsoid parses page content via the Parsoid extension endpoints. RESTBase calls these endpoints when it processes a cache pregeneration request coming from ChangeProp.

Since we want to remove the Parsoid code from RESTBase, we need an alternative mechanism.

current proposal

What we want is very similar to what RefreshLinksJob does. So the most straightforward approach is to allow the Linter extension to hook into RefreshLinksJob and perform a Parsoid parse to update the linter tables. However, it should skip the parse if there is already up-to-date output in the ParserCache, to avoid duplicating the parses that we still trigger through RESTBase/ChangeProp, and the ones we will continue to do via the ParsoidCachePrewarmJob. It should also skip the additional parse if the canonical rendering of the page was already done with Parsoid.
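The skip rules in this proposal can be sketched as a small decision function. This is only an illustration in Python, not MediaWiki or Linter code: `PageState`, its two fields, and `lint_action_on_refresh` are hypothetical names made up for this sketch, and how the real hook would determine either condition is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class LintAction(Enum):
    SKIP = "skip"    # do not run an additional Parsoid parse
    PARSE = "parse"  # run a Parsoid parse to update the lint tables


@dataclass(frozen=True)
class PageState:
    # Hypothetical inputs; the real checks would live in the Linter hook.
    parsoid_output_is_current: bool       # up-to-date output already in the ParserCache
    canonical_render_used_parsoid: bool   # canonical rendering was already done with Parsoid


def lint_action_on_refresh(state: PageState) -> LintAction:
    """Decide whether the refresh job should trigger an extra Parsoid parse."""
    if state.parsoid_output_is_current:
        # Another path (RESTBase/ChangeProp or the prewarm job) already parsed this.
        return LintAction.SKIP
    if state.canonical_render_used_parsoid:
        # The canonical parse was a Parsoid parse, so a second one would be redundant.
        return LintAction.SKIP
    return LintAction.PARSE


if __name__ == "__main__":
    stale_legacy_page = PageState(False, False)
    print(lint_action_on_refresh(stale_legacy_page).value)
```

Under these assumptions, only a page with no current Parsoid output and a non-Parsoid canonical rendering gets the extra parse.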

Rationale: the original proposal would require us to implement batching and recursive processing for ParsoidCachePrewarmJob. We already have that logic in RefreshLinksJob (and HTMLCacheUpdateJob). It seems better to re-use that logic than to re-implement it.

Idea: In the future, linter data should be added to ParserOutput, like any other metadata.

Idea: If doing the Parsoid parse synchronously in RefreshLinksJob is too slow, we could schedule a ParsoidReparseJob from there.

original proposal

We could update the lint tables from ParsoidCachePrewarmJob. However, these jobs are not scheduled when pages get invalidated due to template updates, since in that case, we don't want to update the parser cache proactively.

To solve this, we should generalize the job to be a generic "ParsoidUpdateJob", which will parse page content and then optionally update the parser cache, links tables, lint tables, etc.

The job would be scheduled...

  • when a page is edited, with both parser cache and lint table update enabled
  • when a page is invalidated due to a template change, with only link/lint table update enabled.
  • when a page without parser cache entry is visited, with both parser cache and lint table update enabled
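The three scheduling cases above amount to a mapping from trigger to job options. The following Python sketch shows one way to express that mapping. All names (`Trigger`, `ParsoidUpdateJobParams`, `params_for`) are hypothetical and exist only for illustration. It models just the two flags the list names for every case, and it assumes the template-change case keeps the lint table update on while leaving the parser cache alone.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    PAGE_EDIT = "page_edit"
    TEMPLATE_CHANGE = "template_change"
    CACHE_MISS_VIEW = "cache_miss_view"  # page visited without a parser cache entry


@dataclass(frozen=True)
class ParsoidUpdateJobParams:
    # Hypothetical job options, not an existing MediaWiki job signature.
    update_parser_cache: bool
    update_lint_table: bool


def params_for(trigger: Trigger) -> ParsoidUpdateJobParams:
    """Return the job options for a given scheduling trigger."""
    if trigger is Trigger.TEMPLATE_CHANGE:
        # Assumption: no proactive parser cache update on template changes,
        # but the lint table is still refreshed.
        return ParsoidUpdateJobParams(update_parser_cache=False, update_lint_table=True)
    # Edits and cache-miss views enable both updates.
    return ParsoidUpdateJobParams(update_parser_cache=True, update_lint_table=True)


if __name__ == "__main__":
    for trigger in Trigger:
        print(trigger.value, params_for(trigger))
```

Keeping the options in one immutable value makes it clear that the lint table update is enabled in every case, and that only the parser cache update varies by trigger.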

Event Timeline

Restricted Application added a subscriber: Aklapper.

Dupe of T359964 or at least a child of it, since this answers some of the questions there.

cscott removed a subtask: Restricted Task.
cscott added a parent task: Restricted Task.

Change #1021879 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] RefreshLinksJob: add hooks for use by the Linter extension

https://gerrit.wikimedia.org/r/1021879

Change #1021882 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/extensions/Linter@master] Trigger parsoid run in RefreshLinksJob

https://gerrit.wikimedia.org/r/1021882

Change #1021879 abandoned by Daniel Kinzler:

[mediawiki/core@master] RefreshLinksJob: add hooks for use by the Linter extension

Reason:

Per discussion on T361013

https://gerrit.wikimedia.org/r/1021879

Change #1035023 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/extensions/Linter@master] Logging: add debug messages in Hooks and RecordLintJob

https://gerrit.wikimedia.org/r/1035023

Change #1035023 merged by jenkins-bot:

[mediawiki/extensions/Linter@master] Logging: add debug messages in Hooks and RecordLintJob

https://gerrit.wikimedia.org/r/1035023

Change #1038688 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] Set LinterParseOnDerivedDataUpdate to false

https://gerrit.wikimedia.org/r/1038688

Change #1021882 merged by jenkins-bot:

[mediawiki/extensions/Linter@master] Trigger Parsoid run when page metadata is being updated

https://gerrit.wikimedia.org/r/1021882

Change #1038688 merged by jenkins-bot:

[operations/mediawiki-config@master] Set LinterParseOnDerivedDataUpdate to false

https://gerrit.wikimedia.org/r/1038688

Mentioned in SAL (#wikimedia-operations) [2024-06-05T13:30:43Z] <daniel@deploy1002> Started scap: Backport for [[gerrit:1038688|Set LinterParseOnDerivedDataUpdate to false (T361013)]]

Mentioned in SAL (#wikimedia-operations) [2024-06-05T13:34:33Z] <daniel@deploy1002> daniel: Backport for [[gerrit:1038688|Set LinterParseOnDerivedDataUpdate to false (T361013)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-06-05T13:48:33Z] <daniel@deploy1002> Finished scap: Backport for [[gerrit:1038688|Set LinterParseOnDerivedDataUpdate to false (T361013)]] (duration: 17m 50s)

The new mechanism is now implemented. It's not yet enabled in production (T367417). We can only enable it once ChangeProp no longer sends cache update requests (T367418).

Change #1053006 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/mediawiki-config@master] Linter: trigger parsoid parses on template changes

https://gerrit.wikimedia.org/r/1053006

Change #1053006 merged by jenkins-bot:

[operations/mediawiki-config@master] Linter: trigger parsoid parses on template changes

https://gerrit.wikimedia.org/r/1053006

Mentioned in SAL (#wikimedia-operations) [2024-07-11T07:14:21Z] <jgiannelos@deploy1002> Started scap sync-world: Backport for [[gerrit:1053006|Linter: trigger parsoid parses on template changes (T361013)]]

Mentioned in SAL (#wikimedia-operations) [2024-07-11T07:17:02Z] <jgiannelos@deploy1002> daniel, jgiannelos: Backport for [[gerrit:1053006|Linter: trigger parsoid parses on template changes (T361013)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-07-11T07:28:47Z] <jgiannelos@deploy1002> Finished scap: Backport for [[gerrit:1053006|Linter: trigger parsoid parses on template changes (T361013)]] (duration: 14m 25s)