Similar pages on it.wv and it.wp or en:wv do not have the same lint errors
Closed, Resolved | Public | BUG REPORT

Description

it:voy vs it:w
https://it.wikivoyage.org/wiki/Speciale:LintErrors/missing-end-tag?namespace=828 lists https://it.wikivoyage.org/wiki/Modulo:Wikidata. However, although that page is almost identical to https://it.wikipedia.org/wiki/Modulo:Wikidata, the Wikipedia module is not listed in https://it.wikipedia.org/wiki/Speciale:LintErrors/missing-end-tag?namespace=828

it:voy vs en:voy
https://it.wikivoyage.org/wiki/Speciale:LintErrors/stripped-tag?namespace=828 lists the module mentioned above (which, again, is not listed on it:w) and also https://it.wikivoyage.org/wiki/Modulo:LinkPhone, even though the very similar page https://en.wikivoyage.org/wiki/Module:LinkPhone is not shown in https://en.wikivoyage.org/wiki/Special:LintErrors/stripped-tag?namespace=828

In this second case I've made several minor changes to the code to make the lint error check easier, but the result is still the same.

What should have happened instead?:

  1. The modules listed above should be shown on both special pages or on neither.

Event Timeline

Izno renamed this task from Anomalies on Italian Wikivoyage Special:LintErrors page to Similar pages on it.wv and it.wp do not have the same lint errors. Dec 28 2021, 12:40 AM
Izno updated the task description.
Izno subscribed.

The issue formerly also described here is probably T194872: Linter : have correct counters for categories populated with only a few errors (or none) or one of the other counting errors already tracked in the Linter project tag. I've reduced this task to just the other item described.

Andyrom75 renamed this task from Similar pages on it.wv and it.wp do not have the same lint errors to Similar pages on it.wv and it.wp or en:wv do not have the same lint errors. Dec 28 2021, 10:03 AM
Andyrom75 updated the task description.

The problem is here,
https://github.com/wikimedia/parsoid/blob/master/src/Logger/LintLogger.php#L106-L109

These pages have the "scribunto" contentmodel and so the lints will never be updated for those pages again.

We need to run a script that clears out all the lints for pages without the 'wikitext' or 'proofread-page' contentmodels.
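
As a rough illustration of what such a cleanup does, here is a minimal Python sketch against a toy SQLite database. It is not the script that was actually run: the `pages` and `lint_errors` tables, their columns, and the helper names are hypothetical simplifications, and the toy schema stores a content model explicitly on every page.

```python
import sqlite3

# Content models that are still linted, per the comment above.
LINTABLE_MODELS = ("wikitext", "proofread-page")


def build_demo_db() -> sqlite3.Connection:
    """Create a toy in-memory database (hypothetical schema, not MediaWiki's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pages (
            page_id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            content_model TEXT NOT NULL
        );
        CREATE TABLE lint_errors (
            lint_id INTEGER PRIMARY KEY,
            page_id INTEGER NOT NULL,
            category TEXT NOT NULL
        );
        INSERT INTO pages VALUES
            (1, 'Modulo:Wikidata', 'scribunto'),
            (2, 'Some article', 'wikitext');
        INSERT INTO lint_errors VALUES
            (1, 1, 'missing-end-tag'),
            (2, 1, 'stripped-tag'),
            (3, 2, 'missing-end-tag');
        """
    )
    return conn


def delete_stale_lints(conn: sqlite3.Connection) -> int:
    """Delete lint rows for pages whose content model is not lintable.

    Returns the number of rows deleted.
    """
    placeholders = ", ".join("?" for _ in LINTABLE_MODELS)
    cur = conn.execute(
        f"""
        DELETE FROM lint_errors
        WHERE page_id IN (
            SELECT page_id FROM pages
            WHERE content_model NOT IN ({placeholders})
        )
        """,
        LINTABLE_MODELS,
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    demo = build_demo_db()
    deleted = delete_stale_lints(demo)
    print(f"deleted {deleted} stale lint rows")
```

In the toy data, only the rows attached to the "scribunto" page are removed; the lint row on the wikitext page is left alone.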

Mentioned in SAL (#wikimedia-operations) [2022-01-28T15:14:22Z] <Amir1> start of cleaning lint errors caused by content model changes (T298343)

I started the script and I think it'll be done on all wikis in a couple of hours; the number of cases caused by this is really small. One open question: do we have a way to make sure it doesn't happen anymore? If not, we may need a regular cron job to run this cleanup.

> Do we have a way to make sure it doesn't happen anymore?

The history is that, at one point, Parsoid was linting pages of all content models, regardless of whether that made sense or not. We then added the check in T298343#7618503 at a later date, which stopped linting stuff that wasn't wikitext. However, these pages with ostensible linting issues were left behind, never to be updated again.

This issue could arise again if we were ever to change the list of lintable content models, but I would assume that list would only ever grow and this won't likely be a problem.

Thanks, that helped me understand a lot. Just a quick question before we close this: how do you handle content model changes on an existing page? An admin can easily change a content model manually (https://test.wikipedia.org/wiki/Special:ChangeContentModel). Could that leave orphan rows forever (e.g. a wikitext page has lint errors, an admin changes it to another content model, and the old data never gets cleaned up)?

If that's handled somehow, I think it's safe to close this ticket.

> If that's handled somehow, I think it's safe to close this ticket.

It isn't handled and, yes, it would lead to this same situation. If there's a hook that notifies of a content model change for a page, we could handle it in the Linter extension.

There is a hook that is run by the ContentModelChange class in core: onEditFilterMergedContent, which is also run in a broader scope. You could possibly do some checks there and clean up.

A content model change creates a new revision, which would request a new parse / lint from Parsoid. We could have Parsoid always send an empty lint result if the content model is unlintable, which would clear out the old lints at the cost of some redundancy if the content model hasn't changed.

Or, we could try to augment the linter extension to run on that hook.

Whatever you/the team prefers. That's outside of my area of expertise.

> A content model change creates a new revision, which would request a new parse / lint from Parsoid. We could have Parsoid always send an empty lint result if the content model is unlintable, which would clear out the old lints at the cost of some redundancy if the content model hasn't changed.
>
> Or, we could try to augment the linter extension to run on that hook.

Hook handling in the linter extension is better. It fits the model better I think.
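
For illustration only, the hook-based approach could be sketched roughly as follows. This is Python rather than the extension's PHP, and `LintStore`, `on_content_model_change`, and `LINTABLE_MODELS` are hypothetical names, not the Linter extension's API; the set of lintable models is assumed from the earlier comment.

```python
from typing import Dict, List

# Assumed set of lintable content models (taken from the earlier comment).
LINTABLE_MODELS = frozenset({"wikitext", "proofread-page"})


class LintStore:
    """Toy in-memory stand-in for the stored lint error records."""

    def __init__(self) -> None:
        self._errors: Dict[int, List[str]] = {}

    def add(self, page_id: int, category: str) -> None:
        self._errors.setdefault(page_id, []).append(category)

    def errors_for(self, page_id: int) -> List[str]:
        return list(self._errors.get(page_id, []))

    def clear_page(self, page_id: int) -> int:
        """Remove all lint records for a page; return how many were removed."""
        return len(self._errors.pop(page_id, []))


def on_content_model_change(store: LintStore, page_id: int, new_model: str) -> int:
    """Hypothetical hook handler: drop lint records once a page is unlintable.

    Returns the number of records removed (0 if the new model is lintable).
    """
    if new_model not in LINTABLE_MODELS:
        return store.clear_page(page_id)
    return 0


if __name__ == "__main__":
    store = LintStore()
    store.add(7, "missing-end-tag")
    store.add(7, "stripped-tag")
    removed = on_content_model_change(store, 7, "scribunto")
    print(f"removed {removed} records; remaining: {store.errors_for(7)}")
```

The point of doing this in a handler rather than in the parser is that the cleanup happens once, at the moment the content model changes, instead of on every parse of an unlintable page.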

Change 761070 had a related patch set uploaded (by Sbailey; author: Sbailey):

[mediawiki/extensions/Linter@master] WIP Delete lint error records when content model changes from WT

https://gerrit.wikimedia.org/r/761070

Change 761070 merged by jenkins-bot:

[mediawiki/extensions/Linter@master] Delete lint error records when content model changes from wikitext

https://gerrit.wikimedia.org/r/761070

Arlo added 'proofread-page' as an additional lintable content model in:
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Linter/+/766195