
Lint error counts on "Page information" page do not update, even with null edit
Closed, Resolved · Public · BUG REPORT

Description

Steps to Reproduce:
Go to https://en.wikipedia.org/wiki/User:Jonesey95/sandbox21

Use LintHint to view Linter errors, or just view the wikitext with a syntax highlighter enabled. The page has 49 Linter missing end tag errors. This count can be verified by viewing the wikitext in the section below "The following players took part in the tournament:", where dozens of instances of italic markup are unclosed.

Go to Page information. Observe that under the Lint errors section, the page is listed as having just 1 missing end tag error.

This discrepancy is not fixable via a purge or a null edit.
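
For anyone trying to compare the counts programmatically, here is a minimal sketch that asks the Linter extension for the errors it has recorded for a page, to set against what LintHint reports in the browser. It assumes the Linter extension's list=linterrors query module and its lntpageid / lntcategories parameters behave as documented on mediawiki.org; the parameter names and the page ID used are assumptions, not something confirmed in this task.

```python
# Sketch: query the lint errors the Linter extension has recorded for one page.
# Assumes list=linterrors exists with lntpageid / lntcategories parameters.
import requests

API = "https://en.wikipedia.org/w/api.php"

def recorded_lints(page_id, category="missing-end-tag"):
    params = {
        "action": "query",
        "list": "linterrors",
        "lntpageid": page_id,          # assumption: filter by page ID
        "lntcategories": category,     # assumption: filter by lint category
        "lntlimit": "max",
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=30).json()
    return data.get("query", {}).get("linterrors", [])

errors = recorded_lints(page_id=12345)  # hypothetical page ID for the sandbox page
print(f"Linter has recorded {len(errors)} missing end tag error(s)")
```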

Expected Results:
Page information should show a count of 49 missing end tag errors.

This miscount is preventing the original page (https://en.wikipedia.org/wiki/2014_Austrian_Darts_Open) and many other pages from being listed on the Special:LintErrors pages and on https://en.wikipedia.org/wiki/User:Galobot/report/Articles_by_Lint_Errors, which is a report of the top 1,000 mainspace pages by error count.

2014_Austrian_Darts_Open is not the only page with this problem. At this writing, the problem is also affecting 2013 UK Masters, 2014 Dutch Darts Masters, and 2014 European Darts Grand Prix.

Event Timeline

I think it is more likely that errors from using the wiki markup for italics without closing it (which the parser seems to fix automatically so that it outputs valid HTML) are counted by LintHint but not by the Linter extension, rather than this having anything to do with counts not updating.

I don't think so. For example, https://en.wikipedia.org/wiki/Brian_Willoughby has the same type of error (unclosed italics). LintHint and the Page information both show 21 missing end tag errors. My experience has been that on most pages I have worked on, the number of errors reported on Page information has matched LintHint's error count.

Another one: https://en.wikipedia.org/w/index.php?title=1956_Finnish_Cup&type=revision&diff=943578321&oldid=934271489 had 157 missing end tags before I fixed them. I found the problems using an insource search. The article did not appear on the report linked above, but it should have. Before my edit, the article was edited on 5 January 2020, so its Page information lint error count should have been updated at that time.

Perhaps I am misunderstanding how Lint error counts get updated. Everything else in WP appears to be updated when a page is null-edited (Category membership, What links here, template transclusions, rendering), but maybe Lint error counts are updated via a different process. If that is the case, please fix that process so that it keeps counts up to date. Having an article that has been sitting with 157 Linter errors for multiple years (1956 Finnish Cup had not been edited significantly since 2016) is not helping gnomes clear up these Linter errors.
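
For reference, the programmatic equivalent of the null edit described above is a purge with forcelinkupdate via the Action API; whether that also refreshes the lint counts is exactly what this task is questioning. A minimal sketch:

```python
# Sketch: force a purge with link-table updates, the API equivalent of a null edit.
# The API purge module must be POSTed; no edit token is needed for a purge.
import requests

API = "https://en.wikipedia.org/w/api.php"

resp = requests.post(API, data={
    "action": "purge",
    "titles": "User:Jonesey95/sandbox21",
    "forcelinkupdate": "1",  # also refresh the links tables, as a null edit would
    "format": "json",
}, timeout=30)
print(resp.json())
```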

Here's another one, if it helps:

https://en.wikipedia.org/wiki/Cuisine_of_Pembrokeshire

There are 35 missing end tag errors on this page, but Page information shows only 14.

And another:

https://en.wikipedia.org/wiki/Hampshire_County_Cricket_Club_in_2017

This one has 30 missing end tags and 2 stripped tags, but Page information shows only 13 missing end tags and 1 stripped tag. This problem seems pretty widespread. It would be great if someone could look at it and at least explain what is happening so that we can work around it.

ssastry triaged this task as Medium priority. Mar 3 2020, 5:02 PM
ssastry added a subscriber: ssastry.

Thanks @Jonesey95 for the bug report and examples. We'll get to revisiting Linter bugs and new lints as part of the larger project to make Parsoid the default wikitext engine, but until then we are focused on other background work. That said, I expect we'll get to this in the next 2-3 quarters, likely sooner rather than later.

This is still happening on many pages. See https://en.wikipedia.org/wiki/2012_Australian_Open_%E2%80%93_Boys%27_Singles where Page Information says that there are two errors, but LintHint reports five errors, which is the correct number.

We are now three full quarters past the most recent update to this ticket. Has any progress been made toward resolving this bug?

No progress has been made so far, as nobody has found time for this yet and/or volunteered to look into this.

Everything has been slow this year with reduced capacity. But I did evaluate all the outstanding bug reports, and it appears that some of these bugs and feature requests cannot be easily fixed without some rethinking of the database schema. In the first design, we did not anticipate the volume of lint errors we ended up seeing once the extension was fully deployed with all the linter categories, so database query performance is not good enough to do precise counts and updates. There have also been requests to use tracking categories, and we are evaluating that request and how to integrate that functionality as well. These redesign issues are another big reason why we haven't been able to easily fix some of these bugs.

This problem seemed to have gone away somehow, for many months or longer, but it is back today, if anyone can correlate some sort of system lag with the occurrence of this bug.

See https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Linter&oldid=1126343029#Lint_problems_not_updating,_delay_in_pages?

If you just edit a page with the wikitext editor and then view it in the normal read view, that doesn't require Parsoid to reparse the page, and until Parsoid parses it, the linter errors will not be updated. One way I trick the page into being reparsed, instead of waiting for a delayed job queue, is to switch from the wikitext editor to VE and back, which forces a reparse so that VE can display the page. At least on my local machine this works, and I suspect it also works in production.
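
Along the same lines, requesting the Parsoid HTML for the page through the public REST API is another way to ask for a Parsoid parse without opening VE. Whether that parse actually gets its lint data recorded (rather than only the job-queue parse doing so) is part of what is unclear here, so treat this as a diagnostic sketch rather than a fix:

```python
# Sketch: fetch the Parsoid HTML for a page via the public REST API, which
# requires Parsoid to parse the page (possibly from cache). Whether this
# causes lint data to be (re)recorded is not guaranteed.
import requests
from urllib.parse import quote

def fetch_parsoid_html(title, wiki="en.wikipedia.org"):
    url = f"https://{wiki}/api/rest_v1/page/html/{quote(title, safe='')}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text

html = fetch_parsoid_html("D. Daly")
print(len(html), "bytes of Parsoid HTML")
```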

This problem seemed to have gone away somehow, for many months or longer, but it is back today, if anyone can correlate some sort of system lag with the occurrence of this bug.

See https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Linter&oldid=1126343029#Lint_problems_not_updating,_delay_in_pages?

The particular example listed there seems to have resolved now. It could have been related to T324801 (or less likely T324711)

The slowness they are describing

Also pages like misnested or link in link are taking a long time to load

could be because of the number of lints in those categories and that they are filtering by namespace. That's being worked on in T299612

If you just edit a page with the wikitext editor and then view it in the normal read view, that doesn't require Parsoid to reparse the page, and until Parsoid parses it, the linter errors will not be updated. One way I trick the page into being reparsed, instead of waiting for a delayed job queue, is to switch from the wikitext editor to VE and back, which forces a reparse so that VE can display the page. At least on my local machine this works, and I suspect it also works in production.

I tried this trick at https://en.wikipedia.org/wiki/D._Daly, which is showing two Linter errors on the Page Information page but none in the rendered page. It did not change the Page Information. This bug is still happening.

It could have been related to T324801

It's somewhat related, at least. The patch that caused that also effectively disabled linting while parsing,
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/864723/2/includes/Rest/Handler/ParsoidHandler.php#b733

Working on fixing it.

Change 867272 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Log linter data while parsing full pages

https://gerrit.wikimedia.org/r/867272

Change 867272 merged by jenkins-bot:

[mediawiki/core@master] Log linter data while parsing full pages

https://gerrit.wikimedia.org/r/867272

Change 867274 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@wmf/1.40.0-wmf.13] Log linter data while parsing full pages

https://gerrit.wikimedia.org/r/867274

Arlolra moved this task from Needs Triage to Linting on the Parsoid board.
Arlolra moved this task from Backlog to To Deploy on the Content-Transform-Team-WIP board.

Change 867274 merged by jenkins-bot:

[mediawiki/core@wmf/1.40.0-wmf.13] Log linter data while parsing full pages

https://gerrit.wikimedia.org/r/867274

Mentioned in SAL (#wikimedia-operations) [2022-12-13T14:52:11Z] <derick@deploy1002> Started scap: Backport for [[gerrit:867274|Log linter data while parsing full pages (T246403)]]

Mentioned in SAL (#wikimedia-operations) [2022-12-13T14:53:57Z] <derick@deploy1002> derick and arlolra: Backport for [[gerrit:867274|Log linter data while parsing full pages (T246403)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-12-13T15:02:40Z] <derick@deploy1002> Finished scap: Backport for [[gerrit:867274|Log linter data while parsing full pages (T246403)]] (duration: 10m 28s)

I tried this trick at https://en.wikipedia.org/wiki/D._Daly, which is showing two Linter errors on the Page Information page but none in the rendered page. It did not change the Page Information. This bug is still happening.

I ran an ?action=purge on that page post-deploy and the two missing end tag lints are now cleared up,
https://en.wikipedia.org/wiki/Special:LintErrors?namespace=&titlesearch=D.+Daly&exactmatch=1

@Arlolra, I can also confirm that this is working on Group 1 (example: enwikibooks) and Group 2 (example: enwikipedia) which are still both on .13 now.

This problem seemed to have gone away somehow, for many months or longer, but it is back today, if anyone can correlate some sort of system lag with the occurrence of this bug.

If the originally reported issue here has gone away for many months, we should probably close this task and file new ones should the issue ever crop up again, so that we can have a more timely response and, as you say, correlate it with some sort of system change.

See https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Linter&oldid=1126343029#Lint_problems_not_updating,_delay_in_pages?

I notified the editors there that they should expect to see this working again,
https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Linter&type=revision&diff=1127228392&oldid=1126948274&diffmode=source

@Jonesey95 Thanks for flagging the issue and for all the linting work you do, if I don't say that enough.

It seems to sometimes stall; it has for me on English Wikisource. I am not too concerned, as I am still waiting for the implemented changes to propagate across replicas.

It seems to sometimes stall; it has for me on English Wikisource. I am not too concerned, as I am still waiting for the implemented changes to propagate across replicas.

Noting that this was also asked in T325030#8469162 so that it doesn't seem like it went unaddressed.

In follow-up:

I will also note that, at least for me, it seems to only be present in the Page: namespace on English Wikisource; other namespaces have cleared almost immediately (I did some fixes in User: and they dropped out of the listing almost immediately).

The Page namespace is ns104 on English Wikisource (and this non-standard value might need to be specially coded for?).

By comparison, https://en.wikisource.org/w/index.php?title=User:Rich_Farmbrough/DNB/J/o/John_Hippisley_(d.1748)&diff=prev&oldid=12848498, when fixed, cleared immediately.

In some earlier speculation, in T246403#8457338, I suggested that T324711 might be related. Although it wasn't exactly, it might be relevant now. The difference with the Page namespace, I assume, is that the pages there would have the proofread-page content model, instead of wikitext. But that would only make a difference as to whether the pages were being linted at all, and thus cleared, and nothing to do with any lag. Can you confirm that pages in that namespace that you edit do eventually clear?
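
For what it's worth, the content model of a page is easy to check via the Action API's prop=info, so one can confirm whether the Page: namespace pages in question really use the proofread-page model rather than wikitext. A minimal sketch; the example title is hypothetical:

```python
# Sketch: look up the content model of a Page: namespace page on English
# Wikisource via prop=info, which returns "contentmodel" for each page.
import requests

API = "https://en.wikisource.org/w/api.php"

def content_model(title):
    data = requests.get(API, params={
        "action": "query",
        "prop": "info",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }, timeout=30).json()
    return data["query"]["pages"][0].get("contentmodel")

# Hypothetical example title; any Page: namespace page would do.
print(content_model("Page:Example.djvu/1"))
```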

the pages there would have the proofread-page content model, instead of wikitext. But that would only make a difference as to whether the pages were being linted at all

Judging by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/866511, I suspect that's what's going on here.

Change 866511 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] ParsoidOutputAccess should support all models that serialize to wikitext.

https://gerrit.wikimedia.org/r/866511

In some earlier speculation, in T246403#8457338, I suggested that T324711 might be related. Although it wasn't exactly, it might be relevant now. The difference with the Page namespace, I assume, is that the pages there would have the proofread-page content model, instead of wikitext. But that would only make a difference as to whether the pages were being linted at all, and thus cleared, and nothing to do with any lag. Can you confirm that pages in that namespace that you edit do eventually clear?

Eventually... it just takes a very long time, I've found.

Change 866511 merged by jenkins-bot:

[mediawiki/core@master] ParsoidOutputAccess should support all models that serialize to wikitext.

https://gerrit.wikimedia.org/r/866511

In some earlier speculation, in T246403#8457338, I suggested that T324711 might be related. Although it wasn't exactly, it might be relevant now. The difference with the Page namespace, I assume, is that the pages there would have the proofread-page content model, instead of wikitext. But that would only make a difference as to whether the pages were being linted at all, and thus cleared, and nothing to do with any lag. Can you confirm that pages in that namespace that you edit do eventually clear?

Eventually... it just takes a very long time, I've found.

As an update to my own comment, I'm now finding that Page: namespace pages generally aren't updating in response to fixed errors, but I note that there is a patch under review which should hopefully resolve this quickly.

As an update to my own comment, I'm now finding that Page: namespace pages generally aren't updating in response to fixed errors, but I note that there is a patch under review which should hopefully resolve this quickly.

Please keep in mind that regular deployments (and most code review) are suspended over the holidays. If this isn't flagged as critical, it will probably take three or four weeks until it is fixed.