
Pages that have linter errors fixed aren't getting updated in Special:LintErrors
Closed, ResolvedPublic

Description

There have been multiple reports since yesterday that even after lint errors on pages have been fixed, the Special:LintErrors page isn't getting updated.

  1. Report from trwiki
  2. Report from enwiki
  3. IRC report via @revi ( report from kowiki )

There are 3 potential culprits in the pipeline (Parsoid, the Linter extension, ChangeProp / the job queue):
(a) a regression in Parsoid so that pages are no longer being linted; (b) something broken in changeprop / the job queue so that linter update jobs aren't getting run; (c) some breakage in the Linter extension wrt updating the database.

Looking at the trwiki report, the user reports that linthint is continuing to work properly. Since linthint makes a request to the API to find lint errors on a page, which initiates a fresh Parsoid parse of that page, we can rule out a regression in Parsoid.

So, that leaves us with either a problem in changeprop or in the linter extension itself.

Event Timeline

ssastry created this task. Mar 4 2018, 7:39 PM
Restricted Application added a subscriber: Aklapper. Mar 4 2018, 7:39 PM
ssastry triaged this task as High priority. Mar 4 2018, 7:39 PM
ssastry added a project: ChangeProp.
ssastry updated the task description. (Show Details) Mar 4 2018, 7:42 PM
Elitre added a subscriber: Elitre. Mar 5 2018, 9:53 AM

Mentioned in SAL (#wikimedia-operations) [2018-03-05T13:37:41Z] <mobrovac@tin> Started restart [cpjobqueue/deploy@b5255f0]: Force RecordLintJob rebalance in Kafka - T188870

mobrovac closed this task as Resolved. Mar 5 2018, 1:45 PM
mobrovac claimed this task.
mobrovac edited projects, added Services (done); removed Services.
mobrovac added a subscriber: mobrovac.

There was a problem with CP this weekend whereby the topic handling the RecordLintJob execution did not get reassigned to a new worker after the one handling it died. I restarted CP and the job is now being processed, so this immediate problem is fixed. We will continue the investigation into why these things happen in T179684: Kafka sometimes misses to rebalance topics properly.

The problem continues. Examples:

  • Petar Mladenov reports Multiple unclosed formatting tags, was fixed 20:30, 4 March 2018‎
  • Seasons of My Heart reports Multiple unclosed formatting tags, was fixed 20:35, 4 March 2018‎

Sorry, I should have been clearer when I posted on the wp:lint talk page, but @mobrovac said earlier today that it will take up to 9 hours (so another 4-5 hours) for the backlog to clear up.

For the record, looking at Special:LintErrors on Dutch Wiktionary the lag between a correction and an update presently seems to be 138 hours (nearly six days), while the subpages for particular errors are updating immediately.

MarcoSwart added a comment. Edited Mar 12 2018, 11:09 AM

The main page Special:LintErrors on Dutch Wiktionary was updated this morning, showing as a new find an old page with both a misnested tag and an obsolete tag; both have of course been corrected by now. What puzzles me is that the counter for "Missing end tag" was and is at 11 pages, while the corresponding subpage only shows the 8 pages that have become our baseline. If it had been 10, I would consider it the result of rounding, but how do we get to 11?

There was another update, showing 1 stripped tag and 0 for both the misnested and obsolete tags. Only the "Missing end tag" count remains stubbornly at 11, while the subpage shows the correct number, 8.

Hi @MarcoSwart, as I indicated on the Flow thread, for database performance reasons (some wikis, like enwiki and commonswiki, have millions of linter entries), we are using MySQL's estimated count feature (via EXPLAIN) instead of maintaining accurate counts. So the differences you are seeing are likely a result of that.
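To illustrate the discrepancy described above: instead of running an exact `COUNT(*)` over a table with millions of rows, one can run `EXPLAIN SELECT ...` and sum the optimizer's per-plan-row `rows` estimates, which is cheap but approximate. The following is a minimal sketch of that idea; the function name, the dict shape of the EXPLAIN rows, and the `linter` table/column names are illustrative assumptions, not the actual Linter extension code (MediaWiki wraps this pattern in its database layer's `estimateRowCount`).

```python
# Sketch: approximate a row count from EXPLAIN output rather than COUNT(*).
# `explain_rows` stands in for what a DB driver might return for
# "EXPLAIN SELECT * FROM linter WHERE linter_cat = ...". Names are hypothetical.

def estimated_row_count(explain_rows):
    """Sum the optimizer's 'rows' estimates across the EXPLAIN plan rows.

    The result is the optimizer's guess, not an exact count, so it can
    disagree with the true number of matching rows (e.g. 11 vs. 8).
    """
    return sum(int(r.get("rows", 0)) for r in explain_rows)

# A single-table plan whose estimate is 11 even though only 8 rows match:
plan = [{"id": 1, "table": "linter", "rows": 11}]
print(estimated_row_count(plan))  # 11
```

This is why the category counter (estimated) can read 11 while the subpage, which lists the actual rows, correctly shows 8.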

Thx, I'm starting to understand now.