Page MenuHomePhabricator

Special:LintErrors page had not been updated more than 90 mins after a page has been edited to fix errors
Closed, ResolvedPublic


This edit fixed all the obsolete <tt> tag errors on the mw:Continuous_Integration/Zuul page but the LintErrors page continues to list it as of 90+ mins after that edit.

Not sure if this is simply a delay in when Parsoid is reparsing the page via ChangeProp or if this is a bug in the Linter extension code. IIRC, the Linter code removes stale entries from the database after a page is parsed via Parsoid.

Event Timeline

We have had some events being lost today in CP due to the update to a newer version of the Kafka native driver, but that shouldn't have affected this case, as the loss pertains only to issuing dependency updates. I have forced a purge on mw:Continuous_integration/Zuul, let's see if that helps.

<mobrovac> re T160573
<mobrovac> have you noticed that today?
<subbu> yes .. i am mostly checking if that page linked in the phab task went through page prop or not
<subbu> *change prop
<subbu> the linter ext was deployed today (@ 2017-03-15T18:25:16Z )
<subbu> and James_.F edited that page.
<subbu> so, trying ot figure out if this is a bug in linter extension code or a bug in changeprop / parsoid.
<mobrovac> uh bad timing
<mobrovac> we had some event loss today in CP due to the migration to the new kafka native driver lib
<subbu> i see ... i suppose i can issue a purge vent on the page to force a reparse.
<subbu> can you update the phab task with that info?
<mobrovac> but wait
<mobrovac> how is CP involved here?
<subbu> page is edited => CP should notice that => issue reparse req. to Parsoid => linter should remove those lint errors from its db
<subbu> as i said eralier .. i am trying to figure out what in that chain is broken.
<mobrovac> I just purged the CI page
<mobrovac> ah ok
<mobrovac> i see now, so the linter is triggered on a parsoid api req?
<subbu> parsoid emits lint errors on prase to the m/w api which goes to the linter extension.
<subbu> so, yes.
<mobrovac> oh oh i see
<mobrovac> then it's not CP-related
<mobrovac> that event should have been processed
<subbu> mobrovac, the purge fixed it .. is updated now.
<mobrovac> the event loss was only for dependencies
<mobrovac> ah hm ok
<subbu> so, only explanation is that the reparse didn't happen or the event got lost somewhere.
<subbu> curious what happened there .. if you want to dig into it.
<subbu> will update the ticket with this IRC transcript.
ssastry renamed this task from Special:LintErrors page isn't updated more than 90 mins after a page has been edited to fix errors to Special:LintErrors page had not been updated more than 90 mins after a page has been edited to fix errors.Mar 15 2017, 9:24 PM
ssastry reassigned this task from Legoktm to mobrovac.
ssastry triaged this task as Medium priority.
ssastry added a subscriber: Legoktm.

Just found another case...


mysql:wikiadmin@db1077 [mediawikiwiki]> select * from linter where linter_id=30130;
| linter_id | linter_page | linter_cat | linter_start | linter_end | linter_params         |
|     30130 |      636946 |          3 |         1568 |       1638 | {"items":["options"]} |
1 row in set (0.00 sec)

Question: Do we keep logs for pages/events that changeprop processed? I'll try to investigate this from the parsoid/linter/jobqueue side too.

The specific edit that should have fixed this was revid=2427878. If changeprop did process that event, I'm starting to suspect that it might be a race condition when parsoid contacts MW...:

		if ( !$title || !$title->getArticleID()
			|| $title->getLatestRevID() != $params['revision']
		) {
			$this->dieUsage( 'Invalid, non-existent, or outdated title', 'invalid-title' );

Here's the event from Kafka that was processed:

   "comment":"Lint errors",

The event was not retried, so apparently it got processed successfully.

Change 344998 had a related patch set uploaded (by Legoktm):
[operations/mediawiki-config@master] Linter: whitelist parsoid canaries too

Change 344998 merged by jenkins-bot:
[operations/mediawiki-config@master] Linter: whitelist parsoid canaries too

Mentioned in SAL (#wikimedia-operations) [2017-03-27T19:56:12Z] <legoktm@tin> Synchronized wmf-config/InitialiseSettings.php: Linter: whitelist parsoid canaries too - - T160573 (duration: 00m 44s)

After looking into the parsoid logs, I noticed that wtp2001 wasn't whitelisted, and it turns out that we were dropping events from all of the four canary servers, about 9% of events in total. I'm not closing this yet because I saw a few other errors in the logs that need investigating, but I think this should fix the largest source of issues.

No more reports of delayed updates have come in since, closing as resolved.