
Review and deploy Linter extension to Wikimedia wikis
Closed, Resolved (Public)

Description

Parsoid has a linter that can identify common errors in wikitext (deprecated elements, fostered content, bad image options, etc.). The Linter extension collects them in a database table and surfaces them to users. It also includes a small client-side script that highlights the section of wikitext containing the error, making it easier for editors to fix.

Event Timeline


Change 332003 had a related patch set uploaded (by Legoktm):
make-wmf-branch: Add Linter

https://gerrit.wikimedia.org/r/332003

Change 332003 merged by jenkins-bot:
make-wmf-branch: Add Linter

https://gerrit.wikimedia.org/r/332003

Mentioned in SAL (#wikimedia-operations) [2017-03-08T21:39:36Z] <legoktm@tin> Synchronized wmf-config/InitialiseSettings.php: Enable Linter on testwiki - T148609 (1/2) (duration: 00m 44s)

Mentioned in SAL (#wikimedia-operations) [2017-03-08T21:41:25Z] <legoktm@tin> Synchronized wmf-config/CommonSettings.php: Enable Linter on testwiki - T148609 (2/2) (duration: 00m 41s)

Change 342874 had a related patch set uploaded (by Legoktm):
[operations/mediawiki-config] Deploy Linter to group0 and small wikis

https://gerrit.wikimedia.org/r/342874

Change 342874 merged by jenkins-bot:
[operations/mediawiki-config] Deploy Linter to group0 and small wikis

https://gerrit.wikimedia.org/r/342874

Mentioned in SAL (#wikimedia-operations) [2017-03-15T18:25:16Z] <legoktm@tin> Synchronized wmf-config/InitialiseSettings.php: Deploy Linter to group0 and small wikis - T148609 (duration: 00m 42s)

This has been marked for inclusion in Tech News. A few questions:

a) What's the timeline, both for group0 and small wikis (in .17? .18?) and larger wikis?

b) Do we have a specific definition of small wikis in this case?

c) Would something like the text below make sense? (To be edited somewhat depending on the answers to the questions above.)

The Linter extension is now on smaller Wikimedia wikis. It helps editors find common wikitext errors so they can be fixed. It will come to other Wikimedia wikis later.

@Johan:

  • Since yesterday, all wikis in small.dblist and group0.dblist have this extension enabled.
  • A small wiki here is presumably any wiki listed in small.dblist.
  • The text looks okay to me, although I'd prefer that @Legoktm give the green light.

Regards.

> The Linter extension is now on smaller Wikimedia wikis. It helps editors find common wikitext errors so they can be fixed. It will come to other Wikimedia wikis later.

It is better and more accurate to say "find some wikitext errors" instead of "common wikitext errors", since we don't find all common wikitext errors yet. If possible, you could add a line that says: "This list of errors will be expanded as we gain more experience with this extension." @Legoktm thoughts?

> a) What's the timeline, both for group0 and small wikis (in .17? .18?) and larger wikis?

I was aiming for medium wikis this week, but since the train is delayed it might be next week too.

> b) Do we have a specific definition of small wikis in this case?

https://noc.wikimedia.org/conf/highlight.php?file=small.dblist
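In other words, a wiki counts as small if its database name appears in small.dblist (linked above). A minimal sketch of that membership check, assuming a dblist is plain text with one database name per line and optional `#` comments; the file format is an assumption here, and `examplewiki` / `samplewiktionary` are hypothetical placeholders, not real entries from small.dblist:

```python
def parse_dblist(text: str) -> set[str]:
    """Return the database names listed in dblist-style text (assumed format)."""
    names = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if entry:
            names.add(entry)
    return names


def is_small_wiki(dbname: str, small_dblist_text: str) -> bool:
    """A wiki is 'small' if its database name is listed in small.dblist."""
    return dbname in parse_dblist(small_dblist_text)


# Hypothetical excerpt; the real list is published on noc.wikimedia.org.
SAMPLE = """
# small wikis (placeholder entries only)
examplewiki
samplewiktionary
"""

print(is_small_wiki("examplewiki", SAMPLE))  # True
print(is_small_wiki("enwiki", SAMPLE))       # False
```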

> c) Would something like the text below make sense? (To be edited somewhat depending on the answers to the questions above.)
>
> The Linter extension is now on smaller Wikimedia wikis. It helps editors find common wikitext errors so they can be fixed. It will come to other Wikimedia wikis later.

I agree with what Subbu said: it only finds some wikitext errors. The current list is on https://www.mediawiki.org/wiki/Help:Extension:Linter, so you could link to that.

When will this come to other wikis? I particularly want to get started on fixing enwiki… ;-)

We're mostly blocked on T160573, I'm still investigating that.

Oh, right. I thought that was "just" the vagaries of ChangeProp/whatever and wasn't going to be easy to fix. Getting to zero is hard when we don't have perfect data, but it's a lot better than having none. :-)

Change 346591 had a related patch set uploaded (by Legoktm):
[operations/mediawiki-config@master] Deploy Linter to medium wikis too

https://gerrit.wikimedia.org/r/346591

Change 346591 merged by jenkins-bot:
[operations/mediawiki-config@master] Deploy Linter to medium wikis too

https://gerrit.wikimedia.org/r/346591

Mentioned in SAL (#wikimedia-operations) [2017-04-06T17:13:46Z] <legoktm@tin> Synchronized wmf-config/InitialiseSettings.php: Deploy Linter to medium wikis too - T148609 (duration: 00m 40s)

Change 347439 had a related patch set uploaded (by Legoktm):
[operations/mediawiki-config@master] Deploy Linter to all wikis

https://gerrit.wikimedia.org/r/347439

Change 347439 merged by jenkins-bot:
[operations/mediawiki-config@master] Deploy Linter to all wikis

https://gerrit.wikimedia.org/r/347439

Mentioned in SAL (#wikimedia-operations) [2017-04-12T20:44:04Z] <legoktm@tin> Synchronized wmf-config/InitialiseSettings.php: Deploy Linter to all wikis - T148609 (duration: 00m 44s)

This is now deployed to all wikis.

Mentioned in SAL (#wikimedia-operations) [2017-04-14T20:53:38Z] <reedy@tin> Synchronized wmf-config/InitialiseSettings.php: Disable Linter on larger wikis T148609 (duration: 00m 41s)

We had to revert the last change as an emergency measure because it was causing issues on commonswiki (s4) and on large wikis in general.

This is the query that was causing issues:

SELECT /* MediaWiki\Linter\Database::getTotals  */  linter_cat,COUNT(*) AS count  FROM `linter` group by linter_cat;

Its EXPLAIN output on db1084 is:

root@PRODUCTION s4[commonswiki]> explain SELECT /* MediaWiki\Linter\Database::getTotals  */  linter_cat,COUNT(*) AS count  FROM `linter` group by linter_cat;
+------+-------------+--------+-------+---------------+--------------------------+---------+------+---------+-------------+
| id   | select_type | table  | type  | possible_keys | key                      | key_len | ref  | rows    | Extra       |
+------+-------------+--------+-------+---------------+--------------------------+---------+------+---------+-------------+
|    1 | SIMPLE      | linter | index | NULL          | linter_cat_page_position | 16      | NULL | 3558942 | Using index |
+------+-------------+--------+-------+---------------+--------------------------+---------+------+---------+-------------+

The result of the query on commonswiki is:

root@PRODUCTION s4[commonswiki]> SELECT /* MediaWiki\Linter\Database::getTotals  */  linter_cat,COUNT(*) AS count  FROM `linter` group by linter_cat;
+------------+---------+
| linter_cat | count   |
+------------+---------+
|          1 |      42 |
|          2 |  633635 |
|          3 |    4961 |
|          4 |  882572 |
|          5 |  820589 |
|          7 |    1313 |
|          8 | 1044567 |
+------------+---------+
7 rows in set (3.16 sec)

During the issue, each of these queries was taking up to 50 seconds, and there were enough of them to cause contention on all s4 slaves.
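For scale: in MySQL/MariaDB EXPLAIN output, `type: index` together with `Using index` means the query reads the whole covering index rather than seeking into part of it, so every getTotals call visits one index entry per lint row. Summing the per-category counts above gives that figure for commonswiki (a back-of-the-envelope check, not a measurement):

```python
# Per-category row counts from the getTotals output on commonswiki above.
counts = {1: 42, 2: 633635, 3: 4961, 4: 882572, 5: 820589, 7: 1313, 8: 1044567}

# With a full index scan, each call walks all of these entries.
rows_scanned_per_call = sum(counts.values())
print(rows_scanned_per_call)  # 3387679

# EXPLAIN's row figure (3558942) is an InnoDB estimate, so it differs somewhat.
print(f"{rows_scanned_per_call / 3558942:.1%}")  # 95.2%
```

Roughly 3.4 million entries per call explains why a query that takes about 3 seconds in isolation became a problem once it was issued many times concurrently.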

This is the current schema:

root@PRODUCTION s4[commonswiki]> show create table linter;
CREATE TABLE `linter` (
  `linter_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `linter_page` int(10) unsigned NOT NULL,
  `linter_cat` int(10) unsigned NOT NULL,
  `linter_start` int(10) unsigned NOT NULL,
  `linter_end` int(10) unsigned NOT NULL,
  `linter_params` blob NOT NULL,
  PRIMARY KEY (`linter_id`),
  UNIQUE KEY `linter_cat_page_position` (`linter_cat`,`linter_page`,`linter_start`,`linter_end`),
  KEY `linter_page` (`linter_page`)
) ENGINE=InnoDB AUTO_INCREMENT=3939880 DEFAULT CHARSET=binary
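To experiment with the query shape locally, the schema and the getTotals aggregate can be approximated in SQLite using Python's standard library. This is a sketch under stated assumptions: column types are simplified because SQLite has no `int(10) unsigned`, the `get_totals` helper is a hypothetical Python stand-in for `MediaWiki\Linter\Database::getTotals`, and the sample rows are made up.

```python
import sqlite3

# SQLite approximation of the linter table; index names mirror the MySQL keys.
SCHEMA = """
CREATE TABLE linter (
    linter_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    linter_page   INTEGER NOT NULL,
    linter_cat    INTEGER NOT NULL,
    linter_start  INTEGER NOT NULL,
    linter_end    INTEGER NOT NULL,
    linter_params BLOB NOT NULL
);
CREATE UNIQUE INDEX linter_cat_page_position
    ON linter (linter_cat, linter_page, linter_start, linter_end);
CREATE INDEX linter_page ON linter (linter_page);
"""

# Same aggregate as the getTotals query quoted above.
GET_TOTALS = "SELECT linter_cat, COUNT(*) AS count FROM linter GROUP BY linter_cat"


def get_totals(conn: sqlite3.Connection) -> dict[int, int]:
    """Map each lint category id to its number of rows."""
    return dict(conn.execute(GET_TOTALS).fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical rows: (page, category, start offset, end offset, params).
rows = [
    (10, 2, 0, 15, b"{}"),
    (10, 4, 40, 52, b"{}"),
    (11, 2, 7, 9, b"{}"),
]
conn.executemany(
    "INSERT INTO linter (linter_page, linter_cat, linter_start, linter_end, linter_params)"
    " VALUES (?, ?, ?, ?, ?)",
    rows,
)
print(sorted(get_totals(conn).items()))  # [(2, 2), (4, 1)]
```

Because of the unique index, inserting a second row with the same (linter_cat, linter_page, linter_start, linter_end) raises `sqlite3.IntegrityError`, matching the `UNIQUE KEY` in the MySQL schema.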

For a graph of the impact see for example:
https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1084&from=1492181842135&to=1492204222000&panelId=37&fullscreen

It's not yet clear what triggered the issue, but alarms firing at the same time showed that mobileapps endpoints on the scb* hosts were failing.

I do notice there's no caching in front of the DB calls to getTotals.

Possibly worth sticking them into memcached with a lowish timeout?
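A minimal sketch of that suggestion, using an in-process dictionary as a stand-in for memcached. The `TTLCache` class, the `linter:totals` key, the 300-second TTL and the fake `get_totals` are all hypothetical choices for illustration, not the Linter extension's or MediaWiki's actual caching code:

```python
import time
from typing import Callable


class TTLCache:
    """Tiny in-process stand-in for memcached: entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: dict[str, tuple[float, object]] = {}  # key -> (expiry, value)

    def get_or_compute(self, key: str, compute: Callable[[], object]) -> object:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive query
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


query_calls = 0


def get_totals() -> dict:
    """Hypothetical stand-in for the GROUP BY query on the linter table."""
    global query_calls
    query_calls += 1
    return {2: 633635, 4: 882572}


fake_now = [0.0]
cache = TTLCache(ttl=300, clock=lambda: fake_now[0])
cache.get_or_compute("linter:totals", get_totals)  # miss: runs the query
cache.get_or_compute("linter:totals", get_totals)  # hit: served from cache
fake_now[0] = 301
cache.get_or_compute("linter:totals", get_totals)  # expired: runs the query again
print(query_calls)  # 2
```

Injecting the clock keeps the example deterministic. The trade-off is that totals may be up to one TTL stale, which is usually tolerable for summary counts.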

Mentioned in SAL (#wikimedia-operations) [2017-04-14T22:16:28Z] <Reedy> created linter tables on pawikisource T148609

Mentioned in SAL (#wikimedia-operations) [2017-04-14T22:17:26Z] <Reedy> created linter tables on wbwikimedia T148609

Change 348244 had a related patch set uploaded (by Reedy):
[mediawiki/extensions/WikimediaMaintenance@master] Add linter tables to all wikis

https://gerrit.wikimedia.org/r/348244


Quite a lot of these are appearing in Logstash. I guess the Parsoid boxes need to be told that Linter is no longer enabled everywhere?

{"code":"unknown_action","info":"Unrecognized value for parameter \"action\": record-lint.","docref":"See https://en.wiktionary.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."}
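This is the API error payload returned when a wiki does not recognize an action. As a hypothetical illustration (not Parsoid's code; Parsoid was instead restarted with the updated configuration, per the SAL entry below), a client could detect it like this. The helper name `is_unknown_action` is invented, and the sketch accepts both the bare error object seen in the log and a response wrapped in a top-level `error` key:

```python
import json


def is_unknown_action(response_body: str, action: str) -> bool:
    """True if an API error says `action` is not available on this wiki."""
    payload = json.loads(response_body)
    error = payload.get("error", payload)  # the log shows the bare error object
    return error.get("code") == "unknown_action" and action in error.get("info", "")


# Shortened version of the logged payload above.
logged = ('{"code":"unknown_action","info":"Unrecognized value for parameter '
          '\\"action\\": record-lint.","docref":"See https://en.wiktionary.org/w/api.php"}')
print(is_unknown_action(logged, "record-lint"))  # True
```

A sender seeing this for a given wiki could stop posting lint data there instead of generating a steady stream of failed requests.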

Guess when Linter was deployed? :)

[Screenshot: Screen Shot 2017-04-14 at 23.30.44.png]

Mentioned in SAL (#wikimedia-operations) [2017-04-14T22:49:01Z] <volans> restarting parsoid to get the disable linter change T148609

Change 348244 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] Add linter tables to all wikis

https://gerrit.wikimedia.org/r/348244

Sorry about the trouble :/

As I mentioned on IRC, I think the huge increase in queries was caused by the inclusion in the API meta=siteinfo endpoint which in hindsight is called pretty frequently. We should consider not including it by default (another parameter) and putting it behind some caching.

When re-enabling we'll need to figure out what to do with the data currently in the database since it's now potentially out of date...
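A sketch of the "not by default" idea above: lint totals are computed only when a caller explicitly asks for them. `siprop` is the real meta=siteinfo parameter, but the `linttotals` value, the `build_siteinfo` helper and `fake_get_totals` are hypothetical names used only for illustration:

```python
from typing import Callable


def build_siteinfo(params: dict[str, str], base_info: dict,
                   get_totals: Callable[[], dict]) -> dict:
    """Assemble a siteinfo-like response, adding lint totals only on request."""
    info = dict(base_info)
    requested = params.get("siprop", "").split("|")
    # Hypothetical opt-in value: callers that don't ask never trigger the DB query.
    if "linttotals" in requested:
        info["linttotals"] = get_totals()
    return info


totals_calls = []


def fake_get_totals() -> dict:
    totals_calls.append(1)  # record each (expensive) query
    return {2: 633635, 4: 882572}


default = build_siteinfo({"siprop": "general"}, {"sitename": "Wikimedia Commons"}, fake_get_totals)
opt_in = build_siteinfo({"siprop": "general|linttotals"}, {"sitename": "Wikimedia Commons"}, fake_get_totals)
print(len(totals_calls))  # 1
```

Combined with a short-lived cache, this would keep the query rate tied to explicit requests rather than to overall siteinfo traffic.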

I believe this came really close to bringing the servers down, and not only MySQL. When I logged in after getting the second alert, the servers were really struggling: it took me many seconds to get a shell prompt (https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=db1081&var-network=eth0&from=1492191958998&to=1492235158998), and the shell was very slow even after that.
It was nice to see that the revert happened so fast (thanks for that, very well done!) and the server recovered straightaway.
Thanks @Volans and @Reedy!

> As I mentioned on IRC, I think the huge increase in queries was caused by the inclusion in the API meta=siteinfo endpoint which in hindsight is called pretty frequently. We should consider not including it by default (another parameter) and putting it behind some caching.

Submitted https://gerrit.wikimedia.org/r/#/c/348323/1 to remove the info from the API query for now.

Change 358887 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[operations/mediawiki-config@master] Deploy Linter to all wikis (try #2)

https://gerrit.wikimedia.org/r/358887

@Legoktm I'm assuming you still don't need anything from CLs since I haven't heard about this in months.

> @Legoktm I'm assuming you still don't need anything from CLs since I haven't heard about this in months.

I'll chat with you once this is deployed since we want to figure out how to leverage this for Tidy replacement. On the linter page, I've identified a few categories that editors of all wikis need to work on. I was waiting on this since I didn't want to have multiple announcements for different wikis.

Change 358887 merged by jenkins-bot:
[operations/mediawiki-config@master] Deploy Linter to all wikis (try #2)

https://gerrit.wikimedia.org/r/358887

Mentioned in SAL (#wikimedia-operations) [2017-06-20T21:17:33Z] <legoktm@tin> Synchronized wmf-config/InitialiseSettings.php: Deploy Linter to all wikis (try #2) - T148609 (duration: 00m 44s)

I left this open just in case something bad happened and we had to revert but that didn't happen, so marking this as resolved again :)