Audit heavy linter categories to see if we really need them
Open, Needs Triage, Public

Description

Currently, linter tables are among the heaviest tables in terms of writes and size. After I talked to @ssastry at the offsite, he suggested we take a look at the linter categories and see which ones we can disable. According to the stats (collected globally), these are the heaviest categories:

{
	"night-mode-unaware-background-color": 186136712,
	"obsolete-tag": 20461619,
	"missing-end-tag": 18571784,
	"misnested-tag": 17115266,
	"large-tables": 16967189,
	"duplicate-ids": 12441859,
	"stripped-tag": 10347238,
	"fostered": 1108094,
	"bogus-image-options": 1066422,
	"html5-misnesting": 1049194,
	"tidy-font-bug": 350029,
	"wikilink-in-extlink": 291255,
	"missing-image-alt-text": 252542,
	"misc-tidy-replacement-issues": 249716,
	"deletable-table-tag": 235093,
	"empty-heading": 116395,
	"missing-end-tag-in-heading": 92265,
	"multiple-unclosed-formatting-tags": 75691,
	"self-closed-tag": 74385,
	"fostered-transparent": 42694,
	"multiline-html-table-in-list": 36964,
	"pwrap-bug-workaround": 32883,
	"multi-colon-escape": 7402,
	"tidy-whitespace-bug": 4536,
	"unclosed-quotes-in-heading": 2504,
	"inline-media-caption": 1
}

Category 22 (night-mode-unaware-background-color) accounts for 186M entries, which is more than every other category combined (they add up to roughly 101M, so it is almost double all the others together). Now that we have enabled dark mode everywhere, maybe we can disable it?
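As a quick check of that arithmetic, here is a minimal sketch that sums the counts from the stats above and compares the night-mode category with everything else. The variable names are only illustrative.

```python
# Counts copied from the per-category stats in the description.
counts = {
    "night-mode-unaware-background-color": 186136712,
    "obsolete-tag": 20461619,
    "missing-end-tag": 18571784,
    "misnested-tag": 17115266,
    "large-tables": 16967189,
    "duplicate-ids": 12441859,
    "stripped-tag": 10347238,
    "fostered": 1108094,
    "bogus-image-options": 1066422,
    "html5-misnesting": 1049194,
    "tidy-font-bug": 350029,
    "wikilink-in-extlink": 291255,
    "missing-image-alt-text": 252542,
    "misc-tidy-replacement-issues": 249716,
    "deletable-table-tag": 235093,
    "empty-heading": 116395,
    "missing-end-tag-in-heading": 92265,
    "multiple-unclosed-formatting-tags": 75691,
    "self-closed-tag": 74385,
    "fostered-transparent": 42694,
    "multiline-html-table-in-list": 36964,
    "pwrap-bug-workaround": 32883,
    "multi-colon-escape": 7402,
    "tidy-whitespace-bug": 4536,
    "unclosed-quotes-in-heading": 2504,
    "inline-media-caption": 1,
}

NIGHT_MODE = "night-mode-unaware-background-color"

night = counts[NIGHT_MODE]
# Everything except the night-mode category.
others = sum(value for name, value in counts.items() if name != NIGHT_MODE)
ratio = night / others
share = night / (night + others)

print(f"night-mode entries: {night:,}")
print(f"all other entries:  {others:,}")
print(f"night-mode / others: {ratio:.2f}")
print(f"night-mode share of total: {share:.1%}")
```

The ratio lands a little under 2, which is where the "almost double" figure comes from.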

Event Timeline

Wikis with more than 1M dark mode lint errors:

arwiki: 5572177
cebwiki: 4333563
commonswiki: 17224290
dewiki: 1925572
dewikisource: 1018570
enwiki: 13851902
enwiktionary: 1710670
eswiki: 3296663
fawiki: 1245541
frwiki: 3842180
hiwiki: 1241293
huwiki: 1340336
hywiki: 1165824
idwiki: 8181366
incubatorwiki: 2522384
itwiki: 17938235
jawiki: 5026805
kowiki: 2128614
mswiki: 1775274
ptwiki: 5377732
ruwiki: 1472063
ruwikinews: 4010311
shwiki: 2073617
srwiki: 2051348
svwiki: 2996301
trwiki: 4208663
ttwiki: 1132662
ukwiki: 3811945
viwiki: 9811904
warwiki: 1255387
zhwiki: 10915653

I know @Jonesey95 (who does more lint cleanup on enwiki than pretty much anyone else) wants the large-tables lint (ID 20) gone because it is not actually an error. We probably spend more time telling people "this is not an error!" than fixing it. It was introduced in T334528 to "understand the problem better and find solutions", presumably referring to the problem that large tables are harder to view on mobile. I am not sure whether that is still needed. In any event, it doesn't track large tables by actual display size: it just checks whether there are more than five columns.
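To make the "more than five columns" point concrete, here is a rough sketch of a column-count heuristic of that kind. It is not the actual Linter implementation; the class name, the `is_large_table` helper, the `threshold` parameter, and the colspan handling are all assumptions for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class ColumnCounter(HTMLParser):
    """Tracks the widest row (in cells) seen in the HTML fed to it."""

    def __init__(self):
        super().__init__()
        self.max_columns = 0
        self._current = 0

    def _flush_row(self):
        # Record the row that just ended and reset the running count.
        self.max_columns = max(self.max_columns, self._current)
        self._current = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row also closes a previous row with an omitted </tr>.
            self._flush_row()
        elif tag in ("td", "th"):
            colspan = dict(attrs).get("colspan") or "1"
            self._current += int(colspan) if colspan.isdigit() else 1

    def handle_endtag(self, tag):
        if tag in ("tr", "table"):
            self._flush_row()

    def close(self):
        super().close()
        self._flush_row()


def is_large_table(html, threshold=5):
    """Return True if any row has more than `threshold` columns."""
    counter = ColumnCounter()
    counter.feed(html)
    counter.close()
    return counter.max_columns > threshold


# Six one-character cells: tiny on screen, but still flagged.
narrow = "<table><tr>" + "<td>1</td>" * 6 + "</tr></table>"
# Five long cells: possibly very wide on screen, but not flagged.
wide = "<table><tr>" + "<td>" + "x" * 200 + "</td>" * 5 + "</tr></table>"
print(is_large_table(narrow), is_large_table(wide))
```

A check like this says nothing about rendered width, which is why a count-based rule can flag tables that display fine on mobile and miss ones that do not.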

The point of the night mode linter rule is that it flags pages which might look funky in dark mode, so I think dark mode being deployed is actually a reason to keep it around. But then again, I'm not sure if enough people are working on that backlog to make the technical debt and storage worthwhile. I would weakly support its removal, but we should remove it for the right reasons.

Edited the description to use the category names instead of the numerical IDs.

Philosophically I think it's important that Linter categories remain actionable and are things editors can fix. So with that, getting rid of hidden categories like large tables seems like a quick win. My understanding is that night mode is in a similar spot, but I'll leave it to others to better explain/justify it.

It would be great if everyone could stop referring to these Linter issue lists and tables using the word "category", since that word has a specific technical meaning on Wikipedia. Linter does not cause any sort of categorization.

It would also be great if Linter issues could actually assign pages to categories so that they would be more noticeable to editors and easier to find using Search, but that is a separate, long-standing ticket.

I am unclear what this ticket proposes. If the dark mode Linter table goes away, for example, does that mean that there would be no reliable way to find pages with the syntax that currently causes pages to be tagged with that issue on their Page information page?

I would support removal of the "large table" issue entirely; it was worth a try, but it is not something that can or should be fixed on many pages, and there have always been too many false positives.

I would also support removal or refinement of the dark mode Linter issue, since it has so many false positives. If it can be refined to highlight only issues that actually cause problems in dark mode, I think the table would be a lot smaller.

And one last thought: Some wikis have seen very little attention to Linter errors, which means their counts could be dropped significantly. In 2022, I visited Commons and was able to reduce the total Linter count from about 13 million to under 5 million with just a few hundred edits to templates. That same opportunity is probably available at other wikis.

I would also support removal or refinement of the dark mode Linter issue, since it has so many false positives. If it can be refined to highlight only issues that actually cause problems in dark mode, I think the table would be a lot smaller.

It would be great if you could share some examples of false positives to help the dev teams refine the rule. I'd appreciate it.

And one last thought: Some wikis have seen very little attention to Linter errors, which means their counts could be dropped significantly. In 2022, I visited Commons and was able to reduce the total Linter count from about 13 million to under 5 million with just a few hundred edits to templates. That same opportunity is probably available at other wikis.

I think that would be T170874: Provide additional Linter statistics about template-generated issues, which is not that hard to implement. Fingers crossed it gets done soon.
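As a rough illustration of what template-level statistics could look like, here is a minimal sketch that groups lint records by the template they originate from, so the templates responsible for the most entries surface first. The record shape (`page` and `template` keys) and the sample data are hypothetical, not the Linter schema.

```python
from collections import Counter


def top_templates(lint_records, n=3):
    """Return the n templates responsible for the most lint records.

    Records without a template (issues in the page's own wikitext)
    are skipped.
    """
    counts = Counter(
        record["template"] for record in lint_records if record.get("template")
    )
    return counts.most_common(n)


# Hypothetical sample data, for illustration only.
records = [
    {"page": "Page A", "template": "Template:Example navbox"},
    {"page": "Page B", "template": "Template:Example navbox"},
    {"page": "Page C", "template": "Template:Example navbox"},
    {"page": "Page D", "template": "Template:Example infobox"},
    {"page": "Page E", "template": None},
]

for name, count in top_templates(records):
    print(f"{count:>5}  {name}")
```

A ranking like this is what makes the "few hundred template edits" approach practical: fixing one heavily transcluded template clears every record attributed to it.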

It would be great if you could share some examples of false positives to help the dev teams refine the rule. I'd appreciate it.

Sure. Here's one at random from the first page of results: https://en.wikipedia.org/wiki/Template:Footer_Pan_Pacific_Champions_100m_Breaststroke_Men

The template is perfectly readable in both dark mode and light mode, despite the lack of an explicit color declaration. I think the top bar may be inheriting the color from a parent .navbox-title element, but it's not entirely clear to me.

Here's another, possibly more subtle: https://en.wikipedia.org/wiki/Template:Tropical_cyclone_season_timeline

The template is perfectly readable in both dark mode and light mode, despite the lack of an explicit color declaration alongside the bgcolor declaration. Furthermore, when I inspect the problem element, I see that color:black is indeed being assigned to it, inherited from the .infobox parent element. So the element that is tagged as missing a color actually does have one, and it's working fine.

I hope that helps clarify what I mean by false positives. I reported these situations somewhere early on in the night-mode Linter era, but I don't remember where.
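The false-positive pattern described above can be sketched as follows. This is not the Linter's actual rule, only a simplified check that looks at inline `style` attributes alone; the function and class names and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser


def inline_properties(style):
    """Parse an inline style string into a {property: value} dict."""
    props = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep:
            props[name.strip().lower()] = value.strip()
    return props


class InlineBackgroundCheck(HTMLParser):
    """Flags tags whose inline style sets a background but no color."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        props = inline_properties(dict(attrs).get("style") or "")
        has_background = "background" in props or "background-color" in props
        if has_background and "color" not in props:
            self.flagged.append(tag)


def flagged_tags(html):
    checker = InlineBackgroundCheck()
    checker.feed(html)
    checker.close()
    return checker.flagged


# The <th> sets only a background inline. A site stylesheet rule on the
# parent class could still give it a text color, but a check that sees
# only inline styles cannot know that, so the element is flagged anyway.
sample = (
    '<table class="infobox"><tr>'
    '<th style="background:#ccc">Season</th>'
    '<td style="background:#fff; color:#000">2020</td>'
    "</tr></table>"
)
print(flagged_tags(sample))
```

Because the check never consults stylesheets, inherited colors are invisible to it, which matches the behavior seen in the two templates above.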

Thanks! I will let the dev team know about this.

FWIW, I think completely disabling the dark mode color rule would be a bad idea. There are still pages where content is unreadable in dark mode on lesser edited wikis (take https://bn.wikibooks.org/wiki/টেমপ্লেট:সংগ্রহশালার_স্বয়ংক্রিয়_পরিভ্রমণ for example), and the fix is straightforward enough. I do agree with the concern about false positives raised by @Jonesey95 and I feel that’s the actual problem here rather than the existence of the rule itself. Personally, I have been running my bot on bnwikibooks to just add color declarations wherever Linter complains about them being missing,

The template is perfectly readable in both dark mode and light mode, despite the lack of an explicit color declaration alongside the bgcolor declaration. Furthermore, when I inspect the problem element, I see that color:black is indeed being assigned to it, inherited from the .infobox parent element. So the element that is tagged as missing a color actually does have one, and it's working fine.

For reference, I believe this is (partially?) covered by what you had previously filed as T393538: Linter reports "night-mode-unaware-background-color" error even though color is defined.

Personally, I have been running my bot on bnwikibooks to just add color declarations wherever Linter complains about them being missing,

Could you share a link to your source code / how your bot works?

Source code: https://gitlab.wikimedia.org/toolforge-repos/redminbot/-/blob/main/utils.py?ref_type=heads#L43
(It does not check if color is perhaps being set from something that is not immediately visible in the wikitext.)
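For readers who do not want to open the repository, the general approach can be sketched like this. This is not the bot's actual code; it is a simplified stand-in with the same stated limitation (it only sees inline styles in the wikitext). The `add_missing_color` name and the default `text_color` value are assumptions, and the right color depends on the background in question.

```python
import re

# Matches style="..." attributes (double-quoted only, for simplicity).
STYLE_ATTR = re.compile(r'(?<![\w-])style\s*=\s*"([^"]*)"', re.IGNORECASE)


def property_names(style):
    """Return the lowercase CSS property names declared in a style string."""
    names = []
    for declaration in style.split(";"):
        name, sep, _value = declaration.partition(":")
        if sep:
            names.append(name.strip().lower())
    return names


def needs_color(style):
    """True if the style sets a background but declares no text color."""
    names = property_names(style)
    has_background = "background" in names or "background-color" in names
    return has_background and "color" not in names


def add_missing_color(wikitext, text_color="black"):
    """Append a color declaration to inline styles that lack one.

    `text_color` is a placeholder default (an assumption of this sketch).
    """

    def fix(match):
        style = match.group(1)
        if not needs_color(style):
            return match.group(0)  # leave untouched
        fixed = style.rstrip().rstrip(";") + f"; color: {text_color};"
        return f'style="{fixed}"'

    return STYLE_ATTR.sub(fix, wikitext)


before = '{| class="wikitable" style="background: #eee"\n| cell\n|}'
print(add_missing_color(before))
```

Styles that already declare a color, or that set no background at all, pass through unchanged, so running the function twice gives the same result as running it once.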

The lint rules are useful to some wikis as @Redmin has demonstrated. I also know they are being acted on and helpful in a few other wikis.

I have flagged this before, but despite the name, it is not just dark mode that this aids. Fixing these lints ensures our content is portable to any other website or app that uses our content (think Kiwix; think native apps). For example, the native apps DO NOT load site CSS for security reasons, so many of the dark mode fixes that currently work on English Wikipedia do not work outside the web experience.

The lint is definitely not perfect. It inspects only inline styles, because lint rules do not currently support traversal of template styles. However, that is also by design: there is no guarantee that certain Wikipedia viewers are not stripping those styles.

That said, if English Wikipedia is not finding them useful and decides not to fix them, I think it can follow the usual on-wiki RFC process to request that the rule be disabled. If we are concerned about the number of lints, we should probably be more proactive and start some "use it or lose it" conversations on those wikis.