Page MenuHomePhabricator

Filter out large-table lint records from core rest API to fix LintHint being overwhelmed with spurious large-tables lint records
Closed, ResolvedPublic

Description

The LintHint gadget uses the core parsoid lint API to retrieve all lint errors for a specific page directly from Parsoid.
We need to suppress that API from returning 'large-tables' category lint records.
Setting the linter extension extension.json value to "none' priority has no impact on this core interface.

The first patch I am proposing filters out the 'large-table' records after Parsoid generates them, and is inelegant, but safe.
It might be better to pass an option to Parsoid such that the Parsoid lint code it does not look for and generate 'large-table' lint records when called from the core lint API with a special option disabling this lint.

The issue to be determined is who else besides LintHint is accessing this API and whether they want to see 'large-table' lint records or not.

In discussion with CScott it appears that Discussion Tools is also possibly using this API, but would also like the new category 'large-tables' to not be in the results.

Event Timeline

Change 922177 had a related patch set uploaded (by Sbailey; author: Sbailey):

[mediawiki/core@master] WIP Filter out large-table lints from core test API

https://gerrit.wikimedia.org/r/922177

The issue to be determined is who else besides LintHint is accessing this API and whether they want to see 'large-table' lint records or not.

This hints at removing any logic from that endpoint for filtering lints (as in the patch above) and making the default to not emit them. Then, add a configuration so that they only get emitted when we're linting as part of a full page parse that is going to be recorded by the linter extension.

The issue to be determined is who else besides LintHint is accessing this API and whether they want to see 'large-table' lint records or not.

This hints at removing any logic from that endpoint for filtering lints (as in the patch above) and making the default to not emit them. Then, add a configuration so that they only get emitted when we're linting as part of a full page parse that is going to be recorded by the linter extension.

I agree there is some need to have that configurable, and that we need to discuss/decide how and with what granularity - the above patch was mostly meant to be a stopgap to solve the immediate issue while we think about that.

I'm wondering whether we want:

  • a "Not-A-Lint"/"Internal" category that we'd indeed filter essentially everywhere except where we need it internally
  • an "allow" list of "allowed lints" to pass through the API
  • a "deny" list of "denied lints" to pass through the API
  • all or a combination of the above

I'm somewhat leaning towards the first solution at least, because it feels like it gives us the possibility to continue experimenting with the Linter without clients having to ever see it/adjust their tools to filter it out, but giving control to the users also feels like a Good Thing - so ideally we'd get ALL OF THAT, but that'll typically take a little bit more time.

For now, filtering in ParsoidHandler should help LintHint avoid being swamped with large-tables lints.
Next step I think is to implement inside Parsoid, a "internal" versus "exposed" or something like that flag that large-tables can use to hide these not WT errors, but useful diagnostic or research lints behind.

Considering T336316#8869809, T336528, and T336679, I think you should just stop emitting these lint errors altogether until it's refined and can be suppressed from turning up in the lint http api endpoint.

Hmm, nested tables create false positives, test coverage was inadequate.
Maybe we just drop a return at the start of the current large-tables lint code in Parsoid until this lint code gets rewritten?

What's the current timeline for having the lint hidden?

That (returning at the start) sounds fine as long as it's temporary. Web team would really like to start digging into the data here and working towards general solutions to this problem - it's important we fix this within the next 3 months and progress is currently blocked on having a working lint that we can iterate on that's hidden from editor workflows.

until this lint code gets rewritten?

Can you elaborate on what's being rewritten? Or do you mean fixing the hidden category?

Note we are hoping to have a fix for T336528 by the end of next week cc @Mabualruz who is point person while I'm out next week.

Sbailey renamed this task from Filter out large-table lint records from core rest API to fix LintHint being overwhelmed with spurious large table lint records to Filter out large-table lint records from core rest API to fix LintHint being overwhelmed with spurious large-tables lint records.May 26 2023, 4:05 PM

I think we should ship the large-tables lint suppressing code for the Parsoid rest API and then back it out once the replacement large-tables lint code is in production. This will help LintHint be usable again, later next week.

Change 922177 merged by jenkins-bot:

[mediawiki/core@master] Filter out large-tables category lints from Parsoid REST API

https://gerrit.wikimedia.org/r/922177

Arlolra triaged this task as Medium priority.Aug 23 2023, 6:45 PM
Arlolra moved this task from Backlog to Action API on the MediaWiki-extensions-Linter board.

Change 1010626 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Filter out all priority none lints from rest api

https://gerrit.wikimedia.org/r/1010626

Change 1010626 merged by jenkins-bot:

[mediawiki/core@master] Filter out all priority none lints from rest api

https://gerrit.wikimedia.org/r/1010626

Change #1016431 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/extensions/Linter@master] Don't include hidden category counts in page info

https://gerrit.wikimedia.org/r/1016431

Change #1016431 merged by jenkins-bot:

[mediawiki/extensions/Linter@master] Don't include hidden category counts in page info

https://gerrit.wikimedia.org/r/1016431

Change #1020959 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Allow callers to pass in linterOverrides

https://gerrit.wikimedia.org/r/1020959

Change #1020960 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Pass in linterOverrides, rather than filter after the fact

https://gerrit.wikimedia.org/r/1020960

Change #1020959 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Allow callers to pass in linterOverrides

https://gerrit.wikimedia.org/r/1020959

Change #1022258 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a2

https://gerrit.wikimedia.org/r/1022258

Change #1022258 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a2

https://gerrit.wikimedia.org/r/1022258

Change #1020960 merged by jenkins-bot:

[mediawiki/core@master] Pass in linterOverrides, rather than filter after the fact

https://gerrit.wikimedia.org/r/1020960