
Suggestion: API for fetching lint errors for a specific revision
Closed, Resolved (Public)

Description

Copied from https://www.mediawiki.org/wiki/Topic:Tp60an0dvayt5vhu

Use cases:
User - interested in finding out on average how many lint errors were added to revisions of a specific article (perhaps because there is complicated markup there).
Researcher - interested in looking at the burden (e.g. cleanup effort) that new or more experienced editors place on other editors.
Tool / script developer - interested in finding out in which revision an error was introduced, in order to revert it or to identify the culprit.
Use in extensions - for example, RecentChanges could theoretically flag every revision that contains a lint error.
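For the tool/script case, a per-revision API would let a script binary-search a page's history instead of linting every revision. A minimal sketch, assuming a hypothetical `has_lint_error(revid)` callback backed by such an API, and assuming that once an error appears it stays in every later revision (if it is fixed and later reintroduced, bisection only finds one of the transitions):

```python
from typing import Callable, Optional, Sequence


def first_bad_revision(
    revids: Sequence[int],
    has_lint_error: Callable[[int], bool],
) -> Optional[int]:
    """Return the earliest revision ID for which has_lint_error() is True.

    Assumes revids are in chronological order and that an error, once
    introduced, persists in all later revisions, so binary search is valid.
    Returns None if no revision has the error.
    """
    lo, hi = 0, len(revids)
    while lo < hi:
        mid = (lo + hi) // 2
        if has_lint_error(revids[mid]):
            hi = mid      # error present: culprit is at mid or earlier
        else:
            lo = mid + 1  # error absent: culprit is after mid
    return revids[lo] if lo < len(revids) else None
```

With a history of n revisions this needs about log2(n) lookups; for example, `first_bad_revision([100, 105, 110, 120, 130], lambda r: r in {120, 130})` returns `120`.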
Background
Even in its current state, the Linter extension makes a lot of analysis of existing data possible. In addition to the use cases above, one could look into historical data, e.g. run further analysis on the Research:VisualEditor's_effect_on_newly_registered_editors/June_2013_study dataset to compare the number of errors introduced on page creation by VisualEditor versus wikitext editors, or to study wikitext-editor errors on their own.
This could also feed into ORES (which scores individual revisions), providing an extra signal for identifying revisions that may contain vandalism, since vandals are generally less likely to know wikitext markup.
One possibility would be for individual editors to look at their own contributions and check whether they leave behind recurring patterns of incorrect markup that they could improve on. Other editors could also use it to see whether a newcomer needs help, or to identify a possible vandal.
Proposed solution
A new API endpoint, e.g.:

api.php?action=query&revids=478198|54872|54894545&prop=linterrors&leprop=count|type|...
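A sketch of how a client might assemble the proposed request. `prop=linterrors` and `leprop` are the names proposed in this task, not a confirmed API; `format=json` and the helper name `build_linterrors_query` are assumptions added for illustration. Multiple values are joined with `|`, the usual MediaWiki multi-value separator, which `urlencode` escapes as `%7C`:

```python
from typing import Sequence
from urllib.parse import urlencode


def build_linterrors_query(api_base: str, revids: Sequence[int], props: Sequence[str]) -> str:
    """Build the *proposed* action=query URL for lint errors per revision.

    prop=linterrors / leprop follow the proposal above and are hypothetical.
    """
    params = {
        "action": "query",
        "revids": "|".join(str(r) for r in revids),  # pipe-separated revision IDs
        "prop": "linterrors",
        "leprop": "|".join(props),                   # e.g. count|type
        "format": "json",                            # assumption: JSON output
    }
    return f"{api_base}?{urlencode(params)}"


url = build_linterrors_query(
    "https://en.wikipedia.org/w/api.php", [478198, 54872, 54894545], ["count", "type"]
)
```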

Compared with fetching lint errors for arbitrary text (which is useful in itself), this allows much more flexible analysis without resorting to database dumps or complex scripts.

Event Timeline

Subbu's comment: "The specific form of this is a bit harder to support since linter is backed by parsoid right now. So, maybe a separate endpoint similar to T163091 ... but this will be a lower priority one."

Change 352715 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Add an API endpoint to get lint errors for wikitext

https://gerrit.wikimedia.org/r/352715

Change 352715 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add an API endpoint to get lint errors for wikitext

https://gerrit.wikimedia.org/r/352715

ssastry assigned this task to Arlolra.
ssastry triaged this task as Medium priority.
ssastry removed a project: Patch-For-Review.
ssastry added a subscriber: ssastry.

Actually, this requires a RESTBase-side fix before clients can access it.

The endpoint is already exposed:
https://en.wikipedia.org/api/rest_v1/#!/Transforms/post_transform_wikitext_to_lint_title_revision

It doesn't allow this, since wikitext is a required parameter.
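For reference, a sketch of what a client call to the existing transform endpoint could look like using only the standard library. The path `/transform/wikitext/to/lint/{title}/{revision}` is inferred from the Swagger operation name linked above, the helper name `lint_wikitext_request` is hypothetical, and the request is only prepared, not sent; the point is that `wikitext` has to travel in the POST body:

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: English Wikipedia's REST base, as in the Swagger link above.
REST_BASE = "https://en.wikipedia.org/api/rest_v1"


def lint_wikitext_request(title: str, revision: int, wikitext: str) -> Request:
    """Prepare (but do not send) a POST to the wikitext-to-lint transform."""
    path = "/transform/wikitext/to/lint/{}/{}".format(
        quote(title.replace(" ", "_"), safe=""),  # escape '/' and other specials
        revision,
    )
    # wikitext is sent as a form field; at this point it is a required parameter.
    body = urlencode({"wikitext": wikitext}).encode("utf-8")
    return Request(
        REST_BASE + path,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would actually send it.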

Arlolra added a subscriber: Arlolra.

Services should decline if they don't want to expose this.

Pchelolo added a subscriber: Pchelolo.

Parsoid accepts just a title/revision in its wikitext/to/lint API, so all we need to do is make wikitext an optional parameter as well, and check that either wikitext or a title is provided. Easy.
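The validation rule described here (wikitext optional, but at least one of wikitext or title required) could look roughly like this. `validate_lint_params` and its return values are hypothetical illustrations, not actual RESTBase code:

```python
from typing import Optional


def validate_lint_params(wikitext: Optional[str], title: Optional[str]) -> str:
    """Return which input mode a lint request uses, or raise ValueError.

    "wikitext": lint the text supplied in the request body.
    "title":    no wikitext given; Parsoid would fetch the page source itself.
    """
    if wikitext is not None:
        return "wikitext"
    if title:
        return "title"
    raise ValueError("either wikitext or title must be provided")
```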

Filed a subtask for Parsoid to handle redirects correctly when only the title is provided without a revision. This task is blocked on it for the time being.

If I understand correctly, the idea would be to fetch the list of lint errors for a specific page/revision combination? If so, I don't think making the wikitext parameter optional is a good idea, as that would imply making POST requests with an empty body. A GET endpoint would be much more appropriate for this, IMHO. It should be easy to expose it as /page/linterrors/{title}{/revision} (if revision is not supplied, the latest revision is assumed).
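A sketch of building the proposed GET path. The route is only a proposal in this discussion, and `linterrors_path` is a hypothetical helper; it assumes titles are normalized with underscores for spaces and percent-encoded, so that a `/` inside a title (e.g. `AC/DC`) is not mistaken for the optional revision segment:

```python
from typing import Optional
from urllib.parse import quote


def linterrors_path(title: str, revision: Optional[int] = None) -> str:
    """Build the proposed /page/linterrors/{title}{/revision} path.

    Omitting revision yields the latest-revision form of the route.
    """
    encoded = quote(title.replace(" ", "_"), safe="")  # '/' becomes %2F
    path = f"/page/linterrors/{encoded}"
    if revision is not None:
        path += f"/{revision}"
    return path
```

Because the revision is an optional trailing segment, the same route covers both the "latest" and the "specific revision" cases without a request body.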

@mobrovac I've proposed that option to match the Parsoid API, but providing a GET endpoint works too.

Would it make sense to return lint errors as part of the pagebundle response from Parsoid? Also, is Parsoid handling storage for lint errors?

Unless you want to store lint errors for all revisions, sending lint errors as part of the pagebundle isn't necessary. All lint errors for the active revision are stored in a MySQL database by the Linter extension.

OK, since the Parsoid issue has been fixed, I can do a quick RESTBase patch for this. Do we still need it?

Yes, it would be useful, especially for https://www.mediawiki.org/wiki/Topic:U99ywo68gg6ufgmg