Parsoid: Add API endpoint to get lint errors for arbitrary wikitext
Closed, Resolved, Public

Description

For tools and scripts to confirm they have resolved all lint errors, we need an API that checks wikitext before it is saved, instead of saving the page and then waiting for the list of lint errors to update.

I propose that Parsoid add an API endpoint that accepts wikitext in the POST body and returns a JSON list of errors (e.g. in a format similar to https://www.mediawiki.org/w/api.php?action=query&list=linterrors).

MediaWiki would then have an API module that proxies requests to Parsoid, handling rate limiting and sanity checking.

Event Timeline

Legoktm created this task. Apr 17 2017, 3:17 AM

On the Parsoid end, we have to figure out if we should add a lint=true param to https://www.mediawiki.org/wiki/Parsoid/API#Wikitext_-.3E_HTML_2 or if we should add a separate linting endpoint.

@Arlolra says: "at first glance, a qs param or flag in the payload might be ok, but only supported in pagebundle".

ssastry triaged this task as High priority. Apr 21 2017, 8:45 PM
cscott added a subscriber: cscott. Apr 21 2017, 8:59 PM

Sounds reasonable, although if we make "saving the page and waiting for the list of errors to get updated" really fast, that would also work. Not sure which is easier.

NicoV added a comment. Apr 22 2017, 6:40 AM

@cscott : the idea of this request was to be able to check if there are remaining errors before saving the page, so that you can fix them without saving multiple versions of the page...

ssastry moved this task from Backlog to Next Up on the Parsoid board. Apr 24 2017, 10:14 PM

@Arlolra says: "at first glance, a qs param or flag in the payload might be ok, but only supported in pagebundle".

If the lint issues are sent in the response in addition to the HTML output then the pagebundle and wt2html endpoints are the right thing to do.

However, if only lint issues are required, then a separate lint endpoint is better since that will not return html, data-parsoid, data-mw, etc. and /lint/ is a better description of what is expected.

Would this need to be project dependent?

Legoktm added a comment. Edited Apr 28 2017, 5:12 PM

Would this need to be project dependent?

If I understand you correctly yes? Because lint errors depend upon local context like localized image options, etc.

Would this need to be project dependent?

IRC transcript regarding exposing the endpoint via the REST API in restbase.

<subbu> mobrovac, i didn't actually understand your qn. about "would this need to be project dependent?"
<mobrovac> subbu: oh sorry, the q is whether the api would need to be exposed per-project or would it be possible to put it under the global domain
<subbu> isn't it better to do it per-project like all other endpoints since it is basically a parse + extra analysis.
<mobrovac> subbu: the reason i'm asking is because this smells like a feature that would be useful only to us, not the public, so having it just on the global domain would give it less visibility
<subbu> I see what you are saying ..

Oh, hm, wait, so this would be consulted on every edit? What would be the exact flow for this API?

Oh, hm, wait, so this would be consulted on every edit? What would be the exact flow for this API?

Probably not. The workflow is probably going to be that a tool/gadget/user retrieves a list of lint errors for a page from the MW API. It then fixes up the text to resolve those lint errors. Then it posts that text to this new API endpoint that tells the client if there are any lint errors left. Repeat until lint errors are gone (or user is satisfied with what is left), and then it'll save the page using the normal MW API.
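The loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `fix_until_clean`, `lint`, and `fix` are hypothetical names, the `lint` callable stands in for a call to the proposed endpoint, and the shape of a lint error (a dict) is assumed rather than taken from Parsoid.

```python
from typing import Callable, Dict, List, Tuple

# Assumed shape of one lint error; the real response format may differ.
LintError = Dict[str, object]


def fix_until_clean(
    wikitext: str,
    lint: Callable[[str], List[LintError]],
    fix: Callable[[str, List[LintError]], str],
    max_rounds: int = 5,
) -> Tuple[str, List[LintError]]:
    """Lint, fix, and re-lint until no errors remain or no progress is made.

    `lint` is a stand-in for posting wikitext to the lint endpoint;
    `fix` is whatever the tool or user does to repair the text.
    Nothing is saved here: the caller saves via the normal MW API afterwards.
    """
    errors = lint(wikitext)
    for _ in range(max_rounds):
        if not errors:
            break  # nothing left to fix
        fixed = fix(wikitext, errors)
        if fixed == wikitext:
            break  # the fixer made no change; stop rather than loop forever
        wikitext = fixed
        errors = lint(wikitext)
    return wikitext, errors
```

The caller gets back both the final text and any remaining errors, so it can decide whether what is left is acceptable before saving.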

Isn't linting on preview also a use case?

Isn't linting on preview also a use case?

Yes, but we'll probably only do this for users who care and opt in somehow, not for everyone, since most users probably won't understand or know how to fix the errors.

Makes sense. I was checking that we had this preview use case in mind, even if it is only going to be a fraction of edits that will have that workflow.

I see, thank you for the context @Legoktm and @ssastry . So, it would lint the wikitext content. Is the domain/title relevant then? I would suppose the domain is (for templates and such), but if the goal is to have linter-approved wikitext, then I must say I'm not sure how knowing the title might help. Please educate me :)

Title is necessary for parsing page-specific variables e.g. {{PAGENAME}} and everything that builds on top of that.

In that case, I think we can expose it publicly under /page/, e.g. POST https://en.wikipedia.org/api/rest_v1/page/lint/Foobar. Similarly, Parsoid could expose a similar path and requests would then just be proxied from RB to it.

I'm not 100% sure on the /lint/ portion of the path. /lint-wikitext/ would probably be more accurate, but it's kind of long. Thoughts?

My first thought was that this fits with the other transform end points, which are all stateless, and also optionally take a title parameter. Perhaps /transform/wikitext/to/lints{/title} ?

Change 352715 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] WIP: Add an API endpoint to get lint errors for arbitrary wikitext

https://gerrit.wikimedia.org/r/352715

ssastry moved this task from Next Up to In Progress on the Parsoid board. May 8 2017, 10:46 PM

My first thought was that this fits with the other transform end points, which are all stateless, and also optionally take a title parameter. Perhaps /transform/wikitext/to/lints{/title} ?

That was my first thought as well, but it sounds ... weird (can you transform wikitext to lints?) and it doesn't really transform anything, rather, it evaluates it.

Yeah, I had a similar worry. I think it is mainly weird when looking at it with the "transform content" pattern currently dominating the transform hierarchy in mind. However, we partly chose "transform" because it is a very general term that can describe pretty much any kind of input to output transformation, as in f(x) = y. Linting fits that more general meaning, and expanding the transform hierarchy that way would let us limit the number of top level hierarchies by using transform as a fairly general catch-all.

The page hierarchy on the other hand is currently focused on content and data associated with specific, typically existing pages. It would certainly be a good place for a GET /page/lint/{title}{/revision} end point backed by storage. If we plan to add such an entry point, then a symmetric POST end point could make sense as well, although this is somewhat different from how we have done this for wikitext.

NicoV added a comment. May 9 2017, 8:47 AM

Please note that the request is for arbitrary wikitext: the endpoint must allow the caller to provide the wikitext for the given page, independently of any existing revision (or even of whether the page exists at all). When I read your comments, I'm not sure this is still taken into account.

Change 352715 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] T163091: Add an API endpoint to get lint errors for arbitrary wikitext

https://gerrit.wikimedia.org/r/352715

Please note that the request is for arbitrary wikitext: the endpoint must allow the caller to provide the wikitext for the given page, independently of any existing revision (or even of whether the page exists at all). When I read your comments, I'm not sure this is still taken into account.

POST requests allow arbitrary wikitext to be posted; title and revision are optional. For example, the definition inside Parsoid is app.post('/:domain/v3/transform/:from/to/:format/:title?/:revision?', ...). If you provide the page title, the wikitext is evaluated in the context of that title (it defaults to Main Page if not provided).
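To illustrate how the optional trailing segments of that route expand, here is a small sketch that builds a path matching the pattern quoted above. The helper name `transform_path` and its parameter names are hypothetical; only the route shape comes from the comment.

```python
from typing import Optional
from urllib.parse import quote


def transform_path(
    domain: str,
    source: str,
    target: str,
    title: Optional[str] = None,
    revision: Optional[int] = None,
) -> str:
    """Build a path of the form
    /:domain/v3/transform/:from/to/:format/:title?/:revision?

    Title and revision are optional, but a revision only makes sense
    after a title because the segments are positional.
    """
    if revision is not None and title is None:
        raise ValueError("a revision requires a title")
    parts = [domain, "v3", "transform", source, "to", target]
    if title is not None:
        # Percent-encode the title so characters like "/" stay in one segment.
        parts.append(quote(title, safe=""))
    if revision is not None:
        parts.append(str(revision))
    return "/" + "/".join(parts)
```

Omitting the title simply leaves the segment out; the Main Page default mentioned above is applied by the server, not by this helper.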

RESTBase also has a similar interpretation. See https://en.wikipedia.org/api/rest_v1/#!/Transforms/post_transform_wikitext_to_html_title_revision

Makes sense. Let's go with /transform/wikitext/to/lint{/title}, then ?

Makes sense. Let's go with /transform/wikitext/to/lint{/title}, then ?

wfm.

Change 352715 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Add an API endpoint to get lint errors for wikitext

https://gerrit.wikimedia.org/r/352715

The RESTBase side of things is covered in PR 816.

Change 352715 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add an API endpoint to get lint errors for wikitext

https://gerrit.wikimedia.org/r/352715

Mentioned in SAL (#wikimedia-operations) [2017-05-15T21:13:08Z] <mobrovac@tin> Started deploy [restbase/deploy@c52add0]: Expose the new /transform/wikitext/to/lint end point to the public - T163091

Mentioned in SAL (#wikimedia-operations) [2017-05-15T21:19:40Z] <mobrovac@tin> Finished deploy [restbase/deploy@c52add0]: Expose the new /transform/wikitext/to/lint end point to the public - T163091 (duration: 06m 32s)

mobrovac closed this task as Resolved. May 15 2017, 9:21 PM
mobrovac assigned this task to Arlolra.

The public end point is now live. Resolving.
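For anyone wanting to try it, a request against the public end point might be built along these lines. This is a sketch, not a reference: the helper name `build_lint_request` is hypothetical, and sending the text as a form-encoded `wikitext` field is an assumption here, so check the rest_v1 documentation for the exact body format. The request is only constructed, not sent.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

# Base URL as used elsewhere in this task; other projects have their own domain.
DEFAULT_BASE = "https://en.wikipedia.org/api/rest_v1"


def build_lint_request(
    wikitext: str,
    title: Optional[str] = None,
    base: str = DEFAULT_BASE,
) -> Request:
    """Build (but do not send) a POST to /transform/wikitext/to/lint{/title}."""
    url = base + "/transform/wikitext/to/lint"
    if title is not None:
        # Titles use underscores for spaces; encode everything else.
        url += "/" + quote(title.replace(" ", "_"), safe="")
    # Assumption: the wikitext travels as a form field named "wikitext".
    body = urlencode({"wikitext": wikitext}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; the response should be the JSON list of lint errors discussed above.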

Mentioned in SAL (#wikimedia-operations) [2017-05-15T22:16:50Z] <mobrovac@tin> Started deploy [restbase/deploy@d98af6f]: Wt2lint bug fix - T163091

Mentioned in SAL (#wikimedia-operations) [2017-05-15T22:23:34Z] <mobrovac@tin> Finished deploy [restbase/deploy@d98af6f]: Wt2lint bug fix - T163091 (duration: 06m 44s)