
Create deprecation plan for public parsoid endpoints
Open, High, Public


Parsoid exposes a number of public endpoints through RESTBase. With the sunsetting of RESTBase and the integration of Parsoid into MediaWiki core, these endpoints should be decommissioned. However, they are still in use by external clients, so a deprecation plan is needed.

Usage analysis:

  • /api/rest_v1/page/html/: about 100 req/s
    • main user is Wikimedia Enterprise
    • about 8 req/s from REST-API-Crawler-Google/1.0
    • about 14 req/s without user agent, the vast majority originating from just four IP addresses
  • /api/rest_v1/transform/wikitext/to/html: about 6 req/s
    • about 2.5 req/s from REST-API-Crawler-Google/1.0
    • about 2.5 req/s from a fake user agent that starts with "User-Agent", originating from a single IP address
    • another 0.2 req/s from ServiceChecker-WMF/0.1.2
  • /api/rest_v1/transform/wikitext/to/lint: about 1 req/min
    • nearly all of them from the same IP address
  • /api/rest_v1/transform/html/to/wikitext: about 1 req/min
    • most of them from the same IP that also sends the lint requests

Top users of /api/rest_v1/page/html:

Screenshot 2023-08-25 102139.png (365×893 px, 17 KB)

(numbers are per day, sampled 1/128)
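
To compare the sampled per-day counts with the req/s figures in the usage analysis, a rough conversion can be sketched as follows. This assumes the 1/128 sampling is uniform; the function name and the example count are hypothetical and are not taken from the screenshot.

```python
SAMPLING_FACTOR = 128        # logs are sampled 1/128, per the note above
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_requests_per_second(sampled_count_per_day):
    """Scale a sampled per-day count up to an estimated request rate.

    Assumes uniform sampling, so each sampled request stands for
    SAMPLING_FACTOR real requests.
    """
    estimated_total_per_day = sampled_count_per_day * SAMPLING_FACTOR
    return estimated_total_per_day / SECONDS_PER_DAY


# Hypothetical example: 67,500 sampled requests in one day.
rate = estimated_requests_per_second(67_500)
print(f"{rate:.1f} req/s")
```

A sampled count of 67,500 per day would correspond to roughly 100 req/s, the order of magnitude reported for /api/rest_v1/page/html/.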

Event Timeline

I think the basic plan is this:

  • T335512: Talk to Wikimedia Enterprise and get them to transition from the old RESTBase endpoint to Daniel's new core page HTML endpoints
  • T335511: Talk to Google and get them to do a similar transition. This *may* require building a new endpoint for /transform/wikitext/to/html since there is no core equivalent for this right now (although there is one exposed via VE's action API)
  • T335513: For the /wikitext/to/lint and /html/to/wikitext endpoints, I think @daniel's suggested plan was to turn these endpoints off for increasing periods of time (1 hr, 1 day, 1 week) in the hope that this will prompt whoever is using them (probably a community bot of some kind) to surface and file a bug, village pump request, etc. Then we can properly evaluate the use and come up with a migration plan.
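
The escalating switch-off idea in T335513 could be sketched roughly like this. It is only an illustration: the start date, the two-week gap between outages, and the function names are assumptions, not part of any agreed plan.

```python
from datetime import datetime, timedelta

# Outage lengths taken from the plan above: 1 hour, 1 day, 1 week.
OUTAGE_DURATIONS = [timedelta(hours=1), timedelta(days=1), timedelta(weeks=1)]


def brownout_windows(first_start, gap=timedelta(weeks=2)):
    """Return (start, end) pairs for each planned outage.

    `gap` is the (assumed) recovery time between the end of one outage
    and the start of the next, so users have time to report breakage.
    """
    windows = []
    start = first_start
    for duration in OUTAGE_DURATIONS:
        end = start + duration
        windows.append((start, end))
        start = end + gap
    return windows


def is_disabled(now, windows):
    """True if `now` falls inside any outage window (end is exclusive)."""
    return any(start <= now < end for start, end in windows)


# Hypothetical start date, for illustration only.
windows = brownout_windows(datetime(2024, 1, 1))
for start, end in windows:
    print(start.isoformat(), "->", end.isoformat())
```

Leaving a gap between outages gives affected users time to file a report before the next, longer outage begins.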
daniel triaged this task as Medium priority.Jun 5 2023, 6:18 PM
daniel moved this task from Unsorted to Parsoid pile on the RESTBase Sunsetting board.

Going to pop in here and stay ahead of breaking things for editors (since that seems to be the plan with T335513).

A script of mine actively uses /transform/html/to/wikitext and /page/html to perform template modifications without having to rewrite a wikitext parser in JavaScript and have it shipped to the browser on every page load. I've been keeping my eyes on RESTBase sunsetting for a long while now, and I have to ask: is there a migration plan for gadget/script developers? I have been unable to find documentation on new endpoints even after combing through all the Phab tasks related to Parsoid and RESTBase, and that really doesn't bode well for the idea of running a scream test to find out what breaks.
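
For readers unfamiliar with the two endpoints, the round trip looks roughly like the sketch below. It only builds the requests and does not send them. The host, the User-Agent string, and the `html` form field name are assumptions based on my reading of the RESTBase API, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumed host; any wiki that exposes the RESTBase endpoints would do.
BASE = "https://en.wikipedia.org/api/rest_v1"
# Hypothetical User-Agent; real clients should identify themselves.
USER_AGENT = "ExampleGadget/0.1 (example@example.org)"


def page_html_request(title):
    """Build a GET request for the Parsoid HTML of a page."""
    # Spaces become underscores; everything else is percent-encoded.
    path_title = quote(title.replace(" ", "_"), safe="")
    return Request(
        f"{BASE}/page/html/{path_title}",
        headers={"User-Agent": USER_AGENT},
    )


def html_to_wikitext_request(html):
    """Build a POST request that converts (modified) HTML back to wikitext.

    Assumes the endpoint accepts a form-encoded `html` field.
    """
    body = urlencode({"html": html}).encode("utf-8")
    return Request(
        f"{BASE}/transform/html/to/wikitext",
        data=body,
        headers={
            "User-Agent": USER_AGENT,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


get_req = page_html_request("Main Page")
post_req = html_to_wikitext_request("<p>x</p>")
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)
```

A script would fetch the HTML with the first request, edit the template markup in the DOM, and send the result through the second request to get wikitext back.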

mw:Parsoid/API is outdated and doesn't even mention RESTBase deprecation. mw:RESTBase/deprecation and mw:RESTBase/service migration don't mention anything about Parsoid. mw:Manual:Rest.php, which leads to mw:Parsoid#Development, itself says "production WMF servers do not expose the Parsoid REST API to the external network", and it doesn't seem like that's changed. Neither this ticket nor T335513 links to documentation of the sort. I know action=parse&parsoid=true exists, but there doesn't seem to be a (documented) way to perform the inverse conversion (HTML to wikitext), at least in the action API. Is there something I can read that will lead me off of the to-be-deprecated endpoints? Do new endpoints even exist? Some clarity would be appreciated.

MSantos raised the priority of this task from Medium to High.Oct 2 2023, 2:48 PM

Quick note: the task description is somewhat outdated, update to come