
Reroute RESTbase Parsoid endpoints to core's REST endpoints
Open, Medium, Public

Description

tl;dr: determine if we can reroute callers from the Parsoid endpoint currently exposed through RESTBase to replacement endpoints in MW REST API. If so, make any necessary changes to the MW Core REST endpoints and do the rerouting.

Context:

RESTBase sunset involves (among a lot of other things) moving callers away from the RESTBase implementation of several endpoints that we collectively refer to as the "Parsoid endpoints". These endpoints currently exist in the Parsoid extension under /{domain}/v3/page/html/{title}, and are exposed through RESTBase as /rest_v1/page/html/{title}. Alternative endpoints exist in the MediaWiki REST API under /v1/page/{title}/html.

The initial plan was to ask callers to move to the alternatives. This was planned, rather than rerouting the existing urls, because the alternatives are not 100% compatible, and the differences were considered large enough to be problematic. This plan was encoded into the WE5.1.3 hypothesis in the Annual Plan, which is scheduled for completion by the end of Sept. 2024.

It was also written into WE5.1.3 to "implement a canonical url structure with versioning for our REST API". A structure was proposed in T366835: REST: API modularization and versioning (tracking) and its subtasks, especially T364400: map the /api/ prefix to /w/rest.php and T362480: Introduces support for modules into the REST API framework. Subsequent discussion questioned whether this is the structure we really want. Some participants wanted to examine API url structure and versioning more holistically, considering everything we might expose under it, including APIs implemented outside MW but still associated with a particular wiki, and also standalone services not connected to any wiki. The timeline for such an examination put the timeframe for WE5.1.3 in question.

Backup Plan:

This raised the question of whether the incompatibilities between the existing "Parsoid" endpoints and the alternatives are really sufficient to prohibit rerouting, and whether it would be feasible to modify the alternative MW REST endpoint such that we can reroute the existing RESTBase "Parsoid" endpoints to the alternatives. This would remove these endpoints as blockers for RESTBase Sunset.

It would not achieve WE5.1.3 according to its current wording, as a canonical url structure would not be established. But it would achieve most of its goal: "enable service migration and testing for Parsoid endpoints ... by Q1". The part that would not be achieved is that WE5.1.3 also includes "and similar services" (in place of the ellipsis). Rerouting would, however, be a step forward and we could discuss changes to WE5.1.3, and possibly a follow-up hypothesis to achieve the rest.

This Task:

The purpose of this task is to determine if we can realistically use the reroute option, and if so to implement that. If investigation shows that we cannot realistically reroute, even with modifications to the alternative endpoints, then this task can be closed without any technical changes being performed. The checklist below assumes we are able to reroute.

It is expected that if we are able to request rerouting, a separate task, maybe a subtask of this one, will be filed for SRE to track that work.

The beginnings of a survey of differences is here.

Event Timeline

@daniel comment from Slack:

On Friday I surveyed the old and new page HTML endpoints: https://docs.google.com/spreadsheets/d/10FaxUcD6y4Xjss21HfXUwVsH98RCWO7Bs9hhZuDTfFg/edit?gid=0#gid=0

There are fewer differences than I expected - most have vanished in the past year, are irrelevant (an extra meta-tag in the HTML head), or easy to fix (we can have a backwards-compatibility mode for error representations).

The only real difference seems to be the handling of wiki redirects when requesting old revisions. I would consider the old behavior to be undesirable (the API will trigger an HTTP redirect to the current revision of the target page), but changing it is a breaking change.

So it seems like we could re-route /api/rest_v1/page/html/{title} to /w/rest.php/v1/page/{title}/html.
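To make the proposed mapping concrete, here is a minimal sketch of the path rewrite in Python. It is illustrative only, not how the rerouting would actually be implemented. The function name `reroute_html_path` is hypothetical, and the sketch assumes `{title}` arrives as a single percent-encoded path segment; any other path shape (for example one with extra segments) is left unmatched rather than guessed at.

```python
import re
from typing import Optional

# Assumption: {title} is one percent-encoded path segment (no literal "/").
_OLD_HTML_PATH = re.compile(r"/api/rest_v1/page/html/(?P<title>[^/]+)")


def reroute_html_path(path: str) -> Optional[str]:
    """Map the RESTBase page HTML path to the MW REST API path.

    Returns None when the path is not of the form
    /api/rest_v1/page/html/{title}, so the caller can fall through.
    """
    match = _OLD_HTML_PATH.fullmatch(path)
    if match is None:
        return None
    # The title is passed through unchanged; only its position in the path moves.
    return f"/w/rest.php/v1/page/{match.group('title')}/html"
```

For example, `reroute_html_path("/api/rest_v1/page/html/Earth")` returns `"/w/rest.php/v1/page/Earth/html"`.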

However, the accompanying meta-data endpoints use incompatible response schemas:

We could introduce a backwards-compatibility mode there as well, but it would be more involved.

The trouble is, the KR/Hypothesis focuses on HTML; it doesn't directly mention meta-data endpoints. But we still need to migrate the meta-data endpoints, either under 5.1.3, or as "essential work" under the RESTbase deprecation umbrella.

@HCoplin-WMF comment from Slack:

Fwiw, it looks like the rest.php/[...]/bare endpoints are barely (pun intended) utilized.

Mozilla 5.0 (I guess just browsers?) has over 90% of the traffic, and it looks like it's less than 1k calls per week. The only observed callers in recent history (full month of August, sitting at 3.2k calls in total, over 2.8k of which are a single user-agent) are:

  • Axios
  • Browsers (e.g. specific versions of Firefox; looks like they're called across a handful of IPs though -- is it safe to assume these are bots and/or scrapers if they're posing as a browser?)
  • WME (less than 70 calls a month)
  • MediaWiki REST API doc examples
  • RecentChanges bot

Note for the comment above: the call volumes I listed there are underestimated since I was not aware of Turnilo sampling constraints. Please apply a 128x multiplier to account for sample rate. Still relatively limited adoption, but higher than those numbers reflect (by two+ orders of magnitude).
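As a rough worked example of that correction, the sketch below applies the 128x multiplier to the sampled August figures quoted earlier. The inputs are the rounded values from the comment (3.2k total, 2.8k for the single user-agent), so the results are order-of-magnitude estimates, not measured counts. The names `SAMPLE_FACTOR` and `scale_sampled` are hypothetical.

```python
# Multiplier stated in the note above to account for the sample rate.
SAMPLE_FACTOR = 128


def scale_sampled(count: int, factor: int = SAMPLE_FACTOR) -> int:
    """Estimate the real call volume from a sampled count."""
    return count * factor


# Rounded, sampled figures for the full month of August (from the comment above).
observed = {
    "all callers": 3200,
    "single user-agent": 2800,
}

estimated = {label: scale_sampled(n) for label, n in observed.items()}

for label, n in estimated.items():
    print(f"{label}: ~{n:,} calls")
```

Under these assumptions the August total comes to roughly 410k calls, of which roughly 358k are from the single user-agent.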

daniel renamed this task from Redirect Parsoid endpoints to REST to Redirect RESTbase Parsoid endpoints to core's REST endpoints. Sep 5 2024, 4:22 PM
daniel updated the task description.
BPirkle renamed this task from Redirect RESTbase Parsoid endpoints to core's REST endpoints to Reroute RESTbase Parsoid endpoints to core's REST endpoints. Sep 5 2024, 9:05 PM
BPirkle updated the task description.