
Transition Enterprise MediaWiki from RESTBase to core page HTML endpoints
Open, High, Public, 5 Estimated Story Points

Description

Enterprise MediaWiki is using /api/rest_v1/page/html/ from the RESTBase endpoint. This should be transitioned to the new core page HTML endpoint. This should be fairly straightforward, more-or-less just changing a URL (unless "special" features of the REST endpoint are/were being used).

The old endpoint follows the pattern https://en.wikipedia.org/api/rest_v1/page/html/{title}, whereas the new endpoint adopts a different pattern: https://en.wikipedia.org/w/rest.php/v1/page/{title}/html. In the new structure, the api/rest_v1 segment is replaced by w/rest.php/v1, representing a new path to access the RESTful API. Additionally, the position of the /html segment has been shifted to follow the {title} segment.
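The path change described above can be sketched as a small URL rewrite. This is a minimal illustration, not Enterprise's actual code: the function name `restbase_to_core_html_url` is hypothetical, and it assumes the old URL carries only a single title segment after `/page/html/` (no trailing revision ID or other extras).

```python
from urllib.parse import urlsplit, urlunsplit

# Path patterns taken from the task description.
OLD_PREFIX = "/api/rest_v1/page/html/"
NEW_TEMPLATE = "/w/rest.php/v1/page/{title}/html"


def restbase_to_core_html_url(old_url: str) -> str:
    """Rewrite a RESTBase page HTML URL to the core REST endpoint pattern.

    Hypothetical helper: assumes everything after OLD_PREFIX is the
    (already percent-encoded) title and nothing else.
    """
    parts = urlsplit(old_url)
    if not parts.path.startswith(OLD_PREFIX):
        raise ValueError(f"not a RESTBase page HTML URL: {old_url}")
    title = parts.path[len(OLD_PREFIX):]
    if not title:
        raise ValueError(f"missing title in URL: {old_url}")
    # Swap the prefix and move the /html segment behind the title.
    new_path = NEW_TEMPLATE.format(title=title)
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )
```

For example, `restbase_to_core_html_url("https://en.wikipedia.org/api/rest_v1/page/html/Earth")` yields `https://en.wikipedia.org/w/rest.php/v1/page/Earth/html`.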

  • Deploying
  • Monitoring

Event Timeline

daniel triaged this task as High priority. Jun 5 2023, 6:17 PM
daniel moved this task from Unsorted to Parsoid pile on the RESTBase Sunsetting board.

The endpoint does not work for articles with slashes in the title.

For instance, for https://en.wikipedia.org/wiki/OS/2

https://en.wikipedia.org/w/rest.php/v1/page/OS/2/html

{
    "messageTranslations": {
        "en": "The requested relative path (/v1/page/OS/2/html) did not match any known handler"
    },
    "httpCode": 404,
    "httpReason": "Not Found"
}
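The 404 above is consistent with the router treating the raw `/` in `OS/2` as a path separator, so that `{title}` only captures `OS`. One thing a client could try is percent-encoding the whole title as a single path segment. The sketch below shows that encoding; `core_html_url` is a hypothetical helper, and whether the endpoint accepts `%2F` in the title is an assumption not confirmed in this thread.

```python
from urllib.parse import quote


def core_html_url(domain: str, title: str) -> str:
    """Build a core page HTML URL with the title as one encoded segment.

    Hypothetical helper; assumes the endpoint accepts %2F inside {title}.
    """
    # safe="" makes quote() encode "/" as %2F instead of leaving it raw.
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"https://{domain}/w/rest.php/v1/page/{encoded}/html"
```

With this, `core_html_url("en.wikipedia.org", "OS/2")` produces `https://en.wikipedia.org/w/rest.php/v1/page/OS%2F2/html` rather than the failing `/v1/page/OS/2/html` path.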

Hi team, @daniel, @MSantos, @ssastry

In the last 18 hours, we saw 231 instances of "stopped after 10 redirects" errors related to namespace 6, and 121 instances of 500 Internal Server Error for non-wikitext pages.
Our dev environment is swamped with several issues cascading from this.
We will be reverting to the old (RESTBase) API.

I will mark this ticket as blocked until the above issues (https://phabricator.wikimedia.org/T353689 and https://phabricator.wikimedia.org/T353688 ) are fixed.

Essentially, we will not be able to use the new endpoint unless it supports all the namespaces and content models supported by the old one.

Screenshot 2023-12-19 at 10.24.53 AM.png (617×878 px, 64 KB)

Screenshot 2023-12-19 at 10.23.16 AM.png (705×1 px, 94 KB)

Hi @MSantos, @daniel, team,

We switched to the new core endpoint and are observing a new issue: for certain pages, the first 3-5 calls to the API result in a 500 Internal Server Error.
Here are the relevant calls with errors and timestamps.

I am not able to reproduce these errors. When I put the URLs from the error messages into my browser, they seem to work fine (if somewhat slowly). [EDIT: requests may be working for me because your requests eventually caused the page to be pushed into ParserCache].

Given that these pages are all very large and complex, is it possible that you are seeing a timeout? Parsing would continue in the background, and after a few retries, the content would be available in the cache and the request would no longer time out.

I don't know why you would be seeing a 500 though - maybe some intermediate layer isn't handling timeouts nicely.

The new endpoint has a different cache regime. It relies on MediaWiki's internal parser cache, which only contains pages that have been updated (directly or indirectly) in the last three weeks. Accessing uncached pages can be fairly slow. Typically, large and complex pages are also updated a lot. But of course, that is not always the case.

RESTBase used to pre-generate all rendered content and cache it indefinitely. This consumed a lot of resources to generate and store content that, to a large part, would never be requested before it was replaced by a new version. One motivation for sunsetting RESTBase was to end this wasteful practice.

The pages you list are all from projects with relatively few edits and little traffic, which makes it more likely to encounter uncached pages.
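If the first few requests for an uncached page do fail while parsing finishes in the background, a client could absorb that with a bounded retry and backoff. The following is only a sketch under that assumption: `fetch_with_retry` and `TransientServerError` are hypothetical names, the caller supplies the actual HTTP function, and the attempt count and delays are illustrative rather than recommended values.

```python
import time
from typing import Callable, Optional


class TransientServerError(Exception):
    """Hypothetical error the supplied fetch function raises on a 5xx or timeout."""


def fetch_with_retry(
    fetch: Callable[[str], str],
    url: str,
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying transient failures with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except TransientServerError as err:
            last_error = err
            # Wait before the next try, giving the parser cache time to fill.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {attempts} attempts: {url}") from last_error
```

Passing `sleep` as a parameter keeps the helper testable without real delays; in production the default `time.sleep` applies.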