We are exposing Parsoid output through a number of MediaWiki REST endpoints, such as /w/rest.php/{domain}/v3/page/html/{title} and /w/rest.php/v1/page/{title}/html. These are currently served with a low TTL because we have no mechanism in place to actively purge them. We need such a mechanism before we can rely on these endpoints to replace the Parsoid endpoints exposed by RESTbase, namely /api/rest_v1/page/html/{title}.
Currently, the endpoints exposed by RESTbase rely on purging implemented in the parsoid.js module, which emits events whenever it updates the stored/pregenerated copy of the Parsoid output.
In a world without RESTbase, we will need to extend the mechanism we are currently using to purge page view URLs to also cover the HTML endpoints.
See also T308424: Determine http cache control and active purging for REST endpoints serving parsoid output
Impact/Risk Assessment
Number of external queries that would be affected by purging incorrectly (or not at all):
- /w/rest.php/v1/page/{title}/html: less than one per minute. No gateway. This endpoint is not advertised anywhere.
- /w/rest.php/{domain}/v3/page/html: less than one per month. No gateway. This endpoint is not advertized anywhere.
- /api/rest_v1/page/html: more than 100 per second.
Top users of /api/rest_v1/page/html:
(numbers are per day, sampled 1/128)
Plan
- HtmlCacheUpdater::getUrls should know at least about the canonical endpoint that exposes the page HTML.
- Support for purging /api/rest_v1/page/html should be implemented in Bethos, based on the corresponding resource-change event. resource-change events apply to parsoid output and old parser output alike.
- This must not be turned on as long as RESTbase is in the loop, as it would cause a race condition when the purge is relayed to Varnish before RESTbase has updated its internal store with the latest version!
- The switch is risky as long as there are many callers of /api/rest_v1/page/html
- We should then get the major users to switch to the canonical endpoint (though we may want to expose it through a nicer URL first, something like /api/content.v1/page).
See T334238: Create deprecation plan for public parsoid endpoints and T328559: Replace usage of RESTbase parsoid endpoints
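To make the first plan item more concrete, here is a minimal sketch of how a page title could be expanded into the HTML endpoint URLs that would need purging. It is written in Python for illustration only and is not the HtmlCacheUpdater::getUrls implementation. The function name html_purge_urls, the template tuple, and the title encoding rule (spaces become underscores, everything else is percent-encoded) are assumptions. Only the two path templates come from this task.

```python
from urllib.parse import quote

# Path templates copied from this task; the tuple itself is a hypothetical helper.
REST_HTML_TEMPLATES = (
    "/w/rest.php/v1/page/{title}/html",
    "/w/rest.php/{domain}/v3/page/html/{title}",
)


def html_purge_urls(domain: str, title: str, templates=REST_HTML_TEMPLATES) -> list[str]:
    """Return the absolute HTML endpoint URLs to purge when `title` changes.

    Assumption: titles are written with underscores instead of spaces and are
    percent-encoded as a single path segment (so "/" becomes "%2F").
    """
    encoded = quote(title.replace(" ", "_"), safe="")
    return [
        f"https://{domain}" + template.format(domain=domain, title=encoded)
        for template in templates
    ]


if __name__ == "__main__":
    for url in html_purge_urls("en.wikipedia.org", "Main Page"):
        print(url)
```

The /api/rest_v1/page/html path is deliberately left out of the default templates, since the plan above handles it separately through resource-change events rather than through the page view purge mechanism.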
