To keep its content up to date, Parsoid uses MW hooks to be notified of any changes. This approach is known to cause problems: when a template transcluded in many pages is updated, all of those pages need to be regenerated as well, which can flood the MW API with requests.
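To make the fan-out concrete, here is a minimal TypeScript sketch. All names are hypothetical, and the number of MW API calls per re-rendered page is a parameter, not a measured figure. It shows that one template edit triggers a refresh for every transcluding page, so the load on the MW API grows linearly with the number of those pages.

```typescript
// Hypothetical index: template title -> titles of the pages that transclude it.
type TransclusionIndex = Map<string, string[]>;

// Pages that must be re-rendered after `template` is edited.
function pagesToRefresh(index: TransclusionIndex, template: string): string[] {
  return index.get(template) ?? [];
}

// MW API requests triggered by a single template edit, assuming every
// re-rendered page costs `callsPerPage` MW API requests.
function mwApiLoad(index: TransclusionIndex, template: string, callsPerPage: number): number {
  return pagesToRefresh(index, template).length * callsPerPage;
}

const index: TransclusionIndex = new Map([
  ["Template:Infobox", ["Page_A", "Page_B", "Page_C"]],
  ["Template:Stub", ["Page_D"]],
]);

console.log(mwApiLoad(index, "Template:Infobox", 1)); // 3
console.log(mwApiLoad(index, "Template:Infobox", 3)); // 9
```

With three MW API calls per page (two content fetches plus one revision-info request, as in the update chain described in this task), three transcluding pages already cost nine requests.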
Until T84923 is resolved, RESTBase has opted to use the same update mechanism for refreshing the content in its storage. For each update, RESTBase places one call to the MW API (requesting revision info) and another to Parsoid to obtain the refreshed content.
The problem is that all of the aforementioned requests are sent with Cache-Control: no-cache headers, which causes the following chain of events for each page that needs updating:
1. Jobrunner requests Parsoid to generate the new revision's HTML
2. Parsoid fetches the content from the MW API and generates the HTML
3. Jobrunner requests RESTBase to get a fresh copy of the content as well as revision info
4. RESTBase calls the MW API's revprop for revision info
5. RESTBase calls Parsoid's pagebundle endpoint
6. Parsoid fetches the content from the MW API
Concretely, the problem is that the same content is fetched twice from the MW API (steps 2 and 6).
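The chain above can be traced with a toy TypeScript model. It is a sketch under stated assumptions, not the real services: `MwApi`, `Parsoid` and `updatePage` are hypothetical stand-ins, and the cache inside `Parsoid` represents whatever caching layer sits in front of it (e.g. Varnish). Because every request carries no-cache, the second render ignores the copy produced moments earlier and goes back to the MW API.

```typescript
// Labels for the two kinds of MW API requests in this toy model.
type MwApiCall = "content" | "revisionInfo";

// Stand-in for the MW API that records every request reaching it.
class MwApi {
  readonly log: MwApiCall[] = [];

  fetchContent(title: string): string {
    this.log.push("content");
    return `wikitext of ${title}`;
  }

  fetchRevisionInfo(title: string): { title: string; revision: number } {
    this.log.push("revisionInfo");
    return { title, revision: 1 }; // placeholder revision id
  }
}

// Stand-in for Parsoid plus the cache in front of it. A no-cache request
// skips the cached copy and always goes back to the MW API.
class Parsoid {
  private readonly cache = new Map<string, string>();
  private readonly api: MwApi;

  constructor(api: MwApi) {
    this.api = api;
  }

  render(title: string, noCache: boolean): string {
    const cached = this.cache.get(title);
    if (!noCache && cached !== undefined) {
      return cached;
    }
    const html = `<html>${this.api.fetchContent(title)}</html>`;
    this.cache.set(title, html);
    return html;
  }
}

// Current chain for one page: both update extensions fire, both with no-cache.
function updatePage(title: string, api: MwApi, parsoid: Parsoid): void {
  parsoid.render(title, true); // steps 1-2: Parsoid extension job, first MW API fetch
  api.fetchRevisionInfo(title); // steps 3-4: RESTBase job, revision info
  parsoid.render(title, true); // steps 5-6: RESTBase -> Parsoid, second MW API fetch
}

const api = new MwApi();
updatePage("Page_A", api, new Parsoid(api));
console.log(api.log.join(", ")); // content, revisionInfo, content
```

Dropping the first `render` call (the Parsoid extension's job) leaves a single content fetch per page.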
Since both update extensions monitor the same hooks and ultimately both update Parsoid's cache, the question is: can we deprecate Parsoid's extension and rely on the RESTBase one to update that cache, in order to minimise the impact on the MW API? That would probably require some Varnish trickery, given that Parsoid's update extension uses the v1 API to refresh the content, while RESTBase relies on Parsoid's v2 endpoints.
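The proposal can be sketched the same way. This is a hypothetical TypeScript model only: `ParsoidCache` keeps separate v1 and v2 entries, and it simply *assumes* that some Varnish-side mechanism lets a v2 refresh also refresh the v1 entry. It does not describe how that mechanism would actually be built. Under that assumption, running only RESTBase's update job costs one content fetch per page.

```typescript
// Toy MW API that counts how often each kind of request reaches it.
class MwApi {
  contentFetches = 0;
  revisionInfoFetches = 0;

  fetchContent(title: string): string {
    this.contentFetches++;
    return `wikitext of ${title}`;
  }

  fetchRevisionInfo(title: string): { title: string; revision: number } {
    this.revisionInfoFetches++;
    return { title, revision: 1 }; // placeholder revision id
  }
}

// Hypothetical cache in front of Parsoid, keyed separately for v1 and v2.
// ASSUMPTION: a v2 refresh can also refresh the v1 entry (the "Varnish
// trickery"); nothing here reflects a real Varnish configuration.
class ParsoidCache {
  readonly v1 = new Map<string, string>();
  readonly v2 = new Map<string, string>();
  private readonly api: MwApi;

  constructor(api: MwApi) {
    this.api = api;
  }

  // A no-cache v2 (pagebundle) request: always re-renders from the MW API.
  refreshV2(title: string): void {
    const html = `<html>${this.api.fetchContent(title)}</html>`;
    this.v2.set(title, html);
    // Simplification: real v1 and v2 responses differ in shape.
    this.v1.set(title, html);
  }
}

// Proposed chain: only RESTBase's update job runs (steps 3-6 of the current chain).
function updatePageViaRestbase(title: string, api: MwApi, cache: ParsoidCache): void {
  api.fetchRevisionInfo(title);
  cache.refreshV2(title);
}

const api = new MwApi();
const cache = new ParsoidCache(api);
for (const title of ["Page_A", "Page_B", "Page_C"]) {
  updatePageViaRestbase(title, api, cache);
}
console.log(api.contentFetches); // 3: one content fetch per page
console.log(cache.v1.has("Page_B")); // true
```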