Some context to help make sense of requirements further below:
- A lot of code in MediaWiki assumes the presence of ParserCache and works with its internal function-level API to access parser output.
- Parsoid has its content cached (stored) in RESTBase, and Parsoid clients interact with the RESTBase HTTP API to access Parsoid output and do transformations. However, some of these clients will switch over to accessing Parsoid internally via a function-level API instead of the HTTP API.
- Our understanding is that Platform Engineering is phasing out RESTBase and transitioning its functionality into other components, and that, as part of this, RESTBase functionality will be transitioned over to ParserCache. So, that means:
** ParserCache needs to provide multi-bucket support and the ability to tie the buckets together with a key (revid, tid, etc.). Parsoid produces 3 components per page: HTML, a data-parsoid JSON blob, and a data-mw JSON blob. For networking and computational efficiency reasons, these are stored separately in RESTBase (minor detail: data-mw is not stored separately right now, but will be if RESTBase continues to be around). Not all Parsoid clients need all blobs, so the API needs to be able to fetch individual blobs.
** ParserCache (or whichever code component takes this on) needs to support stashing for editing clients. Stashing provides "storage semantics" instead of caching semantics: cached content can get evicted arbitrarily as far as clients are concerned, whereas the presence of stashed content is guaranteed within a session / time window. RESTBase provides this.
** The REST API needs to be integrated with ParserCache at some layer so that not every REST API request results in a fresh parse request to Parsoid.
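To make the first two requirements concrete, here is a minimal in-memory sketch of what a multi-bucket store with stashing could look like. It is illustration only: `MultiBucketStore`, its method names, the bucket names, and the TTL-based stash are all hypothetical and do not correspond to any existing ParserCache or RESTBase API. The sketch assumes blobs are keyed by a (revid, tid) pair and that "storage semantics" means a stashed copy survives cache eviction until a time window expires.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical bucket names, one per Parsoid output component.
BUCKETS = ("html", "data-parsoid", "data-mw")

Key = Tuple[int, str]  # (revid, tid): the key that ties the buckets together


class MultiBucketStore:
    """Illustrative store only; not an existing MediaWiki interface."""

    def __init__(self, stash_ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # One evictable cache per bucket, so blobs can be fetched individually.
        self._cache: Dict[str, Dict[Key, str]] = {b: {} for b in BUCKETS}
        # Stashed copies: key -> (expiry time, blobs by bucket).
        self._stash: Dict[Key, Tuple[float, Dict[str, str]]] = {}
        self._stash_ttl = stash_ttl
        self._clock = clock

    def save(self, revid: int, tid: str, blobs: Dict[str, str]) -> None:
        """Store any subset of the buckets under one (revid, tid) key."""
        for bucket, blob in blobs.items():
            if bucket not in self._cache:
                raise KeyError(f"unknown bucket: {bucket}")
            self._cache[bucket][(revid, tid)] = blob

    def stash(self, revid: int, tid: str) -> None:
        """Pin the currently cached blobs for the stash time window."""
        key = (revid, tid)
        blobs = {b: c[key] for b, c in self._cache.items() if key in c}
        self._stash[key] = (self._clock() + self._stash_ttl, blobs)

    def evict(self, revid: int, tid: str) -> None:
        """Caching semantics: entries may disappear at any time."""
        for cache in self._cache.values():
            cache.pop((revid, tid), None)

    def get(self, revid: int, tid: str, bucket: str) -> Optional[str]:
        """Fetch a single blob, preferring an unexpired stashed copy."""
        key = (revid, tid)
        stashed = self._stash.get(key)
        if stashed is not None:
            expires, blobs = stashed
            if expires > self._clock():
                return blobs.get(bucket)
            del self._stash[key]  # time window is over; drop the stash entry
        return self._cache[bucket].get(key)
```

A client that only needs data-parsoid would call `store.get(revid, tid, "data-parsoid")` without pulling the HTML. An editing client would call `stash()` when a session starts, so the blobs remain retrievable even if the cache evicts them before the edit is saved.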
In addition to supporting RESTBase functionality, @EvanProdromou has framed this enhanced-ParserCache functionality as a Multi-Parser-Cache (MPC from here on) solution for the following reasons:
* The switchover of read views from the core parser to Parsoid is going to be done in a phased manner, and there might be reverts, etc. So, for quite a while, MPC needs to support caching of output from both the core parser and Parsoid.
* Parsoid's HTML blob is roughly the same size as the core parser's HTML blob. However, Parsoid produces two additional blobs (data-parsoid & data-mw) which also need to be stored in MPC.
* For the two reasons above, MPC will have much higher storage needs than ParserCache.
* MPC should provide a unified library interface that supports both ParserCache and RESTBase functionality, to minimize code churn for existing ParserCache and RESTBase / Parsoid clients.