Storage
During the transition period (from Parsoid/JS to Parsoid/PHP), we need to be able to store and retrieve both versions of the content for each (title, revision) tuple. The two must not be mixed, i.e. we have to treat the two versions of Parsoid as (virtually) separate services, so each back-end has to have its own set of storage tables. Currently, we use two key_value buckets: parsoid and parsoid-stash. At the end of this process, we will have two more: parsoidphp and parsoidphp-stash (cf. T230792: Create Parsoid/PHP tables in Cassandra).
Client Variant Selection
Ideally, RESTBase would be able to detect which back-end to use, but this is probably overkill, since it would be useful only during the transition period. Instead, RESTBase should rely on clients telling it which Parsoid HTML variant they want (caveat: this requires clients to enforce consistency). Clients signal the variant by setting the X-Parsoid-Variant HTTP header (valid values are JS and PHP). If the header is absent, RESTBase should assume the client wants the JS variant. Furthermore, RESTBase must include this header in the response, along with an appropriate Vary header that allows the edge to properly manage the cached resources.
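The header handling described above can be sketched as follows. This is a minimal illustration, not RESTBase's actual code: the function names selectVariant and addVariantHeaders are hypothetical, header names are assumed to arrive lower-cased (as Node.js delivers them), and both the case-insensitive matching and the rejection of unknown values are assumptions, since the text only specifies the valid values and the JS default.

```javascript
'use strict';

// From the text: header name, valid values, and the JS default.
const VARIANT_HEADER = 'x-parsoid-variant';
const VALID_VARIANTS = ['JS', 'PHP'];
const DEFAULT_VARIANT = 'JS';

// Hypothetical helper: pick the variant requested by the client.
// Assumes header names are already lower-cased.
function selectVariant(reqHeaders) {
    const raw = reqHeaders[VARIANT_HEADER];
    if (raw === undefined || raw === '') {
        // No header provided: fall back to the JS variant.
        return DEFAULT_VARIANT;
    }
    // Assumption: values are matched case-insensitively.
    const variant = String(raw).trim().toUpperCase();
    if (!VALID_VARIANTS.includes(variant)) {
        // Assumption: unknown values are rejected rather than ignored.
        throw new Error(`Invalid X-Parsoid-Variant value: ${raw}`);
    }
    return variant;
}

// Hypothetical helper: echo the variant in the response and extend Vary
// so that caches key on the request header.
function addVariantHeaders(resHeaders, variant) {
    const vary = resHeaders.vary
        ? `${resHeaders.vary}, X-Parsoid-Variant`
        : 'X-Parsoid-Variant';
    return Object.assign({}, resHeaders, {
        'x-parsoid-variant': variant,
        vary
    });
}
```

For example, a request without the header resolves to JS, while a request carrying X-Parsoid-Variant: PHP resolves to PHP and yields a response whose Vary header lists X-Parsoid-Variant in addition to any existing entries.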
Variant Configuration and Usage
RESTBase's parsoid.js module will behave exactly the same for both variants, but will operate with a different configuration:
- which host to send the back-end requests to
- which set of storage tables to use
Based on the incoming request's X-Parsoid-Variant header, RESTBase selects the appropriate back-end URI and tables. Because the table names have to be hard-coded, only the new back-end URI needs to be provided as the php_host configuration stanza to the parsoid.js module.
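As a rough sketch of this selection step: the bucket names below are the ones listed under Storage and php_host is the option named above, but the function name backendFor, the option name host for the existing Parsoid/JS back-end, and the example URIs are assumptions made for illustration only.

```javascript
'use strict';

// Hard-coded bucket names per variant (names taken from the Storage section).
const BUCKETS = {
    JS: { content: 'parsoid', stash: 'parsoid-stash' },
    PHP: { content: 'parsoidphp', stash: 'parsoidphp-stash' }
};

// Hypothetical helper: resolve the back-end URI and storage buckets
// for a given variant ('JS' or 'PHP').
// Assumption: `conf.host` holds the existing Parsoid/JS URI;
// `conf.php_host` is the new option for Parsoid/PHP.
function backendFor(variant, conf) {
    const buckets = BUCKETS[variant];
    if (!buckets) {
        throw new Error(`Unknown Parsoid variant: ${variant}`);
    }
    const host = variant === 'PHP' ? conf.php_host : conf.host;
    if (!host) {
        throw new Error(`No back-end host configured for variant ${variant}`);
    }
    return { host, buckets };
}

// Usage with made-up example URIs.
const conf = {
    host: 'http://parsoid-js.example:8000',
    php_host: 'http://parsoid-php.example:8000'
};
console.log(backendFor('PHP', conf).buckets.stash);
```

Keeping the bucket names in one constant means the only per-deployment difference between the variants is the host setting.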
One final detail concerns background updates. Since RESTBase has to keep both variants of each page up to date (T229019: ChangePropagation should mirror reparse events to both Parsoid/PHP and Parsoid/JS), we need to ensure that RESTBase emits update events only when the JS variant (the default one) is updated, in order to avoid duplicate events in the system.
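The event-gating rule can be illustrated with a small sketch. All names here (shouldEmitUpdateEvent, onVariantUpdated, the emit callback, and the event shape) are hypothetical and do not reflect RESTBase's actual event API; the only rule taken from the text is that events are emitted for JS updates and suppressed for PHP updates.

```javascript
'use strict';

// Only updates of the default (JS) variant produce an update event.
function shouldEmitUpdateEvent(variant) {
    return variant === 'JS';
}

// Hypothetical hook called after a variant's stored content is refreshed.
// `emit` stands in for whatever mechanism actually publishes events.
function onVariantUpdated(variant, title, revision, emit) {
    if (shouldEmitUpdateEvent(variant)) {
        emit({ title, revision });
        return true;
    }
    // PHP update: stored silently to avoid a duplicate event.
    return false;
}

// Usage: both variants of the same revision are refreshed,
// but only one event is recorded.
const events = [];
onVariantUpdated('JS', 'Main_Page', 12345, (e) => events.push(e));
onVariantUpdated('PHP', 'Main_Page', 12345, (e) => events.push(e));
console.log(events.length);
```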