Have a Mechanism for Storing and Retrieving Parsoid HTML from JS and PHP
Closed, ResolvedPublic

Description

Storage

During the transition period (from Parsoid/JS to Parsoid/PHP), we need to be able to store and retrieve both versions of the Parsoid HTML for each (title, revision) tuple. We must do this without mixing the two, i.e. we have to treat the two versions of Parsoid as (virtually) separate services, which means each back-end needs its own set of storage tables. Currently, we use two key_value buckets: parsoid and parsoid-stash. At the end of this process, we will have two more: parsoidphp and parsoidphp-stash (cf. T230792: Create Parsoid/PHP tables in Cassandra).
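
The bucket separation can be pictured as a fixed lookup from variant to bucket names. This is only a sketch: the four bucket names come from the description above, while the `BUCKETS` mapping, its `main`/`stash` keys, and the `buckets_for` helper are hypothetical names used for illustration, not RESTBase code.

```python
# Hypothetical sketch: one pair of key_value buckets per Parsoid variant.
# Bucket names are taken from the task description; everything else is assumed.
BUCKETS = {
    "JS": {"main": "parsoid", "stash": "parsoid-stash"},
    "PHP": {"main": "parsoidphp", "stash": "parsoidphp-stash"},
}


def buckets_for(variant: str) -> dict:
    """Return the bucket names for a variant; unknown variants raise ValueError."""
    try:
        return BUCKETS[variant]
    except KeyError:
        raise ValueError(f"unknown Parsoid variant: {variant!r}") from None
```

Because the two variants never share a bucket name, content rendered by one back-end cannot be served as if it came from the other.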

Client Variant Selection

Ideally, RESTBase would detect on its own which back-end to use, but this is probably overkill, since it would be useful only during the transition period. Instead, RESTBase should rely on clients telling it which Parsoid HTML variant they want (caveat: this requires clients to enforce consistency). Clients signal the variant by setting the X-Parsoid-Variant HTTP header (valid values are JS and PHP). If the header is absent, RESTBase should assume the client wants the JS variant. Furthermore, RESTBase must include this header in the response, as well as an appropriate Vary header that allows the edge to manage the cached resources properly.
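
A minimal sketch of this header handling follows. The header name, its valid values, and the JS default come from the description; the function names are hypothetical. Three behaviours are assumptions, not stated in the task: the value is matched case-insensitively, an unrecognized value is rejected, and "an appropriate Vary header" means listing X-Parsoid-Variant in Vary. The sketch also assumes the response headers are a plain dict that uses the canonical `Vary` key.

```python
VARIANT_HEADER = "X-Parsoid-Variant"
VALID_VARIANTS = ("JS", "PHP")
DEFAULT_VARIANT = "JS"


def select_variant(request_headers: dict) -> str:
    """Pick the variant from the request headers, defaulting to JS."""
    for name, value in request_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == VARIANT_HEADER.lower():
            variant = value.strip().upper()  # assumption: value is case-insensitive
            if variant not in VALID_VARIANTS:
                # Assumption: reject unknown values instead of silently defaulting.
                raise ValueError(f"invalid {VARIANT_HEADER}: {value!r}")
            return variant
    return DEFAULT_VARIANT


def add_variant_headers(response_headers: dict, variant: str) -> dict:
    """Return a copy of the response headers that echoes the variant and varies on it."""
    headers = dict(response_headers)
    headers[VARIANT_HEADER] = variant
    vary = [v.strip() for v in headers.get("Vary", "").split(",") if v.strip()]
    if not any(v.lower() == VARIANT_HEADER.lower() for v in vary):
        vary.append(VARIANT_HEADER)
    headers["Vary"] = ", ".join(vary)
    return headers
```

Listing the header in Vary is what would keep a cache from answering a PHP-variant request with a stored JS-variant response.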

Variant Configuration and Usage

RESTBase's parsoid.js module will behave identically for both variants, but will operate with a different configuration:

  • which host to send the back-end requests to
  • which set of storage tables to use

Based on the incoming request's X-Parsoid-Variant header, RESTBase selects the appropriate back-end URI and tables. Because the table names have to be hard-coded, only the new back-end URI needs to be provided as the php_host configuration stanza to the parsoid.js module.
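
The selection step might look roughly like the sketch below. Only `php_host` and the four table names appear in this task; the `host` option for the JS back-end, the `VariantConfig` type, and the `resolve_config` helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantConfig:
    host: str          # back-end URI to send requests to
    bucket: str        # main storage table
    stash_bucket: str  # stash storage table


def resolve_config(module_options: dict, variant: str) -> VariantConfig:
    """Map a variant to its back-end URI and hard-coded table names."""
    if variant == "PHP":
        # php_host is the only new option; the table names are hard-coded.
        return VariantConfig(module_options["php_host"], "parsoidphp", "parsoidphp-stash")
    # Assumption: the JS back-end URI lives under a "host" option.
    return VariantConfig(module_options["host"], "parsoid", "parsoid-stash")
```

With hypothetical options such as `{"host": "http://parsoid-js.example", "php_host": "http://parsoid-php.example"}`, a request carrying `X-Parsoid-Variant: PHP` would resolve to the second URI and the parsoidphp tables.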

One final detail concerns background updates. Since RESTBase has to keep both variants for each page up to date (T229019: ChangePropagation should mirror reparse events to both Parsoid/PHP and Parsoid/JS), we need to ensure that update events emitted by RESTBase are sent exclusively when the JS variant is updated (as it is the default one) in order to avoid duplicate events in the system.
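
The event rule can be sketched as a simple guard. The function names, the event payload shape, and the `emit` callback are hypothetical; only the rule itself (emit for the JS variant only) comes from the description.

```python
from typing import Callable


def should_emit_update_event(variant: str) -> bool:
    """Only the default (JS) variant triggers update events."""
    return variant == "JS"


def on_content_updated(variant: str, title: str, revision: int,
                       emit: Callable[[dict], None]) -> None:
    """Called after new content for (title, revision) has been stored for a variant."""
    if should_emit_update_event(variant):
        # Hypothetical event payload, for illustration only.
        emit({"title": title, "revision": revision})
```

When both variants of the same revision are refreshed, this yields one event rather than two.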

Event Timeline

mobrovac triaged this task as Medium priority.

PR #1207 accomplishes this. However, it is blocked on Parsoid failing to handle certain routes, cf. T235478: Varnish/ATS should not decode URIs for /w/rest.php.

mobrovac changed the task status from Open to Stalled.Oct 16 2019, 4:26 AM

This issue has been dealt with in Beta, so it's no longer blocking us. But we do have a shiny new blocker: T236382: Parsoid/PHP HTML output for PageBundle should contain Content-Language and Vary headers. Further testing/development is stalled until that is fixed.

mobrovac changed the task status from Stalled to Open.Oct 24 2019, 6:22 PM
mobrovac raised the priority of this task from Medium to High.

Both blockers are resolved now and we are ready to deploy this! Likely to happen next Monday, 2019-10-28.

Change 546162 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/restbase/deploy@master] Initial config for Parsoid/PHP

https://gerrit.wikimedia.org/r/546162

Change 546162 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] Initial config for Parsoid/PHP

https://gerrit.wikimedia.org/r/546162

Mentioned in SAL (#wikimedia-operations) [2019-10-28T07:27:02Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@c500d7a]: Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org - T230791 T235744 T236389

Mentioned in SAL (#wikimedia-operations) [2019-10-28T07:40:46Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@c500d7a]: Add the Parsoid proxy for JS/PHP variants, add top mediarequests end point and add mnwwiki and ge.wm.org - T230791 T235744 T236389 (duration: 13m 44s)

Mentioned in SAL (#wikimedia-operations) [2019-10-28T08:02:56Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@447981b]: Parsoid: Shim content-language and vary headers only for the PHP variant - T230791

Mentioned in SAL (#wikimedia-operations) [2019-10-28T08:16:38Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@447981b]: Parsoid: Shim content-language and vary headers only for the PHP variant - T230791 (duration: 13m 42s)

The Parsoid proxy RESTBase module is now live in production.

DannyS712 subscribed.

[batch] remove patch for review tag from resolved tasks