To enable switching between Wikitext and HTML in VisualEditor, we need an API that supports stashing (temporarily saving) the intermediate results of wt2html conversions. More precisely, we need to stash
- the original wikitext
- data-parsoid
- html
The stashed content can then be passed to Parsoid when switching back from HTML to wikitext, or when preparing to save the HTML after further editing. Parsoid uses this data to figure out what changed in the HTML (by diffing against the original HTML), and then uses serialization tricks to reuse the original wikitext chunks for areas that weren't changed (selective serialization), which avoids dirty diffs.
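To illustrate why all three pieces need to be stashed, here is a toy sketch of the selective serialization idea in Python. It is not Parsoid's algorithm: it assumes the original wikitext and HTML have already been split into aligned chunks (Parsoid derives this mapping from data-parsoid), and `selective_serialize` and the `serialize` callback are hypothetical names used only for this example.

```python
from difflib import SequenceMatcher


def selective_serialize(orig_wt_chunks, orig_html_chunks, new_html_chunks, serialize):
    """Reuse original wikitext for unchanged HTML chunks; serialize the rest.

    Assumption: orig_wt_chunks[i] is the wikitext that produced
    orig_html_chunks[i]. `serialize` is a hypothetical html -> wikitext
    function that is only applied to chunks that changed.
    """
    if len(orig_wt_chunks) != len(orig_html_chunks):
        raise ValueError("original wikitext and HTML chunks must be aligned")

    out = []
    matcher = SequenceMatcher(a=orig_html_chunks, b=new_html_chunks, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged region: emit the original wikitext verbatim (no dirty diff).
            out.extend(orig_wt_chunks[i1:i2])
        elif tag in ("replace", "insert"):
            # Changed or new region: fall back to full serialization.
            out.extend(serialize(h) for h in new_html_chunks[j1:j2])
        # "delete": the chunks were removed in the editor, so emit nothing.
    return out
```

Without the stashed original HTML there is nothing to diff against, and without the stashed wikitext there are no original chunks to reuse, so every edit would go through full serialization.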
Stashing as a retention policy in the table storage backend
By 'stashing' I mean temporary storage of this content for a limited time, perhaps 24 hours, perhaps a week. In RESTBase, we could implement this as a specialized retention policy that either sets a TTL on the data (in Cassandra) or periodically deletes data older than the TTL (sqlite).
Using this scheme, we'd create separate stash buckets for wikitext, html and data-parsoid, using the same format as for the regular storage buckets.
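As a rough illustration of the sqlite variant, the sketch below stores stash entries with a timestamp and periodically deletes those older than the TTL. The table layout, the function names, and the 24-hour default are assumptions made for this example; they are not RESTBase's actual schema or API.

```python
import sqlite3
import time

TTL_SECONDS = 24 * 60 * 60  # assumed retention period (24 hours)


def open_stash(path=":memory:"):
    """Open (or create) a hypothetical stash table shared by all stash buckets."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS stash ("
        " bucket TEXT NOT NULL,"      # 'wikitext', 'html' or 'data-parsoid'
        " key TEXT NOT NULL,"         # e.g. '<revision>/<uuid>'
        " value TEXT NOT NULL,"
        " stored_at REAL NOT NULL,"   # seconds since the epoch
        " PRIMARY KEY (bucket, key))"
    )
    return db


def put(db, bucket, key, value, now=None):
    now = time.time() if now is None else now
    db.execute(
        "INSERT OR REPLACE INTO stash VALUES (?, ?, ?, ?)",
        (bucket, key, value, now),
    )
    db.commit()


def get(db, bucket, key, now=None, ttl=TTL_SECONDS):
    """Return the stashed value, or None if it is missing or already expired."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT value FROM stash WHERE bucket = ? AND key = ? AND stored_at > ?",
        (bucket, key, now - ttl),
    ).fetchone()
    return row[0] if row else None


def purge_expired(db, now=None, ttl=TTL_SECONDS):
    """Delete entries older than the TTL; meant to be run occasionally."""
    now = time.time() if now is None else now
    cur = db.execute("DELETE FROM stash WHERE stored_at <= ?", (now - ttl,))
    db.commit()
    return cur.rowcount
```

Filtering on the timestamp in `get` means expired entries are never served, even if the purge runs infrequently. On Cassandra the same effect would come from writing each row with a TTL, with no purge job needed.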
API layout
The most natural option for exposing this seems to be in the transform APIs, especially the /transform/wikitext/to/html/ end point:
- Client signals the desire for stashing via a flag in the POST request
- Return HTML with an ETag of the form "<revision>/<uuid>/stash"
- Expect the client to pass this ETag back to the /transform/html/to/wikitext/ end point as an If-Match header.
- Later, we could also add support for passing this ETag into the HTML save API for direct HTML saving based on the stash. It would make sense to use the base_etag POST parameter for this purpose, as serialization will be based on the stash rather than the original base revision.
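The ETag round trip described in the list above could be handled along these lines. This is a minimal sketch: the helper names are hypothetical, and the use of a time-based (v1) UUID is an assumption. Only the "<revision>/<uuid>/stash" shape comes from the proposal.

```python
import uuid
from typing import NamedTuple, Optional


class StashETag(NamedTuple):
    revision: int
    tid: uuid.UUID


def make_stash_etag(revision: int, tid: Optional[uuid.UUID] = None) -> str:
    """Build an ETag of the form "<revision>/<uuid>/stash" (quoted, as in HTTP)."""
    tid = tid or uuid.uuid1()  # assumption: a time-based UUID identifies the render
    return f'"{revision}/{tid}/stash"'


def parse_stash_etag(header: str) -> Optional[StashETag]:
    """Parse an If-Match value; return None unless it names a stash entry."""
    value = header.strip()
    if value.startswith("W/"):  # tolerate a weak-validator prefix
        value = value[2:]
    parts = value.strip('"').split("/")
    if len(parts) != 3 or parts[2] != "stash":
        return None
    try:
        return StashETag(int(parts[0]), uuid.UUID(parts[1]))
    except ValueError:
        # Non-numeric revision or malformed UUID.
        return None
```

The /transform/html/to/wikitext/ handler could call `parse_stash_etag` on the If-Match header and use the revision and UUID to look up the wikitext, html and data-parsoid entries in the stash buckets. If it returns `None`, the request does not reference a stash.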
Timeline
The Editing team has made switching between HTML and Wikitext a high priority for the first half of Q2, so they would like to have API support for this ASAP. We could consider cheating on the retention policy aspect and starting with regular buckets. We would then eventually have to clean out those buckets manually (with a script).