
Provide an API that allows stashing temporary wikitext / html conversion metadata
Closed, Resolved | Public | 0 Estimated Story Points


To enable switching between wikitext and HTML in VisualEditor, we need an API that supports stashing (temporarily saving) the intermediate results of wikitext-to-HTML (wt2html) conversions. More precisely, we need to stash

  • the original wikitext
  • data-parsoid
  • the HTML

The stashed content can then be passed to Parsoid when switching back from HTML to wikitext, or when preparing to save the HTML after further editing. Parsoid uses this data to figure out what changed in the HTML (by diffing against the original HTML), and then uses selective serialization to reuse the original wikitext for regions that weren't changed, avoiding dirty diffs.
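
The selective-serialization idea can be illustrated with a toy model. This is not Parsoid's algorithm: it assumes the page has already been split into aligned top-level fragments, each paired with the exact wikitext it came from (the kind of mapping data-parsoid makes recoverable), and it compares fragments by position only. The names selective_serialize and toy_serialize are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical model: each original segment pairs a rendered HTML fragment
# with the exact wikitext source it was produced from.
Segment = Tuple[str, str]  # (html_fragment, original_wikitext)


def selective_serialize(original: List[Segment],
                        edited_html: List[str],
                        serialize: Callable[[str], str]) -> str:
    """Rebuild wikitext, reusing the original source for unchanged fragments."""
    out = []
    for i, html in enumerate(edited_html):
        if i < len(original) and original[i][0] == html:
            # Unchanged fragment: copy the original wikitext verbatim,
            # so no formatting differences (dirty diffs) are introduced.
            out.append(original[i][1])
        else:
            # Changed or new fragment: fall back to full serialization.
            out.append(serialize(html))
    return "".join(out)


def toy_serialize(html: str) -> str:
    # Stand-in serializer that only understands bare <p> paragraphs.
    return html.replace("<p>", "").replace("</p>", "") + "\n"


original = [
    ("<p><b>Intro</b></p>", "'''Intro'''\n\n"),
    ("<p>Body</p>", "Body\n"),
]
edited = ["<p><b>Intro</b></p>", "<p>New body</p>"]
print(repr(selective_serialize(original, edited, toy_serialize)))
# → "'''Intro'''\n\nNew body\n"
```

With the toy serializer, re-serializing the untouched intro would have produced <b>Intro</b> instead of '''Intro''', which is exactly the kind of dirty diff that the stashed original wikitext and data-parsoid let Parsoid avoid.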

Stashing as a retention policy in the table storage backend

By 'stashing' I mean temporary storage of this content for a limited time, perhaps 24 hours, perhaps a week. In RESTBase, we could implement this as a specialized retention policy that either sets a TTL on the data (in Cassandra) or periodically deletes data older than the TTL (in SQLite).

Using this scheme, we'd create separate stash buckets for wikitext, HTML and data-parsoid, using the same format as the regular storage buckets.
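
For the SQLite backend, "periodically deletes data older than the TTL" could look roughly like the sketch below, using Python's standard sqlite3 module. The table names (one per stash bucket), the columns and the 24-hour TTL are assumptions for illustration, not RESTBase's actual schema. In Cassandra the equivalent would be writing each row with INSERT ... USING TTL, so that expiry needs no separate purge step.

```python
import sqlite3
import time

STASH_TTL_SECONDS = 24 * 60 * 60  # assumed retention window (24 hours)
# Hypothetical bucket names; fixed constants, so safe to format into SQL.
BUCKETS = ("stash_wikitext", "stash_html", "stash_data_parsoid")


def open_stash(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    for table in BUCKETS:
        # Same key layout in every bucket: title + revision + stash uuid.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            " title TEXT, rev INTEGER, tid TEXT, value TEXT,"
            " created REAL, PRIMARY KEY (title, rev, tid))"
        )
    return conn


def put(conn, table, title, rev, tid, value, now=None):
    """Store one stashed item, stamped with its creation time."""
    conn.execute(
        f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?, ?, ?)",
        (title, rev, tid, value, time.time() if now is None else now),
    )


def purge_expired(conn, now=None) -> int:
    """Delete stash rows older than the TTL; return how many were removed."""
    cutoff = (time.time() if now is None else now) - STASH_TTL_SECONDS
    removed = 0
    for table in BUCKETS:
        cur = conn.execute(f"DELETE FROM {table} WHERE created < ?", (cutoff,))
        removed += cur.rowcount
    conn.commit()
    return removed


conn = open_stash()
put(conn, "stash_html", "Main_Page", 123, "uuid-a", "<p>old</p>", now=0)
put(conn, "stash_html", "Main_Page", 123, "uuid-b", "<p>new</p>", now=STASH_TTL_SECONDS)
print(purge_expired(conn, now=STASH_TTL_SECONDS + 1))  # → 1
```

Running purge_expired from a periodic job gives the "occasional deletion" behaviour; only rows older than the TTL are touched, so fresh stashes for the same title survive.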

API layout

The most natural place to expose this seems to be the transform APIs, especially the /transform/wikitext/to/html/ endpoint:

  • Client signals the desire for stashing via a flag in the POST request
  • Return HTML with an ETag of the form "<revision>/<uuid>/stash"
  • Expect the client to pass this ETag back to the /transform/html/to/wikitext/ endpoint as an If-Match header.
  • Later, we could also add support for passing this ETag into the HTML save API for direct HTML saving based on the stash. It would make sense to use the base_etag POST parameter for this purpose, as serialization will be based on the stash rather than the original base revision.
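
Building and checking the proposed "<revision>/<uuid>/stash" ETag is simple string handling. The sketch below assumes the ETag travels as a quoted entity tag, as HTTP requires, and generates a time-based uuid1 purely as an assumption; make_stash_etag and parse_stash_etag are hypothetical names, not RESTBase code.

```python
import uuid
from typing import Optional, Tuple


def make_stash_etag(revision: int, stash_id: Optional[uuid.UUID] = None) -> str:
    """Return a quoted ETag of the form "<revision>/<uuid>/stash"."""
    if stash_id is None:
        stash_id = uuid.uuid1()  # assumption: time-based UUID
    return f'"{revision}/{stash_id}/stash"'


def parse_stash_etag(etag: str) -> Optional[Tuple[int, uuid.UUID]]:
    """Return (revision, uuid) for a stash ETag, or None if it is not one."""
    parts = etag.strip().strip('"').split("/")
    if len(parts) != 3 or parts[2] != "stash":
        return None  # e.g. a regular, non-stash ETag
    try:
        return int(parts[0]), uuid.UUID(parts[1])
    except ValueError:
        return None  # malformed revision or uuid


tid = uuid.UUID("12345678-1234-5678-1234-567812345678")
etag = make_stash_etag(12345, tid)
print(etag)  # → "12345/12345678-1234-5678-1234-567812345678/stash"
print(parse_stash_etag(etag)[0])  # → 12345
```

The trailing /stash segment lets the html2wt endpoint tell a stash ETag apart from an ordinary revision ETag before it looks anything up.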


The Editing team has made switching between HTML and wikitext a high priority for the first half of Q2, so they would like API support for this as soon as possible. We could consider cutting a corner on the retention policy and starting with regular buckets, but we would then have to clean out those buckets manually (with a script) later on.

Event Timeline

GWicke raised the priority of this task to High.
GWicke updated the task description.
GWicke set Security to None.
GWicke added subscribers: ssastry, GWicke, Krenair and 2 others.
GWicke edited subscribers, added: mobrovac, Eevans, Pchelolo; removed: Aklapper.

To be on the safe side, we should also include some client-specific information in the ETag, so that we can distinguish different clients editing the same title. That would also let us remove earlier stashes for the same client and title when the client switches back and forth between VE and wikitext several times.

@mobrovac, that sounds like extra complexity for little gain. The stash volume should be very low in any case, and we can set the TTL to 24 hours.

Spec PR 21 introduces the 'temp' revision retention policy, which automatically removes content after grace_ttl seconds. Internally, it is rewritten to the 'latest' policy with count = 0.
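
The rewrite described above can be pictured as a small normalization step on a bucket's revision retention policy. This is a hedged sketch: the dictionary keys (type, count, grace_ttl) follow the wording of the comment, while the function name and the 86400-second fallback are assumptions, not the code from the spec PR.

```python
from typing import Any, Dict

DEFAULT_STASH_GRACE_TTL = 86400  # assumption: 24 hours, as suggested in the discussion


def normalize_retention_policy(policy: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a 'temp' policy into the equivalent 'latest' policy.

    Per the comment above, 'latest' with count = 0 keeps no revision past the
    grace period, so stored content disappears after grace_ttl seconds.
    """
    if policy.get("type") != "temp":
        return dict(policy)  # other policies pass through unchanged
    return {
        "type": "latest",
        "count": 0,
        "grace_ttl": policy.get("grace_ttl", DEFAULT_STASH_GRACE_TTL),
    }


print(normalize_retention_policy({"type": "temp", "grace_ttl": 3600}))
# → {'type': 'latest', 'count': 0, 'grace_ttl': 3600}
```

Expressing 'temp' as a special case of 'latest' means the storage layer only needs one expiry mechanism.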

This is now merged and deployed in production (yay, @mobrovac and @Pchelolo!). There is a new 'stash' POST flag in !/Transforms/post_transform_wikitext_to_html_title_revision to trigger stashing, and !/Transforms/post_transform_html_to_wikitext_title_revision accepts an If-Match header containing the ETag returned by the previous wt2html conversion, so that it works from the stashed content.
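
A client-side round trip might look like the sketch below, which only builds the requests with Python's urllib.request and sends nothing. The host, the form-field names (wikitext, html, stash) and the title/revision values are assumptions for illustration; the authoritative parameter names are those in the Transforms API documentation. The point is the flow: request stashing on the wt2html call, keep the returned ETag, and send it back unchanged as If-Match on the html2wt call.

```python
import urllib.parse
import urllib.request

BASE = "https://en.wikipedia.org/api/rest_v1"  # assumed host and base path


def wt2html_request(title: str, revision: int, wikitext: str) -> urllib.request.Request:
    """Build the wikitext -> HTML request with stashing enabled."""
    body = urllib.parse.urlencode({"wikitext": wikitext, "stash": "true"}).encode()
    url = f"{BASE}/transform/wikitext/to/html/{urllib.parse.quote(title, safe='')}/{revision}"
    return urllib.request.Request(url, data=body, method="POST")


def html2wt_request(title: str, revision: int, html: str,
                    stash_etag: str) -> urllib.request.Request:
    """Build the HTML -> wikitext request that refers back to the stash."""
    body = urllib.parse.urlencode({"html": html}).encode()
    url = f"{BASE}/transform/html/to/wikitext/{urllib.parse.quote(title, safe='')}/{revision}"
    req = urllib.request.Request(url, data=body, method="POST")
    # The ETag from the wt2html response is passed back verbatim.
    req.add_header("If-Match", stash_etag)
    return req


first = wt2html_request("Main_Page", 12345, "'''Hello'''")
# Send `first`, then read the response's ETag header (e.g. resp.headers["ETag"]).
etag = '"12345/12345678-1234-5678-1234-567812345678/stash"'  # example value
second = html2wt_request("Main_Page", 12345, "<p><b>Hello</b> world</p>", etag)
print(second.get_header("If-match"))
# → "12345/12345678-1234-5678-1234-567812345678/stash"
```

Note that urllib.request normalizes header names with str.capitalize(), which is why the lookup uses "If-match"; HTTP header names are case-insensitive on the wire.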