Dummy version number bump for Parsoid html to nail down process for making changes to HTML format
Closed, Resolved · Public · 0 Story Points


Parsoid HTML (and data-parsoid) is stored in RESTBase with a version number. This lets us make breaking changes to the HTML once in a while, as required. Clients and Parsoid would then have to do the right thing based on the version (the simplest approach would be to reject any request that includes HTML of an older version and force-fetch updated HTML).
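The "reject and force-fetch" rule above can be sketched as a plain version comparison. This is a minimal illustration, not Parsoid or RESTBase code: the dotted version strings, the function names, and the "accept"/"refetch" return values are all assumptions made for the example.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def handle_request(html_version: str, current_version: str) -> str:
    """Hypothetical policy: HTML older than the current version is rejected.

    Returns 'refetch' when the caller should discard its HTML and fetch
    updated HTML, and 'accept' otherwise.
    """
    if parse_version(html_version) < parse_version(current_version):
        return "refetch"
    return "accept"
```

Comparing integer tuples rather than raw strings avoids ordering "1.10.0" before "1.9.0".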

We need to nail down the process of what is involved in supporting this.

On the Parsoid end, we need to add a wiki page where HTML version number changes are documented as part of a ChangeLog.
On the RESTBase end, maybe it needs to provide an API end point where clients can request HTML with the latest version (or a specific version)?
How will clients like VE, Flow, CX handle this?

ssastry created this task. May 7 2015, 11:03 PM
ssastry updated the task description.
ssastry raised the priority of this task to Normal.
ssastry added a subscriber: ssastry.
Restricted Application added a subscriber: Aklapper. May 7 2015, 11:03 PM
ssastry updated the task description. May 7 2015, 11:04 PM
ssastry set Security to None.
ssastry renamed this task from "Dummy version number bump for Parsoid html to nail down process for supporting" to "Dummy version number bump for Parsoid html to nail down process for making changes to HTML format".

When you say, "force-fetch updated HTML", does that mean re-convert from wikitext to HTML?

For Flow, we store all content in HTML, so for format migrations we need either:

  1. A Parsoid HTML v1 => Parsoid HTML v2 API that we can use.
  2. Parsoid HTML v1 => wikitext => Parsoid HTML v2. This should just work as long as html2wt accepts v1, and wt2html outputs v2.

Note we currently use External Storage, so the old HTML will become orphaned (it's insert-only).
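Option 2, combined with insert-only storage, could look roughly like the sketch below. The `html2wt` and `wt2html` parameters stand in for whatever client wraps the real conversion calls; `InsertOnlyStore` and `migrate_record` are hypothetical names and not part of Flow or External Storage.

```python
from typing import Callable, List


def migrate_via_wikitext(
    old_html: str,
    html2wt: Callable[[str], str],
    wt2html: Callable[[str], str],
) -> str:
    """Round-trip old HTML through wikitext to obtain HTML in the new format."""
    wikitext = html2wt(old_html)  # must still accept the old (v1) HTML
    return wt2html(wikitext)      # emits the new (v2) HTML


class InsertOnlyStore:
    """Stand-in for insert-only storage: blobs are appended, never rewritten."""

    def __init__(self) -> None:
        self.blobs: List[str] = []

    def insert(self, blob: str) -> int:
        self.blobs.append(blob)
        return len(self.blobs) - 1


def migrate_record(
    store: InsertOnlyStore,
    blob_id: int,
    html2wt: Callable[[str], str],
    wt2html: Callable[[str], str],
) -> int:
    """Store the migrated HTML as a new blob and return its id.

    The old blob stays in the store; once the caller points at the new id,
    the old blob is orphaned.
    """
    new_html = migrate_via_wikitext(store.blobs[blob_id], html2wt, wt2html)
    return store.insert(new_html)
```

The caller is responsible for updating its own reference to the returned id, which is what leaves the v1 blob orphaned.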

GWicke added a subscriber: GWicke. Edited May 7 2015, 11:11 PM

RESTBase will also need to migrate stored HTML to a newer version. The idea is to provide an html2html end point in Parsoid that, given the old HTML, data-parsoid and mime type (with spec version number), upgrades the HTML to the latest HTML spec. We will also expose this end point in RESTBase.
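To make "mime type (with spec version number)" concrete, here is a small sketch that pulls a version out of a content type and decides whether an upgrade is needed. It assumes the version is carried in a `profile` parameter whose value ends in a spec path such as `.../Specs/HTML/1.5.0`; that header shape and both function names are assumptions for illustration, not a statement of what the end point actually parses.

```python
import re
from typing import Optional, Tuple

# Assumed shape: ...; profile="<anything>/<major>.<minor>.<patch>"
_PROFILE_VERSION = re.compile(r'profile="[^"]*/(\d+)\.(\d+)\.(\d+)"')


def spec_version(content_type: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from the profile parameter, if present."""
    match = _PROFILE_VERSION.search(content_type)
    if match is None:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def needs_upgrade(content_type: str, latest: Tuple[int, int, int]) -> bool:
    """True when the stored HTML is older than the latest spec version."""
    version = spec_version(content_type)
    if version is None:
        raise ValueError("content type carries no spec version")
    return version < latest
```

Raising on a missing version, rather than guessing, keeps unversioned HTML from being silently treated as current.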

@Mattflaschen: yes. Parsoid itself will continue to support multiple versions temporarily during a transition, but clients like VE might not want to unless the format changes are simple. We have done this a couple of times over the last 2 years.

So, as long as Flow preserves the version number header, you should be able to change format via pathway (2) you outlined. But, (1) is something we should think about as well since it is simpler and in some cases, we can do that migration without going through intermediate wikitext.

In CX, we save the HTML (translated content, produced through machine translation or manual translation) as drafts. A translator may postpone publishing, continue working on the draft, and publish later. When they publish, the input HTML might be v1 because it was saved in CX some time back. Publishing goes through html2wt, so as long as html2wt accepts v1, we are good.

On the Parsoid end, we also need to provide an html2html endpoint that offers this v1 -> v2 HTML upgrade service. Every time we bump the version string, we should also update that endpoint. We also need to think about how many older versions we will continue to support. A related question is how long these version conversions should remain active. That depends on how long clients hang onto older versions and, in the case of RESTBase, on how long it takes for the versions to turn over in storage.
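One way to picture "update the endpoint on every bump" together with a limited support window is a chain of single-step upgraders. This is only a sketch under assumptions: the integer version numbers, the `data-v` attribute, and every name below are hypothetical, and a real upgrade step would transform the DOM rather than replace strings.

```python
from typing import Callable, Dict

# Each entry upgrades HTML from the key version to the next version.
# A version bump adds one entry here and raises LATEST.
UPGRADERS: Dict[int, Callable[[str], str]] = {
    1: lambda html: html.replace('data-v="1"', 'data-v="2"'),
    2: lambda html: html.replace('data-v="2"', 'data-v="3"'),
}
LATEST = 3
OLDEST_SUPPORTED = 1  # raise this to retire old conversions


def upgrade(html: str, version: int) -> str:
    """Apply single-step upgraders until the HTML reaches LATEST."""
    if version < OLDEST_SUPPORTED or version > LATEST:
        raise ValueError(f"unsupported HTML version: {version}")
    while version < LATEST:
        html = UPGRADERS[version](html)
        version += 1
    return html
```

With this layout, dropping support for an old version means raising `OLDEST_SUPPORTED` and deleting its upgrader, which is only safe once no client or stored revision still holds that version.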

GWicke closed this task as Resolved. Jul 11 2017, 8:22 PM
GWicke claimed this task.

This is done. The latest Parsoid HTML spec lives at https://www.mediawiki.org/wiki/Specs/HTML/1.5.0, and we have since bumped the version several times. The underlying versioning policy is now documented in considerable detail at https://www.mediawiki.org/wiki/API_versioning.

Restricted Application added a project: User-Ryasmeen. Jul 11 2017, 8:22 PM
Jdforrester-WMF set the point value for this task to 0.