
Make use of stashed data-parsoid mapping for html-to-wikitext transformation
Closed, Resolved (Public)

Description

API endpoints that transform HTML to wikitext should be able to make use of a previously stashed rendering to provide context for selser.

This is primarily a requirement for VisualEditor's action=visualeditoredit endpoint, but should also be supported by core endpoints, such as the (currently experimental) /transform/html/to/wikitext endpoint in core, and the equivalent (legacy) endpoint in the Parsoid extension.

Context

Parsoid relies on the data-parsoid mapping when performing HTML to wikitext conversion. For this purpose, the data-parsoid mapping is stashed when VisualEditor loads the HTML to edit. When the edit is about to be saved, the modified HTML coming from the VisualEditor client needs to be converted to wikitext with the help of the stashed data-parsoid mapping (identified by the ETag returned by the original request for editable HTML).
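The load/save round trip described above can be sketched as follows. This is only an illustration under assumptions: InMemoryStash, load_for_edit and original_data_for_save are hypothetical names, and a plain dict stands in for the real stashing backend.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InMemoryStash:
    """Hypothetical stand-in for the stash; maps an ETag to a data-parsoid mapping."""
    _entries: Dict[str, dict] = field(default_factory=dict)

    def put(self, etag: str, data_parsoid: dict) -> None:
        self._entries[etag] = data_parsoid

    def get(self, etag: str) -> Optional[dict]:
        # Returns None when nothing was stashed under this ETag (or it expired).
        return self._entries.get(etag)


def load_for_edit(stash: InMemoryStash, etag: str, html: str, data_parsoid: dict) -> dict:
    """Editor load: stash the mapping and hand the HTML plus its ETag to the client."""
    stash.put(etag, data_parsoid)
    return {"html": html, "etag": etag}


def original_data_for_save(stash: InMemoryStash, etag: str) -> Optional[dict]:
    """Editor save: look up the mapping that belongs to the HTML the client edited."""
    return stash.get(etag)
```

A caller on the save path would have to decide what to do when the lookup returns None, for example falling back to a full (non-selective) serialization; that choice is not specified in this task.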

RESTbase's /transform/html/to/wikitext endpoint implements this functionality by loading the stashed mapping, injecting it into the HTML, and sending the result to the Parsoid extension's transform endpoint.
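As a rough sketch of the "inject it into the HTML" step: the code below assumes the stashed mapping is keyed by element id and that injecting means writing each entry back as a data-parsoid attribute on the matching element. The function name inject_data_parsoid is hypothetical, and the example only handles well-formed XML input; real HTML would need a proper HTML parser.

```python
import json
import xml.etree.ElementTree as ET


def inject_data_parsoid(xhtml: str, ids_to_data: dict) -> str:
    """Attach stashed per-element data to elements whose id appears in the mapping.

    Assumption: ids_to_data maps element ids to JSON-serializable objects.
    """
    root = ET.fromstring(xhtml)
    for element in root.iter():
        element_id = element.get("id")
        if element_id in ids_to_data:
            # Store the entry as a JSON string in a data-parsoid attribute.
            element.set(
                "data-parsoid",
                json.dumps(ids_to_data[element_id], sort_keys=True),
            )
    return ET.tostring(root, encoding="unicode")
```

Elements without an id, or with an id that is missing from the mapping, are left untouched, which matches the case of content newly added by the editor.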

As we move the stashing functionality into MediaWiki core, this functionality should be covered by the new transform endpoint in MW core and in the VisualEditor extension. This functionality would be implemented in a service or helper class, so it can be used by ApiVisualEditorEdit directly, bypassing the REST framework (T310377). The stashing backend is already implemented in the ParsoidOutputStash class, and is already populated when calling the page/html endpoint with stash=true.

See also T311819: Make the transform endpoint match ETags emitted by the page endpoint

Further information

  • The code in RESTbase that attaches the stashed data to the request that is then sent to parsoid lives in transformRevision in sys/parsoid.js.
  • RESTbase currently takes the ETag from the If-Match header if present, and otherwise tries to extract it from an HTML meta tag with property="mw:TimeUuid". We may want to support it as part of the JSON payload as well. Using If-Match is problematic, see T233320, T238849, T310710, ...
  • Stashed data-parsoid can be injected back into a DOM of the original HTML using the following code: PageBundle::apply( $oldBody->ownerDocument, $origPb ); This is currently in html2wt() on the ParsoidHandler base class.
  • The endpoint most relevant for VE is: /transform/pagebundle/to/wikitext/
  • Relevant e2e test in the parsoid extension: tests/api-testing/Parsoid.js, around line 1760
  • The process of selectively serializing HTML to Wikitext using a data-parsoid mapping is referred to as "SelSer". See ContentModelHandler::fromDOM().
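
The ETag lookup order described in the list above (If-Match header first, then the mw:TimeUuid meta tag) might look roughly like this. It is a sketch under assumptions: find_etag and normalize_etag are hypothetical names, the meta tag is assumed to carry the value in its content attribute, and header lookup is done on a plain mapping with the exact key "If-Match".

```python
from html.parser import HTMLParser
from typing import Mapping, Optional


class _TimeUuidFinder(HTMLParser):
    """Records the content attribute of the first <meta property="mw:TimeUuid">."""

    def __init__(self) -> None:
        super().__init__()
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            self.value is None
            and tag == "meta"
            and attributes.get("property") == "mw:TimeUuid"
        ):
            self.value = attributes.get("content")


def normalize_etag(raw: str) -> str:
    """Strip an optional weak-validator prefix (W/) and the surrounding quotes."""
    value = raw.strip()
    if value.startswith("W/"):
        value = value[2:]
    return value.strip('"')


def find_etag(headers: Mapping[str, str], html: str) -> Optional[str]:
    """Prefer the If-Match header; fall back to the meta tag in the HTML."""
    if_match = headers.get("If-Match")
    if if_match:
        return normalize_etag(if_match)
    finder = _TimeUuidFinder()
    finder.feed(html)
    return finder.value
```

Supporting the value as a field of the JSON payload, as suggested above, would add a third source to check; where it should rank relative to the other two is left open here.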

Event Timeline

daniel updated the task description.
daniel updated the task description.
daniel triaged this task as High priority. Jun 29 2022, 10:31 AM

Change 814861 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] TransformHandler: Load stashed page bundle based on ETag.

https://gerrit.wikimedia.org/r/814861

daniel renamed this task from Make use of stashed data-parsoid mapping in transform endpoint to Make use of stashed data-parsoid mapping for html-to-tikitext transformation. Sep 23 2022, 8:45 AM
daniel renamed this task from Make use of stashed data-parsoid mapping for html-to-tikitext transformation to Make use of stashed data-parsoid mapping for html-to-wikitext transformation.
daniel updated the task description.

Change 831050 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] REST: HtmlInputTransformHelper: Load original data from stash

https://gerrit.wikimedia.org/r/831050

Change 831050 merged by jenkins-bot:

[mediawiki/core@master] REST: HtmlInputTransformHelper: Load original data from stash

https://gerrit.wikimedia.org/r/831050

Change 814861 merged by jenkins-bot:

[mediawiki/core@master] TransformHandler: Load stashed page bundle based on ETag.

https://gerrit.wikimedia.org/r/814861

Change 896380 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] TransformHandler: Load stashed page bundle based on ETag.

https://gerrit.wikimedia.org/r/896380

Change 896036 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/services/parsoid@master] Re-apply "Restore proper ETag handling""

https://gerrit.wikimedia.org/r/896036

Change 896380 merged by jenkins-bot:

[mediawiki/core@master] TransformHandler: Load stashed page bundle based on ETag.

https://gerrit.wikimedia.org/r/896380

Change 896036 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Re-apply "Restore proper ETag handling""

https://gerrit.wikimedia.org/r/896036

Change 901245 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a2

https://gerrit.wikimedia.org/r/901245

Change 901245 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a2

https://gerrit.wikimedia.org/r/901245

Change 928624 had a related patch set uploaded (by C. Scott Ananian; author: Daniel Kinzler):

[mediawiki/services/parsoid@REL1_40] Re-apply "Restore proper ETag handling""

https://gerrit.wikimedia.org/r/928624

Change 928624 merged by jenkins-bot:

[mediawiki/services/parsoid@REL1_40] Re-apply "Restore proper ETag handling""

https://gerrit.wikimedia.org/r/928624