POST /:domain/v3/transform/pagebundle/to/pagebundle/:title?/:revision?
Closed, ResolvedPublic

Description

Convert HTML in pagebundle format to HTML in another pagebundle format. Depends on the metadata within the page bundle.

Definition here: https://www.mediawiki.org/wiki/Parsoid/API#HTML_-%3E_HTML with more info here https://phabricator.wikimedia.org/T114413.

Event Timeline

WDoranWMF triaged this task as Medium priority. Aug 5 2019, 3:59 PM

The pb2pb function in routes.js calls functions like updateRedLinks and languageConversion. I searched, but didn't see PHP equivalents. Do they exist? If not, is it within the scope of this task to create them, or is that covered elsewhere?

That's my bad. rGPARdc6557d63c36: Add REST API endpoints for wt->html deleted the transpiled file src/Api/routes.php even though not all the functionality was moved to ParsoidHandler. Those methods will have to be rescued from there.
(Possibly the same is true for src/Api/ParsoidService.php although I'm not sure there's anything reusable there.)

Change 530573 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/services/parsoid@master] Implemented pb2bp function.

https://gerrit.wikimedia.org/r/530573

This endpoint has four (non-error) paths:

  1. update redlinks
  2. language conversion (variant)
  3. downgrade
  4. wt2html
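
As a rough illustration of how a handler might choose among these four paths, here is a minimal JavaScript sketch. The names `profileVersion` and `choosePb2pbPath`, the `requestedVersion` parameter, and the major-version comparison used to detect a downgrade are all assumptions made for illustration; they are not taken from routes.js or ParsoidHandler.

```javascript
// Hypothetical helper: extract "x.y.z" from a content-type header whose
// profile ends in .../Specs/HTML/x.y.z. Returns null if no version is found.
function profileVersion(contentType) {
  const m = /Specs\/HTML\/(\d+\.\d+\.\d+)/.exec(contentType || '');
  return m ? m[1] : null;
}

// Hypothetical helper: decide which pb2pb path a request body selects.
// The downgrade test (requested major version lower than the input's) is an
// assumption for illustration, not the actual Parsoid content negotiation.
function choosePb2pbPath(body, requestedVersion) {
  const updates = body.updates;
  if (updates) {
    if (updates.redlinks) { return 'redlinks'; }
    if (updates.variant) { return 'variant'; }
    throw new Error('Unknown transformation.');
  }
  const headers = (body.original && body.original.html && body.original.html.headers) || {};
  const inputVersion = profileVersion(headers['content-type']);
  if (inputVersion && requestedVersion) {
    const major = (v) => parseInt(v.split('.')[0], 10);
    if (major(requestedVersion) < major(inputVersion)) { return 'downgrade'; }
  }
  // No updates requested and no downgrade needed: fall through to a reparse.
  return 'wt2html';
}
```

With this sketch, a body carrying `"updates": { "redlinks": 1, "variant": 0 }` selects the redlinks path, and a body with no `updates` key falls through to either the downgrade or the wt2html path.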

I'm using the following payloads to test these four paths:

Update redlinks:

{
  "original": {
    "html": {
      "headers": {
        "content-type": "text/html; charset=utf-8; profile=\"https://www.mediawiki.org/wiki/Specs/HTML/1.2.1\""
      },
      "body": "<!DOCTYPE html><html prefix=\"dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/\"><head prefix= \"mwr: http://localhost:8080/wiki/Special:Redirect/\"><meta charset=\"utf-8\"/><meta property=\"isMainPage\" content=\"true\"/><meta property=\"mw:html:version\" content=\"2.1.0\"/><link rel=\"dc:isVersionOf\" href=\"http://localhost:8080/wiki/Main%20Page\"/><title>Main Page</title><base href=\"http://localhost:8080/wiki/\"/><link rel=\"stylesheet\" href=\"/w/load.php?modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.content.parsoid%7Cmediawiki.skinning.interface%7Cskins.vector.styles%7Csite.styles%7Cext.cite.style%7Cext.cite.styles%7Cmediawiki.page.gallery.styles&amp;only=styles&amp;skin=vector\"/><!--[if lt IE 9]><script src=\"/w/load.php?modules=html5shiv&amp;only=scripts&amp;skin=vector&amp;sync=1\"></script><script>html5.addElements(\"figure-inline\");</script><![endif]--></head><body data-parsoid='{\"dsr\":[0,485,0,0]}' class=\"mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output\" dir=\"ltr\"><!--If you are reading the source wikitext of the main page: This page is created by Vagrant, but it is safe to do changes by editing it. Automatic updates will happen via [[Template: Main Page]]. If you are reading puppet/modules/mediawiki/files/main_page.wiki: DO NOT MODIFY THIS FILE! Updates would cause all local changes to be overwritten. 
If you want to add content to the main page, use main_page_template.wiki instead.--><p about=\"#mwt1\" typeof=\"mw:Transclusion\" data-parsoid='{\"dsr\":[472,485,0,0],\"pi\":[[]]}' data-mw='{\"parts\":[{\"template\":{\"target\":{\"wt\":\"Main Page\",\"href\":\"./Template:Main_Page\"},\"params\":{},\"i\":0}}]}'><b>Welcome to MediaWiki-Vagrant!</b></p><span about=\"#mwt1\"></span><ul about=\"#mwt1\"><li><a rel=\"mw:ExtLink\" href=\"//www.mediawiki.org/wiki/MediaWiki-Vagrant\" class=\"external text\">MediaWiki-Vagrant help</a></li><li><a rel=\"mw:ExtLink\" href=\"//www.mediawiki.org/wiki/Help:Contents\" class=\"external text\">MediaWiki help</a></li></ul><span about=\"#mwt1\"></span><h2 about=\"#mwt1\" id=\"Help_for_enabled_roles\">Help for enabled roles</h2><span about=\"#mwt1\"></span><dl about=\"#mwt1\"><dd><i>(run <code>vagrant roles help</code> in the directory where you installed MediaWiki-Vagrant to get help about enabling roles)</i></dd></dl><span about=\"#mwt1\"></span><p about=\"#mwt1\"><a rel=\"mw:WikiLink\" href=\"./Special:Prefixindex/VagrantRole\" title=\"Special:Prefixindex/VagrantRole\">Special:Prefixindex/VagrantRole</a></p></body></html>"
    }
  },
  "updates": {
    "redlinks": 1,
    "variant": 0
  }
}

Language conversion:

{
  "original": {
    "html": {
      "headers": {
        "content-type": "text/html; charset=utf-8; profile=\"https://www.mediawiki.org/wiki/Specs/HTML/1.2.1\""
      },
      "body": "<!DOCTYPE html><html prefix=\"dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/\"><head prefix= \"mwr: http://localhost:8080/wiki/Special:Redirect/\"><meta charset=\"utf-8\"/><meta property=\"isMainPage\" content=\"true\"/><meta property=\"mw:html:version\" content=\"2.1.0\"/><link rel=\"dc:isVersionOf\" href=\"http://localhost:8080/wiki/Main%20Page\"/><title>Main Page</title><base href=\"http://localhost:8080/wiki/\"/><link rel=\"stylesheet\" href=\"/w/load.php?modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.content.parsoid%7Cmediawiki.skinning.interface%7Cskins.vector.styles%7Csite.styles%7Cext.cite.style%7Cext.cite.styles%7Cmediawiki.page.gallery.styles&amp;only=styles&amp;skin=vector\"/><!--[if lt IE 9]><script src=\"/w/load.php?modules=html5shiv&amp;only=scripts&amp;skin=vector&amp;sync=1\"></script><script>html5.addElements(\"figure-inline\");</script><![endif]--></head><body data-parsoid='{\"dsr\":[0,485,0,0]}' class=\"mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output\" dir=\"ltr\"><!--If you are reading the source wikitext of the main page: This page is created by Vagrant, but it is safe to do changes by editing it. Automatic updates will happen via [[Template: Main Page]]. If you are reading puppet/modules/mediawiki/files/main_page.wiki: DO NOT MODIFY THIS FILE! Updates would cause all local changes to be overwritten. 
If you want to add content to the main page, use main_page_template.wiki instead.--><p about=\"#mwt1\" typeof=\"mw:Transclusion\" data-parsoid='{\"dsr\":[472,485,0,0],\"pi\":[[]]}' data-mw='{\"parts\":[{\"template\":{\"target\":{\"wt\":\"Main Page\",\"href\":\"./Template:Main_Page\"},\"params\":{},\"i\":0}}]}'><b>Welcome to MediaWiki-Vagrant!</b></p><span about=\"#mwt1\"></span><ul about=\"#mwt1\"><li><a rel=\"mw:ExtLink\" href=\"//www.mediawiki.org/wiki/MediaWiki-Vagrant\" class=\"external text\">MediaWiki-Vagrant help</a></li><li><a rel=\"mw:ExtLink\" href=\"//www.mediawiki.org/wiki/Help:Contents\" class=\"external text\">MediaWiki help</a></li></ul><span about=\"#mwt1\"></span><h2 about=\"#mwt1\" id=\"Help_for_enabled_roles\">Help for enabled roles</h2><span about=\"#mwt1\"></span><dl about=\"#mwt1\"><dd><i>(run <code>vagrant roles help</code> in the directory where you installed MediaWiki-Vagrant to get help about enabling roles)</i></dd></dl><span about=\"#mwt1\"></span><p about=\"#mwt1\"><a rel=\"mw:WikiLink\" href=\"./Special:Prefixindex/VagrantRole\" title=\"Special:Prefixindex/VagrantRole\">Special:Prefixindex/VagrantRole</a></p></body></html>"
    }
  },
  "updates": {
    "redlinks": 0,
    "variant": 1
  }
}

Downgrade:

I do not yet have a testing payload for this case.

wt2html:

{
  "original": {
    "html": {
      "headers": {
        "content-type": "text/html; charset=utf-8; profile=\"https://www.mediawiki.org/wiki/Specs/HTML/2.1.0\""
      },
      "body": "<!DOCTYPE html><html prefix=\"dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/\"><head prefix= \"mwr: http://localhost:8080/wiki/Special:Redirect/\"><meta charset=\"utf-8\"/><meta property=\"isMainPage\" content=\"true\"/><meta property=\"mw:html:version\" content=\"2.1.0\"/><link rel=\"dc:isVersionOf\" href=\"http://localhost:8080/wiki/Main%20Page\"/><title>Main Page</title><base href=\"http://localhost:8080/wiki/\"/><link rel=\"stylesheet\" href=\"/w/load.php?modules=mediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.content.parsoid%7Cmediawiki.skinning.interface%7Cskins.vector.styles%7Csite.styles%7Cext.cite.style%7Cext.cite.styles%7Cmediawiki.page.gallery.styles&amp;only=styles&amp;skin=vector\"/><!--[if lt IE 9]><script src=\"/w/load.php?modules=html5shiv&amp;only=scripts&amp;skin=vector&amp;sync=1\"></script><script>html5.addElements(\"figure-inline\");</script><![endif]--></head><body data-parsoid='{\"dsr\":[0,485,0,0]}' class=\"mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output\" dir=\"ltr\"><!--If you are reading the source wikitext of the main page: This page is created by Vagrant, but it is safe to do changes by editing it. Automatic updates will happen via [[Template: Main Page]]. If you are reading puppet/modules/mediawiki/files/main_page.wiki: DO NOT MODIFY THIS FILE! Updates would cause all local changes to be overwritten. 
If you want to add content to the main page, use main_page_template.wiki instead.--><p about=\"#mwt1\" typeof=\"mw:Transclusion\" data-parsoid='{\"dsr\":[472,485,0,0],\"pi\":[[]]}' data-mw='{\"parts\":[{\"template\":{\"target\":{\"wt\":\"Main Page\",\"href\":\"./Template:Main_Page\"},\"params\":{},\"i\":0}}]}'><b>Welcome to MediaWiki-Vagrant!</b></p><span about=\"#mwt1\"></span><ul about=\"#mwt1\"><li><a rel=\"mw:ExtLink\" href=\"//www.mediawiki.org/wiki/MediaWiki-Vagrant\" class=\"external text\">MediaWiki-Vagrant help</a></li><li><a rel=\"mw:ExtLink\" href=\"//www.mediawiki.org/wiki/Help:Contents\" class=\"external text\">MediaWiki help</a></li></ul><span about=\"#mwt1\"></span><h2 about=\"#mwt1\" id=\"Help_for_enabled_roles\">Help for enabled roles</h2><span about=\"#mwt1\"></span><dl about=\"#mwt1\"><dd><i>(run <code>vagrant roles help</code> in the directory where you installed MediaWiki-Vagrant to get help about enabling roles)</i></dd></dl><span about=\"#mwt1\"></span><p about=\"#mwt1\"><a rel=\"mw:WikiLink\" href=\"./Special:Prefixindex/VagrantRole\" title=\"Special:Prefixindex/VagrantRole\">Special:Prefixindex/VagrantRole</a></p></body></html>"
    },
    "data-parsoid": {
      "body": []
    }
  }
}
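
For anyone wanting to replay these payloads, the following JavaScript sketch builds the URL and request options for this endpoint from the route in the task title. The helper name `buildPb2pbRequest` is hypothetical, and the base URL and domain used in the usage comment are assumptions about a local setup, not values confirmed for this task.

```javascript
// Hypothetical helper: build the URL and fetch() options for a pb2pb request.
// Route shape: /:domain/v3/transform/pagebundle/to/pagebundle/:title?/:revision?
function buildPb2pbRequest(baseUrl, domain, payload, title, revision) {
  const segments = [domain, 'v3', 'transform', 'pagebundle', 'to', 'pagebundle'];
  // :title and :revision are optional; a revision only makes sense with a title.
  if (title !== undefined) {
    segments.push(encodeURIComponent(title));
    if (revision !== undefined) { segments.push(String(revision)); }
  }
  // Strip trailing slashes from the base so the join yields exactly one.
  const url = baseUrl.replace(/\/+$/, '') + '/' + segments.join('/');
  return {
    url,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    },
  };
}

// Usage (assumed local values; needs a running server and a global fetch):
// const { url, init } = buildPb2pbRequest(
//   'http://localhost:8080/rest.php/', 'localhost', payload, 'Main Page');
// const res = await fetch(url, init);
```

Keeping the request construction separate from the network call makes it easy to check the generated URL without a server running.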

Here are some loose notes and thoughts that arose while implementing this task. Some of these may be due to my incomplete understanding.

  1. Many classes need better class comments to describe the purpose of the class
  2. Should TransformHandler (srv/parsoid/extension/Rest/Handler/TransformHandler.php) be renamed to ParsoidTransformHandler? The current naming sounds to me more like it handles generic transforms for the REST API than being Parsoid-specific.
  3. The “Parsoid” class itself presents a beautifully clean interface to the outside world, consisting of only 3 functions: wikitext2html, html2wikitext, and html2html. However, srv/parsoid/extension/Rest/Handler/ParsoidHandler uses ten classes in namespace Parsoid other than the Parsoid class itself, which seems like a lot. Which of the Parsoid classes should be used by outside code, and what convention can we establish to keep other classes from “leaking out”?
  4. Related to the previous, the Env class is used in several places in extension code. Per T221988, this is undesirable.
  5. Do we intend to keep the MWParsoid namespacing long term, after the temporary extension approach is removed? Do we have conventions about what goes into namespace MWParsoid vs namespace Parsoid?
  6. We do “new Parsoid” in two places in class ParsoidHandler. Do we prefer to directly instantiate instances of the Parsoid class rather than using a factory?
  7. Functions in ParsoidHandler and TransformHandler do a lot of testing on $from and $format, leading to heavy multipurpose handlers. Would we prefer more focused handlers, each with less code? (And if so, is there a way to get there from here?)
  8. PageBundle has a PORT-FIXME comment indicating it is a placeholder. It is important to several REST API endpoints, which makes finishing it seem like a priority. For example, code in ParsoidHandler adds an empty “ids” array so that code in PageBundle::validate() will pass, but this seems like it sidesteps actual validation.
  9. DomDataUtils (srv/parsoid/src/Utils/DomDataUtils.php) contains a PORT-FIXME in storeDataAttribs() that causes a runtime error. The related comment is “semver equivalent code required”. The related code forces a call on a null variable, which blocks execution (and therefore testing) of some code paths.
  10. https://www.mediawiki.org/wiki/Parsoid/API#HTML_-%3E_HTML does not mention a data-parsoid parameter, but it appears to be required (perhaps inlined in the html, which would require extraction). Should this be added to the docs?
  11. The parse() function in routes.js supported a “reuseExpansions” option to prevent unnecessary recomputation. This was used by wt2html() and pb2pb(). This does not appear to be supported in PHP. Should it be?

We probably want individual tasks for several of these, and I'm happy to create them. But I'm posting this rough list here first, to capture the thoughts.

  2. Should TransformHandler (srv/parsoid/extension/Rest/Handler/TransformHandler.php) be renamed to ParsoidTransformHandler? The current naming sounds to me more like it handles generic transforms for the REST API than being Parsoid-specific.

The namespace being "MWParsoid\Rest\Handler" should indicate it's specific to the MediaWiki-Parsoid integration.

  3. The “Parsoid” class itself presents a beautifully clean interface to the outside world, consisting of only 3 functions: wikitext2html, html2wikitext, and html2html. However, srv/parsoid/extension/Rest/Handler/ParsoidHandler uses ten classes in namespace Parsoid other than the Parsoid class itself, which seems like a lot. Which of the Parsoid classes should be used by outside code, and what convention can we establish to keep other classes from “leaking out”?
  4. Related to the previous, the Env class is used in several places in extension code. Per T221988, this is undesirable.

Overall a good question. My first take would be that the Parsoid class itself plus the classes included in its public interface (and those classes, etc) are what should be used. Env, for one, is specifically not intended for use outside of Parsoid.

Eventually explicit documentation would probably be helpful.

  5. Do we intend to keep the MWParsoid namespacing long term, after the temporary extension approach is removed? Do we have conventions about what goes into namespace MWParsoid vs namespace Parsoid?

The namespace will be renamed when the code is merged into MediaWiki core to be under "MediaWiki\" in some form. Exact bikeshed color yet to be determined.

  6. We do “new Parsoid” in two places in class ParsoidHandler. Do we prefer to directly instantiate instances of the Parsoid class rather than using a factory?

Eventually you'll get a parser instance of some sort from MediaWikiServices. Whether that's directly a Parsoid object, or some MediaWiki-specific wrapper that does things like wraps a ParserOptions in a PageConfig for you, is yet to be determined AFAIK.

  8. PageBundle has a PORT-FIXME comment indicating it is a placeholder. It is important to several REST API endpoints, which makes finishing it seem like a priority. For example, code in ParsoidHandler adds an empty “ids” array so that code in PageBundle::validate() will pass, but this seems like it sidesteps actual validation.
  9. DomDataUtils (srv/parsoid/src/Utils/DomDataUtils.php) contains a PORT-FIXME in storeDataAttribs() that causes a runtime error. The related comment is “semver equivalent code required”. The related code forces a call on a null variable, which blocks execution (and therefore testing) of some code paths.
  10. https://www.mediawiki.org/wiki/Parsoid/API#HTML_-%3E_HTML does not mention a data-parsoid parameter, but it appears to be required (perhaps inlined in the html, which would require extraction). Should this be added to the docs?
  11. The parse() function in routes.js supported a “reuseExpansions” option to prevent unnecessary recomputation. This was used by wt2html() and pb2pb(). This does not appear to be supported in PHP. Should it be?

These are good questions for Subbu's team.

Thanks @Anomie. That helps clarify things.

@SubrahamanyamVarma, @Arlolra, @cscott, any thoughts on the above? Particularly #8-#11, but thoughts on #3 and #4 would also be welcome. If there's somewhere I can find existing discussion/roadmaps that I've missed, please let me know. I'll mention @Tgr as well, even though I know he's rolled off this project for now, in case he has thoughts to share.

  2. Should TransformHandler (srv/parsoid/extension/Rest/Handler/TransformHandler.php) be renamed to ParsoidTransformHandler? The current naming sounds to me more like it handles generic transforms for the REST API than being Parsoid-specific.

Anything in Parsoid should be assumed Parsoid-specific unless indicated otherwise, IMO.

  4. Related to the previous, the Env class is used in several places in extension code. Per T221988, this is undesirable.

It's needed for three things: to hold the output content version (could be replaced with a property of the Handler instance, and added to the Parsoid interface), to have a container for site config etc. so the PHP code can have the same structure as the JS one (can be refactored once the PHP code is reasonably well tested and you don't care about having a parallel structure anymore) and for logging (the whole Env-based logging should be killed with fire and replaced with PSR-3 logging).

  7. Functions in ParsoidHandler and TransformHandler do a lot of testing on $from and $format, leading to heavy multipurpose handlers. Would we prefer more focused handlers, each with less code? (And if so, is there a way to get there from here?)

Two handlers for the two path prefixes looks like a reasonable setup to me. Some of their content, and most of ParsoidHandler, should be moved to helper classes once you don't care about keeping close to the JS code.

  8. PageBundle has a PORT-FIXME comment indicating it is a placeholder. It is important to several REST API endpoints, which makes finishing it seem like a priority. For example, code in ParsoidHandler adds an empty “ids” array so that code in PageBundle::validate() will pass, but this seems like it sidesteps actual validation.

Pretty sure that came from the JS code, so it's not about being finished. Due to the code coming from JS which has weaker type support, there is lots of code in Parsoid which is "arrayly typed" and should be converted into proper object members; ids in page bundles is one of those.
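
To illustrate what moving from "arrayly typed" data to proper object members could look like, here is a minimal JavaScript sketch of a page bundle with explicit fields and a validation step. The class shape, field names, and error messages are hypothetical and are not the actual Parsoid PageBundle in either codebase.

```javascript
// Hypothetical typed page bundle; field names are illustrative only.
class PageBundle {
  constructor(html, parsoid, mw = null) {
    this.html = html;       // string: the HTML body
    this.parsoid = parsoid; // object expected to carry an `ids` member
    this.mw = mw;           // optional object for data-mw
  }

  // Returns a list of problems rather than throwing, so a caller can
  // report every issue at once. An empty list means the bundle looks valid.
  validate() {
    const errors = [];
    if (typeof this.html !== 'string') {
      errors.push('html must be a string');
    }
    const ids = this.parsoid && this.parsoid.ids;
    if (!ids || typeof ids !== 'object') {
      errors.push('data-parsoid must contain ids');
    }
    return errors;
  }
}
```

Note that an empty `ids` container still passes this check, which mirrors the concern raised in item 8: a caller that injects an empty `ids` satisfies validation without supplying any real data.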

  9. DomDataUtils (srv/parsoid/src/Utils/DomDataUtils.php) contains a PORT-FIXME in storeDataAttribs() that causes a runtime error. The related comment is “semver equivalent code required”. The related code forces a call on a null variable, which blocks execution (and therefore testing) of some code paths.

I used Composer\Semver\Semver in the handlers, probably the same should be done here as well.

  10. https://www.mediawiki.org/wiki/Parsoid/API#HTML_-%3E_HTML does not mention a data-parsoid parameter, but it appears to be required (perhaps inlined in the html, which would require extraction). Should this be added to the docs?

The API is documented as converting Parsoid HTML to Parsoid HTML. What kind of attributes elements contain in Parsoid HTML can be considered internal detail IMO.

Thank you @Tgr. It seems much of this falls under "just convert it from JS for now, and worry about cleaning it up later". I have no objection, and that helps me understand how to approach things.

Regarding this:

https://www.mediawiki.org/wiki/Parsoid/API#HTML_-%3E_HTML does not mention a data-parsoid parameter, but it appears to be required (perhaps inlined in the html, which would require extraction). Should this be added to the docs?

The API is documented as converting Parsoid HTML to Parsoid HTML. What kind of attributes elements contain in Parsoid HTML can be considered internal detail IMO.

I still don't understand where ParsoidHandler::pb2pb() gets $revision['data-parsoid']['body']. One section of the docs shows a page bundle description with data-parsoid, but another section of the docs shows an example HTML => HTML payload without data-parsoid.

To rephrase my question: is there no "data-parsoid" in the "HTML -> HTML" example payload because this is a concise example payload for the docs, and a real payload would include "data-parsoid"? Or do I get data-parsoid from somewhere else? If I need to extract it from the html, how do I do that? I looked for extraction code in both the PHP and JS trees, but did not find it, nor did I find where the JS version of pb2pb called such code.

In code review of the attached change, @mobrovac said this, which may be related:

it may happen that $revision['data-parsoid']['body'] is in-lined in the html like data-mw, and in such case it should be appropriately extracted before this step

I very much feel like I'm missing something.

@BPirkle Parsoid produces/consumes three blobs: HTML, data-parsoid and data-mw. They can be supplied to it separately or bundled together, or ... as a combination of the two approaches. Given an HTML tag, Parsoid can receive/send the blob separately or combined:

separate
{
  "html": {
    "headers": { ... },
    "body": "<a rel='mw:WikiLink' data-mw='...'>blah</a>"
  },
  "data-parsoid": {
    "headers": { ... },
    "body": { ... }
  }
}
combined
{
  "html": {
    "headers": { ... },
    "body": "<a rel='mw:WikiLink' data-mw='...' data-parsoid='...'>blah</a>"
  }
}

Parsoid/JS understands both payloads, and so should Parsoid/PHP. Both are used in production:

  • the separate blob is the way RESTBase communicates with Parsoid
  • the combined blob is used by VE on private wikis (or, in general, wherever RB is not set up)
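
As a small illustration of the two payload shapes, the following JavaScript sketch reports how data-parsoid is supplied in a payload's "original" object. The helper name `dataParsoidForm` is hypothetical, and the plain regex test for an inline attribute is an assumption made for brevity; it is not how apiUtils.extractPageBundle works, and real extraction would operate on the parsed DOM.

```javascript
// Hypothetical helper: classify how data-parsoid arrives in a payload.
// Returns 'separate', 'combined', 'both', or 'none'.
function dataParsoidForm(original) {
  // Separate form: a top-level "data-parsoid" blob with a body.
  const blob = original['data-parsoid'];
  const separate = Boolean(blob && blob.body);
  // Combined form: data-parsoid attributes inlined in the HTML body.
  // A regex is only a rough heuristic here (assumption for brevity).
  const html = (original.html && original.html.body) || '';
  const inline = /\sdata-parsoid\s*=/.test(html);
  if (separate && inline) { return 'both'; }
  if (separate) { return 'separate'; }
  if (inline) { return 'combined'; }
  return 'none';
}
```

A handler that accepts both shapes could use a check like this to decide whether inline attributes need to be extracted before further processing.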

I looked for extraction code in both the PHP and JS trees, but did not find it, nor did I find where the JS version of pb2pb called such code.

Isn't that apiUtils.extractPageBundle?

I'm not sure about the rest, my vague impression is that this is a way to override the data-parsoid attribute which is present in Parsoid-generated HTML, so it's an optional parameter and one a typical client would not need.

I looked for extraction code in both the PHP and JS trees, but did not find it, nor did I find where the JS version of pb2pb called such code.

Isn't that apiUtils.extractPageBundle?

@Tgr and @mobrovac, thank you. That makes more sense.

It turns out that my confusion regarding data-parsoid is irrelevant. Per @ssastry in IRC regarding reuseExpansions:

tldr on that is: that feature has been disabled in the JS codebase for the past 3 years and restbase has removed the support for that since then .. so, it won't be supported on the php side either right now .. eventually we will get around to it, but we can tackle it at that time. so you can tick that off your list for now.

Per earlier discussion in this task, all this code may look very different by the time reuseExpansions is functional, so trying to prepare for it at this time seems counterproductive. I'm going to remove the unnecessary code from ParsoidHandler and move on.

Edit: apologies for tagging the wrong person, I've corrected it above.

It turns out that my confusion regarding data-parsoid is irrelevant. Per @SubrahamanyamVarma in IRC regarding reuseExpansions:

I am @ssastry, btw. :-) I'll untag the other person.

Change 530573 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Implemented pb2pb function.

https://gerrit.wikimedia.org/r/530573

Change 540924 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Start porting content negotiation

https://gerrit.wikimedia.org/r/540924

Change 541580 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Implement pb2pb downgrades

https://gerrit.wikimedia.org/r/541580

Change 541656 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Port update redlinks pb2pb endpoint

https://gerrit.wikimedia.org/r/541656

Change 540924 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Start porting content negotiation

https://gerrit.wikimedia.org/r/540924

Change 541580 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Implement pb2pb downgrades

https://gerrit.wikimedia.org/r/541580

Change 541656 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Port update redlinks pb2pb endpoint

https://gerrit.wikimedia.org/r/541656

With the above patches merged, the mocha tests for this endpoint are passing. They can be run with the following:

$wgParsoidSettings = [
	'linting' => true,
	'useSelser' => true,
	'debugApi' => "http://localhost:12345/api.php",
	'wt2htmlLimits' => [
		'wikitextSize' => 20000,
	],
	'html2wtLimits' => [
		'htmlSize' => 10000,
	],
];

MOCKPORT=12345 node tests/mockAPI.js

PARSOID_URL=http://localhost:8080/rest.php/ npm run mocha

However, we'll need something better for CI before they drift. I've opened T233736 for that.

To cover questions 3 / 4 from #5439782, I've opened T235307.

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/540970 is still in review but getting that upstream will allow us to delete some duplicated code introduced in 988be37.

The language conversion code is also still being reviewed, which is necessary to call this task done.

Change 545391 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] LC endpoints

https://gerrit.wikimedia.org/r/545391

Change 545391 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add language converter endpoints to the REST API

https://gerrit.wikimedia.org/r/545391

The language conversion code is also still being reviewed, which is necessary to call this task done.

With that merged, let's consider this done and open follow ups where appropriate.