Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase
Closed, Resolved (Public)

Description

Per T328559 we want to make internal calls to parsoid directly to the endpoints exposed by the extension, instead of going through RESTbase.

cxserver is currently sending about 150 req/s to parsoid. Switching it over would be a good test.

Potential issue: without the caching layer in restbase, latency may be a bit higher.

See https://github.com/wikimedia/mediawiki-services-cxserver/blob/master/config.prod.yaml#L72

See also the similar issue described in T311867.

Event Timeline

@Nikerabbit I would like to know if the Language team can prioritise this work. The reason is that the next milestone for the RESTBase Sunset is to fully remove Parsoid from it, and before we do that, the dependent services need to call the new MW core endpoint.

How can we proceed? Do you need support during the transition?

Note that there are two options:

  1. call the parsoid endpoints exposed by the extension
  2. call the page endpoints exposed by core

The second option is preferred, since the parsoid extension is basically a backwards compatibility shim. But the core API is a little different in some respects. If you need 100% compatibility, using the extension's endpoints would be ok for now.

Change 962920 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] Use MediaWiki REST API endpoint instead of RESTbase

https://gerrit.wikimedia.org/r/962920

@daniel, @MSantos What would be the web API for posting wikitext to /transform/wikitext/to/html? I could not find documentation for that at https://www.mediawiki.org/wiki/API:REST_API/Reference

Got it - https://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/transform/wikitext/to/html

Change 962920 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Use MediaWiki REST API endpoint instead of RESTbase

https://gerrit.wikimedia.org/r/962920

Change 964846 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2023-10-05-093231-production

https://gerrit.wikimedia.org/r/964846

Change 964846 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2023-10-11-045323-production

https://gerrit.wikimedia.org/r/964846

Change 965022 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2023-10-11-045323-production

https://gerrit.wikimedia.org/r/965022

Change 965022 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2023-10-12-080927-production

https://gerrit.wikimedia.org/r/965022

Change 967142 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] Include host in the header to restbase API

https://gerrit.wikimedia.org/r/967142

Change 967142 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Include host in the header to restbase API

https://gerrit.wikimedia.org/r/967142

The restbase endpoint is no longer working. What changed? @daniel, @MSantos

$ curl -H "Host: en.wikipedia.org"  https://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/transform/wikitext/to/html -X POST -d '{"wikitext": "[[Water]]"}' -H "Content-type: application/json"
{"messageTranslations":{"en":"The requested relative path (/en.wikipedia.org/v3/transform/wikitext/to/html) did not match any known handler"},"httpCode":404,"httpReason":"Not Found"}
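A client that switches between endpoints may want to recognise this kind of response programmatically. The following is a minimal sketch, assuming only the JSON shape shown in the error body above (`messageTranslations`, `httpCode`, `httpReason`); the function names are hypothetical and are not part of cxserver.

```python
import json


def describe_rest_error(body: str) -> str:
    """Summarise a REST error body of the shape shown in the response above."""
    data = json.loads(body)
    code = data.get("httpCode")
    reason = data.get("httpReason", "")
    # messageTranslations maps a language code to a human-readable message.
    messages = data.get("messageTranslations", {})
    message = messages.get("en") or next(iter(messages.values()), "")
    return f"{code} {reason}: {message}"


def is_not_found(body: str) -> bool:
    """True when the error body reports HTTP 404."""
    return json.loads(body).get("httpCode") == 404


# The error body from the curl call above, copied verbatim.
sample = (
    '{"messageTranslations":{"en":"The requested relative path '
    '(/en.wikipedia.org/v3/transform/wikitext/to/html) did not match any '
    'known handler"},"httpCode":404,"httpReason":"Not Found"}'
)

print(describe_rest_error(sample))
```

A 404 from this API can also mean a missing page, so a caller that needs to tell the two cases apart would have to inspect the message text as well.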

I think we have a serious problem here.
At https://phabricator.wikimedia.org/T350219#9298055, @daniel wrote:

"Parsoid endpoints are not expected to work for external requests. So this is "working" as expected."

And that is what was done in https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/965608

In that case, if I understand correctly, our developer machines, test instances, and wmcloud instances cannot use this REST endpoint except against a local wiki.
CX specifically needs to connect to production parsoid instances for fetching pages. It does not make any sense to use local wiki pages for translation during development workflows. It will also limit our ability to debug issues related to production wiki pages.

If all you need is fetching HTML, you can use the public core REST endpoints for that, e.g. https://en.wikipedia.org/w/rest.php/v1/page/Earth/with_html. On the WMF network you should go through the service mesh to access that endpoint (localhost:6500, IIRC).

If you need access to pagebundles or the transform endpoints, then we have to figure something out.
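The public core endpoint mentioned above could be called along the following lines. This is a sketch under stated assumptions, not cxserver's actual code: the function names, the User-Agent string, and the `http://` scheme for the service mesh address are all hypothetical, and the thread itself only recalls `localhost:6500` from memory.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical client identifier; pick one that describes your own tool.
USER_AGENT = "cx-dev-example/0.1"


def build_page_html_request(domain, title, mesh_base=None):
    """Build a GET request for /v1/page/{title}/with_html.

    domain    -- wiki host name, e.g. "en.wikipedia.org"
    mesh_base -- optional base URL of a local service-mesh proxy
                 (assumed form: "http://localhost:6500"); when given, the
                 wiki is selected through the Host header instead.
    """
    encoded = urllib.parse.quote(title.replace(" ", "_"), safe="")
    path = f"/w/rest.php/v1/page/{encoded}/with_html"
    headers = {"User-Agent": USER_AGENT}
    if mesh_base:
        headers["Host"] = domain
        return urllib.request.Request(mesh_base + path, headers=headers)
    return urllib.request.Request(f"https://{domain}{path}", headers=headers)


def fetch_page_html(request, timeout=10):
    """Send the request and return the "html" field of the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["html"]


# Usage (performs a network call, so it is left commented out):
# html = fetch_page_html(build_page_html_request("en.wikipedia.org", "Earth"))
```

Keeping request construction separate from the network call makes it easy to check the URL and headers without touching production wikis.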

We need transform endpoints from production wikis in our Node.js-based cxserver. We also need the same transform endpoints while we test, debug, and develop publishing, as CX works with HTML and does the wikitext transformation at the end. We use production restbase endpoints so that we do not need to have the templates and pages available in a local wiki.

Ok, if you need the transform endpoint, that changes things. If this is for development and testing only, you could do what we do for the Mocha tests in RESTbase: use http://parsoid-external-ci-access.beta.wmflabs.org with the appropriate Host header (partially broken but should soon be fixed, see T350353).

http://parsoid-external-ci-access.beta.wmflabs.org - Does this use the actual production wikis, or beta.wmflabs.org? If it is beta.wmflabs.org, then we will be limited by content and supported languages, right?

More than CI, our development and debugging workflows need to access transform endpoints in production wikis to resolve templates. For example, we have tools like https://cxdebugger.toolforge.org/templates.html that we use to debug template adaptation across languages, and these require the transform to happen at those language wikis.

> http://parsoid-external-ci-access.beta.wmflabs.org - Does this use the actual production wikis, or beta.wmflabs.org? If it is beta.wmflabs.org, then we will be limited by content and supported languages, right?

Right, it's beta wikis.

> More than CI, our development and debugging workflows need to access transform endpoints in production wikis to resolve templates. For example, we have tools like https://cxdebugger.toolforge.org/templates.html that we use to debug template adaptation across languages, and these require the transform to happen at those language wikis.

In that case, I'll have to leave it to @ssastry to figure something out.

Side note: There is an experimental transform endpoint in core that we could expose, nearly identical to the one in the parsoid extension. But it's unclear whether it is desirable to offer this endpoint to the public.

VE already has a transformation endpoint exposed that you can use, which they use to rerender templates after editing.

See: https://en.wikipedia.org/wiki/Special:ApiSandbox#action=visualeditor&format=json&page=Main_Page&paction=parsefragment&wikitext='''Hello%2C%20world''&formatversion=2

paction can be parse or parsefragment depending on what you need.
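For reference, the ApiSandbox link above corresponds to a plain action API request. The sketch below only builds the URL, using the parameters visible in that link; the function name is hypothetical, and sending the request against the standard `/w/api.php` entry point is assumed.

```python
import urllib.parse


def build_ve_parse_url(domain, page, wikitext, paction="parsefragment"):
    """Build an action=visualeditor URL like the ApiSandbox example above."""
    if paction not in ("parse", "parsefragment"):
        raise ValueError("paction must be 'parse' or 'parsefragment'")
    params = {
        "action": "visualeditor",
        "format": "json",
        "formatversion": "2",
        "page": page,
        "paction": paction,
        "wikitext": wikitext,
    }
    # urlencode percent-encodes quotes, commas and spaces in the wikitext.
    return f"https://{domain}/w/api.php?" + urllib.parse.urlencode(params)


print(build_ve_parse_url("en.wikipedia.org", "Main_Page", "'''Hello, world''"))
```

For long wikitext, sending the same parameters in a POST body would avoid URL length limits.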

Change 971242 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/services/parsoid@master] Emit relative redirects

https://gerrit.wikimedia.org/r/971242

It seems we need to continue with restbase for the time being, until a stable, well-documented replacement API is available, right?

cxserver (nodejs) -> VisualEditor extension -> parsoid internal endpoint (production) seems a hacky, roundabout way. If VE already exposes the production parsoid endpoint via an API, why not make the parsoid endpoint public?

CX has many use cases where it needs this html<->wikitext transformation for the language pairs. Since a development machine or the beta cluster cannot have that many languages and that much content, such as pages and templates, we rely on the production restbase APIs for our development and debugging workflows.

If you are deprecating that, please ensure there is an alternative. If there is no alternative, it has serious implications for our ability to develop and test against production wikis in various languages.

> cxserver (nodejs) -> VisualEditor extension -> parsoid internal endpoint (production) seems a hacky, roundabout way. If VE already exposes the production parsoid endpoint via an API, why not make the parsoid endpoint public?

VE calls the respective code in PHP directly now, it no longer goes through any web API to talk to Parsoid.

> If you are deprecating that, please ensure there is an alternative. If there is no alternative, it has serious implications for our ability to develop and test against production wikis in various languages.

Can you confirm that https://test.wikipedia.org/w/rest.php/coredev/v0/transform/html/to/wikitext/Main_Page does what you need? The endpoint should behave just like the one in RESTbase. It would be simple enough to enable it under the /v1 prefix on all wikis. We just need to commit to supporting it as a public API in the future.

This endpoint was ported into core for completeness, but was never exposed for lack of use cases.

https://test.wikipedia.org/w/rest.php/coredev/v0/transform/wikitext/to/html/Oxygen looks good. If this can be exposed for all production wikis, we can definitely move to this endpoint.

$ curl 'https://test.wikipedia.org/w/rest.php/coredev/v0/transform/wikitext/to/html/Oxygen' -H 'Content-Type: application/json' --data-raw '{"wikitext":"{{Infobox Person}}","body_only":true,"stash":true}'

<table width="100%" about="#mwt1" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"Infobox Person","href":"./Template:Infobox_Person"},"params":{},"i":0}}]}' id="mwAg"><tbody><tr><td align="left">
<dl><dd><b>{{{firstName}}}</b>  <b>{{{lastName}}}</b>.</dd></dl></td></tr><tr><td align="left">
<dl><dd></dd></dl>
[[Image:{{{image}}}|192px x 155px|thumb|right|{{{caption}}}]].</td></tr><tr><td align="left">
<dl><dd></dd>
<dd>Date Of Birth:{{{birth_date}}}.</dd>
<dd></dd>
<dd>Place Of Birth:{{{birth_place}}}.</dd>
<dd></dd>
<dd>Residence:{{{residence}}}.</dd>
<dd></dd>
<dd>Nationality:{{{nationality}}}.</dd>
<dd></dd>
<dd>Education:{{{education}}}.</dd>
<dd></dd>
<dd>Employer:{{{employer}}}.</dd>
<dd></dd>
<dd>Occupation:{{{occupation}}}.</dd>
<dd></dd>
<dd>Salary:{{{salary}}}.</dd></dl>
</td></tr></tbody></table>
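The curl call above can be expressed as a small request builder, which may be handy when the route prefix changes (the thread mentions moving from `coredev/v0` to `/v1`). This is a sketch under assumptions: the function and parameter names are hypothetical, the payload fields are copied from the curl example, and the html-to-wikitext direction is assumed to accept an `html` field as the RESTBase endpoint did.

```python
import json
import urllib.parse
import urllib.request


def build_transform_request(domain, title, payload, source="wikitext",
                            target="html", prefix="/w/rest.php/coredev/v0"):
    """Build a POST request for {prefix}/transform/{source}/to/{target}/{title}.

    prefix is a parameter because the thread says the route may later be
    exposed under a different prefix than the experimental coredev/v0 one.
    """
    encoded = urllib.parse.quote(title.replace(" ", "_"), safe="")
    url = f"https://{domain}{prefix}/transform/{source}/to/{target}/{encoded}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same payload as the curl example above.
request = build_transform_request(
    "test.wikipedia.org",
    "Oxygen",
    {"wikitext": "{{Infobox Person}}", "body_only": True, "stash": True},
)
print(request.get_method(), request.full_url)

# Sending it is a network call, so it is left commented out:
# with urllib.request.urlopen(request, timeout=10) as response:
#     html = response.read().decode("utf-8")
```

For the reverse direction one would pass `source="html"`, `target="wikitext"` and a payload such as `{"html": "..."}`, subject to the assumption stated above.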

Change 976152 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] Use MediaWiki REST API endpoint instead of RESTbase

https://gerrit.wikimedia.org/r/976152

Change 976152 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Use MediaWiki REST API endpoint instead of RESTbase

https://gerrit.wikimedia.org/r/976152

Change 977983 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2023-11-28-064518-production

https://gerrit.wikimedia.org/r/977983

@santhosh is there anything left for this task? Not related to it but to the overall goal, I believe that the only missing piece is to have it routed through REST Gateway rather than RESTBase, is that a fair assumption?

I believe the review and deployment of https://gerrit.wikimedia.org/r/977983 is pending. Looks like Alex has commented on the patch today and I'm expecting us to work on those next week.

Nikerabbit changed the task status from Open to In Progress. Jan 15 2024, 12:26 PM

Change 977983 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2023-12-04-083437-production

https://gerrit.wikimedia.org/r/977983