
Parsoid-PHP should be publicly accessible in beta
Closed, Resolved, Public

Description

Before the switch to a new box, parsoid11, we could access Parsoid-PHP in beta from outside the beta cluster by pointing to https://en.wikipedia.beta.wmflabs.org/w/rest.php/{domain}/v3/{format}/{title}, but this no longer seems to work.

We need access to Parsoid-PHP from outside the beta cluster for local RESTBase testing and for RESTBase CI.
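For reference, here is a minimal sketch of how a test harness might fill in the `{domain}/v3/{format}/{title}` template above. The function name `parsoid_rest_url` is hypothetical, and percent-encoding the whole title (including `/`) is an assumption, not something confirmed in this task.

```python
from urllib.parse import quote


def parsoid_rest_url(host: str, domain: str, fmt: str, title: str) -> str:
    """Fill in /w/rest.php/{domain}/v3/{format}/{title} (hypothetical helper).

    host   -- the web host the request is sent to
    domain -- the wiki domain Parsoid should render for
    fmt    -- output format, e.g. "html"
    title  -- page title, already in underscore form (e.g. "Main_Page")
    """
    # Assumption: the title is percent-encoded as a single path segment.
    return f"https://{host}/w/rest.php/{domain}/v3/{fmt}/{quote(title, safe='')}"
```

Called with `en.wikipedia.beta.wmflabs.org` as both host and domain, `html` and `Main_Page`, it yields the same URL that returns a 404 later in this thread.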

Event Timeline

The rest_v1 API works, e.g.:

curl -X GET "https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/Main_Page" -H 'accept: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/2.1.0"'
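The same rest_v1 request can be built with Python's standard library, which may be handy in a test harness. This is a sketch that only constructs the request; the helper name `rest_v1_html_request` is hypothetical, while the URL and the `Accept` header with the HTML 2.1.0 profile are copied from the curl command above.

```python
import urllib.request
from urllib.parse import quote

# Profile URL taken verbatim from the curl example.
HTML_PROFILE = "https://www.mediawiki.org/wiki/Specs/HTML/2.1.0"


def rest_v1_html_request(host: str, title: str) -> urllib.request.Request:
    """Build (but do not send) a GET for /api/rest_v1/page/html/{title}."""
    url = f"https://{host}/api/rest_v1/page/html/{quote(title, safe='')}"
    accept = f'text/html; charset=utf-8; profile="{HTML_PROFILE}"'
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")
```

To actually fetch the page, pass the result to `urllib.request.urlopen(req, timeout=30)`.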

It's only direct access to rest.php that is "broken", e.g.:
https://en.wikipedia.beta.wmflabs.org/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/html/Main_Page
gives me
404: The requested relative path (/en.wikipedia.beta.wmflabs.org/v3/html/Main_Page) did not match any known handler

But then again, production gives the same error:

https://en.wikipedia.org/w/rest.php/en.wikipedia.org/v3/html/Main_Page

So maybe this is user error in specifying the rest.php URL? (EDIT: see below; yes, both URLs are 'correctly' returning the 404.)

cscott claimed this task.

I think this is not a bug. As documented in https://www.mediawiki.org/wiki/API:REST_API#Other_APIs :

There are two families of REST APIs related to Wikimedia projects: the MediaWiki REST API described on this page and the REST API built on RESTBase. While the MediaWiki REST API is part of the MediaWiki platform and can be enabled on any wiki running MediaWiki 1.34 or later, the RESTBase API serves content specific to Wikimedia projects.

A typical URL targeting the MediaWiki REST API is:
https://en.wikipedia.org/w/rest.php/v1/page/Main_Page/history
(and in fact history seems to be the only MediaWiki REST API endpoint at this time?)
The corresponding beta URL is:
https://en.wikipedia.beta.wmflabs.org/w/rest.php/v1/page/Main_Page/history
and that works fine.
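To make the difference between the two URL shapes concrete, here is a small sketch that builds the core MediaWiki REST API history URL. The helper name `history_url` is hypothetical; the path shape `/w/rest.php/v1/page/{title}/history` is taken from the examples above. Core routes start with `/v1/`, whereas the Parsoid-style path starts with a domain, which is why the latter finds no handler on a host that doesn't load the Parsoid extension.

```python
from urllib.parse import quote


def history_url(host: str, title: str) -> str:
    """Build /w/rest.php/v1/page/{title}/history (hypothetical helper)."""
    return f"https://{host}/w/rest.php/v1/page/{quote(title, safe='')}/history"
```

`history_url("en.wikipedia.beta.wmflabs.org", "Main_Page")` reproduces the beta URL above.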

All your analysis is correct, but in the beta cluster we need direct access to Parsoid-PHP. It used to be reachable via https://en.wikipedia.beta.wmflabs.org/w/rest.php/en.wikipedia.beta.wmflabs.org/v3/html/Main_Page through some black magic whose workings I don't know. We need to resurrect that.

We need direct access to parsoid-php for testing RESTBase locally and in CI. I have set up parsoid-php.wmflabs.org as a proxy, but it doesn't work because MediaWiki needs the correct Host header.

Can you point me to what you were using this for? Using rest.php seems to be the wrong approach here, since it would make beta's configuration unlike production, and we try to keep production and beta as similar as possible. There should (and can) be another way to access deployment-restbase02 directly that doesn't involve breaking the main /w/rest.php route.

In particular, it looks like there was an alias set up for parsoid-php-beta.wmflabs.org -- that's probably what you should be using instead of trying to make /w/rest.php do magic.

Ok, the issue seems to be that the RESTBase test suite uses https://en.wikipedia.beta.wmflabs.org/w/rest.php as a (sketchy) way to reach the Parsoid instance. I don't know exactly why this used to work. Probably all beta MediaWiki machines used to load the Parsoid "extension", whereas now beta matches core in having only the Parsoid 'cluster' load the Parsoid extension.

I don't really want to load the Parsoid extension on the main beta MediaWiki machines, because that divergence would make future troubleshooting more difficult. Nothing else runs Parsoid on the main MediaWiki machines, so RESTBase tests could fail in interesting ways that no "usual" use of the beta cluster would trigger.

Most of the other unit tests use graphoid-beta.wmflabs.org, citoid-beta.wmflabs.org, etc., so the first order of business was to change this to parsoid-beta.wmflabs.org like the rest. That's done. However, we also use MWScript.php to serve multiple domains from the same web host, so we still need to configure things so that an appropriate Host header is sent. Some other things may need adjusting as well to make it work completely.
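To illustrate why the Host header matters when one web host serves many wikis, here is a toy sketch of name-based dispatch. It is a hypothetical illustration only: the `WIKIS` mapping and `select_wiki` function are invented for this example and do not reflect how MWScript.php or the beta configuration actually work.

```python
# Hypothetical mapping from Host header to wiki ID; invented for illustration.
WIKIS = {
    "en.wikipedia.beta.wmflabs.org": "enwiki",
    "de.wikipedia.beta.wmflabs.org": "dewiki",
}


def select_wiki(headers: dict) -> str:
    """Pick a wiki from the request's Host header (toy name-based dispatch)."""
    # HTTP header names are case-insensitive.
    host = next((v for k, v in headers.items() if k.lower() == "host"), "")
    host = host.split(":", 1)[0].lower()  # drop an explicit port such as ":80"
    try:
        return WIKIS[host]
    except KeyError:
        raise LookupError(f"no wiki configured for Host {host!r}") from None
```

A request that reaches the server under a proxy or CI hostname (for example parsoid-php.wmflabs.org) carries that hostname in Host, so a lookup like this fails unless the client overrides the header with a real wiki domain.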

Ok, we've set up the Parsoid server with a floating IP and an A record: parsoid-external-ci-access.beta.wmflabs.org. The nice descriptive domain name is thanks to @Krenair. These are an entry in Network > Floating IPs and one in DNS > Zones > beta.wmflabs.org > Record Sets in https://horizon.wikimedia.org, if we have to adjust them later.

It's HTTP-only, not HTTPS, so use port 80.

$ curl -H 'Host: en.wikipedia.beta.wmflabs.org' http://parsoid-external-ci-access.beta.wmflabs.org/wiki/Special:Version

now works as you'd expect. I just have to update the restbase tests to match.
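A test suite written in Python could mirror the curl command above roughly as follows. This is a sketch, not the actual RESTBase test code: `CI_ENDPOINT` and `parsoid_ci_request` are hypothetical names. The endpoint hostname, the HTTP-only scheme and the Host override come from this thread. `urllib.request` only derives Host from the URL when the caller hasn't supplied one, so the override is what reaches the server.

```python
import urllib.request

# Floating-IP endpoint from this thread; plain HTTP on port 80.
CI_ENDPOINT = "http://parsoid-external-ci-access.beta.wmflabs.org"


def parsoid_ci_request(wiki_host: str, path: str) -> urllib.request.Request:
    """Address the Parsoid server directly while naming the wiki via Host."""
    # The TCP connection goes to CI_ENDPOINT; MediaWiki picks the wiki from Host.
    return urllib.request.Request(CI_ENDPOINT + path, headers={"Host": wiki_host})
```

For example, `urllib.request.urlopen(parsoid_ci_request("en.wikipedia.beta.wmflabs.org", "/wiki/Special:Version"))` corresponds to the curl invocation above.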


Why was this reopened? It has been completed and everyone involved was satisfied with the solution. I assume it was reopened by mistake; closing.