Parsoid REST API routes in MediaWiki
Closed, Resolved, Public

Description

We need to implement the routes that the ParsoidJS service exposes, as closely as possible.

Ideally we'd be able to point code that uses the ParsoidJS service interface at a MediaWiki instance and it would "just work".

Event Timeline

Ideally we'd be able to point code that uses the ParsoidJS service interface at a MediaWiki instance and it would "just work".

I think this should be a strong requirement. All of the initial talks and deployment strategies we have discussed with @ssastry so far assume this will be the case. And since the goal is a 1:1 port, I don't see why having the exact same REST API hierarchy wouldn't be possible.

I imagine we'd want to discard the domain name segment at least? And possibly namespace things differently - the RESTBase approach is that we have these functional top-level categories like page and data, and then different services stick different endpoints into them, but RESTBase is hand-managed and MediaWiki needs to be automatically extensible and scale to hundreds of extensions, so we might want to end up with something like /parsoid/page/:format/:title/:revision?/ instead. (Although that does degrade the nice REST semantics a bit...)
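To make the shape of such a route concrete, here is a minimal sketch of matching /parsoid/page/:format/:title/:revision?/ and pulling out its parameters. The function name, the regular expression, and the assumption that revisions are numeric are all illustrative; none of this is taken from MediaWiki or Parsoid code.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Hypothetical pattern for /parsoid/page/:format/:title/:revision?/
# Assumptions: the format is a lowercase word, the title is one
# URL-encoded path segment, and the optional revision is numeric.
PAGE_ROUTE = re.compile(
    r"^/parsoid/page/(?P<format>[a-z]+)/(?P<title>[^/]+)"
    r"(?:/(?P<revision>\d+))?/?$"
)


def match_page_route(path: str) -> Optional[dict]:
    """Return the route parameters, or None if the path does not match."""
    m = PAGE_ROUTE.match(path)
    if m is None:
        return None
    params = m.groupdict()
    # Titles containing a slash would have to arrive percent-encoded.
    params["title"] = unquote(params["title"])
    if params["revision"] is not None:
        params["revision"] = int(params["revision"])
    return params
```

For example, match_page_route("/parsoid/page/html/Main_Page/12345") yields the format, the title and an integer revision, while the same path without the trailing revision leaves the revision as None.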

Also T221176: GET /_version/ is probably not useful as is; if the contents are needed, we need to find a different place for them.

I imagine we'd want to discard the domain name segment at least? And possibly namespace things differently - the RESTBase approach is that we have these functional top-level categories like page and data, and then different services stick different endpoints into them, but RESTBase is hand-managed and MediaWiki needs to be automatically extensible and scale to hundreds of extensions, so we might want to end up with something like /parsoid/page/:format/:title/:revision?/ instead. (Although that does degrade the nice REST semantics a bit...)

This is not my understanding of the situation. You seem to be conflating two things here: (a) porting Parsoid; and (b) having the REST API hierarchy built out in MW. I think it's premature to talk about (b) and instead we should focus on (a). Doing it step-wise increases robustness and minimises the chances of outages. So yes, I agree that the question of what exactly the REST hierarchy will look like is up for debate, but that should neither influence nor interfere with porting Parsoid.

My suggestion would be to have the exact same end point hierarchy in the port at first. That allows us to isolate the switch to a simple LVS switch. Once we are happy with the outcome, we can then (if needed or wanted) change the actual layout to accommodate future uses.

My suggestion would be to have the exact same end point hierarchy in the port at first. That allows us to isolate the switch to a simple LVS switch. Once we are happy with the outcome, we can then (if needed or wanted) change the actual layout to accommodate future uses.

The one thing that may get in the way of using the exact same end point hierarchy would be T221173: Resolve domains in path of endpoints for Parsoid REST API. The WMF PHP site configuration uses the SERVER_NAME to determine which deployment branch to execute in the first place. It would be nice to use that same mechanism rather than having to preprocess the URL and override it for Parsoid (or worse, for every REST service in the future too) with attendant confusion over which wiki "https://en.wikipedia.org/w/rest.php-or-whatever/zuwikibooks/v3/page/foo/bar/baz" refers to (and is running code from), plus confusion over cookie domains and the like.

The rest is just concern that we might wind up wanting to rename things later.

The one thing that may get in the way of using the exact same end point hierarchy would be T221173: Resolve domains in path of endpoints for Parsoid REST API. The WMF PHP site configuration uses the SERVER_NAME to determine which deployment branch to execute in the first place. It would be nice to use that same mechanism rather than having to preprocess the URL and override it for Parsoid (or worse, for every REST service in the future too) with attendant confusion over which wiki "https://en.wikipedia.org/w/rest.php-or-whatever/zuwikibooks/v3/page/foo/bar/baz" refers to (and is running code from), plus confusion over cookie domains and the like.

This should be pretty straightforward: the appservers are already able to translate the domain into $wgServerName via the Host header. So my suggestion would be for RESTBase to start sending the Host header with its Parsoid requests while keeping the domain as part of the end point path. On the MW side, $wgServerName will be set from the header, and Parsoid/PHP can then simply ignore the domain in the path. This gives us the highest degree of interchangeability between the two Parsoid implementations, which means that at the point of the switch we can easily both roll out the switch progressively and roll back in case of trouble.
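A rough sketch of that idea, under stated assumptions: the wiki is chosen from the Host header alone, and the leading domain segment of the path is parsed off and kept only so that the URL layout stays identical to ParsoidJS. The names ResolvedRequest and resolve are hypothetical, and the port stripping is deliberately simplistic (it does not handle IPv6 literals).

```python
from typing import NamedTuple


class ResolvedRequest(NamedTuple):
    wiki_host: str    # from the Host header; this alone selects the wiki
    path_domain: str  # domain segment from the path; carried along, not used
    rest: str         # remainder of the path after the domain segment


def resolve(host_header: str, path: str) -> ResolvedRequest:
    """Pick the wiki from the Host header and peel the domain off the path."""
    segments = path.lstrip("/").split("/", 1)
    path_domain = segments[0]
    rest = "/" + segments[1] if len(segments) > 1 else "/"
    # Drop an optional ":port" suffix and normalise case.
    wiki_host = host_header.split(":", 1)[0].lower()
    return ResolvedRequest(wiki_host, path_domain, rest)
```

Because the path segment never influences which wiki is selected, a request whose Host header and path domain disagree is still served by the wiki named in the header, which is the behaviour the appservers already have today.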

The rest is just concern that we might wind up wanting to rename things later.

That's exactly my point: let's first complete the port/move with as few other changes as possible and then shuffle things around in a second step if we need to.

You seem to be conflating two things here: (a) porting Parsoid; and (b) having the REST API hierarchy built out in MW. I think it's premature to talk about (b) and instead we should focus on (a).

My understanding is the opposite: (a) is premature (in the sense of having Parsoid-PHP serve non-mirrored production traffic) as it is blocked on performance improvements which will probably take a significant amount of time; while we are aiming to have some kind of a Parsoid-PHP endpoint to send test requests to in a month or so, having it actually replace Parsoid-JS is much farther off. On the other hand, trying to design an API framework without implementing even a single real-world use case in the process seems unhealthy, to say the least. So we get no benefit from having some weird Parsoid-only version of the MediaWiki REST framework that does not deal with some of the concerns such a framework has to deal with, like the namespacing of modules.

Also, replacing a fixed URL prefix with a different fixed URL prefix does not seem like a complicated thing to implement. We can do it somewhere in the network layer if we absolutely have to.
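To illustrate how small such a rewrite is, here is a sketch of swapping one fixed path prefix for another. The function name and the example prefixes are made up for illustration; in practice this would more likely live in a proxy or load balancer configuration than in application code.

```python
def rewrite_prefix(path: str, old_prefix: str, new_prefix: str) -> str:
    """Replace old_prefix with new_prefix when it matches a whole path prefix.

    Paths that do not start with old_prefix on a segment boundary are
    returned unchanged.
    """
    old = old_prefix.rstrip("/")
    new = new_prefix.rstrip("/")
    if path == old or path.startswith(old + "/"):
        return new + path[len(old):]
    return path
```

With the hypothetical prefixes "/v3" and "/parsoid/v3", the path "/v3/page/html/Foo" becomes "/parsoid/v3/page/html/Foo", while "/v3x/page" is left alone because the match has to end on a segment boundary.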