Page MenuHomePhabricator

Integrate the Virtual Rest Service (VRS) into core, and make it generally available (from RequestContext?)
Closed, ResolvedPublic

Description

A big motivation behind the Virtual Rest Service RFC was to provide a generic and universal interface for the interaction with local or remote services from PHP code. So far, we have only partially achieved this goal. The VRS code is in core, but each consumer still needs to set up their own VRS service.

To actually achieve the degree of generality and abstraction we aimed for, we should centrally configure a singleton VRS object, and make it available to any core or extension code, perhaps from RequestContext. Construction could happen lazily on first access, so that requests without VRS use don't incur overheads.

The path structure would ideally mirror that of our external APIs at /api/, without the current per-service prefixes like /restbase/. Strawman:

  • /{domain}/rest_v1/... for the REST API for $domain
  • /local/rest_v1/... for the REST API of the local domain (magically resolved)
  • /local/action/... for the action API (possibly a wrapper around FauxRequest, if that can be made to support parallelism)

Event Timeline

GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description. (Show Details)
GWicke added a project: Services.
GWicke added subscribers: GWicke, cscott.
GWicke triaged this task as Medium priority.Sep 14 2015, 5:51 PM
GWicke set Security to None.
GWicke updated the task description. (Show Details)
GWicke edited subscribers, added: Krenair, mobrovac; removed: Aklapper.

VisualEditor and Flow currently have very similar blocks of code to create a VRS object from the global VRS configuration. It would indeed be nice if this common code were lifted into core (which I think is what gwicke is suggesting).

GWicke renamed this task from Integrate the Virtual Rest Service (VRS) into core, and make is generally available to Integrate the Virtual Rest Service (VRS) into core, and make it generally available.Sep 14 2015, 6:42 PM
GWicke updated the task description. (Show Details)
GWicke renamed this task from Integrate the Virtual Rest Service (VRS) into core, and make it generally available to Integrate the Virtual Rest Service (VRS) into core, and make it generally available (from RequestContext?).Sep 14 2015, 6:45 PM
GWicke added a project: TechCom-RFC.

If a client authenticates to one service through the VRS singleton, then triest to use another service, would the VRS object attempt to authenticate to the new service on behalf of the client?

What would the path structure be for the other existing services {Parsoid,Swift}VirtualRestService ?

If a client authenticates to one service through the VRS singleton, then triest to use another service, would the VRS object attempt to authenticate to the new service on behalf of the client?

Each API hierarchy maps to a handler, which can implement things like authentication. A client using several APIs will be independently authenticated for each of those APIs, if necessary and implemented by those handlers.

What would the path structure be for the other existing services {Parsoid,Swift}VirtualRestService ?

Parsoid is already exposed via the rest_v1 API. Swift is not, but has a REST API that could either be mapped as-is, or simplified to support our use cases only.

This seems to me like the exact opposite of abstraction. $res = $rashomonService->getPage( 'Main_Page' ) is abstract. $vrs->mount( '/wiki/', $rashomonService ); $res = $vrs->run( 'GET', '/wiki/v1/pages/Main_Page' ); is concrete; it hardcodes an implementation detail about mapping logical operations to REST URLs. It also becomes rather awkward when the service is not actually accessed via a HTTP request.

This seems to me like the exact opposite of abstraction. $res = $rashomonService->getPage( 'Main_Page' ) is abstract. $vrs->mount( '/wiki/', $rashomonService ); $res = $vrs->run( 'GET', '/wiki/v1/pages/Main_Page' ); is concrete; it hardcodes an implementation detail about mapping logical operations to REST URLs. It also becomes rather awkward when the service is not actually accessed via a HTTP request.

The point of this RFC is to guarantee REST API stability, regardless of the service implementing it (hence the name Virtual REST Service). So, the way to look at it is - I am guaranteed that if I issue a request to /wiki/v1/pages/Main_Page I'll get back the page as specified in the API specification and I don't need to worry if it's RESTBase responding or some other service / mechanism.

Using $res = $rashomonService->getPage( 'Main_Page' ) ties you to a specific implementation directly. I agree that in PHP-land the latter feels more natural, but by using the API layout we gain the benefit of both internal and external clients behaving in the exact same manner, which, IMHO, is a big win.

Using $res = $rashomonService->getPage( 'Main_Page' ) ties you to a specific implementation directly.

No it doesn't. It ties you to whatever implementation the service locator provides for the hypothetical "RashomonServiceInterfaceV1" interface, which is no different from tying yourself to a URL structure such as "/wiki/v1/pages/Main_Page" in the "Virtual REST Service" service locator.

No it doesn't. It ties you to whatever implementation the service locator provides for the hypothetical "RashomonServiceInterfaceV1" interface, which is no different from tying yourself to a URL structure such as "/wiki/v1/pages/Main_Page" in the "Virtual REST Service" service locator.

I think I see @mobrovac's point here, but I have to squint, so bear with me :-)

The reason for putting the path component of the URL in the API itself is that someone is going to have to turn the API into a URL structure. Leaving URL structure as the API's problem that the implementation generally doesn't need to worry about lets implementers to do things that might work for the common case, but that don't generalize well. It makes code review for path issues harder. Having the path component in the API pushes those debates upstream to code review time, making it easier to see when bad URLs are being created at the source.

I'm with @Anomie and @Tgr. In PHP, I want to use service interfaces, for convenience and type safety. I strongly oppose any proposal to make something like $vrs->run( 'GET', '/wiki/v1/pages/Main_Page' ) the preferred way to access article content. The preferred way should be something like $articleStore->getCurrentRevisionContent( 'Main_Page' ) or some such. This gives me an interface to test against, to mock, to use in static analysis and documentation. It's ideomatic to the language.

Now, that doesn't mean I'm opposing the idea of VRS in general. But if we use it in PHP, it should be encapsulated in "native" PHP service objects. So it would mean another level of indirection:

  • we currently do SmartRecord -> SQL
  • I would prefer ServiceObject -> SQL|HTTP|...
  • We can do ServiceObject -> VRS -> SQL|HTTP|...

The question is then whether this additional layer of indirection is useful. I currently don't really see the point, but I'm ready to be convinced. How is this better than implementing service objects directly on top of whatever backend is desired?

As I see it, from the PHP perspective, using a VRS directly would mean binding to a specific implementation, as opposed to using a service interface that can be implemented based on VRS, SQL, or whatever else we come up with.

My main point is: The primary abstraction layer used in an environment should be natural to that environment, and should offer well defined narrow interfaces using the techniques available in that environment (e.g. interface declarations with type hinting in PHP).

From PHP's perspective, a generic VRS service is completely amorphous and the opposite of a well defined, narrow interface. So in PHP code, VRS services should not be used directly, but only via service objects.

I'm also worried about having a global VRS instance. I fear it would work against dependency injection, which we really need to improve testability. We would also need a global top level factory (application scope) for DI, but access to it should be restricted to static entry points, not sprinkled across the code base. Global state can't be totally avoided, but we should at least try to avoid it, not make it best practice.

PS: This reminds me a lot of T95654: RFC: Business Layer Architecture on budget. I think all the same arguments apply to VRS.

It seems to me that having a well defined, supported and easily used library to communicate with RESTBase and similar services is good and actually necessary. The counters here by @Anomie, @Tgr and @daniel I think just boil down to it not being sufficient for the typical applied usage. If you think of VRS as something akin to DatabaseBase then you should be able to imagine domain specific classes that encapsulate using VRS to manipulate a particular REST API in the same way that User or some other domain specific wrapper encapsulates using DatabaseBase to communicate with a particular set of database tables.

@bd808: if we view the proposed VRS service essentially as a PHP interface to RESTbase, I'm all for it. But it should be clear that this is a low level thing, not the preferred way for application logic to interact with services.

@bd808: if we view the proposed VRS service essentially as a PHP interface to RESTbase, I'm all for it. But it should be clear that this is a low level thing, not the preferred way for application logic to interact with services.

Yes, Daniel, we put 20 layers of abstraction on this just for you :-P It will be buried so deep in the call stack the developer will never even need to know that it exists. :-)

I think I'm with @bd808 here. But let me make it concrete: I would like to see the following bits of code in VisualEditor, Flow, and ContentTranslation standardized and moved into core:

https://github.com/wikimedia/mediawiki-extensions-VisualEditor/blob/REL1_26/ApiVisualEditor.php#L35-L117
https://github.com/wikimedia/mediawiki-extensions-ContentTranslation/blob/REL1_26/api/ApiContentTranslationPublish.php#L32-L103
https://github.com/wikimedia/mediawiki-extensions-Flow/blob/REL1_26/includes/Parsoid/Utils.php#L173-L253

There are additional bits of code that *could* be shared (which starts to approach @daniel's proposal), but the above is the minimum. Setting up a VRS service object with well-known names, and providing a mechanism to run queries against it.

Yes, Daniel, we put 20 layers of abstraction on this just for you :-P It will be buried so deep in the call stack the developer will never even need to know that it exists. :-)

Heh, right, I'm happy as long as I don't have to see it ;)

But seriously: the question is whether we need another level of abstraction. I'm not proposing to add a new one (though I's sure like to improve the one we have). VRS means adding another layer (if we are not going to have app logic use it directly).

I'm not proposing to add a new one (though I's sure like to improve the one we have).

Which service access layer are you referring to here?

VRS means adding another layer (if we are not going to have app logic use it directly).

I think it's an example of layering leading to a fairly clean separation of concerns. VRS focuses on sanely exposing the existing REST service interfaces to PHP, in a way that is easy to extend to cover all kinds of services.

Where desired and warranted by popularity, typed abstractions can then be layered on top. These abstractions do not need to deal with issues of parallelism, authentication, encodings or compression. They also don't need to care about which backend is actually used to provide a given service, as this is effectively dependency-injected in the VRS config.

I think I see @mobrovac's point here, but I have to squint, so bear with me :-)

The reason for putting the path component of the URL in the API itself is that someone is going to have to turn the API into a URL structure. Leaving URL structure as the API's problem that the implementation generally doesn't need to worry about lets implementers to do things that might work for the common case, but that don't generalize well. It makes code review for path issues harder. Having the path component in the API pushes those debates upstream to code review time, making it easier to see when bad URLs are being created at the source.

Perhaps directly reviewing the exact URLs being sent to restbase rather than leaving their construction up to the implementation of a ->getPage() method is a point, but I'm not seeing that at all in what @mobrovac wrote.

If you think of VRS as something akin to DatabaseBase then you should be able to imagine domain specific classes that encapsulate using VRS to manipulate a particular REST API in the same way that User or some other domain specific wrapper encapsulates using DatabaseBase to communicate with a particular set of database tables.

This feels like it makes sense.

Although I do note that going via VRS "/local/action/..." to make a FauxRequest into ApiMain to get to an API module seems even more wrong than FauxRequest into ApiMain directly to get to an API module, which is still wrong when there probably should be a 'domain-specific wrapper' that the API module is already using and you could just use directly.

How is this better than implementing service objects directly on top of whatever backend is desired?

This is a good question.

I'm not proposing to add a new one (though I's sure like to improve the one we have).

Which service access layer are you referring to here?

If I Was King (tm), we would be using "service objects" (TitleLookup, RevisionStore, etc) as an abstraction layer to separate application logic from backend services . That would be on the same level on which we now have "smart records" like User or Revision, not on top or beneath (except for B/C wrappers). Adding VRS to the mix adds at least one layer, unless application logic uses VRS directly, which I strongly recommend against.

I'm not opposed to adding that layer, but we should be clear about why and when we want it. And as much as I like abstraction, I have learned to fear mummification (wrappers around wrappers, layers upon layers...)

I think it's an example of layering leading to a fairly clean separation of concerns. VRS focuses on sanely exposing the existing REST service interfaces to PHP, in a way that is easy to extend to cover all kinds of services.

Where desired and warranted by popularity, typed abstractions can then be layered on top. These abstractions do not need to deal with issues of parallelism, authentication, encodings or compression. They also don't need to care about which backend is actually used to provide a given service, as this is effectively dependency-injected in the VRS config.

Using the VRS layer to encapsulate the details of parallelism, authentication, etc when dealing with remote (TCP/HTTP) services sounds good to me. Also using VRS to access services in-process seems like a bad idea, as Anomie points out. That would bring us back to T95654. Application -> Service -> VRS -> API -> Service is not really what we want, I think.

by using the API layout we gain the benefit of both internal and external clients behaving in the exact same manner, which, IMHO, is a big win.

I don't think it is as big as you expect. There seems to be a consensus that even if there is a low-level layer where everything is represented in URLs, it's not how most of the application code will deal with services. So if you look at the internal client code you will still see $storage->getPage( 'Main_Page' ) and if you look at the external client code you will still see storage.getPage( 'Main_Page' ) and what you actually care about is whether those two behave identically, not whether the same URL string gives you the same response, because there is no reason for those two to serialize themselves into the same URL string in the first place. You do not get rid of the complexity which causes inconsistencies, you just redistribute it in the codebase.

The reason for putting the path component of the URL in the API itself is that someone is going to have to turn the API into a URL structure. Leaving URL structure as the API's problem that the implementation generally doesn't need to worry about lets implementers to do things that might work for the common case, but that don't generalize well. It makes code review for path issues harder.

Basically what you are saying is that (1) we should forbid people to add new service-type classes to MediaWiki without exposing them via a web endpoint, (2) the way to do that is to force people adding new service classes to serialize everything to an URL and back.

(1) is not unreasonable, although it is IMO currently unrealistic - we don't even have a concept of "service classes" inside MediaWiki, must business logic is buried in frontend classes (including our current API classes, which are really just frontend classes for a JSON/etc fronted but mix in some of the business logic). I don't see what (2) would add though. Instead of making it a CR requirement that people implement service logic via URLs, you can just require them to also implement the URLs. It's just as easy and it fails more gracefully (when people are in a rush and shirk that responsibility, it's still better if they implement a service without URI mapping than if they don't implement a service at all and bury the logic in some frontend class). And you get URL mapping logic that is better localized in the codebase and so easier to review.

I also disagree that if something is useful in PHP but difficult to do via an URL scheme then not doing it is necessarily the right answer. That's like making people equal by taking away most of the money so everyone is poor.
(See below for an example.)

If you think of VRS as something akin to DatabaseBase then you should be able to imagine domain specific classes that encapsulate using VRS to manipulate a particular REST API in the same way that User or some other domain specific wrapper encapsulates using DatabaseBase to communicate with a particular set of database tables.

DatabaseBase is meant to encapsulate common DB-related functionality for classes which need to interact with the database. A REST adapter for similarly encapsulating common REST concerns certainly makes sense, but we are talking here about PHP classes calling PHP APIs and never actually making a HTTP request. That not only limits PHP APIs which now suddenly have to map everything down to a string and back and cannot provide a richer interface internally than externally even when it would make sense (consider how continuation would work with this, for example), it also limits the REST adapter which cannot really assume standard REST semantics such as caching behavior since the actual implementation might not be web-based at all.

  • We can do ServiceObject -> VRS -> SQL|HTTP|...

But we really shouldn't. Again, consider how something like continuation would work. If you have PHP interface first, REST on top of that, the API would return an iterator which internally does whatever it wants (for example if we find that maintaining a single DB connection is more effective than re-running the query with increasing offsets, it can do that), and the REST formatter cuts the iteration off at some predefined limit, requests a continuation token from the iterator and sends it and the resultset to the client. If you have REST first, PHP on top of it, then it becomes impossible to do that - there is no way to transform something like a DB connection to a string and back, so now the API is forced to do its querying in small batches due to an external requirement that comes from formatting, whether that's actually the best way to do it or not.

A couple of quick notes before the discussion:

  • Coverage and barrier to entry: It's not clear to me whether there is a serious proposal to wrap *every* service request in a complete service wrapper object. If so, then this would introduce a fairly large coding overhead for simple service interactions. VRS services on the other hand can be very simple, with the simplest form being a regular HTTP request. Intermediate wrappers can just prefix paths with a host name & port from the config, and it is possible to enrich path-based handlers gradually.
  • Named methods vs. reuse of path structures from external APIs: VRS aims for very low mental overhead by reusing existing REST API layouts, while separate service objects introduce a second API, but get additional type safety in exchange.
  • Reuse via composition vs. inheritance: The separate service model seems to lean towards reusing code via inheritance, while VRS is more tailored towards composition.
  • In-process calls: VRS can be used for in-process requests, but might not add much value there, especially if parallelism isn't needed / supported. Local service wrappers like FauxRequest already cover the one-request-at-a-time pattern well, and I don't see a need to replace them with VRS.
  • Parallelism: I'm not sure how requests to multiple service implementations would work in parallel without an underlying shared request processing infrastructure like the one exposed by VRS.
  • Coverage and barrier to entry: It's not clear to me whether there is a serious proposal to wrap *every* service request in a complete service wrapper object. If so, then this would introduce a fairly large coding overhead for simple service interactions.

A service wrapper object can be very simple. 20 lines. And yes, I personally would prefer for business logic to always use service objects for service interaction.

  • Named methods vs. reuse of path structures from external APIs: VRS aims for very low mental overhead by reusing existing REST API layouts, while separate service objects introduce a second API, but get additional type safety in exchange.

I think the mental overhead depends on the mental model. I find it a lot easier to think in terms of services and data records than in terms of pathes and resources. Maybe I'm just old.

  • Reuse via composition vs. inheritance: The separate service model seems to lean towards reusing code via inheritance

Proper service objects should rarely, if ever, share code through inheritance. Service objects using composition is the heart of DI. Code reuse via subclassing as a bad idea, but VRS isn't the only way to avoid it, and a DI based infrastructure doesn't lean towards it.

  • In-process calls: VRS can be used for in-process requests, but might not add much value there, especially if parallelism isn't needed / supported.

I think it would add a lot of overhead and potential for error, instead.
But as long as we are only talking about VRS as a wrapper around HTTP services, fine.

  • Parallelism: I'm not sure how requests to multiple service implementations would work in parallel without an underlying shared request processing infrastructure like the one exposed by VRS.

I agree: in places where parallelism is needed, we probably want something like VRS - at least in PHP, where parallelism means talking to separate processes via streams.

In the IRC discussion, nobody really liked the idea of using RequestContext for making VRS accessible. Alternatives:

In the IRC discussion...

That was E68: RFC Meeting on #wikimedia-office IRC channel (2015-09-30 UTC), from meeting minutes:

  • [specific work is] T112553#1689829 (ori, 21:15:49)
  • consensus is that implementing this as a singleton is preferable over RequestContext (gwicke, 21:31:48)
  • AGREED: Integrate the Virtual Rest Service into core (TimStarling, 21:37:11)

Per the IRC discussion, the above debate seems to have largely been a misunderstanding, and the scope of the RfC is to provide an abstraction layer for external REST services, not for in-process API calls (although mapping to in-process handlers could be useful for purposes internal to the VRS class such as unit testing). Mapping internal PHP APIs might be useful as a means of parallelization via curl_multi but not expected to be used otherwise.

@GWicke I have updated the proposal for T384. There is now an example for injecting a service in a hook handler at https://www.mediawiki.org/wiki/Requests_for_comment/Dependency_injection#Hook_Handler_Injection

@daniel, thanks! I might be missing something, but to me it looks like the service registry is still request-global? Isn't that roughly the same as using a singleton, except one level up?

@GWicke Yes you are right, it cannot be entirely avoided. I still thinks it's worth minimizing the impact. I have replied to you in more detail at T384#1696540, since I think others will have the same question in the context of the DI RFC.

Krinkle claimed this task.
Krinkle moved this task from Untriaged to Implemented on the TechCom-RFC (TechCom-RFC-Closed) board.