
Proxying new services through RESTBase
Closed, Resolved, Public

Description

In T95229 there has been some discussion about whether we should put services behind RESTBase. A patch is on the way that lets services be easily proxied and cached by RESTBase. This would provide:

  • easy API discovery: all APIs (RESTBase + other services) would be available under /{domain}/v1/{module_name}
  • performance gains, as all of the APIs would be served from the same domain
  • automatic caching/storage provided by RESTBase, so services would not need to regenerate already-generated content unless necessary
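To make the caching point concrete, here is a minimal Python sketch of a proxy that exposes every registered module under /{domain}/v1/{module_name} and only calls a backend when the content is not already stored. The class, the `refresh` flag and the in-memory dict are hypothetical stand-ins; RESTBase's real storage and invalidation are considerably more involved.

```python
from typing import Callable, Dict, Tuple

# Hypothetical backend signature: (domain, sub_path) -> rendered content.
Backend = Callable[[str, str], str]


class CachingProxy:
    """Toy storage-backed proxy exposing every module under /{domain}/v1/{module}/."""

    def __init__(self) -> None:
        self.modules: Dict[str, Backend] = {}
        # In-memory stand-in for RESTBase's persistent storage.
        self.store: Dict[Tuple[str, str, str], str] = {}
        self.backend_calls = 0

    def register(self, module: str, backend: Backend) -> None:
        """Expose a service as /{domain}/v1/{module}/..."""
        self.modules[module] = backend

    def get(self, url: str, refresh: bool = False) -> str:
        """Serve from storage; call the backend only on a miss or a forced refresh."""
        parts = url.lstrip("/").split("/", 3)
        if len(parts) < 3 or parts[1] != "v1":
            raise ValueError(f"not an API URL: {url}")
        domain, module = parts[0], parts[2]
        sub_path = "/" + parts[3] if len(parts) > 3 else "/"
        if module not in self.modules:
            raise KeyError(f"unknown module: {module}")
        key = (domain, module, sub_path)
        if refresh or key not in self.store:
            self.backend_calls += 1
            self.store[key] = self.modules[module](domain, sub_path)
        return self.store[key]
```

Keying the store on (domain, module, sub-path) is what would let several services share one storage layer without coordinating with each other; a real deployment would also need invalidation when the underlying content changes.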

There have been various opinions and concerns regarding this, so let's discuss it here.

Event Timeline

mobrovac set the priority of this task to Needs Triage.
mobrovac updated the task description.
Dzahn triaged this task as Medium priority. (Apr 29 2015, 1:48 AM)
Dzahn subscribed.

So, ops is experiencing a lot of confusion over our service deployment plans.

There's a parsoidcache varnish instance. It was initially deployed for parsoid only; it does a lot of very non-standard things, and parsoid interacts with it in very non-standard ways (looping back into it). It now has five different service endpoints running through it, because we didn't have a better plan at the time:

  • parsoid
  • citoid
  • graphoid
  • cxserver
  • RESTBase

Each of those service entry points through parsoidcache gets its own hostname, and each service can do whatever it likes with URL conventions under that hostname.

RESTBase now also has a separate, official public entry point through our standard text-lb varnish clusters (with global endpoints, standard tier flow, etc.), which lives at e.g. en.wikipedia.org:/api/rest_v1/. All requests for that path on all text-cluster domains are translated into requests for the restbase backend, using a transformed URL such as restbase.svc.eqiad.wmnet:/en.wikipedia.org/v1/. New services deployed within the RB framework would all be covered by that mechanism.

There was also some agreement in related past conversations that, while it may become commonplace for new services to deploy via RB, there will always be some new services that exist alongside RB rather than behind it. These should use the same basic mechanism and /api/ namespace, with separate backend definitions and subpaths, and should all conform to the general schema /api/$apiname/$x -> backend_$apiname:/$x. (The transformation of $x there seems ill-defined, though: even RB itself requires a custom transform that includes the primary hostname and removes the rest_ prefix from the API name.) If nothing else, this would be the pathway for deploying RB's future replacement at /api/rest_v2/ or similar, but odds are decent there will be some other non-RB services as well.
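A minimal Python sketch of that schema, assuming hypothetical backend names of the form backend_<apiname>: the generic branch passes $x through untouched, while the rest_v1 branch has to reproduce RESTBase's custom transform (prepend the Host, turn rest_v1 into v1). It illustrates the proposal only and is not the actual Varnish configuration.

```python
from typing import Optional, Tuple


def route(host: str, url: str) -> Optional[Tuple[str, str]]:
    """Map a public text-cluster request to (backend, backend_path).

    Returns None for requests outside the /api/<name>/ namespace.
    """
    prefix = "/api/"
    if not url.startswith(prefix):
        return None  # not an API request; normal text-cluster handling applies
    name, sep, rest = url[len(prefix):].partition("/")
    if not name or not sep:
        return None
    if name == "rest_v1":
        # RESTBase special case: prepend the Host and rename rest_v1 to v1.
        return "restbase_backend", f"/{host}/v1/{rest}"
    # Generic case: /api/$apiname/$x -> backend_$apiname:/$x, with $x untouched.
    return f"backend_{name}", f"/{rest}"
```

For example, a request for /api/foo/bar on en.wikipedia.org would go to backend_foo as /bar, while /api/rest_v1/page/html/Foo becomes /en.wikipedia.org/v1/page/html/Foo on restbase_backend. The second branch is the kind of per-service exception the schema is meant to avoid.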

So here are the outstanding questions about all related things:

  1. Are all of parsoid/citoid/graphoid/cxserver/restbase going to be removed from the parsoidcache varnish entry point in the near future and moved to RB?
  2. For those that can/are moving to RB, has this been coordinated with whoever's responsible for each? What are their timelines for moving?
  3. If some are not moving to RB anytime soon, can we work on moving them to a standard schema via text-lb as outlined above?
  4. Given that RB requires a custom transformation of the URL (as opposed to, say, interpreting the host+URL on its own, which IMHO it probably should be doing - there's no reason we can't have RB's code figure out the transform on its own here...) - are we standardizing that transform for non-RB services as well? We don't want to deploy new random transforms for each non-RB service. I'd rather we deploy no transforms for them, and simply hand them the hostname:/path of the request for all requests matching their /api/$foo/
  5. What about remaining services in the pipeline that are not yet deployed? Alex mentioned mobile-service-app and hierator to me. Are they initially deploying (a) through the RB entry point, (b) separately alongside RB on the text clusters as outlined above, or (c) by adding to the cruft on the supposedly-dying-soon parsoidcache (I hope not!)?
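The alternative raised in question 4, where the cache layer forwards Host and URL untouched and each backend decodes them itself, could look roughly like the Python sketch below. The ApiRequest type and both function names are hypothetical; restbase_internal_path assumes the public API name is always rest_ plus a version, matching the /api/rest_v1/ to /en.wikipedia.org/v1/ mapping described above, and decode assumes a plain DNS hostname in Host.

```python
from typing import NamedTuple, Optional


class ApiRequest(NamedTuple):
    domain: str  # public hostname, e.g. en.wikipedia.org
    api: str     # API name taken from /api/<api>/
    path: str    # remainder of the URL, always starting with "/"


def decode(host: str, url: str) -> Optional[ApiRequest]:
    """Recover domain, API name and sub-path from an untouched public request.

    Returns None for anything outside the /api/<name>/ namespace.
    """
    prefix = "/api/"
    if not url.startswith(prefix):
        return None
    api, sep, rest = url[len(prefix):].partition("/")
    if not api or not sep:
        return None
    # Drop any :port suffix and lower-case the hostname (hostnames are case-insensitive).
    domain = host.split(":", 1)[0].lower()
    return ApiRequest(domain, api, "/" + rest)


def restbase_internal_path(req: ApiRequest) -> str:
    """Derive RESTBase's /{domain}/{version}/... layout from a decoded request.

    Assumes the public API name is "rest_" + version (e.g. rest_v1 -> v1).
    """
    if not req.api.startswith("rest_"):
        raise ValueError(f"not a RESTBase API: {req.api}")
    version = req.api[len("rest_"):]
    return f"/{req.domain}/{version}{req.path}"
```

With this split, adding a new /api/$foo/ service would need no Varnish-side knowledge of its URL layout; the trade-off is that every backend has to accept the public path form.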
BBlack set Security to None.
BBlack added subscribers: faidon, mark.

@BBlack, some answers below:

So here are the outstanding questions about all related things:

  1. Are all of parsoid/citoid/graphoid/cxserver/restbase going to be removed from the parsoidcache varnish entry point in the near future and moved to RB?

Define 'near future' ;) Of these, all but cxserver are definitely scheduled to be moved. We should perhaps set up some redirects for the old locations, but since that's a low-traffic courtesy, we can probably figure out something that doesn't complicate text-lb. I haven't spoken with the cxserver folks yet about moving/documenting their API, but intend to do so soon.

  2. For those that can/are moving to RB, has this been coordinated with whoever's responsible for each? What are their timelines for moving?

Yes, and it depends. Several of the above have already moved in the 'primary access through RB' sense, but their old entry points haven't been removed yet. Apart from CXServer, I think removal by the end of June or July should be doable. I wouldn't be surprised if CXServer were doable by then as well (as it's only used internally), but I'd like to talk to the CXServer folks first.

  3. If some are not moving to RB anytime soon, can we work on moving them to a standard schema via text-lb as outlined above?

Sure, but let's see if we can integrate the API first.

  5. What about remaining services in the pipeline that are not yet deployed? Alex mentioned mobile-service-app and hierator to me. Are they initially deploying (a) through the RB entry point,

We plan on (a).

  4. Given that RB requires a custom transformation of the URL (as opposed to, say, interpreting the host+URL on its own, which IMHO it probably should be doing - there's no reason we can't have RB's code figure out the transform on its own here...) - are we standardizing that transform for non-RB services as well? We don't want to deploy new random transforms for each non-RB service. I'd rather we deploy no transforms for them, and simply hand them the hostname:/path of the request for all requests matching their /api/$foo/

To elaborate on this a bit: in the restbase case, the transform I'm talking about is the "set req.url =" line in the fragment below:

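```vcl
if (req.url ~ "^/api/rest_v1/") {
        # Host en.wikipedia.org + /api/rest_v1/page/... becomes /en.wikipedia.org/v1/page/...
        set req.url = "/" + req.http.host + regsub(req.url, "^/api/rest_v1/", "/v1/");
        # Varnish 3 syntax: send the rewritten request to the RESTBase backend
        set req.backend = restbase_backend;
}
```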

If it weren't for that line, we really could template the deployment of any service deployed the way restbase is now (service foo owns the namespace /api/foo/, mapped to backend foo). That's difficult if each service involves crafting a custom transformation of the publicly visible Host+URL into a different URL for the backend, and there's no fundamental reason the backend can't decipher this on its own.
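Without the per-service URL rewrite, every stanza has the same shape, so it could be generated from a list of service names. The Python generator below is a hypothetical illustration (not what ops actually uses) that emits Varnish 3-style VCL in the style of the restbase fragment above; the <name>_backend naming is an assumption, and restbase itself would not fit it without renaming, since its public API name is rest_v1.

```python
import re
from typing import Iterable

# One Varnish 3-style stanza per service; no URL rewrite, so Host and URL reach the backend untouched.
STANZA = """if (req.url ~ "^/api/{name}/") {{
        set req.backend = {name}_backend;
}}"""

# Names end up in a VCL regex and a VCL identifier, so restrict them to safe characters.
SAFE_NAME = re.compile(r"[a-z][a-z0-9_]*")


def render_vcl(services: Iterable[str]) -> str:
    """Return VCL routing /api/<name>/ to <name>_backend for each service."""
    stanzas = []
    for name in services:
        if not SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsafe service name: {name!r}")
        stanzas.append(STANZA.format(name=name))
    return "\n".join(stanzas) + "\n"
```

For instance, render_vcl(["foo"]) yields a single stanza matching ^/api/foo/ and setting req.backend = foo_backend. The trailing slash in each regex keeps /api/foo/ from also matching /api/foobar/.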

Clearly at least some new services are being deployed as RB-based services, and some legacy services have been converted (though I think a few are targeted but not yet complete, such as cxserver and citoid in T133001?). There are also some new services that aren't RB-based (think WDQS), which we always expected to be at least a minority case.

Is there anything to actually do here in this ticket? It doesn't seem to have much purpose except discussion, and the discussion stopped over a year ago...

GWicke claimed this task.

Yeah, there isn't much useful life in this one left. Closing.