REST: introduce audience designations (proposal)
Open, Needs Triage, Public

Description

This task proposes introducing “audience designations” for REST API modules.

The idea is that each REST module should specify what audience(s) the module is intended for. A module’s audience designation can govern access control, routing, stability guarantees, and applicable terms of service.

The audience designation would become part of API URLs, to make it obvious what rules apply to that API. This way the audience designation is easily accessible across all layers, from client side code through the CDN layer, the REST framework and the actual handler code. On the other hand, the designation can also just be treated as part of the API module name. Each layer can make use of the designation, but doesn’t have to be aware of it.
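As a rough illustration of how a layer could read the designation off the URL, here is a minimal sketch. It assumes an `/api/` root and a `designation:module` first segment (the shape used in the discussion below, e.g. "/api/private:jobqueue/run"), and that a segment without a prefix means "public". The function and type names are hypothetical, not part of the REST framework.

```python
from typing import NamedTuple

DEFAULT_DESIGNATION = "public"  # assumption: no prefix means a public module


class ModulePath(NamedTuple):
    designation: str
    module: str
    rest: str


def split_module_path(path: str, api_root: str = "/api/") -> ModulePath:
    """Split an API path into designation, module name and remaining path."""
    if not path.startswith(api_root):
        raise ValueError(f"not an API path: {path!r}")
    # The first segment after the root is the (possibly prefixed) module name.
    segment, _, rest = path[len(api_root):].partition("/")
    designation, sep, module = segment.partition(":")
    if not sep:
        # No colon: the whole segment is the module name.
        designation, module = DEFAULT_DESIGNATION, segment
    return ModulePath(designation, module, rest)
```

A layer that does not care about audiences can skip this step and treat the whole segment ("private:jobqueue") as an opaque module name.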

Some examples of potential REST URLs including audience designations:

White paper with more details: https://docs.google.com/document/d/1yarF_xQkFzQJUOvP3rMooFTFL6tKgK3C-Bf8zV00QR4/edit

See also:

Event Timeline

Copy of the comment that Bill wrote about audience designations on T365752:

Read through the google doc, have some thoughts. Most of them are critical. That's not because I think module designations are a bad idea, just to poke at it to see how solid it is. Some of them are questions that you've probably already thought through, so "that's not a problem" is a great response. My comments below are mostly based on the linked google doc (which talks about module designations in a general sense), not the task description (which is more focused on private modules).

First off, this is a new idea to me. Are you familiar with other APIs/API infrastructures that use a similar concept? If so, I'd like to take a look to compare and contrast, and see how this looks for them in practice. If not, we should pound on this pretty thoroughly before implementing it, as this concept would become pretty central to the REST API, and breaking new ground can come with unexpected consequences.

Part of the point of having a REST API is to make access to our data feel "normal" for people unfamiliar with our tech stack. Module designations don't feel "normal" to me. That doesn't mean they're bad, or that we shouldn't do this. As a caller, I'd probably stop and try to look up what they were before being comfortable calling the API. Maybe that's a good thing. And maybe most things that the average caller accesses will be public and this won't even affect them. But it is probably worth making sure we're happy with adding something unusual to our conventions.

The prefixes seem powerful, but also combine multiple ideas, some relevant to callers and some not. For example, callers care that endpoints are performant, but they don't really care what happens on the server side to achieve that. Using a designation that exists mostly to guide caching/routing behavior seems like exposing details the caller shouldn't have to care about. A specific example is "app" vs "internal", although I realize another difference is that versioning is optional in one of those. But if we consider (per the document) use of "app" by something other than our apps to be a TOS violation, and if we distinguish this (again per the document) by user agent, then why wouldn't we just make routing/caching behavior dependent on the user agent? Does the designation gain us anything meaningful?

I see these quotes:

"Internal module names do not need to contain a version number."
"Private module names do not need to contain a version number."

Is this really what we want? I see this says "do not need to" and not "are prohibited from having". In practice, promising that we'll always update callers in "hours" feels optimistic. As all this is just part of the path, I suppose we could always introduce version numbers after the fact. But going from "internal:rcfeed" to "internal:rcfeed.v1" seems a little awkward. I'm not sure having an eternal "v1" that never changes, which is what we've done for both RESTBase and (so far) the MW REST API, is any better, but it feels like holding a spot for the version might be preferred. Maybe I'm just used to seeing versions, but even if we don't require these module names to have versions, I'd be pretty careful about omitting them.

I'm unsure whether I prefer the term "beta" or the term "unstable" for that designation. "beta" feels to me like we're implying that the endpoint is on its way to production, even if we're not promising it. That's pretty inconsequential, as the actual meaning is the same either way, but it is something I thought as I read the doc, so mentioning it here.

From the description of the "apps" designation, it sounds like the same endpoint might be exposed under "apps", "internal", and "public"? I guess with the way that designations are assigned in module definitions, this isn't syntactically burdensome. But might exposing the same endpoint under multiple paths lead to undesirable fragmentation of things like metrics? Or even caching, if they are not in practice used to route to separate caches? Maybe this has already been vetted by someone more familiar than me with our caching implementation?

"enterprise": the doc says this currently uses a separate domain. Is anyone unhappy about that? Would exposing an enterprise API from MW mean integrating Enterprise's authentication in MW? What would that involve? Is that better than proxying enterprise endpoints through enterprise's existing domain?

This has the advantage that it'd be pretty hard for callers to not realize they were (for example) calling an "internal" URL. However, would it make it harder for them to see what endpoints are available, and how to construct calls to them? As I understand how the module definition files would work: modules could optionally specify one or more designations (and would default to public if nothing is specified). If anything is specified, the designation would be added by the infrastructure to the path. Callers inspecting a module definition file to determine available paths would therefore need to understand how to assemble the full prefix, including designation, in order to successfully call an endpoint. Any resulting confusion might be mitigated by making OpenAPI specs available, which presumably would list the full path including designation, module name, and version. So implementing module designations might raise the priority for us to be able to generate and publish OpenAPI specs, especially via an interactive sandbox. That'd make it less necessary for prospective callers to inspect and understand the endpoint code just to know how to call things.
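To make the "assemble the full prefix" step concrete, here is a minimal sketch of what a caller (or a spec generator) would have to do. The definition keys (`module`, `version`, `designations`), the `/api` root, and the `designation:name.version` layout are assumptions for illustration, based on the "internal:rcfeed.v1" shape mentioned above; the actual module definition format may differ.

```python
from typing import Dict, List


def module_prefixes(definition: Dict, api_root: str = "/api") -> List[str]:
    """Return every path prefix under which a module would be reachable."""
    name = definition["module"]
    version = definition.get("version")
    # Assumption: a module with no designations listed is public.
    designations = definition.get("designations") or ["public"]

    prefixes = []
    for designation in designations:
        # Assumption: public modules carry no visible prefix.
        label = name if designation == "public" else f"{designation}:{name}"
        if version:
            label = f"{label}.{version}"
        prefixes.append(f"{api_root}/{label}")
    return prefixes
```

For example, `module_prefixes({"module": "rcfeed", "version": "v1", "designations": ["internal"]})` gives `["/api/internal:rcfeed.v1"]`. A published OpenAPI spec would spare callers from doing this by hand.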

Use of the colon character as a separator seems technically fine. The URL specification (https://datatracker.ietf.org/doc/html/rfc3986) defines the colon character as the "scheme component delimiter" (section 1.2.3). Colon is listed as a reserved character of type "gen-delim" in section 2.2. Per section 3.3, the first path segment cannot contain a colon character, but this won't be our first path segment, so we're good there. However, use of the colon character might cause minor annoyance in some cases. I can imagine developers colloquially referring to urls as something like "private:jobqueue/run" rather than "/api/private:jobqueue/run". Some editors may try to interpret that first bit as a scheme (ex. "mailto:"), find it invalid, and give unexpected results. This seems like a minor concern, and I don't have a better alternative. But as long as we're still at the discussion phase it seemed worth at least mentioning.
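The scheme confusion is easy to reproduce with a generic URL parser. The sketch below uses Python's standard `urllib.parse.urlsplit` as a stand-in for "some editor or tool"; the helper name is hypothetical, and real editors may of course behave differently.

```python
from urllib.parse import urlsplit


def looks_like_scheme(text: str) -> bool:
    """True if a generic URL parser reads the text as starting with a scheme."""
    return urlsplit(text).scheme != ""


# Colloquial form: the designation is parsed as a URI scheme.
colloquial = urlsplit("private:jobqueue/run")

# Full form: the colon sits in a later path segment, so no scheme is found.
full = urlsplit("/api/private:jobqueue/run")
```

Here `colloquial.scheme` is `"private"` with the path reduced to `"jobqueue/run"`, while `full.scheme` is empty and `full.path` is the whole string.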

First off, this is a new idea to me. Are you familiar with other APIs/API infrastructures that use a similar concept? If so, I'd like to take a look to compare and contrast, and see how this looks for them in practice. If not, we should pound on this pretty thoroughly before implementing it, as this concept would become pretty central to the REST API, and breaking new ground can come with unexpected consequences.

As far as I know, the idea of including an audience designation in the endpoint path is novel. However, the need to distinguish between APIs for different audiences is common. It is often addressed in an ad-hoc fashion, but sometimes also explicitly covered by usage guides and design documents. I have updated the proposal document with examples from other sites and a discussion of how they relate to us.

Part of the point of having a REST API is to make access to our data feel "normal" for people unfamiliar with our tech stack. Module designations don't feel "normal" to me. That doesn't mean they're bad, or that we shouldn't do this. As a caller, I'd probably stop and try to look up what they were before being comfortable calling the API. Maybe that's a good thing. And maybe most things that the average caller accesses will be public and this won't even affect them. But it is probably worth making sure we're happy with adding something unusual to our conventions.

Yes, indeed - all of that: External callers would generally use public APIs, so they won't see audience designators. When they do, they should stop and investigate. And making sure we really want that is the point of that document and this ticket :)

The prefixes seem powerful, but also combine multiple ideas, some relevant to callers and some not. For example, callers care that endpoints are performant, but they don't really care what happens on the server side to achieve that. Using a designation that exists mostly to guide caching/routing behavior seems like exposing details the caller shouldn't have to care about. A specific example is "app" vs "internal", although I realize another difference is that versioning is optional in one of those.

My thinking is that using the "app" prefix says "I'm an app, using APIs intended for use by apps". What we do with that on the server side is up to us, but it's useful information to have. Similar with "internal": it says "this is client side code from the same repo calling server side code". Curious people observing such calls would know immediately that they can't rely on these APIs.

But if we consider (per the document) use of "app" by something other than our apps to be a TOS violation, and if we distinguish this (again per the document) by user agent, then why wouldn't we just make routing/caching behavior dependent on the user agent? Does the designation gain us anything meaningful?

For the "app" and "internal" signal, we could absolutely use headers. It's just less obvious to the casual observer, and less convenient for manual debugging. I can send you a link to an internal API and you can run it in your browser. If we required a special header, I'd have to send you a complex curl command. I added a section to the document discussing this.
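To illustrate the debugging difference, here is a small sketch that builds both forms for the same call. The host, the `/api/` root, and the `X-Api-Audience` header name are all made up for the example; nothing in the proposal defines such a header.

```python
import shlex


def as_link(host: str, designation: str, module: str, rest: str) -> str:
    """Designation in the path: a plain link that works in a browser."""
    return f"https://{host}/api/{designation}:{module}/{rest}"


def as_curl(host: str, designation: str, module: str, rest: str,
            header: str = "X-Api-Audience") -> str:
    """Designation in a (hypothetical) header: needs a full curl command."""
    url = f"https://{host}/api/{module}/{rest}"
    parts = ["curl", "-H", shlex.quote(f"{header}: {designation}"), shlex.quote(url)]
    return " ".join(parts)
```

`as_link("example.org", "internal", "rcfeed", "recent")` yields `https://example.org/api/internal:rcfeed/recent`, whereas the header variant yields `curl -H 'X-Api-Audience: internal' https://example.org/api/rcfeed/recent`.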

I see these quotes:

"Internal module names do not need to contain a version number."
"Private module names do not need to contain a version number."

Is this really what we want? I see this says "do not need to" and not "are prohibited from having". In practice, promising that we'll always update callers in "hours" feels optimistic.

My point was that such endpoints don't need to provide long term stability, since we are our own customers. We are free to make breaking changes in hours if we want to - or not, if we don't. I have updated the section to clarify this:

Internal module names do not need to contain a version number to provide stability for external callers. All clients are known, and change management is focussed on the deployment of new versions of MediaWiki; grace periods for backwards compatibility may be as short as a couple of hours, depending on the needs of the callers.

As all this is just part of the path, I suppose we could always introduce version numbers after the fact. But going from "internal:rcfeed" to "internal:rcfeed.v1" seems a little awkward.

I don't have strong feelings about versioning internal and private endpoints. I'd probably err on the side of caution and include a version number. My point is that public modules REALLY NEED a version, while internal and private modules could or maybe should have a version.
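The "must" versus "should" distinction could be expressed as a simple check on module names. This is a sketch only: the segment grammar (`designation:name.vN`) is inferred from the "internal:rcfeed.v1" example above, and treating every designation other than "internal" and "private" as requiring a version is an assumption.

```python
import re

# Assumed segment grammar: [designation ":"] name ["." version]
SEGMENT = re.compile(
    r"^(?:(?P<designation>[a-z]+):)?"
    r"(?P<name>[a-z][a-z0-9-]*)"
    r"(?:\.(?P<version>v\d+))?$"
)

# Designations for which a version is recommended but not required.
VERSION_OPTIONAL = {"internal", "private"}


def check_version(segment: str) -> str:
    """Return "ok", "warning" or "error" for a module path segment."""
    match = SEGMENT.match(segment)
    if match is None:
        raise ValueError(f"not a module segment: {segment!r}")
    if match.group("version"):
        return "ok"
    # A missing designation prefix is taken to mean a public module.
    designation = match.group("designation") or "public"
    return "warning" if designation in VERSION_OPTIONAL else "error"
```

With this, "rcfeed" is an error, "internal:rcfeed" is only a warning, and "internal:rcfeed.v1" passes.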

I'm unsure whether I prefer the term "beta" or the term "unstable" for that designation. "beta" feels to me like we're implying that the endpoint is on its way to production, even if we're not promising it. That's pretty inconsequential, as the actual meaning is the same either way, but it is something I thought as I read the doc, so mentioning it here.

I don't care as long as we don't end up with some modules using one and some the other :)

From the description of the "apps" designation, it sounds like the same endpoint might be exposed under "apps", "internal", and "public"? [...] But might exposing the same endpoint under multiple paths lead to undesirable fragmentation of things like metrics? Or even caching, if they are not in practice used to route to separate caches?

My thinking is that we will expose the same module under multiple designations `if we want to split the cache and metrics`. If we don't want that, we don't.

This touches on the question of configurable designations, which I haven't yet discussed in the document. I am thinking that a module's default designation should be part of the module definition, and additional designations can be added to it using configuration. So the decision whether or not requests to the module should be "split" would be per site. But that raises the question of how the client code should know.

There's food for thought here. Thank you for calling it out.
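One way to picture the default-plus-configuration idea is the sketch below. The key names (`module`, `designation`) and the shape of the per-site configuration (a mapping from module name to extra designations) are hypothetical; the document does not yet define them.

```python
from typing import Dict, List


def effective_designations(module_def: Dict, site_config: Dict[str, List[str]]) -> List[str]:
    """Combine a module's default designation with per-site additions."""
    # Assumption: a module that declares nothing is public.
    result = [module_def.get("designation", "public")]
    for extra in site_config.get(module_def["module"], []):
        if extra not in result:  # avoid exposing the same designation twice
            result.append(extra)
    return result
```

A site that wants separate caching and metrics for app traffic would list "app" for the module in its configuration; a site that does not would leave the module out, and only the default designation would be exposed. The open question of how client code learns which designations a given site exposes is not addressed by this sketch.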

"enterprise": the doc says this currently uses a separate domain. Is anyone unhappy about that? Would exposing an enterprise API from MW mean integrating Enterprise's authentication in MW? What would that involve? Is that better than proxying enterprise endpoints through enterprise's existing domain?

I don't know, and I don't know if we'd want to. My point is that "enterprise" fits the idea of an "audience designation", and we could use this mechanism if we wanted to have "enterprise modules" handled by the PHP REST framework.

This has the advantage that it'd be pretty hard for callers to not realize they were (for example) calling an "internal" url. However, would it make it harder for them to see what endpoints are available, and how to construct calls to them? [...] Any resulting confusion might be mitigated by making OpenAPI specs available, which presumably would list the full path including designation, module name, and version.

Yes, that's the idea.

So implementing module designations might raise the priority for us to be able to generate and publish OpenAPI specs, especially via an interactive sandbox. That'd make it less necessary for prospective callers to inspect and understand the endpoint code just to know how to call things.

Yes, indeed.

However, use of the colon character might cause minor annoyance in some cases. I can imagine developers colloquially referring to urls as something like "private:jobqueue/run" rather than "/api/private:jobqueue/run". Some editors may try to interpret that first bit as a scheme (ex. "mailto:"), find it invalid, and give unexpected results. This seems like a minor concern, and I don't have a better alternative. But as long as we're still at the discussion phase it seemed worth at least mentioning.

I picked ":" because it looks tidy and the concept of "designations" feels similar to MW namespaces. Other possibilities would be "$", "!", and "*", I suppose. We could also use "@", but that feels backwards, and I'd rather have the designation be a prefix than a suffix.