
Investigate issues with domain overlap
Closed, Resolved (Public)

Description

@apaskulin asked me to comment on the virtual host name for the API documentation portal.

Our current plan is to use "api.wikimedia.org". This is the same virtual host name as we plan to use for the API Gateway itself.

Other major platforms don't do this; they use another virtual hostname for their portal, and "api" is just for APIs.

My concern is that they know something we don't, and that by launching the portal and gateway on the same virtual host, we're going to make things harder for ourselves and for API developers.

I'm really reluctant to use api.w.o, the same name that we're using for the API gateway, for the following reasons:

  • Part of the value is that the API gateway host only has API endpoints. It's conceptually confusing to add a wiki in. We're taking a big step away from having API endpoints be "part of" the wikis by creating this gateway, and sticking a wiki at its root would make that less clear.
  • We'll have to take care that there's no overlap between the documentation page namespace and the API URLs. I think typically our wikis have everything executable in /w/ and wiki pages in /wiki/, but it's worth examining. And it means we couldn't use any of those as namespaces for APIs.
  • We'll need path-based routing, so that some HTTP requests for api.wikimedia.org go to MediaWiki, and others go to Envoy. This has been a source of irritation before, and it would be good to avoid repeating the mistake.
  • Cookies. We'll need cookies to maintain the login state on the Portal, but most clients would share those cookies when making API calls, too, since they'd have the same hostname. I think we'd throw them away for API calls (OAuth > Cookies), and there may be a way to limit them to just certain paths on the server, but it's still a bit of a muddle.
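
The namespace concern in the second bullet can be sketched as a simple prefix check. This is only an illustration, not part of any actual gateway configuration: the reserved prefixes are assumed from the /w/ and /wiki/ convention mentioned above, and `RESERVED_PORTAL_PREFIXES` and `collides_with_portal` are hypothetical names.

```python
# Hypothetical sketch: path prefixes a MediaWiki-based portal would occupy
# if it shared a hostname with the API gateway. Assumed from the /w/ and
# /wiki/ convention; the real list would need to be examined.
RESERVED_PORTAL_PREFIXES = ("/w/", "/wiki/")


def collides_with_portal(api_namespace: str) -> bool:
    """Return True if a proposed top-level API namespace overlaps a portal prefix."""
    # Normalize to a leading and trailing slash so "wiki" and "/wiki/" compare equal.
    normalized = "/" + api_namespace.strip("/") + "/"
    return any(
        normalized.startswith(prefix) or prefix.startswith(normalized)
        for prefix in RESERVED_PORTAL_PREFIXES
    )


if __name__ == "__main__":
    # "core" and "feed" are made-up namespace names used only for illustration.
    for candidate in ("core", "wiki", "w", "feed"):
        print(candidate, collides_with_portal(candidate))
```

With separate virtual hosts, a check like this would not be needed at all, which is the point of the bullet.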

There are a lot of other domain names to use; I'd prefer we kept the API Gateway and API Portal on separate virtual hosts.

Event Timeline

There is related discussion at https://www.mediawiki.org/wiki/Topic:Vie8y5khj6w3qs3y

The defining question for me, which also came up in the discussion linked above, is whether we are building only an API Portal, or whether we are building a Developer Portal and we just happen to be starting with the API portion because that's what we're working on right now.

If we're building a Developer Portal, the name developer.wikimedia.org makes sense to me. If we're just building an API Portal and stopping there, then the name api.wikimedia.org makes sense to me. My understanding is that we're building just an API Portal, and I don't see any technical showstoppers for using the name api.wikimedia.org, so that's the name I prefer.

Regarding the points raised by @eprodromou :

Part of the value is that the API gateway host only has API endpoints. It's conceptually confusing to add a wiki in. We're taking a big step away from having API endpoints be "part of" the wikis by creating this gateway, and sticking a wiki at its root would make that less clear.

I personally don't find that confusing.

We'll have to take care that there's no overlap between the documentation page namespace and the API URLs. I think typically our wikis have everything executable in /w/ and wiki pages in /wiki/, but it's worth examining. And it means we couldn't use any of those as namespaces for APIs.

I find that mildly annoying, but not a major concern.

We'll need path-based routing, so that some HTTP requests for api.wikimedia.org go to MediaWiki, and others go to Envoy. This has been a source of irritation before, and it would be good to avoid repeating the mistake.

I have no comment here. Routing is not my specialty, and I know nothing about the past irritations. It would be good to hear the opinion of those who have been affected in the past.

Cookies. We'll need cookies to maintain the login state on the Portal, but most clients would share those cookies when making API calls, too, since they'd have the same hostname. I think we'd throw them away for API calls (OAuth > Cookies), and there may be a way to limit them to just certain paths on the server, but it's still a bit of a muddle.

This is good to consider.

I'm assuming the most common situation would be a browser-based client including the cookies with API requests made via JavaScript. AFAIK (again, not my specialty) JavaScript libraries tend to make the developer take extra steps to include cookies. This link discusses Axios and Fetch: https://codewithhugo.com/pass-cookies-axios-fetch-requests/
In other words, just because the browser has the cookie doesn't mean it will be sent. It also doesn't mean it won't.

Also, the end user would need to be logged into api.wikimedia.org for cookies to be included. I have no speculative metric on how many end users of clients that use the API would also be logged into api.wikimedia.org, but I would guess it to be a minority.

AFAIK there's no client-side way to limit the availability of cookies in the way we'd want here. Specifying a Path directive in a Set-Cookie header makes the cookie available for the specified path and all subpaths. When I log in to our production wikis (metawiki, mediawiki) I see cookies being set with a path of "/". So even if we strip cookies somewhere within our production stack, clients could still send them, which would consume non-zero bandwidth, and stripping them would use non-zero resources somewhere in our stack. Are these amounts negligible or significant at our scale, for our actual stack and the actual number of requests that would include cookies? I have no idea. It would be good if someone with more insight could give an opinion.
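
To make the Path behavior concrete, here is a minimal sketch of the path-match rule as I understand it from RFC 6265, section 5.1.4. The function name `cookie_path_matches` and the example request paths are hypothetical; they are not taken from the portal or the gateway.

```python
def cookie_path_matches(cookie_path: str, request_path: str) -> bool:
    """Approximate the cookie path-match rule from RFC 6265, section 5.1.4."""
    if cookie_path == request_path:
        return True
    if request_path.startswith(cookie_path):
        # A cookie path ending in "/" covers everything below it.
        if cookie_path.endswith("/"):
            return True
        # Otherwise the next character of the request path must be "/",
        # so "/wiki" covers "/wiki/Foo" but not "/wikipedia".
        if request_path[len(cookie_path)] == "/":
            return True
    return False


if __name__ == "__main__":
    # A cookie set with Path=/ (as seen on the production wikis) matches
    # every path on the host, including this made-up API path.
    print(cookie_path_matches("/", "/core/v1/example"))
    # A cookie scoped to /wiki would not be sent for the same API path.
    print(cookie_path_matches("/wiki", "/core/v1/example"))
```

Under this rule, a cookie set with a path of "/" on a shared hostname would accompany API requests from any client that attaches cookies, while a narrower Path would not.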

Of course, not all clients would have cookies available to send. If I'm logged in via my mobile browser to api.wikimedia.org and I'm also running a custom mobile app that accesses our API, the two apps aren't connected and cookies won't be sent.

Please apply Cunningham's Law to anything incorrect that I just said.

There is related discussion at https://www.mediawiki.org/wiki/Topic:Vie8y5khj6w3qs3y

The defining question for me, which also came up in the discussion linked above, is whether we are building only an API Portal, or whether we are building a Developer Portal and we just happen to be starting with the API portion because that's what we're working on right now.

If we're building a Developer Portal, the name developer.wikimedia.org makes sense to me. If we're just building an API Portal and stopping there, then the name api.wikimedia.org makes sense to me. My understanding is that we're building just an API Portal, and I don't see any technical showstoppers for using the name api.wikimedia.org, so that's the name I prefer.

I agree; if the portal is exclusively API, then api.wikimedia.org would seem to make the most sense IMO.

Regarding the points raised by @eprodromou :

Part of the value is that the API gateway host only has API endpoints. It's conceptually confusing to add a wiki in. We're taking a big step away from having API endpoints be "part of" the wikis by creating this gateway, and sticking a wiki at its root would make that less clear.

I personally don't find that confusing.

Nor do I. I've also seen others that do it similarly. In fact, OpenAPI (which I abhor for other, unrelated reasons) makes this the default when using the UI, hosting API documentation as text/html with the same hostname as the API itself. See https://en.wikipedia.org/api/rest_v1/ for a Foundation-hosted example (that has never resulted in any confusion that I am aware of).

We'll have to take care that there's no overlap between the documentation page namespace and the API URLs. I think typically our wikis have everything executable in /w/ and wiki pages in /wiki/, but it's worth examining. And it means we couldn't use any of those as namespaces for APIs.

I find that mildly annoying, but not a major concern.

I'm not sure I would even go as far as annoying. We need to be very guarded about the top-level namespace regardless.

We'll need path-based routing, so that some HTTP requests for api.wikimedia.org go to MediaWiki, and others go to Envoy. This has been a source of irritation before, and it would be good to avoid repeating the mistake.

I have no comment here. Routing is not my specialty, and I know nothing about the past irritations. It would be good to hear the opinion of those who have been affected in the past.

Envoy is the sole router here. Requests for both uses would land on Envoy, which would treat requests mapped to the developer portal the same way as requests for API endpoints: it routes them to the corresponding remote host and path.
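
As a rough illustration of a single router serving both uses, the sketch below maps path prefixes to upstreams in plain Python. It is not Envoy configuration and makes no claim about how Envoy matches routes; the route table, the upstream names, and the longest-prefix rule are all assumptions made for the example.

```python
# Hypothetical route table: path prefix -> upstream name. None of these
# names come from the real gateway configuration.
ROUTES = {
    "/wiki/": "portal-mediawiki",
    "/w/": "portal-mediawiki",
    "/core/": "core-api",
}
DEFAULT_UPSTREAM = "not-found"


def pick_upstream(path: str) -> str:
    """Return the upstream for a request path; the longest matching prefix wins."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        return DEFAULT_UPSTREAM
    return ROUTES[max(matches, key=len)]


if __name__ == "__main__":
    for path in ("/wiki/Main_Page", "/core/v1/example", "/"):
        print(path, "->", pick_upstream(path))
```

In this picture the portal is just one more upstream behind the same router, alongside the API services.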

Cookies. We'll need cookies to maintain the login state on the Portal, but most clients would share those cookies when making API calls, too, since they'd have the same hostname. I think we'd throw them away for API calls (OAuth > Cookies), and there may be a way to limit them to just certain paths on the server, but it's still a bit of a muddle.

This is good to consider.

I'm assuming the most common situation would be a browser-based client including the cookies with API requests made via JavaScript. AFAIK (again, not my specialty) JavaScript libraries tend to make the developer take extra steps to include cookies. This link discusses Axios and Fetch: https://codewithhugo.com/pass-cookies-axios-fetch-requests/
In other words, just because the browser has the cookie doesn't mean it will be sent. It also doesn't mean it won't.

Also, the end user would need to be logged into api.wikimedia.org for cookies to be included. I have no speculative metric on how many end users of clients that use the API would also be logged into api.wikimedia.org, but I would guess it to be a minority.

AFAIK there's no client-side way to limit the availability of cookies in the way we'd want here. Specifying a Path directive in a Set-Cookie header makes the cookie available for the specified path and all subpaths. When I log in to our production wikis (metawiki, mediawiki) I see cookies being set with a path of "/". So even if we strip cookies somewhere within our production stack, clients could still send them, which would consume non-zero bandwidth, and stripping them would use non-zero resources somewhere in our stack. Are these amounts negligible or significant at our scale, for our actual stack and the actual number of requests that would include cookies? I have no idea. It would be good if someone with more insight could give an opinion.

Of course, not all clients would have cookies available to send. If I'm logged in via my mobile browser to api.wikimedia.org and I'm also running a custom mobile app that accesses our API, the two apps aren't connected and cookies won't be sent.

Please apply Cunningham's Law to anything incorrect that I just said.

The only examples I can think of where this might happen are contrived, and I'm not sure I understand why this would be a problem (I think this might be what @BPirkle is saying here too).

eprodromou closed this task as Resolved. May 6 2020, 1:24 AM
eprodromou claimed this task.

I definitely find it confusing to have the documentation server and the API server on the same virtual hostname.

As far as I can tell, it would be unique among major API servers to have another web site at the root. Looking at api.facebook.com, api.google.com, api.twilio.com, or api.twitter.com, the root is either a 404 or a redirect. I'd be relieved to see an example from another major API server, if anyone can come up with one!

I also think it's hard to explain whether api.wikimedia.org/w/api.php or api.wikimedia.org/w/rest.php would be part of the API portal or the API gateway.

I haven't looked through the implications for CORS. Hopefully it doesn't turn out to be a problem; CORS is a real headache.

My last concern is about conflicts we haven't thought through yet. I worry that we're going to run into one that hasn't been considered, and that could have been avoided by using apidev.w.o or apiportal.w.o or any of a billion other virtual hostnames. I'd rather be safe than sorry.

I talked it through with @apaskulin, and we agreed that if we find real conflicts in production, we'll move the portal to another name and use redirects to try to keep URLs operative. Hopefully that will be enough.

I'm closing this ticket. We can open up a separate one if an actual conflict materialises.

eprodromou updated the task description. May 6 2020, 10:44 AM

The defining question for me, which also came up in the discussion linked above, is whether we are building only an API Portal, or whether we are building a Developer Portal and we just happen to be starting with the API portion because that's what we're working on right now.

Just wanted to follow up and clarify this point: we're building a portal focused exclusively on the API provided by the API Gateway.