
API Gateway Integration
Open, Needs Triage, Public

Description

We want to integrate Lift Wing with the Wikimedia API Portal:
https://api.wikimedia.org

Design:

  • Review API catalog
  • Review API Guidelines
  • Create API description
  • Create reference documentation

Development:

  • Develop API spec (Service template)
  • Deploy/publish API route(s)

Open questions:

  • How do we update routes with new models? (Is this on our end?)

Event Timeline

@elukey was able to get our enwiki-goodfaith model running on the production ml-serve cluster today. I think we should use that inference service as a target for our first API route to see how everything will fit together.

We may need to review our egress rules and routing tables first (see {T284091: Review egress rules for ml-serve cluster}), although there is still some preliminary design work that can be done so no blockers yet.

I will be starting some of the preliminary design tasks this week and plan to work on a first pass throughout next week. I think we will target the enwiki-goodfaith inference service, as that's our most tested model at the moment.

Took a first pass at the Lift Wing API spec yesterday: https://docs.google.com/document/d/1QdwndQGFKbYyZyFkZHPl3nphddERiSqvopkfOFEavTU/edit?usp=sharing

I created the OpenAPI reference spec here: https://gitlab.wikimedia.org/Accraze/liftwing-api/-/blob/main/liftwing-api.v1.yaml

I plan on cleaning it up today and will look for feedback next week. There are still some unknowns about how we can map these endpoints to our internal cluster endpoints (which require the service hostname header, etc.), but I plan to make a list of them today or early next week.

I updated the API spec again after more discussion with the team. We will be matching the KServe Data Plane v1 API instead of the Predict v2 API. Also, we will use v0 versioning for all the endpoints, since things may well change after we upgrade k8s, KServe, etc.
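As a rough illustration (not the actual spec), one v0 endpoint matching the KServe v1 predict protocol could be described like this in OpenAPI; the path prefix, model name and schemas are placeholders:

openapi: 3.0.3
info:
  title: Lift Wing API (sketch)
  version: 0.1.0
paths:
  '/v0/models/enwiki-goodfaith:predict':    # illustrative path; the final prefix is still open
    post:
      summary: Score input with the enwiki-goodfaith model (KServe v1 predict shape)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                instances:                  # KServe v1 request body is {"instances": [...]}
                  type: array
                  items:
                    type: object
      responses:
        '200':
          description: Predictions, following the KServe v1 response shape ({"predictions": [...]}).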

> Currently only services with a discovery configuration can be routed to.

@elukey: I just found the above in the API Guidelines: https://wikitech.wikimedia.org/wiki/API_Gateway#Other_API_routes
It seems we'll need to complete T289835 before we can officially add our routes to the API Gateway.

Also, we still need to figure out where to store the MODEL_NAME => host mappings; hopefully we can store them in the api-gateway helmfile config in the deployment-charts repo, similar to what is done for the MW API.

>> Currently only services with a discovery configuration can be routed to.
>
> @elukey: I just found the above in the API Guidelines: https://wikitech.wikimedia.org/wiki/API_Gateway#Other_API_routes
> It seems we'll need to complete T289835 before we can officially add our routes to the API Gateway.

I am a little hesitant to proceed: due to how the discovery records are structured, we'd need to mark the LVS service as "production-ready", which means SRE would get paged if anything happens to Istio. Since we are not production ready, I'd love to avoid this until the last moment, although in theory we could pull the trigger at any time.

> Also, we still need to figure out where to store the MODEL_NAME => host mappings; hopefully we can store them in the api-gateway helmfile config in the deployment-charts repo, similar to what is done for the MW API.

This is probably the most pressing thing for us, and I have some doubts after reading the api-gateway chart in the deployment-charts repository. In its values.yaml file, there is a comment about the discovery endpoints:

# Hosts with $service.discovery.wmnet records
# Note that discovery_hosts behaves differently to the hosts dictionaries above.
# the format of each entry is:
# myservice.discovery.wmnet:
#   tls: true
#   port: 12345
#   path: myservicename
# This will map api.wikimedia.org/core/v1/myservicename to
# myservice.discovery.wmnet:12345 internally.
discovery_endpoints: {}

Meanwhile, IIUC, we want something like /inference/v1/blabla. I am not super familiar with the Helm chart, but it probably needs a separate entry for us, and we can probably specify our eqiad LVS endpoint for the moment rather than our discovery record.

Moreover, in the api-gateway chart's values.yaml, the pathing_map seems to be usable only for core/mw-api URIs (according to _config.yaml at least), so there is probably some extra code to add to the chart to make our use case work.
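Just to visualize the discussion, a purely hypothetical entry could look roughly like the following. None of these keys exist in the chart today; the path prefix, port, LVS hostname and internal host are all placeholders:

# Hypothetical api-gateway values entry, NOT supported by the current chart:
# it would map a public /inference/v1/... route to the ml-serve ingress,
# rewriting the Host header to the per-model internal service name.
inference_endpoints:
  inference.svc.eqiad.wmnet:        # eqiad LVS endpoint for now, discovery record later
    tls: true
    port: 30443                     # placeholder port
    path: inference
    models:
      enwiki-goodfaith:
        host_rewrite: enwiki-goodfaith.example.internal   # placeholder internal Host header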

> Since we are not production ready, I'd love to avoid this until the last moment, although in theory we could pull the trigger at any time.

Ah yeah, good call on this; let's avoid pinging SRE a bunch while we get the final details ironed out.

RE: the values.yaml comment in the api-gateway chart, I'm wondering if that's a typo, because I'm seeing other discovery services using a /service/ prefix, like here: https://api.wikimedia.org/wiki/API_reference/Service/Link_recommendation/Get_link_recommendations

I am also noticing that there is no host variable in the discovery_endpoints template (IIUC), so I am starting to agree that we may need our own separate entry.

I'll dig in a bit more tomorrow and ask around on Slack.

Change 730965 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] helmfile.d: add the inference service to api-gateway

https://gerrit.wikimedia.org/r/730965

Change 730966 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] api-gateway: allow HTTP host header rewrite for discovery endpoints

https://gerrit.wikimedia.org/r/730966

>> Since we are not production ready, I'd love to avoid this until the last moment, although in theory we could pull the trigger at any time.
>
> Ah yeah, good call on this; let's avoid pinging SRE a bunch while we get the final details ironed out.

> RE: the values.yaml comment in the api-gateway chart, I'm wondering if that's a typo, because I'm seeing other discovery services using a /service/ prefix, like here: https://api.wikimedia.org/wiki/API_reference/Service/Link_recommendation/Get_link_recommendations

You are definitely right, I sent a code change to fix it :)

> I am also noticing that there is no host variable in the discovery_endpoints template (IIUC), so I am starting to agree that we may need our own separate entry.
>
> I'll dig in a bit more tomorrow and ask around on Slack.

I filed two more code reviews to add a host header rewrite to the api-gateway's discovery endpoint configs. Let me know what you think :)
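For context on what the host header rewrite does: assuming the api-gateway chart ultimately renders Envoy route configuration, the patches conceptually add a route action along these lines (cluster name, path and hostname here are placeholders, not what the chart actually emits):

# Conceptual Envoy route action only; the real template lives in the
# api-gateway chart and will differ in detail.
route:
  cluster: inference_eqiad                                  # placeholder upstream cluster
  prefix_rewrite: "/v1/models/enwiki-goodfaith:predict"     # external path -> internal KServe path
  host_rewrite_literal: enwiki-goodfaith.example.internal   # rewrite the Host header for internal routing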

Brain dump of a discussion I had with elukey follows. It's meant as a summary of functionality needed from the API Gateway and how it may tie in with config on our side.

Below is a summary of the functionality we will need. Note that not all of this needs to come from the API gateway; some of it might also be semi-duplicated (a simple version on the API gateway and an elaborate one elsewhere, or vice versa). I'll try to annotate the individual pieces of functionality accordingly. The list is in no particular order; the numbers are just there to reference each item easily.

  1. Routing/Rewriting/Annotating. This includes both rewriting paths (e.g. transposing path elements) and deriving HTTP headers from path elements (e.g. a request to /v1/foo/bar/baz resulting in a Host: foo.bar.wmnet header). Since our internal request routing will rely on such headers, this should, at least at a coarse level, be done on the API gateway. We may fine-tune our own routing map to a degree here, but it is preferable (in part due to (4)) to have this at least partially done on the API gateway. The routing aspect depends a bit on the structure of the API Gateway's configuration (is each ML model its own "cluster" from the API Gateway's POV? Are different versions of the same model distinguished in this routing decision?)
  2. Authentication (establishing who the principal is that makes a query). I was under the impression that the API gateway already has *some* user database for this, but I am not sure. If it does, the next question is how this is handled (HTTP Auth? API keys?) and how we would add/remove principals to this database. While we don't have functionality like this for ORES at the moment (private models like sockpuppet detection are handled entirely differently), I think it is absolutely necessary to allow for most of the other functionality below.
  3. Authorization (establishing that the principal from (2) is allowed to make this query). If/when we serve models to which the public should not have unlimited access (due to PII or security concerns, or because they incur heavy load, so we need to rate limit them), we need a place to specify this configuration. Since it would be highly specific to ML/Lift Wing, I think this should live in our own config/setup (a hypothetical config sketch for (3) and (4) follows this list). It could (and likely should) still trust that the API gateway has established the identity of the principal (in (2)).
  4. Rate limiting (both at a global/per-service level and per principal). This would implement per-service as well as per-principal limits (and thus also their combination), to protect against both service overload (some models will be more expensive to serve than others) and information extraction (we might find that unfettered access leads to information leakage; ideally no such extraction should be possible, so this is a matter of defense in depth). Rate limits can to a degree be abused to implement (3), but tend to have weird edge cases and confusing error behavior. Finally, rate limits are a way to implement draining of a service in a manner that gives clear feedback to users.
  5. "Soft" audit trail (keeping track of who (IP, principal) made a query, for both monitoring and logging). For monitoring and debugging, it would be very useful to still have the information about what IP/range and what user(s) make certain queries. Therefore, the API gateway should store such information in HTTP headers for the back-end (e.g. X-Forwarded-For: and the like).
  6. Circuit breakers (deny-listing IPs, principals and possibly individual query params). The above-mentioned aspects of forwarded queries would all be useful for implementing "circuit breakers" at various levels, including a high but still meaningful rate limit at the API gateway, reporting of the IPs making anonymous requests, limiting the rate of requests per IP (or IP range), etc. In the case of maintenance windows, outages, bugs, upgrades and the like, these "valves" allow us to steer and prioritize traffic in a manner that is predictable and reduces disruption for users. In general, the finer-grained the control is, the closer to the actual serving model it should be.
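To make (3) and (4) a bit more concrete, here is a purely hypothetical per-model policy config that could live on the Lift Wing side; every key, value and model name is an assumption about a format we would still have to design:

# Hypothetical Lift Wing-side policy file; nothing like this exists yet.
models:
  enwiki-goodfaith:
    access: public                  # anyone authenticated by the gateway may query it
    rate_limit:
      per_principal: 5000/hour      # mirrors the gateway default for authenticated clients
      global: 50000/hour            # protects the model server itself
  some-private-model:               # placeholder name for a PII/security-sensitive model
    access: allow-list
    allowed_principals:
      - research-team-bot           # placeholder principal
    rate_limit:
      per_principal: 500/hour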

On the whole, I think the functionality that would best live in/on the API gateway is 1, 2 and parts of 3 and 4. 5 and 6 would live throughout the whole pipeline, in the sense of relying on information provided by the API gateway but acting on it in Lift Wing. Depending on the state of functionality/config flexibility, 3 and 4 might be close to 5 and 6 in this regard.

It looks like most of 1 can be done in the API Gateway config, but it was unclear whether 2 is possible/already implemented. At any rate, 3 would mean some extra work, since the gateway obviously has no concept of Lift Wing yet.

For a timeline, I think we can make do with just 1 and 4 at the start, but I would not deem us production ready without also having 2 and 3 at least ready (i.e. we log/establish identity and check for allowed access, even if it would be a blanket "Allow" in the beginning). 5 and 6 can wait until later, but should be taken into account when making decisions about how to implement 1-4.

Thanks for the brain dump @klausman!

RE: Authentication

> the next question is how this is handled (HTTP Auth? API keys?) and how we would add/remove principals to this database

There are OAuth 2.0 flows for users & apps, and also personal API tokens, for the api-gateway. It remains unclear how we can manage the auth data on our end, but it exists.

RE: Rate Limiting
The API Gateway should give us some already: https://api.wikimedia.org/wiki/Documentation/Getting_started/Rate_limits

  • Personal API tokens & OAuth 2.0 credentials get 5,000 calls per hour
  • Anonymous users get 500 calls per hour
  • A 429 status code is returned when the limit is reached
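If we end up documenting this in the reference spec, both options map onto standard OpenAPI security schemes, roughly like the sketch below; the scheme names are placeholders and the OAuth URLs should be double-checked against the API Portal docs:

components:
  securitySchemes:
    personalApiToken:               # Authorization: Bearer <personal API token>
      type: http
      scheme: bearer
    wikimediaOAuth2:
      type: oauth2
      flows:
        authorizationCode:
          authorizationUrl: https://meta.wikimedia.org/w/rest.php/oauth2/authorize
          tokenUrl: https://meta.wikimedia.org/w/rest.php/oauth2/access_token
          scopes: {}
security:
  - personalApiToken: []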