Decide external URL scheme (on API GW) for models on Lift Wing
Closed, Resolved, Public

Description

As currently set up, reaching an inference model on Lift Wing directly (i.e. without involving the API GW) uses the following URLs and Host headers:

URL                                                                                Host
---------------------------------------------------------------------------------- ----------------------------------------------------------------------
https://inference.discovery.wmnet:30443/v1/models/enwiki-articlequality:predict    enwiki-articlequality.revscoring-articlequality.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/enwiki-articletopic:predict      enwiki-articletopic.revscoring-articletopic.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/enwiki-damaging:predict          enwiki-damaging.revscoring-editquality-damaging.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/enwiki-draftquality:predict      enwiki-draftquality.revscoring-draftquality.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/enwiki-drafttopic:predict        enwiki-drafttopic.revscoring-drafttopic.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/enwiki-goodfaith:predict         enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/translatewiki-reverted:predict   translatewiki-reverted.revscoring-editquality-reverted.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/outlink-topic-model:predict      outlink-topic-model.articletopic-outlink.wikimedia.org

Note that while there is superficial repetition in the Host header (e.g. the drafttopic token appearing twice), the semantics are different. The basic form of the header is:

isvc-service-name.k8s-namespace.wikimedia.org

Since the first and second sub-parts of the header refer to different things that may have similar naming, the same strings can show up twice.
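
As a minimal sketch of the header form described above, the following Python composes and splits a Host header. The function names and the IsvcHost type are hypothetical and exist only for illustration; the only thing taken from this task is the isvc-service-name.k8s-namespace.wikimedia.org layout.

```python
from typing import NamedTuple

# Fixed suffix of the Host header, per the form described above.
DOMAIN_SUFFIX = ".wikimedia.org"


class IsvcHost(NamedTuple):
    """Hypothetical container for the two variable parts of the header."""
    isvc_name: str
    namespace: str


def build_host_header(isvc_name: str, namespace: str) -> str:
    """Compose <isvc-service-name>.<k8s-namespace>.wikimedia.org."""
    return f"{isvc_name}.{namespace}{DOMAIN_SUFFIX}"


def parse_host_header(host: str) -> IsvcHost:
    """Split a Host header back into its service name and namespace."""
    if not host.endswith(DOMAIN_SUFFIX):
        raise ValueError(f"unexpected domain in Host header: {host!r}")
    stem = host[: -len(DOMAIN_SUFFIX)]
    isvc_name, sep, namespace = stem.partition(".")
    # Exactly two non-empty, dot-free labels are expected before the suffix.
    if not sep or not isvc_name or not namespace or "." in namespace:
        raise ValueError(f"malformed Host header: {host!r}")
    return IsvcHost(isvc_name, namespace)
```

Parsing enwiki-drafttopic.revscoring-drafttopic.wikimedia.org this way yields the service name enwiki-drafttopic and the namespace revscoring-drafttopic, which shows why the same token can legitimately appear in both parts.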

For the outside world (i.e. API GW users), the above scheme is not very useful since it exposes too many implementation details and is more complex than the end user really needs. Since we are not the only tenants of the API GW, there also is a fixed prefix we will have to use:

https://api.wikimedia.org/service/lw/

Everything after lw is for us to decide, but we should do so with several things in mind:

  1. The scheme we use should be logical and simple to understand
  2. It should allow us to construct both the internal URL and Host header with relative ease and without encoding too much static mapping in the API GW config
  3. It should allow us to expand in the future, e.g. to run non-inference services on LW
  4. Adding a new ML service that doesn't use any existing namespaces or libraries (like revscoring) should be straightforward. Ideally, it would "just work" without touching the API GW config.
  5. The scheme should avoid requiring us to change the internal scheme too much. Renaming a few pods or even namespaces is fine, but we should not have to dig into kserve/istio/k8s code and config too deeply.

The scheme we decide on will likely be with us for years and cannot easily be deprecated or changed, so we must have good confidence that it will serve us and the users well.

Event Timeline

There are two things to keep in mind when querying Lift Wing. Let's pick an example:

https://inference.svc.codfw.wmnet:30443/v1/models/enwiki-goodfaith:predict

Host: enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org

The above are details about how to query the internal endpoint of LiftWing. From curl it would look like the following:

curl "https://inference.svc.codfw.wmnet:30443/v1/models/enwiki-goodfaith:predict" -X POST -d @input.json -i -H "Host: enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org" --http1.1

The two things to keep in mind are related to routing:

  1. The URL https://inference.svc.codfw.wmnet:30443/v1/models/enwiki-goodfaith:predict is needed so that we hit the Istio Gateway endpoint, identified by inference.svc.codfw.wmnet:30443. The URI path is needed by KServe to properly invoke the model and generate a score (for example, if you invoke enwiki-goodfaith:predict on a damaging model, you'll get an error).
  2. The Host header is needed by the Istio Gateway to decide which backend should serve the URI /v1/models/enwiki-goodfaith:predict.

There may be a way to simplify this, but so far we haven't found one. This dual "config" between Istio and KServe complicates things a little, but we should find a way to hide the complexity from the external user (so the API GW needs to offer some configuration to route a request properly without requiring the user to pass extra details).

After some discussion, we have decided that the API-GW side URL scheme for LW should look like:

/lw/inference/v1/models/[model name]:predict

so for example to reach the enwiki-articlequality model, you would use:

https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict

Or, as a curl command line:

curl -s "https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict" -X POST -d '{ "rev_id": 123555 }'

This scheme is relatively light for the API GW to implement (a prototype/WIP change is https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/844452).
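
To illustrate what the gateway has to derive from an external request, here is a rough Python sketch that turns an API GW path into the internal URL and Host header. It is not the actual gateway configuration (that lives in the Gerrit change); the function name, the regular expression and the lookup table are assumptions for illustration, and the sketch assumes the gateway sees the path with the /service prefix already removed. The table entries are copied from the description of this task.

```python
import re

# Internal Istio Gateway endpoint, as listed in the task description.
INTERNAL_BASE = "https://inference.discovery.wmnet:30443"

# Assumed shape of the external path after the /service prefix.
EXTERNAL_PATH = re.compile(
    r"^/lw/inference/v1/models/(?P<model>[A-Za-z0-9-]+):predict$"
)

# Illustrative static mapping from model name to k8s namespace.
# This is the kind of mapping the design goals try to keep small.
MODEL_NAMESPACE = {
    "enwiki-articlequality": "revscoring-articlequality",
    "enwiki-articletopic": "revscoring-articletopic",
    "enwiki-damaging": "revscoring-editquality-damaging",
    "enwiki-draftquality": "revscoring-draftquality",
    "enwiki-drafttopic": "revscoring-drafttopic",
    "enwiki-goodfaith": "revscoring-editquality-goodfaith",
    "translatewiki-reverted": "revscoring-editquality-reverted",
    "outlink-topic-model": "articletopic-outlink",
}


def to_internal(external_path: str) -> tuple[str, str]:
    """Return (internal URL, Host header) for an external API GW path."""
    match = EXTERNAL_PATH.match(external_path)
    if match is None:
        raise ValueError(f"not a Lift Wing inference path: {external_path!r}")
    model = match.group("model")
    if model not in MODEL_NAMESPACE:
        raise KeyError(f"unknown model: {model!r}")
    url = f"{INTERNAL_BASE}/v1/models/{model}:predict"
    host = f"{model}.{MODEL_NAMESPACE[model]}.wikimedia.org"
    return url, host
```

The model name alone is enough to rebuild the internal URL; only the namespace needs a lookup, so that is the part any real implementation would have to encode or derive.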

Note that this currently requires getting a JWT from https://api.wikimedia.org/wiki/Special:AppManagement. In the future, the API GW may allow POST access in an anonymous fashion.
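
For clients, a request with a token might be prepared along these lines in Python. This is a sketch under assumptions: the task does not say how the JWT is passed, so sending it as an "Authorization: Bearer" header is assumed here, and build_predict_request is a hypothetical helper name. The request object is only built, not sent.

```python
import json
import urllib.request

# External base URL, following the scheme decided above.
API_GW_BASE = "https://api.wikimedia.org/service/lw/inference/v1/models"


def build_predict_request(model: str, rev_id: int, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to a Lift Wing model."""
    body = json.dumps({"rev_id": rev_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_GW_BASE}/{model}:predict",
        data=body,
        headers={
            # Assumption: the JWT is presented as a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the call; the token itself has to come from the AppManagement page mentioned above.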

Further progress on the concrete config changes for the API GW will be tracked in T288789 (or more likely one of its child tasks).