As currently set up, reaching an inference model on Liftwing directly (i.e. without involving the API GW) requires the following URLs and Host headers:
URL                                                                                Host
---------------------------------------------------------------------------------- ----------------------------------------------------------------------
https://inference.discovery.wmnet:30443/v1/models/enwiki-articlequality:predict    enwiki-articlequality.revscoring-articlequality.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/enwiki-articletopic:predict      enwiki-articletopic.revscoring-articletopic.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/enwiki-damaging:predict          enwiki-damaging.revscoring-editquality-damaging.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/enwiki-draftquality:predict      enwiki-draftquality.revscoring-draftquality.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/enwiki-drafttopic:predict        enwiki-drafttopic.revscoring-drafttopic.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/enwiki-goodfaith:predict         enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/translatewiki-reverted:predict   translatewiki-reverted.revscoring-editquality-reverted.wikimedia.org
https://inference.discovery.wmnet:30443/v1/models/outlink-topic-model:predict      outlink-topic-model.articletopic-outlink.wikimedia.org
Note that while the Host header superficially contains repetition (e.g. the drafttopic token appearing twice), the two occurrences mean different things. The basic form of the header is:
isvc-service-name.k8s-namespace.wikimedia.org
Since the first and second sub-parts of the header refer to different things that may have similar naming, the same strings can show up twice.
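As a minimal sketch of how the internal URL and Host header relate, the following Python derives both from an isvc service name and a k8s namespace. It assumes that the model name in the URL path is always identical to the isvc service name, which holds for every row listed above but is not guaranteed in general. The function name `internal_request_target` is hypothetical and used only for illustration.

```python
# Internal endpoint taken from the URLs listed above.
INTERNAL_ENDPOINT = "https://inference.discovery.wmnet:30443"


def internal_request_target(isvc_name: str, namespace: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for reaching a model directly, bypassing the API GW.

    Assumption: the model name in the path equals the isvc service name.
    """
    url = f"{INTERNAL_ENDPOINT}/v1/models/{isvc_name}:predict"
    # Host header form: isvc-service-name.k8s-namespace.wikimedia.org
    headers = {"Host": f"{isvc_name}.{namespace}.wikimedia.org"}
    return url, headers


url, headers = internal_request_target(
    "enwiki-damaging", "revscoring-editquality-damaging"
)
print(url)
print(headers["Host"])
```

The two arguments are independent inputs, which is why a token such as drafttopic can legitimately appear in both positions of the Host header.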
For the outside world (i.e. API GW users), the above scheme is not very useful, since it exposes too many implementation details and is more complex than the end user really needs. Since we are not the only tenants of the API GW, there is also a fixed prefix we will have to use:
https://api.wikimedia.org/service/lw/
Everything after lw is for us to decide, but we should do so with several things in mind:
- The scheme we use should be logical and simple to understand
- It should allow us to construct both the internal URL and Host header with relative ease and without encoding too much static mapping in the API GW config
- It should allow us to expand in the future, e.g. to run non-inference services on LW
- Adding a new ML service that doesn't use any existing namespaces or libraries (like revscoring) should be straightforward. Ideally, it would "just work" without touching the API GW config.
- The scheme should avoid requiring us to change the internal scheme too much. Renaming a few pods or even namespaces is fine, but we should not have to dig into kserve/istio/k8s code and config too deeply.
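To make the "no static mapping" requirement concrete, here is a sketch of a purely mechanical translation from a public path to the internal URL and Host header. The public layout used here, `/service/lw/<k8s-namespace>/<isvc-service-name>`, is a hypothetical placeholder chosen only for illustration and is not a proposal; the function name `route` is likewise invented. The point is only that a scheme carrying both pieces of information needs no per-model table in the API GW config.

```python
PUBLIC_PREFIX = "/service/lw/"  # fixed prefix on api.wikimedia.org
INTERNAL_ENDPOINT = "https://inference.discovery.wmnet:30443"


def route(public_path: str) -> tuple[str, str]:
    """Translate a public path into (internal_url, host_header).

    Hypothetical layout: /service/lw/<k8s-namespace>/<isvc-service-name>
    """
    if not public_path.startswith(PUBLIC_PREFIX):
        raise ValueError(f"not a LW path: {public_path!r}")
    parts = public_path[len(PUBLIC_PREFIX):].strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected <namespace>/<isvc-name>: {public_path!r}")
    namespace, isvc_name = parts
    # No lookup table: both values are derived from the path alone.
    url = f"{INTERNAL_ENDPOINT}/v1/models/{isvc_name}:predict"
    host = f"{isvc_name}.{namespace}.wikimedia.org"
    return url, host


print(route("/service/lw/revscoring-drafttopic/enwiki-drafttopic"))
```

Under this placeholder layout a new service in a new namespace would route without any config change, but the layout also exposes the namespace to end users, which is one of the trade-offs the requirements above have to balance.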
The scheme we decide on will likely be with us for years and cannot easily be deprecated or changed, so we must be confident that it will serve both us and the users well.