
Create a staging ingress configuration for ml-staging-codfw
Closed, Resolved (Public)

Description

We now have to create ores-legacy-staging.svc.codfw.wmnet for the ml-staging-codfw cluster. Instead of creating a specialized VIP and config, we could follow what Service Ops did, namely a single endpoint, to keep things in sync between the teams.

The overall idea is to:

  • Create ml-staging.svc.codfw.wmnet as a CNAME for one of the ml-staging200[12] nodes.
  • Create a cergen certificate, like the one for wikikube staging, to deploy on the tls-proxy containers (mesh).
  • Modify the ingress module in deployment-charts to generate the correct values for this new endpoint.

On the knative / istio side:

  • Allow the definition of multiple Istio ingresses in our helmfile Istio config.
  • Allow using a different ingress for knative-serving.

Event Timeline

Change 914306 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] modules: duplicate the istio ingress template for 1.0.2

https://gerrit.wikimedia.org/r/914306

Change 914307 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] modules: add ml-staging cfg to the istio template

https://gerrit.wikimedia.org/r/914307

Change 914728 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] conftool-data: add config for the k8s ingress for ml-staging

https://gerrit.wikimedia.org/r/914728

Change 914735 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Add conftool and service config for k8s-ingress-ml-staging

https://gerrit.wikimedia.org/r/914735

Change 914730 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/dns@master] Add the VIP settings for the K8s ingress for ml-staging

https://gerrit.wikimedia.org/r/914730

Change 914306 merged by Elukey:

[operations/deployment-charts@master] modules: duplicate the istio ingress template for 1.0.2

https://gerrit.wikimedia.org/r/914306

Change 914307 merged by Elukey:

[operations/deployment-charts@master] modules: add ml-staging cfg to the istio template

https://gerrit.wikimedia.org/r/914307

Change 914793 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] fast-api: update ingress.istio module version to 1.0.2

https://gerrit.wikimedia.org/r/914793

Change 914795 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: enable the 'mlstaging' ingress flag for ores-legacy

https://gerrit.wikimedia.org/r/914795

Change 914793 merged by Elukey:

[operations/deployment-charts@master] fast-api: update ingress.istio module version to 1.0.2

https://gerrit.wikimedia.org/r/914793

Change 914795 merged by Elukey:

[operations/deployment-charts@master] ml-services: enable the 'mlstaging' ingress flag for ores-legacy

https://gerrit.wikimedia.org/r/914795

Change 914859 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: add ml-staging among helmfile_namespace_certs's options

https://gerrit.wikimedia.org/r/914859

Change 914859 merged by Elukey:

[operations/deployment-charts@master] admin_ng: add ml-staging among helmfile_namespace_certs's options

https://gerrit.wikimedia.org/r/914859

Change 915416 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: remove tls hostname override for ores-legacy-staging

https://gerrit.wikimedia.org/r/915416

Change 915416 merged by Elukey:

[operations/deployment-charts@master] admin_ng: remove tls hostname override for ores-legacy-staging

https://gerrit.wikimedia.org/r/915416

Change 914728 merged by Elukey:

[operations/puppet@production] conftool-data: add config for the k8s ingress for ml-staging

https://gerrit.wikimedia.org/r/914728

Change 914735 merged by Elukey:

[operations/puppet@production] Add conftool and service config for k8s-ingress-ml-staging

https://gerrit.wikimedia.org/r/914735

Change 915685 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] custom_deploy.d: fix istio config for ml-staging-codfw

https://gerrit.wikimedia.org/r/915685

First result! From any node, such as stat100x, one can use:

curl "https://ores-legacy.k8s-ml-staging.discovery.wmnet:31443/v3/scores" -i --http1.1 --resolve ores-legacy.k8s-ml-staging.discovery.wmnet:31443:10.192.0.201
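
For context, curl's --resolve option pins a host:port pair to a fixed address for that one request, so the TLS SNI and Host header still carry the service name even though no DNS record exists for it yet. Below is a minimal sketch of wrapping that pattern in a helper. The function name resolve_curl_cmd is hypothetical, and it only prints the command instead of running it; the address is the one used in the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Wikimedia tooling): print a curl
# command that pins HOST:PORT to ADDR via --resolve, so that no DNS
# record is needed for HOST.
resolve_curl_cmd() {
    host=$1   # service hostname, used for SNI and the Host header
    port=$2   # ingress port
    addr=$3   # IP address to connect to instead of resolving HOST
    path=$4   # request path, starting with /
    printf 'curl -i --http1.1 --resolve %s:%s:%s "https://%s:%s%s"\n' \
        "$host" "$port" "$addr" "$host" "$port" "$path"
}

# Reproduce the command from this comment.
resolve_curl_cmd ores-legacy.k8s-ml-staging.discovery.wmnet 31443 10.192.0.201 /v3/scores
```

Once the discovery record exists, the --resolve part can simply be dropped.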

Scoring a single element doesn't work yet, since the Envoy proxies that we use to contact Lift Wing don't support POSTs in their default config.

The approach of using two istio ingresses seems to work fine, so now the next steps are:

  • Create the new VIP for k8s-ml-staging.discovery.wmnet
  • Create the DNS CNAME *.k8s-ml-staging.discovery.wmnet

Change 915685 merged by Elukey:

[operations/deployment-charts@master] custom_deploy.d: fix istio config for ml-staging-codfw

https://gerrit.wikimedia.org/r/915685

Change 914730 merged by Elukey:

[operations/dns@master] Add the VIP settings for the K8s ingress for ml-staging

https://gerrit.wikimedia.org/r/914730

Change 918409 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] service::catalog: set lvs_setup for k8s-ingress-ml-staging

https://gerrit.wikimedia.org/r/918409

Change 919785 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/dns@master] Add discovery settings for k8s-ingress-ml-staging

https://gerrit.wikimedia.org/r/919785

Change 918409 merged by Elukey:

[operations/puppet@production] service::catalog: set lvs_setup for k8s-ingress-ml-staging

https://gerrit.wikimedia.org/r/918409

Mentioned in SAL (#wikimedia-operations) [2023-05-15T08:26:44Z] <elukey> restart pybal on lvs2010 and lvs2009 to pick up new LVS VIP for ml-staging k8s ingress - T335756

Change 919795 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] service::catalog: switch k8s-ingress-ml-staging to production

https://gerrit.wikimedia.org/r/919795

Change 919795 merged by Elukey:

[operations/puppet@production] service::catalog: switch k8s-ingress-ml-staging to production

https://gerrit.wikimedia.org/r/919795

Change 919785 merged by Elukey:

[operations/dns@master] Add discovery settings for k8s-ingress-ml-staging

https://gerrit.wikimedia.org/r/919785

And we are finally done!

elukey@stat1004:~$ time curl "https://ores-legacy.k8s-ml-staging.discovery.wmnet:31443/v3/scores/enwiki/123433/damaging" -i --http1.1
HTTP/1.1 200 OK
date: Mon, 15 May 2023 13:26:50 GMT
server: istio-envoy
content-length: 377
content-type: application/json
x-envoy-upstream-service-time: 134

{
  "enwiki": {
    "models": {
      "damaging": {
        "version": "0.5.1"
      }
    }, 
    "scores": {
      "123433": {
        "damaging": {
          "score": {
            "prediction": false, 
            "probability": {
              "false": 0.9899875154315122, 
              "true": 0.010012484568487832
            }
          }
        }
      }
    }
  }
}
real	0m0.260s
user	0m0.023s
sys	0m0.003s