
Move Wikikube services to Istio ingress (where possible)
Open, Needs Triage, Public

Description

The Istio Ingress support in Wikikube has been tested with various services, and it would be nice to move pre-existing services (where possible) to it.

Main benefits:

  • The more services run behind Istio, the easier it is to find and iron out issues with it.
  • Bootstrapping a new service takes much less time, since no extra LVS IP/config is needed.
  • Standardized metrics for the SLO dashboards (the Istio ones rather than the envoy/mesh ones). We already have Prometheus recording rules for the Istio gateway metrics, which is convenient and more performant (see T389357). There is no hard requirement to switch, since we could add the same rules for envoy too, but given the point above it would be nice to have a single standard.

I'd like to move some services to Ingress, improve the documentation and possibly set the standard for new services.

Procedure to move a live service to Ingress:

  • Enable the ingress module, and make sure the nodePort is kept. Example for citoid: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1135378
  • Deploy and test on staging. For citoid, the test was: curl -k https://citoid.k8s-staging.discovery.wmnet:30443/mediawiki/10.1038%2Fs41586-021-03470-x
    • Please note: we have a wildcard CNAME, *.k8s-staging.discovery.wmnet, in our DNS config, so once you have deployed the ingress config to staging, the name should work for your service as well without any extra DNS changes.
  • Deploy and test in production.
    • For citoid: curl https://citoid.discovery.wmnet:30443/mediawiki/10.1038%2Fs41586-021-03470-x -k --resolve citoid.discovery.wmnet:30443:$(dig +short k8s-ingress-wikikube-ro.discovery.wmnet)
    • Please note that the ingress module will take care of configuring the Istio Gateway to accept $service.discovery.wmnet as SNI.
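  • At this point we cannot set a CNAME like citoid.discovery.wmnet => k8s-ingress-wikikube-ro.discovery.wmnet, because an A record is already registered for that name (the one reserved for the LVS service). Add a separate CNAME instead, $service-ingress.discovery.wmnet => k8s-ingress-wikikube-ro.discovery.wmnet, and add that FQDN to the service's ingress config (for citoid: https://gerrit.wikimedia.org/r/1135433 and https://gerrit.wikimedia.org/r/1135449).
  • Finally, point the clients that use $service.discovery.wmnet to $service-ingress.discovery.wmnet. This is also very easy to roll back if anything goes wrong.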
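
The staging and production test steps above can be sketched as a small helper that prints the curl command for a given service. This is only an illustration: the function name and argument order are made up here, and the ingress IP is assumed to be a single address obtained separately (for example with dig +short k8s-ingress-wikikube-ro.discovery.wmnet). The port and host names are the ones used for citoid in this task.

```shell
#!/bin/sh
# Sketch: print (not run) the curl command used to test a service through
# the Istio ingress. Function name and argument order are illustrative.
#   $1 environment: "staging" or "production"
#   $2 service name, e.g. citoid
#   $3 request path, starting with /
#   $4 ingress IP (production only)
ingress_test_cmd() {
    env_name="$1"; service="$2"; path="$3"; ip="$4"
    case "$env_name" in
        staging)
            # Covered by the *.k8s-staging.discovery.wmnet wildcard CNAME.
            printf 'curl -k https://%s.k8s-staging.discovery.wmnet:30443%s\n' \
                "$service" "$path"
            ;;
        production)
            host="${service}.discovery.wmnet"
            # --resolve pins host:port to the ingress IP, so the existing
            # A record (LVS) is bypassed while the SNI stays unchanged.
            printf 'curl -k --resolve %s:30443:%s https://%s:30443%s\n' \
                "$host" "$ip" "$host" "$path"
            ;;
        *)
            echo "unknown environment: $env_name" >&2
            return 1
            ;;
    esac
}
```

Printing the command rather than running it keeps the helper usable from hosts without access to the internal network; the output can be reviewed and then pasted on a deployment host.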

After the above procedure there are two options:

  • Keep the $service-ingress.discovery.wmnet service name and remove the old one from service.yaml and LVS.
  • Remove the A records for $service.discovery.wmnet in the DNS repo and create a CNAME to k8s-ingress-wikikube-ro.discovery.wmnet. Then point clients to $service.discovery.wmnet again and remove the $service-ingress CNAME.
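
With either option, it may help to confirm that the chosen name is a CNAME to the shared ingress name before moving clients. A minimal sketch, assuming the check is fed the output of dig +short for the CNAME record type; the function and variable names are hypothetical:

```shell
#!/bin/sh
# Shared ingress name taken from the task description.
INGRESS_NAME="k8s-ingress-wikikube-ro.discovery.wmnet"

# $1: output of `dig +short <name> CNAME` (empty if there is no CNAME).
# Returns 0 only if the CNAME target is the shared ingress name.
points_to_ingress() {
    # dig prints the target fully qualified, with a trailing dot; strip it.
    target="${1%.}"
    [ "$target" = "$INGRESS_NAME" ]
}

# Example (needs access to the internal DNS):
#   points_to_ingress "$(dig +short citoid-ingress.discovery.wmnet CNAME)" \
#       && echo "served by the ingress"
```

A name that still has the LVS A record returns no CNAME, so the check fails for it, which matches the state before the DNS change in the second option.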

Event Timeline

Currently being tested: Citoid

https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1135378
https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1135396

The first step is to add the Ingress config while keeping the nodePort configuration (so there is no impact on the clients yet).

elukey@deploy1003:~$ curl https://citoid.k8s-staging.discovery.wmnet:30443/mediawiki/10.1038%2Fs41586-021-03470-x -k
[{"key":"I3LW5PQ3","version":0,"itemType":"journalArticle","tags":[],"publicationTitle":"Nature","volume":"593","issue":"7858","language":"en","ISSN":["0028-0836","1476-4687"],"date":"2021-05-13","pages":"266–269","DOI":"10.1038/s41586-021-03470-x","url":"https://www.nature.com/articles/s41586-021-03470-x","title":"Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England","libraryCatalog":"CrossRef","accessDate":"2025-04-09","author":[["","The COVID-19 Genomics UK (COG-UK) consortium"],["Erik","Volz"],["Swapnil","Mishra"],["Meera","Chand"],["Jeffrey C.","Barrett"],["Robert","Johnson"],["Lily","Geidelberg"],["Wes R.","Hinsley"],["Daniel J.","Laydon"],["Gavin","Dabrera"],["Áine","O’Toole"],["Robert","Amato"],["Manon","Ragonnet-Cronin"],["Ian","Harrison"],["Ben","Jackson"],["Cristina V.","Ariani"],["Olivia","Boyd"],["Nicholas J.","Loman"],["John T.","McCrone"],["Sónia","Gonçalves"],["David","Jorgensen"],["Richard","Myers"],["Verity","Hill"],["David K.","Jackson"],["Katy","Gaythorpe"],["Natalie","Groves"],["John","Sillitoe"],["Dominic P.","Kwiatkowski"],["Seth","Flaxman"],["Oliver","Ratmann"],["Samir","Bhatt"],["Susan","Hopkins"],["Axel","Gandy"],["Andrew","Rambaut"],["Neil M.","Ferguson"]],"source":["Zotero"]}]
elukey@deploy1003:~$ curl https://staging.svc.eqiad.wmnet:4003/mediawiki/10.1038%2Fs41586-021-03470-x
[{"key":"Y4V9DMA3","version":0,"itemType":"journalArticle","tags":[],"publicationTitle":"Nature","volume":"593","issue":"7858","language":"en","ISSN":["0028-0836","1476-4687"],"date":"2021-05-13","pages":"266–269","DOI":"10.1038/s41586-021-03470-x","url":"https://www.nature.com/articles/s41586-021-03470-x","title":"Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England","libraryCatalog":"CrossRef","accessDate":"2025-04-09","author":[["","The COVID-19 Genomics UK (COG-UK) consortium"],["Erik","Volz"],["Swapnil","Mishra"],["Meera","Chand"],["Jeffrey C.","Barrett"],["Robert","Johnson"],["Lily","Geidelberg"],["Wes R.","Hinsley"],["Daniel J.","Laydon"],["Gavin","Dabrera"],["Áine","O’Toole"],["Robert","Amato"],["Manon","Ragonnet-Cronin"],["Ian","Harrison"],["Ben","Jackson"],["Cristina V.","Ariani"],["Olivia","Boyd"],["Nicholas J.","Loman"],["John T.","McCrone"],["Sónia","Gonçalves"],["David","Jorgensen"],["Richard","Myers"],["Verity","Hill"],["David K.","Jackson"],["Katy","Gaythorpe"],["Natalie","Groves"],["John","Sillitoe"],["Dominic P.","Kwiatkowski"],["Seth","Flaxman"],["Oliver","Ratmann"],["Samir","Bhatt"],["Susan","Hopkins"],["Axel","Gandy"],["Andrew","Rambaut"],["Neil M.","Ferguson"]],"source":["Zotero"]}]
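
The two responses above appear to differ only in the "key" value, so a plain diff of the ingress and nodePort outputs would report a mismatch. A rough sketch of a comparison that masks that field first; the assumption that "key" changes per request is mine, and the function names are illustrative:

```shell
#!/bin/sh
# Replace every "key":"..." value with a fixed placeholder.
mask_key() {
    sed 's/"key":"[^"]*"/"key":"MASKED"/g'
}

# $1, $2: two response bodies. Returns 0 if they match once "key" is masked.
same_payload() {
    a=$(printf '%s\n' "$1" | mask_key)
    b=$(printf '%s\n' "$2" | mask_key)
    [ "$a" = "$b" ]
}

# Example (on a deployment host):
#   same_payload \
#     "$(curl -sk https://citoid.k8s-staging.discovery.wmnet:30443/mediawiki/10.1038%2Fs41586-021-03470-x)" \
#     "$(curl -s https://staging.svc.eqiad.wmnet:4003/mediawiki/10.1038%2Fs41586-021-03470-x)" \
#     && echo "ingress and nodePort responses match"
```

A JSON-aware tool would be more robust than sed for this; the text substitution is used here only to keep the sketch dependency-free.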

Change #1135402 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] modules: comment out gatewayHosts->domains

https://gerrit.wikimedia.org/r/1135402

Change #1135428 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] services: point rest-gateway in staging to the ingress citoid endpoint

https://gerrit.wikimedia.org/r/1135428

Change #1135433 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/dns@master] Add citoid CNAMEs for the Istio ingress

https://gerrit.wikimedia.org/r/1135433

Change #1135449 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] services: add extra fqdn to the citoid's ingress config

https://gerrit.wikimedia.org/r/1135449

Change #1135433 merged by Elukey:

[operations/dns@master] Add citoid-ingress CNAMEs for the Istio ingress

https://gerrit.wikimedia.org/r/1135433

Change #1135449 merged by Elukey:

[operations/deployment-charts@master] services: add extra fqdn to the citoid's ingress config

https://gerrit.wikimedia.org/r/1135449

Change #1135428 merged by Elukey:

[operations/deployment-charts@master] services: point rest-gateway to the ingress citoid endpoint

https://gerrit.wikimedia.org/r/1135428

Change #1135683 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] rest-gateway: enable ingress at route-level for citoid

https://gerrit.wikimedia.org/r/1135683

Change #1135683 merged by Elukey:

[operations/deployment-charts@master] rest-gateway: enable ingress at route-level for citoid

https://gerrit.wikimedia.org/r/1135683

Change #1133389 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] services: enable ingress for Kartotherian

https://gerrit.wikimedia.org/r/1133389