Migrate internal traffic to k8s
Open, In Progress, Medium, Public

Description

We need to progressively migrate our services' traffic so that they call the MediaWiki API in the mw-api-int cluster on k8s.

Right now we have (via this thanos query):

  • Mobileapps making 3k rps to the mediawiki API (!!!) <- Moved in 2nd stage
  • restbase making 600 rps <- Moved in 2nd stage
  • ores making 75-100 rps <- Deprecated
  • wikifeeds making ~ 70 rps <- Moved in 2nd stage
  • flink making ~ 40 rps <- Moved in 2nd stage

Everything else is basically marginal.

I propose we start moving all services on kubernetes to use mw-api-int now, with the exception of the ones named above.
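To put the figures above in proportion, here is a rough sketch that sums the quoted request rates and computes each caller's share. The numbers are the approximate values from the list; the ores range of 75-100 rps is reduced to its midpoint, which is an assumption made only for this calculation, and the function and variable names are illustrative.

```python
# Request rates quoted in the description, in requests per second.
# Approximate values ("~ 70") are taken at face value.
SECOND_STAGE_RPS = {
    "mobileapps": 3000,
    "restbase": 600,
    "wikifeeds": 70,
    "flink": 40,
}
# Assumption: the quoted 75-100 rps range is represented by its midpoint.
DEPRECATED_RPS = {"ores": 87.5}


def summarize(second_stage, deprecated):
    """Return (rps moved in the second stage, total rps, share per caller)."""
    deferred = sum(second_stage.values())
    total = deferred + sum(deprecated.values())
    all_callers = {**second_stage, **deprecated}
    shares = {name: rps / total for name, rps in all_callers.items()}
    return deferred, total, shares


deferred, total, shares = summarize(SECOND_STAGE_RPS, DEPRECATED_RPS)
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.1%} of the listed traffic")
```

With these inputs, mobileapps alone accounts for roughly four fifths of the listed traffic, which is consistent with leaving the named services out of the first stage.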

Kubernetes services calling mediawiki

Related Objects

Status       Subtype  Assigned         Task
Stalled               None
Open                  None
Open                  None
Open                  None
Stalled               None
Open                  None
Stalled               None
Stalled      Feature  None
Stalled               Krinkle
Open                  None
Stalled               None
Open                  None
In Progress           Clement_Goubert
Resolved              Clement_Goubert
Resolved              Clement_Goubert
Resolved              Clement_Goubert
Invalid               Clement_Goubert
Resolved              Joe
Resolved              Clement_Goubert
Resolved              Clement_Goubert
Resolved              Clement_Goubert
Resolved              Clement_Goubert
Resolved              Joe
Resolved              Joe
Resolved              Joe
Resolved              JMeybohm
Resolved              Joe
Resolved              Clement_Goubert
Resolved              Clement_Goubert
Resolved              Clement_Goubert
Declined              Clement_Goubert
Resolved              Clement_Goubert
Open                  elukey

Event Timeline

There are a very large number of changes, so older changes are hidden.
Clement_Goubert updated Other Assignee, added: Joe.
Clement_Goubert moved this task from Backlog to In Progress on the MW-on-K8s board.

Change 903595 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] P:services_proxy::envoy: Add mw-api-int

https://gerrit.wikimedia.org/r/903595

Change 903646 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] cxserver: Switch to mw-api-int-async on k8s

https://gerrit.wikimedia.org/r/903646

Change 904060 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] service_catalog: Add mw-api-int k8s service - 2

https://gerrit.wikimedia.org/r/904060

Change 904061 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] service_catalog: Add mw-api-int k8s service - 3

https://gerrit.wikimedia.org/r/904061

Change 904065 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/dns@master] mw-api-int: add geo and metafo records

https://gerrit.wikimedia.org/r/904065

Mentioned in SAL (#wikimedia-operations) [2023-03-29T09:57:03Z] <claime> Adding mw-api-int to service_catalog in service_setup - T333120

Change 903217 merged by Clément Goubert:

[operations/puppet@production] service_catalog: Add mw-api-int k8s service - 1

https://gerrit.wikimedia.org/r/903217

Mentioned in SAL (#wikimedia-operations) [2023-03-29T09:58:56Z] <claime> running puppet on O:kubernetes::worker and O:lvs::balancer - T333120

Mentioned in SAL (#wikimedia-operations) [2023-03-29T10:37:09Z] <claime> Switching mw-api-int to lvs_setup - T333120

Change 904060 merged by Clément Goubert:

[operations/puppet@production] service_catalog: Add mw-api-int k8s service - 2

https://gerrit.wikimedia.org/r/904060

Mentioned in SAL (#wikimedia-operations) [2023-03-29T10:41:03Z] <cgoubert@cumin1001> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T333120)

Mentioned in SAL (#wikimedia-operations) [2023-03-29T10:42:59Z] <cgoubert@cumin1001> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T333120)

Mentioned in SAL (#wikimedia-operations) [2023-03-29T10:46:23Z] <cgoubert@cumin1001> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T333120)

Mentioned in SAL (#wikimedia-operations) [2023-03-29T10:49:17Z] <cgoubert@cumin1001> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T333120)

Mentioned in SAL (#wikimedia-operations) [2023-03-29T10:50:22Z] <claime> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T333120)

Mentioned in SAL (#wikimedia-operations) [2023-03-29T10:50:57Z] <claime> Switching mw-api-int to production - T333120

Change 904061 merged by Clément Goubert:

[operations/puppet@production] service_catalog: Add mw-api-int k8s service - 3

https://gerrit.wikimedia.org/r/904061

Mentioned in SAL (#wikimedia-operations) [2023-03-29T10:52:30Z] <claime> Running puppet on dns-auth - T333120

Change 904065 merged by Clément Goubert:

[operations/dns@master] mw-api-int: add discovery records

https://gerrit.wikimedia.org/r/904065

Mentioned in SAL (#wikimedia-operations) [2023-03-29T10:58:30Z] <claime> authdns-update successful on all nodes - T333120

The mw-api-int and mw-api-int-ro services are now in production. We can proceed with creating the envoy listeners in https://gerrit.wikimedia.org/r/c/operations/puppet/+/903595/ and then switch services to use them.
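As a rough illustration of what "switching a service to use a listener" means on the client side, the sketch below builds a MediaWiki action API request addressed to a local proxy listener, with the target wiki selected through the Host header. The port number, the wiki hostname, and the helper name are all placeholders and are not taken from this task or from the puppet change; the real listener port is whatever that change defines. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder only: not the real mw-api-int listener port.
HYPOTHETICAL_LISTENER_PORT = 8080


def build_api_request(wiki_host, params, port=HYPOTHETICAL_LISTENER_PORT):
    """Build (without sending) an action API request via a local listener.

    The service talks to localhost:<port>; the Host header tells
    MediaWiki which wiki the request is for.
    """
    query = urlencode({**params, "format": "json"})
    url = f"http://localhost:{port}/w/api.php?{query}"
    return Request(url, headers={"Host": wiki_host})


req = build_api_request("en.wikipedia.org", {"action": "query", "meta": "siteinfo"})
print(req.full_url)
print(req.get_header("Host"))
```

In a setup like this the service only needs to know the local port; which cluster actually serves the request is decided by the listener configuration, so moving a caller to mw-api-int would be a configuration change rather than a code change.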

Change 903595 merged by Clément Goubert:

[operations/puppet@production] P:services_proxy::envoy: Add mw-api-int

https://gerrit.wikimedia.org/r/903595

Change 908542 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] admin_ng: Add mw-on-k8s Egress rules

https://gerrit.wikimedia.org/r/908542

Change 908542 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: Add mw-on-k8s Egress rules

https://gerrit.wikimedia.org/r/908542

Change 908553 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] cxserver: Add mesh egress

https://gerrit.wikimedia.org/r/908553

Change 908553 merged by jenkins-bot:

[operations/deployment-charts@master] cxserver: Add mesh egress

https://gerrit.wikimedia.org/r/908553