
Migrate restbase from mwapi-async to mw-api-int
Closed, Resolved, Public

Assigned To
Authored By
Clement_Goubert
Feb 22 2024, 1:06 PM
Referenced Files
F43529353: image.png
Mar 27 2024, 3:46 PM
F43515625: image.png
Mar 27 2024, 11:54 AM
F43515539: image.png
Mar 27 2024, 11:54 AM
F43515489: image.png
Mar 27 2024, 11:54 AM
F43443512: image.png
Mar 26 2024, 2:58 PM

Description

Reusing the mw-api-async-transition listener that was used for mobileapps, progressively move RESTBase's backend MediaWiki API calls to mw-api-int in three steps:

  • 10%
  • 50%
  • 100%
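The progressive split above can be pictured as a per-request weighted choice between the old and new backends. The sketch below is only an illustration, assuming each request is routed independently at random (roughly what Envoy's `weighted_clusters` route option does). It is not the puppet change or the actual mw-api-async-transition listener configuration, and `pick_backend` and `simulate` are hypothetical names.

```python
import random
from collections import Counter

# Backend names taken from the task title; the real split is done by an
# Envoy listener configured through puppet, not by application code.
BARE_METAL = "mwapi-async"
K8S = "mw-api-int"


def pick_backend(k8s_percent: int, rng: random.Random) -> str:
    """Route one request to mw-api-int with probability k8s_percent / 100."""
    if not 0 <= k8s_percent <= 100:
        raise ValueError("k8s_percent must be between 0 and 100")
    # randrange(100) yields 0..99, so exactly k8s_percent of the 100
    # possible values fall below the threshold.
    return K8S if rng.randrange(100) < k8s_percent else BARE_METAL


def simulate(k8s_percent: int, n: int = 100_000, seed: int = 0) -> float:
    """Return the observed fraction of n simulated requests sent to mw-api-int."""
    rng = random.Random(seed)
    counts = Counter(pick_backend(k8s_percent, rng) for _ in range(n))
    return counts[K8S] / n


if __name__ == "__main__":
    for step in (10, 50, 100):
        print(f"{step:3d}% target -> {simulate(step):.3f} observed")
```

Because the choice is made per request rather than per client, each step shifts a proportional slice of all backend calls, which makes the 10% stage a low-risk canary for every kind of request.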

Event Timeline

Clement_Goubert changed the task status from Open to In Progress. Feb 22 2024, 1:07 PM
Clement_Goubert triaged this task as Medium priority.
Clement_Goubert moved this task from Incoming 🐫 to this.quarter 🍕 on the serviceops board.

Change 1005756 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] restbase: Start moving mwapi calls to mw-on-k8s

https://gerrit.wikimedia.org/r/1005756

Mentioned in SAL (#wikimedia-operations) [2024-03-26T11:15:18Z] <claime> Stopping puppet on P:restbase to deploy 1005756 - T358213

Change #1005756 merged by Clément Goubert:

[operations/puppet@production] restbase: Start moving mwapi calls to mw-on-k8s

https://gerrit.wikimedia.org/r/1005756

Mentioned in SAL (#wikimedia-operations) [2024-03-26T11:19:13Z] <claime> enabling and running puppet on restbase2021.codfw.wmnet - T358213

Mentioned in SAL (#wikimedia-operations) [2024-03-26T11:24:04Z] <claime> enabling and running puppet on restbase1035.eqiad.wmnet - T358213

Change #1014493 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] restbase: Migrate backend traffic to mw-api-int

https://gerrit.wikimedia.org/r/1014493

Mentioned in SAL (#wikimedia-operations) [2024-03-26T12:54:15Z] <claime> enabling and running puppet on restbase2021.codfw.wmnet - T358213

Mentioned in SAL (#wikimedia-operations) [2024-03-26T12:54:20Z] <claime> enabling and running puppet on restbase1035.eqiad.wmnet - T358213

Mentioned in SAL (#wikimedia-operations) [2024-03-26T14:19:45Z] <claime> enabling and running puppet on P:restbase - T358213

Mentioned in SAL (#wikimedia-operations) [2024-03-26T14:20:51Z] <claime> Deploying split listener for 10% of backend restbase traffic to mw-api-int - T358213

10% of RESTBase's backend mwapi requests are now sent to mw-api-int.

[Graph: image.png]

Change #1015016 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] restbase: Moving 50% of mwapi calls to mw-on-k8s

https://gerrit.wikimedia.org/r/1015016

Things to keep an eye on:

  • Upstream error rate is higher on mw-api-int than on bare metal

[Graph: image.png]

  • Connection establishment time is much higher on mw-api-int

[Graph: image.png]

  • Upstream latencies are consistently higher on mw-api-int

[Graph: image.png]

As the Envoy listener configuration is the same for both backends, the difference may come from the TLS-terminating Envoy configuration in mw-on-k8s, since we see similar numbers for connections to mw-parsoid.
Some of it can also be explained by routing through Kubernetes, as connection establishment times from RESTBase to other k8s services (for example mobileapps or cxserver) average around 25 ms at p99.
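One way to test the TLS-termination hypothesis is to time the TCP connect and the TLS handshake of a fresh connection separately, from a RESTBase host, against each backend. The probe below is a minimal sketch under that assumption: it is a standalone measurement, not how Envoy computes the connection-establishment metric in the graphs, and the function names, host, port and optional CA bundle (`cafile`, for an internal PKI) are placeholders you would supply.

```python
import socket
import ssl
import time
from typing import Optional, Tuple


def time_tcp_connect(host: str, port: int, timeout: float = 5.0) -> float:
    """Seconds needed to establish a plain TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed


def time_tcp_then_tls(host: str, port: int, timeout: float = 5.0,
                      cafile: Optional[str] = None) -> Tuple[float, float]:
    """Return (tcp_seconds, tls_handshake_seconds) for one fresh connection."""
    # cafile=None falls back to the system trust store.
    ctx = ssl.create_default_context(cafile=cafile)
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        tcp_done = time.perf_counter()
        # wrap_socket() on an already-connected socket performs the TLS
        # handshake immediately (do_handshake_on_connect defaults to True).
        with ctx.wrap_socket(raw, server_hostname=host):
            tls_done = time.perf_counter()
    return tcp_done - start, tls_done - tcp_done
```

If the TLS share dominates for mw-api-int but not for bare metal, that points at the TLS termination; if the TCP share alone is higher, Kubernetes routing is the more likely contributor.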

In any case, I can't see an impact on client-side latency or request rate at 10% of traffic, and propose moving forward to 50%.
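The decision above (no visible client-side impact at 10%, so move to 50%) amounts to a go/no-go check between rollout steps. The sketch below encodes it with hypothetical 5% tolerances and a hypothetical `ClientMetrics` type; the task states no numeric criteria, and the actual check was made by reading dashboards, not by code.

```python
from dataclasses import dataclass

# Rollout steps from the task description (percent of traffic on mw-api-int).
STEPS = (10, 50, 100)


@dataclass
class ClientMetrics:
    """Client-side metrics for one observation window (hypothetical type)."""
    p99_latency_ms: float
    requests_per_second: float


def ok_to_advance(before: ClientMetrics, after: ClientMetrics,
                  max_latency_increase: float = 0.05,
                  max_rate_change: float = 0.05) -> bool:
    """True if p99 latency grew, and request rate moved, by no more than the given fractions."""
    if before.requests_per_second <= 0:
        raise ValueError("baseline request rate must be positive")
    latency_ok = after.p99_latency_ms <= before.p99_latency_ms * (1 + max_latency_increase)
    rate_delta = abs(after.requests_per_second - before.requests_per_second)
    rate_ok = rate_delta / before.requests_per_second <= max_rate_change
    return latency_ok and rate_ok


def next_step(current: int) -> int:
    """Next traffic percentage for mw-api-int; stays at 100 once reached."""
    i = STEPS.index(current)
    return STEPS[min(i + 1, len(STEPS) - 1)]


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    baseline = ClientMetrics(p99_latency_ms=200.0, requests_per_second=1000.0)
    at_10 = ClientMetrics(p99_latency_ms=204.0, requests_per_second=995.0)
    if ok_to_advance(baseline, at_10):
        print(next_step(10))  # prints 50
```

Note that the check deliberately looks at client-side metrics only: the higher upstream latencies and connection times on mw-api-int are tolerated as long as RESTBase's own clients do not see them.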

Mentioned in SAL (#wikimedia-operations) [2024-03-27T15:08:47Z] <claime> Disabling puppet on P:restbase - T358213

Change #1015016 merged by Clément Goubert:

[operations/puppet@production] restbase: Moving 50% of mwapi calls to mw-on-k8s

https://gerrit.wikimedia.org/r/1015016

Mentioned in SAL (#wikimedia-operations) [2024-03-27T15:11:28Z] <claime> enabling and running puppet on restbase2021.codfw.wmnet - T358213

Mentioned in SAL (#wikimedia-operations) [2024-03-27T15:14:26Z] <claime> enabling and running puppet on restbase1035.eqiad.wmnet - T358213

Mentioned in SAL (#wikimedia-operations) [2024-03-27T15:17:15Z] <claime> enabling and running puppet on P:restbase - T358213

Mentioned in SAL (#wikimedia-operations) [2024-03-27T15:51:21Z] <claime> 50% of backend RESTbase traffic to mw-api-int - T358213

Change #1015278 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mw-api-int: Double envoy concurrency

https://gerrit.wikimedia.org/r/1015278

Change #1015278 merged by jenkins-bot:

[operations/deployment-charts@master] mw-api-int: Double envoy concurrency

https://gerrit.wikimedia.org/r/1015278

Mentioned in SAL (#wikimedia-operations) [2024-03-28T11:04:37Z] <claime> RESTbase: Migrate backend traffic to mw-api-int - T358213

Mentioned in SAL (#wikimedia-operations) [2024-03-28T11:04:44Z] <claime> Disabling puppet on P:restbase - T358213

Change #1014493 merged by Clément Goubert:

[operations/puppet@production] restbase: Migrate backend traffic to mw-api-int

https://gerrit.wikimedia.org/r/1014493

Mentioned in SAL (#wikimedia-operations) [2024-03-28T11:09:50Z] <claime> enabling and running puppet on restbase2021.codfw.wmnet - T358213

Mentioned in SAL (#wikimedia-operations) [2024-03-28T11:12:27Z] <claime> enabling and running puppet on restbase1035.eqiad.wmnet - T358213

Mentioned in SAL (#wikimedia-operations) [2024-03-28T11:15:26Z] <claime> enabling and running puppet on P:restbase - T358213