
Migrate rdf-streaming-updater to connect to mw-on-k8s
Closed, Resolved · Public

Description

We are going to migrate the requests from flink to the MediaWiki API so that they go to MediaWiki on Kubernetes rather than to the traditional API cluster.

Given that flink does on average about 20 rps per datacenter, we should be able to add resources in line with mw-api-ext, or less.
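As a rough illustration of the sizing argument, the headroom needed for ~20 rps per datacenter can be estimated with back-of-envelope arithmetic. The per-replica throughput figure below is a hypothetical assumption for illustration, not a number from this task:

```python
import math

# Observed load from the task description: ~20 requests/second per datacenter.
flink_rps_per_dc = 20

# Hypothetical sustained throughput of a single mw-api-int replica;
# the real figure depends on php-fpm worker count and request cost.
assumed_rps_per_replica = 50

# Replicas of headroom needed to absorb the extra traffic, rounded up.
extra_replicas = math.ceil(flink_rps_per_dc / assumed_rps_per_replica)
print(extra_replicas)  # 1 replica of headroom under these assumptions
```

Under these (assumed) numbers the extra load is well within a single replica's capacity, which is consistent with the claim that resources "in line with mw-api-ext or less" should suffice.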

Event Timeline

Joe renamed this task from "Migrate flink-cluster-taskmanager to connect to mw-on-k8s" to "Migrate rdf-streaming-updater to connect to mw-on-k8s". (Jul 19 2023, 1:45 PM)
Joe created this task.

@dcausse not sure if you're the right person to ask (if not, apologies), but I wanted to know whether we're making any write requests to MediaWiki from the rdf updater, or just reading data.

If it's the latter, we can make flink talk to the local MW API cluster in all datacenters, distributing resources better and reducing latencies.
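The read/write distinction matters because only reads can safely be served by the local datacenter; writes must still reach the primary. A minimal sketch of that routing decision (the endpoint addresses and ports here are hypothetical placeholders, not the actual services_proxy configuration):

```python
# Hypothetical mapping of datacenter -> local read-only MW API endpoint.
# In a real deployment this is resolved by the local service proxy,
# not a literal dict in application code.
LOCAL_RO_ENDPOINTS = {
    "eqiad": "http://localhost:6500/w/api.php",
    "codfw": "http://localhost:6500/w/api.php",
}

def mw_api_endpoint(datacenter: str, is_write: bool) -> str:
    """Reads go to the local datacenter's read-only endpoint; writes
    must still be routed to the primary's read-write endpoint."""
    if is_write:
        # Hypothetical primary-only read-write endpoint.
        return "http://localhost:6501/w/api.php"
    return LOCAL_RO_ENDPOINTS[datacenter]

print(mw_api_endpoint("codfw", is_write=False))
```

Since the rdf updater turns out to be read-only (see below in the thread), every request can take the local read-only path.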

Indeed, we only do reads; we actually tested with the api-ro endpoint initially, so it should work, and I'm all for it. @bking or I can take care of this if that helps.

I'll have to do some prep work, but sure, once I'm ready I'm happy to let you drive the migration :)

Change 939700 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] services_proxy: add mw-api-int-async-ro

https://gerrit.wikimedia.org/r/939700

Change 939701 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] mw-api-int: bump replicas to 8

https://gerrit.wikimedia.org/r/939701

Change 939702 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] rdf-streaming-updater: move to mw-api-int, use readonly endpoint

https://gerrit.wikimedia.org/r/939702

Change 939716 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] mw-api-int: increase namespace limits

https://gerrit.wikimedia.org/r/939716

Change 939700 merged by Giuseppe Lavagetto:

[operations/puppet@production] services_proxy: add mw-api-int-async-ro

https://gerrit.wikimedia.org/r/939700

Joe changed the task status from Open to In Progress. (Jul 20 2023, 11:57 AM)
Joe claimed this task.
Joe triaged this task as Medium priority.
Joe moved this task from Incoming 🐫 to Doing 😎 on the serviceops board.

Change 939716 merged by jenkins-bot:

[operations/deployment-charts@master] mw-api-int: increase namespace limits

https://gerrit.wikimedia.org/r/939716

Change 939701 merged by jenkins-bot:

[operations/deployment-charts@master] mw-api-int: bump replicas to 8

https://gerrit.wikimedia.org/r/939701

Change 939702 merged by jenkins-bot:

[operations/deployment-charts@master] rdf-streaming-updater: move to mw-api-int, use readonly endpoint

https://gerrit.wikimedia.org/r/939702

Change 940870 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] rdf-streaming-updater (dse-k8s test): use mw-api-int-async-ro

https://gerrit.wikimedia.org/r/940870

rdf-streaming-updater now uses the read-only endpoint on k8s, which should also make its API usage more efficient.

Change 940870 merged by jenkins-bot:

[operations/deployment-charts@master] rdf-streaming-updater (dse-k8s test): use mw-api-int-async-ro

https://gerrit.wikimedia.org/r/940870

Change 940886 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mw-api-int: Raise php container CPU limits

https://gerrit.wikimedia.org/r/940886

Change 940886 merged by jenkins-bot:

[operations/deployment-charts@master] mw-api-int: Raise php container CPU limits

https://gerrit.wikimedia.org/r/940886

Change 940888 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] admin_ng: Raise max cpu per pod to 10 for mw-api-int

https://gerrit.wikimedia.org/r/940888

Change 940888 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: Raise max cpu per pod to 10 for mw-api-int

https://gerrit.wikimedia.org/r/940888

We've been experiencing CPU throttling on mw-api-int; raising the container's CPU limit has helped, but has not fixed the issue.

(attached screenshot: image.png, 180 KB)

I will raise the number of replicas for this deployment; however, looking at the other deployments, I can see that they also get throttled.
We may want to consider removing the CPU limits for the latency-sensitive mw-on-k8s deployments.
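For context on why CPU limits cause throttling: under the Linux CFS bandwidth controller, a limit of N cores becomes a quota of N × 100 ms of CPU time per 100 ms scheduling period, and a container that exhausts its quota is paused until the next period even if host CPUs are idle. A minimal sketch of the arithmetic (the limit and demand figures below are illustrative, not measurements from this task):

```python
# CFS bandwidth control: a CPU limit translates into a quota of
# CPU-time per fixed scheduling period (default 100 ms).
CFS_PERIOD_US = 100_000

def throttled_time_us(limit_cores: float, cpu_demand_cores: float) -> float:
    """Microseconds of CPU time per period the workload is denied,
    given its CPU limit and the CPU it would use if unconstrained."""
    quota_us = limit_cores * CFS_PERIOD_US
    demand_us = cpu_demand_cores * CFS_PERIOD_US
    return max(0.0, demand_us - quota_us)

# Illustrative: a container limited to 8 cores that briefly wants
# 10 cores is denied 2 cores' worth of CPU time every period,
# which shows up as throttled periods in cgroup stats.
print(throttled_time_us(8.0, 10.0))  # 200000.0
```

This is why raising the limit (or the replica count, which lowers per-pod demand) reduces throttling, and why removing limits entirely eliminates it for bursty, latency-sensitive workloads.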

Change 941900 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mw-api-int: Raise number of replicas to 10

https://gerrit.wikimedia.org/r/941900

Change 941900 abandoned by Clément Goubert:

[operations/deployment-charts@master] mw-api-int: Raise number of replicas to 10

Reason:

We've gone way beyond 10 replicas by now

https://gerrit.wikimedia.org/r/941900