
Evacuate all kafka-mirrormaker instances to Kubernetes
Closed, Resolved (Public)

Assigned To
Authored By
brouberol
Feb 13 2026, 2:35 PM
Referenced Files
F73156838: Screenshot 2026-03-19 at 17.49.57.png
Mar 19 2026, 4:50 PM
Restricted File
Mar 19 2026, 10:16 AM
Restricted File
Mar 11 2026, 11:57 AM
F72805168: Screenshot 2026-03-11 at 11.29.40.png
Mar 11 2026, 10:30 AM
F72804521: Screenshot 2026-03-11 at 10.09.32.png
Mar 11 2026, 9:10 AM
F72573993: Screenshot 2026-03-05 at 09.45.54.png
Mar 5 2026, 8:46 AM
F72435655: Screenshot 2026-02-26 at 09.35.14.png
Feb 26 2026, 8:37 AM

Description

We've been able to run Kafka MirrorMaker v1 (MM1) on Kubernetes since https://phabricator.wikimedia.org/T304373. Currently, only one MM1 instance runs on Kubernetes:

brouberol@deploy2002:~$ kube-env kafka-mirrormaker dse-k8s-eqiad
brouberol@deploy2002:~$ k get pod
NAME                                                              READY   STATUS    RESTARTS   AGE
kafka-mirrormaker-logging-eqiad-to-jumbo-eqiad-7f75b974c6-rm2bw   1/1     Running   0          24h

and the other instances run alongside the brokers, on the various kafka clusters.

We should stop running these MM1 instances directly on the broker hosts themselves, as it will make the kafka upgrade plan easier.

We should:

  • agree on which k8s cluster should run all MM1 instances
  • migrate all remaining MM1 instances to this k8s cluster

Details

Related Changes in Gerrit:
Repo                          Branch      Lines +/-
operations/cookbooks          master      +0 -78
operations/puppet             production  +2 -1
operations/deployment-charts  master      +1 -6
operations/deployment-charts  master      +22 -0
operations/deployment-charts  master      +10 -14
operations/puppet             production  +2 -1
operations/puppet             production  +2 -1
operations/deployment-charts  master      +30 -0
operations/deployment-charts  master      +31 -1
operations/deployment-charts  master      +2 -1
operations/deployment-charts  master      +6 -2
operations/deployment-charts  master      +1 -1
operations/deployment-charts  master      +70 -7
operations/deployment-charts  master      +6 -1
operations/deployment-charts  master      +4 -4
operations/deployment-charts  master      +4 -73
operations/deployment-charts  master      +4 -8
operations/puppet             production  +2 -0
operations/deployment-charts  master      +3 -3
operations/puppet             production  +1 -0
operations/deployment-charts  master      +89 -0
operations/deployment-charts  master      +1 -0
operations/puppet             production  +4 -0
operations/deployment-charts  master      +29 -4

Event Timeline


Change #1247066 merged by Brouberol:

[operations/deployment-charts@master] kafka-mirrormkaer: ensure consumer group names are the same than on the puppetized kafka clusters

https://gerrit.wikimedia.org/r/1247066

The plan mostly worked as expected. I first exported the current offsets of the consumer group to a CSV file:

brouberol@kafka-logging1005:~$ kafka consumer-groups --group mirrormaker-logging-eqiad-to-jumbo-eqiad --export --reset-offsets --to-current --dry-run  --all-topics > offsets.csv

I also had to remove the first line of offsets.csv, which read:

kafka-consumer-groups --bootstrap-server kafka-logging1001.eqiad.wmnet:9092,kafka-logging1002.eqiad.wmnet:9092,kafka-logging1003.eqiad.wmnet:9092,kafka-logging1004.eqiad.wmnet:9092,kafka-logging1005.eqiad.wmnet:9092 --group kafka-mirror-logging-eqiad_to_jumbo-eqiad --reset-offsets --from-file offsets.txt --all-topics --execute

so that the file contained only valid CSV data.
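This manual cleanup could also be scripted. The following is a minimal Python sketch, not the procedure actually used here: it keeps only lines shaped like `topic,partition,offset` (the row format assumed from the offsets shown in this task) and drops anything else, such as an echoed command line. The function name `clean_offsets_lines` is hypothetical.

```python
import re

# One exported row is assumed to look like: <topic>,<partition>,<offset>
ROW = re.compile(r"^[^,\s]+,\d+,\d+$")


def clean_offsets_lines(lines):
    """Return only the lines that look like topic,partition,offset rows."""
    return [line.strip() for line in lines if ROW.match(line.strip())]
```

Calling `clean_offsets_lines(open("offsets.csv"))` and writing the result back would leave only the rows; anything containing whitespace or lacking two numeric trailing fields is discarded.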

I then deployed the new kafka-mirrormaker chart on dse-k8s-eqiad and waited until I could see the new consumer group in the output of

brouberol@kafka-logging1005:~$ kafka consumer-groups --list

and then scaled down the deployment to 0. At that point I was able to run:

brouberol@kafka-logging1005:~$ kafka consumer-groups --group kafka-mirror-logging-eqiad_to_jumbo-eqiad --reset-offsets --from-file offsets.csv  --all-topics --execute
eqiad.eventgate-logging-external.error.validation,2,1864
eqiad.w3c.reportingapi.network_error,2,1218670819
eqiad.kaios_app.error,2,7693
eqiad.w3c.reportingapi.network_error,3,1214012652
eqiad.eventgate-logging-external.test.event,3,10512329
eqiad.kaios_app.error,3,7895
eqiad.mediawiki.client.error,3,57602083
eqiad.w3c.reportingapi.network_error,1,1218666300
eqiad.mediawiki.client.error,4,57771210
eqiad.mediawiki.client.error,1,57751009
eqiad.kaios_app.error,4,7574
eqiad.mediawiki.client.error,0,57621501
eqiad.eventgate-logging-external.test.event,5,10561927
eqiad.w3c.reportingapi.network_error,5,1218328096
eqiad.eventgate-logging-external.error.validation,3,1902
eqiad.kaios_app.error,1,7885
eqiad.eventgate-logging-external.test.event,0,10512154
eqiad.kaios_app.error,5,7676
eqiad.mediawiki.client.error,2,57861166
eqiad.w3c.reportingapi.network_error,4,1218110162
eqiad.mediawiki.client.error,5,57791887
eqiad.kaios_app.error,0,7743
eqiad.eventgate-logging-external.test.event,4,10557441
eqiad.eventgate-logging-external.test.event,1,10563690
eqiad.eventgate-logging-external.error.validation,1,1883
eqiad.eventgate-logging-external.error.validation,5,1873
eqiad.eventgate-logging-external.error.validation,2,1864
eqiad.eventgate-logging-external.test.event,3,10512329
eqiad.kaios_app.error,3,7895
eqiad.mediawiki.client.error,3,57602083
eqiad.w3c.reportingapi.network_error,1,1218666300
eqiad.mediawiki.client.error,4,57771210
eqiad.mediawiki.client.error,1,57751009
eqiad.kaios_app.error,4,7574
eqiad.mediawiki.client.error,0,57621501
eqiad.eventgate-logging-external.test.event,5,10561927
eqiad.w3c.reportingapi.network_error,5,1218328096
eqiad.eventgate-logging-external.error.validation,3,1902
eqiad.kaios_app.error,1,7885
eqiad.eventgate-logging-external.test.event,0,10512154
eqiad.kaios_app.error,5,7676
eqiad.mediawiki.client.error,2,57861166
eqiad.w3c.reportingapi.network_error,4,1218110162
eqiad.mediawiki.client.error,5,57791887
eqiad.kaios_app.error,0,7743
eqiad.eventgate-logging-external.test.event,4,10557441
eqiad.eventgate-logging-external.test.event,1,10563690
eqiad.eventgate-logging-external.error.validation,1,1883
eqiad.eventgate-logging-external.error.validation,5,1873
eqiad.eventgate-logging-external.test.event,2,10585887
eqiad.w3c.reportingapi.network_error,0,1213970811
eqiad.eventgate-logging-external.error.validation,4,1887
eqiad.eventgate-logging-external.error.validation,0,1882
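One way to double-check a reset like this is to compare the rows printed by `--execute` with the rows in the exported file. This check was not part of the procedure described here; the sketch below is only an illustration, and the helper names `parse_offsets` and `diff_offsets` are hypothetical. Identical repeated rows are tolerated, while conflicting ones raise an error.

```python
def parse_offsets(lines):
    """Map (topic, partition) -> offset from topic,partition,offset rows."""
    offsets = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        topic, partition, offset = line.rsplit(",", 2)
        key = (topic, int(partition))
        value = int(offset)
        # The same row may appear twice, but it must not disagree with itself.
        if offsets.get(key, value) != value:
            raise ValueError(f"conflicting offsets for {key}")
        offsets[key] = value
    return offsets


def diff_offsets(expected, actual):
    """Return the keys whose offsets differ or are missing on either side."""
    keys = set(expected) | set(actual)
    return sorted(k for k in keys if expected.get(k) != actual.get(k))
```

An empty list from `diff_offsets(exported, applied)` would suggest that the offsets applied to the new group match the exported ones.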

We can see that new messages are coming into kafka-jumbo-eqiad just fine:

brouberol@kafka-jumbo1017:~$ kafkacat -C -b $KAFKA_BOOTSTRAP_SERVERS -t eqiad.w3c.reportingapi.network_error -c 1 -o end | jq .meta.request_id
"fd6c8b66-795c-4a61-b357-027d4ff5f311"
brouberol@kafka-jumbo1017:~$ kafkacat -C -b $KAFKA_BOOTSTRAP_SERVERS -t eqiad.w3c.reportingapi.network_error -c 1 -o end | jq .meta.request_id
"c3b3868d-7172-42c1-9df3-e8656beddbf9"

Screenshot 2026-03-05 at 09.45.54.png (1,399×833 px, 104 KB)

We are now ready to seamlessly move the other Kafka MirrorMaker instances to Kubernetes. The open questions are:

  • can we move these existing mirrormaker instances from dse-k8s-eqiad/codfw to aux-k8s-eqiad/codfw? This should be seamless as the consumer group id would be the same
  • can we deploy each consumer group in a multi-dc fashion? Again, my understanding is that this would create an 8-member consumer group (or we could even reduce the number of consumers from 4 to 2 in each deployment) spanning 2 DCs and should be just fine ™

This is the next mirrormaker instance in line for being migrated to k8s:

brouberol@kafka-jumbo1017:~$ kafka consumer-groups --describe --state --group kafka-mirror-jumbo-eqiad_to_test-eqiad
COORDINATOR (ID)                        ASSIGNMENT-STRATEGY       STATE                #MEMBERS
kafka-jumbo1016.eqiad.wmnet:9092 (1016) roundrobin                Stable               5

Having 5 members is probably overkill (only 2 of them are assigned a partition), so we could assign a single stream per DC:

brouberol@kafka-jumbo1017:~$ kafka consumer-groups --describe --group kafka-mirror-jumbo-eqiad_to_test-eqiad --members
CONSUMER-ID                                                                   HOST            CLIENT-ID                                #PARTITIONS
kafka-mirror-jumbo-eqiad_to_test-eqiad-0-b8be1d49-fba0-41e7-8132-7e14553a4e76 /10.64.16.164   kafka-mirror-jumbo-eqiad_to_test-eqiad-0 0
kafka-mirror-jumbo-eqiad_to_test-eqiad-0-d21dfada-7d89-413a-a83b-4c496c7a8eb1 /10.64.16.146   kafka-mirror-jumbo-eqiad_to_test-eqiad-0 0
kafka-mirror-jumbo-eqiad_to_test-eqiad-0-20d4af58-9c31-4b37-8909-8efa599ec857 /10.64.16.158   kafka-mirror-jumbo-eqiad_to_test-eqiad-0 1
kafka-mirror-jumbo-eqiad_to_test-eqiad-0-92f662b3-948a-4dd8-9484-797b8031abae /10.64.16.165   kafka-mirror-jumbo-eqiad_to_test-eqiad-0 0
kafka-mirror-jumbo-eqiad_to_test-eqiad-0-006426fc-8fbe-49f3-a59f-3f1ea0221b45 /10.64.16.163   kafka-mirror-jumbo-eqiad_to_test-eqiad-0 1
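A simplified model of round-robin assignment illustrates why extra members sit idle when there are fewer partitions than members. The sketch below is not Kafka's assignor implementation: it assumes every member subscribes to the same topics, and simply deals sorted partitions to sorted members. The member ids and topic name in the usage note are hypothetical, and the total of 2 partitions is an inference from the #PARTITIONS column above, not something stated in this task.

```python
from itertools import cycle


def round_robin_assign(members, partitions):
    """Deal sorted partitions to sorted members, one at a time."""
    assignment = {member: [] for member in members}
    turn = cycle(sorted(members))
    for partition in sorted(partitions):
        assignment[next(turn)].append(partition)
    return assignment
```

With 5 members and partitions `[("example.topic", 0), ("example.topic", 1)]`, two members receive one partition each and the other three receive none, which is the shape seen in the output above.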

Change #1248401 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] deployment_server: provision the kafka-mirrormaker kubeconfigs in the aux clusters

https://gerrit.wikimedia.org/r/1248401

Change #1248404 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] aux-k8s: define the kafka-mirrormaker namespace

https://gerrit.wikimedia.org/r/1248404

Change #1248405 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] aux-k8s: define the kafka-mirrormaker-jumbo-eqiad-to-test-eqiad releases

https://gerrit.wikimedia.org/r/1248405

Change #1248401 merged by Brouberol:

[operations/puppet@production] deployment_server: add the kafka-mirrormaker kubeconfigs in the aux clusters

https://gerrit.wikimedia.org/r/1248401

Change #1248404 merged by Brouberol:

[operations/deployment-charts@master] aux-k8s: define the kafka-mirrormaker namespace

https://gerrit.wikimedia.org/r/1248404

Change #1248405 merged by Brouberol:

[operations/deployment-charts@master] aux-k8s: define the kafka-mirrormaker-jumbo-eqiad-to-test-eqiad releases

https://gerrit.wikimedia.org/r/1248405

Change #1249215 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] kafka-test: disable the mirrormaker instance

https://gerrit.wikimedia.org/r/1249215

Change #1249215 merged by Brouberol:

[operations/puppet@production] kafka-test: disable the mirrormaker instance

https://gerrit.wikimedia.org/r/1249215

Change #1249235 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] Reduce the allotted memory to the mm container from 8GB to 3GB

https://gerrit.wikimedia.org/r/1249235

Change #1249235 merged by Brouberol:

[operations/deployment-charts@master] Reduce the allotted memory to the mm container from 8GB to 3GB

https://gerrit.wikimedia.org/r/1249235

Change #1249240 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] kafka: allow ingress traffic to test/jumbo clusters from the aux k8s clusters

https://gerrit.wikimedia.org/r/1249240

Change #1249240 merged by Brouberol:

[operations/puppet@production] kafka: allow ingress traffic to test/jumbo clusters from the aux k8s clusters

https://gerrit.wikimedia.org/r/1249240

We successfully migrated the jumbo-eqiad_to_test-eqiad MM instance to both aux k8s clusters!

Before doing anything else:

jumbo1017.eqiad.wmnet:9092,kafka-jumbo1018.eqiad.wmnet:9092 --describe --group kafka-mirror-jumbo-eqiad_to_test-eqiad --state

COORDINATOR (ID)                        ASSIGNMENT-STRATEGY       STATE                #MEMBERS
kafka-jumbo1016.eqiad.wmnet:9092 (1016)                           Stable                5
brouberol@kafka-jumbo1017:~$

After having disabled Mirrormaker on kafka-test:

brouberol@kafka-jumbo1017:~$ kafka consumer-groups --describe --group kafka-mirror-jumbo-eqiad_to_test-eqiad --state
Consumer group 'kafka-mirror-jumbo-eqiad_to_test-eqiad' has no active members.

COORDINATOR (ID)                        ASSIGNMENT-STRATEGY       STATE                #MEMBERS
kafka-jumbo1016.eqiad.wmnet:9092 (1016)                           Empty                0

After having deployed kafka-mirrormaker-jumbo-eqiad-to-test-eqiad on aux-k8s-eqiad:

brouberol@kafka-jumbo1017:~$ kafka consumer-groups --describe --group kafka-mirror-jumbo-eqiad_to_test-eqiad --state
COORDINATOR (ID)                        ASSIGNMENT-STRATEGY       STATE                #MEMBERS
kafka-jumbo1016.eqiad.wmnet:9092 (1016) roundrobin                Stable               1

After having deployed kafka-mirrormaker-jumbo-eqiad-to-test-eqiad on aux-k8s-codfw:

brouberol@kafka-jumbo1017:~$ kafka consumer-groups --describe --group kafka-mirror-jumbo-eqiad_to_test-eqiad --state
COORDINATOR (ID)                        ASSIGNMENT-STRATEGY       STATE                #MEMBERS
kafka-jumbo1016.eqiad.wmnet:9092 (1016) roundrobin                Stable               2
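The progression above (Stable with 5 members, then Empty with 0, then Stable with 1 and 2) could be checked by a script between steps. The following Python sketch parses the text printed by `consumer-groups --describe --state`, assuming the column layout shown in this task; the function name `parse_group_state` is hypothetical and the parsing is deliberately loose.

```python
def parse_group_state(output):
    """Return (state, members) from `consumer-groups --describe --state` text."""
    for line in output.splitlines():
        fields = line.split()
        # The data row ends with "<STATE> <#MEMBERS>"; the header and the
        # informational lines do not end with an integer.
        if len(fields) >= 2 and fields[-1].isdigit():
            return fields[-2], int(fields[-1])
    raise ValueError("no consumer group state row found")
```

A migration step could, for instance, wait for `("Empty", 0)` before deploying the Kubernetes release, then for `("Stable", 1)` afterwards.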

Taking into account the remarks from the latest Kafka SIG meeting, and especially @Ottomata's:

IIRC, it is better to run MirrorMaker in the target DC, not the source DC. Consumer state can more easily recover from cross-DC issues, whereas producers have to buffer. So it is better to consume cross-DC than to produce cross-DC.

that leaves us with this layout:

Screenshot 2026-03-11 at 10.09.32.png (1,174×1,100 px, 78 KB)

Actually, as @JAllemandou pointed out to me, the previous graph lacked the multi-DC consumption of kafka-main. This new one should work better: {F73119238}

flowchart TD
  subgraph eqiad
    kafka-main-eqiad
    kafka-test-eqiad
    kafka-logging-eqiad
    kafka-jumbo-eqiad
    aux-k8s-eqiad
  end

  subgraph codfw
    kafka-main-codfw
    kafka-logging-codfw
    aux-k8s-codfw
  end

  kafka-main-codfw ----> aux-k8s-eqiad ----> kafka-main-eqiad
  kafka-main-codfw ----> aux-k8s-codfw ----> kafka-main-eqiad

  kafka-main-eqiad ----> aux-k8s-eqiad ----> kafka-main-codfw
  kafka-main-eqiad ----> aux-k8s-codfw ----> kafka-main-codfw

  kafka-logging-eqiad ----> aux-k8s-eqiad ----> kafka-jumbo-eqiad
  kafka-logging-codfw ----> aux-k8s-eqiad ----> kafka-jumbo-eqiad

  kafka-jumbo-eqiad ----> aux-k8s-eqiad ----> kafka-test-eqiad
  kafka-main-eqiad  ----> aux-k8s-eqiad ----> kafka-jumbo-eqiad

  linkStyle 0,1,2,3 stroke:#ff0000
  linkStyle 4,5,6,7 stroke:#0000ff
  linkStyle 8,9     stroke:#00ff00
  linkStyle 10,11   stroke:#00ffff
  linkStyle 12,13   stroke:#eedd33
  linkStyle 14,15   stroke:#a444cc
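The layout in the flowchart can also be written down as data and checked against the "run in the target DC" guideline. This is only a sketch: the tuples are transcribed from the flowchart, while the helper names and the rule that the DC is the last dash-separated part of a cluster name are assumptions made for illustration.

```python
# Each mirror: (source cluster, target cluster, aux k8s clusters running it).
MIRRORS = [
    ("kafka-main-codfw", "kafka-main-eqiad", ["aux-k8s-eqiad", "aux-k8s-codfw"]),
    ("kafka-main-eqiad", "kafka-main-codfw", ["aux-k8s-eqiad", "aux-k8s-codfw"]),
    ("kafka-logging-eqiad", "kafka-jumbo-eqiad", ["aux-k8s-eqiad"]),
    ("kafka-logging-codfw", "kafka-jumbo-eqiad", ["aux-k8s-eqiad"]),
    ("kafka-jumbo-eqiad", "kafka-test-eqiad", ["aux-k8s-eqiad"]),
    ("kafka-main-eqiad", "kafka-jumbo-eqiad", ["aux-k8s-eqiad"]),
]


def dc_of(name):
    """Datacenter suffix of a cluster name, e.g. 'kafka-main-eqiad' -> 'eqiad'."""
    return name.rsplit("-", 1)[1]


def runs_in_target_dc(mirror):
    """True if at least one deployment of the mirror is in the target's DC."""
    _source, target, runners = mirror
    return dc_of(target) in {dc_of(r) for r in runners}


def cross_dc_producers(mirror):
    """Deployments that would produce across DCs (runner DC != target DC)."""
    _source, target, runners = mirror
    return [r for r in runners if dc_of(r) != dc_of(target)]
```

Under these assumptions every mirror has a deployment in its target DC, and only the two kafka-main mirrors, which are deployed in both DCs, also have a deployment that produces across DCs.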

Change #1250586 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] kafka-mirrormaker: only deploy jumbo-eqiad->test-eqiad to eqiad

https://gerrit.wikimedia.org/r/1250586

Change #1250587 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] kafka-mirrormaker: migrate logging-{eqiad,codfw}->jumbo-eqiad to aux-eqiad

https://gerrit.wikimedia.org/r/1250587

Change #1250586 merged by jenkins-bot:

[operations/deployment-charts@master] kafka-mirrormaker: only deploy jumbo-eqiad->test-eqiad to eqiad

https://gerrit.wikimedia.org/r/1250586

Change #1250587 merged by jenkins-bot:

[operations/deployment-charts@master] kafka-mirrormaker: migrate logging-{eqiad,codfw}->jumbo-eqiad to aux-eqiad

https://gerrit.wikimedia.org/r/1250587

Change #1250631 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] kafka-mirrormaker: allow multiple releases to be installed in the same namespace

https://gerrit.wikimedia.org/r/1250631

Sweeeet!

Just FYI for awareness: T276972: Set up cross DC topic mirroring for Kafka logging clusters. (kafka logging mirror maker cross dc is different than other cross dc stuff).
This is not actionable or relevant to this current effort, just didn't want to lose that context.

Change #1250631 merged by Brouberol:

[operations/deployment-charts@master] kafka-mirrormaker: allow multiple releases to be installed in the same namespace

https://gerrit.wikimedia.org/r/1250631

Change #1255620 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: add missing releases

https://gerrit.wikimedia.org/r/1255620

Change #1255620 merged by Brouberol:

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: add missing releases

https://gerrit.wikimedia.org/r/1255620

The kafka logging MM instances are now all running in aux-k8s-eqiad/kafka-mirrormaker:

brouberol@deploy2002:~$ kube-env kafka-mirrormaker aux-k8s-eqiad
brouberol@deploy2002:~$ kubectl get pod
NAME                                                              READY   STATUS    RESTARTS   AGE
kafka-mirrormaker-jumbo-eqiad-to-test-eqiad-5c5c7958bf-wzsnm      1/1     Running   0          5d23h
kafka-mirrormaker-logging-codfw-to-jumbo-eqiad-6b6558d7c8-5728c   1/1     Running   0          37m
kafka-mirrormaker-logging-eqiad-to-jumbo-eqiad-86f6ff6b54-6ns8d   1/1     Running   0          39m
brouberol@kafka-logging1005:~$ kafka consumer-groups --group kafka-mirror-logging-eqiad_to_jumbo-eqiad --describe --state
COORDINATOR (ID)                          ASSIGNMENT-STRATEGY       STATE                #MEMBERS
kafka-logging1004.eqiad.wmnet:9092 (1004) roundrobin                Stable               4
brouberol@kafka-logging1005:~$ kafka consumer-groups --group kafka-mirror-logging-eqiad_to_jumbo-eqiad --describe --members
CONSUMER-ID                                                                      HOST            CLIENT-ID                                   #PARTITIONS
kafka-mirror-logging-eqiad_to_jumbo-eqiad-0-ae42d9e6-da42-4b0b-a808-cae40faff70f /10.67.80.174   kafka-mirror-logging-eqiad_to_jumbo-eqiad-0 12
kafka-mirror-logging-eqiad_to_jumbo-eqiad-2-2c8f564f-9647-4f7b-8369-45902d2b07ef /10.67.80.174   kafka-mirror-logging-eqiad_to_jumbo-eqiad-2 12
kafka-mirror-logging-eqiad_to_jumbo-eqiad-3-b0ceb0f1-510d-44e1-87cf-879d3a187325 /10.67.80.174   kafka-mirror-logging-eqiad_to_jumbo-eqiad-3 12
kafka-mirror-logging-eqiad_to_jumbo-eqiad-1-6ee9cbe0-9746-40d0-b88d-50dfef56396a /10.67.80.174   kafka-mirror-logging-eqiad_to_jumbo-eqiad-1 12
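To see at a glance how the load is spread, the `--members` output can be summed per host. The sketch below assumes the four-column layout shown above (consumer id, host, client id, partition count); the function name `partitions_per_host` is hypothetical.

```python
from collections import defaultdict


def partitions_per_host(output):
    """Sum the #PARTITIONS column per HOST from `--describe --members` text."""
    totals = defaultdict(int)
    for line in output.splitlines():
        fields = line.split()
        # Data rows have 4 columns; the header's last column is not numeric.
        if len(fields) == 4 and fields[3].isdigit():
            totals[fields[1].lstrip("/")] += int(fields[3])
    return dict(totals)
```

For the output above, all four consumers share the same pod IP, so the whole group's partitions would be attributed to a single host.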

Change #1255656 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] kafka-main-codfw: disable mirroring to kafka-main-eqiad

https://gerrit.wikimedia.org/r/1255656

Change #1255657 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] kafka-main-eqiad: disable mirroring to kafka-main-codfw

https://gerrit.wikimedia.org/r/1255657

Change #1255658 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] kafka-jumbo-eqiad: disable mirroring from kafka-main-eqiad

https://gerrit.wikimedia.org/r/1255658

Change #1255659 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: add main-eqiad-to-main-codfw

https://gerrit.wikimedia.org/r/1255659

Change #1255660 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: add main-codfw-to-main-eqiad

https://gerrit.wikimedia.org/r/1255660

Change #1255661 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: add main-eqad-to-jumbo-eqiad

https://gerrit.wikimedia.org/r/1255661

Change #1255662 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: cleanup helmfile of duplicated namespace definitions

https://gerrit.wikimedia.org/r/1255662

Everything is ready to deploy, but I'm going to put a pin in this, because the MM instances currently running in kubernetes do not export prometheus metrics, meaning that we don't have any visibility into how well they are running. Before migrating, we need to:

  • install prometheus-jmx-exporter in the OCI image
  • configure the metrics and the prometheus collection in the chart
  • redeploy everything we already have deployed
  • set up alerts per release & DC

Change #1255755 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] kafka-mirrormaker: enable JMX metrics collection

https://gerrit.wikimedia.org/r/1255755

Change #1255767 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] kafka-mirrormaker: update base image to include prometheus-jmx-exporter

https://gerrit.wikimedia.org/r/1255767

Change #1255755 merged by jenkins-bot:

[operations/deployment-charts@master] kafka-mirrormaker: enable JMX metrics collection

https://gerrit.wikimedia.org/r/1255755

Change #1255767 merged by jenkins-bot:

[operations/deployment-charts@master] kafka-mirrormaker: update base image to include prometheus-jmx-exporter

https://gerrit.wikimedia.org/r/1255767

Screenshot 2026-03-19 at 17.49.57.png (1,482×887 px, 121 KB)
We're now collecting metrics exported by each MirrorMaker pod!

Change #1255792 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] kafka-mirrormaker: ensure the right prometheus annotations are set on the pod

https://gerrit.wikimedia.org/r/1255792

Change #1255792 merged by Brouberol:

[operations/deployment-charts@master] kafka-mirrormaker: ensure the right prometheus annotations are set on the pod

https://gerrit.wikimedia.org/r/1255792

Change #1255799 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] kafka-mirrormaker: add the mirror_name pod label

https://gerrit.wikimedia.org/r/1255799

Change #1255799 merged by Brouberol:

[operations/deployment-charts@master] kafka-mirrormaker: add the mirror_name pod label

https://gerrit.wikimedia.org/r/1255799

I'll deploy all remaining MirrorMaker instances to Kubernetes on Monday. Prometheus metrics are now collected from the Kubernetes deployments in a way that is compatible with our existing alerts.

brouberol changed the task status from Open to In Progress. (Mar 21 2026, 5:41 PM)
brouberol moved this task from 2026-03-06 - 2026-03-27 to Needs Review on the Data-Platform-SRE board.
brouberol moved this task to To be Deployed on the Data-Platform-SRE board.
brouberol raised the priority of this task from Medium to High. (Mar 21 2026, 5:47 PM)

Change #1255657 merged by Brouberol:

[operations/puppet@production] kafka-main-eqiad: disable mirroring to kafka-main-codfw

https://gerrit.wikimedia.org/r/1255657

Change #1255659 merged by Brouberol:

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: add main-eqiad-to-main-codfw

https://gerrit.wikimedia.org/r/1255659

Change #1255656 merged by Brouberol:

[operations/puppet@production] kafka-main-codfw: disable mirroring to kafka-main-eqiad

https://gerrit.wikimedia.org/r/1255656

Change #1255660 merged by Brouberol:

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: add main-codfw-to-main-eqiad

https://gerrit.wikimedia.org/r/1255660

Change #1258688 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: fix values by not overriding the app config

https://gerrit.wikimedia.org/r/1258688

Change #1258688 merged by jenkins-bot:

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: fix values by not overriding the app config

https://gerrit.wikimedia.org/r/1258688

Change #1255661 merged by jenkins-bot:

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: add main-eqad-to-jumbo-eqiad

https://gerrit.wikimedia.org/r/1255661

Change #1255662 merged by jenkins-bot:

[operations/deployment-charts@master] aux-k8s/kafka-mirrormaker: cleanup helmfile of duplicated namespace definitions

https://gerrit.wikimedia.org/r/1255662

I attempted to deploy the kafka-mirror-main-codfw_to_main-eqiad instance this morning, but was blocked because the aux clusters are very low on resources. I then realized that we have 4 pending hosts in each DC, each with 48 CPUs and 128 GB of RAM (cf. T393053 and T393054). @elukey is working on reimaging them to trixie so that we can start adding them to the clusters. After that, we should finally be ready to deploy MM to k8s.

Change #1255658 merged by Brouberol:

[operations/puppet@production] kafka-jumbo-eqiad: disable mirroring from kafka-main-eqiad

https://gerrit.wikimedia.org/r/1255658

brouberol@deploy2002:/srv/deployment-charts/helmfile.d/aux-k8s-services/kafka-mirrormaker$ kubectl get pod
NAME                                                              READY   STATUS    RESTARTS   AGE
kafka-mirrormaker-jumbo-eqiad-to-test-eqiad-766ccfd75b-6gk2t      1/1     Running   0          5d17h
kafka-mirrormaker-logging-codfw-to-jumbo-eqiad-585b57d898-96ddm   1/1     Running   0          5d1h
kafka-mirrormaker-logging-eqiad-to-jumbo-eqiad-76b47659fd-p4gkx   1/1     Running   0          5d1h
kafka-mirrormaker-main-codfw-to-main-eqiad-899b8f95f-dj2ml        1/1     Running   0          52m
kafka-mirrormaker-main-codfw-to-main-eqiad-899b8f95f-lzpn2        1/1     Running   0          52m
kafka-mirrormaker-main-codfw-to-main-eqiad-899b8f95f-mx5hq        1/1     Running   0          52m
kafka-mirrormaker-main-codfw-to-main-eqiad-899b8f95f-qnnwk        1/1     Running   0          52m
kafka-mirrormaker-main-codfw-to-main-eqiad-899b8f95f-sf5tn        1/1     Running   0          52m
kafka-mirrormaker-main-eqiad-to-jumbo-eqiad-845b4db85b-4tt9g      1/1     Running   0          39m
kafka-mirrormaker-main-eqiad-to-jumbo-eqiad-845b4db85b-8qg72      1/1     Running   0          39m
kafka-mirrormaker-main-eqiad-to-jumbo-eqiad-845b4db85b-9trnl      1/1     Running   0          39m
kafka-mirrormaker-main-eqiad-to-jumbo-eqiad-845b4db85b-sz8mp      1/1     Running   0          39m
kafka-mirrormaker-main-eqiad-to-jumbo-eqiad-845b4db85b-txc9v      1/1     Running   0          39m
kafka-mirrormaker-main-eqiad-to-main-codfw-68c4bf8d76-g9v2j       1/1     Running   0          45m
kafka-mirrormaker-main-eqiad-to-main-codfw-68c4bf8d76-hw62l       1/1     Running   0          45m
kafka-mirrormaker-main-eqiad-to-main-codfw-68c4bf8d76-jt85b       1/1     Running   0          45m
kafka-mirrormaker-main-eqiad-to-main-codfw-68c4bf8d76-k6lgz       1/1     Running   0          45m
kafka-mirrormaker-main-eqiad-to-main-codfw-68c4bf8d76-rnvqv       1/1     Running   0          45m
brouberol@deploy2002:/srv/deployment-charts/helmfile.d/aux-k8s-services/kafka-mirrormaker$ kube-env kafka-mirrormaker aux-k8s-codfw
brouberol@deploy2002:/srv/deployment-charts/helmfile.d/aux-k8s-services/kafka-mirrormaker$ kubectl get pod
NAME                                                          READY   STATUS    RESTARTS   AGE
kafka-mirrormaker-main-codfw-to-main-eqiad-899b8f95f-k85d2    1/1     Running   0          51m
kafka-mirrormaker-main-codfw-to-main-eqiad-899b8f95f-nlhr4    1/1     Running   0          51m
kafka-mirrormaker-main-codfw-to-main-eqiad-899b8f95f-qc9q2    1/1     Running   0          51m
kafka-mirrormaker-main-codfw-to-main-eqiad-899b8f95f-r7l8q    1/1     Running   0          51m
kafka-mirrormaker-main-codfw-to-main-eqiad-899b8f95f-x24fk    1/1     Running   0          51m
kafka-mirrormaker-main-eqiad-to-main-codfw-68c4bf8d76-447ln   1/1     Running   0          45m
kafka-mirrormaker-main-eqiad-to-main-codfw-68c4bf8d76-gdwkl   1/1     Running   0          45m
kafka-mirrormaker-main-eqiad-to-main-codfw-68c4bf8d76-t84wk   1/1     Running   0          45m
kafka-mirrormaker-main-eqiad-to-main-codfw-68c4bf8d76-tp28b   1/1     Running   0          45m
kafka-mirrormaker-main-eqiad-to-main-codfw-68c4bf8d76-zslrc   1/1     Running   0          45m
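A quick way to confirm that each release has the expected number of pods is to group the `kubectl get pod` listing by Deployment name. The sketch below assumes the usual `<deployment>-<replicaset hash>-<pod suffix>` naming visible above and expects only the command's output (no shell prompt lines); the function name `pods_per_release` is hypothetical.

```python
from collections import Counter


def pods_per_release(output):
    """Count pods per Deployment from `kubectl get pod` text output."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue
        # Strip the trailing "-<replicaset hash>-<pod suffix>" from the name.
        release = fields[0].rsplit("-", 2)[0]
        counts[release] += 1
    return dict(counts)
```

Applied to the listings above, this would report 5 pods for each kafka-main release in each DC and 1 pod for each of the logging and test releases in eqiad.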

The migration is done!

Change #1261971 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/cookbooks@master] Remove kafka.roll-restart-mirror-maker cookbook

https://gerrit.wikimedia.org/r/1261971

Change #1261971 merged by Brouberol:

[operations/cookbooks@master] Remove kafka.roll-restart-mirror-maker cookbook

https://gerrit.wikimedia.org/r/1261971