Migrate cpjobqueue to kubernetes
Closed, Resolved, Public

Description

Subtask of the TEC3:O3:O3.1:Q4 Goal to migrate cpjobqueue to use the deployment pipeline

Details

Project                                                Branch      Lines +/-  Subject
operations/puppet                                      production  +0 -156
operations/puppet                                      production  +0 -2
operations/deployment-charts                           master      +58 -1
operations/deployment-charts                           master      +939 -0
operations/puppet                                      production  +2 -2
operations/puppet                                      production  +1 -1
operations/puppet                                      production  +1 -1
mediawiki/services/change-propagation/jobqueue-deploy  master      +3 -145
operations/deployment-charts                           master      +6 -6
operations/deployment-charts                           master      +289 -270
operations/deployment-charts                           master      +2 -2
operations/deployment-charts                           master      +6 -6
operations/deployment-charts                           master      +287 -268
operations/deployment-charts                           master      +20 -22
mediawiki/services/change-propagation/jobqueue-deploy  master      +4 -16
operations/deployment-charts                           master      +47 -45
operations/deployment-charts                           master      +12 -1
mediawiki/services/change-propagation/jobqueue-deploy  master      +1 -2
operations/deployment-charts                           master      +6 -18
operations/deployment-charts                           master      +3 -18
operations/deployment-charts                           master      +21 -5
operations/deployment-charts                           master      +266 -244
operations/deployment-charts                           master      +21 -0
operations/puppet                                      production  +6 -0
operations/puppet                                      production  +8 -0
labs/private                                           master      +8 -0
operations/deployment-charts                           master      +253 -234
operations/deployment-charts                           master      +1 K -8
operations/deployment-charts                           master      +5 -5
operations/deployment-charts                           master      +249 -241
operations/deployment-charts                           master      +182 -163
operations/deployment-charts                           master      +189 -161
operations/deployment-charts                           master      +176 -156
Event Timeline

Peachey88 updated the task description. Dec 15 2019, 10:48 AM

Change 573521 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Package changeprop charts and update index

https://gerrit.wikimedia.org/r/573521

Change 573521 merged by jenkins-bot:
[operations/deployment-charts@master] Package changeprop charts and update index

https://gerrit.wikimedia.org/r/573521

Change 575108 had a related patch set uploaded (by Holger Knust; owner: Holger Knust):
[operations/deployment-charts@master] WIP: changeprop/cpjobqueue: Added new config template for cpjobqueue

https://gerrit.wikimedia.org/r/575108

Change 576335 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] changeprop: Package 0.9.5

https://gerrit.wikimedia.org/r/576335

Change 576335 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop: Package 0.9.5

https://gerrit.wikimedia.org/r/576335

Change 576344 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] changeprop: Correctly align the prometheus-statsd.conf call

https://gerrit.wikimedia.org/r/576344

Change 576344 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop: Correctly align the prometheus-statsd.conf call

https://gerrit.wikimedia.org/r/576344

hnowlan claimed this task. May 7 2020, 10:20 AM
hnowlan added a subscriber: holger.knust.

Change 594973 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop: add cpjobqueue configuration switching

https://gerrit.wikimedia.org/r/594973

Change 595501 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop: make changeprop settings their own dict

https://gerrit.wikimedia.org/r/595501

Change 595501 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop: make changeprop settings their own dict

https://gerrit.wikimedia.org/r/595501

Change 595548 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop: Fix ores config location

https://gerrit.wikimedia.org/r/595548

Change 595548 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop: Fix ores config location

https://gerrit.wikimedia.org/r/595548

Change 594973 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop: add cpjobqueue configuration switching

https://gerrit.wikimedia.org/r/594973

Change 595981 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop: release new version

https://gerrit.wikimedia.org/r/595981

Change 595981 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop: release new version

https://gerrit.wikimedia.org/r/595981

Change 595985 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[labs/private@master] changeprop-jobqueue: add stubs for secrets

https://gerrit.wikimedia.org/r/595985

Change 595985 merged by Hnowlan:
[labs/private@master] changeprop-jobqueue: add stubs for secrets

https://gerrit.wikimedia.org/r/595985

Change 596183 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] role::deployment_server: add changeprop-jobqueue

https://gerrit.wikimedia.org/r/596183

Change 596183 merged by Hnowlan:
[operations/puppet@production] role::deployment_server: add changeprop-jobqueue

https://gerrit.wikimedia.org/r/596183

Change 596236 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] ci::master: changeprop-jobqueue definitions

https://gerrit.wikimedia.org/r/596236

Change 596236 merged by Hnowlan:
[operations/puppet@production] ci::master: changeprop-jobqueue definitions

https://gerrit.wikimedia.org/r/596236

Change 596244 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] namespace: add changeprop-jobqueue

https://gerrit.wikimedia.org/r/596244

Change 596244 merged by jenkins-bot:
[operations/deployment-charts@master] namespace: add changeprop-jobqueue

https://gerrit.wikimedia.org/r/596244

Change 598074 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop-jobqueue: Set correct port, fix config indentation.

https://gerrit.wikimedia.org/r/598074

Change 598074 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop-jobqueue: Set correct port, fix config indentation.

https://gerrit.wikimedia.org/r/598074

Change 598493 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop-jobqueue: Correct port used for liveness check.

https://gerrit.wikimedia.org/r/598493

Change 598493 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop-jobqueue: Correct port used for liveness check.

https://gerrit.wikimedia.org/r/598493

Change 598500 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[mediawiki/services/change-propagation/jobqueue-deploy@master] exclude updateBetaFeaturesUserCounts job

https://gerrit.wikimedia.org/r/598500

Change 598802 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop-jobqueue: change testjob to thumbnailRender

https://gerrit.wikimedia.org/r/598802

Change 598802 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop-jobqueue: change testjob to thumbnailRender

https://gerrit.wikimedia.org/r/598802

Change 599062 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] cpjobqueue: use https to talk to jobrunner and videoscaler

https://gerrit.wikimedia.org/r/599062

Change 599062 merged by jenkins-bot:
[operations/deployment-charts@master] cpjobqueue: use https to talk to jobrunner and videoscaler

https://gerrit.wikimedia.org/r/599062

Change 598500 merged by Hnowlan:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Disable RenderThumbnail for testing

https://gerrit.wikimedia.org/r/598500

Change 599080 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] cpjobqueue: fix service name and metrics name

https://gerrit.wikimedia.org/r/599080

Change 599080 merged by jenkins-bot:
[operations/deployment-charts@master] cpjobqueue: fix service name and metrics name

https://gerrit.wikimedia.org/r/599080

Change 599383 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop-jobqueue: enable all high-traffic jobs, increase replicas

https://gerrit.wikimedia.org/r/599383

Change 599387 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Disable all high traffic jobs in scb

https://gerrit.wikimedia.org/r/599387

Change 599383 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop-jobqueue: enable all high-traffic jobs, increase replicas

https://gerrit.wikimedia.org/r/599383

Change 599387 merged by Hnowlan:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Disable all high traffic jobs in scb

https://gerrit.wikimedia.org/r/599387

Change 601798 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop-jobqueue: enable all remaining jobs

https://gerrit.wikimedia.org/r/601798

Change 601798 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop-jobqueue: enable all remaining jobs

https://gerrit.wikimedia.org/r/601798

Change 602094 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop-jobqueue: fix indentation of partitions

https://gerrit.wikimedia.org/r/602094

Change 602094 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop-jobqueue: fix indentation of partitions

https://gerrit.wikimedia.org/r/602094

Change 602103 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[mediawiki/services/change-propagation/jobqueue-deploy@master] changeprop-jobqueue: disable all jobs in scb

https://gerrit.wikimedia.org/r/602103

Change 602133 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop-jobqueue: disable partition jobs

https://gerrit.wikimedia.org/r/602133

Change 602133 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop-jobqueue: disable partition jobs

https://gerrit.wikimedia.org/r/602133

Change 602137 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop-jobqueue: disable low traffic jobs

https://gerrit.wikimedia.org/r/602137

Change 602137 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop-jobqueue: disable low traffic jobs

https://gerrit.wikimedia.org/r/602137

Change 602358 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop-jobqueue: fix rendering of ignore topics list.

https://gerrit.wikimedia.org/r/602358

Change 602358 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop-jobqueue: fix rendering of ignore topics list.

https://gerrit.wikimedia.org/r/602358

Change 602430 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop-jobqueue: enable partitioned jobs

https://gerrit.wikimedia.org/r/602430

Change 602430 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop-jobqueue: enable partitioned jobs

https://gerrit.wikimedia.org/r/602430

Change 602103 merged by Hnowlan:
[mediawiki/services/change-propagation/jobqueue-deploy@master] changeprop-jobqueue: disable all jobs in scb

https://gerrit.wikimedia.org/r/602103

Change 603534 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] changeprop: remove changeprop from puppet

https://gerrit.wikimedia.org/r/603534

cpjobqueue is now running fully on Kubernetes. The instance running on scb has no rules enabled. All that remains to be done is to remove the puppet configuration and delete the deploy repo once that's done.

Change 604316 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] service::docker: Change volume parameter type

https://gerrit.wikimedia.org/r/604316

Change 604316 merged by Hnowlan:
[operations/puppet@production] service::docker: Change volume parameter type

https://gerrit.wikimedia.org/r/604316

Change 604354 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] service::docker: correct param name

https://gerrit.wikimedia.org/r/604354

Change 604354 merged by Hnowlan:
[operations/puppet@production] service::docker: correct param name

https://gerrit.wikimedia.org/r/604354

Change 604373 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] docker-service-shim: Fix ERB syntax

https://gerrit.wikimedia.org/r/604373

Change 604373 merged by Hnowlan:
[operations/puppet@production] docker-service-shim: Fix ERB syntax

https://gerrit.wikimedia.org/r/604373

Change 604425 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] changeprop-jobqueue: add beta configuration skeleton

https://gerrit.wikimedia.org/r/604425

Change 575108 abandoned by Ppchelko:
Added new chart for cpjobqueue

https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/575108

hnowlan closed this task as Resolved. Jul 7 2020, 12:41 PM

Change 604425 merged by jenkins-bot:
[operations/deployment-charts@master] changeprop-jobqueue: add beta configuration skeleton

https://gerrit.wikimedia.org/r/604425

Change 603534 merged by Hnowlan:
[operations/puppet@production] changeprop: remove changeprop from puppet

https://gerrit.wikimedia.org/r/603534

Change 628171 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] kafka: remove cpjobqueue-admin group as well

https://gerrit.wikimedia.org/r/628171

Change 628171 merged by Dzahn:
[operations/puppet@production] kafka: remove cpjobqueue-admin group as well

https://gerrit.wikimedia.org/r/628171

@hnowlan @akosiaris https://gerrit.wikimedia.org/r/603534 deleted 2 admin groups, changeprop-admin and cpjobqueue-admin. But these groups were still applied on kafka-main hosts in Hiera. This broke puppet on all kafka-main hosts, as reported by @elukey.

I merged 2 follow-ups that removed both groups from the kafka-main hosts. That unbroke puppet, but it also actively removed the shell users eevans, ppchelko, mobrovac, ... So now only root users can get on kafka-main hosts. I don't yet know why they were originally added. Just making sure that this is expected and that they actually don't need the kafka hosts anymore. Do we need to look closer and ask them?
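
For illustration only: the breakage described above comes down to a host role still referencing admin groups that no longer exist. A minimal Python sketch of a pre-merge check for that kind of dangling reference follows. The function name, the data layout, and the `kafka-admin` and `other-role` names are hypothetical and are not taken from the puppet repository; only the two deleted group names and the `kafka-main` role come from this task.

```python
from typing import Dict, List, Set


def find_dangling_groups(
    defined_groups: Set[str],
    host_groups: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return, per host role, the referenced groups that are not defined."""
    dangling: Dict[str, List[str]] = {}
    for role, groups in host_groups.items():
        missing = sorted(g for g in groups if g not in defined_groups)
        if missing:
            dangling[role] = missing
    return dangling


# Hypothetical state after the two admin groups were deleted.
# "kafka-admin" and "other-role" are made-up names used only for contrast.
defined_groups = {"kafka-admin"}
host_groups = {
    "kafka-main": ["kafka-admin", "changeprop-admin", "cpjobqueue-admin"],
    "other-role": ["kafka-admin"],
}

dangling = find_dangling_groups(defined_groups, host_groups)
print(dangling)
# {'kafka-main': ['changeprop-admin', 'cpjobqueue-admin']}
```

A check along these lines would only flag the dangling references; whether the right fix is to drop them from the host role or to recreate the groups is the question raised in this comment.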

Cross-posting from gerrit https://gerrit.wikimedia.org/r/c/operations/puppet/+/603534/4#message-a15aeb97a3211f6d70133fdadcc500c02cd6193b

I would like to keep shell access to kafka hosts. Technically, we are still running both change-prop and cpjobqueue, just in k8s, and we still need the shell access for the same reasons as we did before, when services were running on scb.

Dzahn added a comment. Sep 17 2020, 6:50 PM

ACK, that means either @hnowlan's change has to be partially reverted to recreate these groups or we need to make new admin groups for this purpose. Should it just be something like "kafka-users" maybe?

Should it just be something like "kafka-users" maybe?

Sounds good. However, thinking more about it, mobrovac has left the foundation, and @Eevans probably doesn't need kafka access. I only use my privileges to run kafkacat on the hosts, so I could probably live without access if there's another host in production where kafkacat is installed. I can run my queries from stat1007, for example.

Dzahn added a comment. Sep 17 2020, 6:58 PM

Here is the list of hosts that have kafkacat installed:

https://debmonitor.wikimedia.org/packages/kafkacat

Thank you. I can live with that, I have access to a number of places. Sorry I didn't think of a workaround like that. In case this will not be sufficient, I'll re-request access to kafka hosts.

Dzahn added a comment. Sep 17 2020, 7:00 PM

Ok, sounds good. Not uploading the patch to create a new group then. But if you need it just re-request as you said.