
Find a better solution than `concurrencyPolicy: Replace` for sidecars in CronJob
Open, Medium, Public

Description

`concurrencyPolicy: Replace` is working, but it is kind of ugly. Kubernetes never recognizes the jobs as complete because the sidecar containers keep running after the main container exits, so cluster resources are spent on containers that are doing no useful work.

In an IRC conversation @Joe mentioned the idea of using the spectacularly named POST /quitquitquit endpoint of the envoy service to signal it to terminate. A POST to localhost:1666/quitquitquit could be added to a runner script that the CronJob executes to implement this. It may also be possible to find a Kubernetes lifecycle signal to attach it to, but that needs further investigation.
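A rough sketch of what such a runner script could look like, written in Python purely for illustration. This is not the helper script from the related patches. The function names `post_quit` and `run_then_quit` are made up for this example, and the URL is taken from the description above on the assumption that the Envoy admin interface is reachable on port 1666 from the job container.

```python
import subprocess
import sys
import urllib.request

# Assumption: the Envoy admin interface listens here, per the task description.
ENVOY_QUIT_URL = "http://localhost:1666/quitquitquit"


def post_quit(url=ENVOY_QUIT_URL, timeout=5.0):
    """Send an empty POST to the Envoy admin endpoint; return True on HTTP 200."""
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError as exc:
        # URLError, connection resets, and socket timeouts are all OSError subclasses.
        print(f"could not reach {url}: {exc}", file=sys.stderr)
        return False


def run_then_quit(command, notify=post_quit):
    """Run the job command, then always ask the sidecar to exit.

    Returns the exit code of the job command so the wrapper can pass it on.
    `notify` is injectable (hypothetical parameter) to make the wrapper testable.
    """
    try:
        return subprocess.run(command).returncode
    finally:
        # Runs whether the job succeeded, failed, or could not be started.
        notify()
```

A wrapper entry point would call something like `sys.exit(run_then_quit(sys.argv[1:]))` so that the job's exit status is preserved and Kubernetes still sees a failed crawl as a failure. Sending the request from a `finally` block means the sidecar is asked to stop even when the job command fails.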

Event Timeline

Change 729887 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[wikimedia/toolhub@main] k8s: Add helper script for running crawler

https://gerrit.wikimedia.org/r/729887

bd808 triaged this task as Medium priority.
bd808 moved this task from Backlog to In Progress on the Toolhub board.

Change 729891 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/deployment-charts@master] toolhub: Crawler CronJob concurrencyPolicy back to Forbid

https://gerrit.wikimedia.org/r/729891

Change 729887 merged by jenkins-bot:

[wikimedia/toolhub@main] k8s: Add helper script for running crawler

https://gerrit.wikimedia.org/r/729887

Change 729891 merged by jenkins-bot:

[operations/deployment-charts@master] toolhub: Crawler CronJob concurrencyPolicy back to Forbid

https://gerrit.wikimedia.org/r/729891

Change 730221 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/deployment-charts@master] toolhub: Bump container version to 2021-10-12-152757-production

https://gerrit.wikimedia.org/r/730221

Change 730221 merged by jenkins-bot:

[operations/deployment-charts@master] toolhub: Bump container version to 2021-10-12-152757-production

https://gerrit.wikimedia.org/r/730221

The trick did not work, and I'm not yet sure why. I'm looking for clues in the logs from job.batch/toolhub-main-crawler-1634061600, which was the first job instance triggered with the new wrapper script that tries to shut down envoy.

Change 730276 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/deployment-charts@master] toolhub: add values.yaml setting for crawler concurency

https://gerrit.wikimedia.org/r/730276

Change 730276 merged by jenkins-bot:

[operations/deployment-charts@master] toolhub: add values.yaml setting for crawler concurency

https://gerrit.wikimedia.org/r/730276

Change 730278 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/deployment-charts@master] toolhub: Set concurrencyPolicy=Replace temporarily

https://gerrit.wikimedia.org/r/730278

Change 730278 merged by jenkins-bot:

[operations/deployment-charts@master] toolhub: Set concurrencyPolicy=Replace temporarily

https://gerrit.wikimedia.org/r/730278

Raymond_Ndibe lowered the priority of this task from Medium to Low. Apr 29 2022, 4:10 PM
Raymond_Ndibe raised the priority of this task from Low to Medium.
bd808 removed bd808 as the assignee of this task. Jul 21 2023, 8:43 PM

unlicking this stale cookie