[jobs-api,jobs-cli] Support multiple replicas of continuous jobs
Closed, Resolved | Public | Feature

Description

Feature summary (what you would like to be able to do and where):
Allow configuring a continuous job to have multiple replicas.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
Sometimes a tool needs to run multiple instances of the exact same job, for example several runner processes.

Benefits (why should this be implemented?):
Reduces duplicate work.

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
d/changelog: bump to 16.1.3 | repos/cloud/toolforge/jobs-cli!69 | raymond-ndibe | bump_version | main
d/changelog: bump to 16.1.2 | repos/cloud/toolforge/jobs-cli!68 | raymond-ndibe | bump_version | main
jobs-api: bump to 0.0.336-20240919135748-c8ffa589 | repos/cloud/toolforge/toolforge-deploy!523 | ghost | bump_jobs-api | main
[toolforge-deploy] test multi-replica support for continuous jobs | repos/cloud/toolforge/toolforge-deploy!521 | raymond-ndibe | test_multi_replica_support_for_continuous_job | main
[jobs-cli] remove unknown keys from dump | repos/cloud/toolforge/jobs-cli!64 | raymond-ndibe | remove_unknown_keys_in_dump | main
[jobs-cli] multi-replica support for continuous jobs | repos/cloud/toolforge/jobs-cli!63 | raymond-ndibe | multi-replica-cont-job | main
[jobs-api] multi-replica support for continuous jobs | repos/cloud/toolforge/jobs-api!115 | raymond-ndibe | multi-replica-cont-job | main
Draft: [maintain-kubeusers] increment default quota for pods, cpu, mem | repos/cloud/toolforge/maintain-kubeusers!58 | raymond-ndibe | increment_default_quota_pod_cpu_mem | main

Event Timeline

I committed https://github.com/toollabs/Rotatebot/commit/8938d15165acc2c8cd4689da48a61dbeb84b1d80 on the assumption that this won’t happen. Others may have done the same, so please make it opt-in if it happens.

This task is only for continuous jobs. Scheduled jobs will keep the same parallelism limit (essentially just 1), so don't worry :)

dcaro renamed this task from Support multiple replicas of continuous jobs to [jobs-api,jobs-cli] Support multiple replicas of continuous jobs. Mar 11 2024, 12:02 PM
dcaro triaged this task as Medium priority.
dcaro raised the priority of this task from Medium to High.
dcaro edited projects, added Toolforge; removed Toolforge Jobs framework.
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.

How will this affect the current limit of 3 continuous jobs? Do 2 replicas of a continuous job count as 1 or 2 toward the limit?

I would count it as one job that uses 2x the amount of resources.
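To illustrate the accounting proposed above, here is a minimal sketch, not the jobs-api implementation. The limit of 3 comes from this thread; the `ContinuousJob` class, the helper names, and the per-replica CPU and memory figures are all hypothetical.

```python
from dataclasses import dataclass

# The "3 continuous jobs" limit mentioned in this thread.
CONTINUOUS_JOB_LIMIT = 3


@dataclass
class ContinuousJob:
    name: str
    replicas: int = 1
    cpu: float = 0.5        # hypothetical per-replica CPU request
    memory_mib: int = 512   # hypothetical per-replica memory request


def jobs_counted(jobs):
    # Each job counts once toward the job limit, whatever its replica count.
    return len(jobs)


def total_resources(jobs):
    # Resource usage, unlike the job count, scales with the replicas.
    cpu = sum(job.cpu * job.replicas for job in jobs)
    memory_mib = sum(job.memory_mib * job.replicas for job in jobs)
    return cpu, memory_mib


jobs = [ContinuousJob("runner", replicas=2), ContinuousJob("helper")]
print(jobs_counted(jobs), "of", CONTINUOUS_JOB_LIMIT, "jobs used")
print(total_resources(jobs))
```

Under this reading, a 2-replica job still leaves two job slots free, but it consumes pod, CPU, and memory quota twice.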

Mmm, yeah, makes sense. At the Kubernetes level this will just be a Deployment with 2 or more pods, I guess.
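A rough sketch of what "a Deployment with 2 or more pods" could look like, built as a plain Python dict. The manifest fields (`apiVersion: apps/v1`, `spec.replicas`, `spec.selector`, `spec.template`) are standard Kubernetes, but the function name, labels, image, and command are hypothetical and do not reflect what jobs-api actually generates.

```python
def deployment_manifest(job_name, image, command, replicas=2):
    """Build a minimal Kubernetes Deployment manifest (hypothetical helper)."""
    labels = {"app": job_name}  # hypothetical label scheme
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": job_name, "labels": labels},
        "spec": {
            # The only field that changes for a multi-replica job.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": job_name, "image": image, "command": command}
                    ]
                },
            },
        },
    }


# Placeholder image and command, for illustration only.
manifest = deployment_manifest("runner", "example/image:latest", ["./run.sh"], replicas=2)
print(manifest["spec"]["replicas"])
```

Every pod is created from the same template, so each replica runs the identical command and image.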

Raymond_Ndibe changed the task status from Open to In Progress. Aug 12 2024, 5:26 PM

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/523

jobs-api: bump to 0.0.336-20240919135748-c8ffa589

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T14:05:42Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T14:10:00Z] <wmbot~raymondndibe@wmf3402> END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T14:42:50Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T14:47:36Z] <wmbot~raymondndibe@wmf3402> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T14:50:41Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T14:55:59Z] <wmbot~raymondndibe@wmf3402> END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T14:57:08Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T15:01:22Z] <wmbot~raymondndibe@wmf3402> END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T15:07:32Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T15:08:40Z] <wmbot~raymondndibe@wmf3402> END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T15:18:16Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T15:19:34Z] <wmbot~raymondndibe@wmf3402> END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T15:27:11Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T15:28:20Z] <wmbot~raymondndibe@wmf3402> END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T16:26:46Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T16:38:04Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T16:46:34Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T16:48:20Z] <wmbot~raymondndibe@wmf3402> END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:06:54Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:13:20Z] <wmbot~raymondndibe@wmf3402> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:26:17Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:26:21Z] <wmbot~raymondndibe@wmf3402> END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:27:06Z] <wmbot~raymondndibe@wmf3402> START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:27:28Z] <wmbot~raymondndibe@wmf3402> END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:28:52Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:34:25Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:35:57Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:41:08Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:41:44Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-19T17:47:30Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T19:19:34Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.component.deploy for component calico (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T19:24:49Z] <raymond-ndibe@cloudcumin1001> END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T19:26:29Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.component.deploy for component calico (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T19:30:29Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T19:31:08Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.component.deploy for component calico (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T19:36:23Z] <raymond-ndibe@cloudcumin1001> END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T20:06:47Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.component.deploy for component calico (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T20:08:23Z] <raymond-ndibe@cloudcumin1001> END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T20:08:29Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.toolforge.component.deploy for component calico (T341066)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-20T20:12:51Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico (T341066)