
Design the Jobs service in k8s
Closed, Resolved (Public)

Description

We've found that specific settings, and possibly even alpha features (ttlSecondsAfterFinished, specifically), are needed to make the jobs/cronjobs work well.
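As a rough illustration of the kind of setting meant here, the sketch below builds a batch/v1 Job manifest that sets ttlSecondsAfterFinished, so a finished Job and its pods are eligible for automatic cleanup after the TTL expires. The helper name, job name, image, command, TTL value and backoffLimit are all hypothetical and not taken from this task; on clusters where the field is still alpha it also requires the TTLAfterFinished feature gate to be enabled.

```python
import json


def build_job_manifest(name, image, command, ttl_seconds=3600):
    """Return a batch/v1 Job manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Makes the finished Job (and its pods) eligible for deletion
            # this many seconds after completion. Value is an assumption.
            "ttlSecondsAfterFinished": ttl_seconds,
            # Do not retry failed pods; an illustrative choice only.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Job pods must use Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": list(command)}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder names; kubectl accepts JSON manifests as well as YAML.
    manifest = build_job_manifest(
        "example-job", "example-image:latest", ["echo", "hello"]
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and passed to `kubectl apply -f`, assuming the placeholder image and names are replaced with real ones.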

We likely also need some tooling (new backend for jsub or a new command)!

Related Objects

Status     Subtype     Assigned
Resolved   -           JHedden
Resolved   -           aborrero
Resolved   -           aborrero
Resolved   -           aborrero
Resolved   -           Bstorm
Resolved   -           aborrero
Resolved   -           aborrero
Resolved   -           JJMC89
Resolved   -           aborrero
Resolved   BUG REPORT  aborrero
Resolved   BUG REPORT  aborrero
Resolved   BUG REPORT  aborrero
Resolved   Feature     aborrero
Resolved   BUG REPORT  aborrero
Resolved   Feature     aborrero
Resolved   -           aborrero
Resolved   -           aborrero
Resolved   -           Bstorm
Invalid    BUG REPORT  None
Resolved   Feature     aborrero
Resolved   -           taavi
Resolved   BUG REPORT  So9q
Resolved   -           aborrero
Resolved   Feature     Raymond_Ndibe
Resolved   Feature     Raymond_Ndibe
Duplicate  -           None
Duplicate  Feature     Raymond_Ndibe
Resolved   BUG REPORT  JJMC89
Resolved   BUG REPORT  Raymond_Ndibe
Resolved   BUG REPORT  aborrero
Resolved   BUG REPORT  taavi
Resolved   Feature     taavi
Resolved   Feature     taavi
Duplicate  BUG REPORT  None
Duplicate  Feature     None
Resolved   -           taavi
Resolved   -           Raymond_Ndibe
Resolved   -           aborrero

Event Timeline

Bstorm triaged this task as High priority.
Bstorm created this task.
Bstorm removed a project: Tools.
Bstorm edited subscribers, added: JHedden; removed: AntiCompositeNumber.

The conversation in T251027 ("signatures" tool has failed job pods on Kubernetes cluster) is a good touchpoint for this ticket.

Probably the simplest note here is that batch objects don't appear to be quota-able directly (only the pods they schedule are), which may or may not merit writing controllers to do just that.
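To make the pod-level quota mentioned above concrete, here is a minimal sketch that builds a ResourceQuota manifest capping the number of pods in a namespace, which indirectly bounds how many job pods can run at once. The helper name, namespace, quota name and limit are hypothetical and not taken from this task.

```python
import json


def build_pod_quota(namespace, max_pods):
    """Return a v1 ResourceQuota manifest limiting pod count (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {
            # Quota name is an illustrative convention, not an existing one.
            "name": f"{namespace}-pod-quota",
            "namespace": namespace,
        },
        # "pods" caps the number of non-terminal pods in the namespace,
        # including pods created by Jobs and CronJobs.
        "spec": {"hard": {"pods": str(max_pods)}},
    }


if __name__ == "__main__":
    # Placeholder namespace and limit.
    print(json.dumps(build_pod_quota("tool-example", 10), indent=2))
```

With such a quota in place, a Job whose pods would exceed the limit still exists as an object, but its pods are rejected at creation time until capacity frees up.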

Bstorm added a subscriber: aborrero.

I think you are working on this @aborrero, so assigning to you to prevent duplication. Feel free to delete if you have another one.

Change 681424 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/toolforge/jobs-framework-api@main] tjf: include deployment configuration for toolsbeta kubernetes

https://gerrit.wikimedia.org/r/681424

Just a note: that will change significantly with Kubernetes 1.18/1.19.

Change 681725 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] toolforge: nginx-ingress-jobs: specify ingress-class

https://gerrit.wikimedia.org/r/681725

Change 692633 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/toolforge/jobs-framework-api@main] jobs-framework-api: introduce initial docker image

https://gerrit.wikimedia.org/r/692633

Change 692633 merged by Arturo Borrero Gonzalez:

[cloud/toolforge/jobs-framework-api@main] jobs-framework-api: introduce initial docker image

https://gerrit.wikimedia.org/r/692633

Change 681424 merged by Arturo Borrero Gonzalez:

[cloud/toolforge/jobs-framework-api@main] jobs-framework-api: include deployment configuration for toolsbeta kubernetes

https://gerrit.wikimedia.org/r/681424

Closing this task as the design phase is mostly complete. Now moving on to the development phase.