
--timeout flag for mwscript-k8s
Closed, Resolved, Public

Description

On wikitech-l, @Dreamy_Jazz points out that sometimes maintenance scripts are run under the timeout command, in order to interrupt them after a set interval. With mwscript-k8s this technique doesn't work, since the maintenance script continues to run after mwscript-k8s terminates.

@CDanis suggests a new command-line flag for mwscript-k8s, whose value we pipe through the Helm chart to .spec.activeDeadlineSeconds in the Job configuration. (The default will still be to leave activeDeadlineSeconds unset, so scripts run to completion.)
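A minimal sketch of where the value ends up, assuming the flag has already been converted to whole seconds. The helper name `build_job_manifest` and the placeholder container are hypothetical (the real Helm chart defines its own); only `spec.activeDeadlineSeconds` comes from the proposal. When no timeout is given, the field is omitted entirely, so the Job runs to completion.

```python
from typing import Optional


def build_job_manifest(name: str, timeout_seconds: Optional[int] = None) -> dict:
    """Build a minimal batch/v1 Job manifest (hypothetical helper, not the real chart)."""
    spec: dict = {
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [
                    # Placeholder container; the real chart defines its own.
                    {"name": "mwscript", "image": "example-image"},
                ],
            }
        }
    }
    if timeout_seconds is not None:
        # This sketch only accepts positive deadlines.
        if timeout_seconds <= 0:
            raise ValueError("timeout must be a positive number of seconds")
        # Kubernetes marks the Job Failed once it has been active this long.
        spec["activeDeadlineSeconds"] = timeout_seconds
    # With no timeout, activeDeadlineSeconds is simply absent (the default).
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": spec,
    }
```

For example, `build_job_manifest("mw-script", 7200)` produces a Job with a two-hour deadline, while `build_job_manifest("mw-script")` produces one with no deadline at all.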

If the timeout is reached, Kubernetes will terminate the job and mark it Failed. (That strikes me as correct, since the only option other than Failed is Complete. In this case the job could be called a "successful failure" in that it terminated on schedule, but it never completed.) Script owners might have to inspect the job to distinguish a failure due to the timeout from a failure due to some unexpected error, but the rest of the mwscript-k8s apparatus, including cleanup, will work normally.
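One way to tell the two kinds of failure apart is to look at the Job's status conditions: when a Job exceeds `activeDeadlineSeconds`, Kubernetes adds a `Failed` condition whose `reason` is `DeadlineExceeded`. The sketch below assumes the Job object has already been fetched as JSON (for example with `kubectl get job <name> -o json`) and parsed into a dict; the function name `failure_kind` is hypothetical.

```python
from typing import Optional


def failure_kind(job: dict) -> Optional[str]:
    """Return 'timeout' or 'error' for a failed Job, or None if it has not failed."""
    conditions = job.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == "Failed" and cond.get("status") == "True":
            # DeadlineExceeded means activeDeadlineSeconds was reached.
            if cond.get("reason") == "DeadlineExceeded":
                return "timeout"
            # Any other reason (e.g. BackoffLimitExceeded) is a genuine error.
            return "error"
    return None
```

A Job that finished normally carries a `Complete` condition instead, so the function returns `None` for it.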

Event Timeline


Change #1078720 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/deployment-charts@master] mediawiki: Allow setting mwscript job activeDeadlineSeconds

https://gerrit.wikimedia.org/r/1078720

Change #1078721 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/puppet@production] deployment_server: Add --timeout flag to mwscript-k8s

https://gerrit.wikimedia.org/r/1078721

Change #1078720 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: Allow setting mwscript job activeDeadlineSeconds

https://gerrit.wikimedia.org/r/1078720

Change #1078721 merged by RLazarus:

[operations/puppet@production] deployment_server: Add --timeout flag to mwscript-k8s

https://gerrit.wikimedia.org/r/1078721

This is now supported!

--timeout TIMEOUT     Set a deadline for the job, to interrupt it after a set interval. Examples: 1d, 2h, 30m, 40s, 40 -- number without unit is in seconds. (Default: No deadline)
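The accepted formats can be illustrated with a small parser that turns the flag value into the whole number of seconds that activeDeadlineSeconds expects. This is a sketch written against the help text above, not the actual mwscript-k8s implementation; the function name `parse_timeout` and the choice to reject compound values such as `1h30m` are assumptions.

```python
import re

# Seconds per unit suffix, per the help text: d, h, m, s (bare number = seconds).
_UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1, "": 1}
_PATTERN = re.compile(r"(\d+)([dhms]?)")


def parse_timeout(value: str) -> int:
    """Convert a value such as '2h' or '40' to a number of seconds."""
    match = _PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid timeout: {value!r}")
    number, unit = match.groups()
    seconds = int(number) * _UNITS[unit]
    # A zero deadline would be meaningless, so this sketch rejects it.
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return seconds
```

With this sketch, `1d` becomes 86400, `2h` becomes 7200, `30m` becomes 1800, and both `40s` and `40` become 40.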