
toolforge-jobs job emails should have information on why events happened
Open, Needs Triage, Public, Feature

Description

Example email:

* Job 'redis2irc' (continuous) (emails: all) had 2 events:
  -- Pod 'redis2irc-7bbc8dc544-zn9rv'. Phase: 'running'. Container state: 'running'. Start timestamp 2022-04-17T10:00:27Z. 
  -- Pod 'redis2irc-7bbc8dc544-zn9rv'. Phase: 'running'. Container state: 'waiting'. With reason 'ContainerCreating'. 

* Job 'wikibugs-phab' (continuous) (emails: all) had 7 events:
  -- Pod 'wikibugs-phab-858dbb66fb-248vg'. Phase: 'running'. Container state: 'running'. Start timestamp 2022-04-17T09:52:46Z. 
  -- Pod 'wikibugs-phab-858dbb66fb-248vg'. Phase: 'running'. Container state: 'waiting'. With reason 'ContainerCreating'. 
  -- Pod 'wikibugs-phab-858dbb66fb-4t8lb'. Phase: 'pending'. Container state: 'waiting'. With reason 'ContainerCreating'. 
  -- Pod 'wikibugs-phab-858dbb66fb-4t8lb'. Phase: 'running'. Container state: 'running'. Start timestamp 2022-04-17T10:12:46Z. 
  -- Pod 'wikibugs-phab-858dbb66fb-4t8lb'. Phase: 'running'. Container state: 'waiting'. With reason 'ContainerCreating'. 
  -- Pod 'wikibugs-phab-858dbb66fb-vf9sh'. Phase: 'pending'. Container state: 'waiting'. With reason 'ContainerCreating'. 
  -- Pod 'wikibugs-phab-858dbb66fb-vf9sh'. Phase: 'running'. Container state: 'running'. Start timestamp 2022-04-17T10:15:15Z. 

* Job 'grrrrit' (continuous) (emails: all) had 5 events:
  -- Pod 'grrrrit-654c5cbbf-p4lxs'. Phase: 'pending'. Container state: 'waiting'. With reason 'ContainerCreating'. 
  -- Pod 'grrrrit-654c5cbbf-p4lxs'. Phase: 'running'. Container state: 'running'. Start timestamp 2022-04-17T10:12:46Z. 
  -- Pod 'grrrrit-654c5cbbf-p4lxs'. Phase: 'running'. Container state: 'waiting'. With reason 'ContainerCreating'. 
  -- Pod 'grrrrit-654c5cbbf-ng659'. Phase: 'pending'. Container state: 'waiting'. With reason 'ContainerCreating'. 
  -- Pod 'grrrrit-654c5cbbf-ng659'. Phase: 'running'. Container state: 'running'. Start timestamp 2022-04-17T10:15:16Z.

From the current email, it is difficult to determine _why_ these events happened. Was it a manual start or restart (and if so: by whom, when, and with which command)? Were the jobs restarted automatically? Something else?

Event Timeline

JJMC89 changed the subtype of this task from "Task" to "Feature Request". Apr 17 2022, 4:31 PM

I don't know for sure that Kubernetes has all of the answers that @valhallasw is hoping for here, but one new thing we could add to the email is the last N lines of log output from the job, basically the equivalent of toolforge jobs logs --last N $JOB. Choosing N is a bit of a guessing game: it should be large enough to give some context about what the job last did, without dumping the entire captured log, which could be multiple megabytes. Starting with N=500 might be ok. N could also be put under user control with some new option to toolforge jobs run ...
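To make the "last N lines" idea concrete, here is a minimal sketch in Python of how a log tail could be trimmed and attached to an event line. It is only an illustration under stated assumptions, not how the jobs emailer actually works: the function names `tail_log` and `format_event_with_logs`, the `MAX_TAIL_BYTES` cap, and the indentation layout are all hypothetical. Only the N=500 starting point comes from the suggestion above.

```python
DEFAULT_TAIL_LINES = 500      # starting value suggested above
MAX_TAIL_BYTES = 64 * 1024    # hypothetical safety cap, not part of the proposal


def tail_log(log_text: str, last_n: int = DEFAULT_TAIL_LINES,
             max_bytes: int = MAX_TAIL_BYTES) -> str:
    """Return at most the last `last_n` lines, also capped at `max_bytes`."""
    if last_n <= 0 or not log_text:
        return ""
    lines = log_text.splitlines()[-last_n:]
    kept = []
    size = 0
    # Walk backwards so the newest lines survive when the byte cap is hit.
    for line in reversed(lines):
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if size + line_size > max_bytes:
            break
        kept.append(line)
        size += line_size
    return "\n".join(reversed(kept))


def format_event_with_logs(event_line: str, log_text: str,
                           last_n: int = DEFAULT_TAIL_LINES) -> str:
    """Append an indented log tail to one event line (hypothetical layout)."""
    tail = tail_log(log_text, last_n)
    if not tail:
        return event_line
    tail_lines = tail.splitlines()
    indented = "\n".join("     " + line for line in tail_lines)
    return f"{event_line}\n     Last {len(tail_lines)} log lines:\n{indented}"
```

The byte cap is there because a line count alone does not bound the email size if individual log lines are very long. Whether the emailer can still fetch the logs at the time it sends the email is a separate question that this sketch does not address.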

This is of increased importance until and unless we get a fix for T353537: [jobs-cli,jobs-api] Allow using file logs with build service images. Right now a tool like Tool-gitlab-account-approval just spews out emails that look like the sample below. These are mostly unactionable by maintainers: with the job running every 3 minutes, the window in which logs may still be available to toolforge jobs logs $JOB is quite short.

We wanted to notify you about the activity of some jobs in the 'gitlab-account-approval' Toolforge tool.

* Job 'approve' (cronjob) (emails: onfailure) had 1 events:
  -- Pod 'approve-28462938-kktbc'. Phase: 'failed'. Container state: 'terminated'. Start timestamp 2024-02-12T22:18:09Z. Finish timestamp 2024-02-12T22:18:18Z. Exit code was '1'. With reason 'Error'.



If you requested 'filelog' for any of the jobs mentioned above, you may find additional information about what happened in the associated log files. Check them from Toolforge bastions as usual.

You are receiving this email because:
 1) when the job was created, it was requested to send email notfications.
 2) you are listed as tool maintainer for this tool.

Find help and more information in wikitech: https://wikitech.wikimedia.org/

Thanks for your contributions to the Wikimedia movement.