jobs not getting loaded properly
Open, In Progress, Needs Triage, Public

Description

To reproduce, run toolforge jobs load jobs.yaml multiple times, where jobs.yaml contains something like:

- name: contjob
  image: bookworm
  command: ./contjob.sh
  continuous: true
  no-filelog: true
  replicas: 2
- name: schdjob
  image: bookworm
  command: "date; sleep 30"
  schedule: "* * * * *"
  no-filelog: true
- name: one-off
  image: bookworm
  command: "date; sleep 30"
  no-filelog: true

For each job, the load is expected to report that the job was created, updated, or already up-to-date. (One-off jobs go away after completion, so they will be reported as created if the next load runs after the previous run has completed and gone away.)
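The expected reporting can be sketched as a small decision function. This is only an illustration of the three outcomes described above, not the actual jobs service code: load_outcome and its arguments are hypothetical names, and job specs are assumed to be plain dictionaries of the values the user wrote in jobs.yaml.

```python
from typing import Optional


def load_outcome(stored_spec: Optional[dict], new_spec: dict) -> str:
    """Classify what a load should report for one job (hypothetical helper)."""
    if stored_spec is None:
        # Nothing is stored under this name, for example a one-off job
        # that already completed and went away.
        return "created"
    if stored_spec == new_spec:
        # The user-provided values are unchanged since the last load.
        return "up-to-date"
    return "updated"
```

Under this sketch, loading an unmodified jobs.yaml a second time would yield "up-to-date" for contjob and schdjob, and either "up-to-date" or "created" for one-off, depending on whether the previous run has already gone away.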

However, I'm getting the following messages:

local.tf-test@lima-kilo:~$ toolforge jobs load jobs.yaml
INFO: loading job 'contjob'...
Job contjob created in storage and runtime
INFO: loading job 'schdjob'...
Job schdjob created in storage and runtime
INFO: loading job 'one-off'...
Job one-off created in storage and runtime
INFO: 3 job(s) loaded successfully
...
local.tf-test@lima-kilo:~$ toolforge jobs load jobs.yaml
INFO: loading job 'contjob'...
Job contjob was updated in runtime only
INFO: loading job 'schdjob'...
Job schdjob was updated in runtime only
INFO: loading job 'one-off'...
ERROR: TjfCliError: Unable to find object in storage
ERROR: Please report this issue to the Toolforge admins if it persists: https://w.wiki/6Zuu

Notice that jobs.yaml was not modified between the first and second attempts, so the second load should have reported all jobs as up-to-date, but it did not.
The one-off job load also errors out outright, which should not happen.

Some Context:

  • We recently introduced a new storage backend for jobs; this is likely one of the remaining kinks.
  • Historically we've had issues getting loading to work correctly. Because we sometimes transform the job values provided by the user, or change defaults based on other values, it is tricky to keep track of the exact values the user provided. That is one of the reasons we introduced the storage backend in the first place.
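To illustrate why keeping the user-provided values matters, here is a minimal sketch, not the real implementation. ASSUMED_DEFAULTS, apply_defaults, StoredJob and is_unchanged are all hypothetical names, and the default values shown are made up for the example. The idea is to store the spec exactly as the user wrote it and to compare later loads against that, rather than against the transformed runtime values.

```python
from dataclasses import dataclass, field

# Hypothetical defaults; the real defaults are not stated in this task.
ASSUMED_DEFAULTS = {"replicas": 1, "no-filelog": False}


def apply_defaults(user_spec: dict) -> dict:
    """Return the runtime view of a job: user values layered over defaults."""
    return {**ASSUMED_DEFAULTS, **user_spec}


@dataclass
class StoredJob:
    user_spec: dict  # exactly what the user wrote in jobs.yaml
    runtime_spec: dict = field(init=False)  # derived, never compared on load

    def __post_init__(self) -> None:
        self.runtime_spec = apply_defaults(self.user_spec)


def is_unchanged(stored: StoredJob, new_user_spec: dict) -> bool:
    """Compare user-provided values only, ignoring applied defaults."""
    return stored.user_spec == new_user_spec
```

With this split, the runtime spec can differ from what the user wrote (because defaults were filled in) without a repeated load of the same file being treated as a change.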

Event Timeline

Raymond_Ndibe changed the task status from Open to In Progress. Apr 16 2026, 2:40 PM
Raymond_Ndibe moved this task from Backlog to In progress on the tools-platform-team board.