
Unable to load Toolforge job: ERROR: TjfCliError: Unknown error (403 Client Error: Forbidden for url
Closed, Resolved · Public · BUG REPORT

Description

tools.multichill@tools-bastion-12:~$ toolforge jobs load jobs.yml --job coord-from-exif-mysql
INFO: loading job 'coord-from-exif-mysql'...
ERROR: TjfCliError: Unknown error (403 Client Error: Forbidden for url: https://k8s.tools.eqiad1.wikimedia.cloud:6443/apis/batch/v1/namespaces/tool-multichill/cronjobs?dryRun=All)
ERROR: Please report this issue to the Toolforge admins if it persists: https://w.wiki/6Zuu

Event Timeline

Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-cloud) [2025-07-13T21:20:44Z] <wmbot~multichill@tools-bastion-12> Unable to add jobs, created T399417

I don't see this on my tools, so it might be related to this tool's specific situation. Looking into it.

dcaro triaged this task as High priority. Jul 14 2025, 9:27 AM
dcaro moved this task from Backlog to Ready to be worked on on the Toolforge board.

I think that the issue is the quota:

tools.multichill@tools-bastion-13:~$ toolforge jobs quota
Running jobs                                  Used    Limit
--------------------------------------------  ------  -------
Total running jobs at once (Kubernetes pods)  2       16
Running one-off and cron jobs                 1       15
CPU                                           1.0     8.0
Memory                                        1.5Gi   16.0Gi

Per-job limits    Used    Limit
----------------  ------  -------
CPU                       3.0
Memory                    6.0Gi

Job definitions                             Used    Limit
----------------------------------------  ------  -------
Cron jobs                                     50       50
Continuous jobs (including web services)       1       16
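To spot which limit is exhausted at a glance, the quota table can be checked programmatically. This is a minimal Python sketch that assumes the plain-text layout shown above (columns separated by two or more spaces, memory values in Gi). full_quotas is a hypothetical helper, not part of the toolforge CLI, and the CLI's output format may change.

```python
import re

# A quota row: a label, then a Used value and a Limit value, each separated
# by two or more spaces. Values may carry a "Gi" suffix (e.g. "1.5Gi").
# Rows with only one value (the "Per-job limits" rows) do not match.
_ROW = re.compile(
    r"^(?P<name>\S.*?)\s{2,}(?P<used>[\d.]+(?:Gi)?)\s{2,}(?P<limit>[\d.]+(?:Gi)?)\s*$"
)


def _to_float(value: str) -> float:
    # Assumption: memory is always reported in Gi, so stripping the suffix
    # keeps Used and Limit in the same unit. CPU and counts are plain numbers.
    return float(value.removesuffix("Gi"))


def full_quotas(quota_output: str) -> list[str]:
    """Return the names of quota rows whose usage has reached the limit."""
    full = []
    for line in quota_output.splitlines():
        match = _ROW.match(line)
        if match and _to_float(match["used"]) >= _to_float(match["limit"]):
            full.append(match["name"])
    return full


if __name__ == "__main__":
    sample = (
        "Memory                                        1.5Gi   16.0Gi\n"
        "Cron jobs                                     50       50\n"
        "Continuous jobs (including web services)       1       16\n"
    )
    print(full_quotas(sample))  # → ['Cron jobs']
```

Run against the output in this task, only the cron job definitions row is reported as full, which is the limit the failing load ran into.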

To check whether a job is the same as an existing one when loading the file, we do a dry-run create to get the equivalent k8s object, and k8s seems to reject it because the cron job quota is already full (probably correctly).
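The failure mode described above can be modeled in a few lines. This is a minimal Python sketch, assuming a simple per-namespace count quota on cron jobs and assuming, as the 403 in this task suggests, that the quota check is also applied to dry-run creates. CountQuota and QuotaExceeded are hypothetical names, not part of Toolforge or the Kubernetes client.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Hypothetical stand-in for the 403 Forbidden returned by the API server."""


@dataclass
class CountQuota:
    """Simplified, hypothetical model of an object-count quota (e.g. cron jobs)."""

    used: int
    limit: int

    def admit_create(self, dry_run: bool) -> None:
        # Assumption: the quota check runs for dry-run requests too, so a
        # dry-run create is rejected when the namespace is already at its
        # limit, even though nothing would be persisted.
        if self.used + 1 > self.limit:
            raise QuotaExceeded(f"quota exceeded: used {self.used}, limit {self.limit}")
        if not dry_run:
            # Only a real create consumes quota.
            self.used += 1


if __name__ == "__main__":
    cron_jobs = CountQuota(used=50, limit=50)  # the 50/50 state in this task
    try:
        cron_jobs.admit_create(dry_run=True)
    except QuotaExceeded as err:
        print("dry-run rejected:", err)
    cron_jobs.limit = 60  # hypothetical bumped limit
    cron_jobs.admit_create(dry_run=True)
    print(cron_jobs.used)  # → 50: a dry run consumes nothing
```

The model only shows why a full quota blocks even the comparison step; it says nothing about how the linked merge request changes the load logic.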

This should stop being an issue soon, but let me extend your quota a bit so you aren't blocked.

@Multichill can you try now? I extended your job quota a bit. Try to keep it from filling up until https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/182 is merged.

dcaro changed the task status from Open to In Progress. Jul 22 2025, 12:51 PM
dcaro moved this task from Next Up to In Progress on the Toolforge (Toolforge iteration 22) board.

Thanks! Looking good:

tools.multichill@tools-bastion-12:~$ toolforge jobs load jobs.yml --job coord-from-exif-mysql
INFO: loading job 'coord-from-exif-mysql'...
Job coord-from-exif-mysql created
INFO: 51 job(s) loaded successfully

I still haven't recovered all my grid jobs, and I'm also adding new ones (like this one). I'll probably reach at least 80 cron jobs. Is it possible to bump that limit a bit more?

Mentioned in SAL (#wikimedia-cloud) [2025-07-28T09:35:55Z] <wmbot~multichill@tools-bastion-12> Started the coord-from-exif-mysql job (T399417)

I just got this error, but a second try gave normal output. Might it be related?

tools.multichill@tools-bastion-12:~$ toolforge jobs list
ERROR: TjfCliError: Unknown error (404 Client Error: Not Found for url: https://k8s.tools.eqiad1.wikimedia.cloud:6443/apis/batch/v1/namespaces/tool-multichill/jobs/sdoc-cc-by-sa-4.0-29228247)

@Multichill yep, quite likely. Can you open a new task for the quota bump? It helps us keep the records straight.

dcaro moved this task from In Progress to Done on the Toolforge (Toolforge iteration 22) board.

Closing this, as https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/182 has been deployed. You should no longer hit this issue when updating an existing job with toolforge jobs load myjobs.yaml :)

Please create the quota bump task if you still need extra quota for new jobs.