toolforge-jobs and buildpack images
Closed, Resolved, Public

Description

I am trying to create a cron job from a buildpack image.

I successfully created the image and can run it with:
toolforge jobs run --image tool-milhistbot/autocheck:latest --mount=all --command "web $OPTS" autocheck3

Now I want to run it as a cron job. I tried:

# autocheck2
- name: autocheck2
  command: web -ff -n=1000
  image: tool-milhistbot/autocheck:latest
  schedule: "45 12 * * *"
  emails: onfailure

(1) Is this correct?
(2) Normally I run the job with --mount=all, but I see no way to specify that in a YAML file.

Ross

Event Timeline

  • Job 'autocheck2' (cronjob) (emails: onfailure) had 2 events:
    • Pod 'autocheck2-28678365-b2wv7'. Phase: 'running'. Container state: 'terminated'. Start timestamp 2024-07-11T12:45:38Z. Finish timestamp 2024-07-11T12:45:40Z. Exit code was '134'. With reason 'Error'.
    • Pod 'autocheck2-28678365-b2wv7'. Phase: 'failed'. Container state: 'terminated'. Start timestamp 2024-07-11T12:45:38Z. Finish timestamp 2024-07-11T12:45:40Z. Exit code was '134'. With reason 'Error'.

Looks like it does require mount=all?

Trying locally fails with:

local.tf-test@lima-kilo:~$ toolforge jobs run --image tool-milhistbot/autocheck:latest --mount=all --command "web $OPTS" autocheck3

local.tf-test@lima-kilo:~$ toolforge jobs logs autocheck3
2024-07-19T07:53:52+00:00 [autocheck3-r2qgp] Password for user MilHistBot not found in /workspace/heroku_output/credx.xml
2024-07-19T07:53:52+00:00 [autocheck3-r2qgp] Unhandled exception. Wikimedia.BotException: Password for user MilHistBot not found in /workspace/heroku_output/credx.xml
2024-07-19T07:53:52+00:00 [autocheck3-r2qgp]    at Wikimedia.Bot..ctor() in /Users/ram900/milhistbot-wikimedia/Bot.cs:line 794
2024-07-19T07:53:52+00:00 [autocheck3-r2qgp]    at AutoCheck..ctor(String[] args) in /workspace/Program.cs:line 243
2024-07-19T07:53:52+00:00 [autocheck3-r2qgp]    at AutoCheck.Main(String[] args) in /workspace/Program.cs:line 279
2024-07-19T07:53:52+00:00 [autocheck3-r2qgp] Aborted (core dumped)
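
As a side note on reading these reports: the exit code '134' in the cron job events is consistent with the "Aborted (core dumped)" line in this log. Shells conventionally report a process killed by signal N as 128 + N, and 134 - 128 = 6, which is SIGABRT on Linux. A minimal sketch for decoding such a status follows; the function name signal_from_exit_code is made up for illustration.

```shell
# Decode a shell-style exit status into a signal name, if it encodes one.
# signal_from_exit_code is a hypothetical helper, not a Toolforge command.
signal_from_exit_code() {
    code="$1"
    if [ "$code" -gt 128 ]; then
        # Statuses above 128 conventionally mean "killed by signal (code - 128)".
        kill -l "$((code - 128))"
    else
        # Ordinary exit status, no signal involved.
        echo "none"
    fi
}

signal_from_exit_code 134
```

This only explains how the process died, not why; the cause is still the unhandled exception in the log.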

I might be missing some envvars config? Note that the tool home is mounted under $TOOL_DATA_DIR, so if you are looking for files there, they'll be under that path (e.g. $TOOL_DATA_DIR/credx.xml).
Just FYI, using envvars is preferred over files for secrets when possible, as they are encrypted and harder to leak (https://wikitech.wikimedia.org/wiki/Help:Toolforge/Tool_Accounts#File_permissions, https://wikitech.wikimedia.org/wiki/Help:Toolforge/Developing_successful_tools#Secure_passwords_and_other_credentials).

To get the YAML config, the easiest way is to run the job manually the first time, and then dump the YAML:

local.tf-test@lima-kilo:~$ toolforge jobs run --schedule "* * * * *" --image tool-milhistbot/autocheck:latest --mount=all --command "web $OPTS" autocheck3

local.tf-test@lima-kilo:~$ toolforge jobs dump
- command: 'launcher web '
  image: tool-milhistbot/autocheck:latest
  mount: all
  name: autocheck3
  schedule: '* * * * *'

There you can see the exact syntax for the mount option.
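
The dump-then-edit workflow could be wrapped roughly as below. This is a sketch under assumptions: save_and_reload_jobs and the jobs.yaml file name are illustrative, and it relies on `toolforge jobs load` to re-apply the file, which is not shown in this thread. My understanding is that load treats the file as the full list of jobs for the tool, so check the Toolforge documentation before running it against a tool with other jobs defined.

```shell
# Sketch: capture the current job definitions in a file and re-apply them.
# Assumes the toolforge CLI is on PATH (a Toolforge bastion or lima-kilo).
# save_and_reload_jobs is a hypothetical helper name.
save_and_reload_jobs() {
    file="${1:-jobs.yaml}"
    if ! command -v toolforge >/dev/null 2>&1; then
        # Not on a Toolforge host: do nothing rather than fail.
        echo "toolforge CLI not found; skipping"
        return 0
    fi
    # Write every currently defined job (including its mount setting) to the file.
    toolforge jobs dump > "$file"
    # Re-create the jobs from the (possibly edited) file.
    toolforge jobs load "$file"
}
```

In practice the two steps would be run separately, with the schedule and emails fields edited in between.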

That's exactly what I did. I used envvars list to hold the password:

 tools.milhistbot@tools-bastion-13:~$ toolforge envvars list
name                       value
CREDX_MILHISTBOT_PASSWORD  MilHistBot@*******************
TOOL_DATA_DIR              /data/project/milhistbot/logs

(Censored out the password, but it is there and correct.)

If it is not picking up this environment variable, then toolforge envvars is not working and we need another way of passing the passwords to toolforge jobs.
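
One way to narrow this down, sketched here under assumptions, is to have a job report whether the variable is visible in its environment without printing the value. report_env_var is a hypothetical helper name; only CREDX_MILHISTBOT_PASSWORD comes from the listing above.

```shell
# Report whether a variable is present in the process environment,
# without echoing its value (so the secret never reaches the logs).
# report_env_var is a hypothetical helper, not part of Toolforge.
report_env_var() {
    name="$1"
    # printenv succeeds only if the variable is exported to this process.
    if printenv "$name" >/dev/null; then
        echo "$name is set"
    else
        echo "$name is NOT set" >&2
        return 1
    fi
}

# Example call (commented out so the snippet is safe to source anywhere):
# report_env_var CREDX_MILHISTBOT_PASSWORD
```

It could be run as the --command of a one-off job using the same image and mount options, which would separate "the variable is missing" from "the bot is not reading it".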

Logging is non-existent, which makes things hard to debug:

tools.milhistbot@tools-bastion-13:~$ toolforge jobs logs autocheck2
ERROR: Error: Job 'autocheck2' does not have any logs available

I tried the mount: all and it gave an error... no error now though. I don't know where "launcher" comes from.

It all works when I run it:

tools.milhistbot@tools-bastion-13:~$ cat ~/bin/runit-autocheck
#!/usr/bin/bash
OPTS="$(printf " %q" "${@}")"
toolforge jobs run --image tool-milhistbot/autocheck:latest --mount=all --command "web $OPTS" autocheck3

tools.milhistbot@tools-bastion-13:~$ runit-autocheck -n=10 -ff
 tail -f AutoCheck.log
02:19 20 July 2024 started
02:19 20 July 2024 Aleria standoff
02:19 20 July 2024      Updated to B class
02:19 20 July 2024 Aleria standoff
02:19 20 July 2024      Changed from  to B class
02:19 20 July 2024 1 articles newly rated, 0 downgraded, 0 upgraded, 1 unchanged - total 2
02:19 20 July 2024 done

Working. So why didn't it work for you?
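
For readers puzzled by the OPTS line in runit-autocheck: printf with %q (a bash feature) re-quotes each argument so that it survives being pasted into the --command string. The sketch below shows the same idea with one extra guard; quote_args is a made-up name, and the guard is a suggestion rather than something the original script does.

```shell
#!/usr/bin/env bash
# Build a shell-quoted option string from the arguments (bash-specific: %q).
# quote_args is a hypothetical helper name.
quote_args() {
    # Guard the empty case: with no arguments, printf ' %q' would still print
    # one quoted empty string, which would reach the job as an empty argument.
    if [ "$#" -gt 0 ]; then
        printf ' %q' "$@"
    fi
}

# Same options as the run above; the result starts with a space.
OPTS="$(quote_args -n=10 -ff)"
echo "web$OPTS"
```

An argument containing a space, such as "hello world", comes out as hello\ world, so it still reaches the job as a single argument.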

That's exactly what I did. I used envvars list to hold the password:

 tools.milhistbot@tools-bastion-13:~$ toolforge envvars list
name                       value
CREDX_MILHISTBOT_PASSWORD  MilHistBot@*******************
TOOL_DATA_DIR              /data/project/milhistbot/logs

(Censored out the password, but it is there and correct.)

If it is not picking up this environment variable, then toolforge envvars is not working and we need another way of passing the passwords to toolforge jobs.

Logging is non-existent, which makes things hard to debug:

tools.milhistbot@tools-bastion-13:~$ toolforge jobs logs autocheck2
ERROR: Error: Job 'autocheck2' does not have any logs available

When a job does not log to the filesystem, its logs are stored with the k8s pod, and if the job fails they get deleted after a few minutes :/. Centralizing the logs so we can persist them for longer is still pending.

I tried the mount: all and it gave an error... no error now though. I don't know where "launcher" comes from.

The launcher is injected because it's a buildpack-based image. The launcher binary sets up the environment (LD_PRELOAD, PATH, ...) so that things work inside the container as usual (or as close to it as possible).

It all works when I run it:

tools.milhistbot@tools-bastion-13:~$ cat ~/bin/runit-autocheck
#!/usr/bin/bash
OPTS="$(printf " %q" "${@}")"
toolforge jobs run --image tool-milhistbot/autocheck:latest --mount=all --command "web $OPTS" autocheck3

tools.milhistbot@tools-bastion-13:~$ runit-autocheck -n=10 -ff
 tail -f AutoCheck.log
02:19 20 July 2024 started
02:19 20 July 2024 Aleria standoff
02:19 20 July 2024      Updated to B class
02:19 20 July 2024 Aleria standoff
02:19 20 July 2024      Changed from  to B class
02:19 20 July 2024 1 articles newly rated, 0 downgraded, 0 upgraded, 1 unchanged - total 2
02:19 20 July 2024 done

Working. So why didn't it work for you?

For me it failed because I don't have the password :). I ran it locally.

If it's working for you, that's good enough.

So just to confirm, right now, for you it's working as expected?

Yes.

The jobs.yaml file now reads:

# autocheck2
- name: autocheck2
  command: launcher web -ff -n=1000
  image: tool-milhistbot/autocheck:latest
  schedule: "45 12 * * *"
  emails: onfailure
  mount: all

launcher is present now - is it required? Works okay with it.

The generated log file now contains:

12:45 23 July 2024 started
12:45 23 July 2024 École du Pharo
12:45 23 July 2024      Changed from B to C class
12:45 23 July 2024 École du Pharo
12:45 23 July 2024      Updated to C class
12:45 23 July 2024 HDMS Dronning Anna Sophia (1722)
12:45 23 July 2024      Changed from  to Start class
12:45 23 July 2024 The Maratha rebellion
12:45 23 July 2024      Changed from Start to C class
12:45 23 July 2024 Battle of Thorgo
12:45 23 July 2024      Changed from  to B class
12:45 23 July 2024 Kamp Sint-Michielsgestel
12:45 23 July 2024      Changed from  to C class
12:45 23 July 2024 Outram Prison
12:45 23 July 2024      Changed from  to Start class
12:45 23 July 2024 Socialist Action (Poland)
12:45 23 July 2024      Changed from  to Start class
12:45 23 July 2024 5 articles newly rated, 1 downgraded, 1 upgraded, 1 unchanged - total 8
12:45 23 July 2024 done

So it looks good. I can proceed with moving jobs to this architecture after I get back from the Olympics/Paralympics. Thanks for your assistance.

dcaro claimed this task.

Glad to hear it's working :)

launcher is present now - is it required? Works okay with it.

Yep, it's OK. You can remove it too; it will be added automatically (you get it back when doing a dump).
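
Since the prefix is added automatically, a hand-written jobs.yaml and the output of a dump may differ only by the leading "launcher". A small illustrative helper for normalising a command string before comparing the two is sketched below. with_launcher is a hypothetical name, and this mirrors the behaviour described here rather than Toolforge's actual implementation.

```shell
# Normalise a job command so it always carries the "launcher " prefix,
# mimicking what a dump shows for buildpack-based images.
# with_launcher is a hypothetical helper, not part of Toolforge.
with_launcher() {
    case "$1" in
        # Already prefixed: leave unchanged.
        "launcher "*) printf '%s\n' "$1" ;;
        # Not prefixed: add it.
        *)            printf '%s\n' "launcher $1" ;;
    esac
}

# Both calls print the same line: launcher web -ff -n=1000
with_launcher "web -ff -n=1000"
with_launcher "launcher web -ff -n=1000"
```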