scap error on Gerrit first setup
Closed, ResolvedPublic

Description

upon reimaging gerrit1003 (T417246: Reimage gerrit1003), I consistently got the following scap error:

Error: Execution of '/usr/bin/scap deploy-local --repo gerrit/gerrit -D log_json:False' returned 1: 11:53:30 Fetch from: http://deploy2002.codfw.wmnet/gerrit/gerrit/.git
11:53:30 Unhandled error:
Traceback (most recent call last):
  File "/var/lib/scap/scap/lib/python3.11/site-packages/scap/cli.py", line 827, in run
    exit_status = app.main(app.extra_arguments)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/scap/scap/lib/python3.11/site-packages/scap/deploy.py", line 174, in main
    getattr(self, stage)()
  File "/var/lib/scap/scap/lib/python3.11/site-packages/scap/deploy.py", line 352, in fetch
    git.lfs_install()
  File "/var/lib/scap/scap/lib/python3.11/site-packages/scap/git.py", line 180, in lfs_install
    gitcmd("lfs", *lfsargs)
  File "/var/lib/scap/scap/lib/python3.11/site-packages/scap/runcmd.py", line 88, in gitcmd
    return _runcmd(["git", subcommand] + list(args), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/scap/scap/lib/python3.11/site-packages/scap/runcmd.py", line 75, in _runcmd
    raise FailedCommand(argv, p.returncode, stdout, stderr)
scap.runcmd.FailedCommand: Command 'git lfs install' failed with exit code 2;
stdout:
warning: error running /usr/lib/git-core/git 'config' '--includes' '--global' '--replace-all' 'filter.lfs.clean' 'git-lfs clean -- %f': 'fatal: $HOME not set' 'exit status 128'
Run `git lfs install --force` to reset Git configuration.

stderr:

11:53:30 deploy-local failed: <FailedCommand> Command 'git lfs install' failed with exit code 2;
stdout:
warning: error running /usr/lib/git-core/git 'config' '--includes' '--global' '--replace-all' 'filter.lfs.clean' 'git-lfs clean -- %f': 'fatal: $HOME not set' 'exit status 128'
Run `git lfs install --force` to reset Git configuration.

stderr:
 (scap version: 4.241.0) (duration: 00m 00s)

Error: /Stage[main]/Gerrit/Scap::Target[gerrit/gerrit]/Package[gerrit/gerrit]/ensure: change from 'absent' to 'present' failed: Execution of '/usr/bin/scap deploy-local --repo gerrit/gerrit -D log_json:False' returned 1: 11:53:30 Fetch from: http://deploy2002.codfw.wmnet/gerrit/gerrit/.git

Which, I guess, triggers:

Notice: /Stage[main]/Profile::Java/Java::Cacert[wmf:puppetca.pem]/Exec[java__cacert_wmf:puppetca.pem]/returns: keytool error: java.io.FileNotFoundException: /usr/share/ca-certificates/wikimedia/Puppet5_Internal_CA.crt (No such file or directory)
Error: '/usr/bin/keytool -import -trustcacerts -noprompt -cacerts     -file /usr/share/ca-certificates/wikimedia/Puppet5_Internal_CA.crt -storepass changeit -alias wmf:puppetca.pem
' returned 1 instead of one of [0]
Error: /Stage[main]/Profile::Java/Java::Cacert[wmf:puppetca.pem]/Exec[java__cacert_wmf:puppetca.pem]/returns: change from 'notrun' to ['0'] failed: '/usr/bin/keytool -import -trustcacerts -noprompt -cacerts     -file /usr/share/ca-certificates/wikimedia/Puppet5_Internal_CA.crt -storepass changeit -alias wmf:puppetca.pem
' returned 1 instead of one of [0] (corrective)

and the rest of the Puppet run is skipped ("Skipping because of failed dependencies").
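Puppet applies resources in dependency order, and a resource whose dependency failed (or was itself skipped) is not applied, which is where the "Skipping because of failed dependencies" messages come from. The toy model below is a minimal Python sketch of that propagation, not Puppet's implementation. The resource titles come from the logs in this task; the Service[gerrit] resource and its edge are hypothetical, added only to show the cascade.

```python
from graphlib import TopologicalSorter


def apply_catalog(deps, actions):
    """Toy model of applying a catalog with failure propagation.

    deps:    resource -> set of resources it depends on
    actions: resource -> callable returning True on success
    Returns resource -> "ok" | "failed" | "skipped".
    """
    status = {}
    # static_order() yields every resource after all of its dependencies.
    for resource in TopologicalSorter(deps).static_order():
        if any(status[dep] != "ok" for dep in deps.get(resource, ())):
            # A failed or skipped dependency blocks this resource.
            status[resource] = "skipped"
        else:
            status[resource] = "ok" if actions[resource]() else "failed"
    return status


# Catalog fragment; Service[gerrit] is hypothetical.
deps = {
    "Package[gerrit/gerrit]": set(),
    "Exec[java__cacert_wmf:puppetca.pem]": set(),
    "File[/etc/update-motd.d/01-replica-warning]": {"Exec[java__cacert_wmf:puppetca.pem]"},
    "Service[gerrit]": {"Package[gerrit/gerrit]"},
}
actions = {
    "Package[gerrit/gerrit]": lambda: False,               # scap deploy-local failed
    "Exec[java__cacert_wmf:puppetca.pem]": lambda: False,  # keytool failed
    "File[/etc/update-motd.d/01-replica-warning]": lambda: True,
    "Service[gerrit]": lambda: True,
}
result = apply_catalog(deps, actions)
```

Treating "skipped" as blocking, just like "failed", is what lets a single early failure cascade through everything downstream of it.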

Event Timeline

ABran-WMF triaged this task as High priority.

setting high priority to mirror the parent task

warning: error running /usr/lib/git-core/git 'config' '--includes' '--global' '--replace-all' 'filter.lfs.clean' 'git-lfs clean -- %f': 'fatal: $HOME not set' 'exit status 128'

Because a global git config is stored under the invoking user's home directory:

git-config(1)
`--global`

For writing options: write to global ~/.gitconfig file rather than the repository .git/config, write to $XDG_CONFIG_HOME/git/config file if this file exists and the ~/.gitconfig file doesn’t.

To me, this suggests that the scap deploy-local script does not pass along the invoking user's environment variables (os.environ), or at least HOME/PATH/USER. Or it might be sudo not passing them, if sudo is involved.

That sounds like a scap issue, and others in my group are better placed to investigate it. I am pinging the team on Slack in #developer-experience.

Do you have the full log of the puppet run to see more context?
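The git-config(1) rule quoted above can be sketched in Python as follows. This is a reading of git's documented behaviour, not git's source: the XDG file is used only if it exists and ~/.gitconfig does not, an unset or empty $XDG_CONFIG_HOME falls back to $HOME/.config, and with HOME unset git refuses to pick a global file at all, which matches the "fatal: $HOME not set" in the scap log. global_config_path is a hypothetical helper name.

```python
import os


def global_config_path(env):
    """Return the file `git config --global` would write to.

    Sketch of the git-config(1) rule; `env` is a mapping such as os.environ.
    """
    home = env.get("HOME")
    if home is None:
        # Without HOME, git cannot tell whether ~/.gitconfig exists, so it
        # gives up even when XDG_CONFIG_HOME is set.
        raise RuntimeError("fatal: $HOME not set")
    user_config = os.path.join(home, ".gitconfig")
    # An unset or empty XDG_CONFIG_HOME falls back to ~/.config.
    xdg_home = env.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    xdg_config = os.path.join(xdg_home, "git", "config")
    # Write to the XDG file only if it exists and ~/.gitconfig does not.
    if os.path.exists(xdg_config) and not os.path.exists(user_config):
        return xdg_config
    return user_config
```

Calling global_config_path on an environment from which HOME has been removed raises, which is the situation `git lfs install` ran into when it tried `git config --global`.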

Change #1240372 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] scap3 install provider: Set HOME for deploy_user when running scap

https://gerrit.wikimedia.org/r/1240372

@dancy found why $HOME was not set during the initial Puppet run: the Puppet provider's execution method deletes some environment variables.
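Change #1240372 fixes this in the scap3 provider in operations/puppet; that code is Ruby and is not reproduced here. The Python sketch below only illustrates the idea: a child process started with a scrubbed environment sees no HOME, and looking the deploy user up in the passwd database restores HOME (plus USER, LOGNAME and PATH) before the child is launched. env_for_user and child_home are hypothetical names, and the fallback PATH value is an assumption.

```python
import os
import pwd
import subprocess
import sys

# Assumed fallback, used only when the base environment has no PATH.
DEFAULT_PATH = "/usr/local/bin:/usr/bin:/bin"


def env_for_user(user, base=None):
    """Copy `base` (default: os.environ) and set HOME/USER/LOGNAME for `user`.

    HOME is taken from the passwd entry rather than the caller's environment,
    so it is correct even if the caller's environment was scrubbed.
    """
    env = dict(os.environ if base is None else base)
    entry = pwd.getpwnam(user)
    env["HOME"] = entry.pw_dir
    env["USER"] = env["LOGNAME"] = entry.pw_name
    env.setdefault("PATH", DEFAULT_PATH)
    return env


def child_home(env):
    """Report the HOME value a child process sees (the string 'None' if unset)."""
    proc = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('HOME'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip()


if __name__ == "__main__":
    me = pwd.getpwuid(os.getuid()).pw_name
    print(child_home({}))                    # scrubbed environment: prints None
    print(child_home(env_for_user(me, {})))  # prints this user's home directory
```

Deriving HOME from the passwd entry, rather than copying it from the parent, also avoids handing the child the home directory of whichever user Puppet itself runs as.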

Change #1240372 merged by Dzahn:

[operations/puppet@production] scap3 install provider: Set env vars for deploy_user when running scap

https://gerrit.wikimedia.org/r/1240372

thanks @thcipriani @hashar @Dzahn @dancy for the fix!

I still have some issues after the Nth Puppet run. I've logged all puppet-agent run output to files in /home/arnaudb/:

arnaudb@gerrit1003:~ $ ls -l *.log
-rw-r--r-- 1 arnaudb wikidev 94744 Feb 19 06:36 puppet_run_1.log
-rw-r--r-- 1 arnaudb wikidev 36646 Feb 19 06:39 puppet_run_2.log
-rw-r--r-- 1 arnaudb wikidev 31609 Feb 19 06:41 puppet_run_3.log
-rw-r--r-- 1 arnaudb wikidev 31609 Feb 19 06:42 puppet_run_4.log

the last blocker seems to come from the puppet5 CA:

Notice: /Stage[main]/Profile::Java/Java::Cacert[wmf:puppetca.pem]/Exec[java__cacert_wmf:puppetca.pem]/returns: keytool error: java.io.FileNotFoundException: /usr/share/ca-certificates/wikimedia/Puppet5_Internal_CA.crt (No such file or directory)
Error: '/usr/bin/keytool -import -trustcacerts -noprompt -cacerts     -file /usr/share/ca-certificates/wikimedia/Puppet5_Internal_CA.crt -storepass changeit -alias wmf:puppetca.pem
' returned 1 instead of one of [0]
Error: /Stage[main]/Profile::Java/Java::Cacert[wmf:puppetca.pem]/Exec[java__cacert_wmf:puppetca.pem]/returns: change from 'notrun' to ['0'] failed: '/usr/bin/keytool -import -trustcacerts -noprompt -cacerts     -file /usr/share/ca-certificates/wikimedia/Puppet5_Internal_CA.crt -storepass changeit -alias wmf:puppetca.pem
' returned 1 instead of one of [0] (corrective)
Notice: /Stage[main]/Profile::Gerrit/Motd::Script[replica warning]/File[/etc/update-motd.d/01-replica-warning]: Dependency Exec[java__cacert_wmf:puppetca.pem] has failures: true

it comes back consistently after the 2nd puppet-agent run:

arnaudb@gerrit1003:~ $ diff puppet_run_4.log puppet_run_3.log
184c184
< Notice: Applied catalog in 24.23 seconds
---
> Notice: Applied catalog in 24.47 seconds
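Runs 3 and 4 differ only in the catalog timing line. A small helper along these lines (hypothetical, Python standard library only) diffs two puppet-agent logs while ignoring the "Notice: Applied catalog in N seconds" line, so an empty result means the runs are otherwise identical.

```python
import difflib
import re

# Per-run timing line, e.g. "Notice: Applied catalog in 24.23 seconds".
TIMING = re.compile(r"^Notice: Applied catalog in [\d.]+ seconds$")


def without_timing(lines):
    """Drop the timing line; trailing whitespace is ignored when matching."""
    return [line for line in lines if not TIMING.match(line.rstrip())]


def stable_diff(old_lines, new_lines):
    """Unified diff of two logs, ignoring the catalog timing line."""
    return list(difflib.unified_diff(
        without_timing(old_lines), without_timing(new_lines),
        fromfile="old", tofile="new", lineterm="",
    ))
```

For example, passing the `.read().splitlines()` contents of puppet_run_3.log and puppet_run_4.log would be expected to return an empty list, assuming the files hold plain, uncoloured output.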

Change #1240477 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] java: update puppet certificate

https://gerrit.wikimedia.org/r/1240477
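The keytool failure itself comes down to a missing input file: the Exec points at Puppet5_Internal_CA.crt, which is not present on gerrit1003. A pre-flight check like the hypothetical Python sketch below would report that directly instead of through keytool's exit status. It only builds the argument list copied from the Puppet log and never runs keytool; keytool_import_argv is an illustrative name, not something in the puppet repository.

```python
import os


def keytool_import_argv(cert_path, alias, storepass="changeit"):
    """Build the keytool command seen in the Puppet log.

    Raises FileNotFoundError up front if the certificate is missing, which is
    the condition that made keytool return 1 on gerrit1003.
    """
    if not os.path.isfile(cert_path):
        raise FileNotFoundError(f"CA certificate not found: {cert_path}")
    return [
        "/usr/bin/keytool", "-import", "-trustcacerts", "-noprompt", "-cacerts",
        "-file", cert_path, "-storepass", storepass, "-alias", alias,
    ]
```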

ABran-WMF assigned this task to dancy.

closing the task as the initial issue has been fixed

Change #1240477 merged by Arnaudb:

[operations/puppet@production] java: update puppet certificate

https://gerrit.wikimedia.org/r/1240477

the last blocker seems to come from the puppet5 CA

This sounds related to the recent work by infra foundations to remove puppet5 / puppetmasters.

yes indeed! we swapped the certificates based on @MoritzMuehlenhoff's and @elukey's recommendations