Page MenuHomePhabricator

scap shallows CheckInvalid message leading to confusion
Open, Needs TriagePublic

Description

When deploying Zuul with scap3 I had a nice confusion that cost me an hour or so. I had a check defined as:

scap/checks.yaml
checks:
  type: script
  stage: promote
  command: /srv/deployment/zuul/deploy/scap/promote-install_artifacts.sh

On the deployment server with scap deploy --verbose I got the following output:

14:22:52 [deploy1001] [u'/usr/bin/scap', u'deploy-local', u'-v', u'--repo', u'zuul/deploy', u'-g', u'ci', u'fetch', u'--refresh-config'] on contint2001.wikimedia.org returned [70]: OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2u  20 Dec 2019
<some stdout/stderr from the client>

Unhandled error:
deploy-local failed: <CheckInvalid> {}  # <---------- MISSING MESSAGE

14:22:52 [deploy1001] 1 targets had deploy errors
14:22:52 [deploy1001] 1 targets failed
14:22:52 [deploy1001] 1 of 1 ci targets failed, exceeding limit

CheckInvalid is an AssertionError initialized with a message but somehow it is not used when logging the error.

scap/cli.sh
def _handle_exception(self, ex):
    """
    Handle unhandled exceptions and errors.

    :returns: exit status
    """
    logger = self.get_logger()
    exception_type = type(ex).__name__
    backtrace = True
    message = '%s failed: <%s> %s'

    if isinstance(ex, lock.LockFailedError):
        backtrace = False

    if backtrace:
        logger.warning('Unhandled error:', exc_info=True)

    logger.error(message, self.program_name, exception_type, ex)
    return 70

Eventually I found the actual error in the scap log via:

scap deploy-log -f /srv/deployment/zuul/deploy/scap/log/scap-sync-2020-04-10-0012-2-g10c06e8.log
14:22:52 [contint2001.wikimedia.org] Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 342, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 150, in main
    status = self._execute_checks(stage, group)
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 402, in context_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 495, in _execute_checks
    chks = checks.load(self.config, check_environment)
  File "/usr/lib/python2.7/dist-packages/scap/checks.py", line 176, in load
    **options
  File "/usr/lib/python2.7/dist-packages/scap/checks.py", line 219, in __init__
    self.validate()
  File "/usr/lib/python2.7/dist-packages/scap/script.py", line 75, in validate
    raise checks.CheckInvalid(msg)
CheckInvalid: the script '/srv/deployment/zuul/deploy/scap/promote-install_artifacts.sh' was not found
14:22:52 [contint2001.wikimedia.org] deploy-local failed: <CheckInvalid> {}

The exception trace is shown via logger.warning('Unhandled error:', exc_info=True) which does output the actual error message: the script '/srv/deployment/zuul/deploy/scap/promote-install_artifacts.sh' was not found.

But somehow the error logging does not:

>>> logger.error(message, self.program_name, exception_type, ex)
deploy-local failed: <CheckInvalid> {}

The lack of message is an entire mystery and I can't figure it out :\

(note the fix was to change the type of check from script to command).