Page MenuHomePhabricator

scap eats underlying commands output (such as maintenance script stacktrace)
Closed, DeclinedPublic

Description

On beta cluster we have a log configuration error (T97138) causing maintenance scripts to spurts a PHP stracktrace.

The http://integration.wikimedia.org/ci/job/beta-scap-eqiad/ beta-scap-eqiad jenkins job runs scap and abort when running mergeMessageFileList.php (T96920) but the php script trace isn't shown:

https://integration.wikimedia.org/ci/job/beta-scap-eqiad/50109/console

Started by upstream project "beta-code-update-eqiad" build number 52826
originally caused by:
 Started by timer
Building remotely on deployment-bastion.eqiad (betaclusterEqiad deployment-bastion-eqiad) in workspace /mnt/home/jenkins-deploy/workspace/beta-scap-eqiad
No emails were triggered.
[beta-scap-eqiad] $ /bin/bash -xe /tmp/hudson2089674335862586503.sh
+ /srv/deployment/scap/scap/bin/scap 'beta-scap-eqiad (build #50109)'
21:03:58 Started scap: beta-scap-eqiad (build #50109)
21:03:58 Copying to deployment-bastion.eqiad.wmflabs from deployment-bastion.eqiad.wmflabs
21:03:58 Started rsync common
21:04:07 Finished rsync common (duration: 00m 08s)
21:04:07 Compiled /srv/mediawiki/wikiversions-labs.json to /srv/mediawiki/wikiversions-labs.cdb
21:04:07 Started mw-update-l10n
21:04:07 Updating ExtensionMessages-master.php
21:04:08 Finished mw-update-l10n (duration: 00m 00s)
21:04:08 Unhandled error:
Traceback (most recent call last):
  File "/mnt/srv/deployment/scap/scap/scap/cli.py", line 276, in run
    exit_status = app.main(extra_args)
  File "/mnt/srv/deployment/scap/scap/scap/main.py", line 39, in main
    self._before_cluster_sync()
  File "/mnt/srv/deployment/scap/scap/scap/main.py", line 218, in _before_cluster_sync
    version, wikidb, self.verbose, self.config)
  File "/mnt/srv/deployment/scap/scap/scap/tasks.py", line 496, in update_localization_cache
    logger)
  File "/mnt/srv/deployment/scap/scap/scap/utils.py", line 308, in sudo_check_call
    raise subprocess.CalledProcessError(proc.returncode, cmd)
CalledProcessError: Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="eowiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.VpRueQKWIR" ' returned non-zero exit status 1
21:04:08 scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="eowiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.VpRueQKWIR" ' returned non-zero exit status 1 (duration: 00m 09s)

Running that maintenance script manually shows the PHP trace from T97138

Event Timeline

hashar raised the priority of this task from to Needs Triage.
hashar updated the task description. (Show Details)
hashar subscribed.

it's got "--output" to a temp file - is the file empty?

The stdout/stderr of the proc should be going to the logger at debug level. Is jenkins running with the --verbose flag? I can't remember if I got that to pass through to everything or not yet.

as far as I could tell it should be working, verbose flag not necessary... but for some reason it does seem to eat the output - note I closely examined the scap code and the called php code - I could not see anything that would cause the output to get eaten... The only part I didn't examine thoroughly is the debug logger - I don't know how that is set up or whether it really works - maybe the logger is eating the output?

The only part I didn't examine thoroughly is the debug logger - I don't know how that is set up or whether it really works - maybe the logger is eating the output?

The logger is setup by scap.log.setup_loggers() which is called by scap.cli.Application._setup_loggers(). The default console log level is INFO unless --verbose is passed to the app. This is why I asked if Jenkins was passing --verbose.

Ok that explains a lot. For some reason I was thinking about passing --verbose to the subcommand, not to scap it's self. Sorry for being dense. I'll try to dig in to it further.

mmodell triaged this task as Medium priority.Jun 1 2015, 6:04 PM
mmodell moved this task from To Triage to Backlog (Tech) on the Deployments board.

Scap added much better logging. Should we backport that stuff to the rest of scap or just wait for mediawiki to be deployed with scap3?

Declining since I don't think we have hit that again, the invoked tool stacktrace can be found by running the command manually if needed.