Page MenuHomePhabricator

scap eats underlying commands output (such as maintenance script stacktrace)
Open, NormalPublic

Description

On beta cluster we have a log configuration error (T97138) causing maintenance scripts to spurts a PHP stracktrace.

The http://integration.wikimedia.org/ci/job/beta-scap-eqiad/ beta-scap-eqiad jenkins job runs scap and abort when running mergeMessageFileList.php (T96920) but the php script trace isn't shown:

https://integration.wikimedia.org/ci/job/beta-scap-eqiad/50109/console

Started by upstream project "beta-code-update-eqiad" build number 52826
originally caused by:
 Started by timer
Building remotely on deployment-bastion.eqiad (betaclusterEqiad deployment-bastion-eqiad) in workspace /mnt/home/jenkins-deploy/workspace/beta-scap-eqiad
No emails were triggered.
[beta-scap-eqiad] $ /bin/bash -xe /tmp/hudson2089674335862586503.sh
+ /srv/deployment/scap/scap/bin/scap 'beta-scap-eqiad (build #50109)'
21:03:58 Started scap: beta-scap-eqiad (build #50109)
21:03:58 Copying to deployment-bastion.eqiad.wmflabs from deployment-bastion.eqiad.wmflabs
21:03:58 Started rsync common
21:04:07 Finished rsync common (duration: 00m 08s)
21:04:07 Compiled /srv/mediawiki/wikiversions-labs.json to /srv/mediawiki/wikiversions-labs.cdb
21:04:07 Started mw-update-l10n
21:04:07 Updating ExtensionMessages-master.php
21:04:08 Finished mw-update-l10n (duration: 00m 00s)
21:04:08 Unhandled error:
Traceback (most recent call last):
  File "/mnt/srv/deployment/scap/scap/scap/cli.py", line 276, in run
    exit_status = app.main(extra_args)
  File "/mnt/srv/deployment/scap/scap/scap/main.py", line 39, in main
    self._before_cluster_sync()
  File "/mnt/srv/deployment/scap/scap/scap/main.py", line 218, in _before_cluster_sync
    version, wikidb, self.verbose, self.config)
  File "/mnt/srv/deployment/scap/scap/scap/tasks.py", line 496, in update_localization_cache
    logger)
  File "/mnt/srv/deployment/scap/scap/scap/utils.py", line 308, in sudo_check_call
    raise subprocess.CalledProcessError(proc.returncode, cmd)
CalledProcessError: Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="eowiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.VpRueQKWIR" ' returned non-zero exit status 1
21:04:08 scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="eowiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.VpRueQKWIR" ' returned non-zero exit status 1 (duration: 00m 09s)

Running that maintenance script manually shows the PHP trace from T97138

Event Timeline

hashar created this task.Apr 24 2015, 3:11 PM
hashar raised the priority of this task from to Needs Triage.
hashar updated the task description. (Show Details)
hashar added a subscriber: hashar.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 24 2015, 3:11 PM
greg set Security to None.

it's got "--output" to a temp file - is the file empty?

bd808 added a subscriber: bd808.Apr 24 2015, 9:32 PM

The stdout/stderr of the proc should be going to the logger at debug level. Is jenkins running with the --verbose flag? I can't remember if I got that to pass through to everything or not yet.

as far as I could tell it should be working, verbose flag not necessary... but for some reason it does seem to eat the output - note I closely examined the scap code and the called php code - I could not see anything that would cause the output to get eaten... The only part I didn't examine thoroughly is the debug logger - I don't know how that is set up or whether it really works - maybe the logger is eating the output?

bd808 added a comment.Apr 26 2015, 4:32 PM

The only part I didn't examine thoroughly is the debug logger - I don't know how that is set up or whether it really works - maybe the logger is eating the output?

The logger is setup by scap.log.setup_loggers() which is called by scap.cli.Application._setup_loggers(). The default console log level is INFO unless --verbose is passed to the app. This is why I asked if Jenkins was passing --verbose.

Ok that explains a lot. For some reason I was thinking about passing --verbose to the subcommand, not to scap it's self. Sorry for being dense. I'll try to dig in to it further.

mmodell triaged this task as Normal priority.Jun 1 2015, 6:04 PM
mmodell moved this task from To Triage to Backlog (Tech) on the Deployments board.
greg edited projects, added scap2; removed Deployments.Feb 9 2016, 11:52 PM

Scap added much better logging. Should we backport that stuff to the rest of scap or just wait for mediawiki to be deployed with scap3?

mmodell edited projects, added Scap; removed scap2.Feb 10 2017, 6:22 PM
mmodell moved this task from Needs triage to Debt on the Scap board.Feb 1 2018, 12:18 AM