scap failed: IOError [Errno 13] Permission denied: '/srv/mediawiki-staging/wikiversions-labs.cdb.tmp'
Closed, Resolved (Public)

Description

From https://integration.wikimedia.org/ci/job/beta-scap-eqiad/46596/console :

The scap job fails every time, which 1) must be spamming the crap out of the qa-alerts mailing list, and 2) makes it impossible to distinguish these false alarm failures from real failures using normal tools.

21:08:32 21:08:32 Finished sync_wikiversions (duration: 00m 00s)
21:08:32 21:08:32 Unhandled error:
21:08:32 Traceback (most recent call last):
21:08:32   File "/mnt/srv/deployment/scap/scap/scap/cli.py", line 276, in run
21:08:32     exit_status = app.main(extra_args)
21:08:32   File "/mnt/srv/deployment/scap/scap/scap/main.py", line 66, in main
21:08:32     self._after_cluster_sync()
21:08:32   File "/mnt/srv/deployment/scap/scap/scap/main.py", line 241, in _after_cluster_sync
21:08:32     self._get_apache_list(), self.config)
21:08:32   File "/mnt/srv/deployment/scap/scap/scap/tasks.py", line 308, in sync_wikiversions
21:08:32     compile_wikiversions_cdb('stage', cfg)
21:08:32   File "/mnt/srv/deployment/scap/scap/scap/tasks.py", line 150, in compile_wikiversions_cdb
21:08:32     with open(tmp_cdb_file, 'wb') as fp:
21:08:32 IOError: [Errno 13] Permission denied: '/srv/mediawiki-staging/wikiversions-labs.cdb.tmp'
21:08:32 21:08:32 scap failed: IOError [Errno 13] Permission denied: '/srv/mediawiki-staging/wikiversions-labs.cdb.tmp' (duration: 01m 20s)
21:08:32 Build step 'Execute shell' marked build as failure
21:08:32 Email was triggered for: Failure - Any
21:08:32 Sending email for trigger: Failure - Any
21:08:32 Sending email to: qa-alerts@lists.wikimedia.org
21:08:32 Finished: FAILURE
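
For anyone trying to reproduce this outside scap: the traceback shows the failure at `open(tmp_cdb_file, 'wb')`, i.e. while creating the `.tmp` file, so the relevant permission is on the directory rather than on the final file. Below is a minimal sketch of a write-to-temp-then-rename pattern that reports the directory when creation is denied. It is not scap's actual code; the name `write_atomically`, the rename step, and the error message are assumptions made for illustration.

```python
import errno
import os


def write_atomically(path, data):
    """Write bytes to path via a sibling '.tmp' file (hypothetical helper)."""
    tmp_path = path + ".tmp"
    try:
        # Creating the temp file needs write+search permission on the
        # containing directory; this is the step that fails in the log above.
        with open(tmp_path, "wb") as fp:
            fp.write(data)
            fp.flush()
            os.fsync(fp.fileno())
    except OSError as exc:
        if exc.errno == errno.EACCES:
            directory = os.path.dirname(tmp_path) or "."
            raise PermissionError(
                errno.EACCES,
                "cannot create %s; check ownership and mode of %s"
                % (tmp_path, directory),
            ) from exc
        raise
    # Assumed final step: swap the finished temp file into place.
    os.replace(tmp_path, path)
```

Running this as the same user that Jenkins uses for the job, against a scratch directory with the same owner, group, and mode as the staging directory, should show whether the directory permissions alone explain the error.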

Event Timeline

Catrope raised the priority of this task from to Needs Triage.
Catrope updated the task description.
Catrope subscribed.
hashar renamed this task from beta-scap-eqiad works but throws errors to scap failed: IOError [Errno 13] Permission denied: '/srv/mediawiki-staging/wikiversions-labs.cdb.tmp'. Mar 27 2015, 9:37 PM
hashar added a project: Deployments.
hashar set Security to None.
hashar added subscribers: bd808, mmodell, hashar.
IOError: [Errno 13] Permission denied: '/srv/mediawiki-staging/wikiversions-labs.cdb.tmp'

That sounds very similar to an issue @bd808 and @mmodell investigated during the week (can't find the original task, sorry).

bd808 triaged this task as High priority.

Change 200248 had a related patch set uploaded (by BryanDavis):
beta: Stop running git clone as mwdeploy

https://gerrit.wikimedia.org/r/200248

Change 199988 had a related patch set uploaded (by Thcipriani):
Trebuchet group wikidev; mw-staging owner mwdeploy

https://gerrit.wikimedia.org/r/199988

This seems to have been another small bout of file-permission and active-user fallout from the recent efforts to consolidate the beta and production scap roles. Historically, things on deployment-bastion were executed as the mwdeploy user, both for managing the git clones and for running scap. This has changed, but not everything was changed at the same time and a few little details were missed.

We had some fallout related to this with the train deploy merges on 2015-03-25, which I worked around with manual chmods. @thcipriani did some Puppet cleanup to fix this on 2015-03-26 after we chatted on IRC. The errors that came up in this ticket made me look deeper and change my mind about some of what I had told Tyler the day before.

The new hotness is that the git clones are being updated as the jenkins-deploy user, which happens to also be a member of the wikidev group. Permissions on deployment-bastion:/srv/mediawiki-staging have been updated to reflect this, and I have a Puppet config patch in Gerrit to match the new ownership. I have also reapplied the ::beta::autoupdater class on deployment-bastion, which manages the bits of the process that are unique to beta. This class and its associated resources could be moved to the ::contint Puppet module if @yuvipanda would prefer, as they are really specific to the automation of beta cluster updates by Jenkins rather than intrinsic to the operation of the cluster itself.
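
To make the ownership reasoning above easier to check, here is a rough sketch of the classic POSIX owner/group/other decision for creating files in a directory. It is illustrative only: `writable_via` and `explain` are hypothetical names, the numeric uid/gid values in the usage note are made up, and ACLs or other security layers are ignored.

```python
import os
import stat


def writable_via(st_mode, st_uid, st_gid, uid, gids):
    """Return the permission class that lets (uid, gids) create files in a
    directory with the given mode and ownership, or None if denied."""
    if uid == 0:
        return "root"
    # Only the first matching class is consulted: an owner without the
    # write bit is denied even if the group bits would have allowed it.
    if uid == st_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
        return "owner" if (st_mode & needed) == needed else None
    if st_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
        return "group" if (st_mode & needed) == needed else None
    needed = stat.S_IWOTH | stat.S_IXOTH
    return "other" if (st_mode & needed) == needed else None


def explain(path, uid, gids):
    """Apply writable_via to a real directory on disk."""
    st = os.stat(path)
    return writable_via(st.st_mode, st.st_uid, st.st_gid, uid, gids)
```

For example, `writable_via(0o775, 1000, 2000, 3000, {2000})` returns `"group"`: a user who is not the owner but belongs to the directory's group can write once the group-write bit is set, which is the situation described for jenkins-deploy and wikidev. With mode `0o755` the same call returns `None`.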

I got a successful run! https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/46608/console Let's hope it lasts the weekend.

Change 200285 had a related patch set uploaded (by BryanDavis):
Run wmf-beta-autoupdate.py as jenkins-deploy

https://gerrit.wikimedia.org/r/200285

Change 200285 merged by jenkins-bot:
Run wmf-beta-autoupdate.py as jenkins-deploy

https://gerrit.wikimedia.org/r/200285

Change 200248 merged by Yuvipanda:
beta: Fix ::beta::autoupdater to work again

https://gerrit.wikimedia.org/r/200248

Change 199988 merged by Yuvipanda:
Trebuchet group wikidev; mw-staging owner mwdeploy

https://gerrit.wikimedia.org/r/199988