
On all slaves, /srv/deployment/integration/slave-scripts permissions went crazy
Closed, Duplicate, Public

Description

(04:56:48 PM) cscott-free: is hashar around? jenkins hates jslint today. https://integration.wikimedia.org/ci/job/parsoidsvc-jslint/3153/console
(04:58:49 PM) hashar: cscott: jshint must not like your code !
(04:58:59 PM) hashar: cscott: I think jshint got upgraded this week
(05:00:31 PM) hashar: cscott: ah perm denied bah
(05:01:02 PM) cscott-free: jshint got "upgraded"
(05:01:16 PM) hashar: the file belong to root:root :-/
(05:01:32 PM) ***cscott-free sings root, root, root for the home team...
(05:01:36 PM) hashar: how is that even possible
(05:04:32 PM) hashar: cscott: too late for me. Can you fill a bug stating that on integration-slave1007 the directory /srv/deployment/integration/slave-scripts is borked ?
(05:05:04 PM) hashar: cscott: it is a git clone of integration/jenkins.git maintained by puppet. For some reason some files are not group/other writables

Event Timeline

cscott assigned this task to hashar.
cscott raised the priority of this task from to Needs Triage.
cscott updated the task description.
cscott added a project: acl*sre-team.
cscott subscribed.
Krinkle renamed this task from "On integration-slave1007 the directory /srv/deployment/integration/slave-scripts is borked" to "On all slaves, /srv/deployment/integration/slave-scripts permissions went crazy". (Jan 7 2015, 12:46 AM)

@hashar The JSHint upgrade is unrelated. No permission changes occurred in that change. I tested it on integration-dev-precise using a local checkout of jenkins.git slave-scripts before merging that change. In fact I drafted that commit on integration-dev-precise (in order to ensure npm-install installs resources for Ubuntu Precise instead of Trusty or OSX). There were no permission changes or errors of any kind.

The files belonging to root is normal because they're git-clones and managed by Puppet. However they're normally readable by all users and thus executable just fine. Somehow the files were modified so that no user could read them, only root.

I found the same permission errors on all slaves, including integration-slave1003 (precise) and integration-slave1005 and integration-slave1007 (trusty).

Symptoms:

  • Running git status (as regular user) resulted in git saying all files in the repository are newly added and staged.
  • Running git status (via sudo) resulted in clean result with no modified, untracked or staged changes.
  • chmod -R go+r /srv/deployment/integration/slave-scripts did not fix it.
  • After the chmod, JSHint complained about "./lib/node: not found" and "permission denied" on ./src/cli.js; both errors are caused by the permissions trouble.
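
One possible reason the chmod had no effect, offered as an assumption and not a confirmed diagnosis: go+r adds read permission but not the search (x) bit that directories need before anything inside them can be opened. The sketch below lists entries under a directory that "other" users cannot use. The helper name list_unreadable is hypothetical and not part of jenkins.git.

```shell
# Hypothetical helper (not from the task): list entries under a directory
# that users other than the owner cannot use.
#   - directories need both read and search (x) for "other"
#   - everything else needs read for "other"
list_unreadable() {
    dir=$1
    find "$dir" \( -type d ! -perm -o=rx \) -o \( ! -type d ! -perm -o=r \)
}

# Example invocation, using the path from this task:
#   list_unreadable /srv/deployment/integration/slave-scripts
```

If directories show up in that list, chmod -R go+rX (capital X) would add the search bit on directories without marking non-executable files executable.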

Both on integration-dev-precise and on my local machine, the slave-scripts git repo works fine and has none of these issues.

The following fixed it for the time being:

krinkle $ ssh integration-slave100
krinkle $ cd /srv/deployment/integration
krinkle $ sudo -s
root $ rm -rf slave-scripts/ && git clone https://gerrit.wikimedia.org/r/p/integration/jenkins.git slave-scripts

Doesn't seem fixed yet. See https://gerrit.wikimedia.org/r/#/c/181250/ and the attempts to get that patch merged.

Looks like I either forgot a few of the slaves or they regressed again.

Iterated over all 9 integration slaves and had to re-apply it to:

  • integration-slave1001
  • integration-slave1004
  • integration-slave1005
  • integration-slave1009
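
Iterating over the slaves by hand is easy to get wrong, so a small loop may help. This is a rough sketch under stated assumptions: that the nine slaves are numbered 1001 through 1009 (the task only names some of them), that they are reachable over ssh by short hostname, and that GNU find is available on them. The names slave_hosts, check_slaves and REMOTE are hypothetical.

```shell
# ASSUMPTION: the nine slaves are integration-slave1001 .. integration-slave1009.
slave_hosts() {
    i=1001
    while [ "$i" -le 1009 ]; do
        echo "integration-slave$i"
        i=$((i + 1))
    done
}

# Print every host whose slave-scripts checkout contains at least one entry
# that lacks read permission for "other".
# REMOTE is the command used to reach a host; it defaults to ssh.
check_slaves() {
    remote=${REMOTE:-ssh}
    for host in $(slave_hosts); do
        # -quit (GNU find) stops at the first match; grep -q . turns
        # "any output at all" into a zero exit status.
        if $remote "$host" 'find /srv/deployment/integration/slave-scripts ! -perm -o=r -print -quit 2>/dev/null | grep -q .'; then
            echo "$host"
        fi
    done
}
```

Running check_slaves prints only the hosts that still need the re-clone, so a second pass after the fix should print nothing.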

Fixed. (again)

Hm. The fact that this 'regressed' by itself makes me think that puppet is making this change deliberately. If we retickle puppet on these slaves will the bug reappear?
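
One way to test that theory would be to record the permissions before and after a puppet run and compare them. A minimal sketch, assuming GNU find (for -printf) and that a manual run is triggered with puppet agent --test; the helper names snapshot_perms and perm_changes are hypothetical.

```shell
# Hypothetical helper: record mode, owner:group and path for every entry.
snapshot_perms() {
    find "$1" -printf '%m %u:%g %p\n' | sort
}

# Hypothetical helper: print the entries whose line differs in the second
# snapshot (new mode or ownership, or newly created entries).
perm_changes() {
    diff "$1" "$2" | sed -n 's/^> //p'
}

# Intended use on a slave, run as root:
#   snapshot_perms /srv/deployment/integration/slave-scripts > /tmp/perms.before
#   puppet agent --test
#   snapshot_perms /srv/deployment/integration/slave-scripts > /tmp/perms.after
#   perm_changes /tmp/perms.before /tmp/perms.after
```

Empty output from perm_changes after a puppet run would suggest puppet is not what resets the modes, at least on that run.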