Puppet is causing changed/added files in 'slave-scripts' git::clone on integration slaves in labs to become root read-only
Closed, Resolved (Public)

Description

This was first noticed yesterday, when a commit to integration/jenkins.git (deployed as /srv/deployment/integration/slave-scripts on the integration slaves in labs) caused the modified files to become readable only by root after the next puppet run.

@bd808 and I spent some time debugging this, but concluded that the existing puppet manifests didn't appear to contain any code that would cause this.

Relevant puppet manifests:

I confirmed it to be deterministic while debugging on one of the slaves: I ran git reset --hard HEAD^ and fixed permissions on the existing files. After we ran the puppet agent again, we observed the same issue. The files modified since the previous commit were all listed as -rwx------ 1 root root.

In an attempt to confirm there wasn't any preexisting state with sticky bits or umasks, I depooled one of the slaves, rm -rf'ed the slave-scripts checkout, and had puppet re-create it. It looked okay at first: the files were properly readable by the Jenkins user and other users.

However, after making another commit (e.g. I4d94af4673), the same issue happened again:

integration-slave1007:

$ ll /srv/deployment/integration/slave-scripts/bin
-rwxr-xr-x 1 root root 1.1K Jan 28 23:21 mw-run-update-script.sh*
-rwxr-xr-x 1 root root  682 Jan 28 23:21 mw-send-to-coveralls.sh*
-rwx------ 1 root root  158 Jan 29 02:39 mw-set-env-qunit.sh*     <--
-rwxr-xr-x 1 root root 1.1K Jan 28 23:21 mw-set-env.sh*
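
Rather than eyeballing the ll output directory by directory, a scan along the following lines could list every file in a checkout that only its owner can access. This is a minimal sketch and not something taken from the puppet manifests; the function name find_owner_only_files is hypothetical, and it only inspects mode bits (it does not look at ownership or ACLs).

```python
import os
import stat


def find_owner_only_files(root):
    """Return regular files under `root` with no group or other permission bits set."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if not stat.S_ISREG(mode):
                continue  # skip symlinks, sockets, and other non-regular entries
            # -rwx------ and -rw------- both have every group/other bit cleared
            if mode & (stat.S_IRWXG | stat.S_IRWXO) == 0:
                hits.append(path)
    return sorted(hits)
```

Called with the slave-scripts path from the listing above, it would be expected to report bin/mw-set-env-qunit.sh and any other file rewritten by the last puppet run.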

Event Timeline

Krinkle raised the priority of this task from to Needs Triage.
Krinkle updated the task description.
Krinkle added subscribers: Krinkle, bd808.
Krinkle triaged this task as Unbreak Now! priority. (Jan 29 2015, 6:08 AM)
Krinkle set Security to None.
gerritbot subscribed.

Change 187331 had a related patch set uploaded (by BBlack):
fix git::clone umask issues T87843

https://gerrit.wikimedia.org/r/187331

I'm not entirely confident in the above patch given how much reuse git::clone sees all over puppet, but I suspect it's the right general avenue to head down, and might be worth testing...
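
For readers unfamiliar with the umask angle the patch title refers to: the mode a new file ends up with is the mode the creating program requests, with the bits in the process umask cleared. The sketch below illustrates that arithmetic on a POSIX system. It is an illustration only, not the contents of change 187331; the helper name mode_created_under is hypothetical, and the claim that git requests 0777 for executable files is my understanding rather than something stated in this task.

```python
import os
import stat
import tempfile


def mode_created_under(umask, requested=0o777):
    """Create a file requesting `requested` under `umask`; return the resulting mode bits."""
    old = os.umask(umask)  # os.umask returns the previous mask so it can be restored
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "script.sh")
            # The kernel applies requested & ~umask when the file is created
            fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, requested)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)
```

With a umask of 022 a requested 0777 comes out as 0755 (-rwxr-xr-x), and with a umask of 077 it comes out as 0700 (-rwx------), which matches the two kinds of entries seen in the listing on integration-slave1007.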

Change 187465 had a related patch set uploaded (by Krinkle):
Example commit to test T87843

https://gerrit.wikimedia.org/r/187465

Change 187465 merged by jenkins-bot:
Example commit to test T87843

https://gerrit.wikimedia.org/r/187465

Change 187477 had a related patch set uploaded (by Krinkle):
Touch mw-set-env-qunit.sh to test T87843

https://gerrit.wikimedia.org/r/187477

Change 187477 merged by jenkins-bot:
Touch mw-set-env-qunit.sh to test T87843

https://gerrit.wikimedia.org/r/187477

Krinkle claimed this task.

https://gerrit.wikimedia.org/r/187331 has been cherry-picked to integration-puppetmaster. I've done two deployments since then and files appear to be created in a way that is readable by Jenkins. Closing for now.

Change 187331 merged by BBlack:
fix git::clone umask issues T87843

https://gerrit.wikimedia.org/r/187331

Today I re-created all integration slaves and the same problem pops up on all of them.

integration-slave1201.eqiad.wmflabs
integration-slave1202.eqiad.wmflabs
integration-slave1203.eqiad.wmflabs
integration-slave1204.eqiad.wmflabs
integration-slave1401.eqiad.wmflabs
integration-slave1402.eqiad.wmflabs
integration-slave1403.eqiad.wmflabs
integration-slave1404.eqiad.wmflabs
integration-slave1405.eqiad.wmflabs
.. integration-slave1401.eqiad.wmflabs ..
/srv/deployment/integration/slave-scripts/bin/mw-teardown.sh: Permission denied
[23:55 UTC] root at integration-slave1201.eqiad.wmflabs in /
/srv
drwxr-xr-x  4 root           root 4.0K Feb 26 14:24 ./
drwx------  3 root           root 4.0K Feb 26 14:25 deployment/
drwxr-xr-x  4 jenkins-deploy root 4.0K Feb 26 14:18 localhost/

/srv/deployment
drwx------ 3 root root 4.0K Feb 26 14:25 ./
drwx------ 8 root root 4.0K Feb 26 14:25 integration/

/srv/deployment/integration
drwx------ 8 root root 4.0K Feb 26 14:25 ./
drwxr-xr-x 4 root root 4.0K Feb 26 14:25 composer/
drwxr-xr-x 4 root root 4.0K Feb 26 14:25 mediawiki-tools-codesniffer/
drwxr-xr-x 4 root root 4.0K Feb 26 14:25 phpcs/
drwxr-xr-x 9 root root 4.0K Feb 26 14:25 slave-scripts/
..

/srv/deployment/integration/slave-scripts
drwxr-xr-x 9 root root 4.0K Feb 26 14:25 .git/
drwxr-xr-x 2 root root 4.0K Feb 26 14:25 bin/
..

/srv/deployment/integration/slave-scripts/bin
-rwxr-xr-x 1 root root  680 Feb 26 14:25 mw-install-sqlite.sh*
-rwxr-xr-x 1 root root  177 Feb 26 14:25 mw-teardown.sh*
..
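
The listings above show why mw-teardown.sh can be 755 and still produce "Permission denied": opening a file requires search (execute) permission on every directory along its path, and /srv/deployment and /srv/deployment/integration are 700. A rough sketch of that check follows. The function name first_blocking_ancestor and the base parameter are hypothetical, and it only looks at the "other" execute bit, which approximates a user such as Jenkins who is neither the owner nor in the owning group.

```python
import os
import stat


def first_blocking_ancestor(path, base=os.sep):
    """Return the first directory from `base` down to the parent of `path`
    that lacks the other-execute (search) bit, or None if none does."""
    path = os.path.abspath(path)
    base = os.path.abspath(base)
    rel = os.path.relpath(os.path.dirname(path), base)
    if rel.split(os.sep)[0] == os.pardir:
        raise ValueError("path is not located under base")
    # Build the chain base, base/a, base/a/b, ... ending at the file's parent
    candidates = [base]
    current = base
    if rel != os.curdir:
        for part in rel.split(os.sep):
            current = os.path.join(current, part)
            candidates.append(current)
    for directory in candidates:
        if not os.stat(directory).st_mode & stat.S_IXOTH:
            return directory
    return None
```

Given the modes listed above, a call for /srv/deployment/integration/slave-scripts/bin/mw-teardown.sh would be expected to return /srv/deployment, since /srv itself is 755 and /srv/deployment is the first 700 directory on the way down.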

Change 193303 had a related patch set uploaded (by Krinkle):
contint: Create /srv/deployment as 755 instead of 700

https://gerrit.wikimedia.org/r/193303

Change 193303 merged by BBlack:
contint: Create /srv/deployment as 755 instead of 700

https://gerrit.wikimedia.org/r/193303