
"/usr/local/bin/zuul-cloner" broken on new instances
Closed, ResolvedPublic

Description

Existing instances: https://integration.wikimedia.org/ci/job/mediawiki-core-qunit-karma/878/consoleFull

00:55:37 Building remotely on integration-slave1007 (phpflavor-hhvm contintLabsSlave UbuntuTrusty) in workspace /mnt/jenkins-workspace/workspace/mediawiki-core-qunit-karma
00:55:37 [mediawiki-core-qunit-karma] $ /bin/bash -xe /tmp/hudson1117254930415673357.sh
00:55:37 + /usr/local/bin/zuul-cloner --version
00:55:38 Zuul version: 2.0.0.316.gd63faae

New instance: https://integration.wikimedia.org/ci/job/mediawiki-core-qunit-karma/876/console

00:49:22 Building remotely on integration-slave1401 (krinkle) in workspace /mnt/jenkins-workspace/workspace/mediawiki-core-qunit-karma
00:49:22 [mediawiki-core-qunit-karma] $ /bin/bash -xe /tmp/hudson4701236895335413744.sh
00:49:22 + /usr/local/bin/zuul-cloner --version
00:49:22 /tmp/hudson4701236895335413744.sh: line 2: /usr/local/bin/zuul-cloner: No such file or directory
00:49:22 Build step 'Execute shell' marked build as failure

@hashar Can you look into why this is no longer being installed properly? It seems /usr/local/src/zuul does exist on the new instances, but jobs are looking for /usr/local/bin/zuul-cloner.


Event Timeline

Krinkle assigned this task to hashar.
Krinkle raised the priority of this task to Unbreak Now!.
Krinkle updated the task description.
Krinkle added subscribers: Krinkle, hashar.

Weird. It exists on integration-slave1405 but not integration-slave1401.

Krinkle set Security to None.

Looks like the disk space on integration-puppetmaster was the immediate cause.

Due to integration-puppetmaster being low on space, the puppet run stopped between setting up the /usr/local/src/zuul clone and installing it as /usr/local/bin/zuul-cloner.

When I manually freed up space on integration-puppetmaster, removed /usr/local/src/zuul on integration-slave1401 and re-ran sudo /usr/local/sbin/puppet-run, puppet re-created the clone and this time installed it properly.

So there's a bug in our puppet manifest: it does not ensure that zuul is installed. It only ensures the git repo is cloned, then runs a one-time install afterwards that may not actually succeed.
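The difference between "install once when the clone changes" and "ensure installed" can be sketched as a check on the end state rather than on the clone event: if a run is interrupted (for example by the puppetmaster running out of disk), the next run notices and retries. This is a minimal Python sketch, not the actual manifest; `install_needed` is a hypothetical helper, and it assumes Jenkins jobs run as a non-root user, so group and other need read and execute access.

```python
import os
import stat


def install_needed(bin_path: str) -> bool:
    """Hypothetical check: True if the wrapper is missing or not usable by everyone.

    It inspects the installed result instead of relying on a one-time refresh
    event, so an install that stopped midway is retried on the next run.
    """
    try:
        mode = os.stat(bin_path).st_mode
    except FileNotFoundError:
        return True  # never installed, or the earlier run stopped midway
    if not stat.S_ISREG(mode):
        return True
    # Assumption: jobs run as a non-root user, so group/other need r-x.
    wanted = stat.S_IRGRP | stat.S_IXGRP | stat.S_IROTH | stat.S_IXOTH
    return (mode & wanted) != wanted
```

For example, `install_needed("/usr/local/bin/zuul-cloner")` would be True both for the missing file seen at first and for the `-rwx------` file seen later. In Puppet terms this is roughly the difference between an exec that only runs on refresh and one guarded by an `unless` or `creates` check.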

Puppet re-created /usr/local/bin/zuul-cloner, but with the wrong permissions:

$ dsh-ci-slaves 'ls -l /usr/local/bin/zuul-cloner'
integration-slave1401.eqiad.wmflabs: -rwx------ 1 root root 154 Mar  2 15:56 /usr/local/bin/zuul-cloner
integration-slave1402.eqiad.wmflabs: -rwx------ 1 root root 154 Mar  2 15:53 /usr/local/bin/zuul-cloner
integration-slave1403.eqiad.wmflabs: -rwx------ 1 root root 154 Mar  2 15:42 /usr/local/bin/zuul-cloner
integration-slave1405.eqiad.wmflabs: -rwx------ 1 root root 154 Mar  2 15:42 /usr/local/bin/zuul-cloner
integration-slave1404.eqiad.wmflabs: -rwx------ 1 root root 154 Mar  2 15:50 /usr/local/bin/zuul-cloner

Compared to old slaves:

integration-slave1010.eqiad.wmflabs$ ls -l /usr/local/bin/zuul-cloner
-rwxr-xr-x 1 root root 154 Feb 20 01:21 /usr/local/bin/zuul-cloner
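Decoding the two listings shows that the only difference is the group and other read/execute bits. A small Python sketch using the standard `stat` module; the constants simply encode the modes shown above, and the observation that this matches a umask of 077 at install time is an inference, not something confirmed in the task.

```python
import stat

NEW_SLAVE = stat.S_IFREG | 0o700  # -rwx------ as listed on integration-slave14xx
OLD_SLAVE = stat.S_IFREG | 0o755  # -rwxr-xr-x as listed on integration-slave1010


def missing_bits(have: int, want: int) -> int:
    """Permission bits present in `want` but absent from `have`."""
    return (want & ~have) & 0o7777


print(stat.filemode(NEW_SLAVE))  # -rwx------
print(stat.filemode(OLD_SLAVE))  # -rwxr-xr-x
print(oct(missing_bits(NEW_SLAVE, OLD_SLAVE)))  # 0o55
```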

Next error:

00:00:00.024 Building remotely on integration-slave1401
00:00:00.041 + /usr/local/bin/zuul-cloner --version
00:00:00.057 Traceback (most recent call last):
00:00:00.058   File "/usr/local/bin/zuul-cloner", line 6, in <module>
00:00:00.058     from zuul.cmd.cloner import main
00:00:00.058 ImportError: No module named zuul.cmd.cloner
00:00:00.065 Build step 'Execute shell' marked build as failure
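A quick way to tell whether the interpreter can see the module, independent of the wrapper script, is to ask the import system directly. This is a Python 3 sketch (the slaves ran Python 2.7, where `importlib.util` is not available); `module_importable` is a hypothetical helper. It should be run as the user the jobs run as, since root can read directories that are unreadable to everyone else, which would be consistent with the permission problem described here.

```python
import importlib.util


def module_importable(name: str) -> bool:
    """Return True if `name` (possibly dotted) can be located by this interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "zuul") cannot be found at all.
        return False


print(module_importable("zuul.cmd.cloner"))
```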

Change 193836 had a related patch set uploaded (by Krinkle):
zuul: Use umask 022 for installing zuul

https://gerrit.wikimedia.org/r/193836
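The umask is cleared (bitwise) from the mode a program requests when it creates a file or directory. Checking the arithmetic shows why 022 yields the `-rwxr-xr-x` seen on the old slaves, while 077 would yield the `-rwx------` seen on the new ones; the 077 value is inferred from the observed modes. A minimal Python sketch; `effective_mode` is a hypothetical helper.

```python
def effective_mode(requested: int, umask: int) -> int:
    """Permission bits a newly created file/dir actually gets: requested & ~umask."""
    return requested & ~umask & 0o7777


# Directories and executables are typically requested as 0o777, plain files as 0o666.
for mask in (0o077, 0o022):
    print(oct(mask), oct(effective_mode(0o777, mask)), oct(effective_mode(0o666, mask)))
# 0o77 0o700 0o600
# 0o22 0o755 0o644
```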

Krinkle renamed this task from "/usr/local/bin/zuul-cloner: No such file or directory" on new instances to "/usr/local/bin/zuul-cloner" broken on new instances. Mar 2 2015, 4:38 PM

/usr/local/src/zuul is a local git clone made by puppet. The files and modules are then installed via setup.py under /usr/local/lib/python2.7/dist-packages.

The umask error messed up the module permissions in /usr/local/lib/python2.7/dist-packages.

Should be fixed with:

sudo -s
find /usr/local/lib/python2.7/dist-packages -type d -exec chmod 2755 {} \;
find /usr/local/lib/python2.7/dist-packages -type f -exec chmod 0444 {} \;
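For reference, the same repair expressed in Python, mirroring the two find commands: directories get 2755, regular files get 0444, and the starting directory itself is included just as find includes its starting point. This is a sketch, not a tested production tool; `fix_tree_permissions` is a hypothetical helper, the mode values are copied from the commands above, and it must run as root against the real path. Symlinks are skipped because find's `-type d` and `-type f` do not match them, while `os.chmod` would follow them.

```python
import os


def fix_tree_permissions(root: str, dir_mode: int = 0o2755, file_mode: int = 0o444) -> None:
    """Apply dir_mode to directories and file_mode to regular files under root."""
    os.chmod(root, dir_mode)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.chmod(path, dir_mode)  # fixed before os.walk descends into it
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                os.chmod(path, file_mode)


# fix_tree_permissions("/usr/local/lib/python2.7/dist-packages")  # as root
```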

Ah, okay. After I applied the umask fix, I removed /usr/local/src/zuul to let puppet re-create it, but I didn't know about /usr/local/lib/python2.7.

Ran the following to give puppet another chance:

$ dsh-ci-slaves 'sudo rm -rf /usr/local/src/zuul /usr/local/bin/zuu* /usr/local/lib/python2.7/dist-packages/*'

It's now installed on Trusty instances, but still broken on Precise.

..
Mar  2 21:56:21 integration-slave1401 puppet-agent[8049]: (/Stage[main]/Contint::Packages::Labs/Package[tox]/ensure) created
Mar  2 21:56:22 integration-slave1401 puppet-agent[8049]: (/Stage[main]/Ldap::Client::Utils/File[/usr/local/lib/python2.7/dist-packages/ldapsupportlib.py]/ensure) defined content as '{md5}e020cf3d0c0650317cf6ed989276f16a'
..
Mar  2 21:56:40 integration-slave1401 puppet-agent[8049]: (/Stage[main]/Zuul/Git::Clone[integration/zuul]/File[/usr/local/src/zuul]/ensure) created
Mar  2 21:56:41 integration-slave1401 puppet-agent[8049]: (/Stage[main]/Zuul/Git::Clone[integration/zuul]/Exec[git_clone_integration/zuul]/returns) executed successfully
Mar  2 21:56:41 integration-slave1401 puppet-agent[8049]: (Git::Clone[integration/zuul]) Scheduling refresh of Exec[install_zuul]
Mar  2 21:56:41 integration-slave1401 crontab[8929]: (root) LIST (root)
Mar  2 21:56:43 integration-slave1401 puppet-agent[8049]: (/Stage[main]/Role::Labs::Lvm::Mnt/Labs_lvm::Volume[second-local-disk]/Labs_lvm::Extend[/mnt]/Exec[extend-vd-/mnt]/returns) executed successfully
Mar  2 21:56:51 integration-slave1401 puppet-agent[8049]: (/Stage[main]/Zuul/Exec[install_zuul]) Triggered 'refresh' from 1 events
Mar  2 21:56:52 integration-slave1401 puppet-agent[8049]: Finished catalog run in 46.72 seconds

Mar  2 21:47:51 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Role::Labs::Instance/Notify[instanceproject: integration]/message) defined 'message' as 'instanceproject: integration'
Mar  2 21:47:57 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Contint::Packages::Labs/Package[tox]/ensure) created
Mar  2 21:47:58 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Ldap::Client::Utils/File[/usr/local/lib/python2.7/dist-packages/ldapsupportlib.py]/ensure) defined content as '{md5}e020cf3d0c0650317cf6ed989276f16a'
..
Mar  2 21:48:11 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Git::Clone[integration/zuul]/File[/usr/local/src/zuul]/ensure) created
Mar  2 21:48:12 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Git::Clone[integration/zuul]/Exec[git_clone_integration/zuul]/returns) executed successfully
Mar  2 21:48:12 integration-slave1204 puppet-agent[5089]: (Git::Clone[integration/zuul]) Scheduling refresh of Exec[install_zuul]
Mar  2 21:48:13 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Role::Labs::Lvm::Mnt/Labs_lvm::Volume[second-local-disk]/Labs_lvm::Extend[/mnt]/Exec[extend-vd-/mnt]/returns) executed successfully
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) running install
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) Requirement already satisfied (use --upgrade to upgrade): pbr>=0.5.21,<1.0 in /usr/lib/python2.7/dist-packages
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) Requirement already satisfied (use --upgrade to upgrade): PyYAML>=3.1.0 in /usr/lib/python2.7/dist-packages
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) Requirement already satisfied (use --upgrade to upgrade): Paste in /usr/lib/python2.7/dist-packages
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) Requirement already satisfied (use --upgrade to upgrade): WebOb in /usr/lib/python2.7/dist-packages
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) Requirement already satisfied (use --upgrade to upgrade): paramiko in /usr/lib/python2.7/dist-packages
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) Requirement already satisfied (use --upgrade to upgrade): GitPython==0.3.2.RC1 in /usr/lib/pymodules/python2.7
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) Requirement already satisfied (use --upgrade to upgrade): lockfile>=0.8 in /usr/lib/python2.7/dist-packages
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) Requirement already satisfied (use --upgrade to upgrade): python-daemon<2.0 in /usr/lib/python2.7/dist-packages
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) Requirement already satisfied (use --upgrade to upgrade): extras in /usr/lib/python2.7/dist-packages
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) Downloading/unpacking statsd>=1.0.0,<3.0
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns)   Cannot fetch index base URL http://pypi.python.org/simple/
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns)   Could not find any downloads that satisfy the requirement statsd>=1.0.0,<3.0
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) No distributions at all found for statsd>=1.0.0,<3.0
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) Storing complete log in /root/.pip/pip.log
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]/returns) error: /usr/bin/python -m pip.__init__ install   'pbr>=0.5.21,<1.0' 'PyYAML>=3.1.0' 'Paste' 'WebOb' 'paramiko' 'GitPython==0.3.2.RC1' 'lockfile>=0.8' 'python-daemon<2.0' 'extras' 'statsd>=1.0.0,<3.0' 'voluptuous>=0.7' 'gear>=0.5.4,<1.0.0' 'apscheduler>=2.1.1,<3.0' 'PrettyTable>=0.6,<0.8' 'babel>=1.0' 'six>=1.6.0' returned 1
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]) Failed to call refresh: python setup.py install returned 1 instead of one of [0]
Mar  2 21:48:20 integration-slave1204 puppet-agent[5089]: (/Stage[main]/Zuul/Exec[install_zuul]) python setup.py install returned 1 instead of one of [0]
Mar  2 21:48:21 integration-slave1204 puppet-agent[5089]: Finished catalog run in 32.73 seconds

Change 193836 merged by BBlack:
zuul: Use umask 022 for installing zuul

https://gerrit.wikimedia.org/r/193836

I will finish fixing the half-baked / broken / terrible installation process. The medium-term fix is to have Zuul packaged, which I am working on.

Zuul requires statsd < 3.0 (statsd>=1.0.0,<3.0 in the log above), but we bumped the Debian python-statsd package to 3.0+, hence the failure.
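The requirement pip reports in the log, statsd>=1.0.0,<3.0, is a half-open version range. Once the installed version falls outside it, pip tries to fetch a matching one from PyPI, which is what failed here ("Cannot fetch index base URL"). A simplified Python sketch of the range check that only handles plain dotted numeric versions (real pip follows PEP 440, for example via the `packaging` library); `parse_version` and `satisfies` are hypothetical helpers, and the candidate versions are illustrative, not the versions actually installed on the slaves.

```python
def parse_version(text: str) -> tuple:
    """Parse a plain dotted numeric version such as '3.0' into (3, 0, 0)."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version: str, minimum: str, below: str) -> bool:
    """True if minimum <= version < below, i.e. the '>=1.0.0,<3.0' form."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)


for candidate in ("2.1.2", "3.0"):
    print(candidate, satisfies(candidate, "1.0.0", "3.0"))
# 2.1.2 True
# 3.0 False
```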

I just did the install using pip off of PyPI on all Precise instances, using:

sudo -s
cd /usr/local/src/zuul
python setup.py install

That installed the dependencies under /usr/local/lib/python2.7/dist-packages and got zuul-cloner to work.
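When re-running the install by hand, the umask from change 193836 can be applied to just that command so a restrictive root umask does not leak into the installed files. A minimal Python 3 sketch (POSIX only); `run_with_umask` is a hypothetical helper, not part of Zuul or puppet, and the commented usage mirrors the manual steps above with "python" meaning the slaves' Python 2.7 interpreter.

```python
import os
import subprocess


def run_with_umask(cmd, cwd=None, umask=0o022):
    """Run `cmd` in a child process whose umask is set just before exec."""
    return subprocess.run(
        cmd,
        cwd=cwd,
        preexec_fn=lambda: os.umask(umask),  # runs in the child, not the parent
        check=True,
        capture_output=True,
        text=True,
    )


# Hypothetical usage, as root:
# run_with_umask(["python", "setup.py", "install"], cwd="/usr/local/src/zuul")
```

On Python 3.9 and later, `subprocess.run` also accepts a `umask=` argument directly, which avoids `preexec_fn` (not safe to use from threaded programs).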

The mess will be cleared out whenever I have Zuul packaged.