Puppet broken on integration slaves: install_zuul
Closed, ResolvedPublic

Description

When provisioning integration-slave1005 with role::ci::slave::labs (see documentation), it fails on install_zuul:

..
(cut)
..
Notice: /Stage[main]/Contint::Packages/Package[php5-dev]/ensure: ensure changed 'purged' to 'present'
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: running install
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: Requirement already satisfied (use --upgrade to upgrade): pbr>=0.5.21,<1.0 in /usr/lib/python2.7/dist-packages
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: Requirement already satisfied (use --upgrade to upgrade): PyYAML>=3.1.0 in /usr/lib/python2.7/dist-packages
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: Requirement already satisfied (use --upgrade to upgrade): Paste in /usr/lib/python2.7/dist-packages
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: Requirement already satisfied (use --upgrade to upgrade): WebOb in /usr/lib/python2.7/dist-packages
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: Requirement already satisfied (use --upgrade to upgrade): paramiko in /usr/lib/python2.7/dist-packages
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: Requirement already satisfied (use --upgrade to upgrade): GitPython==0.3.2.RC1 in /usr/lib/pymodules/python2.7
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: Requirement already satisfied (use --upgrade to upgrade): lockfile>=0.8 in /usr/lib/python2.7/dist-packages
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: Requirement already satisfied (use --upgrade to upgrade): python-daemon in /usr/lib/python2.7/dist-packages
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: Requirement already satisfied (use --upgrade to upgrade): extras in /usr/lib/python2.7/dist-packages
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: Downloading/unpacking statsd>=1.0.0,<3.0
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns:   Cannot fetch index base URL http://pypi.python.org/simple/
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns:   Could not find any downloads that satisfy the requirement statsd>=1.0.0,<3.0
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: No distributions at all found for statsd>=1.0.0,<3.0
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: Storing complete log in /root/.pip/pip.log
Notice: /Stage[main]/Zuul/Exec[install_zuul]/returns: error: /usr/bin/python -m pip.__init__ install   'pbr>=0.5.21,<1.0' 'PyYAML>=3.1.0' 'Paste' 'WebOb' 'paramiko' 'GitPython==0.3.2.RC1' 'lockfile>=0.8' 'python-daemon' 'extras' 'statsd>=1.0.0,<3.0' 'voluptuous>=0.7' 'gear>=0.5.4,<1.0.0' 'apscheduler>=2.1.1,<3.0' 'PrettyTable>=0.6,<0.8' 'babel>=1.0' 'six>=1.6.0' returned 1
Error: /Stage[main]/Zuul/Exec[install_zuul]: Failed to call refresh: python setup.py install returned 1 instead of one of [0]
Error: /Stage[main]/Zuul/Exec[install_zuul]: python setup.py install returned 1 instead of one of [0]
Notice: /Stage[main]/Apache::Mod::Rewrite/Apache::Mod_conf[rewrite]/Exec[ensure_present_mod_rewrite]/returns: executed successfully

Event Timeline

Krinkle raised the priority of this task to Unbreak Now!.
Krinkle updated the task description.
Krinkle changed Security from none to None.
Krinkle subscribed.

Yeah, I messed it up on Monday and haven't fixed it yet. The slaves install from the labs branch, which is obsolete, and the master branch points at a wrong commit. Additionally, Zuul requires python-statsd <3.0, but we now have 3.0.1 provided via the Debian package.
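
To illustrate the version conflict, here is a minimal Python sketch that checks an installed version against a specifier such as statsd>=1.0.0,<3.0. It is not how pip itself evaluates requirements; the helper names (parse_version, satisfies) are made up for this example, the version 2.1.2 is only an illustrative value, and only plain dotted integer versions are handled.

```python
import operator
import re

# Comparison operators supported by this sketch.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '3.0.1' into (3, 0, 1); only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def pad(a, b):
    """Pad two version tuples with zeros so that 3.0 and 3.0.0 compare equal."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(installed, specifier):
    """Return True if `installed` meets every comma-separated clause."""
    have = parse_version(installed)
    for clause in specifier.split(","):
        match = re.match(r"\s*(>=|<=|==|!=|>|<)\s*([\d.]+)\s*$", clause)
        if match is None:
            raise ValueError("unsupported clause: %r" % clause)
        op, wanted = match.groups()
        left, right = pad(have, parse_version(wanted))
        if not OPS[op](left, right):
            return False
    return True


print(satisfies("3.0.1", ">=1.0.0,<3.0"))  # False: 3.0.1 is not below 3.0
print(satisfies("2.1.2", ">=1.0.0,<3.0"))  # True
```

With the packaged 3.0.1 the requirement is not met, which is consistent with pip trying to download another statsd release in the log above.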

So in short: huge mess.

@hashar Which change(s) in which repo do we revert to restore it to the way that worked? It's obviously not doing anything useful while broken, so I assume it's uncontroversial to revert and try again later.

I can't find anything in puppet. I looked in open/merged commits and in changes cherry-picked on integration-puppetmaster.

https://github.com/wikimedia/operations-puppet/blob/15f922fd/modules/zuul/manifests/init.pp#L72-L83:

exec { 'install_zuul':
  command     => 'python setup.py install',
  ..
  subscribe   => Git::Clone['integration/zuul'],
}

Or is it one of the commits at https://github.com/wikimedia/integration-zuul/commits/master ? It looks like the master and labs branches are the same.

I have tagged most of my deploys. Reviewing them, none would work out of the box, unfortunately :-(

Tags from December were experiments to teach zuul-cloner how to use ref-updated events. They are ALL broken.

wmf-deploy-20141030-4 looks reasonably good. The requirements.txt file needs to be tweaked though! A bunch of dependencies are not available as Debian packages, nor can we upgrade the Debian packages on the shared apt.wikimedia.org. I came up with the dependencies branch, which provides the modules as tarballs.

I first did that with tag wmf-deploy-20140924-1, which resulted in:

- PrettyTable>=0.6,<0.8
- babel>=1.0
- six>=1.6.0
++## WMF hack: provided via tarballs
++##PrettyTable>=0.6,<0.8
++##babel>=1.0
++##six>=1.6.0
++
++# pytz is a dependency of Babel
++pytz-2014.4.tar.gz
++Babel-1.3.tar.gz
++prettytable-0.7.2.tar.bz2
++six-1.7.3.tar.gz

The later deploys were meant to update Zuul and pick up my patches pending review, while still bringing in the dependencies branch. It looks like when I did the merge commits I forgot to tweak the requirements.txt file to point to the tarballs. It went unnoticed since the dependencies had already been installed on all slaves in September.

So that is definitely a mess to sort out, as it has the potential to leave the Zuul server in VERY bad shape.

I am unable to work on it tomorrow, so potentially over the weekend I will:

  • create a new merge commit that brings in the dependencies and the patch I need, and updates requirements.txt
  • update master and labs to it
  • upgrade / restart zuul-server and zuul-merger on gallium
  • upgrade zuul on all slaves

Interestingly, the upgrade needs to uninstall Zuul before installing it. That is because Zuul's version system uses git describe, which is not monotonic :-/

More long term: stop using the git repo / pip to deploy Zuul. Instead, switch to a Debian package that would embed the modules we need and quilt the patches we want. We can install the packages locally by downloading the .deb and using dpkg -i. The Debian packages can be built by Jenkins itself via jenkins-debian-glue.

I'm not sure what that has to do with the actual error. The error seems to be because of a dependency on a version of a package that does not exist (Could not find .. statsd>=1.0.0,<3.0).

However when git-cloning integration-zuul.git on integration-dev-precise manually and running sudo python setup.py install it finishes without any errors.

I'm also confused as to why, on integration-slave1001, the puppet run directly after the failed one has no trace of install_zuul. Should it not be trying again? Looks like it incorrectly satisfied the dependency.

As for the actual error, I think it's just a network timeout on the request for https://pypi.python.org/simple/ - which is quite large indeed.

Okay, so my suspicion was correct. integration-slave1001 has a clean puppet run, but install_zuul never finished, which was causing Jenkins jobs that use zuul-cloner to fail because zuul-cloner did not exist.
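
Since a clean puppet run evidently does not guarantee that the install finished, a direct check for the executable can help. A minimal Python sketch, assuming only that zuul-cloner should be on PATH after a successful install (the helper name missing_commands is made up for this example):

```python
import shutil


def missing_commands(commands):
    """Return the entries of `commands` that cannot be found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


missing = missing_commands(["zuul-cloner"])
if missing:
    print("not installed: " + ", ".join(missing))
else:
    print("all commands found")
```

Running this on a slave after the puppet run would flag the half-finished state described above, independently of what puppet reports.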

When running the install command manually (including HTTP_PROXY=. HTTPS_PROXY=., which disable networking) it also fails on the same error:

# sudo HTTP_PROXY=. HTTPS_PROXY=. python setup.py install
running install
Requirement already satisfied (use --upgrade to upgrade): pbr>=0.5.21,<1.0 in /usr/lib/python2.7/dist-packages
Requirement already satisfied (use --upgrade to upgrade): PyYAML>=3.1.0 in /usr/lib/python2.7/dist-packages
Requirement already satisfied (use --upgrade to upgrade): Paste in /usr/lib/python2.7/dist-packages
Requirement already satisfied (use --upgrade to upgrade): WebOb in /usr/lib/python2.7/dist-packages
Requirement already satisfied (use --upgrade to upgrade): paramiko in /usr/lib/python2.7/dist-packages
Requirement already satisfied (use --upgrade to upgrade): GitPython==0.3.2.RC1 in /usr/lib/pymodules/python2.7
Requirement already satisfied (use --upgrade to upgrade): lockfile>=0.8 in /usr/lib/python2.7/dist-packages
Requirement already satisfied (use --upgrade to upgrade): python-daemon in /usr/lib/python2.7/dist-packages
Requirement already satisfied (use --upgrade to upgrade): extras in /usr/lib/python2.7/dist-packages
Downloading/unpacking statsd>=1.0.0,<3.0
  Cannot fetch index base URL http://pypi.python.org/simple/
  Could not find any downloads that satisfy the requirement statsd>=1.0.0,<3.0
No distributions at all found for statsd>=1.0.0,<3.0
Storing complete log in /home/krinkle/.pip/pip.log
error: /usr/bin/python -m pip.__init__ install   'pbr>=0.5.21,<1.0' 'PyYAML>=3.1.0' 'Paste' 'WebOb' 'paramiko' 'GitPython==0.3.2.RC1' 'lockfile>=0.8' 'python-daemon' 'extras' 'statsd>=1.0.0,<3.0' 'voluptuous>=0.7' 'gear>=0.5.4,<1.0.0' 'apscheduler>=2.1.1,<3.0' 'PrettyTable>=0.6,<0.8' 'babel>=1.0' 'six>=1.6.0' returned 1

So the package exists fine on PyPI, but because we don't allow fetching from external sources, it needs to be pre-installed via puppet or a Debian package. I re-ran the command without the proxy config (as it's on a labs instance anyway) and that worked fine.
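
For reproducing the offline behaviour from a script, the proxy trick in the manual command above can be wrapped as follows. This is a sketch only: the helper names (offline_env, run_offline) are made up, and the effect of setting both variables to "." is taken from the description in this thread rather than verified separately.

```python
import os
import subprocess


def offline_env(base=None):
    """Copy an environment and set both proxy variables to '.'."""
    env = dict(os.environ if base is None else base)
    env["HTTP_PROXY"] = "."
    env["HTTPS_PROXY"] = "."
    return env


def run_offline(command, cwd=None):
    """Run `command` with the offline environment; return its exit status."""
    return subprocess.call(command, cwd=cwd, env=offline_env())


# Example (not executed here), from inside a checkout of integration/zuul:
#   status = run_offline(["python", "setup.py", "install"])
```

A non-zero status then corresponds to the "returned 1" failure that puppet reports for install_zuul.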

However, since the current git master (or labs) branch of zuul is broken (it produces an error about invalid arguments, https://integration.wikimedia.org/ci/job/mwext-UploadWizard-qunit/691/console), I checked out branch wmf-deploy-20141030-3 instead and ran the installer again (since that's what integration-slave1002 and integration-slave1003 are on as well). It's working fine now.

I won't touch the integration/zuul.git repo, but I'd recommend doing a git reset --hard origin/wmf-deploy-20141030-3 in the labs branch and pushing that. Whatever came after or in place of it is not used and doesn't work. Maybe experiment in a separate branch so that the labs branch remains deployable.

Krinkle lowered the priority of this task from Unbreak Now! to High.
Krinkle removed a project: Puppet.

On a new Trusty instance (e.g. integration-slave1005), the latest master of integration/zuul (wmf-deploy-20141208-2) actually installs without problems. So I guess that package is only missing on Precise.

However, while it installs without errors, the version is still the wrong one. Job mediawiki-phpunit-hhvm fails with zuul-cloner: error: Can not mix change and refupdate parameters.

Trusty instances (integration-slave1007, integration-slave1009) seem to be at version wmf-deploy-20140924-1 (different from the Precise instances). I've installed that version on integration-slave1005 and updated the documentation.

I have cleaned up the mess: created a new tag wmf-deploy-20141221-1 and pushed master and labs to point to it.

It is based on upstream 685ca22 with:

  • requirements.txt tweaks
  • merge of the dependencies branch to get missing tarballs as modules
  • a commit that points requirements.txt to the tarballs
  • a couple patches I proposed

I have uninstalled Zuul on all labs instances and the two production boxes, then reinstalled it. On gallium I restarted the Zuul server and merger.

Listing tarballs in zuul requirements.txt caused the integration-config-tox-py27 Jenkins job to fail. That is because pip does not recognize the tarballs being listed there :-/

I have pushed a new commit series tagged as wmf-deploy-20141221-2, which no longer lists the tarballs in the requirements.txt file. So the install procedure is:

pip install pytz-2014.4.tar.gz Babel-1.3.tar.gz prettytable-0.7.2.tar.bz2 six-1.7.3.tar.gz
HTTP_PROXY=. HTTPS_PROXY=. python setup.py install
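
The separation between regular requirement entries and bundled tarballs can be sketched in Python as below. This is an illustration under assumptions, not part of the actual deploy tooling: the helper names and the list of archive suffixes are made up, and the sample lines are taken from the diff earlier in this task.

```python
# File suffixes treated as local archives in this sketch (an assumption).
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".zip")


def split_requirements(lines):
    """Split requirement lines into (specifiers, local archive names)."""
    specifiers, archives = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith(ARCHIVE_SUFFIXES):
            archives.append(line)
        else:
            specifiers.append(line)
    return specifiers, archives


def pip_command(archives):
    """Build the pip invocation that installs the tarballs first."""
    return ["pip", "install"] + list(archives)


lines = [
    "# pytz is a dependency of Babel",
    "pytz-2014.4.tar.gz",
    "Babel-1.3.tar.gz",
    "prettytable-0.7.2.tar.bz2",
    "six-1.7.3.tar.gz",
    "statsd>=1.0.0,<3.0",
]
specifiers, archives = split_requirements(lines)
print(" ".join(pip_command(archives)))
```

Keeping the tarballs out of requirements.txt and installing them in a separate first step is the design choice described above; the sketch only shows how the two kinds of entries differ.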

I guess it is time to put Zuul in a Debian package, which will hopefully solve that mess.

I think the puppet logic for this is now working for existing and new nodes. Re-open if otherwise.