
Wikipedia Android CI tests are failing
Closed, ResolvedPublic


Hello! We've been seeing errors this morning on a job that nearly never fails. I think it was an issue with integration-slave-precise-1004 specifically. Example error: .

17:47:07 [tox-flake8] $ /bin/bash -xe /tmp/
17:47:07 + rm -fR log
17:47:07 + mkdir -p log
17:47:07 + set -o pipefail
17:47:07 + tee log/flake8.log
17:47:07 + PY_COLORS=1
17:47:07 + tox -v -e flake8
17:47:07 Traceback (most recent call last):
17:47:07   File "/usr/local/bin/tox", line 5, in <module>
17:47:07     from pkg_resources import load_entry_point
17:47:07   File "/usr/lib/python2.7/dist-packages/", line 2707, in <module>
17:47:07     working_set.require(__requires__)
17:47:07   File "/usr/lib/python2.7/dist-packages/", line 686, in require
17:47:07     needed = self.resolve(parse_requirements(requirements))
17:47:07   File "/usr/lib/python2.7/dist-packages/", line 584, in resolve
17:47:07     raise DistributionNotFound(req)
17:47:07 pkg_resources.DistributionNotFound: py>=1.4.17
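The traceback shows tox's console-script wrapper failing at import time because pkg_resources cannot satisfy a declared dependency (py>=1.4.17). A minimal sketch of the same failure mode, using a deliberately made-up requirement name in place of the real one:

```python
# Sketch of the failure mode above: pkg_resources raises
# DistributionNotFound when a required distribution cannot be found
# in the working set, which aborts entry-point scripts like tox.
import pkg_resources

try:
    # Hypothetical, deliberately nonexistent requirement, standing in
    # for the "py>=1.4.17" that tox's entry point could not resolve.
    pkg_resources.require("definitely-not-a-real-package>=1.0")
except pkg_resources.DistributionNotFound as exc:
    print("DistributionNotFound:", exc)
```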

Event Timeline

Niedzielski raised the priority of this task from to High.
Niedzielski updated the task description. (Show Details)
Niedzielski added subscribers: Niedzielski, Paladox.

Regarding tests "not being executed": that's actually not true (thankfully); things are just really slow (sadly) due to the php5.3 -> 5.5 change. See:

greg renamed this task from Wikipedia Android CI tests are failing or not running to Wikipedia Android CI tests are failing.Feb 10 2016, 9:14 PM
greg updated the task description. (Show Details)
greg set Security to None.

@Niedzielski, should we close this since you said it passes now?

Niedzielski claimed this task.

@Paladox, I haven't seen the tests run on the 1004 server yet but I'll open a new issue if they fail again. Thanks! :)

@Niedzielski, oh, so the tests are meant to run on the 1004 server? I'm not sure whether the tests are being moved between servers because of php5.5.

@Paladox, they aren't tied to 1004 but can execute there. Based on the past few failures, the problem seemed specific to that server, and I haven't seen them pass on it lately.

@Niedzielski, did you notice that it wasn't running on 1004 in the last 3 days? It may be because of the php 5.5 migration. On mediawiki/core we switched from php 5.3 to 5.5, and everything slowed down because processing power was still allocated to php 5.3, which no longer needed it, so krinkle and hashar were freeing those slaves up so they could be added to php 5.5.

But I think it will run on any server, since jobs are distributed across servers to improve performance.

That is definitely the same as T110506. There is a puppet patch but it does not properly --upgrade setuptools. Gahhhh

The fix is to run:

sudo pip install --upgrade setuptools

I did so on the four Precise slaves I have provisioned.
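A quick way to confirm the upgrade took effect on a slave is to check that pkg_resources can resolve an installed distribution again. This is my own sanity check, not a command from the task; it uses setuptools itself as the requirement, since it must be present after the upgrade:

```python
# Post-fix sanity check: if `pip install --upgrade setuptools` succeeded,
# pkg_resources should resolve requirements again without raising
# DistributionNotFound. setuptools itself is a safe requirement to test.
import pkg_resources

dist = pkg_resources.require("setuptools")[0]
print("resolved", dist.project_name, dist.version)
```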

All the tox jobs are supposed to be on Jessie nowadays (was T119141). Looks like some are still on Precise and will need to be migrated.

@hashar, I believe the issue is here[0], but I don't know what implications switching to Jessie would have. How can we get this fixed? Should I open a new issue to track this?


Legoktm claimed this task.

Change 269893 had a related patch set uploaded (by Legoktm):
Switch apps/android/wikipedia to use tox-jessie

Change 269893 merged by jenkins-bot:
Switch apps/android/wikipedia to use tox-jessie

So the root cause was definitely the Precise nodes not upgrading setuptools properly, due to a puppet issue or similar mistake. The proper course of action is indeed to migrate to Jessie, and especially to Nodepool instances, which @Legoktm did.

The rest of tox jobs still running on Precise/Trusty will all be migrated to Jessie as part of T126532.

Thank you for taking the time to file this bug.