Exception while launching job: TypeError: 'int' object has no attribute '__getitem__'
Closed, Resolved (Public)

Description

2018-02-03 03:49:38,038 ERROR zuul.DependentPipelineManager: Exception while launching job mediawiki-phpunit-php55-jessie for change <Change 0x7f4308053610 407165,3>:
Traceback (most recent call last):
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/scheduler.py", line 1518, in _launchJobs
    dependent_items)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/launcher/gearman.py", line 318, in launch
    destination_path = os.path.join(item.change.getBasePath(),
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/model.py", line 904, in getBasePath
    self.number[-2:], self.number, self.patchset)
TypeError: 'int' object has no attribute '__getitem__'
2018-02-03 03:49:38,038 INFO zuul.Gearman: Launch job mediawiki-core-php70-phan-docker (uuid: d2f0a5b29ea84206a4d17842be5294f1) for change <Change 0x7f4308053610 407165,3> with dependent changes []
2018-02-03 03:49:38,038 ERROR zuul.DependentPipelineManager: Exception while launching job mediawiki-core-php70-phan-docker for change <Change 0x7f4308053610 407165,3>:
Traceback (most recent call last):
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/scheduler.py", line 1518, in _launchJobs
    dependent_items)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/launcher/gearman.py", line 318, in launch
    destination_path = os.path.join(item.change.getBasePath(),
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/model.py", line 904, in getBasePath
    self.number[-2:], self.number, self.patchset)
TypeError: 'int' object has no attribute '__getitem__'

That happens whenever a change with a Depends-On header enters gate-and-submit. Zuul queries Gerrit to find out whether the dependent changes are still open. If so, Gerrit emits a JSON payload containing "number": 123456, which Python treats as an int. Zuul then attempts a string operation ([-2:]) on the int, which raises the TypeError exception.

https://review.openstack.org/#/c/433748/ fixed the general code path but left the Depends-On code path uncovered.
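
The failure can be reproduced outside Zuul. The following is a minimal sketch, not Zuul's actual code: base_path and unsafe_base_path are hypothetical helpers that only mirror the formatting expression shown in the traceback, and the change and patchset numbers are taken from the log above. The exact TypeError message differs between Python 2 and Python 3.

```python
def unsafe_base_path(number, patchset):
    # Mirrors the failing expression from the traceback: slicing assumes
    # that `number` is already a string.
    return "%s/%s/%s" % (number[-2:], number, patchset)


def base_path(number, patchset):
    # Hypothetical defensive variant: cast to str before slicing, so the
    # result is the same whether Gerrit delivered an int or a string.
    number = str(number)
    return "%s/%s/%s" % (number[-2:], number, patchset)


if __name__ == "__main__":
    print(base_path(407165, 3))       # change number as an int
    print(base_path("407165", "3"))   # change number as a string
    try:
        unsafe_base_path(407165, 3)
    except TypeError as exc:
        print("TypeError: %s" % exc)
```

Casting at the point of formatting is the smallest possible change, but it leaves every other consumer of the number exposed to the same type mismatch.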

Workaround

CR+2 the reverse dependencies first and wait for them to be merged/closed.

Then CR+2 the change having dependencies once they have all been merged.

Event Timeline

Legoktm triaged this task as Unbreak Now! priority. Feb 3 2018, 3:50 AM
Legoktm created this task.

It appears that this was caused specifically by change 407165,3. I restarted zuul (dropping the entire queue), and things look back to normal now.

Paladox lowered the priority of this task from Unbreak Now! to High. Feb 3 2018, 12:11 PM
Paladox added a subscriber: Paladox.

Per "I restarted zuul (dropping the entire queue), and things look back to normal now."

Oh, I see what happened here.

We didn't test using Depends-On: Ie7073f2048ba2b79a8b36ad913453008ec3555ce, so we missed that bug.

hmm that works for me on my test site.

Paladox raised the priority of this task from High to Unbreak Now!. Feb 6 2018, 12:15 AM

Broke again

[00:14:32] <legoktm> no_justification: paladox: thcipriani: the stuck TemplateStyles patch is the int/str zuul thing

This is currently affecting 393285,5 / TemplateStyles. Luckily it's in the test queue and not blocking all merges, but still. Same traceback. :/

cc @hashar I think we need to wrap self.number with str, so it's str(self.number)[-2:] and str(self.number)?

That is what https://review.openstack.org/#/c/433748/ is supposed to fix. Namely, Gerrit sends the change number as an integer, which later causes self.number[-2:] to fail.

It is the latest patch in the list of our cherry picks (branch: patch-queue/debian/jessie-wikimedia)

It is included in the packaging branch debian/jessie-wikimedia as the patch debian/patches/0014-fix-gerrit-2-14-support.patch

That should be in the Zuul 2.5.1-wmf1 Debian package. I have confirmed on contint1001 that the source file has the code.

This comment was removed by Paladox.

@hashar it seems the change number is converted to a string in Zuul, but the patch number isn't. We had this problem in wikibugs, which @Legoktm fixed in https://gerrit.wikimedia.org/r/c/407887/1/grrrrit.py

event.patch_number = patchset.get('number') became event.patch_number = str(patchset.get('number'))

When a Depends-On header is set, Zuul queries Gerrit for the project's open changes, which eventually invokes:

zuul/source/gerrit.py
def getProjectOpenChanges(self, project):
    # This is a best-effort function in case Gerrit is unable to return
    # a particular change.  It happens.
    query = "project:%s status:open" % (project.name,)
    self.log.debug("Running query %s to get project open changes" %
                   (query,))
    data = self.connection.simpleQuery(query)
    changes = []
    for record in data:
        try:
            changes.append(
                self._getChange(record['number'],
                                record['currentPatchSet']['number']))
        except Exception:
            self.log.exception("Unable to query change %s" %
                               (record.get('number'),))
    return changes

Note how it creates a Changeish object straight from the Gerrit json output using record['number'] (that is the change number) and record['currentPatchSet']['number'] (the patchset number, not a cause of the issue).

$ gerrit query --format json --current-patch-set "project:integration/config is:open" limit:1|head -n1 |jq .
{
  "project": "integration/config",
  "url": "https://gerrit.wikimedia.org/r/408368",
  "number": 408368,
...

And here we have it: a JSON integer describing the change number, which leads to the same issue that https://review.openstack.org/#/c/433748/ fixed.

Should record['currentPatchSet']['number'] be str(record['currentPatchSet']['number']) too, and record['number'] be str(record['number'])?
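
One way to picture casting at the source is to normalize each query record right after the JSON is decoded. The following is only a sketch under stated assumptions: normalize_record is a hypothetical helper, not part of Zuul, and the sample line is trimmed from the gerrit query output above with a made-up patchset number.

```python
import json


def normalize_record(record):
    """Return a copy of a Gerrit query record with numbers cast to str.

    Hypothetical helper: covers the change number and the current
    patchset number, the two fields discussed in this task.
    """
    normalized = dict(record)
    if "number" in normalized:
        normalized["number"] = str(normalized["number"])
    patchset = normalized.get("currentPatchSet")
    if isinstance(patchset, dict) and "number" in patchset:
        patchset = dict(patchset)  # copy so the caller's record is untouched
        patchset["number"] = str(patchset["number"])
        normalized["currentPatchSet"] = patchset
    return normalized


# Sample input: the patchset number 2 is invented for illustration.
line = ('{"project": "integration/config", "number": 408368, '
        '"currentPatchSet": {"number": 2}}')
raw = json.loads(line)  # JSON numbers decode to Python ints
record = normalize_record(raw)

if __name__ == "__main__":
    print(type(raw["number"]).__name__, type(record["number"]).__name__)
    print(record["number"][-2:])
```

With the records normalized this way, any later slicing such as number[-2:] operates on a string regardless of which code path created the change object.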

Happened again, this time with 408295. Both by me, but otherwise unconnected (different repos), so not sure what's triggering the breakage.

I am hunting the code path used by Zuul when it processes the JSON. My plan is to cast it to a string as close as possible to the original source.

Mentioned in SAL (#wikimedia-operations) [2018-02-06T21:14:06Z] <legoktm> restarted zuul due to patch being stuck (T186381)

Mentioned in SAL (#wikimedia-releng) [2018-02-06T21:14:21Z] <legoktm> restarted zuul due to patch being stuck (T186381)

Change 408630 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/zuul@patch-queue/debian/jessie-wikimedia] wmf: change number must be a string when formatting

https://gerrit.wikimedia.org/r/408630

Change 408686 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/zuul@debian/jessie-wikimedia] 2.5.1-wmf2: fix ChangeIsh.basePath

https://gerrit.wikimedia.org/r/408686

Mentioned in SAL (#wikimedia-releng) [2018-02-06T21:41:05Z] <hashar> Rebuilding Zuul package to hotfix T186381

Mentioned in SAL (#wikimedia-operations) [2018-02-06T21:41:35Z] <hashar> Going to shutdown Zuul in a few for an emergency hotfix | T186381

https://people.wikimedia.org/~hashar/debs/zuul_2.5.1-wmf2/

$ debdiff zuul_2.5.1-wmf1_amd64.deb zuul_2.5.1-wmf2_amd64.deb
[The following lists of changes regard files as different if they have
different names, permissions or owners.]

Files in second .deb but not in first

-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/PKG-INFO
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/SOURCES.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/dependency_links.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/entry_points.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/installed-files.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/not-zip-safe
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/requires.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/top_level.txt

Files in first .deb but not in second

-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/PKG-INFO
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/SOURCES.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/dependency_links.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/entry_points.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/installed-files.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/not-zip-safe
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/requires.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/top_level.txt

Control files: lines which differ (wdiff format)

Version: [-2.5.1-wmf1-] {+2.5.1-wmf2+}

Mentioned in SAL (#wikimedia-operations) [2018-02-06T21:49:50Z] <hashar> Flushing Zuul queue and upgrading to zuul_2.5.1-wmf2 | T186381

hashar lowered the priority of this task from Unbreak Now! to High.
hashar removed a project: Patch-For-Review.

Posted a quick note on wikitech-l. I will write an incident report later on.

Change 408630 merged by Hashar:
[integration/zuul@patch-queue/debian/jessie-wikimedia] wmf: change number must be a string when formatting

https://gerrit.wikimedia.org/r/408630

Change 408686 merged by jenkins-bot:
[integration/zuul@debian/jessie-wikimedia] 2.5.1-wmf2: fix ChangeIsh.basePath

https://gerrit.wikimedia.org/r/408686

Change 411466 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[integration/zuul@patch-queue/debian/jessie-wikimedia] WIP: ensure that Change.number is a string

https://gerrit.wikimedia.org/r/411466

Change 411466 abandoned by Thcipriani:
WIP: ensure that Change.number is a string

https://gerrit.wikimedia.org/r/411466