Exception while launching job: TypeError: 'int' object has no attribute '__getitem__'
Closed, Resolved (Public)

Description

2018-02-03 03:49:38,038 ERROR zuul.DependentPipelineManager: Exception while launching job mediawiki-phpunit-php55-jessie for change <Change 0x7f4308053610 407165,3>:
Traceback (most recent call last):
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/scheduler.py", line 1518, in _launchJobs
    dependent_items)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/launcher/gearman.py", line 318, in launch
    destination_path = os.path.join(item.change.getBasePath(),
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/model.py", line 904, in getBasePath
    self.number[-2:], self.number, self.patchset)
TypeError: 'int' object has no attribute '__getitem__'
2018-02-03 03:49:38,038 INFO zuul.Gearman: Launch job mediawiki-core-php70-phan-docker (uuid: d2f0a5b29ea84206a4d17842be5294f1) for change <Change 0x7f4308053610 407165,3> with dependent changes []
2018-02-03 03:49:38,038 ERROR zuul.DependentPipelineManager: Exception while launching job mediawiki-core-php70-phan-docker for change <Change 0x7f4308053610 407165,3>:
Traceback (most recent call last):
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/scheduler.py", line 1518, in _launchJobs
    dependent_items)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/launcher/gearman.py", line 318, in launch
    destination_path = os.path.join(item.change.getBasePath(),
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/model.py", line 904, in getBasePath
    self.number[-2:], self.number, self.patchset)
TypeError: 'int' object has no attribute '__getitem__'

That happens whenever a change with a Depends-On header enters gate-and-submit. Zuul queries Gerrit to find out whether the dependent changes are still open; if so, Gerrit emits a JSON payload containing "number": 123456, which Python treats as an int. Zuul then tries a string operation ([-2:]) on the int, which raises the TypeError exception.

https://review.openstack.org/#/c/433748/ fixed the general use case code path but has left behind the Depends-On code path.
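The failure described above can be reproduced outside Zuul. The following is a minimal sketch, not Zuul code: the payload is a hypothetical, trimmed-down Gerrit record and last_two_digits is an illustrative helper name. Note that Python 2.7 words the error as in the traceback above, while Python 3 reports "'int' object is not subscriptable"; both are a TypeError.

```python
import json

# Hypothetical, trimmed-down Gerrit query record; only "number" matters here.
payload = '{"project": "integration/config", "number": 408368}'
record = json.loads(payload)

# A JSON number without a fractional part is decoded as a Python int.
number = record["number"]


def last_two_digits(change_number):
    """Return the two-character suffix for either an int or a str input."""
    return str(change_number)[-2:]


# Slicing the int directly is what the failing code path effectively did.
try:
    number[-2:]
    slicing_failed = False
except TypeError:
    slicing_failed = True

# Casting first makes the slice well defined.
suffix = last_two_digits(number)
```

Casting before slicing keeps the helper tolerant of both representations, which matters because some code paths already hand over a string.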

Workaround

CR+2 the reverse dependencies first and wait for them to be merged/closed.

Then CR+2 the change that has dependencies once they have all been merged.

Legoktm created this task. Feb 3 2018, 3:50 AM
Legoktm triaged this task as Unbreak Now! priority.
Restricted Application added subscribers: Liuxinyu970226, Jay8g, TerraCodes, Aklapper. Feb 3 2018, 3:50 AM

It appears that this was caused specifically by change 407165,3. I restarted zuul (dropping the entire queue), and things look back to normal now.

Paladox lowered the priority of this task from Unbreak Now! to High. Feb 3 2018, 12:11 PM
Paladox added a subscriber: Paladox.

Per "I restarted zuul (dropping the entire queue), and things look back to normal now."

Oh, I see what happened here.

We didn't test using Depends-On: Ie7073f2048ba2b79a8b36ad913453008ec3555ce, so we missed that bug.

hmm that works for me on my test site.

Paladox raised the priority of this task from High to Unbreak Now!. Feb 6 2018, 12:15 AM

Broke again

[00:14:32] <legoktm> no_justification: paladox: thcipriani: the stuck TemplateStyles patch is the int/str zuul thing

This is currently affecting 393285,5 / TemplateStyles. Luckily it's in the test queue and not blocking all merges, but still. Same traceback. :/

Paladox added a subscriber: hashar. Feb 6 2018, 12:22 AM

cc @hashar I think we need to wrap self.number with str, so it's str(self.number)[-2:] and str(self.number)?
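Where the cast goes matters: str(number[-2:]) still slices the int before converting it, whereas str(number)[-2:] converts first. A minimal sketch follows, assuming a "%s/%s/%s" path layout; base_path and wrong_cast are hypothetical stand-ins and not the actual getBasePath() in zuul/model.py.

```python
def wrong_cast(number):
    """Slice first, then cast: an int input still raises TypeError."""
    return str(number[-2:])


def base_path(number, patchset):
    """Cast first, then slice.

    Hypothetical stand-in for getBasePath(); the "%s/%s/%s" layout is an
    assumption made for illustration.
    """
    number = str(number)
    return "%s/%s/%s" % (number[-2:], number, patchset)


# The misplaced cast does not help when Gerrit delivers an int.
try:
    wrong_cast(407165)
    wrong_cast_failed = False
except TypeError:
    wrong_cast_failed = True

# The corrected order accepts both the int and the str form.
path_from_int = base_path(407165, 3)
path_from_str = base_path("407165", "3")
```

Both calls to base_path yield the same path, so callers that already pass strings are unaffected.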

hashar added a comment. Edited Feb 6 2018, 3:50 PM

That is what https://review.openstack.org/#/c/433748/ is supposed to fix. Namely Gerrit sends the change number as an integer which later caused self.number[-2:] to fail.

It is the latest patch in the list of our cherry picks (branch: patch-queue/debian/jessie-wikimedia)

It is included in the packaging branch debian/jessie-wikimedia as the patch debian/patches/0014-fix-gerrit-2-14-support.patch

That should be in the Zuul 2.5.1-wmf1 Debian package. I have confirmed on contint1001 that the source file has the code.

This comment was removed by Paladox.

@hashar it seems the change number is converted to a string in Zuul, but the patch number isn't. We had this problem in wikibugs, which @Legoktm fixed in https://gerrit.wikimedia.org/r/c/407887/1/grrrrit.py

Before: event.patch_number = patchset.get('number')
After: event.patch_number = str(patchset.get('number'))

hashar added a comment. Edited Feb 6 2018, 4:10 PM

When a Depends-On header is set, Zuul queries Gerrit for open changes for the project which eventually invokes:

zuul/source/gerrit.py
def getProjectOpenChanges(self, project):
    # This is a best-effort function in case Gerrit is unable to return
    # a particular change.  It happens.
    query = "project:%s status:open" % (project.name,)
    self.log.debug("Running query %s to get project open changes" %
                   (query,))
    data = self.connection.simpleQuery(query)
    changes = []
    for record in data:
        try:
            changes.append(
                self._getChange(record['number'],
                                record['currentPatchSet']['number']))
        except Exception:
            self.log.exception("Unable to query change %s" %
                               (record.get('number'),))
    return changes

Note how it creates a Changeish object straight from the Gerrit json output using record['number'] (that is the change number) and record['currentPatchSet']['number'] (the patchset number, not a cause of the issue).

$ gerrit query --format json --current-patch-set "project:integration/config is:open" limit:1|head -n1 |jq .
{
  "project": "integration/config",
  "url": "https://gerrit.wikimedia.org/r/408368",
  "number": 408368,
...

And here we have it: a JSON integer describing the change number, which leads to the same issue that https://review.openstack.org/#/c/433748/ fixed.

Should record['currentPatchSet']['number'] be str(record['currentPatchSet']['number']) too, and record['number'] be str(record['number'])?

Mentioned in SAL (#wikimedia-releng) [2018-02-06T19:25:19Z] <hashar> Restarted Zuul due to T186381

Mentioned in SAL (#wikimedia-operations) [2018-02-06T19:25:24Z] <hashar> Restarted Zuul due to T186381

Happened again, this time with 408295. Both by me, but otherwise unconnected (different repos), so not sure what's triggering the breakage.

hashar updated the task description. Feb 6 2018, 7:57 PM

hashar added a comment. Feb 6 2018, 8:39 PM

I am hunting for the code path Zuul uses when it processes the JSON. My plan is to cast the number to a string as close as possible to the original source.
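One way to picture casting close to the source is to normalize each Gerrit query record right after it is decoded, before any Change object is built from it. This is only an illustrative sketch and not the patch that was merged for this task; normalize_record is a hypothetical helper, and the record keys are the ones visible in the gerrit query output quoted earlier.

```python
def normalize_record(record):
    """Return a copy of a Gerrit query record with identifiers as strings.

    Hypothetical helper: only "number" and "currentPatchSet"/"number" are
    touched, and the input dict is left unmodified.
    """
    normalized = dict(record)
    normalized["number"] = str(record["number"])
    patchset = record.get("currentPatchSet")
    if patchset is not None:
        patchset = dict(patchset)
        patchset["number"] = str(patchset["number"])
        normalized["currentPatchSet"] = patchset
    return normalized


# Example record shaped like the JSON Gerrit returns (values are made up).
raw = {
    "project": "integration/config",
    "number": 408368,
    "currentPatchSet": {"number": 2},
}
clean = normalize_record(raw)
```

Normalizing at the boundary means later code, such as path formatting, can keep assuming strings without sprinkling casts across every consumer.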

Mentioned in SAL (#wikimedia-operations) [2018-02-06T21:14:06Z] <legoktm> restarted zuul due to patch being stuck (T186381)

Mentioned in SAL (#wikimedia-releng) [2018-02-06T21:14:21Z] <legoktm> restarted zuul due to patch being stuck (T186381)

Change 408630 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/zuul@patch-queue/debian/jessie-wikimedia] wmf: change number must be a string when formatting

https://gerrit.wikimedia.org/r/408630

Change 408686 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/zuul@debian/jessie-wikimedia] 2.5.1-wmf2: fix ChangeIsh.basePath

https://gerrit.wikimedia.org/r/408686

Mentioned in SAL (#wikimedia-releng) [2018-02-06T21:41:05Z] <hashar> Rebuilding Zuul package to hotfix T186381

Mentioned in SAL (#wikimedia-operations) [2018-02-06T21:41:35Z] <hashar> Going to shutdown Zuul in a few for an emergency hotfix | T186381

hashar added a comment. Feb 6 2018, 9:45 PM

https://people.wikimedia.org/~hashar/debs/zuul_2.5.1-wmf2/

$ debdiff zuul_2.5.1-wmf1_amd64.deb zuul_2.5.1-wmf2_amd64.deb
[The following lists of changes regard files as different if they have
different names, permissions or owners.]

Files in second .deb but not in first

-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/PKG-INFO
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/SOURCES.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/dependency_links.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/entry_points.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/installed-files.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/not-zip-safe
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/requires.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf2-py2.7.egg-info/top_level.txt

Files in first .deb but not in second

-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/PKG-INFO
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/SOURCES.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/dependency_links.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/entry_points.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/installed-files.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/not-zip-safe
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/requires.txt
-rw-r--r-- root/root /usr/share/python/zuul/lib/python2.7/site-packages/zuul-2.5.1_wmf1-py2.7.egg-info/top_level.txt

Control files: lines which differ (wdiff format)

Version: [-2.5.1-wmf1-] {+2.5.1-wmf2+}

Mentioned in SAL (#wikimedia-operations) [2018-02-06T21:49:50Z] <hashar> Flushing Zuul queue and upgrading to zuul_2.5.1-wmf2 | T186381

hashar lowered the priority of this task from Unbreak Now! to High. Feb 6 2018, 10:07 PM
hashar claimed this task.
hashar removed a project: Patch-For-Review.

Posted a quick note on wikitech-l. I will write an incident report later on.

Change 408630 merged by Hashar:
[integration/zuul@patch-queue/debian/jessie-wikimedia] wmf: change number must be a string when formatting

https://gerrit.wikimedia.org/r/408630

Change 408686 merged by jenkins-bot:
[integration/zuul@debian/jessie-wikimedia] 2.5.1-wmf2: fix ChangeIsh.basePath

https://gerrit.wikimedia.org/r/408686

Different issue having the same root cause (patchset number changed to an integer): T187567

Change 411466 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[integration/zuul@patch-queue/debian/jessie-wikimedia] WIP: ensure that Change.number is a string

https://gerrit.wikimedia.org/r/411466

hashar closed this task as Resolved. Mar 26 2018, 4:37 PM

Change 411466 abandoned by Thcipriani:
WIP: ensure that Change.number is a string

https://gerrit.wikimedia.org/r/411466