Page MenuHomePhabricator

CI is running against parent patches, not the patches themselves for chained patches
Closed, ResolvedPublic

Description

In https://gerrit.wikimedia.org/r/#/c/411058/ we removed jscs but in the job it's running those https://integration.wikimedia.org/ci/job/mwgate-npm-node-6-docker/20518/console - you can see that it checked out the wrong patch too: HEAD is now at b6bb278... build: Bump devDependencies to latest

Other places I've potentially seen this:

Event Timeline

Legoktm triaged this task as Unbreak Now! priority.Feb 16 2018, 5:00 PM
Legoktm created this task.
Restricted Application added subscribers: Liuxinyu970226, Jay8g, TerraCodes, Aklapper. · View Herald TranscriptFeb 16 2018, 5:00 PM
hashar added a subscriber: hashar.Feb 16 2018, 5:04 PM

https://integration.wikimedia.org/ci/job/mwgate-npm-node-6-docker/20518/console got triggered with refs/zuul/master/Zcb21260c372a48d295ef55f57f2a1bb5 on contint1001. We can look at the reference on the zuul-merger repos:

git -C /srv/zuul/git/mediawiki/extensions/CentralAuth show b6bb278254b96f23beaac321461dac67be2ba8cb

Which yields b6bb278254b96f23beaac321461dac67be2ba8cb build: Bump devDependencies to latest

So at least the job is building the reference that is being passed to it. I believe the issue is on the zuul-merger side, supposedly it should have picked the patch from https://gerrit.wikimedia.org/r/#/c/411058/ merge that with other dependencies if any against the tip of the target branch (master).

The job got on 411058,1 in gate-and-submit which might matter. Though the job from the test pipeline at https://integration.wikimedia.org/ci/job/mwgate-npm-node-6-docker/20413/console shows the same issue :(

The commit that show up is from the parent change https://gerrit.wikimedia.org/r/#/c/411057/ which was still open.

So the chain was:

https://gerrit.wikimedia.org/r/#/c/411057/
https://gerrit.wikimedia.org/r/#/c/411058/

But the later did not get tested.

When the changes got a +2:

2018-02-16 05:51:01,357 DEBUG zuul.Merger: Merging for change 411057,1.
2018-02-16 05:51:01,357 DEBUG zuul.Merger: Processing refspec refs/changes/57/411057/1 for project mediawiki/extensions/CentralAuth / master ref Z398fe1e7c98b44ef94c03349e5a7834a
2018-02-16 05:51:01,450 DEBUG zuul.Merger: Found commit b6bb278254b96f23beaac321461dac67be2ba8cb for ref master/Z398fe1e7c98b44ef94c03349e5a7834a

b6bb278254 is the parent of the change.

That is good.

Then:

2018-02-16 05:51:01,450 DEBUG zuul.Merger: Merging for change 411058,1.
2018-02-16 05:51:01,450 DEBUG zuul.Merger: Processing refspec None for project mediawiki/extensions/CentralAuth / master ref Zcb21260c372a48d295ef55f57f2a1bb5
2018-02-16 05:51:01,487 DEBUG zuul.Merger: Unable to find commit for ref master/Zcb21260c372a48d295ef55f57f2a1bb5
2018-02-16 05:51:01,487 DEBUG zuul.Merger: Found base commit b6bb278254b96f23beaac321461dac67be2ba8cb for (u'mediawiki/extensions/CentralAuth', u'master')
2018-02-16 05:51:01,487 DEBUG zuul.Repo: Checking out b6bb278254b96f23beaac321461dac67be2ba8cb
2018-02-16 05:51:01,877 DEBUG zuul.Repo: Merging None with args ['-s', 'resolve', 'FETCH_HEAD']

That 'None' is suspicious'

2018-02-16 05:51:01,913 DEBUG zuul.Repo: CreateZuulRef master/Zcb21260c372a48d295ef55f57f2a1bb5 at b6bb278254b96f23beaac321461dac67be2ba8cb on <git.Repo "/srv/zuul/git/mediawiki/extensions/CentralAuth/.git">

That is the reference that got created.

2018-02-16 05:51:01,921 DEBUG zuul.Repo: CreateZuulRef master/Zcb21260c372a48d295ef55f57f2a1bb5 at 1ac24053d78be924f1a0b41f33ff767806629927 on <git.Repo "/srv/zuul/git/mediawiki/extensions/Calendar/.git">

Another one is created for Calendar that had a change at the same time.

And later still a refspec None:

2018-02-16 05:51:10,267 DEBUG zuul.Merger: Merging for change 411058,1.
2018-02-16 05:51:10,267 DEBUG zuul.Merger: Processing refspec None for project mediawiki/extensions/CentralAuth / master ref Zcb21260c372a48d295ef55f57f2a1bb5
2018-02-16 05:51:10,339 DEBUG zuul.Merger: Found commit b6bb278254b96f23beaac321461dac67be2ba8cb for ref master/Zcb21260c372a48d295ef55f57f2a1bb5

Change 411302 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] This change should have JJB diff job running

https://gerrit.wikimedia.org/r/411302

Change 411302 abandoned by Hashar:
This change should have JJB diff job running

https://gerrit.wikimedia.org/r/411302

Legoktm renamed this task from CI is running against parent patches, not the patches themselves to CI is running against parent patches, not the patches themselves for chained patches.Feb 16 2018, 5:33 PM

In the zuul-scheduler debug logs:

Checking for changes needed by <Change 0x7f764dbf8790 411058,1>:
  Change <Change 0x7f764dbf8790 411058,1> needs change <Change 0x7f764fdc0690 411057,1>:
  Needed change is already ahead in the queue
Preparing ref for: <Change 0x7f764dbf8790 411058,1>
Change <Change 0x7f764dbf8790 411058,1> depends on changes
  [<Change 0x7f764fdc0690 411057,1>, <Change 0x7f7647a4d810 411054,1>]

So the scheduler knows about the chain 411057,1 -> 411058,1 and ask to create a reference. That triggers a merge job for one of the zuul-merger to merge both changes against tip of the target branch (master). The debug data:

Submitting job <gear.Job 0x7f7667e38d90 handle: None name: merger:merge unique: 2ccb26c05deb4a8082db74cd8c8c363e>

with formatted data:

{'items': [
{
  'oldrev': None,
    'branch': u'master',
    'connection_name': 'gerrit',
    'merge_mode': 2,
    'newrev': None,
    'number': '411054',
    'patchset': 1,
    'project': 'mediawiki/extensions/Calendar',
    'refspec': u'refs/changes/54/411054/1',
    'ref': 'Z108510fcb9434fd2a0d0ed7c5b966ecc'
    'url': 'ssh://jenkins-bot@gerrit.wikimedia.org:29418/mediawiki/extensions/Calendar',
},
{
  'branch': u'master',
  'connection_name': 'gerrit',
  'merge_mode': 2,
  'newrev': None,
  'number': '411057',
  'oldrev': None,
  'patchset': 1,
  'project': 'mediawiki/extensions/CentralAuth',
  'refspec': u'refs/changes/57/411057/1',
  'ref': 'Z398fe1e7c98b44ef94c03349e5a7834a'
  'url': 'ssh://jenkins-bot@gerrit.wikimedia.org:29418/mediawiki/extensions/CentralAuth',
},
{
  'branch': u'master',
  'connection_name': 'gerrit',
  'merge_mode': 2,
  'newrev': None,
  'number': u'411058',
  'oldrev': None,
  'patchset': u'1',
  'project': 'mediawiki/extensions/CentralAuth',
  'refspec': None,
  'ref': 'Zcb21260c372a48d295ef55f57f2a1bb5'
  'url': 'ssh://jenkins-bot@gerrit.wikimedia.org:29418/mediawiki/extensions/CentralAuth',
} 
]}

The Calendar change 411054 happened to be in the queue at the same time. The real issue there is that the Zuul scheduler failed to generate the proper refspec.

'number': '411057',
'refspec': u'refs/changes/57/411057/1',
'patchset': u'1',

'number': u'411058',
'refspec': None,   # <-------------- WRONG !!!!!!!!!!!!
'patchset': u'1',

Zuul get the change metadata via _updateChange in zuul/source/gerrit.py. It does:

gerrit query --format json --all-approvals --comments --commit-message --current-patch-set --dependencies --files --patch-sets --submit-records 411302

The change patchset is retrieved with:

data['currentPatchSet']['number'] => int(1)

Then it iterates for each patchset to find the refspec.

        if change.patchset is None:
            change.patchset = data['currentPatchSet']['number']
...
        max_ps = 0
        for ps in data['patchSets']:
            if ps['number'] == change.patchset:
                change.refspec = ps['ref']
                for f in ps.get('files', []):
                    files.append(f['file'])
            if int(ps['number']) > int(max_ps):
                max_ps = ps['number']

To me that is the only place where change.refspec is assigned. Maybe ps['number'] == change.patchset fails :(

Change 411321 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/zuul@patch-queue/debian/jessie-wikimedia] wmf: debug log for GerritSource._updateChange

https://gerrit.wikimedia.org/r/411321

Change 411322 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/zuul@debian/jessie-wikimedia] 2.5.1-wmf3: debug logging for GerritSource._updateChange

https://gerrit.wikimedia.org/r/411322

Change 411302 restored by Hashar:
This change should have JJB diff job running

https://gerrit.wikimedia.org/r/411302

After restarting Zuul, a new change yields:

2018-02-16 18:35:54,974
Updating <Change 0x7f5d74f503d0 411302,2>
T187567 checking ps # <type 'int'> 1
T187567 checking ps # <type 'int'> 2
T187567 change.refspec = refs/changes/02/411302/2
T187567 post change: ... refspec': u'refs/changes/02/411302/2'

Though ZUUL_CHANGES has a None value:

integration/config:master:None^integration/config:master:refs/changes/02/411302/2

:(

Updating <Change 0x7f5d75c381d0 410625,10>

2018-02-16 18:40:24,714 DEBUG zuul.source.Gerrit: pre change: {'files': [], 'status': None, 'needed_by_changes': [], 'needs_changes': [], 'refspec': None, 'failed_to_merge': False, 'url': None, 'can_merge': False, 'is_merged': False, 'number': u'410625', 'project': None, 'owner': None, 'branch': None, 'approvals': [], 'is_current_patchset': True, 'patchset': u'10', 'open': None}

Note how the patchset is a string!

2018-02-16 18:40:25,547 DEBUG zuul.source.Gerrit: post change:
'refspec': None,
'number': u'410625',
'patchset': u'10',

So the call to _updateChange is made with a Change that has refspec=u'10'. The list of patchset received uses integer and thus none of the patchset matches since they are integers:

2018-02-16 18:40:24,899 DEBUG zuul.source.Gerrit: checking ps # <type 'int'> 1
2018-02-16 18:40:24,900 DEBUG zuul.source.Gerrit: checking ps # <type 'int'> 2
...
2018-02-16 18:40:24,902 DEBUG zuul.source.Gerrit: checking ps # <type 'int'> 9
2018-02-16 18:40:24,902 DEBUG zuul.source.Gerrit: checking ps # <type 'int'> 10

Change 411372 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[integration/zuul@patch-queue/debian/jessie-wikimedia] Ensure patchset comparison in the case of different types

https://gerrit.wikimedia.org/r/411372

hashar lowered the priority of this task from Unbreak Now! to High.Feb 16 2018, 8:40 PM
hashar edited projects, added Zuul; removed Patch-For-Review.
hashar added a subscriber: thcipriani.

@thcipriani and I looked at it at the same time and we ended up with the same conclusion as how to fix it. It is a monkey patch to fix comparison between string and integer. That is a similar issue as T186381

hashar added a comment.EditedFeb 16 2018, 8:57 PM

I found a code path that injects a string for the patchset.

The list of dependencies obtained via the --dependencies flag: gerrit query --format json --dependencies 411302:

{
  "dependsOn": [
    {
      "id": "I14ced2657ef9ad57f6b406ac5079cf68bdb18d78",
      "number": 411301,
      "revision": "0378d2613166a55fa361534f6541b6b98b9f3125",
      "ref": "refs/changes/01/411301/1",
      "isCurrentPatchSet": true
    }
  ],

GerritSource._updateChange takes the ref values (a string) and pass it to _getChange, which itself create a Change object from those strings:

if 'dependsOn' in data:
    parts = data['dependsOn'][0]['ref'].split('/')
    dep_num, dep_ps = parts[3], parts[4]
    # both are strings
    ...
    self.log.debug("Updating %s: Getting git-dependent change %s,%s" %
                   (change, dep_num, dep_ps))
    dep = self._getChange(dep_num, dep_ps, history=history)

It ends up in the cache of changes and in change.needs_changes. The same happen when handling 'neededBy'.

So I guess we should cast to an integer as well:

https://gerrit.wikimedia.org/r/#/c/411372/2..3/zuul/source/gerrit.py

T186949 is an other issue which has the same root cause.

Change 411302 abandoned by Hashar:
This change should have JJB diff job running

https://gerrit.wikimedia.org/r/411302

greg moved this task from To Triage to Follow-up/Actionables on the Wikimedia-Incident board.
greg added a subscriber: greg.

Follow-up is:

  • Incident report
  • Any other follow-up with upstream Zuul (to be determined)

status update

Will write an incident report for this one and related T186949 which we fixed last Friday.

greg assigned this task to hashar.Feb 21 2018, 5:09 PM

Change 411321 abandoned by Hashar:
wmf: debug log for GerritSource._updateChange

Reason:
No more needed. https://gerrit.wikimedia.org/r/#/c/411372/3 fixed it properly.

https://gerrit.wikimedia.org/r/411321

Change 411322 merged by jenkins-bot:
[integration/zuul@debian/jessie-wikimedia] 2.5.1-wmf3: fix patchset comparaison

https://gerrit.wikimedia.org/r/411322

Change 411372 merged by Hashar:
[integration/zuul@patch-queue/debian/jessie-wikimedia] Ensure patchset comparison in the case of different types

https://gerrit.wikimedia.org/r/411372

hashar closed this task as Resolved.Mar 26 2018, 4:36 PM