
Jenkins merged a faulty change
Closed, ResolvedPublic

Description

Aaron Schulz wrote:

I noticed that https://gerrit.wikimedia.org/r/#/c/33971/ passed the tests, but after it was merged the new tests started failing for everything. The commit to revert it also failed, so I overrode Jenkins and merged anyway, and the failures went away for new commits. This indicates that something is broken, possibly Jenkins running tests against master alone rather than master + the patch, which would explain this problem.


Version: unspecified
Severity: normal

Details

Reference
bz46723

Event Timeline

bzimport raised the priority of this task from to Unbreak Now!.Nov 22 2014, 1:30 AM
bzimport set Reference to bz46723.
hashar created this task.Mar 30 2013, 2:27 PM

Related URL: https://gerrit.wikimedia.org/r/58283 (Gerrit Change I4b3fadccaae9c35964a0c47d63b22c4f35148a24)

hashar added a comment.Apr 9 2013, 8:46 AM

From bug 47031 : https://gerrit.wikimedia.org/r/#/c/57436/ has been merged although it is faulty.

The unit tests run on patchset upload did catch the issue:

https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-misc/5222/console : FAILURE

But the gating run after CR+2 did not catch it:

https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-misc/5223/console : SUCCESS

The root cause is that although ZUUL_REF points to the proper merge commit, the Jenkins Git plugin seems to use the current origin/master to build.

hashar added a comment.Apr 9 2013, 1:30 PM

build #5223

Workspace did get wiped:
02:46:53 Wiping out workspace first.

It checked out the revision:
02:46:56 Checking out Revision 4c69569db71d149feff6c4b10ea7a493425d67fd (origin/master)

That is the master revision NOT the change. The commit should have been
7dd3356a51951f8cdfe463552b5e5aae272e8e60


The related merge job
https://integration.wikimedia.org/ci/job/mediawiki-core-merge/11333/console

02:44:17 Commencing build of Revision 7dd3356a51951f8cdfe463552b5e5aae272e8e60 (origin/master)
02:44:17 Checking out Revision 7dd3356a51951f8cdfe463552b5e5aae272e8e60 (origin/master)


The ZUUL_REF has probably not been resolved properly and the git plugin fell back to master.

There is also the possibility that the mediawiki-core-phpunit-misc job was using ZUUL_COMMIT as a refspec instead of ZUUL_REF. That might prevent the plugin from fetching the revision. The job history is no longer accessible due to an unexpected upgrade (see bug 47040).

hashar added a comment.Apr 9 2013, 9:05 PM

Created attachment 12065
python script parsing build logs to find Zuul commit vs Git plugin checkout


hashar added a comment.Apr 9 2013, 9:10 PM

Created attachment 12066
output of checkbug46723.py

The script output shows that some builds are not testing what they should be testing because they check out a parent commit. Looking at the Jenkins Git plugin source code, it seems that whenever the reference cannot be parsed (i.e. git rev-parse $ZUUL_REF fails), the plugin falls back to master or some parent commit.

I need to improve the script to find out if that happens in a specific pipeline or for some specific refs.


hashar added a comment.Apr 9 2013, 9:18 PM

Extract for the two builds referenced somewhere above:

Verifying /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/builds/5222/log
Zuulcommit: 8cc0b601aa2db6db09ac0e4d70847293d75875aa
Checkedout: 8cc0b601aa2db6db09ac0e4d70847293d75875aa
Verifying /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/builds/5223/log
Zuulcommit: 7dd3356a51951f8cdfe463552b5e5aae272e8e60
Checkedout: 4c69569db71d149feff6c4b10ea7a493425d67fd (MISMATCH)

We can see that build 5223 did not use the proper commit :-]
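The attached script is not reproduced in this thread. As a rough illustration of the kind of check it performs, here is a minimal sketch. It assumes the console log echoes a ZUUL_COMMIT=<sha1> line (an assumption about the log format) and relies on the Git plugin's "Checking out Revision <sha1>" line quoted earlier. The function names are hypothetical and not taken from checkbug46723.py.

```python
import re

# Assumed format: the Zuul parameter is echoed in the log as ZUUL_COMMIT=<sha1>.
ZUUL_RE = re.compile(r"ZUUL_COMMIT=([0-9a-f]{40})")
# Format taken from the console output quoted in this thread.
CHECKOUT_RE = re.compile(r"Checking out Revision ([0-9a-f]{40})")


def find_commits(log_text):
    """Return (zuul_commit, checked_out); an entry is None when not found."""
    zuul = ZUUL_RE.search(log_text)
    checkout = CHECKOUT_RE.search(log_text)
    return (zuul.group(1) if zuul else None,
            checkout.group(1) if checkout else None)


def is_mismatch(log_text):
    """True when both commits are present in the log and differ."""
    zuul, checked_out = find_commits(log_text)
    return zuul is not None and checked_out is not None and zuul != checked_out


# Hypothetical log built from the values reported for build 5223.
sample_log = "\n".join([
    "ZUUL_COMMIT=7dd3356a51951f8cdfe463552b5e5aae272e8e60",
    "02:46:53 Wiping out workspace first.",
    "02:46:56 Checking out Revision 4c69569db71d149feff6c4b10ea7a493425d67fd (origin/master)",
])
print(is_mismatch(sample_log))
```

Logs where either line is missing are not counted as mismatches in this sketch, since nothing can be compared in that case.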

I suspect the git plugin does not fetch the proper reference or can't find it. That results internally in an unknown sha1, and the git plugin then falls back to master or something else.

I will try to reproduce the issue in labs with the git plugin set to verbose. That requires starting Jenkins with -Dhudson.plugins.git.GitSCM.verbose=true

I have traced the issue as far back as mediawiki-core-lint build #19, made on November 22nd 2012.

MISMATCH in /var/lib/jenkins/jobs/mediawiki-core-lint/builds/19/log
Pipeline: gate
Zuulcommit: 76606b66b006ac0e62087e6d00b1e4bdd56fff09
Checkedout: 232e34733fc68739ba96cccc31d3ff88f9484a23

We are lacking the git plugin verbose mode in production due to a bug. It is corrected with https://gerrit.wikimedia.org/r/58489 . That will help find out what the plugin is doing internally.

Created attachment 12084
Console output for https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-parser/5386/console


ZUUL_COMMIT=76cb37f0c69dcd69884fc6e66681e77c8045a08e

but it fetched origin/master instead :-(

The branch specifier in the git plugin is set to ZUUL_BRANCH which is 'master'.

In the git plugin (at git-plugin/src/main/java/hudson/plugins/git/util/DefaultBuildChooser.java ), getCandidateRevisions() recognizes whether the branch looks like a sha1 (if it matches /[0-9a-f]{6,40}/) and in that case creates a detached branch using that commit.

Seems the Jenkins job macro should then use ZUUL_COMMIT as a branch specifier.
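A rough model of that behaviour, not the plugin's actual code. The regular expression is the one quoted above. Anchoring it to the whole specifier, the remote_heads mapping and the function name are assumptions made for illustration.

```python
import re

# Pattern quoted above; matching the whole specifier is an assumption.
SHA1_RE = re.compile(r"[0-9a-f]{6,40}")


def choose_revision(branch_spec, remote_heads):
    """Model which commit gets built for a given branch specifier.

    remote_heads is a hypothetical mapping of branch name to head sha1.
    """
    if SHA1_RE.fullmatch(branch_spec):
        # Looks like a sha1: build exactly that commit (detached).
        return branch_spec
    # Otherwise the specifier is treated as a branch name.
    return remote_heads.get(branch_spec)


heads = {"master": "4c69569db71d149feff6c4b10ea7a493425d67fd"}
zuul_branch = "master"
zuul_commit = "7dd3356a51951f8cdfe463552b5e5aae272e8e60"

print(choose_revision(zuul_branch, heads))  # head of master, not the change
print(choose_revision(zuul_commit, heads))  # the commit Zuul asked for
```

With ZUUL_BRANCH as the specifier, the model always lands on the current head of master, which is consistent with the mismatch seen in build 5223.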

Related URL: https://gerrit.wikimedia.org/r/58865 (Gerrit Change Iafebfffe480886fc8956e56517291b1b3b1fc0cc)

Related URL: https://gerrit.wikimedia.org/r/58865 (Gerrit Change Iafebfffe480886fc8956e56517291b1b3b1fc0cc)

I have updated the mediawiki-core-whitespaces job to use ZUUL_COMMIT as a refspec specifier. The job is non-voting, so that is not going to do any harm.

The experimental change is https://gerrit.wikimedia.org/r/58865

(In reply to comment #13)

Related URL: https://gerrit.wikimedia.org/r/58865 (Gerrit Change Iafebfffe480886fc8956e56517291b1b3b1fc0cc)

Why is this comment duplicated?

*** Bug 47208 has been marked as a duplicate of this bug. ***

https://gerrit.wikimedia.org/r/58865 (Gerrit Change Iafebfffe480886fc8956e56517291b1b3b1fc0cc) | change APPROVED and MERGED [by Hashar]

https://gerrit.wikimedia.org/r/#/c/58865/ has been deployed.

I am now manually updating the jobs which are not under JJB:

analytics-libanon
analytics-udp-filters
analytics-webstatscollector
analytics-wikistats
mwext-PoolCounter-pep8
mwext-VisualEditor-docgen
operations-debs-python-voluptuous-debbuild
parsoid-parse-tool-check
parsoid-roundtrip-test-check
parsoid-runTests
test-mediawiki-merge

Will monitor over the next few days. Lowering priority for now.

hashar@gallium:~$ ./checkbug46723.py mediawiki-core-phpunit-api --filter 2013-04-16*
Found 0 mismatches in 29 log files.
hashar@gallium:~$ ./checkbug46723.py mediawiki-core-phpunit-misc --filter 2013-04-16*
Found 0 mismatches in 29 log files.
$

Seems it got fixed :-] Will verify again during the week, but so far that looks good.

I have verified the jobs triggered over the past few days. Seems to work fine now :-) The root cause was using ZUUL_BRANCH as a branch specifier instead of ZUUL_COMMIT.

Change 117045 had a related patch set uploaded by Hashar:
Parsoid: uses ZUUL_COMMIT as a git refspec to build

https://gerrit.wikimedia.org/r/117045

Change 117045 merged by jenkins-bot:
Parsoid: uses ZUUL_COMMIT as a git refspec to build

https://gerrit.wikimedia.org/r/117045

hashar lowered the priority of this task from Unbreak Now! to Normal.Mar 3 2015, 10:26 AM
hashar raised the priority of this task from Normal to Unbreak Now!.