Zuul-cloner failing to acquire .git lock sometimes
Closed, Resolved (Public)

Description

From https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm/293/console:

INFO:zuul.Cloner:Creating repo mediawiki/extensions/WikiGrok from upstream https://gerrit.wikimedia.org/r/p/mediawiki/extensions/WikiGrok
ERROR:zuul.Repo:Unable to initialize repo for https://gerrit.wikimedia.org/r/p/mediawiki/extensions/WikiGrok
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/zuul/merger/merger.py", line 38, in __init__
    self._ensure_cloned()
  File "/usr/local/lib/python2.7/dist-packages/zuul/merger/merger.py", line 58, in _ensure_cloned
    repo.config_writer().write()
  File "/usr/lib/python2.7/dist-packages/git/repo/base.py", line 369, in config_writer
    return GitConfigParser(self._get_config_path(config_level), read_only = False)
  File "/usr/lib/python2.7/dist-packages/git/config.py", line 172, in __init__
    self._lock._obtain_lock()
  File "/usr/lib/python2.7/dist-packages/git/util.py", line 494, in _obtain_lock
    return self._obtain_lock_or_raise()
  File "/usr/lib/python2.7/dist-packages/git/util.py", line 481, in _obtain_lock_or_raise
    raise IOError("Lock for file %r did already exist, delete %r in case the lock is illegal" % (self._file_path, lock_file))
IOError: Lock for file '/mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/extensions/WikiGrok/.git/config' did already exist, delete '/mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/extensions/WikiGrok/.git/config.lock' in case the lock is illegal
Traceback (most recent call last):
  File "/usr/local/bin/zuul-cloner", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/zuul/cmd/cloner.py", line 156, in main
    cloner.main()
  File "/usr/local/lib/python2.7/dist-packages/zuul/cmd/cloner.py", line 151, in main
    cloner.execute()
  File "/usr/local/lib/python2.7/dist-packages/zuul/lib/cloner.py", line 66, in execute
    self.prepareRepo(project, dest)
  File "/usr/local/lib/python2.7/dist-packages/zuul/lib/cloner.py", line 127, in prepareRepo
    repo = self.cloneUpstream(project, dest)
  File "/usr/local/lib/python2.7/dist-packages/zuul/lib/cloner.py", line 95, in cloneUpstream
    raise Exception("Error cloning %s to %s" % (git_upstream, dest))
Exception: Error cloning https://gerrit.wikimedia.org/r/p/mediawiki/extensions/WikiGrok to src/extensions/WikiGrok
Build step 'Execute shell' marked build as failure

I cleaned up the same error for a different repo on the same host previously: https://wikitech.wikimedia.org/w/index.php?title=Release_Engineering%2FSAL&diff=140645&oldid=140607

This feels like a race in which multiple jobs use the same working copy, but it may just be an intermittent failure of zuul.Cloner. It seems that once a repo becomes corrupted like this, it must be manually deleted before the job will succeed on that Jenkins slave again.

Event Timeline

bd808 created this task. Jan 14 2015, 2:11 AM
bd808 raised the priority of this task from to Needs Triage.
bd808 updated the task description.
bd808 added a subscriber: bd808.
greg triaged this task as Unbreak Now! priority. Jan 14 2015, 4:38 PM
greg added subscribers: greg, Cmcmahon, dduvall and 2 others.

@Cmcmahon / @mmodell / @dduvall / @hashar / whoever is awake: This is blocking merges during a SWAT deploy right now.

greg added a comment. Jan 14 2015, 4:59 PM

I rm -rf'd the WikiGrok checkout just now, let's see....

Ori proposed a patch upstream, https://review.openstack.org/#/c/147101/1/, which properly releases the lock after the config has been written.

I have added the patch to our integration/zuul.git repository ( 9cb4842 ) and tagged it wmf-deploy-20150114-1.

Redeploying Zuul on all servers/instances.

hashar lowered the priority of this task from Unbreak Now! to Normal. Jan 14 2015, 8:30 PM

I have deployed Ori's patch on all labs slaves as well as the two production slaves. That should hopefully fix the issue.

Let's keep this task open until upstream has reviewed and merged Ori's patch.

Thank you Ori!

hashar renamed this task from Git clone corruption by mediawiki-extensions-hhvm job on integration-slave1006 to [fixed, pending upstream merge] Git clone corruption by mediawiki-extensions-hhvm job on integration-slave1006. Jan 14 2015, 8:44 PM
hashar added a project: Upstream.
hashar set Security to None.
Krinkle renamed this task from [fixed, pending upstream merge] Git clone corruption by mediawiki-extensions-hhvm job on integration-slave1006 to Zuul-cloner failing to acquire .git/config lock sometimes. Jan 26 2015, 11:20 PM
Krinkle closed this task as Resolved.
Krinkle claimed this task.

A slightly different lock error happened just now: https://integration.wikimedia.org/ci/job/mediawiki-phpunit-hhvm/1672/console

23:09:07 Traceback (most recent call last):
23:09:07   File "/usr/local/bin/zuul-cloner", line 10, in <module>
23:09:07     sys.exit(main())
23:09:07   File "/usr/local/lib/python2.7/dist-packages/zuul/cmd/cloner.py", line 156, in main
23:09:07     cloner.main()
23:09:07   File "/usr/local/lib/python2.7/dist-packages/zuul/cmd/cloner.py", line 151, in main
23:09:07     cloner.execute()
23:09:07   File "/usr/local/lib/python2.7/dist-packages/zuul/lib/cloner.py", line 66, in execute
23:09:07     self.prepareRepo(project, dest)
23:09:07   File "/usr/local/lib/python2.7/dist-packages/zuul/lib/cloner.py", line 159, in prepareRepo
23:09:07     repo.checkout(fetch_head)
23:09:07   File "/usr/local/lib/python2.7/dist-packages/zuul/merger/merger.py", line 127, in checkout
23:09:07     repo.head.reference = ref
23:09:07   File "/usr/lib/python2.7/dist-packages/git/refs/symbolic.py", line 301, in set_reference
23:09:07     fd = lfd.open(write=True, stream=True)
23:09:07   File "/usr/lib/python2.7/dist-packages/gitdb/util.py", line 314, in open
23:09:07     raise IOError("Lock at %r could not be obtained" % self._lockfilepath())
23:09:07 IOError: Lock at '/mnt/jenkins-workspace/workspace/mediawiki-phpunit-hhvm/src/.git/HEAD.lock' could not be obtained
23:09:07 Build step 'Execute shell' marked build as failure
23:09:08 [xUnit] [INFO] - Starting to record.
23:09:08 [xUnit] [INFO] - Processing PHPUnit-3.x (default)
23:09:08 [xUnit] [INFO] - [PHPUnit-3.x (default)] - 1 test report file(s) were found with the pattern 'log/junit-mw-phpunit.xml' relative to '/mnt/jenkins-workspace/workspace/mediawiki-phpunit-hhvm' for the testing framework 'PHPUnit-3.x (default)'.
23:09:08 [xUnit] [ERROR] - Test reports were found but not all of them are new. Did all the tests run?
23:09:08   * /mnt/jenkins-workspace/workspace/mediawiki-phpunit-hhvm/log/junit-mw-phpunit.xml is 3 days 23 hr old
Krinkle reopened this task as Open. Jan 26 2015, 11:27 PM
Krinkle added a subscriber: Umherirrender.

https://integration.wikimedia.org/ci/job/mediawiki-extensions-zend/1938/console

22:09:41 ERROR:zuul.Repo:Unable to initialize repo for https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Flow
22:09:41 Traceback (most recent call last):
22:09:41   File "/usr/local/lib/python2.7/dist-packages/zuul/merger/merger.py", line 38, in __init__
22:09:41     self._ensure_cloned()
22:09:41   File "/usr/local/lib/python2.7/dist-packages/zuul/merger/merger.py", line 58, in _ensure_cloned
22:09:41     config_writer = repo.config_writer()
22:09:41   File "/usr/lib/pymodules/python2.7/git/repo/base.py", line 369, in config_writer
22:09:41     return GitConfigParser(self._get_config_path(config_level), read_only = False)
22:09:41   File "/usr/lib/pymodules/python2.7/git/config.py", line 172, in __init__
22:09:41     self._lock._obtain_lock()
22:09:41   File "/usr/lib/pymodules/python2.7/git/util.py", line 494, in _obtain_lock
22:09:41     return self._obtain_lock_or_raise()
22:09:41   File "/usr/lib/pymodules/python2.7/git/util.py", line 481, in _obtain_lock_or_raise
22:09:41     raise IOError("Lock for file %r did already exist, delete %r in case the lock is illegal" % (self._file_path, lock_file))
22:09:41 IOError: Lock for file '/srv/ssd/jenkins-slave/workspace/mediawiki-extensions-zend@2/src/extensions/Flow/.git/config' did already exist, delete '/srv/ssd/jenkins-slave/workspace/mediawiki-extensions-zend@2/src/extensions/Flow/.git/config.lock' in case the lock is illegal
Se4598 added a subscriber: Se4598. Feb 5 2015, 1:01 AM
Se4598 raised the priority of this task from Normal to High. Feb 5 2015, 1:10 AM

This is annoying, and together with T88554 it makes it hard to get a Verified+2 in Gerrit even when rechecking.
Apparently, for the last failing builds on mediawiki-extensions-zend it was always the Flow repo (Flow/.git/config.lock).

https://integration.wikimedia.org/ci/job/mediawiki-extensions-zend/2597/console

21:49:32 INFO:zuul.Cloner:Prepared mediawiki/extensions/EventLogging repo with branch master
21:49:32 INFO:zuul.Cloner:Creating repo mediawiki/extensions/Flow from upstream https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Flow
21:49:32 ERROR:zuul.Repo:Unable to initialize repo for https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Flow
21:49:32 Traceback (most recent call last):
21:49:32   File "/usr/local/lib/python2.7/dist-packages/zuul/merger/merger.py", line 38, in __init__
21:49:32     self._ensure_cloned()
21:49:32   File "/usr/local/lib/python2.7/dist-packages/zuul/merger/merger.py", line 58, in _ensure_cloned
21:49:32     config_writer = repo.config_writer()
21:49:32   File "/usr/lib/pymodules/python2.7/git/repo/base.py", line 369, in config_writer
21:49:32     return GitConfigParser(self._get_config_path(config_level), read_only = False)
21:49:32   File "/usr/lib/pymodules/python2.7/git/config.py", line 172, in __init__
21:49:32     self._lock._obtain_lock()
21:49:32   File "/usr/lib/pymodules/python2.7/git/util.py", line 494, in _obtain_lock
21:49:32     return self._obtain_lock_or_raise()
21:49:32   File "/usr/lib/pymodules/python2.7/git/util.py", line 481, in _obtain_lock_or_raise
21:49:32     raise IOError("Lock for file %r did already exist, delete %r in case the lock is illegal" % (self._file_path, lock_file))
21:49:32 IOError: Lock for file '/srv/ssd/jenkins-slave/workspace/mediawiki-extensions-zend@3/src/extensions/Flow/.git/config' did already exist, delete '/srv/ssd/jenkins-slave/workspace/mediawiki-extensions-zend@3/src/extensions/Flow/.git/config.lock' in case the lock is illegal
Krinkle removed Krinkle as the assignee of this task. Feb 25 2015, 12:45 AM

The root cause is most probably a git operation being interrupted abruptly. When someone sends two patch sets in quick succession, Zuul aborts the job for the first patch set and Jenkins kills all running processes attached to that job. That leaves the git workspace in a dirty state (the lock file is still around).
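The failure mode can be reproduced without Zuul or GitPython. The following is only an illustrative sketch, assuming the lock is a plain file created with exclusive-create semantics (`O_CREAT | O_EXCL`); `acquire_lock`, `release_lock` and `simulate_aborted_job` are hypothetical helpers written for this example, not GitPython or Zuul APIs.

```python
import os
import tempfile


def acquire_lock(target_path):
    """Create '<target>.lock' exclusively; raise IOError if it already exists."""
    lock_path = target_path + ".lock"
    try:
        fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except OSError:
        raise IOError("Lock for file %r did already exist" % target_path)
    os.close(fd)
    return lock_path


def release_lock(lock_path):
    """Remove the lock file. A process killed before this runs leaves it behind."""
    os.remove(lock_path)


def simulate_aborted_job(workspace):
    """Return True if a second job is blocked by the first job's leftover lock."""
    config = os.path.join(workspace, "config")
    # First job takes the lock and is then killed, so release_lock() never runs.
    acquire_lock(config)
    try:
        # Next job scheduled on the same workspace tries to take the same lock.
        acquire_lock(config)
    except IOError:
        return True
    return False
```

Calling `simulate_aborted_job(tempfile.mkdtemp())` shows the second acquisition failing for as long as the stale `.lock` file exists, which matches the "must be manually deleted" behaviour reported in this task.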

A possible fix would be to have zuul-cloner clean up lock files before proceeding.

GitPython has a LockFile class in git/util.py which has a _release_lock() method. We can get Zuul Merger.Merger._ensure_cloned() to invoke it.

Krinkle lowered the priority of this task from High to Low. Mar 3 2015, 11:09 PM
Krinkle moved this task from Next to Untriaged on the Continuous-Integration-Infrastructure board.

Lowering priority. It doesn't happen very often and will be obsolete if we use clean workspaces and/or isolated VMs.

Krinkle moved this task from Backlog to Enhancements on the Zuul board. Apr 15 2015, 7:47 PM
Krinkle moved this task from Enhancements to Bugs on the Zuul board.
Krinkle added a comment. Edited Apr 17 2015, 12:40 PM

Still happens. E.g. on https://integration.wikimedia.org/ci/job/mediawiki-extensions-zend/12810/console just now

ERROR:zuul.Repo:Unable to initialize repo for https://gerrit.wikimedia.org/r/p/mediawiki/extensions/cldr
Traceback (most recent call last):
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 38, in __init__
    self._ensure_cloned()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 58, in _ensure_cloned
    config_writer = repo.config_writer()
  File "/usr/lib/pymodules/python2.7/git/repo/base.py", line 369, in config_writer
    return GitConfigParser(self._get_config_path(config_level), read_only = False)
  File "/usr/lib/pymodules/python2.7/git/config.py", line 172, in __init__
    self._lock._obtain_lock()
  File "/usr/lib/pymodules/python2.7/git/util.py", line 494, in _obtain_lock
    return self._obtain_lock_or_raise()
  File "/usr/lib/pymodules/python2.7/git/util.py", line 481, in _obtain_lock_or_raise
    raise IOError("Lock for file %r did already exist, delete %r in case the lock is illegal" % (self._file_path, lock_file))
IOError: Lock for file '/mnt/jenkins-workspace/workspace/mediawiki-extensions-zend@2/src/extensions/cldr/.git/config' did already exist, delete '/mnt/jenkins-workspace/workspace/mediawiki-extensions-zend@2/src/extensions/cldr/.git/config.lock' in case the lock is illegal
Traceback (most recent call last):
  File "/usr/bin/zuul-cloner", line 10, in <module>
    sys.exit(main())
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/cmd/cloner.py", line 156, in main
    cloner.main()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/cmd/cloner.py", line 151, in main
    cloner.execute()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/lib/cloner.py", line 66, in execute
    self.prepareRepo(project, dest)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/lib/cloner.py", line 127, in prepareRepo
    repo = self.cloneUpstream(project, dest)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/lib/cloner.py", line 95, in cloneUpstream
    raise Exception("Error cloning %s to %s" % (git_upstream, dest))
Exception: Error cloning https://gerrit.wikimedia.org/r/p/mediawiki/extensions/cldr to src/extensions/cldr

I left a comment on our upstream patch: https://review.openstack.org/#/c/147101/

The patch at https://review.openstack.org/#/c/147101/ handles the config.lock case. It would not release other locks that might be left behind when Zuul abruptly cancels jobs during a git operation.

A poor man's solution would be to find and delete such lock files before invoking zuul-cloner. Something like:

echo "Deleting leftover git lock files"
find src/ -path '**/.git/*.lock' -delete

This has to be tested, though; then we can inject it in the JJB config before any invocation of zuul-cloner, until it's handled by zuul-cloner itself.
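If the cleanup were to live in Python rather than in a shell step, a rough equivalent of the `find` command could look like the sketch below. It is untested against the real CI workspaces; the function name `remove_stale_git_locks` and the `min_age_seconds` safety margin are assumptions made for this example and are not part of zuul-cloner.

```python
import os
import time


def remove_stale_git_locks(root, min_age_seconds=0):
    """Delete '*.lock' files found inside any '.git' directory under root.

    min_age_seconds is a hypothetical safety margin: when non-zero, lock
    files younger than that are skipped in case another process still
    legitimately holds them. Returns the sorted list of removed paths.
    """
    removed = []
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        # Only consider directories that are a .git directory or live inside one.
        if ".git" not in dirpath.split(os.sep):
            continue
        for name in filenames:
            if not name.endswith(".lock"):
                continue
            path = os.path.join(dirpath, name)
            if min_age_seconds and now - os.path.getmtime(path) < min_age_seconds:
                continue
            os.remove(path)
            removed.append(path)
    return sorted(removed)
```

A job could call `remove_stale_git_locks("src")` before running zuul-cloner. Files such as `composer.lock` outside `.git` are left alone, because only paths with a `.git` component are considered.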

The patch at https://review.openstack.org/#/c/147101/ handles the config.lock case. It would not release other locks that might be left behind when Zuul abruptly cancels jobs during a git operation.

The stack trace in the comment before this clearly shows config_writer. There are also other possible locks inside .git. But it seems even the config lock is not handled correctly by that patch.

Krinkle raised the priority of this task from Low to High. Apr 20 2015, 10:47 PM

This keeps happening at random when the Zuul scheduler cancels builds and leaves behind corrupt workspaces. Subsequently, this routinely causes all builds to keep failing until someone logs in to the slave over SSH and restores the git clone.

This is unacceptable. We need to either fix the locks, or clean our workspaces (T76304; depends on having git-cache, T87294).

To fix the locks we would:

  • Make Zuul-cloner remove locks,
  • .. or remove them ahead of time,
  • .. or make Zuul-cloner more resilient and fall back to re-cloning if updating fails.
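The third option could be sketched roughly as follows. This is not how zuul-cloner is implemented; `prepare_repo`, `git_update` and `git_clone` are hypothetical names for this example, and the update step is reduced to a plain `git fetch` for brevity.

```python
import os
import shutil
import subprocess


def git_update(dest):
    """Update an existing clone in place (simplified to a fetch)."""
    subprocess.check_call(["git", "fetch", "--quiet", "origin"], cwd=dest)


def git_clone(upstream, dest):
    """Clone upstream into dest from scratch."""
    subprocess.check_call(["git", "clone", "--quiet", upstream, dest])


def prepare_repo(upstream, dest, update=git_update, clone=git_clone):
    """Update dest in place; if that fails, wipe it and clone again.

    Returns "updated" or "recloned".
    """
    if os.path.isdir(os.path.join(dest, ".git")):
        try:
            update(dest)
            return "updated"
        except (subprocess.CalledProcessError, IOError):
            # Workspace is unusable (stale lock, missing objects, ...).
            pass
    # Assumes dest is a disposable CI workspace that is safe to delete.
    shutil.rmtree(dest, ignore_errors=True)
    clone(upstream, dest)
    return "recloned"
```

The `update` and `clone` callables are parameters so that the fallback logic can be exercised without network access. Re-cloning a large repository is slow, so the fallback only runs after an update has actually failed.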

Are the developers of zuul out of their minds? It seems crazy to me that it would abruptly just kill a job and expect the workspace to be usable afterwards.

Are the developers of zuul out of their minds? It seems crazy to me that it would abruptly just kill a job and expect the workspace to be usable afterwards.

It doesn't. We're crazy for not wiping workspaces at the start or end of each build. Developers of Zuul (OpenStack infra) do not preserve workspaces ever. We do this because we don't have a git cache and without git cache, preserving workspaces (or rather, just the .git) is the simplest way to not have to re-clone big repositories every time (which would slow down every mediawiki related job by 5 minutes).

We're working on this, see T96627 and associated tasks.

Still happening.
https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/1634/console

02:42:55 INFO:zuul.Cloner:Creating repo mediawiki/extensions/Flow from upstream https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Flow
02:42:55 ERROR:zuul.Repo:Unable to initialize repo for https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Flow
02:42:55 Traceback (most recent call last):
02:42:55   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 38, in __init__
02:42:55     self._ensure_cloned()
02:42:55   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 58, in _ensure_cloned
02:42:55     config_writer = repo.config_writer()
02:42:55   File "/usr/lib/python2.7/dist-packages/git/repo/base.py", line 369, in config_writer
02:42:55     return GitConfigParser(self._get_config_path(config_level), read_only = False)
02:42:55   File "/usr/lib/python2.7/dist-packages/git/config.py", line 172, in __init__
02:42:55     self._lock._obtain_lock()
02:42:55   File "/usr/lib/python2.7/dist-packages/git/util.py", line 494, in _obtain_lock
02:42:55     return self._obtain_lock_or_raise()
02:42:55   File "/usr/lib/python2.7/dist-packages/git/util.py", line 481, in _obtain_lock_or_raise
02:42:55     raise IOError("Lock for file %r did already exist, delete %r in case the lock is illegal" % (self._file_path, lock_file))
02:42:55 IOError: Lock for file '/mnt/jenkins-workspace/workspace/mediawiki-extensions-qunit/src/extensions/Flow/.git/config' did already exist, delete '/mnt/jenkins-workspace/workspace/mediawiki-extensions-qunit/src/extensions/Flow/.git/config.lock' in case the lock is illegal
Tgr added a subscriber: Tgr. Apr 11 2016, 4:46 PM

Seems to be happening fairly often these days.
https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/38439/console

16:23:28 INFO:zuul.Cloner:Creating repo mediawiki/extensions/cldr from upstream https://gerrit.wikimedia.org/r/p/mediawiki/extensions/cldr
16:23:29 DEBUG:zuul.Repo:Resetting repository src/extensions/cldr
16:23:29 DEBUG:zuul.Repo:Updating repository src/extensions/cldr
16:23:29 Traceback (most recent call last):
16:23:29   File "/usr/bin/zuul-cloner", line 10, in <module>
16:23:29     sys.exit(main())
16:23:29   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/cmd/cloner.py", line 156, in main
16:23:29     cloner.main()
16:23:29   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/cmd/cloner.py", line 151, in main
16:23:29     cloner.execute()
16:23:29   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/lib/cloner.py", line 68, in execute
16:23:29     self.prepareRepo(project, dest)
16:23:29   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/lib/cloner.py", line 148, in prepareRepo
16:23:29     repo.reset()
16:23:29   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 107, in reset
16:23:29     repo.create_head(ref.remote_head, ref, force=True)
16:23:29   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/repo/base.py", line 332, in create_head
16:23:29     return Head.create(self, path, commit, force, logmsg)
16:23:29   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/refs/symbolic.py", line 527, in create
16:23:29     return cls._create(repo, path, cls._resolve_ref_on_create, reference, force, logmsg)
16:23:29   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/refs/symbolic.py", line 494, in _create
16:23:29     ref.set_reference(target, logmsg)
16:23:29   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/refs/symbolic.py", line 315, in set_reference
16:23:29     fd = lfd.open(write=True, stream=True)
16:23:29   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/gitdb/util.py", line 314, in open
16:23:29     raise IOError("Lock at %r could not be obtained" % self._lockfilepath())
16:23:29 IOError: Lock at '/mnt/jenkins-workspace/workspace/mediawiki-extensions-qunit/src/extensions/cldr/.git/refs/heads/wmf/1.24wmf1.lock' could not be obtained
16:23:29 Build step 'Execute shell' marked build as failure

Same patch, same day, errors out on the same repo: https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/38412/console

Tgr added a comment. Apr 12 2016, 1:09 PM

A different error:
https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/62842/console

11:55:45 INFO:zuul.Cloner:Creating repo mediawiki/core from upstream https://gerrit.wikimedia.org/r/p/mediawiki/core
11:55:47 DEBUG:zuul.Repo:Resetting repository src
11:55:47 DEBUG:zuul.Repo:Updating repository src
11:55:55 INFO:zuul.Cloner:upstream repo has branch master
11:56:35 DEBUG:zuul.Cloner:Fetched ref refs/zuul/master/Zac0122d4a776400494993e9d8ec7287b from mediawiki/core
11:56:35 DEBUG:zuul.Repo:Checking out 4f0bd64e50ad721e2d96ab42c39ca2ed4e84b80f
11:56:35 Traceback (most recent call last):
11:56:35   File "/usr/bin/zuul-cloner", line 10, in <module>
11:56:35     sys.exit(main())
11:56:35   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/cmd/cloner.py", line 156, in main
11:56:35     cloner.main()
11:56:35   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/cmd/cloner.py", line 151, in main
11:56:35     cloner.execute()
11:56:35   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/lib/cloner.py", line 68, in execute
11:56:35     self.prepareRepo(project, dest)
11:56:35   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/lib/cloner.py", line 185, in prepareRepo
11:56:35     repo.checkout(fetch_head)
11:56:35   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 143, in checkout
11:56:35     reset_repo_to_head(repo)
11:56:35   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 30, in reset_repo_to_head
11:56:35     repo.git.reset('--hard', 'HEAD', '--')
11:56:35   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/cmd.py", line 440, in <lambda>
11:56:35     return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
11:56:35   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/cmd.py", line 834, in _call_process
11:56:35     return self.execute(make_call(), **_kwargs)
11:56:35   File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/cmd.py", line 627, in execute
11:56:35     raise GitCommandError(command, status, stderr_value)
11:56:35 git.exc.GitCommandError: 'git reset --hard HEAD --' returned with exit code 128
11:56:35 stderr: 'fatal: unable to read tree 0300037ef45754d51bf2561e51ab92b9337d0eed'
11:56:35 Build step 'Execute shell' marked build as failure
11:56:35 [PostBuildScript] - Execution post build scripts.
11:56:35 [mediawiki-core-qunit] $ /bin/bash -xe /tmp/hudson4661934340094476409.sh
11:56:35 + rm -f /srv/localhost-worker/jenkins-mediawiki-core-qunit-62842
11:56:35 [PostBuildScript] - Execution post build scripts.
11:56:35 [mediawiki-core-qunit] $ /bin/bash -xe /tmp/hudson7597643577514989694.sh
11:56:35 + /srv/deployment/integration/slave-scripts/bin/mw-teardown-mysql.sh
11:56:35 ERROR 1269 (HY000) at line 1: Can't revoke all privileges for one or more of the requested users

Not sure if that's the same problem or T126699 but the main error seems zuul-related.

Still happening:

https://gerrit.wikimedia.org/r/#/c/291361/3

https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm/65501/console

Building remotely on integration-slave-trusty-1017 ..
23:05:39 IOError: Lock at '/mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/extensions/Babel/.git/refs/heads/wmf/1.26wmf11.lock' could not be obtained

https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm/65504/console

Building remotely on integration-slave-trusty-1017 ..
23:45:38 IOError: Lock at '/mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/extensions/Babel/.git/refs/heads/wmf/1.26wmf11.lock' could not be obtained

Mentioned in SAL [2016-06-04T00:09:28Z] <Krinkle> krinkle@integration-slave-trusty-1017:~$ sudo rm -rf /mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/extensions/Babel (T86730)

Krinkle renamed this task from Zuul-cloner failing to acquire .git/config lock sometimes to Zuul-cloner failing to acquire .git lock sometimes. Jun 4 2016, 12:10 AM

I believe this is fixed in https://phabricator.wikimedia.org/rCIZU0a6a0c422cdffe668bec2a9420d9bdd32182a0d8, since it now also checks for .git.

That commit relates to the git cache (source), which is not enabled by default. This bug is about the git status inside the Jenkins job workspace (destination).

Krinkle removed a subscriber: Krinkle. Jul 25 2016, 11:31 PM

Is this fixed in https://phabricator.wikimedia.org/rCIZUe489cf2a1a97870c55abd4279a9bd8eeac0cb8b7? It looks like that patch fixes the problem.

hashar added a comment. Sep 2 2016, 2:06 PM

e489cf2a1a97870c55abd4279a9bd8eeac0cb8b7 is for the Zuul merger; it deals with the file lock sticking around when modifying the git configuration for the repo. That patch was included in Zuul ages ago.

This task is about zuul-cloner (not zuul-merger), which can be abruptly aborted and leave a lock file behind. The next run of that job on that slave would then choke on the leftover lock file.

A way to fix it would be to have zuul-cloner discard the lock files entirely. A future fix would be to always start with a clean environment.

Oh, @hashar would you know how to implement the fix please?

Krinkle closed this task as Resolved. Sep 28 2018, 7:37 PM
Krinkle claimed this task.

Not seen in a while. Let's close this.

mmodell changed the subtype of this task from "Task" to "Production Error". Wed, Aug 28, 11:12 PM