
zuul-merger fails to fetch from Gerrit
Closed, Resolved, Public

Description

2019-09-20 08:21:16,459 DEBUG zuul.Merger: Merging for change 535631,11.
2019-09-20 08:21:16,459 DEBUG zuul.Merger: Processing refspec refs/changes/31/535631/11 for project operations/puppet / production ref Zb33b89b8a9ff4e4584edb29b222a46a7
2019-09-20 08:21:17,140 DEBUG zuul.Merger: Unable to find commit for ref production/Zb33b89b8a9ff4e4584edb29b222a46a7
2019-09-20 08:21:17,140 DEBUG zuul.Merger: No base commit found for (u'operations/puppet', u'production')
2019-09-20 08:21:17,160 DEBUG zuul.Repo: Resetting repository /srv/zuul/git/operations/puppet
2019-09-20 08:21:17,161 DEBUG zuul.Repo: Updating repository /srv/zuul/git/operations/puppet
2019-09-20 08:21:17,436 ERROR zuul.Merger: Unable to reset repo <zuul.merger.merger.Repo object at 0x7fbb373da510>
Traceback (most recent call last):
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 326, in _mergeItem
    repo.reset()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 101, in reset
    self.update()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 207, in update
    origin.fetch(tags=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/remote.py", line 789, in fetch
    res = self._get_fetch_info_from_stderr(proc, progress)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/remote.py", line 675, in _get_fetch_info_from_stderr
    proc.wait(stderr=stderr_text)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/cmd.py", line 415, in wait
    raise GitCommandError(self.args, status, errstr)
GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git fetch --tags -v origin
  stderr: 'fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.'

Event Timeline

hashar created this task. Sep 20 2019, 8:23 AM
Restricted Application added a subscriber: Aklapper. Sep 20 2019, 8:23 AM
hashar triaged this task as Unbreak Now! priority. Sep 20 2019, 8:23 AM
Restricted Application added a subscriber: Liuxinyu970226. Sep 20 2019, 8:23 AM

Mentioned in SAL (#wikimedia-operations) [2019-09-20T08:23:36Z] <hashar> CI in default since it is somehow no more able to fetch from Gerrit T233390

On contint1001

$ sudo su - zuul
$ cd /srv/zuul/git/operations/puppet
$ git fetch -v
Received disconnect from 2620:0:861:3:208:80:154:85: 12: Too many concurrent connections (4) - max. allowed: 4
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
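
The generic "Could not read from remote repository" message hides the actual cause, which only appears on the "Received disconnect" line. A minimal sketch of how that line could be detected, assuming the stderr format shown above; the name `parse_connection_limit` is hypothetical and is not part of Zuul or GitPython:

```python
import re

# Matches the disconnect reason seen in the stderr above, e.g.
# "... Too many concurrent connections (4) - max. allowed: 4"
_LIMIT_RE = re.compile(
    r"Too many concurrent connections \((\d+)\) - max\. allowed: (\d+)"
)


def parse_connection_limit(stderr_text):
    """Return (current, allowed) if stderr reports the limit, else None."""
    match = _LIMIT_RE.search(stderr_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Sample input copied from the manual `git fetch -v` run.
stderr_text = (
    "Received disconnect from 2620:0:861:3:208:80:154:85: 12: "
    "Too many concurrent connections (4) - max. allowed: 4\n"
    "fatal: Could not read from remote repository.\n"
)
print(parse_connection_limit(stderr_text))
```

A result other than `None` would point at the connection limit rather than at access rights or a missing repository.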

Gerrit has two connections from each contint server, for a total of four connections. Gerrit is configured to allow at most four concurrent SSH connections, so zuul-merger can no longer fetch.

Session    User            Remote Host
--------------------------------------------------------------
9bb66493   jenkins-bot     contint2001.wikimedia.org
7b67f043   jenkins-bot     contint2001.wikimedia.org
836929be   jenkins-bot     contint1001.wikimedia.org
e36ee520   jenkins-bot     contint1001.wikimedia.org
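
The arithmetic can be checked against the listing above. This is only a sketch: the `SESSIONS` text is copied from the listing, `MAX_CONNECTIONS` is taken from the disconnect message, and `connections_per_host` is a hypothetical helper name:

```python
from collections import Counter

SESSIONS = """\
Session    User            Remote Host
--------------------------------------------------------------
9bb66493   jenkins-bot     contint2001.wikimedia.org
7b67f043   jenkins-bot     contint2001.wikimedia.org
836929be   jenkins-bot     contint1001.wikimedia.org
e36ee520   jenkins-bot     contint1001.wikimedia.org
"""

MAX_CONNECTIONS = 4  # "max. allowed: 4" in the disconnect message


def connections_per_host(listing):
    """Count sessions per remote host in a listing shaped like the one above."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Data rows have exactly three tokens; the header row has four
        # ("Remote Host" splits in two) and the separator row has one.
        if len(fields) != 3:
            continue
        _session, _user, host = fields
        counts[host] += 1
    return counts


counts = connections_per_host(SESSIONS)
total = sum(counts.values())
print(dict(counts))
print("pool exhausted:", total >= MAX_CONNECTIONS)
```

With two sessions from each host the pool is full, which leaves no slot for the fetch attempted by zuul-merger.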

Mentioned in SAL (#wikimedia-operations) [2019-09-20T08:28:17Z] <hashar> Killed zuul-server process on contint2001 which was establishing connections to Gerrit and filling the pool of allowed ssh connections # T233390

hashar closed this task as Resolved. Sep 20 2019, 8:30 AM

I upgraded zuul on contint2001 (T203846), which ended up starting zuul-server, and it established two connections to the Gerrit server. I have stopped the service, freeing the extra connections.

I have mentioned that issue previously on https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/502203/

hieradata/hosts/contint1001.yaml:profile::zuul::server::service_ensure: running
hieradata/hosts/contint2001.yaml:profile::zuul::server::service_ensure: stopped

And indeed the next puppet run would stop it:

Notice: /Stage[main]/Zuul::Server/Systemd::Service[zuul]/Service[zuul]/ensure: ensure changed 'running' to 'stopped'

But the package installation starts it nonetheless:

root@contint2001:~# dpkg -i zuul_2.5.1-wmf9_amd64.deb
...
Processing triggers for systemd (215-17+deb8u13) ...

root@contint2001:~# systemctl status zuul
Active: active (running) since Fri 2019-05-17 08:32:23 UTC; 4s ago

I guess I would need it to be masked instead?

Joe added a subscriber: Joe. Sep 20 2019, 8:31 AM

@hashar I guess the CI servers should have more relaxed thresholds? Is it even possible to configure gerrit to whitelist some host?

Joe added a comment. Sep 20 2019, 8:32 AM

and yes, if you need to have a service not start when the package is installed, you need a systemd::mask definition in puppet.
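
A masked unit cannot be started, so a package upgrade would not be able to bring the service up on the standby host. The following sketch shows one way to verify the state after a puppet run. It assumes `systemctl is-enabled` prints `masked` (or `masked-runtime`) for a masked unit; `unit_file_state` and `is_masked` are hypothetical helper names, and the `run` parameter exists only so the command can be stubbed:

```python
import subprocess


def unit_file_state(unit, run=subprocess.run):
    """Return the state printed by `systemctl is-enabled` for a unit."""
    result = run(
        ["systemctl", "is-enabled", unit],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=False,  # the command exits non-zero for units that are not enabled
    )
    return result.stdout.strip()


def is_masked(unit, run=subprocess.run):
    """True if the reported unit file state is a masked one."""
    return unit_file_state(unit, run=run) in ("masked", "masked-runtime")


# Example call on a host with systemd (not executed here):
# is_masked("zuul.service")
```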

> @hashar I guess the CI servers should have more relaxed thresholds? Is it even possible to configure gerrit to whitelist some host?

Nope, it is global to the SSH daemon and the concurrency limit is set to 4. That was set due to T182756#5119773 (private).

> and yes, if you need to have a service not start when the package is installed, you need a systemd::mask definition in puppet.

Indeed, and I forgot to follow up on a comment Daniel made recently. I have filed T233391 about it.