2019-09-20 08:21:16,459 DEBUG zuul.Merger: Merging for change 535631,11.
2019-09-20 08:21:16,459 DEBUG zuul.Merger: Processing refspec refs/changes/31/535631/11 for project operations/puppet / production ref Zb33b89b8a9ff4e4584edb29b222a46a7
2019-09-20 08:21:17,140 DEBUG zuul.Merger: Unable to find commit for ref production/Zb33b89b8a9ff4e4584edb29b222a46a7
2019-09-20 08:21:17,140 DEBUG zuul.Merger: No base commit found for (u'operations/puppet', u'production')
2019-09-20 08:21:17,160 DEBUG zuul.Repo: Resetting repository /srv/zuul/git/operations/puppet
2019-09-20 08:21:17,161 DEBUG zuul.Repo: Updating repository /srv/zuul/git/operations/puppet
2019-09-20 08:21:17,436 ERROR zuul.Merger: Unable to reset repo <zuul.merger.merger.Repo object at 0x7fbb373da510>
Traceback (most recent call last):
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 326, in _mergeItem
    repo.reset()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 101, in reset
    self.update()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 207, in update
    origin.fetch(tags=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/remote.py", line 789, in fetch
    res = self._get_fetch_info_from_stderr(proc, progress)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/remote.py", line 675, in _get_fetch_info_from_stderr
    proc.wait(stderr=stderr_text)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/cmd.py", line 415, in wait
    raise GitCommandError(self.args, status, errstr)
GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git fetch --tags -v origin
stderr: 'fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.'
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | hashar | T233390 zuul-merger fails to fetch from Gerrit |
| Resolved | | hashar | T233391 zuul-server should not start on spare server when the Debian package is upgraded |
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2019-09-20T08:23:36Z] <hashar> CI in default since it is somehow no more able to fetch from Gerrit T233390
On contint1001:
$ sudo su - zuul
$ cd /srv/zuul/git/operations/puppet
$ git fetch -v
Received disconnect from 2620:0:861:3:208:80:154:85: 12: Too many concurrent connections (4) - max. allowed: 4
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
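The zuul traceback only carries git's generic "Could not read from remote repository" text; the actual cause shows up in the disconnect line of the manual fetch. A minimal sketch of telling the two apart from git's stderr, assuming the message keeps the wording shown above. `classify_fetch_error` is a hypothetical helper, not part of zuul or GitPython:

```python
import re

# Pattern copied from the disconnect message in the manual fetch above.
LIMIT_RE = re.compile(
    r"Too many concurrent connections \((\d+)\) - max\. allowed: (\d+)"
)


def classify_fetch_error(stderr_text):
    """Hypothetical helper: say whether a failed fetch hit the connection limit.

    Returns ("connection_limit", current, allowed) when the limit message is
    present, otherwise ("other", None, None).
    """
    match = LIMIT_RE.search(stderr_text)
    if match:
        return ("connection_limit", int(match.group(1)), int(match.group(2)))
    return ("other", None, None)


sample = (
    "Received disconnect from 2620:0:861:3:208:80:154:85: 12: "
    "Too many concurrent connections (4) - max. allowed: 4\n"
    "fatal: Could not read from remote repository.\n"
)
print(classify_fetch_error(sample))
```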
Gerrit has two connections from each contint server, for a total of four. Gerrit is configured to allow at most four concurrent ssh connections, so the zuul-merger is no longer able to fetch:
Session   User         Remote Host
--------------------------------------------------------------
9bb66493  jenkins-bot  contint2001.wikimedia.org
7b67f043  jenkins-bot  contint2001.wikimedia.org
836929be  jenkins-bot  contint1001.wikimedia.org
e36ee520  jenkins-bot  contint1001.wikimedia.org
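To make the arithmetic explicit, here is a rough sketch that counts sessions per host from a listing like the one above and compares the total with the limit of 4 quoted in the disconnect message. The column layout is assumed from that listing, and `sessions_per_host` is a hypothetical name:

```python
from collections import Counter

MAX_CONNECTIONS = 4  # limit quoted in the disconnect message

LISTING = """\
Session   User         Remote Host
--------------------------------------------------------------
9bb66493  jenkins-bot  contint2001.wikimedia.org
7b67f043  jenkins-bot  contint2001.wikimedia.org
836929be  jenkins-bot  contint1001.wikimedia.org
e36ee520  jenkins-bot  contint1001.wikimedia.org
"""


def sessions_per_host(listing):
    """Count sessions per remote host in a whitespace-separated listing."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Data rows have exactly three fields (session id, user, host);
        # the header splits into four words and the separator into one.
        if len(fields) != 3:
            continue
        counts[fields[2]] += 1
    return counts


counts = sessions_per_host(LISTING)
total = sum(counts.values())
for host, n in counts.items():
    print(host, n)
print("pool exhausted:", total >= MAX_CONNECTIONS)
```

With two sessions held by contint2001, the spare server alone takes half of the pool, which leaves no slot for an extra fetch from contint1001.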
Mentioned in SAL (#wikimedia-operations) [2019-09-20T08:28:17Z] <hashar> Killed zuul-server process on contint2001 which was establishing connections to Gerrit and filling the pool of allowed ssh connections # T233390
I upgraded zuul on contint2001 (T203846), which caused the zuul-server to start and establish two connections to the Gerrit server. I have stopped the service, freeing the extra connections.
I have mentioned that issue previously on https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/502203/
hieradata/hosts/contint1001.yaml:profile::zuul::server::service_ensure: running
hieradata/hosts/contint2001.yaml:profile::zuul::server::service_ensure: stopped
And indeed the next puppet run would stop it:
Notice: /Stage[main]/Zuul::Server/Systemd::Service[zuul]/Service[zuul]/ensure: ensure changed 'running' to 'stopped'
But the package installation starts it nonetheless:
root@contint2001:~# dpkg -i zuul_2.5.1-wmf9_amd64.deb
...
Processing triggers for systemd (215-17+deb8u13) ...
root@contint2001:~# systemctl status zuul
Active: active (running) since Fri 2019-05-17 08:32:23 UTC; 4s ago
I guess I would need it to be masked instead?
@hashar I guess the CI servers should have more relaxed thresholds? Is it even possible to configure Gerrit to whitelist some hosts?
And yes, if you need a service not to start when the package is installed, you need a systemd::mask definition in puppet.
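For context on what masking changes: `systemctl mask` links the unit file under /etc/systemd/system to /dev/null, so later start requests for that unit are refused, whereas `ensure: stopped` only stops it at the next puppet run. A small sketch of checking for that state; `is_masked` is a hypothetical helper, and the demo uses a temporary directory rather than the real unit path:

```python
import os
import tempfile


def is_masked(unit, unit_dir="/etc/systemd/system"):
    """Hypothetical check: True if the unit file is a symlink to /dev/null."""
    path = os.path.join(unit_dir, unit)
    return os.path.islink(path) and os.readlink(path) == "/dev/null"


# Demo in a scratch directory so the real system is left untouched.
demo_dir = tempfile.mkdtemp()
os.symlink("/dev/null", os.path.join(demo_dir, "zuul.service"))
with open(os.path.join(demo_dir, "other.service"), "w") as handle:
    handle.write("[Unit]\n")

masked_result = is_masked("zuul.service", demo_dir)
regular_result = is_masked("other.service", demo_dir)
missing_result = is_masked("absent.service", demo_dir)
print(masked_result, regular_result, missing_result)
```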
Nope, it is global to the ssh daemon, and the concurrency limit is set to 4. That was set due to T182756#5119773 (private).
Indeed, and I forgot to follow up on a comment Daniel made recently. I have filed T233391 about it.