
train-presync failed due to git clone failing with gnutls_handshake() failure
Closed, Resolved, Public

Description

The train-presync step failed with:

Cloning into '/srv/mediawiki-staging/php-1.45.0-wmf.21/extensions/3D'...
Cloning into '/srv/mediawiki-staging/php-1.45.0-wmf.21/extensions/CodeEditor'...
fatal: unable to access 'https://gerrit.wikimedia.org/r/mediawiki/extensions/CodeEditor/': gnutls_handshake() failed: Error in the pull function.
fatal: clone of 'https://gerrit.wikimedia.org/r/mediawiki/extensions/CodeEditor' into submodule path '/srv/mediawiki-staging/php-1.45.0-wmf.21/extensions/CodeEditor' failed
Failed to clone 'extensions/CodeEditor'. Retry scheduled
...

scap logging output is unfortunately buffered, so I don't have an exact timing, but the clone started after 03:00:56 UTC and errored out at 03:01:25 UTC.


Event Timeline

Train blocker tasks are Unbreak Now! priority. I am pretty sure this was a one-off transient issue and not worth looking into further, but we shall see.

When I try it manually from deploy2002, it seems to work:

git clone https://gerrit.wikimedia.org/r/mediawiki/core --single-branch --branch wmf/1.45.0-wmf.21  --recurse-submodules --depth=1

I suspected erroneous throttling, since a git clone with submodules issues a lot of requests. However, the internal network should be excluded from throttling, and the command worked this morning.
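To get a feel for how request-heavy a recursive clone is, the sketch below counts the submodules declared in a checkout's .gitmodules file. With --recurse-submodules, git clones each of these as a separate repository with its own HTTPS requests to the server. The function name count_submodules is hypothetical, and the parsing is a simple heuristic (one `url = ...` line per submodule section) rather than a full git-config parser.

```shell
# Hypothetical helper: count submodules declared in a .gitmodules file.
# Heuristic: each [submodule "..."] section carries exactly one "url = ..." line,
# and each of those submodules is cloned separately by --recurse-submodules.
count_submodules() {
  local gitmodules="${1:-.gitmodules}"   # defaults to ./.gitmodules
  awk '/^[ \t]*url[ \t]*=/ { n++ } END { print n + 0 }' "$gitmodules"
}
```

For example, running `count_submodules .gitmodules` from the top of a MediaWiki core checkout on a wmf/* branch would report how many extension and skin repositories the clone pulls in on top of core.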

I'll retry once I have rebased the security patches.

Well, it failed again:

Sep 30 09:29:17 deploy2002 scap[711190]: Cloning into '/srv/mediawiki-staging/php-1.45.0-wmf.21/extensions/FundraiserLandingPage'...
Sep 30 09:29:17 deploy2002 scap[711190]: error: RPC failed; curl 35 gnutls_handshake() failed: Error in the pull function.
Sep 30 09:29:17 deploy2002 scap[711190]: fatal: expected flush after ref listing
Sep 30 09:29:17 deploy2002 scap[711190]: fatal: clone of 'https://gerrit.wikimedia.org/r/mediawiki/extensions/FundraiserLandingPage' into submodule path '/srv/mediawiki-staging/php-1.45.0-wmf.21/extensions/FundraiserLandingPage' failed
Sep 30 09:29:17 deploy2002 scap[711190]: Failed to clone 'extensions/FundraiserLandingPage' a second time, aborting
Sep 30 09:29:17 deploy2002 scap[711190]: Cloning into '/srv/mediawiki-staging/php-1.45.0-wmf.21/extensions/Wikidata.org'...

*sigh*

I think I found the issue:

25883:[Tue Sep 30 09:28:53.619018 2025] [qos:error] [pid 449352:tid 449366] mod_qos(031): access denied, QS_SrvMaxConnPerIP rule: max=25, concurrent connections=26, c=2620:0:860:103:10:192:32:7
25884:[Tue Sep 30 09:28:53.621921 2025] [qos:error] [pid 449352:tid 449361] mod_qos(031): access denied, QS_SrvMaxConnPerIP rule: max=25, concurrent connections=26, c=2620:0:860:103:10:192:32:7
25885:[Tue Sep 30 09:28:53.622878 2025] [qos:error] [pid 696907:tid 696925] mod_qos(031): access denied, QS_SrvMaxConnPerIP rule: max=25, concurrent connections=26, c=2620:0:860:103:10:192:32:7
25886:[Tue Sep 30 09:28:53.988029 2025] [qos:error] [pid 449352:tid 449363] mod_qos(031): access denied, QS_SrvMaxConnPerIP rule: max=25, concurrent connections=26, c=2620:0:860:103:10:192:32:7
25887:[Tue Sep 30 09:28:53.992930 2025] [qos:error] [pid 701189:tid 701191] mod_qos(031): access denied, QS_SrvMaxConnPerIP rule: max=25, concurrent connections=26, c=2620:0:860:103:10:192:32:7
25889:[Tue Sep 30 09:29:00.738779 2025] [qos:error] [pid 701189:tid 701208] mod_qos(031): access denied, QS_SrvMaxConnPerIP rule: max=25, concurrent connections=26, c=2620:0:860:103:10:192:32:7
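Each of these lines records mod_qos refusing a connection because the client already had more concurrent connections than the QS_SrvMaxConnPerIP limit (max=25) allows. To see at a glance which rule fired and for which client, the denials can be tallied. This is a minimal sketch, assuming error-log lines in the format shown above; the function name summarize_qos_denials is hypothetical.

```shell
# Hypothetical helper: tally mod_qos "access denied" errors per rule and client.
# Assumes lines like:
#   ... mod_qos(031): access denied, QS_SrvMaxConnPerIP rule: max=25, concurrent connections=26, c=<client>
# Reads the given files, or stdin when no file is given.
summarize_qos_denials() {
  awk '
    /mod_qos\([0-9]+\): access denied/ {
      rule = "unknown"; client = "unknown"
      # Rule name is the QS_* word right before " rule:" (6 characters).
      if (match($0, /QS_[A-Za-z]+ rule:/)) rule = substr($0, RSTART, RLENGTH - 6)
      # Client address follows ", c=" (4 characters), up to a comma or space.
      if (match($0, /, c=[^, ]+/)) client = substr($0, RSTART + 4, RLENGTH - 4)
      count[rule " " client]++
    }
    END { for (key in count) print count[key], key }
  ' "$@" | sort -rn
}
```

Usage, with a placeholder log path: `summarize_qos_denials /path/to/error.log`. Each output line is a count followed by the rule name and client address, most frequent first; the six lines quoted above would all be tallied under QS_SrvMaxConnPerIP for the client 2620:0:860:103:10:192:32:7.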

Change #1192510 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: disable mod_qos to debug allowlist

https://gerrit.wikimedia.org/r/1192510

Change #1192510 merged by Arnaudb:

[operations/puppet@production] gerrit: disable mod_qos to debug allowlist

https://gerrit.wikimedia.org/r/1192510

The change has been applied on gerrit1003; it should be back to normal now.

This was a side effect of {T402611}; it has been worked around for now.