
codesearch is getting Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Closed, Resolved, Public

Description

Apr 21 14:48:35 codesearch8 docker[20188]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/libs/NormalizedException.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:48:35 codesearch8 docker[20199]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/analytics/analytics.wikimedia.org.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:48:37 codesearch8 docker[20199]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/analytics/limn-extdist-data.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:48:37 codesearch8 docker[1713]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/tools/scap-vagrant.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:48:41 codesearch8 docker[1713]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/integration/pipelinelib.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:48:49 codesearch8 docker[19887]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/extensions/CategoryTree.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:48:59 codesearch8 docker[19887]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/extensions/ImageMap.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:49:09 codesearch8 docker[2336]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/extensions/CodeReview.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:49:18 codesearch8 docker[2336]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/extensions/PageViewInfo.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:49:48 codesearch8 docker[21850]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/skins/Truglass.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:49:50 codesearch8 docker[21850]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/skins/Gamepress.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:49:54 codesearch8 docker[26149]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/wikidata/query/rdf.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:49:57 codesearch8 docker[26149]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/extensions/RevisionSlider.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:11 codesearch8 docker[27348]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/analytics/aqs/deploy.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:19 codesearch8 docker[27348]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/services/kask.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:23 codesearch8 docker[568]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/services/kafka-watcher.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:25 codesearch8 docker[568]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/services/chromium-render.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:33 codesearch8 docker[20204]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/openstack/horizon/wmf-sudo-dashboard.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:33 codesearch8 docker[20204]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/cloud/toolforge/ingress-admission-controller.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:33 codesearch8 docker[3363]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/performance/excimer-ui-client.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:37 codesearch8 docker[3363]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/performance/debs/php-slim-views.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:42 codesearch8 docker[20188]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/WrappedString.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:46 codesearch8 docker[20199]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/analytics/analytics.wikimedia.org.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:46 codesearch8 docker[20188]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/libs/NormalizedException.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:48 codesearch8 docker[1713]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/tools/scap-vagrant.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:48 codesearch8 docker[20199]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/analytics/limn-extdist-data.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:50:52 codesearch8 docker[1713]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/integration/pipelinelib.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out
Apr 21 14:51:02 codesearch8 docker[19887]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/extensions/SyntaxHighlight_GeSHi.git/': Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out

There are thousands of these in today's logs.
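To see whether the timeouts hit every repository or only a few, the log lines above can be grouped by repository path. This is a minimal sketch, not part of codesearch itself: the function name `count_timeouts_by_repo` is made up for illustration, and it assumes the lines look exactly like the excerpt above.

```shell
#!/bin/sh
# Summarize "Connection timed out" fetch failures per repository.
# Reads journal-style lines on stdin and prints "<count> <repo path>",
# most frequent first. (Hypothetical helper, assumes the log format above.)
count_timeouts_by_repo() {
    grep 'Connection timed out' |
        # Keep only the path between "/r/" and ".git/" in the quoted URL.
        sed -n "s|.*unable to access 'https://[^/]*/r/\(.*\)\.git/'.*|\1|p" |
        sort | uniq -c | sort -rn |
        # Normalize the leading whitespace that uniq -c adds.
        awk '{print $1, $2}'
}

# Example (assumes the messages are in the systemd journal):
#   journalctl --since today | count_timeouts_by_repo | head
```

If the counts are spread evenly over all repositories, the problem is more likely on the network path to the host than in any single repository.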

But from my laptop, the same clone works fine:

GIT_TRACE=1 git clone https://gerrit-replica.wikimedia.org/r/mediawiki/libs/NormalizedException.git/
10:52:40.807666 git.c:439               trace: built-in: git clone https://gerrit-replica.wikimedia.org/r/mediawiki/libs/NormalizedException.git/
Cloning into 'NormalizedException'...
10:52:40.825920 run-command.c:655       trace: run_command: git remote-https origin https://gerrit-replica.wikimedia.org/r/mediawiki/libs/NormalizedException.git/
10:52:40.828240 git.c:725               trace: exec: git-remote-https origin https://gerrit-replica.wikimedia.org/r/mediawiki/libs/NormalizedException.git/
10:52:40.828386 run-command.c:655       trace: run_command: git-remote-https origin https://gerrit-replica.wikimedia.org/r/mediawiki/libs/NormalizedException.git/
10:52:41.408180 run-command.c:655       trace: run_command: git index-pack --stdin -v --fix-thin '--keep=fetch-pack 6196 on dev' --check-self-contained-and-connected
remote: Counting objects: 15, done
remote: Finding sources: 100% (13/13)
10:52:41.414204 git.c:439               trace: built-in: git index-pack --stdin -v --fix-thin '--keep=fetch-pack 6196 on dev' --check-self-contained-and-connected
remote: Getting sizes: 100% (9/9)
remote: Compressing objects: 100% (6129/6129)
remote: Total 97 (delta 3), reused 84 (delta 0)
Receiving objects: 100% (97/97), 20.87 KiB | 192.00 KiB/s, done.
Resolving deltas: 100% (36/36), done.
10:52:41.577457 run-command.c:655       trace: run_command: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
10:52:41.578536 git.c:439               trace: built-in: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'

Event Timeline

Legoktm triaged this task as Unbreak Now! priority. Apr 21 2023, 2:54 PM
Legoktm created this task.
legoktm@codesearch8:~$ ping gerrit-replica.wikimedia.org
PING gerrit-replica.wikimedia.org (208.80.153.104) 56(84) bytes of data.
^C
--- gerrit-replica.wikimedia.org ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 98ms
legoktm@codesearch8:~$ ping google.com
PING google.com (172.253.122.139) 56(84) bytes of data.
64 bytes from bh-in-f139.1e100.net (172.253.122.139): icmp_seq=1 ttl=107 time=1.46 ms
64 bytes from bh-in-f139.1e100.net (172.253.122.139): icmp_seq=2 ttl=107 time=1.47 ms
64 bytes from bh-in-f139.1e100.net (172.253.122.139): icmp_seq=3 ttl=107 time=1.62 ms
^C
--- google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 6ms
legoktm@codesearch8:~$ ping gerrit.wikimedia.org
PING gerrit.wikimedia.org (208.80.154.137) 56(84) bytes of data.
^C
--- gerrit.wikimedia.org ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 53ms

legoktm@codesearch8:~$ ping en.wikipedia.org
PING dyna.wikimedia.org (208.80.154.224) 56(84) bytes of data.
^C
--- dyna.wikimedia.org ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 12ms

Firewall issue maybe?
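Note that ping only exercises ICMP, while the failing git fetches use TCP on port 443, so a firewall can drop one and pass the other. A rough sketch of a TCP-level check follows. The function names are invented for this example, and it assumes `bash` and coreutils `timeout` are installed on the host.

```shell
#!/bin/sh
# Return 0 if a TCP connection to host ($1) on port ($2) succeeds within
# $3 seconds (default 5). Uses bash's /dev/tcp redirection, so bash is
# invoked explicitly even if the calling shell is plain sh.
tcp_reachable() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print one result line per host for the given port, so the TCP results
# can be compared with the ping (ICMP) results.
check_hosts() {
    port=$1
    shift
    for h in "$@"; do
        if tcp_reachable "$h" "$port"; then
            echo "$h:$port reachable"
        else
            echo "$h:$port unreachable"
        fi
    done
}

# Example:
#   check_hosts 443 gerrit-replica.wikimedia.org gerrit.wikimedia.org en.wikipedia.org
```

A host that fails ping but shows up as reachable here is only dropping ICMP. A host that fails both is a better candidate for the timeouts in the logs.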

Mentioned in SAL (#wikimedia-cloud) [2023-04-21T14:57:46Z] <legoktm> rebooting codesearch8 to see if that fixes networking issues (T335197)

Possibly related: https://gerrit.wikimedia.org/r/c/operations/puppet/+/909794 and https://gerrit.wikimedia.org/r/plugins/gitiles/operations/homer/public/+/refs/heads/master/definitions/static.net#19 no longer agree on the IPv4 address for gerrit-replica.wikimedia.org?
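A drift like this could be caught by comparing what DNS returns for the name with the addresses a config source lists. The sketch below assumes the relevant addresses have already been extracted into a plain file with one address per line. Neither file's real format is shown in this task, so the file name and all function names here are hypothetical.

```shell
#!/bin/sh
# Print the unique IPv4 addresses a name resolves to (glibc getent).
resolved_v4() {
    getent ahostsv4 "$1" | awk '{print $1}' | sort -u
}

# Succeed if the address ($1) appears as a whole line in the list file ($2).
addr_in_file() {
    grep -Fxq -- "$1" "$2"
}

# Report every resolved address that is missing from the list file.
report_drift() {
    name=$1
    list=$2
    for addr in $(resolved_v4 "$name"); do
        addr_in_file "$addr" "$list" ||
            echo "$name resolves to $addr, which is not in $list"
    done
}

# Example (allowed-addrs.txt is a placeholder file name):
#   report_drift gerrit-replica.wikimedia.org allowed-addrs.txt
```

No output means every resolved address is present in the list. Any printed line points at a source that needs updating.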

[08:31]  <     tgr_> the extensions cluster of codesearch is down: https://codesearch-backend.wmcloud.org/extensions/
[14:43] legoktm looks
[14:46]  <  legoktm> hmmm
[14:56]  <  legoktm> tgr_: thanks, it's down because the "search" cluster is still in startup mode, but that's because requests to Gerrit are timing out so it's been going for hours. T335197
[14:56]  < stashbot> T335197: codesearch is getting Failed to connect to gerrit-replica.wikimedia.org port 443: Connection timed out - https://phabricator.wikimedia.org/T335197
[14:57]  <  legoktm> !log codesearch rebooting codesearch8 to see if that fixes networking issues (T335197)
[14:57]  < stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[15:00]  <    bd808> legoktm: hmmm... I saw a puppet patch yesterday that had something to do with gerrit-replica.wm.o. https://gerrit.wikimedia.org/r/c/operations/puppet/+/909794
[15:01]  <  legoktm> the server can't even ping en.wikipedia.org/dyna.wikimedia.org
[15:01]  <    taavi> which server?
[15:02]  <  legoktm> codesearch8.codesearch.eqiad1.wikimedia.cloud
[15:04]  <    taavi> bd808: sigh. that no longer matches https://gerrit.wikimedia.org/r/plugins/gitiles/operations/homer/public/+/refs/heads/master/definitions/static.net#19
[15:10]  <    taavi> and I think the ping issue is unrelated
[15:10]  <  legoktm> hm yeah, I can curl it fine
[15:15]  <  legoktm> taavi: should I re-triage the task as netops/serviceops then? or if you want to leave a comment explaining what that file is for since it's totally new to me

Change 910717 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] Revert "cloudgw: fix IP address for gerrit-replica.wikimedia.org"

https://gerrit.wikimedia.org/r/910717

Change 910717 merged by Dzahn:

[operations/puppet@production] Revert "cloudgw: fix IP address for gerrit-replica.wikimedia.org"

https://gerrit.wikimedia.org/r/910717

@Legoktm I reverted the change that was supposed to fix the gerrit-replica IP. It should be back to normal now. We will look at it more closely next week.

Aklapper assigned this task to Dzahn.

Works again, right?

Assuming so. If not, please reopen. Thanks!