
integration-agent-docker-1028 is not reachable
Closed, Resolved (Public)

Description

https://integration.wikimedia.org/ci/computer/integration%2Dagent%2Ddocker%2D1028/log

[08/16/22 20:19:38] [SSH] Opening SSH connection to 172.16.7.62:22.
connect timed out
ERROR: Connection is not established!
java.lang.IllegalStateException: Connection is not established!

And I can't ssh to it:

integration-agent-docker-1028.integration.eqiad1.wikimedia.cloud: Permission denied (publickey).
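The two symptoms above differ: the Jenkins log shows a TCP connect timeout, while the manual ssh attempt got far enough to be refused at authentication. A minimal sketch for separating "port 22 does not answer" from "sshd answers but rejects the key" is to probe the TCP port directly. The function name `tcp_port_reachable` and the 5 second default timeout are assumptions for illustration, not part of any Wikimedia tooling.

```python
import socket


def tcp_port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake only;
        # it does not attempt any SSH authentication.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and DNS failures are all OSError subclasses.
        return False
```

For example, `tcp_port_reachable("172.16.7.62")` returning False would match the "connect timed out" line in the Jenkins log, whereas True combined with "Permission denied (publickey)" would point at the account or key setup on the instance rather than at the network.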

Event Timeline

Mentioned in SAL (#wikimedia-releng) [2022-08-16T20:30:47Z] <hashar> Repooled integration-agent-docker-1028 , it was mysteriously unreachable T315372

hashar claimed this task.

According to https://horizon.wikimedia.org/project/instances/886e9ca7-a2bc-4bd3-9a71-ae743245fe6a/ it had a live migration on July 19, 2022, 3:13 a.m.

I have rebooted the instance via Horizon and managed to ssh to it again. motd says:

The last Puppet run was at Tue Aug 16 16:37:37 UTC 2022 (231 minutes ago).
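To judge how long the agent may have been stuck, the age of the last Puppet run can be recomputed from the motd line instead of relying on the "minutes ago" figure. The following is a rough sketch that assumes the motd line has exactly the shape quoted above; `puppet_run_age_minutes` and `MOTD_RE` are hypothetical names, and the caller supplies the reference time.

```python
import re
from datetime import datetime, timezone

# Matches e.g. "The last Puppet run was at Tue Aug 16 16:37:37 UTC 2022 (231 minutes ago)."
MOTD_RE = re.compile(r"The last Puppet run was at (\w{3} \w{3} +\d+ [\d:]+) UTC (\d{4})")


def puppet_run_age_minutes(motd_line: str, now: datetime) -> int:
    """Return whole minutes between the Puppet run named in `motd_line` and `now`."""
    match = MOTD_RE.search(motd_line)
    if match is None:
        raise ValueError("motd line does not contain a Puppet run timestamp")
    # Rejoin the date part and the year, dropping the literal "UTC" token.
    stamp = datetime.strptime(f"{match.group(1)} {match.group(2)}", "%a %b %d %H:%M:%S %Y")
    last_run = stamp.replace(tzinfo=timezone.utc)
    return int((now - last_run).total_seconds() // 60)
```

`now` must be timezone-aware. A value well above the usual Puppet interval suggests the agent stopped running some time before the instance was noticed as unreachable.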

So it got stuck somehow?

Based on timing, this was likely related to an accidental reboot of several cloudvirts that happened today (2022-08-16).

Could have been. That got noticed last week due to a script failing (T315106), though, and I am guessing the instance had been unreachable for a lot longer. Anyway, the restart fixed it ;) Thanks for the hint, Bryan.