mgmt ssh access for prometheus hosts in magru
Closed, Resolved, Public

Description

Prometheus in magru is set up now. However, I see that it can't reach the mgmt network on port 22 to check the IPMI interfaces; see also https://prometheus-magru.wikimedia.org/ops/alerts?search=sshdown

prometheus7001:~$ telnet cp7002.mgmt.magru.wmnet 22
Trying 10.140.128.12...
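
The telnet attempt above hangs at "Trying ...", which is what a connection attempt looks like when its packets are silently dropped. As a rough sketch (not part of the original report), the same reachability check can be scripted with a timeout so that it fails quickly instead of hanging. The function name `tcp_port_open` and the 5-second default are assumptions made for illustration.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (dropped packets), refusals (RST) and DNS failures.
        return False
```

Run from prometheus7001, `tcp_port_open("cp7002.mgmt.magru.wmnet", 22)` would be expected to return False while the firewall is dropping the traffic.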

cc @cmooney

Related Objects

Status    Subtype  Assigned  Task
Resolved           None
Resolved           cmooney

Event Timeline

On https://librenms.wikimedia.org/alerts, I see the following for mr1-magru:

#1: last_polled => '2024-05-08 14:01:37'
   last_polled_timetaken => '33.584737062454'
   last_discovered_timetaken => '54.538'
   last_discovered => '2024-05-08 12:42:21'
   last_ping => '2024-05-08 14:01:04'
   last_ping_timetaken => '116'
   msg => 'No supported key exchange algorithms found'

On mr1-magru, I see 10.140.1.18 (prometheus7001) being denied by policy, which makes me wonder whether we need to run https://netbox.wikimedia.org/extras/scripts/capirca.GetHosts/ so that Capirca generates the ACL?

Yeah, I think that is very likely it: the host won't be included in the firewall config unless we've run that script and pushed the update with Homer. I'll get on that now, thanks!

As for the LibreNMS key exchange message, I'll dig deeper into what that relates to. But since the telnet attempt fails, it looks like the connection from the prometheus server is being blocked at the TCP level, so it's not getting that far.
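
The distinction drawn above, between a block at the TCP level and a failure later during SSH negotiation, can be checked roughly as follows. This is a sketch under assumptions: `ssh_probe` and its return labels are hypothetical names. It only reads the server's identification string and does not perform key exchange, so it cannot reproduce the LibreNMS "No supported key exchange algorithms found" message itself.

```python
import socket


def ssh_probe(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify how far an SSH connection attempt gets.

    Returns "tcp_failed" if the TCP handshake does not complete,
    "no_banner" if it completes but no SSH identification string is read,
    or the identification string (e.g. "SSH-2.0-...") otherwise.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Timeout, refusal or DNS failure: we never reached the SSH layer.
        return "tcp_failed"
    with sock:
        try:
            # The timeout set by create_connection also applies to recv.
            data = sock.recv(256)
        except OSError:
            return "no_banner"
    # A server may send other lines first; look for the one starting "SSH-".
    for line in data.decode("ascii", errors="replace").splitlines():
        if line.startswith("SSH-"):
            return line.strip()
    return "no_banner"
```

A result of "tcp_failed" points at the firewall or routing, while a returned identification string means the TCP path is open and any remaining failure lies in the SSH negotiation.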

cmooney claimed this task.

Sorry for the delay; the capirca script times out a lot for some reason, and I will need to look into that.

Working now after pushing the updated config to mr1-magru:

cmooney@prometheus7001:~$ ssh cp7002.mgmt.magru.wmnet 
The authenticity of host 'cp7002.mgmt.magru.wmnet (10.140.128.12)' can't be established.
ECDSA key fingerprint is SHA256:IiL2+2fsc0Zw6iBQb9GX1/cHJLC93Tx6R5c1kPBIi7E.
Are you sure you want to continue connecting (yes/no/[fingerprint])?