
Jenkins master / client ssh connection fails due to missing ssh algorithm
Closed, Resolved (Public)

Description

I restarted Jenkins a few minutes ago due to a Java upgrade. On startup, it no longer establishes SSH connections to the labs Precise/Trusty instances. The only ones working are:

  • gallium (prod precise)
  • integration-slave-jessie-1001 (labs jessie)
  • puppet-compiler02.eqiad.wmflabs (labs)

integration-slave-precise-1011 auth.log reports:

fatal: no matching mac found: client hmac-sha1-96,hmac-sha1,hmac-md5-96,hmac-md5 server hmac-sha2-512,hmac-sha2-256 [preauth]
error: Could not load host key: /etc/ssh/ssh_host_ed25519_key
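
The fatal line above lists what each side offered. As a minimal sketch (the regular expression and the function name `parse_negotiation_failure` are made up for illustration and are not part of sshd or Jenkins), the two lists can be compared to confirm that they share no MAC at all:

```python
import re

# Sample line copied from the auth.log excerpt in this task.
LOG_LINE = (
    "fatal: no matching mac found: "
    "client hmac-sha1-96,hmac-sha1,hmac-md5-96,hmac-md5 "
    "server hmac-sha2-512,hmac-sha2-256 [preauth]"
)

# Assumed log shape: "no matching <kind> found: client <list> server <list>".
PATTERN = re.compile(
    r"no matching (?P<kind>\w+) found: "
    r"client (?P<client>\S+) server (?P<server>\S+)"
)


def parse_negotiation_failure(line):
    """Return the offered algorithm lists and their overlap, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    client = match.group("client").split(",")
    server = match.group("server").split(",")
    # Keep the client's preference order while looking for shared entries.
    common = [alg for alg in client if alg in server]
    return {
        "kind": match.group("kind"),
        "client": client,
        "server": server,
        "common": common,
    }


result = parse_negotiation_failure(LOG_LINE)
print(result["kind"], result["common"])
```

An empty `common` list means the client and server have nothing to agree on for that algorithm type, which is why the connection is dropped before authentication.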

Event Timeline

hashar raised the priority of this task from to Needs Triage.
hashar updated the task description.
hashar added subscribers: hashar, MoritzMuehlenhoff.
hashar triaged this task as Unbreak Now! priority. May 27 2015, 1:49 PM

Being investigated with @MoritzMuehlenhoff

QChris sent a change for Gerrit which is related.

https://gerrit.wikimedia.org/r/#/c/213216/ Turn off sshd MAC and KEX hardening for gerrit replication targets

So it seems Java doesn't support some of the algorithms. The hardening got turned off for gallium / lanthanum, which explains why they pass. We need to do the same for all of CI :-}

Applied on hiera page https://wikitech.wikimedia.org/wiki/Hiera:Integration

"ssh::server::disable_nist_kex": false
"ssh::server::explicit_macs": false

Puppet then changes /etc/ssh/sshd_config, removing these lines:

-KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
-MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com

That solves the problem. Sending it to operations/puppet.
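
To check whether a given host still carries the hardened directives, something along these lines could be used. It is only a sketch: the sample config text is a hypothetical excerpt built from the two lines removed above, `explicit_algorithms` is a made-up helper name, and keeping only the first occurrence of a keyword is an assumption about how sshd reads its config.

```python
# Hypothetical sshd_config excerpt, built from the two hardened lines above.
SSHD_CONFIG_SAMPLE = """\
# hardening managed by puppet (illustrative comment)
Port 22
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com
"""

# MACs offered by the Jenkins client, taken from the auth.log line in this task.
CLIENT_MACS = ["hmac-sha1-96", "hmac-sha1", "hmac-md5-96", "hmac-md5"]


def explicit_algorithms(config_text):
    """Collect explicit KexAlgorithms / MACs lists from sshd_config text."""
    found = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # keyword without a value, nothing to record
        keyword, value = parts[0].lower(), parts[1]
        if keyword in ("kexalgorithms", "macs"):
            # Assumption: the first occurrence of a keyword is the one that counts.
            found.setdefault(keyword, value.split(","))
    return found


hardened = explicit_algorithms(SSHD_CONFIG_SAMPLE)
usable_macs = [mac for mac in CLIENT_MACS if mac in hardened.get("macs", [])]
print(sorted(hardened), usable_macs)
```

If `explicit_algorithms` returns an empty dict, the host has no explicit restriction and sshd falls back to its built-in defaults.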

Change 214055 had a related patch set uploaded (by Hashar):
Turn off sshd MAC and KEX hardening for Jenkins slaves

https://gerrit.wikimedia.org/r/214055

hashar lowered the priority of this task from Unbreak Now! to Medium. May 27 2015, 2:08 PM

The issue is fixed by cherry-picking the Puppet patch https://gerrit.wikimedia.org/r/214055 . It is pending ops review to refine the patch if needed.

hashar claimed this task.

I have filed follow-up tasks (T100517 and T100518). The Puppet patch is already deployed on the integration and beta cluster puppet masters.

All Jenkins slaves are now attached to the Jenkins master. The issue is solved.

Change 214055 merged by Muehlenhoff:
Turn off sshd MAC/KEX hardening for Jenkins and Beta

https://gerrit.wikimedia.org/r/214055

Change 219828 had a related patch set uploaded (by Hashar):
Reenable sshd MAC/KEX hardening for Jenkins and Beta

https://gerrit.wikimedia.org/r/219828

Change 219828 abandoned by Hashar:
Reenable sshd MAC/KEX hardening for Jenkins and Beta

Reason:
Broken for now because of java trilead-ssh2 T103351

https://gerrit.wikimedia.org/r/219828

@hashar do you think this will change on contint1001 with openjdk-7-jdk:amd64 7u111-2.6.7-1~deb8u1 ?

Change 318248 had a related patch set uploaded (by Dzahn):
deployment-prep/integration: stop downgrading sshd MAC and KEX

https://gerrit.wikimedia.org/r/318248

Change 318248 abandoned by Hashar:
deployment-prep/integration: stop downgrading sshd MAC and KEX

Reason:
The issue is in Jenkins itself, not Java :D T100509

https://gerrit.wikimedia.org/r/318248

@Dzahn the issue is in Jenkins itself which uses an old / apparently unmaintained ssh library: trilead-ssh2 T103351. So we need to keep the rule around.