integration-agent-docker-1001.integration.eqiad.wmflabs is unreachable
Closed, Resolved · Public

Description

As mentioned by @Andrew, integration-agent-docker-1001.integration.eqiad.wmflabs is unreachable over SSH.

Created Sept. 9, 2019, 9:45 p.m by @thcipriani
It uses the debian-jessie image.

https://horizon.wikimedia.org/project/instances/f2e6bb52-e7d3-4c67-9178-7e6c90284dc9/console

The initial Puppet catalog eventually failed due to a wrong SSL cert for the puppetmaster.

SSH yields:

integration-agent-docker-1001.integration.eqiad.wmflabs: Permission denied (publickey).
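Note that "Permission denied (publickey)" means the SSH daemon answered and rejected the key, so the instance is up; this is different from a network-level failure. A minimal sketch of how such client errors could be told apart follows. The function name and the category labels are hypothetical, and the matched substrings are only the common OpenSSH client messages, not an exhaustive list.

```python
def classify_ssh_failure(stderr: str) -> str:
    """Roughly classify an OpenSSH client error message (illustrative only)."""
    text = stderr.lower()
    if "permission denied" in text:
        # sshd replied, so the host is up; authentication was rejected.
        return "auth-rejected"
    if "connection timed out" in text or "no route to host" in text:
        # No answer at the network level.
        return "network-unreachable"
    if "connection refused" in text:
        # Host answered, but nothing is listening on the SSH port.
        return "port-closed"
    if "could not resolve hostname" in text:
        return "dns-failure"
    return "unknown"


# The error reported in this task:
message = (
    "integration-agent-docker-1001.integration.eqiad.wmflabs: "
    "Permission denied (publickey)."
)
print(classify_ssh_failure(message))
```

For the message in this task the sketch reports an authentication rejection rather than a network problem, which would be consistent with the failed initial Puppet run leaving the instance without its access configuration (an assumption, not something confirmed in this task).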

Event Timeline

hashar created this task. Sep 11 2019, 2:23 PM
Restricted Application added a subscriber: Aklapper. Sep 11 2019, 2:23 PM
hashar updated the task description. Sep 11 2019, 2:24 PM

Oh, if this was created on Monday then it was probably built during the puppetmaster swap and was dead on arrival.

hashar updated the task description. Sep 11 2019, 2:26 PM

Mentioned in SAL (#wikimedia-releng) [2019-09-11T14:27:12Z] <hashar> puppet cert clean integration-agent-docker-1001.integration.eqiad.wmflabs # T232619 | might unbreak puppet

I think I created an instance with the same name earlier, but using Stretch; that was to prepare the migration of the instances to Stretch. Not a big loss though.

I suspect there is a cert mismatch between the client (which would use the labs puppetmaster cert) and the CI puppetmaster. That is a known issue. Clearing the cert on the puppetmaster might solve the issue; otherwise we will have to recreate the instance.

hashar renamed this task from integration-agent-docker-1001.integration.eqiad.wmflabs is unrecheable to integration-agent-docker-1001.integration.eqiad.wmflabs is unreachable. Sep 11 2019, 2:30 PM
hashar triaged this task as Normal priority.

I think I created an instance with the same name earlier, but using Stretch; that was to prepare the migration of the instances to Stretch. Not a big loss though.
I suspect there is a cert mismatch between the client (which would use the labs puppetmaster cert) and the CI puppetmaster. That is a known issue. Clearing the cert on the puppetmaster might solve the issue; otherwise we will have to recreate the instance.

You created it, but then I realized it wasn't pooled in Jenkins, so I deleted and re-created it.

Oh, if this was created on Monday then it was probably built during the puppetmaster swap and was dead on arrival.

I think this was the problem ^. We probably just need to delete the instance and re-create it.

@hashar sorry for making a mess and not cleaning it up :)

hashar closed this task as Resolved. Sep 11 2019, 4:02 PM
hashar claimed this task.

Thank you @thcipriani. I have deleted the instance :]