
af-nb-db-2.automation-framework.eqiad.wmflabs has broken network
Closed, Resolved · Public

Description

For T232429: Create in-cloud, cloud-vps-wide cumin masters I was looking at what hosts are *not* currently responding to cumin. One of them is this host with this error:
ssh: connect to host af-nb-db-2.automation-framework.eqiad.wmflabs port 22: No route to host
Now that's interesting: that should never happen for a host under eqiad.wmflabs, ever. It turns out that the IP in Designate for this host is 172.16.6.245, but openstack-browser reveals something strange about the networking setup on this instance:

[openstack-browser screenshot of the instance's networking details]

It has two internal IP addresses listed, the second of them being 172.16.6.244. It turns out that IP *does* function:

krenair@cloud-cumin-01:~$ ssh 172.16.6.244
Permission denied (publickey).

(keys are broken there but that's a minor thing in comparison)

Why and how does this host have multiple internal IPs? What should happen when a host ends up in this state?
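
For reference, a minimal sketch of the kind of cross-check that surfaced this, assuming openstacksdk is available and a clouds.yaml entry with access to the automation-framework project (the cloud name "cloudvps" below is a placeholder, not actual config from this task):

# Sketch: compare the DNS (Designate) record for an instance against the
# fixed IPs Nova reports for it. "cloudvps" is a hypothetical clouds.yaml entry.
import socket
import openstack

FQDN = "af-nb-db-2.automation-framework.eqiad.wmflabs"

conn = openstack.connect(cloud="cloudvps")
server = conn.compute.find_server("af-nb-db-2", ignore_missing=False)
server = conn.compute.get_server(server.id)  # full detail, including addresses

dns_ip = socket.gethostbyname(FQDN)
fixed_ips = [
    a["addr"]
    for addrs in server.addresses.values()
    for a in addrs
    if a.get("OS-EXT-IPS:type") == "fixed"
]

print(f"DNS: {dns_ip}  Nova fixed IPs: {fixed_ips}")
if len(fixed_ips) > 1:
    print("instance has more than one fixed IP")
if dns_ip not in fixed_ips:
    print("Designate record does not match any Nova fixed IP")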

Event Timeline

Nova seems to associate @crusnov with this server.
Addresses data:

{
    'lan-flat-cloudinstances2b': [
        {
            'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:8e:7d:5e',
            'version': 4,
            'addr': '172.16.6.244',
            'OS-EXT-IPS:type': 'fixed'
        },
        {
            'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:f0:8e:7a',
            'version': 4,
            'addr': '172.16.6.245',
            'OS-EXT-IPS:type': 'fixed'
        }
    ]
}
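
The two distinct MAC addresses above suggest two separate Neutron ports/allocations for the same instance. A rough sketch for listing the ports tied to this server's device ID, to see which allocation (if any) still has a live port, under the same placeholder-credentials assumptions as the sketch in the description:

# Sketch: list Neutron ports whose device_id is this server, to see which
# of the two allocations is actually bound and in what state.
import openstack

conn = openstack.connect(cloud="cloudvps")
server = conn.compute.find_server("af-nb-db-2", ignore_missing=False)

for port in conn.network.ports(device_id=server.id):
    ips = [fip["ip_address"] for fip in port.fixed_ips]
    print(port.id, port.mac_address, port.status, ips)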

@crusnov is this instance working at all? If not, could you please try deleting it and, if needed, re-creating it?

@aborrero it looks like arturo-k8s-test-3.openstack.eqiad.wmflabs has also got this issue
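
With a second instance showing the same symptom, a quick scan for other affected VMs might look like the sketch below (same placeholder cloud name; all_projects requires credentials that can list servers across projects, otherwise drop it):

# Sketch: flag servers that carry more than one fixed IP on the same network,
# which is the symptom described in this task.
import openstack

conn = openstack.connect(cloud="cloudvps")

for server in conn.compute.servers(details=True, all_projects=True):
    for net, addrs in (server.addresses or {}).items():
        fixed = [a["addr"] for a in addrs if a.get("OS-EXT-IPS:type") == "fixed"]
        if len(fixed) > 1:
            print(f"{server.name} ({net}): {fixed}")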

This is an error that sometimes happens during VM creation -- I think it's something like...

  1. VM is scheduled
  2. IP is allocated by Neutron
  3. Scheduled VM fails to come up (possibly due to a cloudvirt being offline)
  4. VM is rescheduled
  5. IP is allocated by Neutron

etc.

As far as I know it only happens to brand new VMs so never damages any actual work in progress. And I don't know if the bug is still present in Newton.

Probably best to just delete the affected VM and wait and see if it happens again.
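
If deleting the VM is the chosen route, it may be worth confirming afterwards that no stale Neutron port is left behind holding one of the IPs. A hedged sketch, with the same placeholder cloud/credentials assumptions as above:

# Sketch: delete the affected server, then check for leftover Neutron ports
# still tied to its device ID.
import openstack

conn = openstack.connect(cloud="cloudvps")
server = conn.compute.find_server("af-nb-db-2", ignore_missing=False)
device_id = server.id

conn.compute.delete_server(server)
conn.compute.wait_for_delete(server)

for port in conn.network.ports(device_id=device_id):
    print("leftover port:", port.id, [fip["ip_address"] for fip in port.fixed_ips])
    # conn.network.delete_port(port)  # only if it is clearly orphaned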


VM no longer exists.