
Restore asw-c7-codfw cables
Closed, Resolved, Public

Description

The migration done in T267865#6624814 moved the interfaces properly, but for some reason the cables were still tied to the device itself.
As a result, all the cables from asw-c7-codfw were deleted together with the switch.

https://netbox.wikimedia.org/extras/changelog/?request_id=12a5ec8c-f569-4f47-a847-1f4276d589ce

Event Timeline

ayounsi created this task.

It appears that the cable objects, in addition to termination_a_id and termination_b_id, also have the _termination_a_device_id and _termination_b_device_id properties, which tie each cable to both the interfaces and the devices. That's probably what bit us: we migrated only the interfaces, assuming that the cables were tied to them and not to the devices.
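To make the failure mode concrete, here is a minimal, self-contained sketch in plain Python (not Netbox code). The classes, the `delete_device` helper and all IDs are hypothetical stand-ins; only the field names mirror the ones mentioned above. It assumes the deletion follows the cached device ID on the cable, which is what the behaviour described in this task suggests.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    id: int
    device_id: int


@dataclass
class Cable:
    id: int
    termination_a_id: int
    termination_b_id: int
    # Cached copies of the owning devices, mirroring the private
    # _termination_*_device_id fields described above.
    _termination_a_device_id: int
    _termination_b_device_id: int


def delete_device(device_id, cables):
    """Simulate a cascade: drop every cable whose cached device ID matches."""
    return [
        c for c in cables
        if device_id not in (c._termination_a_device_id, c._termination_b_device_id)
    ]


# Hypothetical IDs: interface 1 starts on old switch 100, interface 2 on server 300.
iface_a = Interface(id=1, device_id=100)
iface_b = Interface(id=2, device_id=300)
cable = Cable(10, iface_a.id, iface_b.id, iface_a.device_id, iface_b.device_id)

# Migrate only the interface to new switch 200; the cached ID on the cable is untouched.
iface_a.device_id = 200

# Deleting the old switch still removes the cable, because of the stale cached ID.
remaining = delete_device(100, [cable])
print(remaining)
```

In this toy model the cable disappears even though its interface already lives on the new device, which matches what happened to the asw-c7-codfw cables.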

I'm in the process of restoring the deleted cables and will also check later whether we have any migrated cables whose device IDs don't match the IDs of their interfaces' devices.

Ok, I should have restored the deleted cables, see:
https://netbox.wikimedia.org/extras/changelog/?request_id=6a07ffae-1e22-41b3-bde2-27363eb07d45

The changelog shows creations followed by updates because I forgot to set the type when first creating the cables.

This is the code I ended up running in a Netbox nbshell (it could have been limited to the first block had I set the type directly there):

import csv
import uuid

request_id = uuid.uuid4()
user = User.objects.get(username='volans')
cables = {}
with open('/srv/netbox-dumps/2021-01-04-20:05/dcim.cables.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        cables[row['id']] = row

missing_cables = ['3222', '2762', '2749', '2734', '2622', '2602', '2538', '2430', '2420', '2419', '2157', '2156', '1983']
# One more missing cable, looked up by its label; its ID is appended below
for cable in cables.values():
    if cable['label'] == '10708':
        print(cable['id'])
missing_cables.append('1578')

missing_cables_details = {int(m): cables[m] for m in missing_cables}

for cable in missing_cables_details.values():
    c = Cable.objects.create(
        label=cable['label'],
        color=cable['color'],
        status=cable['status'],
        termination_a_id=int(cable['termination_a_id']),
        termination_a_type_id=19,  # hardcoded termination type ID, same for all interfaces
        termination_b_id=int(cable['termination_b_id']),
        termination_b_type_id=19,
    )
    log = c.to_objectchange('create')
    log.request_id = request_id
    log.user = user
    log.save()

# and now to set the type I forgot, same for all cables except one

new_cables = list(range(3597, 3611, 1))
for cable_id in new_cables:
    c = Cable.objects.get(id=cable_id)
    c.type = 'dac-passive'
    log = c.to_objectchange('update')
    log.request_id = request_id
    log.user = user
    log.save()
    c.save()

# and now for the last cable that is different
c = Cable.objects.get(id=3610)
c.type = 'mmf'
log = c.to_objectchange('update')
log.request_id = request_id
log.user = user
log.save()
c.save()

The termination type ID had to be set statically because the CSV dump has only the label, so using it would have required an additional lookup. Given that it was the same for all the interfaces, I decided to hardcode it.
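As a rough sketch of how the restore could have been done in a single pass, the helper below builds the creation arguments for one CSV row with the cable type included. It is plain Python so it can be tried outside nbshell. The `cable_kwargs` name, the `type` column name and the sample row are assumptions for illustration, not taken from the real dump; `INTERFACE_TYPE_ID` is the value hardcoded above.

```python
import csv
import io

# The termination type ID hardcoded in the code above. In nbshell it could
# presumably be looked up instead with something like:
#   ContentType.objects.get(app_label='dcim', model='interface').id
INTERFACE_TYPE_ID = 19


def cable_kwargs(row):
    """Map one row of the CSV dump to keyword arguments for Cable.objects.create()."""
    return {
        'label': row['label'],
        'color': row['color'],
        'status': row['status'],
        'type': row['type'],  # assumed column name in the dump
        'termination_a_id': int(row['termination_a_id']),
        'termination_a_type_id': INTERFACE_TYPE_ID,
        'termination_b_id': int(row['termination_b_id']),
        'termination_b_type_id': INTERFACE_TYPE_ID,
    }


# Made-up sample row, only to exercise the function.
sample = io.StringIO(
    'id,label,color,status,type,termination_a_id,termination_b_id\n'
    '1,example,,connected,dac-passive,101,202\n'
)
rows = list(csv.DictReader(sample))
kwargs = cable_kwargs(rows[0])
print(kwargs)
# In nbshell this would become: Cable.objects.create(**kwargs)
```

Setting the type at creation time would leave only "create" entries in the changelog instead of a create followed by an update for each cable.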

@ayounsi could you have a look and check if it's all restored please?

Checked that the data in Netbox looks good and that a Homer run is a no-op.

I've checked whether we had any other inconsistencies with:

def compare(cid, a, b, label):
    if a != b:
        print(f'Cable ID {cid} {label}: {a} != {b}')


def check_cable(cable):
    compare(cable.id, cable.termination_a_id, cable._orig_termination_a_id, 'A termination differs from the _orig one')
    compare(cable.id, cable.termination_a_type_id, cable._orig_termination_a_type_id, 'A termination TYPE differs from the _orig one')
    compare(cable.id, cable.termination_b_id, cable._orig_termination_b_id, 'B termination differs from the _orig one')
    compare(cable.id, cable.termination_b_type_id, cable._orig_termination_b_type_id, 'B termination TYPE differs from the _orig one')
    a_side = ContentType.objects.get(id=cable.termination_a_type_id).model_class().objects.get(id=cable.termination_a_id)
    b_side = ContentType.objects.get(id=cable.termination_b_type_id).model_class().objects.get(id=cable.termination_b_id)
    a_device_id = None
    try:
        a_device_id = a_side.device.id
    except:
        if a_side.connected_endpoint and a_side.connected_endpoint.device:
            a_device_id = a_side.connected_endpoint.device.id
    b_device_id = None
    try:
        b_device_id = b_side.device.id
    except:
        if b_side.connected_endpoint and b_side.connected_endpoint.device:
            b_device_id = b_side.connected_endpoint.device.id
    if a_device_id is not None and cable._termination_a_device_id is not None:
        compare(cable.id, a_device_id, cable._termination_a_device_id, 'A termination device ID differs from the _termination_a_device_id')
    if b_device_id is not None and cable._termination_b_device_id is not None:
        compare(cable.id, b_device_id, cable._termination_b_device_id, 'B termination device ID differs from the _termination_b_device_id')


cables = Cable.objects.all()
for cable in cables:
    check_cable(cable)

The result is just:

Cable ID 1132 A termination device ID differs from the _termination_a_device_id: 204 != 1615
Cable ID 1607 B termination device ID differs from the _termination_b_device_id: 1956 != 16
Cable ID 1608 B termination device ID differs from the _termination_b_device_id: 615 != 1600

In these cases the difference seems to be between the VC master and the device to which the cable is physically connected.
It looks like we don't have any other cable with inconsistent data.

I've also checked a PostgreSQL dump and found that, for example, the existing row for cable 3222 had termination_b_id=17297 (the interface), whose device.id was 235 (the correct, moved object), but also had _termination_b_device_id=1892, which is the deleted old switch. This confirms that the move in the related task left some "private" data inconsistent on those cables, which were then deleted in cascade when the old switch was deleted.
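The same comparison can be sketched against rows taken from a dump rather than live objects. This is a minimal illustration in plain Python: `stale_sides` and the dictionary layout are hypothetical, the B-side values are the ones quoted above for cable 3222, and the A-side values are made-up placeholders.

```python
def stale_sides(cable_row, interface_device):
    """Return the sides ('a', 'b') whose cached device ID differs from the interface's device."""
    stale = []
    for side in ('a', 'b'):
        iface_id = cable_row[f'termination_{side}_id']
        cached = cable_row[f'_termination_{side}_device_id']
        actual = interface_device.get(iface_id)
        # Skip sides where either value is unknown, as the live check does.
        if actual is not None and cached is not None and actual != cached:
            stale.append(side)
    return stale


cable_3222 = {
    'id': 3222,
    'termination_a_id': 1,            # hypothetical placeholder
    '_termination_a_device_id': 10,   # hypothetical placeholder
    'termination_b_id': 17297,        # the interface quoted above
    '_termination_b_device_id': 1892, # the deleted old switch
}
# Mapping of interface ID to the ID of the device it currently belongs to.
interface_device = {1: 10, 17297: 235}

print(stale_sides(cable_3222, interface_device))
```

A check like this would also report the virtual chassis cases listed earlier, so its output still needs a manual look.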

@ayounsi I think we can resolve this one. The custom script to move a server seems to delete and re-create the cable, so it should not be affected by this.