
Support services VIPs not marked as VIP in Netbox
Open, Medium, Public

Description

As a result of an audit (see parent task) I've noticed some discrepancies between the data in Netbox and the data on the hosts (synced via PuppetDB) for some support services:

gitlab

On gitlab1001 we have allocated 208.80.154.14/32 as a proper VIP, but in Netbox that address is allocated as a /26 (the default is to use the parent prefix's netmask) and is not marked as VIP, as it should be according to [1]. For VIPs the automation doesn't attach the IP to the interface on the server, because a VIP could migrate to another host. See https://netbox.wikimedia.org/ipam/ip-addresses/8255/ and https://netbox.wikimedia.org/ipam/ip-addresses/8257/

Same on gitlab2001 for https://netbox.wikimedia.org/ipam/ip-addresses/8829/ and https://netbox.wikimedia.org/ipam/ip-addresses/8830/

If the above is correct, the solution is to:

  • change the netmask of the above IP addresses in Netbox to /32
  • set their role to VIP
  • detach them from the interfaces of the respective hosts
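
The three steps above can be sketched as a small planning function. This is only an illustration, assuming a Netbox-like record represented as a plain dict: the field names `address`, `role` and `assigned_interface` and the placeholder interface value are hypothetical and are not taken from the Netbox API or from the actual records linked above.

```python
import ipaddress


def plan_vip_fix(record):
    """Return the field changes needed to make a record a proper VIP.

    `record` is a hypothetical dict with the keys `address`, `role`
    and `assigned_interface`; it is not a real Netbox object.
    """
    iface = ipaddress.ip_interface(record["address"])
    # A VIP uses the host netmask: /32 for IPv4.
    host_address = f"{iface.ip}/{iface.ip.max_prefixlen}"

    changes = {}
    if record["address"] != host_address:
        changes["address"] = host_address
    if record.get("role") != "vip":
        changes["role"] = "vip"
    if record.get("assigned_interface") is not None:
        # VIPs are not attached to a host interface.
        changes["assigned_interface"] = None
    return changes


# Example record shaped like the gitlab1001 case described above.
record = {
    "address": "208.80.154.14/26",
    "role": None,
    "assigned_interface": "placeholder-interface",  # hypothetical value
}
changes = plan_vip_fix(record)
fixed = {**record, **changes}
```

Running the function again on `fixed` returns an empty dict, so the plan is safe to re-run after the record has been corrected.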

lists1001

Has the same IP allocated twice on the host with two different netmasks:

inet 208.80.154.31/26 brd 208.80.154.63 scope global ens5
   valid_lft forever preferred_lft forever
inet 208.80.154.31/32 scope global ens5
   valid_lft forever preferred_lft forever

As this is not a VIP, I'm not sure why it's allocated twice, and with the /32 netmask. I guess the host needs a fix so that it has the address just once, with the /26 netmask.
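
A check for this kind of duplicate could look like the sketch below. It is not the audit tooling referenced in the parent task, just a minimal example that assumes the input is text in the same format as the `inet` lines quoted above; the function name `duplicate_addresses` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches "inet <addr>/<prefixlen>" and "inet6 <addr>/<prefixlen>" lines.
INET_RE = re.compile(r"^\s*inet6?\s+([0-9A-Fa-f.:]+)/(\d+)")


def duplicate_addresses(ip_addr_output):
    """Map each address configured with more than one prefix length
    to the sorted list of those prefix lengths."""
    prefixes = defaultdict(set)
    for line in ip_addr_output.splitlines():
        match = INET_RE.match(line)
        if match:
            prefixes[match.group(1)].add(int(match.group(2)))
    return {addr: sorted(p) for addr, p in prefixes.items() if len(p) > 1}


sample = """\
inet 208.80.154.31/26 brd 208.80.154.63 scope global ens5
   valid_lft forever preferred_lft forever
inet 208.80.154.31/32 scope global ens5
   valid_lft forever preferred_lft forever
"""
duplicates = duplicate_addresses(sample)
```

For the sample above, `duplicates` maps 208.80.154.31 to the prefix lengths 26 and 32, which is the discrepancy described here.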

[1] https://wikitech.wikimedia.org/wiki/DNS/Netbox#How_to_manually_allocate_a_special_purpose_IP_address_in_Netbox

Event Timeline

Pinging @Jelto for gitlab (not sure if the issue is still present or relevant) and @akosiaris for lists1001 (I can confirm the misconfig is still there, but I don't know who handles that service these days :) )

Adding @Amir1 and @Legoktm. I can see the misconfig too, not sure why it is there though. My guess is operator error, since that file isn't puppet managed (aside from a weird hack we do to insert IPv6-related things in there), but I'd like a confirmation. If this isn't by design, the fix appears to be easy: just fix the misconfig and reboot.

Legoktm added a subscriber: Ladsgroup.

Any IP (mis)configuration most likely predates Amir's and my involvement with mailman, we never touched that stuff. I can't think of any reason why it was set like that, so fixing it sounds good to me, we should just announce the maintenance window ahead of time since it needs a reboot.

gitlab1001 and gitlab2001 will be decommissioned soon in T307142. So regarding GitLab this should be resolved soon.

However thanks for checking the config and bringing this up. For future GitLab machines we will try to keep netbox and puppet in sync.

Regarding mailman, yes, Kunal and I didn't touch those settings [1] (I couldn't, as I didn't have access to netbox back then). Maybe @herron did it when he did the upgrade to buster? But it doesn't matter. It's fine by me, and if you need someone to babysit the change (and do the communication, etc.) I can do it.

[1] OT note: The way we did the migration of mm2 to mm3 was basically changing the floor while people were standing on it. mm3 and mm2 were served at the same time from the same apache server and the same DNS record (we didn't even change anything there). It didn't even require a reboot. We just switched the main URL to redirect to a new path.

Didn't need a reboot after all. I fixed the /etc/network/interfaces configuration and ran systemctl restart networking while logged in as root via the ganeti-instance console command. Didn't miss a single packet. I think that we can call lists1001 solved.

The decommissioning (T307142) has happened. Neither host exists anymore.

Assigning to Cathal as per meeting discussion.