No free IPs on public1-ulsfo vlan (Nov 2025)
Closed, Resolved (Public)

Description

@ssingh found that he could not create a new VM on the public vlan in ulsfo today, as there are no free IPs on the allocated subnet.

Widen range

Luckily we had planned to increase the size of this subnet to a /27 as part of the upcoming ulsfo network refresh (see T408892#11330727). I wasn't aware of the lack of free IPs, but the plan was to make the subdivision of the public /24 there match what we have at other POPs, where we have a public /27 for each rack.

So it is no problem for us to make the subnet 198.35.26.0/27. I can make this change in Netbox and also on the routers. This change will not disrupt any existing host traffic.
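As a rough illustration of what widening the range buys, the sketch below compares the two prefixes using Python's standard `ipaddress` module. It assumes the currently allocated subnet is 198.35.26.0/28 (inferred from the /28 host addresses mentioned in this task); the helper name `usable` is purely illustrative.

```python
import ipaddress

# Assumption: the existing allocation is the /28 that contains 198.35.26.8/28.
old = ipaddress.ip_network("198.35.26.0/28")
new = ipaddress.ip_network("198.35.26.0/27")


def usable(net):
    """Count assignable addresses (hosts() skips network and broadcast)."""
    return sum(1 for _ in net.hosts())


print(old, "usable addresses:", usable(old))
print(new, "usable addresses:", usable(new))

# The old range sits entirely inside the new one, so existing IPs stay valid.
print("old is inside new:", old.subnet_of(new))
print("new is the /27 supernet of old:", old.supernet(new_prefix=27) == new)
```

Because the /28 is the lower half of the /27, none of the existing addresses need to change; only the mask does.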

Subnet mask on existing hosts

The trickier problem is that when we assign a host the IP 198.35.26.15/27, any of the existing hosts - for instance dns4002 on 198.35.26.8/28 - will be unable to communicate with it.
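The mismatch can be sketched with Python's `ipaddress` module. This is only a model of how each host classifies a peer address from its own configured mask, not a description of any particular kernel's behaviour; the function name `view` and the extra address 198.35.26.20 are illustrative assumptions.

```python
import ipaddress

existing = ipaddress.ip_interface("198.35.26.8/28")   # existing host, old mask
new_host = ipaddress.ip_interface("198.35.26.15/27")  # new host, new mask


def view(local, peer_ip):
    """How `local` classifies `peer_ip`, judged only by local's own mask."""
    net = local.network
    if peer_ip == net.broadcast_address:
        return "broadcast"
    if peer_ip in net:
        return "on-link"
    return "off-link"


# With a /28 mask, .15 is the subnet broadcast address, not a unicast peer.
print(view(existing, new_host.ip))
# Addresses in the upper half of the /27 look off-link to a /28 host.
print(view(existing, ipaddress.ip_address("198.35.26.20")))
# The new host, using /27, treats the existing host as a direct neighbour.
print(view(new_host, existing.ip))
```

The two ends disagree about what is local, which is why the masks on the existing hosts need to be brought in line before the new addresses are handed out.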

Once the router change is done, therefore, we need to somehow adjust the netmask on all the existing hosts on the vlan. Probably the simplest way to do this is for us to go through them one-by-one, change the netmask in /etc/network/interfaces, and reboot the host.
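A minimal sketch of the per-host edit follows, assuming the file uses ifupdown-style `address` / `netmask` lines. The sample stanza, interface name and gateway are hypothetical and not taken from any real host; the function `widen_netmask` is illustrative only and handles just the two IPv4 forms shown.

```python
import ipaddress
import re


def widen_netmask(text, new_prefix=27):
    """Rewrite the IPv4 /28 mask to new_prefix in ifupdown-style text."""
    mask = str(ipaddress.ip_network(f"0.0.0.0/{new_prefix}").netmask)
    out = []
    for line in text.splitlines():
        # Dotted form: "netmask 255.255.255.240"
        m = re.match(r"^(\s*netmask\s+)255\.255\.255\.240\s*$", line)
        if m:
            line = m.group(1) + mask
        else:
            # CIDR form: "address 198.35.26.N/28"
            line = re.sub(
                r"^(\s*address\s+198\.35\.26\.\d+)/28\s*$",
                rf"\g<1>/{new_prefix}",
                line,
            )
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical stanza for illustration only.
sample = (
    "auto eno1\n"
    "iface eno1 inet static\n"
    "    address 198.35.26.8\n"
    "    netmask 255.255.255.240\n"
    "    gateway 198.35.26.1\n"
)
print(widen_netmask(sample))
```

Lines that do not match either pattern (IPv6 stanzas, comments, other interfaces) are passed through unchanged, so the output should be diffed against the original before rebooting.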

Existing hosts

Servers:

dns4003.wikimedia.org
dns4004.wikimedia.org
lvs4008.ulsfo.wmnet
lvs4009.ulsfo.wmnet
lvs4010.ulsfo.wmnet

VMs:

bast4005.wikimedia.org
doh4001.wikimedia.org
doh4002.wikimedia.org
hcaptcha-proxy4001.wikimedia.org
install4003.wikimedia.org

Once all existing hosts have had this done we can safely add new hosts to the vlan, which will start using the free IPs in the upper half of the extended range.

Event Timeline

cmooney triaged this task as Medium priority.
cmooney updated the task description.
cmooney updated the task description.
Reedy renamed this task from No free IPs on public1-ulsfo vlan (Nov 2025) to No free IPs on public1-ulsfo vlan (Nov 2025). Thu, Nov 13, 3:40 PM

Change #1205135 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/puppet@production] network data: increase size of public1-ulsfo IPv4 range

https://gerrit.wikimedia.org/r/1205135

@ssingh I made a patch and can kick off the changes in Netbox and on the routers next week for this.

However, I wonder what your thoughts are: how many more public IPs do you need in the short term? The reason I ask is that this vlan will be doubled (quadrupled, actually, as we will add another public vlan for the second rack) during the T408510: ULSFO: switch refresh work, which is coming up in the next few months. That work will require most hosts to be reimaged while we change the network setup to a L3 POP.

So an option, if removing unused IPs (like the LVS ones) from the vlan now gives enough for the proxy VMs, is to decline this task and increase the subnet size as planned during the larger job?

Actually I discussed with @Papaul in relation to our plans for ulsfo, and we both agree that work would be a lot simpler if we make this change now. We can discuss the way forward next week.

Once the router change is done, therefore, we need to somehow adjust the netmask on all the existing hosts on the vlan. Probably the simplest way to do this is for us to go through them one-by-one, change the netmask in /etc/network/interfaces, and reboot the host.

Then update the host IP in Netbox or, better, run the Netbox puppetdb import script for each host for a proper sync.

My plan for now to unblock the hCaptcha work was to decommission one of the Wikidough hosts in ulsfo -- which should be fine since they both average ~14 rps between them -- and then use that IP to create the hCaptcha VM. The reasoning for doing so is that hCaptcha is a more critical service than Wikidough right now and I don't want it to be running on a single VM in ulsfo. But let me know what you think about this, in general, and if I should not go down this path!

You can use 198.35.26.5/28. It's marked as reserved for infra, but we don't need it (and we will need it even less after the network upgrade).

Why not just re-use the LVS IPs? I think it's better to keep Wikidough with two hosts in case one fails?

In terms of expanding the vlan I think we should do it anyway, but I agree it should not hold up hCaptcha.

You can use 198.35.26.5/28. It's marked as reserved for infra, but we don't need it (and we will need it even less after the network upgrade).

I am not even sure how to pass that to the cookbook though, a specific subnet. Right now it is failing for me because of resource allocation (IP address), so any thoughts on how to do that? I can RTFM but asking you is much easier :)

Yeah, good point about the LVS IPs since we no longer need them given Liberica. I will be checking that with Valentin today.

It's more than that they aren't "needed": they aren't being used. They are just incorrectly marked as in use in Netbox and need to be tidied up (also in puppet, where they are referenced).

If the IP is freed up in Netbox the cookbook will pick it automatically. I went ahead and deleted it there now so there is one free at the moment.
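The idea of "pick whatever is free" can be sketched as below. This is not the cookbook's code and does not call Netbox; `first_free` and the allocation list are hypothetical, and simply model choosing the lowest unallocated address in a prefix.

```python
import ipaddress


def first_free(prefix, allocated):
    """Return the lowest usable address in prefix not in allocated, or None."""
    net = ipaddress.ip_network(prefix)
    taken = {ipaddress.ip_address(a) for a in allocated}
    for ip in net.hosts():
        if ip not in taken:
            return ip
    return None


# Hypothetical state: every usable address in the /28 is taken except .5.
allocated = [f"198.35.26.{n}" for n in range(1, 15) if n != 5]
print(first_free("198.35.26.0/28", allocated))

# With .5 also taken, nothing is left and allocation would fail.
print(first_free("198.35.26.0/28", allocated + ["198.35.26.5"]))
```

Under this model, a fully allocated prefix yields no candidate at all, which is consistent with the allocation failure reported earlier in the thread.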

Yes thanks, we will be discussing this today and I will file a task later.

Thanks, that worked! Allocated IPv4 198.35.26.5/28

Change #1206424 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: lvs/interfaces: remove public1-ulsfo

https://gerrit.wikimedia.org/r/1206424

Change #1206424 merged by Ssingh:

[operations/puppet@production] hiera: lvs/interfaces: remove public1-ulsfo

https://gerrit.wikimedia.org/r/1206424

Actually I discussed with @Papaul in relation to our plans for ulsfo, and we both agree that work would be a lot simpler if we make this change now. We can discuss the way forward next week.

@ayounsi has a different view on this, and we have freed up a few IPs now, so I will close this task for the time being. The subnet size will be changed when we rebuild ulsfo, moving it from an L2 to an L3 POP (T408510).