
Add pod ip address blocks to staging
Closed, Resolved (Public)

Description

With the growing workloads deployed on staging, we need more pod ip space.

With the next re-init of the wikikube-staging clusters we're going to change the pod ip ranges as follows:

| Site | Existing Staging POD Range | New Staging POD Range |
| --- | --- | --- |
| eqiad | 10.64.75.0/24 | 10.64.64.0/21 |
| codfw | 10.192.75.0/24 | 10.192.64.0/21 |
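For a sense of how much space the change adds, here is a minimal sketch using Python's standard `ipaddress` module. The ranges are the ones from the table above; the `RANGES` and `sizes` names are only illustrative.

```python
import ipaddress

# Existing and new staging pod ranges, as listed in the task description.
RANGES = {
    "eqiad": ("10.64.75.0/24", "10.64.64.0/21"),
    "codfw": ("10.192.75.0/24", "10.192.64.0/21"),
}

# Map each site to (addresses in the old range, addresses in the new range).
sizes = {
    site: (
        ipaddress.ip_network(old).num_addresses,
        ipaddress.ip_network(new).num_addresses,
    )
    for site, (old, new) in RANGES.items()
}

for site, (old_size, new_size) in sizes.items():
    print(f"{site}: {old_size} -> {new_size} addresses")
```

A /21 holds 2048 addresses against 256 in a /24, so each site gets eight times the current pod ip space.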

Event Timeline

Clement_Goubert triaged this task as High priority.

We have a free /21 in the Prefix containers used for wikikube staging in codfw and eqiad that we could probably use instead of adding another /24:

Not sure if there is anything speaking against that, but I think it's more future proof than adding another /24 (cc @akosiaris).

Unfortunately there is no proper way of doing this without downtime since kube-proxy requires the pod ip space to be a single network per address family. So we'll have to do this during T341984: Update Kubernetes clusters to 1.31.

It requires (probably not comprehensive):

  • Updating the ipv4 pool in helmfile.d/admin_ng/values/staging*/calico-values.yaml
  • Updating cluster_cidr in hieradata/common/kubernetes.yaml
  • Updating Kubernetes POD IP delegation in templates/10.in-addr.arpa
  • The usual grep and replace in hieradata and deployment-charts repo
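For the reverse DNS step, a rough sketch of which zones a new pod range would touch. It assumes delegation happens per /24 on octet boundaries, which is the conventional in-addr.arpa layout; the actual layout of templates/10.in-addr.arpa is not shown in this task, and `reverse_zones` is a hypothetical helper.

```python
import ipaddress


def reverse_zones(cidr):
    """Return the per-/24 in-addr.arpa zone names covered by an IPv4 range.

    Assumes the range is a /24 or larger; ip_network.subnets() raises
    ValueError for anything smaller than a /24.
    """
    network = ipaddress.ip_network(cidr)
    zones = []
    for subnet in network.subnets(new_prefix=24):
        first, second, third, _ = str(subnet.network_address).split(".")
        zones.append(f"{third}.{second}.{first}.in-addr.arpa")
    return zones


for zone in reverse_zones("10.64.64.0/21"):
    print(zone)
```

Under that assumption, a /21 maps to eight zones, from 64.64.10.in-addr.arpa up to 71.64.10.in-addr.arpa for eqiad.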

> Not sure if there is anything speaking against that, but I think it's more future proof than adding another /24 (cc @akosiaris).

There is, unfortunately. WikiKube's service IPs (e.g. 10.64.72.0/24 for eqiad, similarly for codfw) are in that /21, as are staging's service IPs. We can't give calico the entire /21 because of those two. We'd need to renumber them before we do that. To unblock and give us some time before we have to think about this again, I'd go for 10.64.80.0/20 and 10.192.80.0/20 respectively. They are larger and in the original /18 we had reserved for what came to be known as wikikube later on.

Ah, I was pretty sure I was reading netbox wrong :)
10.64.80.0/20 and 10.192.80.0/20 should be fine as well though...

@cmooney this probably needs a prefix update on "your" side as well, right (like T375845)?

> Not sure if there is anything speaking against that, but I think it's more future proof than adding another /24 (cc @akosiaris).

> There is unfortunately. WikiKube's service IPs (e.g. 10.64.72.0/24 for eqiad, similarly for codfw) are in that /21 as well as staging's service IPs.

I think @akosiaris may have made a small accounting error here. 10.64.64.0/21 ends at 10.64.71.255, so it doesn't overlap with 10.64.72.0/24, and checking on the routers it is definitely available. The same is true of 10.192.64.0/21 in codfw.
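The arithmetic is easy to double-check with Python's standard `ipaddress` module. This is only a sketch over the ranges quoted in this thread; `CANDIDATES` and `check` are illustrative names, and the script says nothing about what is actually configured on the routers.

```python
import ipaddress

# (candidate pod range, WikiKube service range) per site, as quoted in this thread.
CANDIDATES = {
    "eqiad": ("10.64.64.0/21", "10.64.72.0/24"),
    "codfw": ("10.192.64.0/21", "10.192.72.0/24"),
}


def check(pod_cidr, service_cidr):
    """Return the last address of the pod range and whether it overlaps the service range."""
    pod = ipaddress.ip_network(pod_cidr)
    service = ipaddress.ip_network(service_cidr)
    return pod.broadcast_address, pod.overlaps(service)


for site, (pod_cidr, service_cidr) in CANDIDATES.items():
    last, overlaps = check(pod_cidr, service_cidr)
    print(f"{site}: {pod_cidr} ends at {last}, overlaps {service_cidr}: {overlaps}")
```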

> 10.64.80.0/20 and 10.192.80.0/20 should be fine as well though...

Both of those are also free and could be used as well.

One thing I should also mention is it may be possible to extend the existing ranges, turning the current range into the upper-half of a new /23 assignment:

| Site | Existing Staging POD Range | Potential Staging POD Range |
| --- | --- | --- |
| eqiad | 10.64.75.0/24 | 10.64.74.0/23 |
| codfw | 10.192.75.0/24 | 10.192.74.0/23 |

That obviously doesn't give us much runway, it only adds another 256 IPs total in each case. But perhaps it would make some of the work easier as we could keep a single range and existing pods with their current IPs.
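As a sanity check on the /23 option, a small sketch with Python's standard `ipaddress` module; `extend_one_bit` is a hypothetical helper written for this comment, not part of any tooling.

```python
import ipaddress


def extend_one_bit(cidr):
    """Widen a range by one prefix bit.

    Returns the enclosing parent network, whether the original range is the
    upper half of that parent, and how many addresses are gained.
    """
    current = ipaddress.ip_network(cidr)
    parent = current.supernet()  # one bit shorter prefix, e.g. /24 -> /23
    _lower, upper = parent.subnets()  # the two halves of the parent
    gained = parent.num_addresses - current.num_addresses
    return parent, current == upper, gained


for existing in ("10.64.75.0/24", "10.192.75.0/24"):
    parent, is_upper, gained = extend_one_bit(existing)
    print(f"{existing} -> {parent} (upper half: {is_upper}, +{gained} addresses)")
```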

> Ah, I was pretty sure I was reading netbox wrong :)
> 10.64.80.0/20 and 10.192.80.0/20 should be fine as well though...

Apparently not; I was the one reading it wrong. I got tricked by the 2 dots in netbox's interface before the 10.192.72.0/24 and 10.64.72.0/24 entries and never cross-checked it with either my brain or sipcalc, which I should have.

> I think @akosiaris may have made a small accounting error here. 10.64.64.0/21 ends at 10.64.71.255, so it doesn't overlap with 10.64.72.0/24, and checking on the routers it is definitely available. The same is true of 10.192.64.0/21 in codfw.

Indeed I did. However, you wouldn't see that /24 in the routers anyway. No Service IP range is currently announced from any kubernetes cluster. They are entirely internal to their respective clusters. For all intents and purposes we could have gone with something in the 192.168.x.x range, but we decided against that in order to keep the ability to announce it if ever needed, as well as to avoid confusion (reactions like "what is that 192.168. doing here?"). We have done some experiments that haven't fully panned out, see T238909: Proposal: simplify set up of a new load-balanced service on kubernetes; we might revisit them at some point (it's been years, all components are more mature).

> That obviously doesn't give us much runway, it only adds another 256 IPs total in each case. But perhaps it would make some of the work easier as we could keep a single range and existing pods with their current IPs.

To be honest, I'd rather give us the headroom now than have to revisit this in a few quarters and wait for another kubernetes upgrade to make this easier. It is conceivable we'll need to increase the size of the staging cluster anyway, so I'd rather use either the /21 or the /20 instead.

Let's go with the /21s then, which should give us ample room for staging.

Change #1126102 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Add new Wikikube staging POD IP ranges to router/switch BGP filter

https://gerrit.wikimedia.org/r/1126102

Change #1126103 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] admin_ng: Change staging-codfw pod ip range to 10.192.64.0/21

https://gerrit.wikimedia.org/r/1126103

Change #1126105 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Update wikikube-staging codfw pod ip range

https://gerrit.wikimedia.org/r/1126105

Change #1126108 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] Delegate reverse zones for newly assigned K8s POD IP ranges staging

https://gerrit.wikimedia.org/r/1126108

Change #1126102 merged by jenkins-bot:

[operations/homer/public@master] Add new Wikikube staging POD IP ranges to router/switch BGP filter

https://gerrit.wikimedia.org/r/1126102

Change #1126108 merged by Cathal Mooney:

[operations/dns@master] Delegate reverse zones for newly assigned K8s POD IP ranges staging

https://gerrit.wikimedia.org/r/1126108

Change #1126105 merged by JMeybohm:

[operations/puppet@production] Update wikikube-staging codfw pod ip range

https://gerrit.wikimedia.org/r/1126105

Change #1126103 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: Change staging-codfw pod ip range to 10.192.64.0/21

https://gerrit.wikimedia.org/r/1126103

staging-codfw switched to the new ip pool today (T384450)

Change #1128350 had a related patch set uploaded (by Kamila Součková; author: Kamila Součková):

[operations/puppet@production] Update wikikube-staging codfw pod ip range

https://gerrit.wikimedia.org/r/1128350

staging-eqiad switched to the new ip pool today (T389045)

JMeybohm renamed this task from "Add pod ip address blocks to staging-eqiad" to "Add pod ip address blocks to staging". Mar 17 2025, 3:18 PM