Summary
We will merge the two /18s into a single /17 for IPv4 during the next Kubernetes version upgrade: T341984: Update Kubernetes clusters to 1.31
Long form
While putting the new wikikube-workers into service for T369744: wikikube-worker1240 to wikikube-worker1304 implementation tracking, we encountered an issue where a number of wikikube-workers were left without an IPv4 /26 prefix: we had initially allocated a /18, which contains 256 /26 prefixes, and we ended up with 286 nodes once the new nodes were pooled. While the original projection that 256 /26s would be enough wasn't entirely wrong (221 was the node count before adding the nodes for the refresh), it doesn't leave enough headroom for large-ish operations (e.g. adding 65 nodes).
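The subnet arithmetic above can be sanity-checked with Python's stdlib `ipaddress` module (the eqiad pool address is from this task; the counts are just illustrative):

```python
import ipaddress

# The currently allocated eqiad pool.
pool = ipaddress.ip_network("10.67.128.0/18")

# A /18 splits into 2**(26-18) == 256 /26 prefixes, one per node.
node_prefixes = list(pool.subnets(new_prefix=26))
print(len(node_prefixes))  # 256

# 286 pooled nodes therefore exceed the pool by 30 prefixes.
print(286 - len(node_prefixes))  # 30
```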
While:
- we'll easily fix this by decommissioning the old nodes that the new nodes were slated to refresh (T375842: decommission mw[1349-1413])
- we could avoid this in the future by instituting rules like "never add more than X nodes in a batch"
long term, it is probably more productive to just add 1 more /18 to the wikikube pool in both DCs. We are lucky (or I had the foresight, can't remember): the /18s we currently have are each followed by another empty /18 that is already reserved for Kubernetes. Namely:
we already have
10.67.128.0/18 (eqiad), 10.194.128.0/18 (codfw)
and
10.67.192.0/18 (eqiad), 10.194.192.0/18 (codfw) are available.
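That the current and reserved /18s are adjacent and aggregate cleanly into a /17 can be verified with `ipaddress.collapse_addresses` (shown here for eqiad; codfw works the same way):

```python
import ipaddress

current = ipaddress.ip_network("10.67.128.0/18")   # existing eqiad pool
reserved = ipaddress.ip_network("10.67.192.0/18")  # adjacent reserved /18

# Adjacent sibling /18s collapse into their common /17 supernet.
merged = list(ipaddress.collapse_addresses([current, reserved]))
print(merged)  # [IPv4Network('10.67.128.0/17')]
```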
I went ahead and marked them as reserved, with a description pointing to this task, pending discussion.
Note that, per past experience, changing the ippool is an arduous and dangerous process. We probably don't need that though: we can live without aggregating the 2 /18s into a /17 at the configuration level. At the BGP level, it's the /26s that get announced anyway.
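This is also why skipping the configuration-level aggregation is safe: any per-node /26 carved out of either /18 is contained in the covering /17 regardless, which `subnet_of` can demonstrate (the node /26 below is a hypothetical example, not an actual allocation):

```python
import ipaddress

supernet = ipaddress.ip_network("10.67.128.0/17")      # eqiad /17 covering both /18s
node_prefix = ipaddress.ip_network("10.67.200.0/26")   # hypothetical node /26

# Every /26 from either /18 sits inside the /17, so routing by /26
# announcements is unaffected by whether the pools are aggregated.
print(node_prefix.subnet_of(supernet))  # True
```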
Tagging netops and serviceops for further discussion.