Routing knowledge
Wikimedia network engineering blog

Ganeti on modern network design

Written by ayounsi on Dec 15 2023, 9:57 AM.

For reasons already covered in other docs (e.g. Eqiad Expansion Network Design), we’re moving towards a network architecture where the servers’ layer 3 domains (subnets) are confined to each rack. Currently, in most of our core DCs, those layer 3 domains are stretched across all the racks of a given row. In that setting, a Ganeti cluster for a given row (with its hypervisors spread across the row) leverages this L2 adjacency to live migrate VMs between hypervisors.
In other words, if work is going to be done on hypervisor1, all the VMs it hosts can be temporarily and transparently redistributed across the other hypervisors to prevent any disruption. Having the same VLAN trunked to all the hypervisors of the same row allows the VMs to move to a different hypervisor without requiring any IP renumbering, and thus without downtime.
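
To make the renumbering constraint concrete, here is a minimal sketch of the underlying check: a VM can keep its address only if that address belongs to a subnet reachable on the target hypervisor. The subnets, rack names and the `needs_renumbering` helper are all hypothetical and chosen for illustration; they are not taken from the actual Ganeti or Netbox setup.

```python
import ipaddress


def needs_renumbering(vm_ip: str, target_subnet: str) -> bool:
    """Return True if vm_ip is outside the subnet available on the target hypervisor."""
    return ipaddress.ip_address(vm_ip) not in ipaddress.ip_network(target_subnet)


# Hypothetical addressing, for illustration only.
ROW_WIDE_SUBNET = "10.64.0.0/22"  # one L3 domain stretched across a whole row
RACK_SUBNETS = {                  # one L3 domain per rack
    "rack-a1": "10.64.0.0/24",
    "rack-a2": "10.64.1.0/24",
}

vm_ip = "10.64.0.15"  # VM currently hosted on a hypervisor in rack-a1

# Row-wide design: any hypervisor in the row shares the VM's subnet.
print(needs_renumbering(vm_ip, ROW_WIDE_SUBNET))

# Per-rack design: moving to rack-a2 would leave the VM outside the local subnet.
print(needs_renumbering(vm_ip, RACK_SUBNETS["rack-a2"]))
```

With the row-wide subnet the check reports that no renumbering is needed, while the per-rack layout reports that the address no longer fits once the VM crosses a rack boundary.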

Read more...

Multi-platform network configuration

Written by ayounsi on Jul 13 2023, 2:31 PM.

Network configuration is a rapidly evolving area that has gone through multiple phases. It’s also surprisingly tied to monitoring. Below is some historical context from the industry as well as what we’re doing in SRE.

Read more...

Netbox news

Written by ayounsi on Jun 28 2022, 11:27 AM.

Netbox is a tool used by all SREs, either directly or abstracted through cookbooks and various scripts. Managed by Infrastructure-Foundations, it went through a major (and much needed!) upgrade this past quarter, led by John Bond and myself, with the help of Riccardo.

Read more...

RPKI Origin Validation

Written by ayounsi on Aug 10 2020, 1:02 PM.

Since the late 90s, databases named Internet Routing Registries (IRR) have been trying to fulfill that (single) source of truth role. Unfortunately, they are subject to a lot of issues: fragmentation (many existing databases, not all equally well-maintained), security (some databases allow anyone to “claim” a prefix) and complexity (for the network operators). They also contain a lot of inaccurate data that have accumulated over time.

Read more...

Internal anycast

Written by ayounsi on Aug 7 2020, 9:48 AM.

This project brought two major changes to our infrastructure. Firstly, servers that used to be fronted by LVS for load balancing are now peering directly with our routers. Secondly, we have started using IP anycast for a highly critical service: recursive DNS.

Read more...