
Wikidough: consider regional Anycast addresses
Closed, DeclinedPublic

Assigned To
None
Authored By
cmooney
Mar 21 2026, 11:02 AM
Referenced Files
F73322522: image.png
Mar 21 2026, 11:02 AM
F73322526: image.png
Mar 21 2026, 11:02 AM
F73322514: image.png
Mar 21 2026, 11:02 AM

Description

While double-checking the routing today after we depooled Wikidough in esams, I noticed that eqiad picked up all of the redirected traffic, not drmrs as expected.

image.png (445×926 px, 72 KB)

image.png (445×926 px, 101 KB)

image.png (445×926 px, 143 KB)

This led me to wonder whether we should allocate additional, region-specific hostnames / IPs for the service. For example:

| Hostname | IPv4 | Announced from sites |
| --- | --- | --- |
| europe.wikimedia-dns.org | 185.71.138.139 | esams, drmrs |
| north-america.wikimedia-dns.org | 185.71.138.140 | eqiad, codfw, ulsfo |

Users could use those instead if they wanted to ensure their requests only go to servers in the region. Beyond those two I'm less sure. Anything like that would certainly complicate the depooling logic, and I'm not sure it makes sense to publish a region-specific address if it is only going to be served by one POP (e.g. eqsin or magru).
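To make the depooling concern concrete, here is a minimal sketch of the bookkeeping a regional scheme would imply. It only models the two example hostnames from the table above. The function names (`sites_after_depool`, `hosts_without_redundancy`) are hypothetical and do not correspond to any existing tooling.

```python
# Proposed regional hostnames and the sites that would announce them,
# copied from the example table in this task. Illustrative only.
REGIONS = {
    "europe.wikimedia-dns.org": {"esams", "drmrs"},
    "north-america.wikimedia-dns.org": {"eqiad", "codfw", "ulsfo"},
}


def sites_after_depool(regions, depooled):
    """Return, per hostname, the sites still serving it after a depool."""
    depooled = set(depooled)
    return {host: sites - depooled for host, sites in regions.items()}


def hosts_without_redundancy(regions, depooled, min_sites=2):
    """List hostnames left with fewer than `min_sites` serving sites."""
    remaining = sites_after_depool(regions, depooled)
    return sorted(host for host, sites in remaining.items() if len(sites) < min_sites)


if __name__ == "__main__":
    # Depooling esams leaves the European name served by drmrs alone.
    print(hosts_without_redundancy(REGIONS, {"esams"}))
```

With only two sites behind the European name, any single depool there removes all redundancy, which is part of why a one-POP region looks hard to justify.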

Anyway, just a thought: this kind of routing shouldn't happen often, but it can. I'm taking some inspiration here from how ntp.org offers generic pool names alongside region-specific ones.

Event Timeline

cmooney triaged this task as Low priority.

FWIW, the traffic was re-routed to eqiad rather than drmrs because of how we have the core routers set up. TL;DR: depooling the service (i.e. stopping the doh VMs from announcing the /32 IPs) did not cause the CRs in Amsterdam to stop announcing the /24 and /48 prefixes to the world. That is because other anycast IPs in the same range (the durum IPs) were still being announced locally in esams.

So the traffic kept coming to esams, which routed it over its direct transport link to eqiad (a shorter path than going out to the US and back to reach drmrs).
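The behaviour described above can be sketched as a small model: a site keeps announcing the covering prefix for as long as any host route inside it is still announced locally. The addresses below come from a documentation range and the service-to-IP mapping is made up for illustration. None of this reflects the actual router configuration.

```python
import ipaddress

# Covering prefix announced to the internet. Documentation range, illustrative only.
AGGREGATE = ipaddress.ip_network("198.51.100.0/24")

# Hypothetical local state at one site: service name -> anycast /32 announced by its hosts.
ESAMS_LOCAL = {
    "doh": "198.51.100.1/32",
    "durum": "198.51.100.2/32",
}


def depool(local_routes, service):
    """Return a copy of the local routes with one service's /32 withdrawn."""
    remaining = dict(local_routes)
    remaining.pop(service, None)
    return remaining


def announces_aggregate(local_routes):
    """True if any locally announced /32 still falls inside the aggregate."""
    return any(
        ipaddress.ip_network(route).subnet_of(AGGREGATE)
        for route in local_routes.values()
    )


if __name__ == "__main__":
    after = depool(ESAMS_LOCAL, "doh")
    # The durum /32 keeps the aggregate up, so the site still attracts doh
    # traffic from outside and has to forward it internally to another site.
    print(announces_aggregate(after))
```

In this model the aggregate is only withdrawn once every service in the range is depooled, which is why separate /24s per service or per region would be needed for a depool to move external traffic on its own.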

For now I'm going to decline this task. Thinking it through, we'd need separate /24s to make it work effectively, and if we can improve the situation described in T420821 it should hopefully be less of a concern.