Currently, here's how users get 'mapped' to a specific CDN point of presence:
* We get a DNS request from that user's resolver.
* We use MaxMind to geolocate that resolver's IP address
* We return the IP address of the geographically-closest edge location
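As a rough illustration of the current behaviour, here is a minimal sketch of "geolocate the resolver, then pick the geographically closest site". Everything named here is an assumption made for illustration: the site names and coordinates are placeholders, `RESOLVER_LOCATIONS` stands in for a MaxMind database lookup, and the production GeoDNS configuration may well select sites differently (for example via a precomputed map rather than a distance calculation).

```python
import math

# Hypothetical edge sites with approximate (latitude, longitude) placeholders.
EDGE_SITES = {
    "site-west": (37.77, -122.42),
    "site-east": (39.04, -77.49),
    "site-eu": (52.37, 4.90),
}

# Hypothetical stand-in for a geolocation database lookup, keyed by resolver IP.
RESOLVER_LOCATIONS = {
    "192.0.2.1": (48.85, 2.35),       # a resolver located near Paris
    "198.51.100.1": (34.05, -118.24), # a resolver located near Los Angeles
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_site(location, sites=EDGE_SITES):
    """Return the name of the site geographically closest to `location`."""
    return min(sites, key=lambda name: haversine_km(location, sites[name]))


def site_for_resolver(resolver_ip):
    """Map a resolver IP to a site, using only the resolver's own location."""
    return closest_site(RESOLVER_LOCATIONS[resolver_ip])
```

Note that the user's own address never enters the calculation: only the resolver's location does, which is where the issues with this approach come from.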
This has the following issues:
* The resolver IP address used doesn't necessarily correlate with user location (for instance, the user might use Google Public DNS or a similar service)
* No IP geolocation service has perfect accuracy
* Geographically-closest is often but isn't always lowest-latency
Improving on this has been discussed many times over the years. Here's a proposal for how we might do so, based on both [[ https://docs.google.com/document/d/1QQ8qgmeiJNy3hDnGgIsqZSUZXZoe4n2mY4x4gVX2IYY/edit# | Arzhel's design document from 2021 ]] and [[ https://wm-bot.wmflabs.org/libera_logs/%23wikimedia-traffic/20220803.txt | a conversation on #wikimedia-traffic IRC a few weeks ago ]].
* Create site-specific subdomains for all of our sites, perhaps something like `edge-timing-ulsfo.wikimedia.org` (although I think we probably want to use a different domain name for such purposes; see also T263847 and T292866)
* Configure those domains to serve NEL response headers setting both `failure_fraction` and `success_fraction` to 1.0, with reports going into our existing NEL pipeline (see T257527). Configure a long TTL on that policy, so that clients still hold it when a failure occurs and actually report it.
* Write and deploy some client-side JS that (with a small probability on each pageview?) fetches a small piece of content from each of our edge sites on those domains. (It's possible we could reuse some parts of [[ https://github.com/Netflix/probnik | Probnik ]] for this, although the tool itself seems to have been stagnant since 2019.)
* Complete T304373 so that NEL data is available in Analytics
* Design and implement a pipeline in Analytics that aggregates NEL reports, taking into account how many samples we get from a network, decaying the weight of older samples, etc., to generate something like our geo-maps file (but for networks, not countries). The file format is TBD; it's possible it would be best to use something like [[ https://github.com/maxmind/mmdbwriter | mmdbwriter ]], for instance.
* Use that file to serve GeoDNS responses, after some careful evaluation of the impact of this change (probably with a few iterations of improvement/fixes)
* Later on: use Alt-Svc to 'solve' the cases where the resolver location<>user location mapping is very wrong (see T208242)
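To make the NEL and aggregation steps above a bit more concrete, here is a minimal sketch under stated assumptions. `nel_headers` builds the `Report-To` and `NEL` response headers with both fractions set to 1.0 and a long `max_age`; the collector URL passed to it is a placeholder. `best_site_per_network` is a hypothetical aggregation that assumes each sample has already been reduced to a `(network, site, elapsed_ms, timestamp)` tuple (how the client network gets attached to a report is not decided here), weights samples by an exponential decay with a configurable half-life, and ignores network/site pairs with too few samples. The half-life and minimum sample count are illustrative defaults, not proposed values.

```python
import json
from collections import defaultdict


def nel_headers(report_url, max_age_s=30 * 24 * 3600):
    """Build Report-To and NEL headers that report every success and failure."""
    report_to = {
        "group": "nel",
        "max_age": max_age_s,
        "endpoints": [{"url": report_url}],
    }
    nel = {
        "report_to": "nel",
        "max_age": max_age_s,  # long TTL so clients keep the policy around
        "success_fraction": 1.0,
        "failure_fraction": 1.0,
    }
    return {"Report-To": json.dumps(report_to), "NEL": json.dumps(nel)}


def best_site_per_network(samples, now, half_life_s=7 * 24 * 3600, min_samples=3):
    """Pick the lowest-latency site per network from decayed timing samples.

    `samples` is an iterable of (network, site, elapsed_ms, timestamp) tuples,
    with timestamps and `now` in seconds.
    """
    # (network, site) -> [weighted latency sum, total weight, sample count]
    acc = defaultdict(lambda: [0.0, 0.0, 0])
    for network, site, elapsed_ms, ts in samples:
        weight = 0.5 ** ((now - ts) / half_life_s)  # older samples count less
        entry = acc[(network, site)]
        entry[0] += weight * elapsed_ms
        entry[1] += weight
        entry[2] += 1

    best = {}
    for (network, site), (weighted_sum, total_weight, count) in acc.items():
        if count < min_samples or total_weight == 0.0:
            continue  # not enough (or only vanishingly old) data
        mean = weighted_sum / total_weight
        if network not in best or mean < best[network][1]:
            best[network] = (site, mean)
    return {network: site for network, (site, _) in best.items()}
```

The resulting network-to-site mapping is the kind of data that would then be written out in whatever file format is chosen for serving GeoDNS responses.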