Currently our GeoDNS solution only allows hard, immediate changes to the pooled state of datacenters, with only the TTL providing any smoothness to the traffic movement.
As we move from Varnish to ATS backends, we are losing tertiary (and beyond) fallback caching at the core sites while edges repool, which previously helped soften the impact of these abrupt shifts.
We need to be able to repool sites more smoothly to reduce the initial impact of cold cache misses (especially if the site was depooled for a day or more, or everything was restarted, and the caches are truly and fully cold).
The design should let us slowly bring in increasingly large, consistent subsets of the total geographic area assigned to a datacenter over a configurable window of time. For example, when repooling a cold eqsin, we might ramp into full pooling over a 1-hour period, during which the geographic radius mapped to the DC slowly increases without flapping any client networks into and back out of the set.
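To make the "growing radius without flapping" idea concrete, here is a minimal sketch in Python. It is not gdnsd code and does not reflect any existing gdnsd API; the function names, the linear ramp, and the coordinates are all assumptions chosen for illustration. The key property is that each client location has a fixed distance to the DC while the active radius only ever grows, so a client that has been mapped in is never mapped back out during the ramp.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def ramp_fraction(elapsed_s, window_s):
    """Fraction of the ramp completed, clamped to [0, 1] (assumed linear)."""
    if window_s <= 0:
        return 1.0
    return max(0.0, min(1.0, elapsed_s / window_s))

def is_mapped(dc, client, full_radius_km, elapsed_s, window_s):
    """True if the client falls inside the DC's current (growing) radius.

    The client's distance is constant and the radius is non-decreasing in
    elapsed_s, so membership never flips from True back to False.
    """
    radius_km = full_radius_km * ramp_fraction(elapsed_s, window_s)
    return haversine_km(*dc, *client) <= radius_km

# Hypothetical, approximate coordinates for illustration only.
dc = (1.35, 103.82)         # a DC near Singapore
near = (3.14, 101.69)       # a client a few hundred km away
far = (-6.20, 106.85)       # a client roughly 900 km away
FULL_RADIUS_KM = 3000.0     # assumed radius of the DC's normal service area
WINDOW_S = 3600             # the 1-hour ramp from the example above

for t in (0, 600, 1800, 3600):
    print(t, is_mapped(dc, near, FULL_RADIUS_KM, t, WINDOW_S),
          is_mapped(dc, far, FULL_RADIUS_KM, t, WINDOW_S))
```

A linear radius makes the covered area grow quadratically, so a real implementation might instead ramp by area or by expected traffic share. It would also need to answer per client network rather than per point. The sketch only shows the monotonic-subset property.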
There will be some preliminary work to update gdnsd's dynamic-records framework to support this and other upcoming needs (runtime-dynamic pools of IPs, and future DNSSEC concerns interplaying with that). After that comes the implementation of a newer, better geoip system within that framework that meets the needs of this task.