This came up in a conversation with @faidon a couple of weeks ago. Our current scheme is a direct DYNA record for every public service hostname, which means separate A-records with TTL=600 (more on reducing that in T140365, which I think this will change a little).
So a fresh lookup on any given service hostname always returns distinct RRs for caches to store, with examples like:
en.wikipedia.org 600 A 192.0.2.1 ; +ednsc-splitting in all such caches
fr.wikipedia.org 600 A 192.0.2.1 ; +ednsc-splitting in all such caches
fr.wiktionary.org 600 A 192.0.2.1 ; +ednsc-splitting in all such caches
For domain names which are very popular, such as en.wikipedia.org, this scheme works out fine, and large DNS caches tend to keep them hot. However, for less-used languages and/or projects (we have something on the order of 3000 such combinations possible), it's not so efficient, as their records are not likely to be re-used from many DNS caches before they expire.
Using CNAMEs with longer TTLs pointing at a central short-TTL lb record may be more efficient on net for all of the other names. In such a scheme we might do something like:
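To put a rough number on "not likely to be re-used before they expire", here is a minimal sketch of the per-request hit ratio for a single name in a single cache. It assumes Poisson request arrivals, a cache that honors the TTL exactly and never prefetches or evicts early, and it ignores edns-client-subnet splitting entirely. The function name and the example request rates are illustrative only, not measured data.

```python
def hit_ratio(requests_per_hour, ttl_seconds=600):
    """Fraction of client requests answered from cache for one name.

    Assumes Poisson arrivals at the given rate and a plain TTL cache:
    each miss starts a TTL window, and the next miss is the first
    request after that window closes. With rate lam and TTL T this
    gives hit ratio = lam*T / (1 + lam*T).
    """
    lam = requests_per_hour / 3600.0  # requests per second
    load = lam * ttl_seconds          # expected requests per TTL window
    return load / (1.0 + load)


if __name__ == "__main__":
    # Hypothetical request rates seen by one cache, for illustration.
    for label, per_hour in [("popular name", 3600), ("rare name", 1)]:
        print(f"{label}: {hit_ratio(per_hour):.3f}")
```

Under these assumptions a name requested about once per second is nearly always hot at TTL=600, while a name requested about once per hour is served from cache only around one time in seven.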
en.wikipedia.org 86400 CNAME text-lb.wikiMedia.org ; globally-static, no ednsc-splitting
fr.wikipedia.org 86400 CNAME text-lb.wikiMedia.org ; globally-static, no ednsc-splitting
fr.wiktionary.org 86400 CNAME text-lb.wikiMedia.org ; globally-static, no ednsc-splitting
text-lb.wikiMedia.org 600 A 192.0.2.1 ; +ednsc-splitting in all such caches
or:
en.wikipedia.org 86400 CNAME text-lb.wikiPedia.org ; globally-static, no ednsc-splitting
fr.wikipedia.org 86400 CNAME text-lb.wikiPedia.org ; globally-static, no ednsc-splitting
fr.wiktionary.org 86400 CNAME text-lb.wikiPedia.org ; globally-static, no ednsc-splitting
text-lb.wikiPedia.org 600 A 192.0.2.1 ; +ednsc-splitting in all such caches
(which is potentially more efficient for the most-popular-by-far project domain, at the cost of perhaps some pedants complaining that e.g. wikivoyage traffic mentions the wikipedia.org domain name at this low technical level).
There are some tricky tradeoffs to work through about different DNS cache scenarios:
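As a back-of-the-envelope way to compare the direct and CNAME schemes for one cache, the sketch below estimates upstream (cache-miss) lookups per second under each. It is only a model under stated assumptions: Poisson arrivals per hostname, a plain TTL cache with no prefetch or early eviction, no edns-client-subnet splitting, and the CNAME and the shared A record counted as separate upstream lookups. All function names and request rates are hypothetical.

```python
def miss_rate(request_rate, ttl):
    """Upstream lookups per second for one cached RRset.

    With Poisson arrivals at request_rate (per second) and a plain
    TTL cache, misses occur at rate lam / (1 + lam * ttl).
    """
    if request_rate <= 0:
        return 0.0
    return request_rate / (1.0 + request_rate * ttl)


def direct_scheme_misses(rates, a_ttl=600):
    """Current scheme: every hostname has its own short-TTL A record."""
    return sum(miss_rate(r, a_ttl) for r in rates)


def cname_scheme_misses(rates, cname_ttl=86400, a_ttl=600):
    """CNAME scheme: long-TTL CNAME per hostname, one shared A record.

    The shared A record is needed by every client request, so it is
    kept hot by the aggregate request rate across all hostnames.
    """
    cname_misses = sum(miss_rate(r, cname_ttl) for r in rates)
    shared_a_misses = miss_rate(sum(rates), a_ttl)
    return cname_misses + shared_a_misses


if __name__ == "__main__":
    # Hypothetical mix: one name at 1 req/s, 100 names at 1 req/hour.
    rates = [1.0] + [1.0 / 3600.0] * 100
    print("direct:", direct_scheme_misses(rates))
    print("cname: ", cname_scheme_misses(rates))
```

In this toy mix the CNAME scheme needs far fewer upstream lookups, because the rare names ride on the A record kept hot by the popular one. The model says nothing about the extra round trip a cold CNAME chain can cost, or about how edns-client-subnet splitting changes the picture.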
- private vs widely-shared caches
- how this affects cache splits on edns-client-subnet
- whether the gains for less-popular hostnames (among clients of a given cache) sufficiently offset the possible minor/rare regression for the more-popular
- what kind of role regional patterns of hostname popularity play
- how we prioritize the p50 vs p99 kinds of tradeoffs here, across all global domains/traffic
The second scheme, where the canonical LB name lives in the most-popular project's domain, may involve fewer of these tradeoffs.
[Note: There was a time in the distant past when we also used a CNAME-based scheme, but this scheme above differs from the old one]