Copied from T103921:
Given that we're in a closed, controlled environment and would only be harming our own DNS caches, I think we'd be better served by a more aggressive and redundant strategy: don't use LVS, list the recdns servers directly in resolv.conf, and have our resolver fire off parallel queries to all of them with aggressive timeouts, accepting the first legit response it gets. I think the reason glibc doesn't offer an option for this is to protect all the random caches in the world from excess load.
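To make the "parallel queries, aggressive timeouts, first legit response wins" idea concrete, here is a minimal Python sketch of the control flow only. It assumes a hypothetical per-server query coroutine (`query(server, name)`) that returns answer records for a legit response and raises for anything else (SERVFAIL, REFUSED, malformed packets, socket errors). The names `race_resolvers`, `QueryFn` and `per_query_timeout`, and the 0.2s default, are illustrative assumptions, not anything from glibc or dnspq.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# Hypothetical per-server query coroutine: (server, name) -> answer records.
# It should raise (e.g. LookupError or OSError) for anything that is not a
# legit response: SERVFAIL, REFUSED, malformed packets, socket errors.
QueryFn = Callable[[str, str], Awaitable[List[str]]]


async def race_resolvers(
    name: str,
    servers: Sequence[str],
    query: QueryFn,
    per_query_timeout: float = 0.2,  # assumed value, tune for the fleet
) -> List[str]:
    """Query every server in parallel; return the first legit answer."""

    async def ask(server: str) -> List[str]:
        # Aggressive per-server timeout: a slow cache simply loses the race.
        return await asyncio.wait_for(query(server, name), per_query_timeout)

    tasks = [asyncio.create_task(ask(s)) for s in servers]
    try:
        # Consume results in completion order, not in resolv.conf order.
        for next_done in asyncio.as_completed(tasks):
            try:
                return await next_done
            except (asyncio.TimeoutError, LookupError, OSError):
                continue  # this server failed or was too slow; keep waiting
        raise LookupError(f"all {len(servers)} servers failed for {name}")
    finally:
        # Cancel the losers and reap them so no task is left dangling.
        for t in tasks:
            t.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

With three recdns servers where one is slow, one returns SERVFAIL and one answers correctly, the call returns the correct answer as soon as it arrives and cancels the slow query. A production NSS module would implement the same idea in C with non-blocking sockets; this sketch only illustrates the racing logic.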
In related IRC discussion last night, @MoritzMuehlenhoff pointed out an existing NSS module that does something similar: https://github.com/grobian/dnspq . That code is a little dated and only supports A records (it falls back to glibc for anything else; we'd need at least AAAA support added). But it's very short and simple. We could audit it, update it in a GitHub fork so it does exactly what we need (and send a pull request upstream in case the author wants the changes too), and then experiment with packaging it and using it on the fleet.