Steps to replicate the issue (include links if applicable):
- Start with two DNS records, one of type A and one of type CNAME. The CNAME record should point to the A record. Make sure both are managed by OpenTofu.
- Create a Tofu change that swaps the two records: the CNAME record becomes an A record, and the A record becomes a CNAME record pointing to the new A record.
- Run tofu apply
What happens?:
- the OpenStack CLI and Horizon show the correct new values
- DNS queries for both records fail
- pdns logs show the error "got a CNAME referral (from cache) that causes a loop"
- restarting the pdns recursor fixes the issue
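
One possible explanation, not confirmed in this report, is that the recursor still had the old CNAME in its cache when it learned the new one. The Python sketch below only illustrates that idea: the record names and the dict standing in for a cache are hypothetical, and `follow_cnames` is not how pdns works internally.

```python
def follow_cnames(name, cnames):
    """Follow CNAME records starting at `name`.

    `cnames` maps an owner name to its CNAME target.
    Returns (final_name, names_visited) or raises RuntimeError on a loop.
    """
    seen = []
    while name in cnames:
        if name in seen:
            raise RuntimeError("CNAME loop: " + " -> ".join(seen + [name]))
        seen.append(name)
        name = cnames[name]
    return name, seen


# Hypothetical names. Before the swap: b is a CNAME to a (a holds the A record).
before = {"b.example.": "a.example."}
# After the swap: a is a CNAME to b (b now holds the A record).
after = {"a.example.": "b.example."}
# Assumed failure mode: a stale cached entry coexists with a fresh one.
mixed = {**before, **after}

print(follow_cnames("b.example.", before))
print(follow_cnames("a.example.", after))
try:
    follow_cnames("a.example.", mixed)
except RuntimeError as exc:
    print(exc)
```

Either record set resolves on its own. The loop only appears when entries from both states are present at once, which would be consistent with a restart (and the resulting empty cache) fixing the issue.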
What should have happened instead?:
- the new records should resolve correctly without a manual restart of the pdns recursor
Other information:
This happened during T352206: [toolsdb] Upgrade to MariaDB 10.6. The Tofu change was merge_requests/142.
It should be easy to reproduce with test records, to see if it consistently fails or if it's a race condition. I haven't tried to reproduce it yet.
Full pdns error log:
Nov 25 13:51:02 cloudservices1005 pdns-recursor[2773294]: msg="Sending SERVFAIL during resolve" error="got a CNAME referral (from cache) that causes a loop" subsystem="syncres" level="0" prio="Notice" tid="3" ts="1732542662.251" ecs="" mtid="102916031" proto="udp" qname="tools.db.svc.wikimedia.cloud" qtype="A" remote="185.15.56.63:39724"
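
The log line above is a series of key="value" fields, so it can be split up when searching for similar failures. A small Python sketch, where `parse_fields` is a made-up helper that assumes values never contain an embedded double quote:

```python
import re

# The structured part of the log line above, copied verbatim.
LOG_LINE = (
    'msg="Sending SERVFAIL during resolve" '
    'error="got a CNAME referral (from cache) that causes a loop" '
    'subsystem="syncres" level="0" prio="Notice" tid="3" '
    'ts="1732542662.251" ecs="" mtid="102916031" proto="udp" '
    'qname="tools.db.svc.wikimedia.cloud" qtype="A" '
    'remote="185.15.56.63:39724"'
)


def parse_fields(line):
    """Return a dict of all key="value" pairs found in `line`."""
    return dict(re.findall(r'(\w+)="([^"]*)"', line))


fields = parse_fields(LOG_LINE)
print(fields["qname"], fields["qtype"], fields["error"])
```

Filtering on `error` and `qname` this way would show whether only the swapped records were affected.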