Wikidough currently does not perform any DNSSEC validation through pdns-recursor (the dnssec configuration option is set to off). To be good internet citizens, we should enable DNSSEC for Wikidough by fetching and validating DNSSEC signatures, just like other major resolver services. However, there are a few concerns we should address first, and we also need to decide which level of DNSSEC validation to support.
Even though Wikidough provides confidentiality and may further check the integrity of DNS records from auth servers by validating the DNSSEC signatures, clients like Firefox do not perform end-to-end validation and therefore rely on the TRR (trusted recursive resolver; Wikidough) to validate the records for them. Cloudflare as the TRR in Firefox does the same, so this seems to be the standard and clients are already trusting their TRRs. Still, it is important to note that there is currently no end-to-end validation of these records in the browsers themselves unless external extensions/add-ons are used.
As per https://support.mozilla.org/en-US/kb/dns-over-https-doh-faqs#w_do-you-validate-dnssec, if Wikidough (or some other TRR) returns a SERVFAIL response because it fails to validate the DNSSEC signature of a misconfigured DNSSEC domain (see below), Firefox falls back to the native resolver to complete the request. (This behaviour is defined by the network.trr.mode preference, set to 2 by default. If set to 3, Firefox will only use TRR and will never use the native resolver. See https://wiki.mozilla.org/Trusted_Recursive_Resolver for more details.) The user is not made aware of this transition from TRR to the native resolver. So while the user may think that they are resolving domains through Wikidough, whenever Firefox gets a SERVFAIL the domain name resolution may happen through the native resolver instead, which may not be desired as the native resolver may leak the query in plain text.
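The fallback described above can be sketched as a small decision function. This is a simplified model for discussion, not Firefox code: the function name, the labels TRR_FIRST and TRR_ONLY, and the return strings are made up for illustration, and only modes 2 and 3 and the SERVFAIL case are modelled.

```python
from enum import IntEnum


class TrrMode(IntEnum):
    # Illustrative labels for the two network.trr.mode values discussed here.
    TRR_FIRST = 2  # default: try the TRR, fall back to the native resolver
    TRR_ONLY = 3   # only use the TRR, never the native resolver


def resolution_path(trr_mode: int, trr_rcode: str) -> str:
    """Return who ends up answering the lookup in this simplified model.

    "trr"    -> the answer comes from the TRR (e.g. Wikidough)
    "native" -> silent fallback to the native resolver (possibly plain text)
    "failed" -> the lookup fails with no fallback
    """
    if trr_rcode != "SERVFAIL":
        return "trr"
    if trr_mode == TrrMode.TRR_FIRST:
        return "native"
    if trr_mode == TrrMode.TRR_ONLY:
        return "failed"
    raise ValueError(f"network.trr.mode={trr_mode} is not modelled here")


if __name__ == "__main__":
    for mode in (TrrMode.TRR_FIRST, TrrMode.TRR_ONLY):
        print(int(mode), resolution_path(mode, "SERVFAIL"))
```

With the default mode 2, a SERVFAIL from the TRR ends in "native", which is the silent transition that concerns us.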
Given that outages due to misconfigured DNSSEC domains are all too common (see https://ianix.com/pub/dnssec-outages.html for a list) and that Firefox will default to the native resolver in case it gets a SERVFAIL response, we should not enable strict validation in pdns-recursor, where all queries are validated regardless of the client's intention to validate, and a SERVFAIL response is returned when validation fails. Firefox has no way of distinguishing a SERVFAIL that resulted from a misconfigured auth server from one caused by an actual bogus response.
To start with, we should enable DNSSEC for Wikidough as an experiment. pdns-recursor has a log-fail mode (see https://docs.powerdns.com/recursor/dnssec.html), in which it validates all DNSSEC data it retrieves from authoritative servers and logs the validation result, irrespective of whether a client like Firefox asked for it. However, it does not send a SERVFAIL response (or set the AD bit) unless the client set the AD and/or DO bits. This allows us to experiment with DNSSEC validation and measure what percentage of validations actually fail, without affecting the experience for Firefox/Android users. The other option is "full-blown validation", the validate mode in pdns-recursor, which validates all queries and responds with a SERVFAIL on failure, irrespective of whether the client requested the DNSSEC records and/or the validation.
- If we set log-fail, we will always perform validation and log the result, and send SERVFAIL for an invalid response if and only if the client set +AD or +DO. Since Firefox does not set these bits, nothing changes for its users. Users who care about DNSSEC can set the bits and let Wikidough perform validation for them.
- If we enable strict validation, we will always perform validation and send SERVFAIL for an invalid response irrespective of the client's request. But in case of misconfigured auth servers, users on Firefox will get SERVFAIL responses and then look up the domain using their native resolver.
- Or, we can enable strict validation and ask users to change the network.trr.mode preference to 3. This makes their entire experience more secure even beyond the DNSSEC issue, since in this mode a failed TRR lookup fails completely with no fallback. However, requiring special configuration may not scale to all users.
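To make the difference between the two modes concrete, here is a minimal sketch of the response code the recursor would return for a name, following the descriptions above. It is a model of the documented behaviour as summarised in this task, not pdns-recursor code; the function and parameter names are hypothetical, and "NOERROR" stands in for whatever answer would normally be returned.

```python
def recursor_rcode(mode: str, client_set_ad_or_do: bool, bogus: bool) -> str:
    """Model the rcode for one query under the two modes compared above.

    mode                -- "log-fail" or "validate" (the dnssec setting)
    client_set_ad_or_do -- True if the query carried the AD and/or DO bit
    bogus               -- True if DNSSEC validation of the answer failed
    """
    if mode == "validate":
        # Strict: every client gets SERVFAIL when validation fails.
        return "SERVFAIL" if bogus else "NOERROR"
    if mode == "log-fail":
        # Validation always runs and failures are logged, but only clients
        # that asked for DNSSEC see the SERVFAIL.
        if bogus and client_set_ad_or_do:
            return "SERVFAIL"
        return "NOERROR"
    raise ValueError(f"mode {mode!r} is not modelled here")


if __name__ == "__main__":
    # Compare both modes for a domain whose validation fails.
    for mode in ("log-fail", "validate"):
        for asked in (False, True):
            print(mode, "AD/DO set" if asked else "no bits",
                  recursor_rcode(mode, asked, bogus=True))
```

For a Firefox-like client that sets neither bit, only the validate mode produces a SERVFAIL for a misconfigured domain, which is what then triggers the native resolver fallback.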
The majority of Wikidough users will be on Firefox and Chrome (once it has proper DoH support), so log-fail is an acceptable compromise for now; more advanced users who run their own stub resolvers can set the bits accordingly, in which case Wikidough will respond with the validation data. Depending on how this experiment goes, we can switch from log-fail to complete validation, or switch to process-no-validate, in which we send the DNSSEC RRSIGs in the response but do not perform any validation and don't set the AD bit.
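For reference, "setting the bits" means the AD flag in the DNS header and the DO flag in an EDNS0 OPT record. The following is a rough sketch of how a stub resolver could build such a query in wire format using only the standard library. The helper names and the 1232-byte UDP payload size are choices made for this example, and nothing here is specific to Wikidough.

```python
import struct

RD_FLAG = 0x0100   # recursion desired (header flags)
AD_FLAG = 0x0020   # authentic data (header flags)
DO_FLAG = 0x8000   # DNSSEC OK (flags part of the OPT record's TTL field)
OPT_TYPE = 41      # EDNS0 OPT pseudo-record type


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = 1, set_ad: bool = True,
                set_do: bool = True, udp_size: int = 1232) -> bytes:
    """Build a DNS query message, optionally with the AD and DO bits set."""
    flags = RD_FLAG | (AD_FLAG if set_ad else 0)
    arcount = 1 if set_do else 0
    # ID 0, one question, no answer/authority records.
    header = struct.pack("!HHHHHH", 0, flags, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    opt = b""
    if set_do:
        # OPT RR: root name, type 41, class = UDP payload size,
        # TTL = extended rcode/version/flags, empty RDATA.
        opt = b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_size, DO_FLAG, 0)
    return header + question + opt


if __name__ == "__main__":
    print(build_query("example.org").hex())
```

A message like this can be sent as the body of a DoH POST request with the application/dns-message content type; the sketch only builds the bytes and does not send anything.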
Or we can continue to keep DNSSEC disabled!