TCP traffic ramped up over the past couple of days on authdns1001 and authdns2001, leading to the following errors (logged in /var/log/daemon.log):
authdns2001 gdnsd[3268]: TCP DNS: accept() failed: Too many open files
We only noticed the problem because of the root partition disk space alerts :(
The max open files limit for gdnsd was too low (soft limit 1024, hard limit 524288):
Max open files 1024 524288 files
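The line above matches the format of the `Max open files` row in `/proc/<pid>/limits`, whose columns are the soft limit, the hard limit and the unit. Below is a minimal Python sketch of extracting both values for a running process. It assumes a Linux `/proc` layout, and `parse_nofile_line` and `read_nofile_limits` are hypothetical helper names:

```python
import os
import re
from typing import Optional, Tuple

# Matches the "Max open files" row of /proc/<pid>/limits, e.g.
# "Max open files            1024                 524288               files"
_NOFILE_RE = re.compile(r"^Max open files\s+(\S+)\s+(\S+)\s+files\s*$")


def _to_int(value: str) -> Optional[int]:
    """Convert a limit column to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_nofile_line(line: str) -> Optional[Tuple[Optional[int], Optional[int]]]:
    """Return (soft, hard) if `line` is the 'Max open files' row, else None."""
    match = _NOFILE_RE.match(line.strip())
    if match is None:
        return None
    return _to_int(match.group(1)), _to_int(match.group(2))


def read_nofile_limits(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Read the soft and hard open-files limits of a running process."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            parsed = parse_nofile_line(line)
            if parsed is not None:
                return parsed
    raise ValueError(f"no 'Max open files' row in /proc/{pid}/limits")


if __name__ == "__main__":
    # Example: inspect this script's own process (pass gdnsd's PID instead).
    print(read_nofile_limits(os.getpid()))
```

Checking the soft limit (the first value) is what matters here: `accept()` fails with EMFILE ("Too many open files") once the process reaches it.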
@Vgutierrez applied a hotfix by adding LimitNOFILE=500000 to the gdnsd systemd unit and restarting the daemons, with Puppet disabled.
This task has been created to track two things:
- Make the LimitNOFILE=500000 fix permanent (via a Puppet-managed override, since the systemd unit is shipped with the package?)
- Review alerting/monitoring on the DNS hosts and figure out whether we need more alarms.
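For the monitoring item, one possible alarm compares the number of file descriptors gdnsd currently holds against its soft open-files limit. That catches exhaustion before `accept()` starts failing. Below is a minimal Python sketch in the style of a Nagios/Icinga plugin (exit codes 0/1/2/3). It assumes Linux, root privileges (needed to read another process's fd table and limits), and that the gdnsd PID is passed on the command line. The script name `check_nofile.py` and the 80%/90% thresholds are hypothetical:

```python
import os
import resource
import sys

# Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(open_fds: int, soft_limit: int, warn: float = 0.8, crit: float = 0.9) -> int:
    """Map fd usage, as a fraction of the soft limit, to a plugin status."""
    if soft_limit == resource.RLIM_INFINITY:
        return OK
    if soft_limit <= 0:
        return CRITICAL
    usage = open_fds / soft_limit
    if usage >= crit:
        return CRITICAL
    if usage >= warn:
        return WARNING
    return OK


def check_pid(pid: int) -> int:
    """Print a one-line status for `pid` and return the plugin exit code."""
    try:
        # Each entry in /proc/<pid>/fd is one open descriptor.
        open_fds = len(os.listdir(f"/proc/{pid}/fd"))
        # prlimit() with no new limits just reads (soft, hard) for that PID.
        soft, _hard = resource.prlimit(pid, resource.RLIMIT_NOFILE)
    except (OSError, ValueError) as err:
        print(f"UNKNOWN: cannot inspect pid {pid}: {err}")
        return UNKNOWN
    status = classify(open_fds, soft)
    print(f"{LABELS[status]}: pid {pid} has {open_fds} open fds (soft limit {soft})")
    return status


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_nofile.py <pid>")
        sys.exit(UNKNOWN)
    sys.exit(check_pid(int(sys.argv[1])))
```

With the old 1024 soft limit, this would have gone WARNING at about 820 descriptors. That is well before the accept() errors and the resulting log growth that triggered the disk space alerts.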