
Intermittent DNS failures in beta labs regularly trigger a bunch of puppet failures
Closed, Duplicate · Public

Description

Periodically we get a batch of Puppet failures in beta labs that appear to be caused by temporary name resolution failures.

I'm unable to access the DNS server to diagnose the problem further. How can we make name resolution rock-solid reliable? It's a big waste of time to keep chasing down Puppet failures that we can't do anything about.

Is there some way that we can either (a) fix DNS so it doesn't break regularly, (b) give me the access I need to track down the problem, or (c) set up our own DNS infrastructure on labs, within our sphere of influence (in Release Engineering), so that we can control our own destiny?

Event Timeline

mmodell raised the priority of this task from to Needs Triage.
mmodell updated the task description.
mmodell added a project: acl*sre-team.
mmodell added subscribers: mmodell, yuvipanda, coren and 2 others.