
Make authdns-update compatible with local emergency changes
Closed, Duplicate · Public

Description

We should improve our current support [1] for deploying an emergency DNS change when other services it depends on are broken.

The final outcome should be:

  • A non-working Gerrit or DNS should not prevent us from deploying a DNS change.
  • As long as there is IP reachability between the authdns servers, and reachability from outside to at least one authdns server, we should be able to ssh into any authdns server, make an emergency commit in /srv/authdns/git and run authdns-update --SOME_OPTION to tell the script to sync from this local master instead of the usual path.
  • The procedure for returning to normal operations via a Gerrit patch should be clearly documented.
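A minimal Python sketch of the two sync modes described above, assuming the script only needs to choose between "deploy the local master as-is" and "fetch master from Gerrit". It is hypothetical: the real option is still written as --SOME_OPTION in this task, so here it is just a boolean `emergency` parameter, and `plan_sync`, `MASTER_REFSPEC` and the example Gerrit URL are illustrative names, not part of the existing authdns-update scripts. It only builds the git commands instead of running them.

```python
from typing import List

# Checkout path taken from the task description.
GIT_DIR = "/srv/authdns/git"
# Hypothetical helper constant: an explicit refspec that stores the
# fetched master in the remote-tracking ref origin/master.
MASTER_REFSPEC = "+refs/heads/master:refs/remotes/origin/master"


def plan_sync(gerrit_url: str, emergency: bool) -> List[List[str]]:
    """Return the git commands one sync step would run (hypothetical)."""
    if emergency:
        # Emergency mode: the operator has already committed to the local
        # master, so Gerrit is not contacted; just confirm master exists.
        return [["git", "-C", GIT_DIR, "rev-parse", "--verify", "master"]]
    # Normal mode: fetch master from Gerrit. Without the explicit refspec,
    # fetching from a URL only writes FETCH_HEAD and origin/master goes stale.
    return [
        ["git", "-C", GIT_DIR, "fetch", gerrit_url, MASTER_REFSPEC],
        # Refuse to move master if it has diverged (e.g. an emergency commit).
        ["git", "-C", GIT_DIR, "merge", "--ff-only", "origin/master"],
    ]


if __name__ == "__main__":
    # Placeholder URL, not the real Gerrit project path.
    for cmd in plan_sync("https://gerrit.example.org/r/dns", emergency=False):
        print(" ".join(cmd))
```

The `--ff-only` merge is a deliberate choice in this sketch: if the local master still carries an emergency commit that is not in Gerrit's history, the merge refuses to proceed instead of silently discarding it, which is the point where the documented return-to-normal procedure would take over.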

Current limitations:

  • origin/master is not correctly updated on the hosts: running git status reports Your branch is ahead of 'origin/master' by 463 commits. This is because we execute git fetch $REMOTE, with $REMOTE being either the Gerrit URL (without specifying a branch) or the ssh path used by authdns-update. In both cases the remote-tracking ref origin/master is not updated, because fetching from a URL or path without a refspec only writes FETCH_HEAD. We should fix the current scripts to maintain a clean local checkout with all references in order.
  • The various scripts that authdns-update calls in cascade should be updated accordingly, to add the new option and to behave correctly in both scenarios (working Gerrit and local emergency patch).
  • We currently rely on working DNS for this. The authdns-update script should not depend on it and should be able to run purely by IP, with the addresses stored either in the existing configuration file (/etc/wikimedia-authdns.conf) or directly in /etc/hosts.
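To drop the DNS dependency, peer addresses could be kept as literal IPs in a configuration file and turned into ssh URLs without ever calling the resolver. The sketch below assumes a made-up `[peers]` section, since the actual format of /etc/wikimedia-authdns.conf is not described here; the peer names, the `authdns` ssh user and the documentation-range addresses are placeholders, and `load_peer_ips` / `git_ssh_url` are hypothetical helpers.

```python
import configparser
import ipaddress
from typing import Dict

# Hypothetical config layout: the real format of
# /etc/wikimedia-authdns.conf is not described in the task.
EXAMPLE_CONF = """
[peers]
authdns1 = 192.0.2.10
authdns2 = 198.51.100.20
"""


def load_peer_ips(text: str) -> Dict[str, str]:
    """Map peer names to literal IPs, never calling the resolver."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    peers = dict(parser["peers"])
    for name, addr in peers.items():
        try:
            # Reject host names so nothing later depends on DNS.
            ipaddress.ip_address(addr)
        except ValueError:
            raise ValueError(f"{name}: {addr!r} is not a literal IP") from None
    return peers


def git_ssh_url(user: str, ip: str, path: str) -> str:
    """Build an ssh:// git URL by IP; IPv6 literals need square brackets."""
    addr = ipaddress.ip_address(ip)
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return f"ssh://{user}@{host}{path}"


if __name__ == "__main__":
    for name, ip in load_peer_ips(EXAMPLE_CONF).items():
        # "authdns" is a placeholder ssh user, not a known account.
        print(name, git_ssh_url("authdns", ip, "/srv/authdns/git"))
```

Validating every entry as an IP at load time means a host name accidentally left in the file fails loudly up front, rather than causing a DNS lookup during an outage.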

The existing documentation [1] should remain valid for the worst-case scenario, in which we can somehow reach the authdns servers but they can't talk to each other.

[1] https://wikitech.wikimedia.org/wiki/DNS#Update_DNS_if_gerrit_or_DNS_are_down_(on_an_emergency_only)

Event Timeline

Volans created this task. Mar 27 2019, 3:00 PM
Restricted Application added a project: Operations. Mar 27 2019, 3:00 PM
Restricted Application added a subscriber: Aklapper.
ema moved this task from Triage to DNS Infra on the Traffic board. Apr 1 2019, 9:31 AM
fgiunchedi triaged this task as Medium priority. Apr 9 2019, 8:38 AM
BBlack added a comment. Dec 6 2019, 2:02 PM

Sorry, I hadn't remembered that we had this existing ticket. I will merge it into the newer one, since that one already has patches and some deeper context, and copy the main text over.