
dns repository left in a broken state
Closed, Resolved · Public

Description

The dns repository was left in a broken state that prevents gdnsd from being reloaded.

According to https://gerrit.wikimedia.org/r/c/operations/dns/+/623465/4#message-e5e66fc58775da9ffc30b9d2595a908b2e790164 that change makes gdnsd fail to reload.

This is the actual error thrown by authdns-update:

Traceback (most recent call last):
  File "utils/deploy-check.py", line 276, in <module>
    main()
  File "utils/deploy-check.py", line 268, in main
    deploy_check(args.deploy, args.skip_reload, args.no_gdnsd, Path(tdir), gdir)
  File "utils/deploy-check.py", line 214, in deploy_check
    safe_cmd([GDNSD_BIN, '-c', str(tdir), 'checkconf'])
  File "utils/deploy-check.py", line 88, in safe_cmd
    p_err.decode('utf-8')))
Exception: Command /usr/sbin/gdnsd -c /tmp/dns-check.y_d5bxcd checkconf failed with exit code 42, stderr:
info: gdnsd version 3.3.0 @ pid 7475
info: DNS listener threads (8 UDP + 8 TCP) configured for 208.80.154.238:53
info: DNS listener threads (8 UDP + 8 TCP) configured for 208.80.153.231:53
info: DNS listener threads (8 UDP + 8 TCP) configured for 91.198.174.239:53
info: DNS listener threads (8 UDP + 8 TCP) configured for 198.35.27.27:53
info: DNS listener threads (8 TCP PROXY) configured for 127.0.0.1:535
info: DNS listener threads (1 UDP + 1 TCP) configured for 0.0.0.0:5353
info: DNS listener threads (1 UDP + 1 TCP) configured for [::]:5353
info: plugin_geoip: map 'generic-map': Loading GeoIP2 database '/tmp/dns-check.y_d5bxcd/geoip/GeoIP2-City.mmdb': Version: 2.0, Type: GeoIP2-City, IPVersion: 6, Timestamp: 2020-09-14 17:00:04 UTC
info: plugin_geoip: map 'generic-map' runtime db updated. nets: 743341 dclists: 8
info: plugin_geoip: map 'discovery-map': Loading GeoIP2 database '/tmp/dns-check.y_d5bxcd/geoip/GeoIP2-City.mmdb': Version: 2.0, Type: GeoIP2-City, IPVersion: 6, Timestamp: 2020-09-14 17:00:04 UTC
info: plugin_geoip: map 'discovery-map' runtime db updated. nets: 464 dclists: 2
info: admin_state: checking state file '/tmp/dns-check.y_d5bxcd/state/admin_state'...
error: plugin_geoip: Invalid resource name 'disc-releases' detected from zonefile lookup
error: Name 'releases.discovery.wmnet.': resolver plugin 'geoip' rejected resource name 'disc-releases'
fatal: Initial load of zone data failed

Event Timeline

Volans triaged this task as Unbreak Now! priority. Sep 22 2020, 6:53 AM
Volans created this task.
Restricted Application added a subscriber: Aklapper.

The error seems to be caused by the lack of the entry related to releases in the discovery-states file in the gdnsd configuration. This in turn seems to be related to the fact that in hieradata/common/service.yaml the releases entry has state: monitoring_setup and in modules/profile/manifests/dns/auth/discovery.pp the entries are filtered by state: production:

wmflib::service::fetch().filter |$n, $svc| { 'discovery' in $svc  and $svc['state'] == 'production' }
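To illustrate why the releases entry drops out, here is a minimal Python sketch of the same filter logic. The catalog contents are hypothetical (only the state values mirror what is described above), and `discovery_services` is an illustrative name, not something from the Puppet repo:

```python
# Hypothetical catalog entries; only the `state` values mirror this task.
services = {
    "releases": {"state": "monitoring_setup", "discovery": [{"dnsdisc": "releases"}]},
    "example-svc": {"state": "production", "discovery": [{"dnsdisc": "example-svc"}]},
    "no-discovery": {"state": "production"},
}


def discovery_services(catalog):
    """Keep entries that have a 'discovery' key and are in state 'production'.

    This mimics the Puppet filter quoted above; it is not the real implementation.
    """
    return {
        name: svc
        for name, svc in catalog.items()
        if "discovery" in svc and svc["state"] == "production"
    }


selected = discovery_services(services)
print(sorted(selected))  # 'releases' is absent because its state is monitoring_setup
```

Under these assumptions, an entry in monitoring_setup never reaches the generated gdnsd configuration, which would match the rejected 'disc-releases' resource in the error output.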

I'll revert the related changes to put the DNS repo back in the same state that gdnsd is in right now.

I've merged https://gerrit.wikimedia.org/r/c/operations/dns/+/628995 and now authdns-update runs without errors and the DNS is unblocked.

The error seems to be caused by the lack of the entry related to releases in the discovery-states file in the gdnsd configuration.

The reverted change did add the entry to discovery-geo-resources, though, while discovery-states is empty and has only one git log entry, from 2018.
So I am still wondering about the correct way. Did you really mean discovery-states, or discovery-geo-resources?

This in turn seems to be related to the fact that in hieradata/common/service.yaml the releases entry has state: monitoring_setup and in modules/profile/manifests/dns/auth/discovery.pp the entries are filtered by state: production:

Per the docs it should probably first be "state: service_setup" before it switches to "monitoring_setup", but neither state should cause this breakage (https://wikitech.wikimedia.org/wiki/LVS#Create_an_entry_in_the_service::catalog).

There are comments at the top of the DNS repo's utils/mock_etc/discovery-geo-resources and utils/mock_etc/discovery-metafo-resources about avoiding this scenario by updating things in the correct order. I think the comments themselves are outdated now, as they don't know about the monitoring_setup state and they point at a hieradata file that doesn't exist anymore...

(I'm guessing they should probably be updated to the correct file, and also to mention that it has to be in state: production before deploying the DNS mock_etc part of things, but I'm not sure as I didn't change that stuff....)
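As a rough sketch of the ordering rule discussed here, a pre-merge check could flag discovery resources whose service is not yet in state: production. Everything below is an assumption for illustration: the `missing_resources` helper is hypothetical, and mapping a resource name to a catalog entry by stripping the `disc-` prefix is a simplification inferred from the 'disc-releases' name in the error output, not the real lookup:

```python
def missing_resources(zone_resources, catalog, prefix="disc-"):
    """Return resource names that the generated config would not define.

    Assumption: resource 'disc-<name>' maps to catalog entry '<name>', and
    only entries with a 'discovery' key in state 'production' are generated.
    """
    production = {
        name
        for name, svc in catalog.items()
        if "discovery" in svc and svc.get("state") == "production"
    }
    return sorted(
        res
        for res in zone_resources
        if res.startswith(prefix) and res[len(prefix):] not in production
    )


# Hypothetical inputs mirroring the situation in this task.
catalog = {"releases": {"state": "monitoring_setup", "discovery": []}}
print(missing_resources(["disc-releases"], catalog))
```

With the catalog entry switched to state: production first, the same call would return an empty list, which is the order the comments in mock_etc are meant to enforce.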