
[mx] check what happened to mx-out01.wmflabs.org
Closed, Resolved, Public

Description

That name is still used in many places, but it no longer resolves (neither internally nor externally), even though there are still PTR records pointing to it:

03:39 PM ~/Work/wikimedia/operations-puppet  (production|✔) 
dcaro@vulcanus$ dig +short -x 185.15.56.18
instance-mx-out01.cloudinfra.wmflabs.org.
mx-out01.wmflabs.org.

03:39 PM ~/Work/wikimedia/operations-puppet  (production|✔) 
dcaro@vulcanus$ dig +short mx-out01.wmflabs.org
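The symptom above is a dangling PTR: a reverse lookup returns a name that does not resolve forward to the same address. Below is a minimal Python sketch of that check, assuming the system resolver is used. The function names are hypothetical. Also note that `socket.gethostbyaddr` may return fewer PTR names than `dig -x` does. The lookups are passed in as parameters so the logic can be tried without network access.

```python
import socket


def ptr_names(ip: str) -> list[str]:
    """PTR names for ip via the system resolver (may not list every record)."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return []
    return [name, *aliases]


def forward_addrs(name: str) -> set[str]:
    """IPv4 addresses for name, or an empty set if it does not resolve."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return set()
    # info[4] is the sockaddr tuple (address, port)
    return {info[4][0] for info in infos}


def dangling_ptrs(ip, reverse=ptr_names, forward=forward_addrs):
    """Return the PTR names for ip that do not resolve back to ip."""
    return [n for n in reverse(ip) if ip not in forward(n.rstrip("."))]
```

On a host with working DNS, `dangling_ptrs("185.15.56.18")` would list any PTR name for that address that no longer resolves forward. Which names it returns depends on the live DNS state.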

For now, to get unstuck, the VM was manually configured to use the other record, which does resolve internally:

dcaro@accounts-appserver5:~$ sudo grep mx /etc/exim4/exim4.conf
        route_list = *  mx-out01.cloudinfra.eqiad1.wikimedia.cloud:mx-out02.cloudinfra.eqiad1.wikimedia.cloud
#       route_list = *  mx-out01.wmflabs.org:mx-out02.wmflabs.org
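Before switching `route_list` back to the `wmflabs.org` names, it can help to confirm that every smarthost in the active line resolves. Below is a simplified Python sketch with hypothetical helper names. It assumes the configuration has one route per `route_list` line, in the form `<pattern> host1:host2`, with no options, no `;`-separated routes, and no IPv6 literals. Real Exim `manualroute` syntax is richer than this.

```python
import socket


def route_list_hosts(conf_text: str) -> list[str]:
    """Hosts from uncommented `route_list = <pattern> host1:host2` lines.

    Simplified: assumes a single route per line and no router options.
    """
    hosts = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Skip comments and anything that is not a route_list setting
        if stripped.startswith("#") or not stripped.startswith("route_list"):
            continue
        _, value = stripped.split("=", 1)
        parts = value.split()
        # parts[0] is the domain pattern, parts[1] the colon-separated host list
        if len(parts) >= 2:
            hosts.extend(parts[1].split(":"))
    return hosts


def unresolvable(hosts, resolve=socket.gethostbyname):
    """Return the hosts whose lookup raises an error (gaierror is an OSError)."""
    bad = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:
            bad.append(host)
    return bad
```

For example, `unresolvable(route_list_hosts(open("/etc/exim4/exim4.conf").read()))` would list the configured relays that the VM currently cannot resolve.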

Event Timeline

Added this manually to the project hiera config (account-creation-assistance project) to prevent Puppet from reverting it; remember to change it back once this is solved.

Mentioned in SAL (#wikimedia-cloud) [2021-01-06T14:53:15Z] <dcaro> manually configured mx servers to use wikimedia.cloud domain on project hiera (T271322)

I think that when I created the mx-out0[1-2].wmflabs.org zones to make acme-chief happy, it broke the existing entries associated with the hosts. I can recreate those.

Mentioned in SAL (#wikimedia-cloud) [2021-01-07T09:46:07Z] <dcaro> Added recordset for mx-out01.wmflabs.org (T271322)

Mentioned in SAL (#wikimedia-cloud) [2021-01-07T09:47:01Z] <dcaro> Removing old recordset that has been moved to cloudinfra for mx-out01.wmflabs.org (T271322)

Mentioned in SAL (#wikimedia-cloud) [2021-01-07T09:49:03Z] <dcaro> Added recordset for mx-out02.wmflabs.org (T271322)

Mentioned in SAL (#wikimedia-cloud) [2021-01-07T09:50:19Z] <dcaro> Removing old recordset that has been moved to cloudinfra for mx-out02.wmflabs.org (T271322)

Mentioned in SAL (#wikimedia-cloud) [2021-01-07T11:16:36Z] <dcaro> removing custom mx hosts, as the global names are now resolvable again (T271322)

In the end, the issue was that when a zone is created with the same name as an existing host entry (A record), the zone takes priority and the host record becomes invalid. The fix was to add a top-level record with the same name inside the new zone, so the name resolves within the zone itself, and to remove the old A records.
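The failure mode can be illustrated with a toy model in which a lookup is answered only by the most specific zone that contains the name. Below is a minimal Python sketch, not how Designate or any real DNS server is implemented. The zone layout and helper names are hypothetical. The address is the one returned by the reverse lookup in the description.

```python
def closest_zone(name, zones):
    """Longest zone name that equals `name` or is one of its parents."""
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return None


def resolve_a(name, zones):
    """Answer only from the closest enclosing zone; records elsewhere are ignored."""
    zone = closest_zone(name, zones)
    if zone is None:
        return None
    return zones[zone].get(name.rstrip("."))


if __name__ == "__main__":
    # Before: the A record lives in the parent zone and resolves.
    zones = {"wmflabs.org": {"mx-out01.wmflabs.org": "185.15.56.18"}}
    print(resolve_a("mx-out01.wmflabs.org", zones))  # 185.15.56.18
    # A zone with the same name now takes priority and has no record at its top.
    zones["mx-out01.wmflabs.org"] = {}
    print(resolve_a("mx-out01.wmflabs.org", zones))  # None
    # Fix: add a top-level record in the new zone and drop the old A record.
    zones["mx-out01.wmflabs.org"]["mx-out01.wmflabs.org"] = "185.15.56.18"
    del zones["wmflabs.org"]["mx-out01.wmflabs.org"]
    print(resolve_a("mx-out01.wmflabs.org", zones))  # 185.15.56.18
```

In the model, the old record in `wmflabs.org` is still stored but never consulted once the more specific zone exists. This is why the record had to be recreated inside the new zone rather than left in place.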