Grants not working with DB hosts with IPv6
Open, Medium, Public

Description

We have a few new hosts installed for x2 (T269324) and they all got an IPv6 address:

root@cumin1001:~# host db2142.codfw.wmnet
db2142.codfw.wmnet has address 10.192.0.14
db2142.codfw.wmnet has IPv6 address 2620:0:860:101:10:192:0:14

Those hosts are having connection issues, as the grants aren't ready for IPv6 client addresses.
For example:

# mysql.py -hdb2142
ERROR 1045 (28000): Access denied for user 'root'@'2620:0:861:103:10:64:32:25' (using password: YES)

These hosts aren't in production, but this is going to be an issue if we need to update all the grants on all the hosts as soon as they start getting IPv6 addresses.
Right now our grants (for mw general) are:

wikiadmin 10.64.%
wikiuser 10.64.%
wikiadmin 10.192.%
wikiuser 10.192.%

But we also have plenty of other grants (like root, watchdog etc).
Until we have a proper management system, we might need to disable ipv6 on DB hosts (and their DNS)
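
To illustrate why the existing grants reject the connection shown above, here is a minimal sketch of wildcard host matching. It assumes the simplified rule that `%` matches any run of characters and `_` matches exactly one, and it ignores other host forms such as netmasks or DNS names. The function names and the IPv4 client address `10.64.32.25` are hypothetical and not taken from the task.

```python
import re

def host_pattern_matches(pattern: str, client_host: str) -> bool:
    """Simplified model of account host matching.

    '%' matches any run of characters, '_' matches exactly one character.
    This is an illustration only, not the server's actual implementation.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, client_host, flags=re.IGNORECASE) is not None

# Host parts of the mw grants listed in the description.
GRANT_HOSTS = ["10.64.%", "10.192.%"]

def any_grant_matches(client_host: str, patterns=GRANT_HOSTS) -> bool:
    """Return True if at least one grant host pattern covers the client."""
    return any(host_pattern_matches(p, client_host) for p in patterns)

# Hypothetical IPv4 client address: covered by '10.64.%'.
print(any_grant_matches("10.64.32.25"))
# Client address from the error message: no IPv4 pattern covers it.
print(any_grant_matches("2620:0:861:103:10:64:32:25"))
```

Under these assumptions the first call reports a match and the second does not, which is consistent with the "Access denied" error for the IPv6 client address.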

Event Timeline

Setting this to High: even though it is not blocking the setup of x2 in general, we might want to reach an agreement on how to proceed before putting these hosts (and the other 22 hosts we'll have next quarter) in production.

AFAIK databases are still in the list of clusters that do not support IPv6, as listed in T253173. As such, the Netbox script that provisions a server's network attributes should be run with this flag checked:

Skip IPv6 DNS records.
Skip the generation of the IPv6 DNS records. Enable if the devices don't yet fully support IPv6.

That would prevent the generation of the AAAA records for the database host, so clients would not resolve it to its IPv6 address (v6 is the default on newer OSes) and everything would stay on v4.
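
As a rough sketch of the effect described here: a dual-stack client that prefers IPv6 picks the AAAA address when one exists and falls back to the A address otherwise. The `pick_address` helper and its plain "prefer v6" rule are assumptions made for illustration; real resolvers apply more detailed address selection rules.

```python
import ipaddress

def pick_address(records):
    """Return the address a client would try first under a simplified
    'prefer IPv6 when available' rule (an assumption for illustration)."""
    parsed = [ipaddress.ip_address(r) for r in records]
    v6 = [a for a in parsed if a.version == 6]
    v4 = [a for a in parsed if a.version == 4]
    preferred = v6 or v4
    return preferred[0] if preferred else None

# Records for db2142 as shown in the description (A plus AAAA).
with_aaaa = ["10.192.0.14", "2620:0:860:101:10:192:0:14"]
# Same host once the AAAA record is skipped or removed.
without_aaaa = ["10.192.0.14"]

print(pick_address(with_aaaa).version)     # IP version chosen with AAAA present
print(pick_address(without_aaaa).version)  # IP version chosen without AAAA
```

With the AAAA record present the sketch selects the v6 address; without it, the only candidate is the v4 address, so the connection arrives from and to addresses the current grants already cover.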

If there is any host that was provisioned with the AAAA records and you need to remove them, it's sufficient to (example links for db2142):

Thanks @Volans for the detailed info! I will remove them from those hosts, but I am worried about the other 22 we still need to install.
Who normally runs those provisioning scripts? Is that DC-Ops? We might need to ping them about that, so we don't get those hosts with IPv6.

Yes, it is usually run by DCOps, who should ask the service owner whether they need it or not.

Marostegui lowered the priority of this task from High to Medium.Dec 15 2020, 7:06 AM
Marostegui added a subscriber: wiki_willy.

I have deleted IPv6 for the affected x2 hosts, so this is triaged with Volans' workaround.

db2142.codfw.wmnet has address 10.192.0.14
db2143.codfw.wmnet has address 10.192.16.20
db2144.codfw.wmnet has address 10.192.32.27
db1151.eqiad.wmnet has address 10.64.0.6
db1152.eqiad.wmnet has address 10.64.16.156
db1153.eqiad.wmnet has address 10.64.48.48

However, I am going to leave this task open, as this is a problem we eventually need to fix (if we, at some point, want to go for ipv6 on the db hosts).

@wiki_willy could you talk to your team to make sure the rest of hosts at T267043: (Need By: 2020-11-29) rack/setup/install db11[51-76] do not get installed with ipv6 once they start to get provisioned?

Ack @Marostegui

LSobanski mentioned this in Unknown Object (Task).Jan 11 2021, 12:44 PM
jcrespo added a parent task: Restricted Task.Jan 22 2021, 11:24 AM

@Marostegui Should we wait with adopting x2 for mainstash-db until this is done?

I don't think it would matter much. This might take some time. Whatever solution we come up with needs to be applied everywhere, so just 6 more hosts will not make much of a difference.

If we do:

wikiadmin 10.64.%
wikiuser 10.64.%
wikiadmin 10.192.%
wikiuser 10.192.%

Can we "just" add the following?

wikiadmin 2620:0:861:%
wikiuser 2620:0:861:%
wikiadmin 2620:0:860:%
wikiuser 2620:0:860:%

Or 2620:0:861:1% if we only want private subnets
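
One point worth checking with patterns like these is that a wildcard pattern is a string prefix, not a network prefix. The sketch below assumes the simplified matching rule (`%` matches any run of characters, `_` exactly one) applied to the textual form of the address; `like_match` is a hypothetical helper, and the addresses other than the one from the error message are made up for illustration.

```python
import re

def like_match(pattern: str, text: str) -> bool:
    """Simplified wildcard match: '%' is any run of characters, '_' is one."""
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c)
        for c in pattern
    )
    return re.fullmatch(regex, text) is not None

# Client address from the error message in the description.
client = "2620:0:861:103:10:64:32:25"
print(like_match("2620:0:861:%", client))   # broad pattern
print(like_match("2620:0:861:1%", client))  # narrower pattern

# The narrower pattern accepts any fourth group whose text starts with '1',
# whatever its length (made-up addresses, for illustration only).
for addr in ["2620:0:861:1::1", "2620:0:861:1abc::1", "2620:0:861:3::1"]:
    print(addr, like_match("2620:0:861:1%", addr))
```

Under these assumptions both patterns cover the client from the error message, and `2620:0:861:1%` also covers fourth groups such as `1` or `1abc`, so whether it corresponds exactly to the intended subnets would need to be verified against the actual allocation.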

Yes, but deploying that across all our databases will need to be done carefully as I would rather not do it with replication enabled.
I would love to audit all the grants first, especially for wikiadmin, and do a proper cleanup (T249683).

To expand on marostegui's answer (as I also researched it at T271148#6735477):

Can we "just" add the following

Not really. Adding IPv6 does mean the extra grants have to be deployed on the DBs (which is already a challenge in itself, as per this ticket), but also:

  • Firewall changes (some things are *not* based on dns)
  • monitoring on the extra stack
  • Application config changes (app servers and most services don't use DNS to connect to the hosts; the IPs are defined in the config: https://noc.wikimedia.org/dbconfig/eqiad.json , see hostByName config) <-- this will probably be the most delicate change
  • proxy configuration changes (ip-based)
  • Potential backup configuration/software changes

Nothing there is impossible. For example, new backup services use IPv6 from the start, which is easy because they are new services. But enabling the IPv6 stack requires careful planning, because many things use hardcoded IPs, so I fully understand the DBAs wanting to take their time to do it properly.
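
As a small illustration of the hardcoded-IP concern, the sketch below audits a host-name-to-address mapping and groups entries by IP version. The `hosts_by_name` dictionary is a hypothetical excerpt that reuses addresses quoted earlier in this task; it is not copied from the real dbconfig file, whose exact layout (for example, entries that carry a port) is not assumed here.

```python
import ipaddress

# Hypothetical mapping shaped like a name -> address config section.
hosts_by_name = {
    "db1151": "10.64.0.6",
    "db1152": "10.64.16.156",
    "db2142": "10.192.0.14",
    "db1153": "db1153.eqiad.wmnet",  # example of a non-literal entry
}

def group_by_ip_version(mapping):
    """Group host names by whether their configured value is an IPv4
    literal (4), an IPv6 literal (6), or not an IP literal at all (None)."""
    groups = {4: [], 6: [], None: []}
    for name, value in sorted(mapping.items()):
        try:
            version = ipaddress.ip_address(value).version
        except ValueError:
            version = None  # a DNS name or anything else that is not an IP
        groups[version].append(name)
    return groups

groups = group_by_ip_version(hosts_by_name)
print("IPv4 literals:", groups[4])
print("IPv6 literals:", groups[6])
print("Not IP literals:", groups[None])
```

An audit along these lines would only give a starting list of entries to review; firewall rules, proxies, and monitoring would still need to be checked separately.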

For example, I recently had an issue with missing ipv6 config on netbox (T274206#7587291) and that created a timeout on backup recovery. If it had been mw (large throughput), it would have been a large outage.