
Some Data Persistence DB clusters apparently do not support IPv6
Open, Medium, Public

Description

Greetings!

During the import of DNS data into Netbox as part of the transition to automation, we discovered that some clusters do not have IPv6 DNS entries. We interpreted this as intentional, since omitting the entry was the mechanism used to keep clients from reaching a machine's IPv6 interfaces when a given service did not support IPv6, and we excluded these hosts from the import into automation.

We are now triaging these clusters for their potential to support IPv6 in the future. Below are the hosts that were left out of IPv6 DNS and that we think your team is responsible for. If you could take some time to add any information you have about supporting IPv6 on these clusters, any specific plans for doing so, or whether it will not be possible in the foreseeable future, it would be greatly appreciated!

If any of these machines don't belong to you, let us know on this ticket or the parent task (T253173). Thanks!

  • db[2071-2140].codfw.wmnet - will not pursue now
  • db[1074-1139,1141-1149].eqiad.wmnet
  • dbstore[1003-1005].eqiad.wmnet
  • pc[2007-2010].codfw.wmnet
  • pc[1007-1010].eqiad.wmnet

Media storage:

  • ms-be[2016-2056].codfw.wmnet
  • ms-be[1016-1026,1028-1059].eqiad.wmnet
  • ms-fe[2005-2008].codfw.wmnet
  • ms-fe[1005-1008].eqiad.wmnet
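
For anyone who wants to script against the host lists above, here is a minimal sketch of expanding the bracketed range notation into individual hostnames. The function name `expand_hosts` is made up for this example, and it assumes at most one bracket expression per pattern, as in the lists in this task.

```python
import re
from typing import List


def expand_hosts(pattern: str) -> List[str]:
    """Expand e.g. 'db[1074-1139,1141-1149].eqiad.wmnet' into hostnames.

    Assumes at most one [...] group containing comma-separated
    numbers or inclusive start-end ranges.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        # No bracket expression: treat the input as a single hostname.
        return [pattern]
    prefix, ranges, suffix = match.groups()
    hosts = []
    for part in ranges.split(","):
        start, _, end = part.strip().partition("-")
        end = end or start      # a bare number is a range of one
        width = len(start)      # preserve zero padding, if any
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


if __name__ == "__main__":
    for host in expand_hosts("pc[2007-2010].codfw.wmnet"):
        print(host)
```

Note that the eqiad db range skips db1140, so the expansion should too.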

Event Timeline

  • heze.codfw.wmnet will be deprecated (planned in Q3).
  • ms-be and ms-fe are media storage and not managed by us (yet). @fgiunchedi, could you comment on these?
  • dbproxy hosts are owned by WMCS, @Bstorm, could you comment on these?
LSobanski moved this task from Triage to Refine on the DBA board.
LSobanski added subscribers: Marostegui, Kormat.
LSobanski renamed this task from Some Data Persistence clusters apparently do not support IPv6 to Some Data Persistence DB clusters apparently do not support IPv6. Jan 4 2021, 7:47 PM
LSobanski triaged this task as Medium priority.

The databases are mostly blocked on the grants audit and cleanup (T270101), which is not an easy task.

  • ms-be and ms-fe are media storage and not managed by us (yet). @fgiunchedi, could you comment on these?

I'm fairly sure ms-be hosts can have their ipv6 added to DNS and things should work (modulo ferm / swift daemons reload perhaps). For ms-fe hosts things should similarly work I think (LVS should be fine since we're talking host addresses not service ip addresses). The thanos cluster hosts (which run swift among other things) have ipv6 for the most part and work as expected.

@fgiunchedi Is there any process we should follow to test/make sure everything is okay if we add ipv6 DNS for ms-be and ms-fe?

The easiest I think would be to:

  1. Add AAAA records for one ms-fe and one ms-be host in codfw (less traffic). I believe this is safe as far as swift/lvs is concerned: swift addresses are v4 only and configured statically, and the lvs address (ms-fe.svc) shouldn't be affected anyway (?)
  2. Check swift logs for obvious errors
  3. Add AAAAs for all ms-fe/ms-be codfw hosts, and check for obvious errors
  4. Add AAAAs for ms-fe/ms-be in eqiad
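
As a rough aid for the rollout steps above, the following sketch reports which hosts still lack an IPv6 address after each step. It is not an existing tool: `has_aaaa` and `missing_aaaa` are hypothetical helper names. It uses Python's standard `socket.getaddrinfo`, which goes through the system resolver, so a local hosts-file entry would also count as a hit. The resolver is injectable so the logic can be exercised without network access.

```python
import socket


def has_aaaa(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves to at least one IPv6 address.

    Uses the system resolver by default, which is an approximation of
    "has an AAAA record": /etc/hosts entries are included as well.
    """
    try:
        results = resolver(hostname, None, socket.AF_INET6)
    except socket.gaierror:
        # Resolution failed or no IPv6 address is known for this name.
        return False
    return len(results) > 0


def missing_aaaa(hostnames, resolver=socket.getaddrinfo):
    """Return the hostnames, in input order, that have no IPv6 address."""
    return [host for host in hostnames if not has_aaaa(host, resolver)]


if __name__ == "__main__":
    # Example hosts for step 1: one ms-fe and one ms-be in codfw.
    for host in missing_aaaa(["ms-fe2005.codfw.wmnet", "ms-be2016.codfw.wmnet"]):
        print("no IPv6 address:", host)
```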

AFAICT, at least on ms-be hosts, I don't see the swift processes listening on v6. Shouldn't that be addressed too?
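
One way to check this across hosts could be to parse the output of `ss -tln` and list the ports that have an IPv6 listener. The sketch below rests on some assumptions: the local address is the fourth column, IPv6 addresses are printed in square brackets, and a bare `*` denotes a dual-stack wildcard socket. The sample text is made up for illustration and is not taken from any ms-be host; `v6_listen_ports` is a hypothetical helper name.

```python
# Illustrative, made-up sample in the shape of `ss -tln` output.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      128    0.0.0.0:6000       0.0.0.0:*
LISTEN 0      128    [::]:22            [::]:*
LISTEN 0      128    *:9100             *:*
"""


def v6_listen_ports(ss_output):
    """Return the set of TCP ports with a listener reachable over IPv6.

    Assumes `ss -tln`-style lines: state, recv-q, send-q, local addr:port.
    A bracketed address is IPv6; '*' is treated as dual-stack.
    """
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue  # skips the header and any non-listening sockets
        address, _, port = fields[3].rpartition(":")
        if not port.isdigit():
            continue  # port was not numeric (ss run without -n)
        if address.startswith("[") or address == "*":
            ports.add(int(port))
    return ports


if __name__ == "__main__":
    print(sorted(v6_listen_ports(SAMPLE)))
```

In the sample, the listener on port 6000 is bound to 0.0.0.0 only, so it would not show up as reachable over v6.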

I think this one's done from the DBA perspective. Let me know if you think it's not the case.