This task follows a chat between me and Brandon about the options for getting LVS endpoints within the Analytics VLANs.
Background: there is currently no way (in puppet, etc.) to add an LVS endpoint that forwards traffic to a backend composed of hosts in the Analytics VLANs.
Use cases: Analytics/Data Engineering would need low traffic LVS VIPs for services like Druid Analytics Brokers (for example, Turnilo/Superset have to specify a single druid hostname:port combination in their configs as the backend target to fetch data from) and Hive servers (we currently have one active and one standby, but they could work active/active). More use cases may come in the future, also from the ML side.
Options:
- Add a new interface to the low traffic LVSes for each Analytics VLAN (there are four in eqiad, one for each row). This would allow LVS hosts to L2-forward to the Analytics VLANs, but it may be controversial from a security perspective. It would add a little tech debt and is not a clean solution, but it should be feasible.
- Buy two more LVS nodes (which should be very basic and cheap) to be used only within the Analytics VLANs. This would require time to set them up (with the Traffic team's help), and Analytics would also need to manage them long term (probably as a shared ownership with SRE). This would be a cleaner solution, but it could mean a lot of work for Analytics. There is probably also some work to be done on the DCOps side, since the new nodes will need to be connected to multiple switches and cross-cabling may be a problem in eqiad these days.
- Make the Analytics VLANs part of production, removing the problem entirely. The motivation is that things have changed a lot since the Analytics VLANs were first introduced, so they may no longer be needed. The last time we tried (T157806#3075311) the answer was a mild no :)
The preferred/suggested solution from the Traffic team seems to be the second option (dedicated LVS nodes for the Analytics VLANs).
It is also important to note that the Traffic team will work on reviewing alternative solutions to LVS for the public endpoints/load-balancers, so in the long term LVS-based load balancing may be deprecated (but that is a long way off).
The Analytics/Data Engineering team should review the above and decide whether the use cases are worth the effort, and which road to choose :)