In T176223, we decided that in order to use Druid as a backend for AQS (which will in turn be used as a backend for the new Wikistats 2.0 website), we need to create a new 'public' Druid cluster, separate from the existing 'analytics' Druid cluster.
In order to use Druid as a backend, we need LVS to load-balance client queries to the Druid broker service, which runs on all Druid nodes. Yesterday, I tried to set this up in https://gerrit.wikimedia.org/r/#/c/378956/, which would have enabled LVS for the existing analytics Druid brokers. While we don't strictly need LVS for the internal analytics Druid cluster, it would be nice to have, and if we need it for the to-be-created 'public' Druid cluster, we might as well do it for the analytics one too.
Anyway, this failed and was reverted because (among other reasons) Druid lives in the Analytics VLAN. According to @bblack, router LVS settings are only configured to work in production VLANs.
So, our options are (in order of our preference):
- A. Configure routers so that LVS will work in the Analytics VLAN.
- B. Put the public Druid cluster in the production network.
- C. Set up LVS servers inside of the Analytics VLAN.
If we can do A., then we can use LVS for both Druid clusters. One question, though: if we put LVS in front of the analytics Druid cluster in the Analytics VLAN, we still want to restrict incoming connections to the analytics Druid broker service to $ANALYTICS_NETWORK hosts only. Will the existing ferm rules on the Druid boxes that already do this be enough when the connection comes in via LVS? We believe so, since the Druid boxes should see the source IP of the client, not that of the LVS hosts. Just double-checking with y'all who have more LVS knowledge.
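To make the question concrete, here is a rough Python sketch of the check the ferm rules effectively perform: match the packet's source address against the allowed networks. It only illustrates the reasoning; it is not the actual ferm config. The CIDR ranges and the LVS host address are hypothetical placeholders (documentation ranges), since the real $ANALYTICS_NETWORK definition isn't reproduced in this task.

```python
import ipaddress

# Hypothetical stand-ins for $ANALYTICS_NETWORK (documentation ranges,
# NOT the real Analytics VLAN subnets).
ANALYTICS_NETWORK = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

# Hypothetical address of an LVS host living outside the allowed networks.
LVS_HOST_IP = "198.51.100.10"


def is_allowed(source_ip, allowed_networks):
    """Return True if source_ip falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowed_networks)


# If LVS leaves the source address untouched (our assumption), the Druid
# box sees the client's IP and the existing rule still matches.
client_ip = "192.0.2.17"
print(is_allowed(client_ip, ANALYTICS_NETWORK))

# If the Druid box instead saw the LVS host as the source, the same rule
# would reject the connection.
print(is_allowed(LVS_HOST_IP, ANALYTICS_NETWORK))
```

If the first case holds in practice, the existing rules should need no changes; if the second did, we would have to allow the LVS hosts explicitly, which would defeat the per-client restriction.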
If we can't do A., we will do B., but we will then have to make special firewall rules (both ferm and network ACLs) to allow connections between Hadoop and the public Druid cluster in the production network.
So! Can we do A.? If so, can we do it soon? :)