
Set up 3 Ganeti VMs for datalake cloud analytics Hadoop cluster
Closed, Declined, Public

Description

In T204177 we received a batch of new hardware. We'll use 5 of these machines as worker nodes, as specified in T207194: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet.

This Hadoop cluster will be called 'cloud-analytics'. Its Hadoop 'cluster name' in the Hadoop configs will be 'cloud-analytics-eqiad', matching the naming convention we have been using for other clusters (e.g. Kafka, Zookeeper, Druid). The cloud-analytics nodes will also run other software (Hive, Presto, etc.).

We need 3 more nodes: two HA Hadoop masters and one 'coordinator' node for Hive and similar services. These should be:

  • ca-master1001
  • ca-master1002
  • ca-coord1001
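A minimal sketch of how the two master VMs might appear in an HA HDFS configuration under the 'cloud-analytics-eqiad' nameservice. The cluster name and hostnames come from this ticket. The `.eqiad.wmnet` suffix for the masters, the use of short hostnames as NameNode IDs, and the default NameNode RPC port 8020 are assumptions for illustration, not decided settings.

```python
# Hypothetical sketch: HA HDFS properties for the two requested master VMs.
# From the ticket: cluster name and master hostnames.
# Assumptions: .eqiad.wmnet FQDNs, short hostnames as NameNode IDs,
# and Hadoop's default NameNode RPC port (8020).
from typing import Dict, List

CLUSTER_NAME = "cloud-analytics-eqiad"
MASTERS = ["ca-master1001", "ca-master1002"]
DOMAIN = "eqiad.wmnet"  # assumption
RPC_PORT = 8020  # assumption


def hdfs_ha_properties(nameservice: str, masters: List[str],
                       domain: str, port: int) -> Dict[str, str]:
    """Return key/value pairs for an HA nameservice with one NameNode per master."""
    props = {
        # core-site.xml: clients address the logical nameservice, not a host.
        "fs.defaultFS": f"hdfs://{nameservice}",
        # hdfs-site.xml: declare the nameservice and its NameNode IDs.
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(masters),
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    }
    # One RPC address per NameNode ID.
    for host in masters:
        props[f"dfs.namenode.rpc-address.{nameservice}.{host}"] = f"{host}.{domain}:{port}"
    return props


if __name__ == "__main__":
    for key, value in sorted(hdfs_ha_properties(CLUSTER_NAME, MASTERS, DOMAIN, RPC_PORT).items()):
        print(f"{key} = {value}")
```

Addressing the nameservice rather than a single host lets clients fail over between the two masters; a real setup would also need shared edits storage (e.g. JournalNodes) and ZooKeeper-based failover, which are out of scope here.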

Each should have 16G RAM and 8 cores. (If this is too highly specced, we can be flexible here.)
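For Ganeti capacity planning, the request adds up as follows. The per-VM figures come from the ticket; reading "16G" as GiB is an assumption, and the totals exclude any hypervisor overhead.

```python
# Sketch: total capacity implied by the three requested VMs.
# Per-VM figures are from the ticket; "16G" is read as GiB (assumption).
from typing import Dict, Tuple

REQUESTED_VMS: Dict[str, Tuple[int, int]] = {
    # hostname: (ram_gib, vcpus)
    "ca-master1001": (16, 8),
    "ca-master1002": (16, 8),
    "ca-coord1001": (16, 8),
}


def total_capacity(vms: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum RAM (GiB) and vCPUs across all requested VMs."""
    ram = sum(r for r, _ in vms.values())
    cpus = sum(c for _, c in vms.values())
    return ram, cpus


if __name__ == "__main__":
    ram_gib, vcpus = total_capacity(REQUESTED_VMS)
    print(f"{len(REQUESTED_VMS)} VMs -> {ram_gib} GiB RAM, {vcpus} vCPUs")
    # prints "3 VMs -> 48 GiB RAM, 24 vCPUs"
```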

They should also be accessible from Cloud VPS networks on restricted (TBD) ports, as well as from the Analytics VLAN. They should have the same network/VLAN settings as the hardware nodes in T207194.