
Create a cookbook to automate the bootstrap of new Hadoop workers
Closed, ResolvedPublic


During the last batch of Hadoop worker deployments (which happened about a year ago, IIRC) I ran the following (please don't judge) scripts to automate the creation of the Hadoop DataNode partitions:

elukey@an-worker1080:~$ tail -n+1 step*
==> step1 <==
set -e
set -x

# Create a logical volume for JournalNode data.
# There should only be one VG, look up its name:
vgname=$(vgdisplay -C --noheadings -o vg_name | head -n 1 | tr -d ' ')
lvcreate -n journalnode -L 10G $vgname

# make an ext4 filesystem
mkfs.ext4 /dev/$vgname/journalnode

# Don't reserve any blocks for OS on this partition.
tune2fs -m 0 /dev/$vgname/journalnode

# mount_point (the JournalNode data dir) must be set before running
: "${mount_point:?set mount_point first}"

mkdir -pv $mount_point
grep -q $mount_point /etc/fstab || echo -e "# Hadoop JournalNode partition\n/dev/$vgname/journalnode\t${mount_point}\text4\tdefaults,noatime\t0\t2" | tee -a /etc/fstab

mount -v $mount_point

==> step2 <==
set -e
set -x

for disk_letter in b c d e f g h i j k l m; do
    disk=/dev/sd${disk_letter}
    partition=${disk}1

    parted ${disk} --script mklabel gpt
    parted ${disk} --script mkpart primary ext4 0% 100%

    mkfs.ext4 $partition
done

==> step3 <==
set -e
set -x

for disk_letter in b c d e f g h i j k l m; do
    partition=/dev/sd${disk_letter}1
    mount_point=/var/lib/hadoop/data/${disk_letter}   # per-disk DataNode dir

    # Don't reserve any blocks for OS on these partitions.
    tune2fs -m 0 $partition

    # Make the mount point.
    mkdir -pv $mount_point
    # add it to fstab unless it is already there
    grep -q $mount_point /etc/fstab || (
        uuid=$(blkid | grep primary | grep ${partition} | awk '{print $2}' | sed -e 's/[:"]//g')
        echo -e "# Hadoop DataNode partition ${disk_letter}\n${uuid}\t${mount_point}\text4\tdefaults,noatime\t0\t2" | tee -a /etc/fstab
    )

    mount -v $mount_point
done

==> step4 <==
# ReadAhead Adaptive
megacli -LDSetProp ADRA -LALL -aALL

# Direct (No cache)
megacli -LDSetProp -Direct -LALL -aALL

# No write cache if bad BBU
megacli -LDSetProp NoCachedBadBBU -LALL -aALL

# Disable BBU auto-learn
echo "autoLearnMode=1" > /tmp/disable_learn && megacli -AdpBbuCmd -SetBbuProperties -f /tmp/disable_learn -a0

Some explanation about the why of the above horror:

  • every worker node has 2x SSD disks in a flex bay with hardware RAID 1. This means the OS usually sees a single /dev/sda disk, which we use for the OS.
  • every worker node also has 12x 4TB disks with a "special" config. They need to act as JBOD, but due to how the hardware RAID controller works (this may have changed in recent versions) each one has to be set up as a single-disk RAID0 to appear to the OS as a standalone JBOD disk. These disks are not configured in partman, so they are not formatted/touched during the Debian install (which is also a plus when we upgrade, since we don't have to worry about data being wiped, etc.).
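The single-disk RAID0 setup described above can be sketched roughly as follows. This is a hypothetical illustration, not the exact procedure used: the enclosure ID and the slot range are assumptions (check the real values with `megacli -PDList -aALL`), and the script only prints the commands unless `DRYRUN=0` is set.

```shell
#!/bin/bash
# Sketch: create one RAID0 virtual drive per data disk so the OS sees
# each disk as a separate block device (JBOD-like behaviour).
set -e

ENCLOSURE=32          # assumed enclosure device ID, verify with -PDList
DRYRUN=${DRYRUN:-1}   # default to printing the commands only

run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Slots 2..13 assumed to hold the 12x 4TB data disks
for slot in $(seq 2 13); do
    run megacli -CfgLdAdd -r0 "[${ENCLOSURE}:${slot}]" -a0
done
```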

On top of the above, we have a new config for the 6 nodes with GPUs:

  • no flex bay, 24x 2TB disks (same single-disk RAID0 caveat as above)

We should write a cookbook to automate and document this procedure (and to improve it if needed).
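As a starting point, what the cookbook would automate is roughly steps 2 and 3 folded into one idempotent per-disk function. This is only a sketch: the mount-point scheme (`/var/lib/hadoop/data/<letter>`) is an assumption, and the real cookbook may differ.

```shell
#!/bin/bash
set -e

# Build the fstab entry for one DataNode partition (pure helper).
fstab_entry() {
    local disk_letter=$1 uuid=$2 mount_point=$3
    printf '# Hadoop DataNode partition %s\nUUID=%s\t%s\text4\tdefaults,noatime\t0\t2\n' \
        "$disk_letter" "$uuid" "$mount_point"
}

# Partition, format, and mount one data disk; skip the fstab entry if
# it is already present, so the function can be re-run safely.
format_and_mount() {
    local disk_letter=$1
    local disk=/dev/sd${disk_letter}
    local partition=${disk}1
    local mount_point=/var/lib/hadoop/data/${disk_letter}   # assumed scheme

    parted "$disk" --script mklabel gpt
    parted "$disk" --script mkpart primary ext4 0% 100%
    mkfs.ext4 "$partition"
    tune2fs -m 0 "$partition"    # no reserved blocks for the OS

    mkdir -pv "$mount_point"
    if ! grep -q "$mount_point" /etc/fstab; then
        local uuid
        uuid=$(blkid -s UUID -o value "$partition")
        fstab_entry "$disk_letter" "$uuid" "$mount_point" >> /etc/fstab
    fi
    mount -v "$mount_point"
}

# Destructive: invoke explicitly, e.g.
#   for l in b c d e f g h i j k l m; do format_and_mount "$l"; done
```

Using `blkid -s UUID -o value` avoids the fragile `grep | awk | sed` pipeline of step3 and makes mounts survive device renames.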

Event Timeline

Change 629384 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/cookbooks@master] Add

Change 629384 merged by Elukey:
[operations/cookbooks@master] Add

elukey added a project: Analytics-Kanban.
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.

The cookbook now works; I was able to add all the partitions on an-worker1096->1101. In the current version I forgot to add the journalnode partition though, I'll amend the cookbook.

Change 629435 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/cookbooks@master] add journalnode partition

Change 629435 abandoned by Elukey:
[operations/cookbooks@master] add journalnode partition

Not needed for the moment

Tested it multiple times, works fine!