On every Hadoop worker node, each disk is currently configured as its own single-disk RAID-0 Virtual Drive. This should behave roughly like JBOD, but the cache policy of the Virtual Drives is inconsistent, both across nodes and between disks on the same node:
elukey@neodymium:~$ sudo cumin 'R:class = role::analytics_cluster::hadoop::worker and not analytics1030*' 'megacli -LDPDInfo -aAll | grep "Current Cache Policy" | uniq -c'
===== NODE GROUP =====
(1) analytics1032.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
4 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
8 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(11) analytics[1058-1068].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
1 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
12 Current Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
===== NODE GROUP =====
(1) analytics1045.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
9 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
3 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(1) analytics1047.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
3 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
1 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
9 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(13) analytics[1042-1044,1046,1048,1050-1057].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
13 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(1) analytics1049.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
2 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
5 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
4 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(11) analytics[1028-1031,1034-1038,1040-1041].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
13 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(2) analytics[1033,1039].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
13 Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU
Let's find a single good configuration and apply it to all the nodes.
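
A note on reading the output above: `uniq -c` only merges adjacent identical lines, so a policy can be listed more than once for the same host (for example on analytics1045 and analytics1049). To compare nodes while settling on a configuration, a per-host summary with one count per distinct policy may be easier to read. A minimal sketch, where the function name `summarize_cache_policy` is made up for illustration and is not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not an existing script).
# Reads megacli output on stdin and prints one line per distinct cache
# policy with its count, most common policy first.
summarize_cache_policy() {
    # sort before uniq -c so identical policies are adjacent and get merged
    grep 'Current Cache Policy' | sort | uniq -c | sort -rn
}

# Intended use on a worker node (needs megacli and root):
#   sudo megacli -LDPDInfo -aAll | summarize_cache_policy
```

With this, a host where every Virtual Drive shares one policy prints a single line, and any host that prints more than one line has at least one drive that deviates.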