
Review Megacli Analytics Hadoop workers settings
Closed, Resolved | Public | 5 Estimated Story Points

Description

We currently configure each disk on every Hadoop worker node as its own single-disk RAID-0 Virtual Drive. This should behave as a sort of JBOD, but the cache policy is inconsistent across nodes:

elukey@neodymium:~$ sudo cumin 'R:class = role::analytics_cluster::hadoop::worker and not analytics1030*' 'megacli -LDPDInfo -aAll | grep "Current Cache Policy" | uniq -c'

===== NODE GROUP =====
(1) analytics1032.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
      4 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
      8 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(11) analytics[1058-1068].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
     12 Current Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
===== NODE GROUP =====
(1) analytics1045.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
      9 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
      3 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(1) analytics1047.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
      3 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
      9 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(13) analytics[1042-1044,1046,1048,1050-1057].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
     13 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(1) analytics1049.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
      2 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
      5 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
      4 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(11) analytics[1028-1031,1034-1038,1040-1041].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
     13 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(2) analytics[1033,1039].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
     13 Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU

Let's find a single good configuration and apply it to all the nodes.
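A note on reading the counts above: `uniq -c` only merges adjacent identical lines, so the same policy can show up on several rows for one host. The following is a minimal sketch, not the tooling used on this task, of how the output could be summarised and compared against one expected policy. The function names and the expected-policy argument are hypothetical.

```shell
#!/bin/sh
# Both functions read `megacli -LDPDInfo -aAll` output on stdin.

# Print one row per distinct cache policy, with its count.
# Sorting first is required because `uniq -c` only merges adjacent lines.
summarize_cache_policies() {
    grep 'Current Cache Policy' | sed 's/^[[:space:]]*//' | sort | uniq -c
}

# Print how many virtual drives do NOT use the expected policy.
# $1 is the expected policy string (hypothetical parameter), matched as a
# fixed substring of each "Current Cache Policy" line.
count_policy_mismatches() {
    grep 'Current Cache Policy' | grep -v -c -F -- "$1" || true
}
```

On a worker this could be used as `sudo megacli -LDPDInfo -aAll | summarize_cache_policies`, or with `count_policy_mismatches 'WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU'` to get a single number per host.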

Event Timeline

On analytics1033:

elukey@analytics1033:~$ sudo megacli -AdpBbuCmd  -a0

BBU status for Adapter: 0

BatteryType: BBU
Voltage: 12 mV
Current: 0 mA
Temperature: 53 C
Battery State: Failed
BBU Firmware Status:

  Charging Status              : None
  Voltage                                 : Low
  Temperature                             : OK
  Learn Cycle Requested	                  : Yes
  Learn Cycle Active                      : No
  Learn Cycle Status                      : OK
  Learn Cycle Timeout                     : No
  I2c Errors Detected                     : No
  Battery Pack Missing                    : No
  Battery Replacement required            : No
  Remaining Capacity Low                  : Yes
  Periodic Learn Required                 : No
  Transparent Learn                       : Yes
  No space to cache offload               : No
  Pack is about to fail & should be replaced : No
  Cache Offload premium feature required  : No
  Module microcode update required        : No

BBU GasGauge Status: 0x043e
Relative State of Charge: 0 %
Charger Status: Unknown
Remaining Capacity: 0 mAh
Full Charge Capacity: 555 mAh
isSOHGood: Yes
  Battery backup charge time : 0 hours

BBU Capacity Info for Adapter: 0

  Relative State of Charge: 0 %
  Absolute State of charge: 0 %
  Remaining Capacity: 0 mAh
  Full Charge Capacity: 555 mAh
  Run time to empty: Battery is not being charged.
  Average time to empty:
  Estimated Time to full recharge: Battery is not being charged.
  Cycle Count: 4
Max Error = 0 %
Remaining Capacity Alarm = 0 mAh
Remining Time Alarm = 0 Min

BBU Design Info for Adapter: 0

  Date of Manufacture: 07/18, 2011
  Design Capacity: 90 mAh
  Design Voltage: 0 mV
  Specification Info: 0
  Serial Number: 0
  Pack Stat Configuration: 0x0000
  Manufacture Name:
  Firmware Version   : 0148 03
  Device Name:
  Device Chemistry:
  Battery FRU: N/A
Module Version = 0148 03
  Transparent Learn = 1
  App Data = 1

BBU Properties for Adapter: 0

  Auto Learn Period: 90 Days
  Next Learn time: Sat Jan  1 00:00:00 2000
  Learn Delay Interval:0 Hours
  Auto-Learn Mode: Transparent

Exit Code: 0x00

On analytics1039:

elukey@analytics1039:~$ sudo megacli -AdpBbuCmd  -a0

BBU status for Adapter: 0

BatteryType: BBU
Voltage: 12 mV
Current: 0 mA
Temperature: 56 C
Battery State: Failed
BBU Firmware Status:

  Charging Status              : None
  Voltage                                 : Low
  Temperature                             : OK
  Learn Cycle Requested	                  : Yes
  Learn Cycle Active                      : No
  Learn Cycle Status                      : OK
  Learn Cycle Timeout                     : No
  I2c Errors Detected                     : No
  Battery Pack Missing                    : No
  Battery Replacement required            : No
  Remaining Capacity Low                  : Yes
  Periodic Learn Required                 : No
  Transparent Learn                       : Yes
  No space to cache offload               : No
  Pack is about to fail & should be replaced : No
  Cache Offload premium feature required  : No
  Module microcode update required        : No

BBU GasGauge Status: 0x043e
Relative State of Charge: 0 %
Charger Status: Unknown
Remaining Capacity: 0 mAh
Full Charge Capacity: 540 mAh
isSOHGood: Yes
  Battery backup charge time : 0 hours

BBU Capacity Info for Adapter: 0

  Relative State of Charge: 0 %
  Absolute State of charge: 0 %
  Remaining Capacity: 0 mAh
  Full Charge Capacity: 540 mAh
  Run time to empty: Battery is not being charged.
  Average time to empty:
  Estimated Time to full recharge: Battery is not being charged.
  Cycle Count: 1
Max Error = 0 %
Remaining Capacity Alarm = 0 mAh
Remining Time Alarm = 0 Min

BBU Design Info for Adapter: 0

  Date of Manufacture: 07/18, 2011
  Design Capacity: 90 mAh
  Design Voltage: 0 mV
  Specification Info: 0
  Serial Number: 0
  Pack Stat Configuration: 0x0000
  Manufacture Name:
  Firmware Version   : 0148 03
  Device Name:
  Device Chemistry:
  Battery FRU: N/A
Module Version = 0148 03
  Transparent Learn = 1
  App Data = 0

BBU Properties for Adapter: 0

  Auto Learn Period: 90 Days
  Next Learn time: Wed Jul 27 11:41:46 2016
  Learn Delay Interval:0 Hours
  Auto-Learn Mode: Transparent

Exit Code: 0x00

@Cmjohnson: Faulty BBU on analytics1033 and 1039? Whenever you have time could you double check?

Thanks!
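Both dumps above report `Battery State: Failed` with a relative state of charge of 0 %. As a rough sketch of how that could be checked automatically, the function below classifies the `Battery State:` line using Nagios-style exit codes. The function name is hypothetical, and treating `Optimal` and `Operational` as healthy values is an assumption: only `Failed` appears in the output on this task.

```shell
#!/bin/sh
# Read `megacli -AdpBbuCmd -a0` output on stdin and classify the battery.
# Exit codes follow the Nagios convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
bbu_state_check() {
    # Keep only the value of the first "Battery State:" line.
    state=$(sed -n 's/^Battery State:[[:space:]]*//p' | head -n 1)
    case $state in
        '')
            echo 'UNKNOWN: no Battery State line found'
            return 3 ;;
        Failed)
            echo "CRITICAL: Battery State is $state"
            return 2 ;;
        Optimal|Operational)
            # Assumption: these are the healthy values; not shown in this task.
            echo "OK: Battery State is $state"
            return 0 ;;
        *)
            echo "WARNING: unrecognised Battery State '$state'"
            return 1 ;;
    esac
}
```

It would be fed with `sudo megacli -AdpBbuCmd -a0 | bbu_state_check`; on analytics1033 and analytics1039 the output above would yield the CRITICAL branch.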

Ordered new cards for both servers:
You have successfully submitted request SR948957999.

You have successfully submitted request SR948957885.

@elukey new raid controllers for an1033 and 1039 are on-site. please let me know when you want to swap them out

Mentioned in SAL (#wikimedia-operations) [2017-06-06T13:39:32Z] <elukey> shutdown analytics1033 and analytics1039 to replace their BBU - T166140

Replaced both BBUs.

Return shipping info
Fedex 9612018 6911799 02034386
96112018 6911799 02034379

Change 357403 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set profile::base::check_raid_policy to 'WriteBack' for hadoop workers

https://gerrit.wikimedia.org/r/357403

Change 357403 merged by Elukey:
[operations/puppet@production] Set profile::base::check_raid_policy to 'WriteBack' for hadoop workers

https://gerrit.wikimedia.org/r/357403

Current status:

elukey@neodymium:~$ sudo cumin 'R:class = role::analytics_cluster::hadoop::worker' 'megacli -LDPDInfo -aAll | grep "Current Cache Policy" | uniq -c'
41 hosts will be targeted:
analytics[1028-1068].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(1) analytics1032.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
      4 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
      8 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(11) analytics[1058-1068].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
     12 Current Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
===== NODE GROUP =====
(1) analytics1045.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
      9 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
      3 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(1) analytics1047.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
      3 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
      9 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(13) analytics[1042-1044,1046,1048,1050-1057].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
     13 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(1) analytics1049.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
      2 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
      5 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
      4 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(13) analytics[1028-1031,1033-1041].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf...olicy" | uniq -c' -----
     13 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

Last step is to figure out the best read policy to set (and decide whether it is worth alarming on it).

Based on several guides like http://download.intel.com/support/motherboards/server/sb/configuring_raid_for_optimal_perfromance_11.pdf I'd propose the following setting for all the workers:

Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
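To make the proposal concrete, here is a dry-run sketch that maps one "Current Cache Policy" line to the `megacli -LDSetProp` commands that would bring it to the setting above. It only prints commands and never runs them. The function name is hypothetical. `NoCachedBadBBU`, `ADRA` and `-Direct` are the keywords used in the SAL entries on this task, while `WB` for write-back is an assumption that should be checked against the MegaCLI help on the host.

```shell
#!/bin/sh
# Print (do not execute) the megacli commands needed to reach
# "WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU".
# $1 is a single "Current Cache Policy: ..." line.
plan_policy_fixes() {
    line=$1
    case $line in
        *WriteBack*) ;;
        # Assumption: WB is the write-back keyword; not used in this task.
        *) echo 'megacli -LDSetProp WB -LALL -aALL' ;;
    esac
    case $line in
        *ReadAdaptive*) ;;
        *) echo 'megacli -LDSetProp ADRA -LALL -aALL' ;;
    esac
    case $line in
        *', Direct,'*) ;;
        *) echo 'megacli -LDSetProp -Direct -LALL -aALL' ;;
    esac
    case $line in
        *'No Write Cache if Bad BBU'*) ;;
        *) echo 'megacli -LDSetProp NoCachedBadBBU -LALL -aALL' ;;
    esac
}
```

For example, a line reading `WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU` yields the `ADRA` and `-Direct` commands. Since `-LALL -aALL` touches every virtual drive on the host, one pass per host is enough even when only some drives differ.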

Looks good to me (even if I don't understand in depth what it means).
I particularly like the idea of having a coherent config across nodes, and of having an alarm on bad BBU (since we use the WriteBack policy).

Mentioned in SAL (#wikimedia-analytics) [2017-06-08T10:28:42Z] <elukey> executed megacli -LDSetProp NoCachedBadBBU -LALL -aALL on analytics1032 as test - T166140

Better view:

elukey@neodymium:~$ sudo cumin 'R:class = role::analytics_cluster::hadoop::worker' 'megacli -LDPDInfo -aAll | grep "Current Cache Policy" | sort| uniq -c'
41 hosts will be targeted:
analytics[1028-1068].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(11) analytics[1058-1068].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf... | sort| uniq -c' -----
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
     12 Current Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
===== NODE GROUP =====
(1) analytics1049.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf... | sort| uniq -c' -----
     11 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
      2 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
===== NODE GROUP =====
(1) analytics1032.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf... | sort| uniq -c' -----
     12 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(13) analytics[1042-1044,1046,1048,1050-1057].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf... | sort| uniq -c' -----
     13 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(1) analytics1045.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf... | sort| uniq -c' -----
     12 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
===== NODE GROUP =====
(1) analytics1047.eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf... | sort| uniq -c' -----
      1 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
     12 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(13) analytics[1028-1031,1033-1041].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf... | sort| uniq -c' -----
     13 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

Mentioned in SAL (#wikimedia-analytics) [2017-06-08T12:16:21Z] <elukey> run megacli -LDSetProp ADRA -LALL -aALL on analytics1047 to set ReadAheadAdaptive - T166140

Mentioned in SAL (#wikimedia-analytics) [2017-06-08T12:54:55Z] <elukey> run megacli -LDSetProp ADRA -LALL -aALL on analytics1047 to set ReadAheadAdaptive on analytics[1042-1046,1048-1057].eqiad.wmnet - T166140

Current status is:

elukey@neodymium:~$ sudo cumin 'R:class = role::analytics_cluster::hadoop::worker' 'megacli -LDPDInfo -aAll | grep "Current Cache Policy" | sort| uniq -c'
41 hosts will be targeted:
analytics[1028-1068].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(30) analytics[1028-1057].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf... | sort| uniq -c' -----
     13 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
===== NODE GROUP =====
(11) analytics[1058-1068].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf... | sort| uniq -c' -----
      1 Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
     12 Current Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU

Will complete the rest of the work tomorrow if nothing comes up after this round of fixes.

Mentioned in SAL (#wikimedia-operations) [2017-06-09T07:26:09Z] <elukey> run megacli -LDSetProp ADRA -LALL -aALL on analytics[1058-1068] - T166140

Mentioned in SAL (#wikimedia-operations) [2017-06-09T07:51:07Z] <elukey> run megacli -LDSetProp -Direct -LALL -aALL on analytics[1058-1068] - T166140

Finally the same setting across all analytics workers:

elukey@neodymium:~$ sudo cumin 'R:class = role::analytics_cluster::hadoop::worker' 'megacli -LDPDInfo -aAll | grep "Current Cache Policy" | sort| uniq -c'
41 hosts will be targeted:
analytics[1028-1068].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====
(41) analytics[1028-1068].eqiad.wmnet
----- OUTPUT of 'megacli -LDPDInf... | sort| uniq -c' -----
     13 Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

Last step is to update the documentation about the checks to do after a worker is added to the pool.

elukey set the point value for this task to 5. (Jun 12 2017, 10:46 AM)
elukey moved this task from In Progress to Done on the Analytics-Kanban board.