
Split up labstore external shelf storage available in codfw between labstore2001 and labstore2002
Closed, ResolvedPublic

Description

The current setup in codfw for labstore offline backup servers is:

  • labstore2001, in rack B1, is idle and connected to labstore-array[0,1,2,3]-codfw via an H800 controller. The attached storage amounts to 48 disks.
  • labstore2002, in rack B1, is idle and has an H800 card available, but no connected storage.
  • We have two additional shelves available with 12 disks each (?): labstore-array4-codfw and labstore-sparearray2001-codfw, both in rack B1.
  • For completeness, labstore2003 and labstore2004 are in rack B8 and actively used as offline storage for the NFS shares served from labstore1004 and 1005 (secondary cluster). These two servers have no attached storage, and have ~10.8T of internal storage each after RAID 10.

I would like to split up the external shelves labstore-array[0-4]-codfw between labstore2001 and 2002. We could connect 2 shelves to each server and keep 2 spares, or connect 3 shelves to one and 2 to the other, keeping 1 spare shelf. The latter maximizes the storage available for backups while still leaving a spare. All the storage can be set up as RAID 10.
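As a quick sanity check, the split arithmetic can be sketched in shell (a minimal sketch; the 12-disks-per-shelf figure comes from the description above, and RAID 10 keeping half the raw capacity is the usual mirrored-pairs assumption):

```shell
#!/bin/sh
# Shelf-split arithmetic for the options described above.
# Assumes 12 disks per shelf (per the description); RAID 10 mirrors
# halve the usable disk count.
disks_per_shelf=12

split_option() {  # split_option <shelves_on_server>
  raw=$(( $1 * disks_per_shelf ))
  echo "$1 shelves: $raw raw disks, $(( raw / 2 )) usable after RAID 10"
}

split_option 2   # 2 shelves each, leaving 2 spare shelves
split_option 3   # 3 shelves on one server, leaving 1 spare shelf
```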

@Papaul Let me know if you have any questions or concerns! Thank you.

Event Timeline

@Papaul poke on this task since it's been ~3 weeks. Let me know if you need anything from me to proceed, thank you!

@madhuvishy This task was not assigned to me and was not on the ops-codfw work board, so I did not know about it until now. In the future, to avoid this type of situation, can you please assign the task to me or put it on the ops-codfw work board? I will take a look at this next week when I am on site.

Thanks.

Papaul claimed this task. Aug 19 2017, 2:35 AM

@Papaul Aah, sorry, I had pinged you on the task and didn't know about adding it to the ops-codfw board. I'll definitely do that from here on! Thanks so much :)

@madhuvishy here is my proposal:

labstore2001: 2 shelves, labstore-array[0-1]
labstore2002: 3 shelves, labstore-array[2-4]

We keep labstore-sparearray2001-codfw as a spare.

@Papaul Yeah that seems fine to me. Thanks!

@madhuvishy Let me know when I have the green light to disconnect everything and start working on the new setup.

@Papaul The servers are not in use and have no useful data on them; you have the green light to disconnect everything :)

@Papaul Please let me and @madhuvishy know if you have any issues getting the disk shelves to be seen. I cannot find a way to see them on the RAID controller.

Papaul reassigned this task from Papaul to madhuvishy. Aug 22 2017, 3:56 PM

@Cmjohnson I have no issues

@madhuvishy This is complete. Please check and let me know if you have any questions.

Thanks.

Script wmf_auto_reimage was launched by madhuvishy on neodymium.eqiad.wmnet for hosts:

['labstore2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201708232252_madhuvishy_8186.log.

Completed auto-reimage of hosts:

['labstore2001.codfw.wmnet']

Of which those FAILED:

set(['labstore2001.codfw.wmnet'])

Script wmf_auto_reimage was launched by madhuvishy on neodymium.eqiad.wmnet for hosts:

['labstore2002.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201708241738_madhuvishy_32225.log.

Completed auto-reimage of hosts:

['labstore2002.codfw.wmnet']

Of which those FAILED:

set(['labstore2002.codfw.wmnet'])

Script wmf_auto_reimage was launched by madhuvishy on neodymium.eqiad.wmnet for hosts:

['labstore2002.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201708242250_madhuvishy_31148.log.

Completed auto-reimage of hosts:

['labstore2002.codfw.wmnet']

Of which those FAILED:

set(['labstore2002.codfw.wmnet'])

@Papaul, thanks for splitting up the shelves! I've reimaged the servers, and that part looks right:

root@labstore2001:~# megacli -PDList -Aall | grep 'Raw Size' | wc -l
36

root@labstore2002:/home/madhuvishy# megacli -PDList -Aall | grep 'Raw Size' | wc -l
48

However, I'm not sure the virtual disks and RAID levels are configured correctly. I'd like the external shelves to be RAID 10, but labstore2001 doesn't list the external shelf disks in fdisk -l, and I'm not able to change the configuration from RAID management.

root@labstore2001:~# megacli -LdPdInfo -a1

Adapter #1

Number of Virtual Disks: 0

Exit Code: 0x00

On labstore2002, only 24 of the 36 external disks show up as virtual disks (VDs):

root@labstore2002:/home/madhuvishy# megacli -LdPdInfo -a1 | grep 'Virtual Disks'
Number of Virtual Disks: 24

They are also all set up as RAID 0, as far as I can tell:

root@labstore2002:/home/madhuvishy# megacli -LdPdInfo -a1 | grep RAID
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
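For reference, a line count like the greps above can confirm how many logical drives report plain RAID 0. This is a self-contained sketch: the sample variable inlines lines in the format shown above, and the contrasting non-RAID-0 line is an illustrative assumption (the exact Primary/Secondary encoding for other RAID levels varies by controller):

```shell
#!/bin/sh
# Count logical drives reporting plain RAID 0 in megacli-style output.
# "Primary-0, Secondary-0" is the RAID 0 pattern seen in the output above;
# the third line is a hypothetical non-RAID-0 entry for contrast.
sample='RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
RAID Level          : Primary-1, Secondary-3, RAID Level Qualifier-0'

printf '%s\n' "$sample" | grep -c 'Primary-0, Secondary-0'
```

On a live host, the same grep would be fed from `megacli -LdPdInfo -a1` as in the commands above.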
Papaul added a comment (edited). Aug 29 2017, 3:25 AM

@madhuvishy I took a quick look at labstore2001; the H800 controller doesn't allow me to create a RAID 10 across all 24 disks (2 shelves).

Here are the options:

Option 1: RAID 10 on the first 16 disks and RAID 10 on the remaining 8 disks.

Option 2: RAID 10 on 12 disks (1st shelf) and RAID 10 on 12 disks (2nd shelf).

Option 3: set up each disk as RAID 0 and configure software RAID 10 on top.

For now I have it set up as option 2; let me know.

On labstore2002 I set up each disk as RAID 0 (36 disks total).

Let me know which one works.

Thanks.
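For what it's worth, the three options give the same usable capacity and differ only in array granularity and in whether the RAID 10 is done in hardware or software. A minimal sketch of that arithmetic (mirrored pairs halve each array's disk count):

```shell
#!/bin/sh
# Usable-capacity comparison for the three options on labstore2001's
# 24 external disks. Each RAID 10 array keeps half its disks' capacity.
raid10_usable() {  # raid10_usable <array_size>...
  total=0
  for n in "$@"; do
    total=$(( total + n / 2 ))
  done
  echo "$total"
}

raid10_usable 16 8    # option 1: two hardware arrays
raid10_usable 12 12   # option 2: one array per shelf
raid10_usable 24      # option 3: software RAID 10 over 24 RAID 0 disks
```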

@Papaul, Hardware RAID 10 on both labstore2001 and 2002, with 6 or 8 disks per logical/virtual RAID drive, would be great (a 12-disk array still feels really big).

@madhuvishy Here is what I am about to set up:

On labstore2001:

3 x RAID 10 of 8 disks per logical/virtual drive

On labstore2002:

6 x RAID 10 of 6 disks per logical/virtual drive

Let me know if this works for you.

@Papaul Yup that's perfect, thanks!
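The agreed layout can be sanity-checked against the shelf allocation earlier in the task (a sketch; disk counts taken from the thread: 2 shelves = 24 external disks on labstore2001, 3 shelves = 36 on labstore2002):

```shell
#!/bin/sh
# Sanity-check the agreed layout against the attached disk counts.
# labstore2001: 3 arrays x 8 disks must equal its 24 external disks;
# labstore2002: 6 arrays x 6 disks must equal its 36 external disks.
check() {  # check <host> <arrays> <disks_per_array>
  total=$(( $2 * $3 ))
  echo "$1: $total disks in $2 arrays, $(( total / 2 )) usable after RAID 10"
}

check labstore2001 3 8
check labstore2002 6 6
```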

madhuvishy closed this task as Resolved. Aug 30 2017, 5:43 AM

Thank you so much, that all looks right. Closing this as resolved!