
Reimage JBOD-RAID0 configured RESTBase HP machines
Closed, Resolved, Public

Description

While investigating unusually high latencies on some hardware combinations, it was discovered that disk performance on Smart Array-equipped machines is significantly better when the controller runs in HBA mode than when a JBOD configuration is assembled from single-disk RAID0s. Many of the HPs in the RESTBase cluster have already been set up this way; the remaining ones should be converted as well. The nine machines in question are:

  • restbase1010.eqiad.wmnet
  • restbase1012.eqiad.wmnet
  • restbase1014.eqiad.wmnet
  • restbase2003.codfw.wmnet
  • restbase2004.codfw.wmnet
  • restbase2001.codfw.wmnet
  • restbase2002.codfw.wmnet
  • restbase2005.codfw.wmnet
  • restbase2006.codfw.wmnet
NOTE: Given the findings of T189057, namely that it may be a mistake to correlate the poor IO performance with Smart Array-equipped HPs, I propose we replace the Samsung SSDs in one of the above HPs with Intel SSDs in order to compare it with restbase2009 (which has 5 HP SSDs).


Event Timeline

Eevans triaged this task as Medium priority. Feb 5 2018, 8:54 PM
Eevans created this task.
Eevans removed Eevans as the assignee of this task. Feb 6 2018, 2:18 PM

We now have the cassandra_3.11.0-wmf5 package uploaded to the apt repo, and enabled for 3.x clusters. Aside from https://gerrit.wikimedia.org/r/404705 (i.e. T175284: Create parent directories for JBOD data_directories and e.g. commitlog directories), are there any remaining blockers here?
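For illustration only: the idea behind T175284 is that each JBOD data directory (and the commitlog directory) needs its parent directories to exist before Cassandra starts. The real change lives in the Puppet patch linked above; the Python below is just a minimal sketch of the same idea, and the function name `ensure_dirs` and the example `/srv/sdX4/cassandra-a/...` paths are hypothetical.

```python
import os
import tempfile
from typing import Iterable, List


def ensure_dirs(paths: Iterable[str]) -> List[str]:
    """Create each directory, including missing parents; return the ones created.

    Hypothetical helper: mirrors what the Puppet change does, not its code.
    """
    created: List[str] = []
    for path in paths:
        if not os.path.isdir(path):
            # makedirs also creates any missing intermediate directories;
            # exist_ok=True tolerates a concurrent creator winning the race.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created


if __name__ == "__main__":
    # Demo under a throwaway root instead of /srv; the layout is hypothetical.
    root = tempfile.mkdtemp()
    data_dirs = [
        os.path.join(root, "srv", "sda4", "cassandra-a", "data"),
        os.path.join(root, "srv", "sdb4", "cassandra-a", "data"),
        os.path.join(root, "srv", "sda4", "cassandra-a", "commitlog"),
    ]
    print(ensure_dirs(data_dirs))  # first run creates all three
    print(ensure_dirs(data_dirs))  # second run is a no-op
```

Running it twice is safe: the second call finds every directory present and creates nothing, which is the property you want from something that runs on every Puppet pass.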

AFAIK that's it. The other thing we'll have to experiment with is starting with all Cassandra instances masked and unmasking them one by one, but that's not a blocker.
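A rough sketch of that "start masked, unmask one at a time" procedure, assuming systemd units named `cassandra-<instance>.service` (e.g. `cassandra-a`; the naming is an assumption here). `systemctl mask`, `unmask` and `start` are standard systemd verbs; the `is_healthy` hook is hypothetical and would be something like a check on `nodetool status` output. The command runner is injectable so the sequence can be dry-run or tested without touching a real host.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and executes it (or just records it).
Runner = Callable[[List[str]], None]


def default_runner(argv: List[str]) -> None:
    # check=True raises CalledProcessError if systemctl exits non-zero.
    subprocess.run(argv, check=True)


def _systemctl(run: Runner, verb: str, instance: str) -> None:
    # e.g. ["systemctl", "unmask", "cassandra-a.service"]
    run(["systemctl", verb, f"{instance}.service"])


def mask_all(instances: Sequence[str], run: Runner = default_runner) -> None:
    """Mask every instance so none of them can be started after the reimage."""
    for instance in instances:
        _systemctl(run, "mask", instance)


def rolling_unmask(instances: Sequence[str],
                   is_healthy: Callable[[str], bool],
                   run: Runner = default_runner) -> List[str]:
    """Unmask and start instances in order, stopping at the first unhealthy one.

    Returns the instances that came up healthy. Instances after the failing
    one are never touched, so they stay masked for a human to look at.
    """
    started: List[str] = []
    for instance in instances:
        _systemctl(run, "unmask", instance)
        _systemctl(run, "start", instance)
        if not is_healthy(instance):  # hypothetical health hook
            break
        started.append(instance)
    return started


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    names = ["cassandra-a", "cassandra-b", "cassandra-c"]
    mask_all(names, run=print)
    rolling_unmask(names, is_healthy=lambda _: True, run=print)
```

Stopping at the first unhealthy instance, rather than carrying on, keeps a bad disk layout from taking down every instance on the host at once.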

fgiunchedi claimed this task.

AFAICT this has happened in one form or another (decom or reimage). Resolving, though feel free to reopen.