Imbalanced storage distribution over JBOD devices
Open, Normal, Public

Description

In the RESTBase Cassandra cluster, data is not always evenly distributed across the JBOD devices (some are more imbalanced than others). For example:

restbase1007
~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             10M     0   10M   0% /dev
tmpfs            26G  2.5G   23G  10% /run
/dev/md0         28G  4.5G   22G  18% /
tmpfs            63G     0   63G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            63G     0   63G   0% /sys/fs/cgroup
/dev/sdd4       893G  544G  349G  61% /srv/sdd4
/dev/sdb4       893G  528G  365G  60% /srv/sdb4
/dev/sdc4       893G  587G  306G  66% /srv/sdc4
/dev/sda4       893G  368G  525G  42% /srv/sda4
/dev/sde4       893G  278G  615G  32% /srv/sde4
/dev/md2         46G   23G   21G  52% /srv/cassandra/instance-data

~# du -shc /srv/sd[a-e]4/cassandra-{a,b,c}
105G	/srv/sda4/cassandra-a
172G	/srv/sdb4/cassandra-a
205G	/srv/sdc4/cassandra-a
113G	/srv/sdd4/cassandra-a
53G	/srv/sde4/cassandra-a
76G	/srv/sda4/cassandra-b
147G	/srv/sdb4/cassandra-b
171G	/srv/sdc4/cassandra-b
189G	/srv/sdd4/cassandra-b
106G	/srv/sde4/cassandra-b
188G	/srv/sda4/cassandra-c
210G	/srv/sdb4/cassandra-c
212G	/srv/sdc4/cassandra-c
243G	/srv/sdd4/cassandra-c
120G	/srv/sde4/cassandra-c
2.3T	total
~#
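To compare the per-device footprint with the `df` figures, the `du` lines above can be summed per device. The following is only a sketch: the function name `per_device_totals` is hypothetical, and it assumes every size carries a `G` suffix, as in this output.

```python
from collections import defaultdict

# Lines copied from the du output above (size, path).
DU_OUTPUT = """\
105G /srv/sda4/cassandra-a
172G /srv/sdb4/cassandra-a
205G /srv/sdc4/cassandra-a
113G /srv/sdd4/cassandra-a
53G /srv/sde4/cassandra-a
76G /srv/sda4/cassandra-b
147G /srv/sdb4/cassandra-b
171G /srv/sdc4/cassandra-b
189G /srv/sdd4/cassandra-b
106G /srv/sde4/cassandra-b
188G /srv/sda4/cassandra-c
210G /srv/sdb4/cassandra-c
212G /srv/sdc4/cassandra-c
243G /srv/sdd4/cassandra-c
120G /srv/sde4/cassandra-c
2.3T total
"""


def per_device_totals(du_text):
    """Sum the G-suffixed sizes per device directory (hypothetical helper)."""
    totals = defaultdict(int)
    for line in du_text.splitlines():
        if not line.strip():
            continue
        size, path = line.split()
        if not path.startswith("/srv/"):
            continue  # skip the trailing "total" line
        device = path.split("/")[2]  # e.g. "sda4"
        totals[device] += int(size.rstrip("G"))  # assumes sizes are in G
    return dict(totals)


if __name__ == "__main__":
    totals = per_device_totals(DU_OUTPUT)
    for device in sorted(totals):
        print(device, totals[device])
    print("max/min ratio:", round(max(totals.values()) / min(totals.values()), 2))
```

The sums (369G, 529G, 588G, 545G, and 279G for sda4 through sde4) agree with the `Used` column of `df` to within rounding, so the fullest device holds roughly twice as much as the emptiest.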

Event Timeline

Eevans created this task. Jun 4 2018, 3:21 PM
Restricted Application added a subscriber: Aklapper. Jun 4 2018, 3:21 PM
Eevans triaged this task as Normal priority. Jun 4 2018, 3:21 PM
mobrovac edited projects, added Services (next); removed Services. Jun 5 2018, 12:03 PM
Eevans moved this task from Backlog to Next on the User-Eevans board. Jun 12 2018, 2:21 PM
Vvjjkkii renamed this task from Imbalanced storage distribution over JBOD devices to eobaaaaaaa. Jul 1 2018, 1:06 AM
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
mobrovac renamed this task from eobaaaaaaa to Imbalanced storage distribution over JBOD devices. Jul 1 2018, 11:19 AM
mobrovac lowered the priority of this task from High to Normal.
mobrovac updated the task description.
Eevans moved this task from Next to In-Progress on the User-Eevans board. Jul 31 2018, 7:02 PM
Eevans moved this task from In-Progress to Backlog on the User-Eevans board. Oct 2 2018, 7:33 PM

This imbalance seems to have resolved itself.

restbase1016.eqiad.wmnet: /dev/sdc4 1.4T 298G 1.1T 21% /srv/sdc4
restbase1016.eqiad.wmnet: /dev/sda4 1.4T 299G 1.1T 22% /srv/sda4
restbase1016.eqiad.wmnet: /dev/sdd4 1.4T 304G 1.1T 22% /srv/sdd4
restbase1016.eqiad.wmnet: /dev/sdb4 1.4T 313G 1.1T 23% /srv/sdb4
restbase1019.eqiad.wmnet: /dev/sdb4 1.7T 382G 1.3T 23% /srv/sdb4
restbase1019.eqiad.wmnet: /dev/sda4 1.7T 369G 1.3T 22% /srv/sda4
restbase1019.eqiad.wmnet: /dev/sdc4 1.7T 360G 1.3T 22% /srv/sdc4
restbase1020.eqiad.wmnet: /dev/sda4 1.7T 393G 1.3T 24% /srv/sda4
restbase1020.eqiad.wmnet: /dev/sdb4 1.7T 402G 1.3T 24% /srv/sdb4
restbase1020.eqiad.wmnet: /dev/sdc4 1.7T 419G 1.3T 25% /srv/sdc4
restbase1021.eqiad.wmnet: /dev/sda4 1.7T 365G 1.3T 22% /srv/sda4
restbase1021.eqiad.wmnet: /dev/sdb4 1.7T 339G 1.4T 21% /srv/sdb4
restbase1021.eqiad.wmnet: /dev/sdc4 1.7T 360G 1.3T 22% /srv/sdc4
restbase1017.eqiad.wmnet: /dev/sda4 1.4T 258G 1.2T 19% /srv/sda4
restbase1017.eqiad.wmnet: /dev/sdb4 1.4T 270G 1.2T 19% /srv/sdb4
restbase1017.eqiad.wmnet: /dev/sdc4 1.4T 268G 1.2T 19% /srv/sdc4
restbase1017.eqiad.wmnet: /dev/sdd4 1.4T 280G 1.2T 20% /srv/sdd4
restbase1022.eqiad.wmnet: /dev/sdc4 1.7T 386G 1.3T 23% /srv/sdc4
restbase1022.eqiad.wmnet: /dev/sda4 1.7T 373G 1.3T 23% /srv/sda4
restbase1022.eqiad.wmnet: /dev/sdb4 1.7T 382G 1.3T 23% /srv/sdb4
restbase1023.eqiad.wmnet: /dev/sdb4 1.7T 373G 1.3T 23% /srv/sdb4
restbase1023.eqiad.wmnet: /dev/sdc4 1.7T 388G 1.3T 23% /srv/sdc4
restbase1023.eqiad.wmnet: /dev/sda4 1.7T 393G 1.3T 24% /srv/sda4
restbase1024.eqiad.wmnet: /dev/sda4 1.7T 358G 1.4T 22% /srv/sda4
restbase1024.eqiad.wmnet: /dev/sdb4 1.7T 362G 1.3T 22% /srv/sdb4
restbase1024.eqiad.wmnet: /dev/sdc4 1.7T 400G 1.3T 24% /srv/sdc4
restbase1018.eqiad.wmnet: /dev/sdc4 1.4T 296G 1.1T 21% /srv/sdc4
restbase1018.eqiad.wmnet: /dev/sda4 1.4T 302G 1.1T 22% /srv/sda4
restbase1018.eqiad.wmnet: /dev/sdb4 1.4T 294G 1.2T 21% /srv/sdb4
restbase1018.eqiad.wmnet: /dev/sdd4 1.4T 283G 1.2T 20% /srv/sdd4
restbase1025.eqiad.wmnet: /dev/sdc4 1.7T 397G 1.3T 24% /srv/sdc4
restbase1025.eqiad.wmnet: /dev/sdb4 1.7T 369G 1.3T 22% /srv/sdb4
restbase1025.eqiad.wmnet: /dev/sda4 1.7T 383G 1.3T 23% /srv/sda4
restbase1026.eqiad.wmnet: /dev/sda4 1.7T 397G 1.3T 24% /srv/sda4
restbase1026.eqiad.wmnet: /dev/sdb4 1.7T 386G 1.3T 23% /srv/sdb4
restbase1026.eqiad.wmnet: /dev/sdc4 1.7T 422G 1.3T 25% /srv/sdc4
restbase1027.eqiad.wmnet: /dev/sdb4 1.7T 379G 1.3T 23% /srv/sdb4
restbase1027.eqiad.wmnet: /dev/sda4 1.7T 396G 1.3T 24% /srv/sda4
restbase1027.eqiad.wmnet: /dev/sdc4 1.7T 362G 1.3T 22% /srv/sdc4
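One way to check a listing like this for balance is to compute the range of the `Used` column per host. This is a rough sketch, not a tool used on the cluster: `used_range_by_host` is a hypothetical helper, the sample covers only two of the hosts above, and it assumes the `host: device size used avail use% mount` layout with sizes in `G`.

```python
from collections import defaultdict

# Two hosts copied from the listing above, without any leading line numbers.
SAMPLE = """\
restbase1016.eqiad.wmnet: /dev/sdc4 1.4T 298G 1.1T 21% /srv/sdc4
restbase1016.eqiad.wmnet: /dev/sda4 1.4T 299G 1.1T 22% /srv/sda4
restbase1016.eqiad.wmnet: /dev/sdd4 1.4T 304G 1.1T 22% /srv/sdd4
restbase1016.eqiad.wmnet: /dev/sdb4 1.4T 313G 1.1T 23% /srv/sdb4
restbase1020.eqiad.wmnet: /dev/sda4 1.7T 393G 1.3T 24% /srv/sda4
restbase1020.eqiad.wmnet: /dev/sdb4 1.7T 402G 1.3T 24% /srv/sdb4
restbase1020.eqiad.wmnet: /dev/sdc4 1.7T 419G 1.3T 25% /srv/sdc4
"""


def used_range_by_host(listing):
    """Return {host: (min_used_G, max_used_G)} (hypothetical helper)."""
    used = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # not a "host: df-line" record
        host = fields[0].rstrip(":")
        used[host].append(int(fields[3].rstrip("G")))  # assumes Used is in G
    return {host: (min(v), max(v)) for host, v in used.items()}


if __name__ == "__main__":
    for host, (low, high) in used_range_by_host(SAMPLE).items():
        print(host, low, high, round(high / low, 2))
```

For the two sampled hosts the devices range from 298G to 313G and from 393G to 419G, a spread of well under 10%.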

Much has changed since this issue was originally opened; most significantly, we no longer make use of wide rows. I suspect that what appeared to be an imbalance in placement was actually the result of the distribution of row sizes.

I propose we close this.