Overview
We have two sets of cloudelastic nodes that were provisioned at different times:
cloudelastic100[1-4] is the older group
cloudelastic100[5-6] were added more recently
Our pre-existing elasticsearch-readahead udev rule is failing on cloudelastic100[5-6], possibly because the two groups are partitioned differently. See the final section, Additional context on disk layout, for more info.
Proposed Solution
Figure out how to change the udev rule to make it more robust for different device configurations.
It looks like https://github.com/wikimedia/puppet/blob/dfa5f9795ec7cc681fa7bbd5c90da7b3dd4823aa/modules/profile/manifests/elasticsearch/cirrus.pp#L16 feeds into https://github.com/wikimedia/puppet/blob/dfa5f9795ec7cc681fa7bbd5c90da7b3dd4823aa/modules/profile/manifests/elasticsearch/cirrus.pp#L119-L121, so we should just need different hiera values for cloudelastic100[1-4] vs cloudelastic100[5-6].
Additional context on disk layout
Putting this at the bottom of the ticket writeup since it takes up a lot of space.
user@Ryans-MacBook-Pro-3 ~/wmf/puppet [production]% ssh cloudelastic1005.wikimedia.org
Linux cloudelastic1005 4.9.0-12-amd64 #1 SMP Debian 4.9.210-1 (2020-01-20) x86_64
Debian GNU/Linux 9.13 (stretch)
cloudelastic1005 is a elasticsearch cloud elastic cirrus (elasticsearch::cloudelastic)
The last Puppet run was at Fri Oct 16 01:01:14 UTC 2020 (13 minutes ago).
Last puppet commit: (dfa5f9795e) Dzahn - base::environment: remove lint-ignore that ignores nothing
Debian GNU/Linux 9 auto-installed on Tue May 5 14:54:52 UTC 2020.
Last login: Thu Oct 8 23:30:58 2020 from 2620:0:863:1:198:35:26:6
ryankemper@cloudelastic1005:~$ lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINT
sda               8:0    0  1.8T  0 disk
├─sda1            8:1    0  285M  0 part
└─sda2            8:2    0  1.8T  0 part
  └─md0           9:0    0  5.2T  0 raid10
    ├─vg0-swap  253:0    0  976M  0 lvm    [SWAP]
    ├─vg0-root  253:1    0 74.5G  0 lvm    /
    └─vg0-srv   253:2    0  5.2T  0 lvm    /srv
sdb               8:16   0  1.8T  0 disk
├─sdb1            8:17   0  285M  0 part
└─sdb2            8:18   0  1.8T  0 part
  └─md0           9:0    0  5.2T  0 raid10
    ├─vg0-swap  253:0    0  976M  0 lvm    [SWAP]
    ├─vg0-root  253:1    0 74.5G  0 lvm    /
    └─vg0-srv   253:2    0  5.2T  0 lvm    /srv
sdc               8:32   0  1.8T  0 disk
├─sdc1            8:33   0  285M  0 part
└─sdc2            8:34   0  1.8T  0 part
  └─md0           9:0    0  5.2T  0 raid10
    ├─vg0-swap  253:0    0  976M  0 lvm    [SWAP]
    ├─vg0-root  253:1    0 74.5G  0 lvm    /
    └─vg0-srv   253:2    0  5.2T  0 lvm    /srv
sdd               8:48   0  1.8T  0 disk
├─sdd1            8:49   0  285M  0 part
└─sdd2            8:50   0  1.8T  0 part
  └─md0           9:0    0  5.2T  0 raid10
    ├─vg0-swap  253:0    0  976M  0 lvm    [SWAP]
    ├─vg0-root  253:1    0 74.5G  0 lvm    /
    └─vg0-srv   253:2    0  5.2T  0 lvm    /srv
sde               8:64   0  1.8T  0 disk
├─sde1            8:65   0  285M  0 part
└─sde2            8:66   0  1.8T  0 part
  └─md0           9:0    0  5.2T  0 raid10
    ├─vg0-swap  253:0    0  976M  0 lvm    [SWAP]
    ├─vg0-root  253:1    0 74.5G  0 lvm    /
    └─vg0-srv   253:2    0  5.2T  0 lvm    /srv
sdf               8:80   0  1.8T  0 disk
├─sdf1            8:81   0  285M  0 part
└─sdf2            8:82   0  1.8T  0 part
  └─md0           9:0    0  5.2T  0 raid10
    ├─vg0-swap  253:0    0  976M  0 lvm    [SWAP]
    ├─vg0-root  253:1    0 74.5G  0 lvm    /
    └─vg0-srv   253:2    0  5.2T  0 lvm    /srv
ryankemper@cloudelastic1005:~$ exit
logout
Connection to cloudelastic1005.wikimedia.org closed.
user@Ryans-MacBook-Pro-3 ~/wmf/puppet [production]% ssh cloudelastic1004.wikimedia.org
Linux cloudelastic1004 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u3 (2019-06-16) x86_64
Debian GNU/Linux 9.13 (stretch)
cloudelastic1004 is a elasticsearch cloud elastic cirrus (elasticsearch::cloudelastic)
The last Puppet run was at Fri Oct 16 01:03:03 UTC 2020 (12 minutes ago).
Last puppet commit: (dfa5f9795e) Dzahn - base::environment: remove lint-ignore that ignores nothing
Debian GNU/Linux 9 auto-installed on Fri Feb 8 16:29:47 UTC 2019.
Last login: Fri Oct 16 01:14:15 2020 from 2620:0:863:1:198:35:26:6
ryankemper@cloudelastic1004:~$ lsblk
NAME                            MAJ:MIN RM   SIZE RO TYPE   MOUNTPOINT
sda                               8:0    0   1.8T  0 disk
├─sda1                            8:1    0     1M  0 part
├─sda2                            8:2    0  46.6G  0 part
│ └─md0                           9:0    0 139.6G  0 raid10 /
└─sda3                            8:3    0   1.7T  0 part
  └─md1                           9:1    0   5.1T  0 raid10
    └─cloudelastic1004--vg-data 253:0    0   5.1T  0 lvm    /srv
sdb                               8:16   0   1.8T  0 disk
├─sdb1                            8:17   0     1M  0 part
├─sdb2                            8:18   0  46.6G  0 part
│ └─md0                           9:0    0 139.6G  0 raid10 /
└─sdb3                            8:19   0   1.7T  0 part
  └─md1                           9:1    0   5.1T  0 raid10
    └─cloudelastic1004--vg-data 253:0    0   5.1T  0 lvm    /srv
sdc                               8:32   0   1.8T  0 disk
├─sdc1                            8:33   0     1M  0 part
├─sdc2                            8:34   0  46.6G  0 part
│ └─md0                           9:0    0 139.6G  0 raid10 /
└─sdc3                            8:35   0   1.7T  0 part
  └─md1                           9:1    0   5.1T  0 raid10
    └─cloudelastic1004--vg-data 253:0    0   5.1T  0 lvm    /srv
sdd                               8:48   0   1.8T  0 disk
├─sdd1                            8:49   0     1M  0 part
├─sdd2                            8:50   0  46.6G  0 part
│ └─md0                           9:0    0 139.6G  0 raid10 /
└─sdd3                            8:51   0   1.7T  0 part
  └─md1                           9:1    0   5.1T  0 raid10
    └─cloudelastic1004--vg-data 253:0    0   5.1T  0 lvm    /srv
sde                               8:64   0   1.8T  0 disk
├─sde1                            8:65   0     1M  0 part
├─sde2                            8:66   0  46.6G  0 part
│ └─md0                           9:0    0 139.6G  0 raid10 /
└─sde3                            8:67   0   1.7T  0 part
  └─md1                           9:1    0   5.1T  0 raid10
    └─cloudelastic1004--vg-data 253:0    0   5.1T  0 lvm    /srv
sdf                               8:80   0   1.8T  0 disk
├─sdf1                            8:81   0     1M  0 part
├─sdf2                            8:82   0  46.6G  0 part
│ └─md0                           9:0    0 139.6G  0 raid10 /
└─sdf3                            8:83   0   1.7T  0 part
  └─md1                           9:1    0   5.1T  0 raid10
    └─cloudelastic1004--vg-data 253:0    0   5.1T  0 lvm    /srv
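Note that the two groups are partitioned quite differently. On cloudelastic1004, / sits directly on md0 and /srv is an LVM volume (cloudelastic1004--vg-data) on a separate array, md1. On cloudelastic1005, a single array (md0) backs the vg0 volume group, which holds swap, / and /srv.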
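To compare which device chain backs /srv on each group without reading the tree by eye, here is a minimal sketch. It assumes the layout is available as parsed `lsblk --json` output with the usual `blockdevices` / `children` shape; the two sample dicts are hand-trimmed from the sda entries in the lsblk output above, and `backing_chain` is a hypothetical helper name, not something that exists in the puppet repo.

```python
def backing_chain(layout, mountpoint):
    """Return device names from a physical disk down to the device mounted
    at `mountpoint`, or None if nothing is mounted there.

    `layout` is assumed to be parsed `lsblk --json` output, e.g.
    json.loads(subprocess.check_output(["lsblk", "--json"])).
    """
    def walk(node, path):
        path = path + [node["name"]]
        # Assumption: older util-linux emits "mountpoint", newer releases
        # emit a "mountpoints" list. Handle both.
        mounts = node.get("mountpoints") or [node.get("mountpoint")]
        if mountpoint in mounts:
            return path
        for child in node.get("children", []):
            found = walk(child, path)
            if found:
                return found
        return None

    for disk in layout["blockdevices"]:
        chain = walk(disk, [])
        if chain:
            return chain
    return None


# Sample data trimmed to sda only (illustrative, copied from the output above).
OLD_LAYOUT = {"blockdevices": [{"name": "sda", "mountpoint": None, "children": [
    {"name": "sda1", "mountpoint": None},
    {"name": "sda2", "mountpoint": None, "children": [
        {"name": "md0", "mountpoint": "/"}]},
    {"name": "sda3", "mountpoint": None, "children": [
        {"name": "md1", "mountpoint": None, "children": [
            {"name": "cloudelastic1004--vg-data", "mountpoint": "/srv"}]}]},
]}]}

NEW_LAYOUT = {"blockdevices": [{"name": "sda", "mountpoint": None, "children": [
    {"name": "sda1", "mountpoint": None},
    {"name": "sda2", "mountpoint": None, "children": [
        {"name": "md0", "mountpoint": None, "children": [
            {"name": "vg0-swap", "mountpoint": "[SWAP]"},
            {"name": "vg0-root", "mountpoint": "/"},
            {"name": "vg0-srv", "mountpoint": "/srv"}]}]},
]}]}

print(backing_chain(OLD_LAYOUT, "/srv"))
print(backing_chain(NEW_LAYOUT, "/srv"))
```

On the sample data the older layout resolves /srv through md1 and the newer one through md0, so a udev rule keyed on a single fixed device name may not match both groups.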
ryankemper@cloudelastic1003:~$ sudo lvdisplay
  --- Logical volume ---
  LV Path                /dev/cloudelastic1003-vg/data
  LV Name                data
  VG Name                cloudelastic1003-vg
  LV UUID                eOF5d7-D7wn-yMWZ-EJx0-V9Yg-udiq-ifoLXh
  LV Write Access        read/write
  LV Creation host, time cloudelastic1003, 2018-08-02 21:36:10 +0000
  LV Status              available
  # open                 1
  LV Size                5.10 TiB
  Current LE             1337704
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     32
  Block device           253:0

ryankemper@cloudelastic1003:~$ exit
logout
Connection to cloudelastic1003.wikimedia.org closed.
user@Ryans-MacBook-Pro-3 ~/wmf/puppet [production]% ssh cloudelastic1005.wikimedia.org
Linux cloudelastic1005 4.9.0-12-amd64 #1 SMP Debian 4.9.210-1 (2020-01-20) x86_64
Debian GNU/Linux 9.13 (stretch)
cloudelastic1005 is a elasticsearch cloud elastic cirrus (elasticsearch::cloudelastic)
The last Puppet run was at Fri Oct 16 01:01:14 UTC 2020 (16 minutes ago).
Last puppet commit: (dfa5f9795e) Dzahn - base::environment: remove lint-ignore that ignores nothing
Debian GNU/Linux 9 auto-installed on Tue May 5 14:54:52 UTC 2020.
Last login: Fri Oct 16 01:14:22 2020 from 2620:0:863:1:198:35:26:6
ryankemper@cloudelastic1005:~$ sudo lvdisplay
  --- Logical volume ---
  LV Path                /dev/vg0/swap
  LV Name                swap
  VG Name                vg0
  LV UUID                NPf84w-vdtD-vAm7-H1gV-mpfK-Hpg2-1HOAVP
  LV Write Access        read/write
  LV Creation host, time cloudelastic1005, 2020-05-05 14:50:16 +0000
  LV Status              available
  # open                 2
  LV Size                976.00 MiB
  Current LE             244
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     32
  Block device           253:0

  --- Logical volume ---
  LV Path                /dev/vg0/root
  LV Name                root
  VG Name                vg0
  LV UUID                VIZSuW-AtjS-G0sQ-j8rm-ptJo-8DlN-34ogQe
  LV Write Access        read/write
  LV Creation host, time cloudelastic1005, 2020-05-05 14:50:16 +0000
  LV Status              available
  # open                 1
  LV Size                74.50 GiB
  Current LE             19073
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     6144
  Block device           253:1

  --- Logical volume ---
  LV Path                /dev/vg0/srv
  LV Name                srv
  VG Name                vg0
  LV UUID                KjyN08-wQ0A-ZTl2-veuS-bcsa-swOy-O8wrRl
  LV Write Access        read/write
  LV Creation host, time cloudelastic1005, 2020-05-05 14:50:16 +0000
  LV Status              available
  # open                 1
  LV Size                5.16 TiB
  Current LE             1353937
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     6144
  Block device           253:2
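
For checking the effective read-ahead per logical volume across hosts, a rough sketch that parses `lvdisplay` text may help. It assumes the field labels appear as in the output above and that LVM's read-ahead sectors are 512 bytes each; `readahead_kib_by_lv` is a hypothetical helper name, and the sample text is trimmed from the cloudelastic1005 output.

```python
import re

SECTOR_BYTES = 512  # assumption: LVM reports read-ahead in 512-byte sectors


def readahead_kib_by_lv(lvdisplay_text):
    """Map each 'LV Path' to its current read-ahead, converted to KiB."""
    result = {}
    current_lv = None
    for raw in lvdisplay_text.splitlines():
        line = raw.strip()
        path = re.match(r"LV Path\s+(\S+)", line)
        if path:
            current_lv = path.group(1)
            continue
        value = re.match(r"- currently set to\s+(\d+)", line)
        if value and current_lv:
            result[current_lv] = int(value.group(1)) * SECTOR_BYTES // 1024
    return result


# Trimmed sample taken from the cloudelastic1005 lvdisplay output above.
SAMPLE = """
  --- Logical volume ---
  LV Path                /dev/vg0/swap
  Read ahead sectors     auto
  - currently set to     32
  --- Logical volume ---
  LV Path                /dev/vg0/root
  Read ahead sectors     auto
  - currently set to     6144
  --- Logical volume ---
  LV Path                /dev/vg0/srv
  Read ahead sectors     auto
  - currently set to     6144
"""

for lv, kib in readahead_kib_by_lv(SAMPLE).items():
    print(lv, kib, "KiB")
```

Under that sector-size assumption, 32 sectors corresponds to 16 KiB and 6144 sectors to 3072 KiB, which makes the gap between the data volume on cloudelastic1003 and /dev/vg0/srv on cloudelastic1005 easier to see.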