Page MenuHomePhabricator

Boot time race condition when assembling root raid device on cp1052
Closed, ResolvedPublic

Description

Today we upgraded cp1052 to jessie 8.4 point release and linux 4.4 (T131746, T131928).

After rebooting the system, the raid1 root device was in degraded state because of a race condition. It looks like sdb was still being recognized while the mdadm device was being assembled.

Apr 06 13:54:45.020969 cp1052 kernel: md/raid1:md0: active with 1 out of 2 mirrors
[...]
Apr 06 13:54:45.021098 cp1052 kernel: sd 1:0:0:0: [sdb] Attached SCSI removable disk

We manually fixed the issue by adding /dev/sdb1 to the array:

mdadm --manage /dev/md0 --add /dev/sdb1

Before carrying on rebooting other cp* hosts we need to make sure this problem is fixed.

We could try one of the following kernel boot parameters to see if they are viable workarounds, although making sure mdadm does its thing only after scsi devices have been recognized with a proper dependency would be better.

scsi_mod.scan=sync
rootdelay=10

More logs:

Apr 06 13:54:45.019853 cp1052 kernel: sd 0:0:0:0: [sda] 781422768 512-byte logical blocks: (400 GB/373 GiB)
Apr 06 13:54:45.019935 cp1052 kernel: sd 0:0:0:0: [sda] Write Protect is off
Apr 06 13:54:45.020016 cp1052 kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Apr 06 13:54:45.020100 cp1052 kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 06 13:54:45.020132 cp1052 kernel: ata1.00: Enabling discard_zeroes_data
Apr 06 13:54:45.020159 cp1052 kernel:  sda: sda1 sda2 sda3
Apr 06 13:54:45.020191 cp1052 kernel: ata1.00: Enabling discard_zeroes_data
Apr 06 13:54:45.020267 cp1052 kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk
Apr 06 13:54:45.020301 cp1052 kernel: ata2.00: Enabling discard_zeroes_data
Apr 06 13:54:45.020379 cp1052 kernel: sd 1:0:0:0: [sdb] 781422768 512-byte logical blocks: (400 GB/373 GiB)
Apr 06 13:54:45.020414 cp1052 kernel: md: md0 stopped.
Apr 06 13:54:45.020441 cp1052 kernel: md: bind<sda1>
Apr 06 13:54:45.020518 cp1052 kernel: sd 1:0:0:0: [sdb] Write Protect is off
Apr 06 13:54:45.020600 cp1052 kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Apr 06 13:54:45.020688 cp1052 kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
Apr 06 13:54:45.020771 cp1052 kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Apr 06 13:54:45.020855 cp1052 kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 06 13:54:45.020890 cp1052 kernel: ata2.00: Enabling discard_zeroes_data
Apr 06 13:54:45.020916 cp1052 kernel: md: raid1 personality registered for level 1
Apr 06 13:54:45.020943 cp1052 kernel:  sdb: sdb1 sdb2 sdb3
Apr 06 13:54:45.020969 cp1052 kernel: md/raid1:md0: active with 1 out of 2 mirrors
Apr 06 13:54:45.020995 cp1052 kernel: md0: detected capacity change from 0 to 9990832128
Apr 06 13:54:45.021022 cp1052 kernel: ata2.00: Enabling discard_zeroes_data
Apr 06 13:54:45.021098 cp1052 kernel: sd 1:0:0:0: [sdb] Attached SCSI removable disk
Apr 06 13:54:45.021270 cp1052 kernel: usb 1-1.6: new high-speed USB device number 3 using ehci-pci
Apr 06 13:54:45.021308 cp1052 kernel: EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)

Event Timeline

ema renamed this task from Boot time race condition when assembling root raid device to Boot time race condition when assembling root raid device on cp1052.Apr 6 2016, 6:52 PM
ema triaged this task as High priority.Apr 12 2016, 6:02 PM

came across the same problem in T130930: restbase1007 not assembling raid after reboot, could be a jessie-specific issue as I don't remember seeing the same on trusty/precise

Two workarounds seem to be possible.

Creating /etc/initramfs-tools/scripts/local-top/mdadm-enhance-your-calm with a 2 seconds sleep and running update-initramfs -u -k 4.4.0-1-amd64. Note that the script needs to be executable.

[Thu Apr 14 13:44:41 2016] sd 1:0:0:0: [sdb] Attached SCSI disk
[Thu Apr 14 13:44:43 2016] md: md0 stopped.

Booting with rootdelay=2.

[Thu Apr 14 13:48:48 2016] sd 1:0:0:0: [sdb] Attached SCSI disk
[Thu Apr 14 13:48:50 2016] md: md0 stopped.

The output above is taken from dmesg -T
Note that using journalctl can be misleading. The output of journalctl --dmesg does *not* report the same timing info as dmesg:

Apr 14 13:48:51.133867 jessievirt kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
Apr 14 13:48:51.133914 jessievirt kernel: md: md0 stopped.

Mentioned in SAL [2016-04-14T14:24:00Z] <ema> rebooting cp3022 to test workaroud for T131961

Change 283459 had a related patch set uploaded (by Ema):
Workaround for mdadm boot-time race condition

https://gerrit.wikimedia.org/r/283459

Change 283459 merged by Ema:
Workaround for mdadm boot-time race condition

https://gerrit.wikimedia.org/r/283459

This issue should be fixed now: https://gerrit.wikimedia.org/r/#/c/283459/. I'd leave the ticket open given that we (successfully) rebooted just a couple of test machines so far. Feel free to close it if it's clear that hosts boot cleanly while working on T131928.

I've just rebooted cp1065 today for unrelated reasons, and the problem does not seem fixed (in fact, it seems worse if anything. neither of the disks was ready in time, on two soft reboots in a row?). You can see the output from the change, there is a Waiting 5s for disks to show up (T131961), but the console's output order seems to be 'fail to assemble arrays -> sleep -> detect scsi devices':

[    4.679358] usb 1-1: new high-speed USB device number 2 using ehci-pci
[    4.679362] usb 2-1: new high-speed USB device number 2 using ehci-pci
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Assembling all MD arrays ... mdadm: No devices listed in conf file were found.
Failure: failed to assemble all arrays.
done.
Waiting 5s for disks to show up (T131961)
[    4.803393] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    4.810367] ata5: SATA link down (SStatus 0 SControl 300)
[    4.811686] usb 2-1: New USB device found, idVendor=8087, idProduct=0024
[    4.811688] usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    4.812113] hub 2-1:1.0: USB hub found
[    4.812306] hub 2-1:1.0: 8 ports detected
[    4.828105] usb 1-1: New USB device found, idVendor=8087, idProduct=0024
[    4.828108] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    4.828689] hub 1-1:1.0: USB hub found
[    4.828845] hub 1-1:1.0: 6 ports detected
[    4.864738] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    4.871656] ata2.00: ATA-8: INTEL SSDSC2BA400G3T, 5DV1DL03, max UDMA/133
[    4.879168] ata2.00: 781422768 sectors, multi 1: LBA48 NCQ (depth 31/32)
[    4.886714] ata6: SATA link down (SStatus 0 SControl 300)
[    4.892771] ata1.00: ATA-8: INTEL SSDSC2BA400G3T, 5DV1DL03, max UDMA/133
[    4.900268] ata1.00: 781422768 sectors, multi 1: LBA48 NCQ (depth 31/32)
[    4.907799] ata3: SATA link down (SStatus 0 SControl 300)
[    4.913878] ata2.00: configured for UDMA/133
[    4.918723] ata4: SATA link down (SStatus 0 SControl 300)
[    4.925071] ata1.00: configured for UDMA/133
[    4.930225] scsi 0:0:0:0: Direct-Access     ATA      INTEL SSDSC2BA40 DL03 PQ: 0 ANSI: 5
[    4.939904] scsi 1:0:0:0: Direct-Access     ATA      INTEL SSDSC2BA40 DL03 PQ: 0 ANSI: 5
[    4.949452] ata1.00: Enabling discard_zeroes_data
[    4.954717] ata2.00: Enabling discard_zeroes_data
[    4.954752] sd 0:0:0:0: [sda] 781422768 512-byte logical blocks: (400 GB/373 GiB)
[    4.968405] sd 0:0:0:0: [sda] Write Protect is off
[    4.968458] sd 1:0:0:0: [sdb] 781422768 512-byte logical blocks: (400 GB/373 GiB)
[    4.968518] sd 1:0:0:0: [sdb] Write Protect is off
[    4.968537] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    4.968808] ata2.00: Enabling discard_zeroes_data
[    4.969196]  sdb: sdb1 sdb2 sdb3
[    4.969850] ata2.00: Enabling discard_zeroes_data
[    4.969900] sd 1:0:0:0: [sdb] Attached SCSI removable disk
[    4.982507] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    4.982677] sd 1:0:0:0: Attached scsi generic sg1 type 0
[    5.029710] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.040002] ata1.00: Enabling discard_zeroes_data
[    5.045630]  sda: sda1 sda2 sda3
[    5.049979] ata1.00: Enabling discard_zeroes_data
[    5.055319] sd 0:0:0:0: [sda] Attached SCSI removable disk
[    5.099369] usb 1-1.6: new high-speed USB device number 3 using ehci-pci
[    5.200999] usb 1-1.6: New USB device found, idVendor=0624, idProduct=0248
[    5.208702] usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    5.216859] usb 1-1.6: Product: Gadget USB HUB
[    5.221825] usb 1-1.6: Manufacturer: no manufacturer
[    5.227385] usb 1-1.6: SerialNumber: 0123456789
[    5.233075] hub 1-1.6:1.0: USB hub found
[    5.237764] hub 1-1.6:1.0: 6 ports detected
[    5.531388] usb 1-1.6.1: new high-speed USB device number 4 using ehci-pci
[    5.641412] usb 1-1.6.1: New USB device found, idVendor=0624, idProduct=0249
[    5.649300] usb 1-1.6.1: New USB device strings: Mfr=4, Product=5, SerialNumber=6
[    5.657663] usb 1-1.6.1: Product: Keyboard/Mouse Function
[    5.663681] usb 1-1.6.1: Manufacturer: Avocent
[    5.668653] usb 1-1.6.1: SerialNumber: 20120917
[    5.681418] hidraw: raw HID events driver (C) Jiri Kosina
[    5.690586] usbcore: registered new interface driver usbhid
[    5.696818] usbhid: USB HID core driver
[    5.702441] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.0/0003:0624:0249.0001/input/input0
[    5.771764] hid-generic 0003:0624:0249.0001: input,hidraw0: USB HID v1.00 Keyboard [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input0
[    5.787028] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.1/0003:0624:0249.0002/input/input1
[    5.802893] hid-generic 0003:0624:0249.0002: input,hidraw1: USB HID v1.00 Mouse [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input1
[    5.817794] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.2/0003:0624:0249.0003/input/input2
[    5.833602] hid-generic 0003:0624:0249.0003: input,hidraw2: USB HID v1.00 Mouse [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input2
done.
Begin: Running /scripts/local-premount ... done.
Begin: Waiting for root file system ... Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
done.
Gave up waiting for root device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay[   39.813298] uhci_hcd: USB Universal Host Controller Interface driver
= (did the system wait long enough?)
   - Check[   39.824014] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
 root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT!  /dev/disk/by-uuid/14f314d7-dc19-493b-b1d7-0041ec8795d6 does not exist.  Dropping to a shell!
modprobe: module ehci-orion not found in modules.dep


BusyBox v1.22.1 (Debian 1:1.22.0-9+deb8u1) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/bin/sh: can't access tty; job control turned off
(initramfs)

I manually edited the grub cmdline on the console (one-shot) and added rootdelay=10, and it succeeded with this relevant output:

[    4.989925] sd 0:0:0:0: [sda] 781422768 512-byte logical blocks: (400 GB/372 GiB)
[    4.989966] ata2.00: Enabling discard_zeroes_data
[    4.989974] sd 1:0:0:0: [sdb] 781422768 512-byte logical blocks: (400 GB/372 GiB)
[    4.990007] sd 1:0:0:0: [sdb] Write Protect is off
[    4.990018] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    4.990217] ata2.00: Enabling discard_zeroes_data
[    4.990332] sd 0:0:0:0: [sda] Write Protect is off
[    4.990499] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    4.990781] ata1.00: Enabling discard_zeroes_data
[    5.001680]  sdb: sdb1 sdb2 sdb3
[    5.001682]  sda: sda1 sda2 sda3
[    5.002290] ata2.00: Enabling discard_zeroes_data
[    5.002325] ata1.00: Enabling discard_zeroes_data
[    5.002336] sd 1:0:0:0: [sdb] Attached SCSI disk
[    5.003018] sd 0:0:0:0: [sda] Attached SCSI disk
[    5.038320] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    5.038416] sd 1:0:0:0: Attached scsi generic sg1 type 0
[    5.133626] bnx2x 0000:01:00.3: part number 394D4342-30383735-30305430-473030
[    5.157613] usb 1-1: New USB device found, idVendor=8087, idProduct=0024
[    5.165110] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    5.173510] hub 1-1:1.0: USB hub found
[    5.177821] hub 1-1:1.0: 6 ports detected
[    5.229674] usb 2-1: New USB device found, idVendor=8087, idProduct=0024
[    5.237167] usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    5.245618] hub 2-1:1.0: USB hub found
[    5.249881] hub 2-1:1.0: 8 ports detected
[    5.333493] Switched to clocksource tsc
[    5.453046] usb 1-1.6: new high-speed USB device number 3 using ehci-pci
[    5.550647] usb 1-1.6: New USB device found, idVendor=0624, idProduct=0248
[    5.558347] usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    5.566517] usb 1-1.6: Product: Gadget USB HUB
[    5.571485] usb 1-1.6: Manufacturer: no manufacturer
[    5.577034] usb 1-1.6: SerialNumber: 0123456789
[    5.582586] hub 1-1.6:1.0: USB hub found
[    5.587118] hub 1-1.6:1.0: 6 ports detected
[    5.880764] usb 1-1.6.1: new high-speed USB device number 4 using ehci-pci
[    5.990346] usb 1-1.6.1: New USB device found, idVendor=0624, idProduct=0249
[    5.998238] usb 1-1.6.1: New USB device strings: Mfr=4, Product=5, SerialNumber=6
[    6.006598] usb 1-1.6.1: Product: Keyboard/Mouse Function
[    6.012638] usb 1-1.6.1: Manufacturer: Avocent
[    6.017601] usb 1-1.6.1: SerialNumber: 20120917
[    6.030247] hidraw: raw HID events driver (C) Jiri Kosina
[    6.039703] usbcore: registered new interface driver usbhid
[    6.045929] usbhid: USB HID core driver
[    6.051878] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.0/0003:0624:0249.0001/input/input0
[    6.120709] hid-generic 0003:0624:0249.0001: input,hidraw0: USB HID v1.00 Keyboard [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input0
[    6.135861] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.1/0003:0624:0249.0002/input/input1
[    6.151588] hid-generic 0003:0624:0249.0002: input,hidraw1: USB HID v1.00 Mouse [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input1
[    6.166424] input: Avocent Keyboard/Mouse Function as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1:1.2/0003:0624:0249.0003/input/input2
[    6.182098] hid-generic 0003:0624:0249.0003: input,hidraw2: USB HID v1.00 Mouse [Avocent Keyboard/Mouse Function] on usb-0000:00:1a.0-1.6.1/input2
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running[   15.320243] md: md0 stopped.
 /scripts/local-[   15.325094] md: bind<sdb1>
top ... Begin: A[   15.329024] md: bind<sda1>
ssembling all MD arrays ... [   15.335370] md: raid1 personality registered for level 1
[   15.342936] md/raid1:md0: active with 2 out of 2 mirrors
[   15.348962] md0: detected capacity change from 0 to 9990832128
[   15.356306]  md0: unknown partition table
mdadm: /dev/md/0 has been started with 2 drives.
Success: assembled all arrays.
done.
Waiting 5s for disks to show up (T131961)
done.
Begin: Running /scripts/local-premount ... done.
Begin: Will now check root file system ... fsck from util-linux 2.25.2
[/sbin/fsck.ext4 (1) -- /dev/md0] fsck.ext4 -a -C0 /dev/md0
/dev/md0: recovering journal
/dev/md0: Clearing orphaned inode 76 (uid=0, gid=0, mode=0100644, size=64024)
/dev/md0: Clearing orphaned inode 5841 (uid=0, gid=0, mode=[   20.584267] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
0100644, size=35176)
/dev/md0: Clearing orphaned inode 5834 (uid=0, gid=0, mode=0100755, size=140928)
/dev/md0: Clearing orphaned inode 5832 (uid=0, gid=0, mode=0100755, size=137440)
/dev/md0: Clearing orphaned inode 85 (uid=0, gid=0, mode=0100644, size=448440)
/dev/md0: clean, 138873/610800 files, 1572866/2439168 blocks
done.
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... done.
[   20.682825] ERST: NVRAM ERST Log Address Range not implemented yet.
[   20.693422] systemd[1]: systemd 215 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)
[   20.709199] systemd[1]: Detected architecture 'x86-64'.

Welcome to Debian GNU/Linux 8 (jessie)!

[   20.739160] systemd[1]: Inserted module 'autofs4'
[   20.745229] systemd[1]: Set hostname to <cp1065>.

We've chosen local-top as the boot stage to sleep at. From initramfs-tools(8):

local-top OR nfs-top After these scripts have been executed, the root device node is expected to be present (local) or the network interface is expected to be usable (NFS).

Judging from the boot output posted by @BBlack it looks like the system tried to mount root before executing local-top:

Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Assembling all MD arrays ... mdadm: No devices listed in conf file were  found.
Failure: failed to assemble all arrays.
done.
Waiting 5s for disks to show up (T131961)

It's probably better to put our sleep in init-premount.

Change 287914 had a related patch set uploaded (by Ema):
mdadm boot-time race condition: sleep in init-premount

https://gerrit.wikimedia.org/r/287914

Change 287914 merged by Ema:
mdadm boot-time race condition: sleep in init-premount

https://gerrit.wikimedia.org/r/287914

Mentioned in SAL [2016-05-10T13:30:06Z] <ema> depooling and rebooting cp1065 to test mdadm boot workaround T131961

Mentioned in SAL [2016-05-10T14:01:29Z] <ema> depooling and rebooting cp1066 to test mdadm boot workaround T131961

ema claimed this task.

I've just finished rebooting all cp* hosts in esams (T131928) and they came back up fine without raid issues.