2016-09-06 16:30:43: Icinga registered a service failure of CQL on restbase2004-b.codfw.wmnet, the result of a shutdown after encountering data corruption.
ERROR [CompactionExecutor:12945] 2016-09-06 16:24:37,145 CassandraDaemon.java:185 - Exception in thread Thread[CompactionExecutor:12945,1,main]
org.apache.cassandra.io.FSReadError: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /srv/cassandra-b/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/la-35707-big-Data.db
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:358) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:359) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:322) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:126) ~[apache-cassandra-2.2.6.jar:2.2.6]
    [...]
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /srv/cassandra-b/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/la-35707-big-Data.db
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferStandard(CompressedRandomAccessReader.java:153) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:230) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:42) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:346) ~[apache-cassandra-2.2.6.jar:2.2.6]
    ... 30 common frames omitted
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/srv/cassandra-b/data/local_group_wikipedia_T_parsoid_html/data-f3648bc0c2cb11e49ce6a1da77f2fd34/la-35707-big-Data.db): corruption detected, chunk at 24958152 of length 57619.
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferStandard(CompressedRandomAccessReader.java:124) ~[apache-cassandra-2.2.6.jar:2.2.6]
    ... 33 common frames omitted
ERROR [CompactionExecutor:12945] 2016-09-06 16:24:37,147 StorageService.java:467 - Stopping gossiper
WARN [CompactionExecutor:12945] 2016-09-06 16:24:37,147 StorageService.java:373 - Stopping gossip by operator request
INFO [CompactionExecutor:12945] 2016-09-06 16:24:37,147 Gossiper.java:1448 - Announcing shutdown
INFO [CompactionExecutor:12945] 2016-09-06 16:24:37,149 StorageService.java:1937 - Node /10.192.32.138 state jump to shutdown
ERROR [CompactionExecutor:12945] 2016-09-06 16:24:39,170 StorageService.java:477 - Stopping native transport
INFO [CompactionExecutor:12945] 2016-09-06 16:24:39,255 Server.java:218 - Stop listening for CQL clients
Additionally, this was found in dmesg:
[10923014.250916] hpsa 0000:03:00.0: scsi 0:1:0:1 Aborting command ffff880cd6f1c9c0Tag:0x00000000:00000120 CDBLen: 10 CDB: 0x2a00... SN: 0x0 BEING SENT
[10923014.250922] hpsa 0000:03:00.0: scsi 0:1:0:1: Aborting command Direct-Access HP LOGICAL VOLUME RAID-0 SSDSmartPathCap+ En+ Exp=1
[10923014.250960] hpsa 0000:03:00.0: scsi 0:1:0:1 Aborting command ffff880cd6f1c9c0Tag:0x00000000:00000120 CDBLen: 10 CDB: 0x2a00... SN: 0x0 SENT, FAILED
[10923014.250968] hpsa 0000:03:00.0: scsi 0:1:0:1: FAILED to abort command Direct-Access HP LOGICAL VOLUME RAID-0 SSDSmartPathCap+ En+ Exp=1
[10923031.235036] hpsa 0000:03:00.0: scsi 0:1:0:1: resetting logical Direct-Access HP LOGICAL VOLUME RAID-0 SSDSmartPathCap+ En+ Exp=1
[10923046.114183] hpsa 0000:03:00.0: aborted: LUN:000000c000000101 CDB:12000000310000000000000000000000
[10923046.114189] hpsa 0000:03:00.0: hpsa_update_device_info: inquiry failed
[10923046.146611] hpsa 0000:03:00.0: Inquiry failed, skipping device.
[10923046.163938] hpsa 0000:03:00.0: scsi 0:1:0:1: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-0 SSDSmartPathCap+ En+ Exp=1
[10923046.173235] hpsa 0000:03:00.0: scsi 0:0:1:0: removed Direct-Access ATA Samsung SSD 850 PHYS DRV SSDSmartPathCap- En- Exp=0
[10923136.937859] hpsa 0000:03:00.0: scsi 0:0:1:0: masked Direct-Access ATA Samsung SSD 850 PHYS DRV SSDSmartPathCap- En- Exp=0
2016-09-07 14:10:11: Similar corruption has become evident on 2004-c.
ERROR [CompactionExecutor:15586] 2016-09-07 14:37:32,955 CassandraDaemon.java:201 - Exception in thread Thread[CompactionExecutor:15586,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /srv/cassandra-c/data/local_group_globaldomain_T_mathoid_png/data-776488b0ef6911e59c486ffb300f2009/la-9319-big-Data.db
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferStandard(CompressedRandomAccessReader.java:153) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:230) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:42) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:346) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:359) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:322) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:132) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:86) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.jar:na]
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.jar:na]
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:169) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.jar:na]
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.jar:na]
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:645) ~[guava-16.0.jar:na]
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.jar:na]
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.jar:na]
    at org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(ColumnIndex.java:166) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:125) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:136) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:117) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.append(DefaultCompactionWriter.java:64) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:184) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:256) ~[apache-cassandra-2.2.6.jar:2.2.6]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_102]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_102]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_102]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_102]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/srv/cassandra-c/data/local_group_globaldomain_T_mathoid_png/data-776488b0ef6911e59c486ffb300f2009/la-9319-big-Data.db): corruption detected, chunk at 90435132 of length 243858.
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferStandard(CompressedRandomAccessReader.java:124) ~[apache-cassandra-2.2.6.jar:2.2.6]
    ... 33 common frames omitted
2016-09-10 05:26:29: Back to local_group_wikipedia_T_parsoid_html and instance 2004-b.
2016-09-12 06:44:53: Once more, local_group_wikipedia_T_parsoid_html on 2004-b.
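
The corruption reports quoted in this timeline share one CorruptBlockException message format (file path, chunk offset, chunk length). As a rough aid for tallying which SSTables were affected across instances, here is a minimal sketch that pulls those three fields out of log text. The names `CorruptChunk` and `find_corrupt_chunks` are hypothetical and were not part of the original investigation, and the pattern is derived only from the two messages quoted above, so other Cassandra versions may word the message differently.

```python
import re
from typing import List, NamedTuple


class CorruptChunk(NamedTuple):
    """One 'corruption detected' report (hypothetical helper type)."""
    path: str
    offset: int
    length: int


# Pattern assumed from the CorruptBlockException lines quoted in this timeline.
_CORRUPT_BLOCK = re.compile(
    r"CorruptBlockException: \((?P<path>[^)]+)\): "
    r"corruption detected, chunk at (?P<offset>\d+) of length (?P<length>\d+)"
)


def find_corrupt_chunks(log_text: str) -> List[CorruptChunk]:
    """Return every corrupt-chunk report found in the given log text."""
    return [
        CorruptChunk(m.group("path"), int(m.group("offset")), int(m.group("length")))
        for m in _CORRUPT_BLOCK.finditer(log_text)
    ]
```

Running this over the system log of each instance would give a list of (SSTable, offset, length) entries that can be compared against the entries in this timeline.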