Page MenuHomePhabricator

Cassandra instances outages (was: Outage of restbase2017-b)
Closed, ResolvedPublic

Assigned To
Authored By
Eevans
Aug 22 2019, 4:24 PM
Referenced Files
F30064789: image.png
Aug 22 2019, 4:47 PM
F30064761: image.png
Aug 22 2019, 4:43 PM
F30064780: image.png
Aug 22 2019, 4:43 PM
F30064777: image.png
Aug 22 2019, 4:43 PM

Description

Icinga reported that restbase2017-b went down today at 15:03:21, it was reported back up at 15:10:35 after a cron initiated Puppet run restarted the unit.

PROBLEM alert - restbase2017/cassandra-b service is CRITICALThu Aug 22 15:03:21 UTC 2019
RECOVERY alert - restbase2017/cassandra-b service is OKThu Aug 22 15:10:35 UTC 2019
/var/log/cassandra/system-b.log
[ ... ]

INFO  [ScheduledTasks:1] 2019-08-22 15:00:23,465 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
INFO  [Service Thread] 2019-08-22 15:00:27,828 GCInspector.java:284 - G1 Young Generation GC in 216ms.  G1 Eden Space: 469762048 -> 0; G1 Old Gen: 10564281752 -> 8725110808;
 
WARN  [Service Thread] 2019-08-22 15:00:29,909 GCInspector.java:282 - G1 Young Generation GC in 1639ms.  G1 Eden Space: 461373440 -> 0; G1 Old Gen: 10667456672 -> 9641073664
; 
INFO  [Service Thread] 2019-08-22 15:00:29,910 StatusLogger.java:47 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
ERROR [ReadRepairStage:609648] 2019-08-22 15:00:29,999 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:609648,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
        at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:202) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:79) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_222]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_222]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_222]
ERROR [ReadRepairStage:609645] 2019-08-22 15:00:30,020 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:609645,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
        at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:202) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:79) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_222]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_222]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_222]

[ ... ]

INFO  [main] 2019-08-22 15:10:36,323 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra-b/cassandra.yaml
INFO  [main] 2019-08-22 15:10:36,632 Config.java:495 - Node configuration:[allocate_tokens_for_keyspace=null; ...

[ ... ]

1DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:05,844 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
2DEBUG [NativePoolCleaner] 2019-08-22 15:00:19,233 ColumnFamilyStore.java:1308 - Flushing largest CFS(Keyspace='wikipedia_T_parsoid', ColumnFamily='data') to free up room. Used total: 0.01/0.50, live: 0.01/0.50, flushing: 0.00/0.00, this: 0.00/0.09
3DEBUG [NativePoolCleaner] 2019-08-22 15:00:19,234 ColumnFamilyStore.java:918 - Enqueuing flush of data: 164.063KiB (0%) on-heap, 288.933MiB (9%) off-heap
4DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:19,287 Memtable.java:456 - Writing Memtable-data@634761634(86.039MiB serialized bytes, 1065 ops, 0%/9% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(-1462500924879604616)]
5DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:19,287 Memtable.java:456 - Writing Memtable-data@634761634(86.039MiB serialized bytes, 1065 ops, 0%/9% of on/off-heap limit), flushed range = (max(-1462500924879604616), max(3778679129973661054)]
6DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:19,287 Memtable.java:456 - Writing Memtable-data@634761634(86.039MiB serialized bytes, 1065 ops, 0%/9% of on/off-heap limit), flushed range = (max(3778679129973661054), max(9223372036854775807)]
7DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:19,395 Memtable.java:485 - Completed flushing /srv/sdb4/cassandra-b/data/wikipedia_T_parsoid/data-db05bba065db11e9bc6c8958654eb0a6/mc-49095-big-Data.db (23.455MiB) for commitlog position CommitLogPosition(segmentId=1565021954783, position=24182044)
8DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:19,423 Memtable.java:485 - Completed flushing /srv/sdc4/cassandra-b/data/wikipedia_T_parsoid/data-db05bba065db11e9bc6c8958654eb0a6/mc-49096-big-Data.db (30.703MiB) for commitlog position CommitLogPosition(segmentId=1565021954783, position=24182044)
9DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:19,437 Memtable.java:485 - Completed flushing /srv/sda4/cassandra-b/data/wikipedia_T_parsoid/data-db05bba065db11e9bc6c8958654eb0a6/mc-49094-big-Data.db (31.892MiB) for commitlog position CommitLogPosition(segmentId=1565021954783, position=24182044)
10DEBUG [MemtableFlushWriter:9423] 2019-08-22 15:00:19,516 ColumnFamilyStore.java:1216 - Flushed to [BigTableReader(path='/srv/sda4/cassandra-b/data/wikipedia_T_parsoid/data-db05bba065db11e9bc6c8958654eb0a6/mc-49094-big-Data.db'), BigTableReader(path='/srv/sdb4/cassandra-b/data/wikipedia_T_parsoid/data-db05bba065db11e9bc6c8958654eb0a6/mc-49095-big-Data.db'), BigTableReader(path='/srv/sdc4/cassandra-b/data/wikipedia_T_parsoid/data-db05bba065db11e9bc6c8958654eb0a6/mc-49096-big-Data.db')] (3 sstables, 22.548MiB), biggest 8.510MiB, smallest 6.160MiB
11DEBUG [CompactionExecutor:45589] 2019-08-22 15:00:19,517 CompactionTask.java:155 - Compacting (8ee4bfc0-c4ed-11e9-ad4b-e11861b6015e) [/srv/sda4/cassandra-b/data/wikipedia_T_parsoid/data-db05bba065db11e9bc6c8958654eb0a6/mc-49094-big-Data.db:level=0, /srv/sda4/cassandra-b/data/wikipedia_T_parsoid/data-db05bba065db11e9bc6c8958654eb0a6/mc-49091-big-Data.db:level=0, /srv/sda4/cassandra-b/data/wikipedia_T_parsoid/data-db05bba065db11e9bc6c8958654eb0a6/mc-49086-big-Data.db:level=0, /srv/sda4/cassandra-b/data/wikipedia_T_parsoid/data-db05bba065db11e9bc6c8958654eb0a6/mc-49085-big-Data.db:level=0, ]
12DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:19,759 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
13DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:20,776 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
14WARN [PERIODIC-COMMIT-LOG-SYNCER] 2019-08-22 15:00:20,916 NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with average duration of Infinityms, 1 have exceeded the configured commit interval by an average of 39.99ms
15DEBUG [NativePoolCleaner] 2019-08-22 15:00:20,918 ColumnFamilyStore.java:1308 - Flushing largest CFS(Keyspace='enwiki_T_parsoid', ColumnFamily='data') to free up room. Used total: 0.01/0.50, live: 0.01/0.50, flushing: 0.00/0.00, this: 0.00/0.18
16DEBUG [NativePoolCleaner] 2019-08-22 15:00:20,918 ColumnFamilyStore.java:918 - Enqueuing flush of data: 14.648KiB (0%) on-heap, 561.592MiB (18%) off-heap
17DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:21,020 Memtable.java:456 - Writing Memtable-data@411722(14.848MiB serialized bytes, 244 ops, 0%/22% of on/off-heap limit), flushed range = (max(3778679129973661054), max(9223372036854775807)]
18DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:21,020 Memtable.java:456 - Writing Memtable-data@411722(14.848MiB serialized bytes, 244 ops, 0%/22% of on/off-heap limit), flushed range = (max(-1462500924879604616), max(3778679129973661054)]
19DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:21,020 Memtable.java:456 - Writing Memtable-data@411722(14.848MiB serialized bytes, 244 ops, 0%/22% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(-1462500924879604616)]
20DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:21,028 Memtable.java:485 - Completed flushing /srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40013-big-Data.db (1.217MiB) for commitlog position CommitLogPosition(segmentId=1565021954785, position=13511880)
21DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:21,051 Memtable.java:485 - Completed flushing /srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40014-big-Data.db (6.381MiB) for commitlog position CommitLogPosition(segmentId=1565021954785, position=13511880)
22DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:21,065 Memtable.java:485 - Completed flushing /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40012-big-Data.db (7.251MiB) for commitlog position CommitLogPosition(segmentId=1565021954785, position=13511880)
23DEBUG [MemtableFlushWriter:9423] 2019-08-22 15:00:21,083 ColumnFamilyStore.java:1216 - Flushed to [BigTableReader(path='/srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40012-big-Data.db'), BigTableReader(path='/srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40013-big-Data.db'), BigTableReader(path='/srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40014-big-Data.db')] (3 sstables, 3.175MiB), biggest 1.568MiB, smallest 367.798KiB
24DEBUG [CompactionExecutor:45588] 2019-08-22 15:00:21,084 CompactionTask.java:155 - Compacting (8fd401c0-c4ed-11e9-ad4b-e11861b6015e) [/srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40013-big-Data.db:level=0, /srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40009-big-Data.db:level=0, /srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40003-big-Data.db:level=0, /srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40001-big-Data.db:level=0, ]
25WARN [Service Thread] 2019-08-22 15:00:22,757 GCInspector.java:282 - G1 Young Generation GC in 1121ms. G1 Eden Space: 2566914048 -> 0; G1 Old Gen: 7893013408 -> 4274730688; G1 Survivor Space: 150994944 -> 0;
26DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:22,796 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
27DEBUG [ReadRepairStage:609645] 2019-08-22 15:00:22,797 ReadCallback.java:242 - Digest mismatch:
28DEBUG [ReadRepairStage:609653] 2019-08-22 15:00:22,887 ReadCallback.java:242 - Digest mismatch:
29DEBUG [ReadRepairStage:609656] 2019-08-22 15:00:23,014 ReadCallback.java:242 - Digest mismatch:
30DEBUG [ReadRepairStage:609660] 2019-08-22 15:00:23,045 ReadCallback.java:242 - Digest mismatch:
31DEBUG [ReadRepairStage:609655] 2019-08-22 15:00:23,059 ReadCallback.java:242 - Digest mismatch:
32DEBUG [ReadRepairStage:609631] 2019-08-22 15:00:23,079 ReadCallback.java:242 - Digest mismatch:
33DEBUG [ReadRepairStage:609639] 2019-08-22 15:00:23,188 ReadCallback.java:242 - Digest mismatch:
34DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:23,190 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
35DEBUG [ReadRepairStage:609649] 2019-08-22 15:00:23,260 ReadCallback.java:242 - Digest mismatch:
36DEBUG [ReadRepairStage:609645] 2019-08-22 15:00:23,281 ReadCallback.java:242 - Digest mismatch:
37DEBUG [NativePoolCleaner] 2019-08-22 15:00:23,297 ColumnFamilyStore.java:1308 - Flushing largest CFS(Keyspace='enwiki_T_parsoid', ColumnFamily='data') to free up room. Used total: 0.01/0.50, live: 0.01/0.50, flushing: 0.00/0.00, this: 0.00/0.18
38DEBUG [NativePoolCleaner] 2019-08-22 15:00:23,298 ColumnFamilyStore.java:918 - Enqueuing flush of data: 6.152KiB (0%) on-heap, 566.784MiB (18%) off-heap
39DEBUG [ReadRepairStage:609654] 2019-08-22 15:00:23,338 ReadCallback.java:242 - Digest mismatch:
40DEBUG [ReadRepairStage:609641] 2019-08-22 15:00:23,377 ReadCallback.java:242 - Digest mismatch:
41DEBUG [ReadRepairStage:609658] 2019-08-22 15:00:23,380 ReadCallback.java:242 - Digest mismatch:
42DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:23,459 Memtable.java:456 - Writing Memtable-data@1393473686(12.552MiB serialized bytes, 76 ops, 0%/19% of on/off-heap limit), flushed range = (max(-1462500924879604616), max(3778679129973661054)]
43DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:23,459 Memtable.java:456 - Writing Memtable-data@1393473686(12.552MiB serialized bytes, 76 ops, 0%/19% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(-1462500924879604616)]
44DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:23,464 Memtable.java:485 - Completed flushing /srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40017-big-Data.db (553.905KiB) for commitlog position CommitLogPosition(segmentId=1565021954787, position=24521324)
45DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:23,465 Memtable.java:456 - Writing Memtable-data@1393473686(12.552MiB serialized bytes, 76 ops, 0%/19% of on/off-heap limit), flushed range = (max(3778679129973661054), max(9223372036854775807)]
46INFO [ScheduledTasks:1] 2019-08-22 15:00:23,465 NoSpamLogger.java:91 - Some operations were slow, details available at debug level (debug.log)
47DEBUG [ScheduledTasks:1] 2019-08-22 15:00:23,466 MonitoringTask.java:173 - 7 operations were slow in the last 4998 msecs:
48DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:23,546 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
49DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:23,621 Memtable.java:485 - Completed flushing /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40016-big-Data.db (6.049MiB) for commitlog position CommitLogPosition(segmentId=1565021954787, position=24521324)
50DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:23,648 Memtable.java:485 - Completed flushing /srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40018-big-Data.db (5.962MiB) for commitlog position CommitLogPosition(segmentId=1565021954787, position=24521324)
51DEBUG [MemtableFlushWriter:9423] 2019-08-22 15:00:23,673 ColumnFamilyStore.java:1216 - Flushed to [BigTableReader(path='/srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40016-big-Data.db'), BigTableReader(path='/srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40017-big-Data.db'), BigTableReader(path='/srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40018-big-Data.db')] (3 sstables, 2.408MiB), biggest 1.140MiB, smallest 168.172KiB
52DEBUG [ReadRepairStage:609628] 2019-08-22 15:00:23,739 ReadCallback.java:242 - Digest mismatch:
53DEBUG [ReadRepairStage:609629] 2019-08-22 15:00:24,060 ReadCallback.java:242 - Digest mismatch:
54DEBUG [CompactionExecutor:45588] 2019-08-22 15:00:24,317 CompactionTask.java:255 - Compacted (8fd401c0-c4ed-11e9-ad4b-e11861b6015e) 4 sstables to [/srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40015-big,] to level=0. 18.402MiB to 16.405MiB (~89% of original) in 3,232ms. Read Throughput = 5.693MiB/s, Write Throughput = 5.075MiB/s, Row Throughput = ~378/s. 795 total partitions merged to 757. Partition merge counts were {1:724, 2:28, 3:5, }
55DEBUG [ReadRepairStage:609660] 2019-08-22 15:00:24,548 ReadCallback.java:242 - Digest mismatch:
56DEBUG [ReadRepairStage:609630] 2019-08-22 15:00:24,555 ReadCallback.java:242 - Digest mismatch:
57DEBUG [ReadRepairStage:609628] 2019-08-22 15:00:24,868 ReadCallback.java:242 - Digest mismatch:
58DEBUG [NativePoolCleaner] 2019-08-22 15:00:24,912 ColumnFamilyStore.java:1308 - Flushing largest CFS(Keyspace='enwiki_T_parsoid', ColumnFamily='data') to free up room. Used total: 0.01/0.50, live: 0.01/0.50, flushing: 0.00/0.00, this: 0.00/0.18
59DEBUG [NativePoolCleaner] 2019-08-22 15:00:24,912 ColumnFamilyStore.java:918 - Enqueuing flush of data: 2.637KiB (0%) on-heap, 557.222MiB (18%) off-heap
60DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:24,924 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
61DEBUG [ReadRepairStage:609660] 2019-08-22 15:00:25,062 ReadCallback.java:242 - Digest mismatch:
62DEBUG [ReadRepairStage:609635] 2019-08-22 15:00:25,103 ReadCallback.java:242 - Digest mismatch:
63DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:25,135 Memtable.java:456 - Writing Memtable-data@238293468(11.862MiB serialized bytes, 66 ops, 0%/19% of on/off-heap limit), flushed range = (max(3778679129973661054), max(9223372036854775807)]
64DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:25,135 Memtable.java:456 - Writing Memtable-data@238293468(11.862MiB serialized bytes, 66 ops, 0%/19% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(-1462500924879604616)]
65DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:25,142 Memtable.java:456 - Writing Memtable-data@238293468(11.862MiB serialized bytes, 66 ops, 0%/19% of on/off-heap limit), flushed range = (max(-1462500924879604616), max(3778679129973661054)]
66DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:25,143 Memtable.java:485 - Completed flushing /srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40020-big-Data.db (61.480KiB) for commitlog position CommitLogPosition(segmentId=1565021954788, position=31174485)
67DEBUG [CompactionExecutor:45589] 2019-08-22 15:00:25,163 CompactionTask.java:255 - Compacted (8ee4bfc0-c4ed-11e9-ad4b-e11861b6015e) 4 sstables to [/srv/sda4/cassandra-b/data/wikipedia_T_parsoid/data-db05bba065db11e9bc6c8958654eb0a6/mc-49097-big,] to level=0. 70.042MiB to 62.930MiB (~89% of original) in 5,645ms. Read Throughput = 12.407MiB/s, Write Throughput = 11.147MiB/s, Row Throughput = ~531/s. 1,702 total partitions merged to 1,595. Partition merge counts were {1:1508, 2:70, 3:14, 4:3, }
68DEBUG [ReadRepairStage:609636] 2019-08-22 15:00:25,202 ReadCallback.java:242 - Digest mismatch:
69DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:25,233 Memtable.java:485 - Completed flushing /srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40021-big-Data.db (5.979MiB) for commitlog position CommitLogPosition(segmentId=1565021954788, position=31174485)
70DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:25,247 Memtable.java:485 - Completed flushing /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40019-big-Data.db (5.823MiB) for commitlog position CommitLogPosition(segmentId=1565021954788, position=31174485)
71DEBUG [ReadRepairStage:609647] 2019-08-22 15:00:25,248 ReadCallback.java:242 - Digest mismatch:
72DEBUG [ReadRepairStage:609637] 2019-08-22 15:00:25,321 ReadCallback.java:242 - Digest mismatch:
73DEBUG [MemtableFlushWriter:9423] 2019-08-22 15:00:25,350 ColumnFamilyStore.java:1216 - Flushed to [BigTableReader(path='/srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40019-big-Data.db'), BigTableReader(path='/srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40020-big-Data.db'), BigTableReader(path='/srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40021-big-Data.db')] (3 sstables, 2.197MiB), biggest 1.110MiB, smallest 19.190KiB
74DEBUG [CompactionExecutor:45582] 2019-08-22 15:00:25,350 CompactionTask.java:155 - Compacting (925ef260-c4ed-11e9-ad4b-e11861b6015e) [/srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40012-big-Data.db:level=0, /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40019-big-Data.db:level=0, /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40008-big-Data.db:level=0, /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40016-big-Data.db:level=0, ]
75DEBUG [ReadRepairStage:609651] 2019-08-22 15:00:25,355 ReadCallback.java:242 - Digest mismatch:
76DEBUG [ReadRepairStage:609663] 2019-08-22 15:00:25,489 ReadCallback.java:242 - Digest mismatch:
77DEBUG [ReadRepairStage:609662] 2019-08-22 15:00:25,686 ReadCallback.java:242 - Digest mismatch:
78DEBUG [ReadRepairStage:609654] 2019-08-22 15:00:25,730 ReadCallback.java:242 - Digest mismatch:
79DEBUG [ReadRepairStage:609632] 2019-08-22 15:00:25,784 ReadCallback.java:242 - Digest mismatch:
80DEBUG [ReadRepairStage:609631] 2019-08-22 15:00:25,796 ReadCallback.java:242 - Digest mismatch:
81DEBUG [ReadRepairStage:609627] 2019-08-22 15:00:25,801 ReadCallback.java:242 - Digest mismatch:
82DEBUG [ReadRepairStage:609664] 2019-08-22 15:00:25,817 ReadCallback.java:242 - Digest mismatch:
83DEBUG [ReadRepairStage:609625] 2019-08-22 15:00:25,828 ReadCallback.java:242 - Digest mismatch:
84DEBUG [ReadRepairStage:609633] 2019-08-22 15:00:25,842 ReadCallback.java:242 - Digest mismatch:
85DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:25,918 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
86DEBUG [ReadRepairStage:609634] 2019-08-22 15:00:26,063 ReadCallback.java:242 - Digest mismatch:
87DEBUG [ReadRepairStage:609655] 2019-08-22 15:00:26,220 ReadCallback.java:242 - Digest mismatch:
88DEBUG [ReadRepairStage:609630] 2019-08-22 15:00:26,230 ReadCallback.java:242 - Digest mismatch:
89DEBUG [ReadRepairStage:609642] 2019-08-22 15:00:26,283 ReadCallback.java:242 - Digest mismatch:
90DEBUG [NativePoolCleaner] 2019-08-22 15:00:26,333 ColumnFamilyStore.java:1308 - Flushing largest CFS(Keyspace='enwiki_T_parsoid', ColumnFamily='data') to free up room. Used total: 0.01/0.68, live: 0.01/0.49, flushing: 0.00/0.18, this: 0.00/0.17
91DEBUG [NativePoolCleaner] 2019-08-22 15:00:26,333 ColumnFamilyStore.java:918 - Enqueuing flush of data: 1.172KiB (0%) on-heap, 512.418MiB (17%) off-heap
92DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:26,336 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
93DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:26,387 Memtable.java:456 - Writing Memtable-data@1531469497(11.764MiB serialized bytes, 51 ops, 0%/17% of on/off-heap limit), flushed range = (max(-1462500924879604616), max(3778679129973661054)]
94DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:26,387 Memtable.java:456 - Writing Memtable-data@1531469497(11.764MiB serialized bytes, 51 ops, 0%/17% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(-1462500924879604616)]
95DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:26,387 Memtable.java:485 - Completed flushing /srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40024-big-Data.db (0.000KiB) for commitlog position CommitLogPosition(segmentId=1565021954790, position=30804115)
96DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:26,388 Memtable.java:456 - Writing Memtable-data@1531469497(11.764MiB serialized bytes, 51 ops, 0%/17% of on/off-heap limit), flushed range = (max(3778679129973661054), max(9223372036854775807)]
97DEBUG [ReadRepairStage:609649] 2019-08-22 15:00:26,449 ReadCallback.java:242 - Digest mismatch:
98DEBUG [ReadRepairStage:609659] 2019-08-22 15:00:26,478 ReadCallback.java:242 - Digest mismatch:
99DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:26,501 Memtable.java:485 - Completed flushing /srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40025-big-Data.db (5.926MiB) for commitlog position CommitLogPosition(segmentId=1565021954790, position=30804115)
100DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:26,519 Memtable.java:485 - Completed flushing /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40023-big-Data.db (5.839MiB) for commitlog position CommitLogPosition(segmentId=1565021954790, position=30804115)
101DEBUG [MemtableFlushWriter:9423] 2019-08-22 15:00:26,555 ColumnFamilyStore.java:1216 - Flushed to [BigTableReader(path='/srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40023-big-Data.db'), BigTableReader(path='/srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40025-big-Data.db')] (2 sstables, 2.167MiB), biggest 1.094MiB, smallest 1.074MiB
102DEBUG [CompactionExecutor:45584] 2019-08-22 15:00:26,556 CompactionTask.java:155 - Compacting (9316d0b0-c4ed-11e9-ad4b-e11861b6015e) [/srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40018-big-Data.db:level=0, /srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40014-big-Data.db:level=0, /srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40021-big-Data.db:level=0, /srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40025-big-Data.db:level=0, ]
103DEBUG [CompactionExecutor:45582] 2019-08-22 15:00:26,602 CompactionTask.java:255 - Compacted (925ef260-c4ed-11e9-ad4b-e11861b6015e) 4 sstables to [/srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40022-big,] to level=0. 9.376MiB to 6.167MiB (~65% of original) in 1,235ms. Read Throughput = 7.589MiB/s, Write Throughput = 4.991MiB/s, Row Throughput = ~181/s. 185 total partitions merged to 181. Partition merge counts were {1:179, 2:1, 4:1, }
104DEBUG [ReadRepairStage:609656] 2019-08-22 15:00:26,731 ReadCallback.java:242 - Digest mismatch:
105DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:27,177 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
106DEBUG [NativePoolCleaner] 2019-08-22 15:00:27,293 ColumnFamilyStore.java:1308 - Flushing largest CFS(Keyspace='enwiki_T_parsoid', ColumnFamily='data') to free up room. Used total: 0.01/0.67, live: 0.01/0.50, flushing: 0.00/0.17, this: 0.00/0.18
107DEBUG [NativePoolCleaner] 2019-08-22 15:00:27,293 ColumnFamilyStore.java:918 - Enqueuing flush of data: 0.586KiB (0%) on-heap, 548.402MiB (18%) off-heap
108DEBUG [CompactionExecutor:45584] 2019-08-22 15:00:27,388 CompactionTask.java:255 - Compacted (9316d0b0-c4ed-11e9-ad4b-e11861b6015e) 4 sstables to [/srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40026-big,] to level=0. 4.529MiB to 1.278MiB (~28% of original) in 831ms. Read Throughput = 5.446MiB/s, Write Throughput = 1.537MiB/s, Row Throughput = ~44/s. 25 total partitions merged to 22. Partition merge counts were {1:21, 4:1, }
109DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:27,516 Memtable.java:456 - Writing Memtable-data@1832774074(11.723MiB serialized bytes, 40 ops, 0%/20% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(-1462500924879604616)]
110DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:27,522 Memtable.java:456 - Writing Memtable-data@1832774074(11.723MiB serialized bytes, 40 ops, 0%/20% of on/off-heap limit), flushed range = (max(-1462500924879604616), max(3778679129973661054)]
111DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:27,523 Memtable.java:485 - Completed flushing /srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40028-big-Data.db (0.000KiB) for commitlog position CommitLogPosition(segmentId=1565021954792, position=6102662)
112DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:27,523 Memtable.java:456 - Writing Memtable-data@1832774074(11.723MiB serialized bytes, 40 ops, 0%/20% of on/off-heap limit), flushed range = (max(3778679129973661054), max(9223372036854775807)]
113INFO [Service Thread] 2019-08-22 15:00:27,828 GCInspector.java:284 - G1 Young Generation GC in 216ms. G1 Eden Space: 469762048 -> 0; G1 Old Gen: 10564281752 -> 8725110808;
114DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:27,851 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
115DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:27,901 Memtable.java:485 - Completed flushing /srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40029-big-Data.db (5.909MiB) for commitlog position CommitLogPosition(segmentId=1565021954792, position=6102662)
116DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:27,908 Memtable.java:485 - Completed flushing /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40027-big-Data.db (5.814MiB) for commitlog position CommitLogPosition(segmentId=1565021954792, position=6102662)
117DEBUG [MemtableFlushWriter:9423] 2019-08-22 15:00:27,923 ColumnFamilyStore.java:1216 - Flushed to [BigTableReader(path='/srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40027-big-Data.db'), BigTableReader(path='/srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40029-big-Data.db')] (2 sstables, 2.154MiB), biggest 1.088MiB, smallest 1.066MiB
118DEBUG [NativePoolCleaner] 2019-08-22 15:00:28,175 ColumnFamilyStore.java:1308 - Flushing largest CFS(Keyspace='enwiki_T_parsoid', ColumnFamily='data') to free up room. Used total: 0.01/0.68, live: 0.01/0.48, flushing: 0.00/0.18, this: 0.00/0.15
119DEBUG [NativePoolCleaner] 2019-08-22 15:00:28,175 ColumnFamilyStore.java:918 - Enqueuing flush of data: 0.586KiB (0%) on-heap, 463.869MiB (15%) off-heap
120DEBUG [ReadRepairStage:609634] 2019-08-22 15:00:28,179 ReadCallback.java:242 - Digest mismatch:
121DEBUG [Native-Transport-Requests-22] 2019-08-22 15:00:29,909 ReadCallback.java:132 - Timed out; received 2 of 3 responses (including data)
122WARN [Service Thread] 2019-08-22 15:00:29,909 GCInspector.java:282 - G1 Young Generation GC in 1639ms. G1 Eden Space: 461373440 -> 0; G1 Old Gen: 10667456672 -> 9641073664;
123DEBUG [ScheduledTasks:1] 2019-08-22 15:00:29,912 MonitoringTask.java:173 - 185 operations were slow in the last 4806 msecs:
124DEBUG [Native-Transport-Requests-17] 2019-08-22 15:00:29,977 DataResolver.java:200 - Timeout while read-repairing after receiving all 6 data and digest responses
125DEBUG [Native-Transport-Requests-71] 2019-08-22 15:00:29,977 ReadCallback.java:132 - Timed out; received 2 of 3 responses (including data)
126DEBUG [Native-Transport-Requests-54] 2019-08-22 15:00:29,977 ReadCallback.java:132 - Timed out; received 2 of 3 responses (including data)
127DEBUG [Native-Transport-Requests-49] 2019-08-22 15:00:29,977 ReadCallback.java:132 - Timed out; received 2 of 3 responses (including data)
128DEBUG [ReadRepairStage:609648] 2019-08-22 15:00:29,984 DataResolver.java:200 - Timeout while read-repairing after receiving all 2 data and digest responses
129DEBUG [ReadRepairStage:609645] 2019-08-22 15:00:29,986 DataResolver.java:200 - Timeout while read-repairing after receiving all 2 data and digest responses
130ERROR [ReadRepairStage:609648] 2019-08-22 15:00:29,999 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:609648,5,main]
131DEBUG [Native-Transport-Requests-6] 2019-08-22 15:00:30,019 ReadCallback.java:132 - Timed out; received 2 of 3 responses (including data)
132ERROR [ReadRepairStage:609645] 2019-08-22 15:00:30,020 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:609645,5,main]
133DEBUG [Native-Transport-Requests-41] 2019-08-22 15:00:30,021 DataResolver.java:200 - Timeout while read-repairing after receiving all 6 data and digest responses
134DEBUG [Native-Transport-Requests-3] 2019-08-22 15:00:30,020 ReadCallback.java:132 - Timed out; received 1 of 3 responses (including data)
135DEBUG [COMMIT-LOG-ALLOCATOR] 2019-08-22 15:00:30,039 AbstractCommitLogSegmentManager.java:109 - No segments in reserve; creating a fresh one
136DEBUG [Native-Transport-Requests-8] 2019-08-22 15:00:30,088 ReadCallback.java:132 - Timed out; received 2 of 3 responses (including data)
137DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:30,117 Memtable.java:456 - Writing Memtable-data@929332878(11.723MiB serialized bytes, 39 ops, 0%/20% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(-1462500924879604616)]
138DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:30,122 Memtable.java:456 - Writing Memtable-data@929332878(11.723MiB serialized bytes, 39 ops, 0%/20% of on/off-heap limit), flushed range = (max(-1462500924879604616), max(3778679129973661054)]
139DEBUG [PerDiskMemtableFlushWriter_1:9423] 2019-08-22 15:00:30,123 Memtable.java:485 - Completed flushing /srv/sdb4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40031-big-Data.db (0.000KiB) for commitlog position CommitLogPosition(segmentId=1565021954793, position=18598938)
140DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:30,123 Memtable.java:456 - Writing Memtable-data@929332878(11.723MiB serialized bytes, 39 ops, 0%/20% of on/off-heap limit), flushed range = (max(3778679129973661054), max(9223372036854775807)]
141DEBUG [Native-Transport-Requests-66] 2019-08-22 15:00:30,161 ReadCallback.java:132 - Timed out; received 1 of 3 responses (including data)
142DEBUG [PerDiskMemtableFlushWriter_2:9424] 2019-08-22 15:00:30,230 Memtable.java:485 - Completed flushing /srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40032-big-Data.db (5.909MiB) for commitlog position CommitLogPosition(segmentId=1565021954793, position=18598938)
143DEBUG [Native-Transport-Requests-4] 2019-08-22 15:00:30,533 ReadCallback.java:132 - Timed out; received 1 of 3 responses (including data)
144DEBUG [ReadRepairStage:609664] 2019-08-22 15:00:30,534 ReadCallback.java:242 - Digest mismatch:
145DEBUG [PerDiskMemtableFlushWriter_0:9424] 2019-08-22 15:00:30,565 Memtable.java:485 - Completed flushing /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40030-big-Data.db (5.814MiB) for commitlog position CommitLogPosition(segmentId=1565021954793, position=18598938)
146DEBUG [MemtableFlushWriter:9423] 2019-08-22 15:00:30,587 ColumnFamilyStore.java:1216 - Flushed to [BigTableReader(path='/srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40030-big-Data.db'), BigTableReader(path='/srv/sdc4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40032-big-Data.db')] (2 sstables, 2.154MiB), biggest 1.088MiB, smallest 1.066MiB
147DEBUG [CompactionExecutor:45581] 2019-08-22 15:00:30,587 CompactionTask.java:155 - Compacting (957e0cb0-c4ed-11e9-ad4b-e11861b6015e) [/srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40022-big-Data.db:level=0, /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40023-big-Data.db:level=0, /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40027-big-Data.db:level=0, /srv/sda4/cassandra-b/data/enwiki_T_parsoid/data-ae7fe34065df11e9bc6c8958654eb0a6/mc-40030-big-Data.db:level=0, ]
148DEBUG [Native-Transport-Requests-26] 2019-08-22 15:00:30,907 ReadCallback.java:132 - Timed out; received 2 of 3 responses (including data)
149DEBUG [ReadRepairStage:609650] 2019-08-22 15:00:31,011 ReadCallback.java:242 - Digest mismatch:
150DEBUG [NativePoolCleaner] 2019-08-22 15:00:31,118 ColumnFamilyStore.java:1308 - Flushing largest CFS(Keyspace='enwiki_T_parsoid', ColumnFamily='data') to free up room. Used total: 0.01/0.65, live: 0.01/0.46, flushing: 0.00/0.15, this: 0.00/0.12
151DEBUG [NativePoolCleaner] 2019-08-22 15:00:31,118 ColumnFamilyStore.java:918 - Enqueuing flush of data: 0.586KiB (0%) on-heap, 376.564MiB (12%) off-heap

image.png (266×445 px, 20 KB)
2017-b latency (Write-LOCAL_QUORUM in light brown, Read-LOCAL_ONE in purple)
image.png (267×443 px, 20 KB)
2009-b latency (Write-LOCAL_QUORUM in light brown, Read-LOCAL_ONE in purple)
image.png (268×594 px, 71 KB)
Load average (restbase2009-b show in light green)

Event Timeline

Eevans updated the task description. (Show Details)
Eevans updated the task description. (Show Details)

I've only looked at this briefly, but some observations:

  • The JVM seems to have spontaneously and uncerimoniously self-destructed:
    • No logged (fatal) exceptions
    • No crash log or heap dump
  • None of the usual signs of distress preceding the event
    • StatusLogger frequency
    • Major GCs
    • Load avg, cpu, IO
  • High latency immediately preceding the event shared by 2009-b

Conclusion: This is...weird, but probably not worth spending any more time on unless it begins to look like a pattern.

Additionally, before digging too deeply, it would make sense to pursue T200803: Upgrade Cassandra 3.11.2 clusters to 3.11.4 (bugfix release) first.

This instance has gone down again, we should dig deeper.

To: services@wikimedia.org
Subject: ** PROBLEM alert - restbase2017/Check systemd state is CRITICAL **

Notification Type: PROBLEM

Service: Check systemd state
Host: restbase2017
Address: 10.192.48.119
State: CRITICAL

Date/Time: Thu Aug 29 03:20:29 UTC 2019

Notes URLs: https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state

Acknowledged by : 

Additional Info:

CRITICAL - degraded: The system is operational but one or more units failed.

The timeline of events goes like this:

  • [2019-08-29T03:07:53+00:00] StatusLogger shows ReadStage backing up some (~100 pending tasks). Cause? Effect?
  • [2019-08-29T03:17:07+00:00] Exceptions in ReadRepairStage thread (4 total ~4 secs). Cause? Effect?
  • [2019-08-29T03:17:12+00:00] Last message logged (StatusLogger.java:101...)
  • [2019-08-29T03:17:15+00:00] Syslog/systemd shutdown (see below)
  • [2019-08-29T03:20:27+00:00] cassandra-b service is CRITICAL
  • [2019-08-29T03:20:59+00:00] cassandra-b CQL 10.192.48.122:9042 is CRITICAL
  • [2019-08-29T03:21:37+00:00] cassandra-b SSL 10.192.48.122:7001 is CRITICAL
  • [2019-08-29T03:40:33+00:00] First message logged from startup (restarted by Puppet)
  • [2019-08-29T03:40:55+00:00] cassandra-b service is OK
  • [2019-08-29T03:42:03+00:00] cassandra-b SSL 10.192.48.122:7001 is OK
  • [2019-08-29T03:42:55+00:00] cassandra-b CQL 10.192.48.122:9042 is OK
syslog
Aug 29 03:17:15 restbase2017 systemd[1]: cassandra-b.service: Main process exited, code=exited, status=3/NOTIMPLEMENTED
Aug 29 03:17:15 restbase2017 systemd[1]: cassandra-b.service: Unit entered failed state.
Aug 29 03:17:15 restbase2017 systemd[1]: cassandra-b.service: Failed with result 'exit-code'

[ ... ]

Aug 29 03:40:21 restbase2017 puppet-agent[75589]: (/Stage[main]/Cassandra/Cassandra::Instance[b]/Systemd::Service[cassandra-b]/Service[cassandra-b]/ensure) ensure changed stopped' to 'running'

As before, there were no hs_err_pid%p.log" files or heap dumps (cassandra-$(date +%s)-pid$$.hprof). And no fatal exceptions in the logs.

If the systemd syslog output is to believed, the service exited with status code 3 (...status=3/NOTIMPLEMENTED). AFAICT, Java would exit with a status code > 127 if the JVM shutdown as the result of a signal, otherwise this would have to be the result of a java.lang.System.exit(3) call. However, I can find no explanation of this in Cassandra's code.

This instance has gone down again, we should dig deeper.

[ ... ]

It would appear this also happened on July 30 at ~19:15 UTC (and went un-investigated at the time).

Mentioned in SAL (#wikimedia-operations) [2019-08-30T16:44:22Z] <urandom> restarting restbase2017-b with live hack startup script (adds logging) -- T231027

If the systemd syslog output is to believed, the service exited with status code 3 (...status=3/NOTIMPLEMENTED). AFAICT, Java would exit with a status code > 127 if the JVM shutdown as the result of a signal, otherwise this would have to be the result of a java.lang.System.exit(3) call. However, I can find no explanation of this in Cassandra's code.

I think we can trust systemd on that. I checked and it correctly identifies java as the mainpid, which was the only doubt I had.

Very dumb question: which version of the cassandra code did you check? Did you include debian patches we apply, if any?

A stopgap if we want the unit to be restarted regularly even when exiting in error would be to use Restart=always and some sane throttling rules, but I'm not sure we want it for cassandra in general.

If the systemd syslog output is to believed, the service exited with status code 3 (...status=3/NOTIMPLEMENTED). AFAICT, Java would exit with a status code > 127 if the JVM shutdown as the result of a signal, otherwise this would have to be the result of a java.lang.System.exit(3) call. However, I can find no explanation of this in Cassandra's code.

I think we can trust systemd on that. I checked and it correctly identifies java as the mainpid, which was the only doubt I had.

Very dumb question: which version of the cassandra code did you check? Did you include debian patches we apply, if any?

We're running 3.11.2, and using the upstream package.

A stopgap if we want the unit to be restarted regularly even when exiting in error would be to use Restart=always and some sane throttling rules, but I'm not sure we want it for cassandra in general.

The outages themselves haven't caused any issues, and Puppet will start it back up when it next runs; I don't think we need to go this far.

Part of what has made this so odd is that it only ever occurred to this one instance, so good/bad news, this has now been observed on 2009-b as well.

restbase2009:/var/log/syslog
[ ... ]
Sep  9 09:35:12 restbase2009 systemd[1]: cassandra-b.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Sep  9 09:35:12 restbase2009 systemd[1]: Unit cassandra-b.service entered failed state.
[ .... ]
Eevans renamed this task from Outage of restbase2017-b to Cassandra instances outages (was: Outage of restbase2017-b).Sep 10 2019, 6:03 PM
Eevans lowered the priority of this task from High to Medium.Sep 30 2019, 2:39 PM

Since an upgrade to Cassandra 3.11.4 was pending anyway (T200803: Upgrade Cassandra 3.11.2 clusters to 3.11.4 (bugfix release)), we prioritized that work a) in the event the issue had been fixed upstream, and b) so that if we had to dig deeper to troubleshoot, that we would be doing so against a current release. Unfortunately, this does not seem to have fixed (see: T238591), and we will indeed need to dig deeper.

Whatever the issue here, it seems to have resolved itself (years ago...); Closing