Page MenuHomePhabricator

put new restbase servers in service
Closed, ResolvedPublic

Description

new restbase machines have base installed completed, pending:

  • coordinated puppet signing
  • ordinately bring up new cassandra nodes and join the cluster
  • add new restbase machines to pybal once cassandra fully in service

Event Timeline

fgiunchedi claimed this task.
fgiunchedi raised the priority of this task from to Medium.
fgiunchedi updated the task description. (Show Details)
fgiunchedi added projects: acl*sre-team, RESTBase.
fgiunchedi added subscribers: Eevans, mobrovac, GWicke.

Change 217678 had a related patch set uploaded (by Filippo Giunchedi):
restbase: add restbase100[789] to manifest and hieradata

https://gerrit.wikimedia.org/r/217678

Change 217678 merged by Filippo Giunchedi:
restbase: add restbase100[789] to manifest and hieradata

https://gerrit.wikimedia.org/r/217678

ran through the initial bootstrapping of restbase1007 today with @GWicke

The cassandra version on our apt repo is 2.1.5 which got there on April 29th, likely an unintended side effect of reprepro update which in turns pulls updates from all sources. cassandra 2.1.5 could not successfully stream data from other nodes resulting in NPE, to work around that the cassandra version has been downgraded on restbase1007 to 2.1.3 and that allows for a successful bootstrap and we'll need to do the same for other new nodes (testing 2.1.6 is T101745)

1007 ran into disk trouble:

352515.128962] sd 1:0:0:0: [sdb] Unhandled error code
[352515.134401] sd 1:0:0:0: [sdb]  
[352515.138002] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[352515.144801] sd 1:0:0:0: [sdb] CDB: 
[352515.148789] Read(10): 28 00 42 cd 5e 60 00 00 08 00
[352515.154449] end_request: I/O error, dev sdb, sector 1120755296

currently looking into that and attempting bootstrap of restbase1008 with @Eevans, we've seen this error coming up on nodes when trying to bootstrap (logs of peer node, already in production)

ERROR [STREAM-IN-/10.64.32.178] 2015-06-15 20:20:43,539 StreamSession.java:477 - [Stream #f5e89ff0-139b-11e5-9881-3fcfb96209d9] Streaming error occurred
java.lang.NullPointerException: null
        at org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:1277) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.getSSTableSectionsForRanges(StreamSession.java:313) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:266) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:493) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:425) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:251) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
INFO  [STREAM-IN-/10.64.32.178] 2015-06-15 20:20:43,546 StreamResultFuture.java:180 - [Stream #f5e89ff0-139b-11e5-9881-3fcfb96209d9] Session with /10.64.32.178 is complete
WARN  [STREAM-IN-/10.64.32.178] 2015-06-15 20:20:43,547 StreamResultFuture.java:207 - [Stream #f5e89ff0-139b-11e5-9881-3fcfb96209d9] Stream failed
INFO  [GossipStage:1] 2015-06-15 20:20:45,554 Gossiper.java:930 - InetAddress /10.64.32.178 is now DOWN

and on the node in question

INFO  [STREAM-IN-/10.64.32.159] 2015-06-15 20:20:43,539 StreamResultFuture.java:180 - [Stream #f5e89ff0-139b-11e5-9881-3fcfb96209d9] Session with /10.64.32.159 is complete
WARN  [STREAM-IN-/10.64.32.159] 2015-06-15 20:20:43,542 StreamResultFuture.java:207 - [Stream #f5e89ff0-139b-11e5-9881-3fcfb96209d9] Stream failed
ERROR [main] 2015-06-15 20:20:43,550 CassandraDaemon.java:482 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1036) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:839) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:635) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:518) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338) [apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:465) [apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:554) [apache-cassandra-2.1.3.jar:2.1.3]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.jar:na]
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:387) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.sessionFailed(StreamSession.java:592) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:447) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:251) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_79]
INFO  [StorageServiceShutdownHook] 2015-06-15 20:20:43,553 Gossiper.java:1341 - Announcing shutdown
INFO  [StorageServiceShutdownHook] 2015-06-15 20:20:45,553 MessagingService.java:707 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.64.32.178] 2015-06-15 20:20:45,555 MessagingService.java:955 - MessagingService has terminated the accept() thread

The bootstrap of 1008 that was started yesterday just failed with (the usual):

INFO  [StreamReceiveTask:50] 2015-06-16 14:07:43,508 StreamResultFuture.java:180 - [Stream #dfb718e0-139d-11e5-844b-3fcfb96209d9] Session with /10.64.32.159 is complete
WARN  [StreamReceiveTask:50] 2015-06-16 14:07:43,510 StreamResultFuture.java:207 - [Stream #dfb718e0-139d-11e5-844b-3fcfb96209d9] Stream failed
ERROR [main] 2015-06-16 14:07:43,511 CassandraDaemon.java:482 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1036) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:839) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:635) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:518) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338) [apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:465) [apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:554) [apache-cassandra-2.1.3.jar:2.1.3]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.jar:na]
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:387) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:660) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:623) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:139) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_79]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_79]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_79]

What is unusual, is that I cannot find the typical NullPointerException in any of the other 6 nodes' logs.

None of these exceptions are very helpful: I suggest that before going any further (regardless the course), that we move the logging level to DEBUG, (while configuring the rotation to be less aggressive). The signal-to-noise ratio will be very bad, but on the off chance there is any increase in "signal", it'll be worth it. It might even be worth taking one more stab at a bootstrap with DEBUG enabled, before trying other things.

I see a few possible courses of action going forward:

  • (a) assuming we cannot reproduce in testing, patch/instrument the effected code (version 2.1.3) to produce better debug data (iterate as necessary)
  • (b) upgrade to 2.1.6
  • (c) wait, and upgrade 2.1.7 (fast-tracking a release if possible)

The advantage to (a) is understanding what is broken here. And assuming that we make small, targetted, non-normative changes, it should be the least risky strategy as well. The disadvantage is that it may take longer in the end.

The premise behind (b) is that 2.1.6 should be Less Buggy than 2.1.3, but without knowing what this bug is, there is really no reason to think it is fixed in 2.1.6. And, there is always the possibility it will introduce new regressions. https://issues.apache.org/jira/browse/CASSANDRA-9549 in particular may be a problem for us (more research needed), and at the very least stands as evidence that 2.1.6 is not immune from serious bugs. On the other hand, if it does work, it's the quickest/easiest solution.

Waiting for 2.1.7 would at least mean that CASSANDRA-9549 has been fixed.


I recommend, out of an abundance of caution, that we go with (a).

@Eevans, how will DEBUG affect latencies?

Good question. I don't know, we'd have to (carefully) try it.

One option would be to use JMX to enable it on a class-by-class and/or package-by-package basis. Either as a test of the impact, or as a strategy to minimize any impact if that proves to be a problem.

For posterity sake:

INFO  [STREAM-IN-/10.64.32.159] 2015-06-15 20:34:39,278 StreamResultFuture.java:166 - [Stream #dfb718e0-139d-11e5-844b-3fcfb96209d9 ID#0] Prepare completed. Receiving 2847 files(304524197467 bytes), sending 0 files(0 bytes)
...
WARN  [StreamReceiveTask:50] 2015-06-16 14:07:43,510 StreamResultFuture.java:207 - [Stream #dfb718e0-139d-11e5-844b-3fcfb96209d9] Stream failed

Each iteration/attempt takes a minimum of ~17.5 hours from bootstrap-start to stream failure.

@Eevans ok sounds like we can try enabling it even now at runtime via JMX?

myself and @Eevans have played yesterday a bit with debug settings for org.apache.cassandra.io.sstable org.apache.cassandra.dht org.apache.cassandra.streaming org.apache.cassandra.service on restbase1003 and tried bootstrapping restbase1008 again, that yielded the same error but no useful debugging. Current plan is to go for (b) and upgrade the fleet to 2.1.6 first.

root@restbase1001:~# nodetool snapshot
Requested creating snapshot(s) for [all keyspaces] with snapshot name [1434558594941]
Snapshot directory: 1434558594941
root@restbase1001:~# service cassandra stop
root@restbase1001:~# apt-get install cassandra
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  cassandra-tools
The following packages will be upgraded:
  cassandra
1 upgraded, 0 newly installed, 0 to remove and 157 not upgraded.
Need to get 19.9 MB of archives.
After this operation, 1,475 kB of additional disk space will be used.
Get:1 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/thirdparty cassandra all 2.1.6 [19.9 MB]
Fetched 19.9 MB in 0s (32.4 MB/s)  
Reading changelogs... Done
(Reading database ... 43952 files and directories currently installed.)
Preparing to unpack .../cassandra_2.1.6_all.deb ...
Unpacking cassandra (2.1.6) over (2.1.3) ...
Processing triggers for python-support (1.0.15) ...
Setting up cassandra (2.1.6) ...
Installing new version of config file /etc/cassandra/commitlog_archiving.properties ...

Configuration file '/etc/cassandra/cassandra.yaml'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainer's version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** cassandra.yaml (Y/I/N/O/D/Z) [default=N] ? 

Configuration file '/etc/cassandra/cassandra-env.sh'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainer's version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** cassandra-env.sh (Y/I/N/O/D/Z) [default=N] ? 
sysctl: cannot open "/etc/sysctl.d/cassandra.conf": No such file or directory

Warning: unable to set vm.max_map_count; is this an OpenVZ
instance? If so, it is highly recommended that you set
vm.max_map_count to 1048575 in the host.

Deleting the local sysctl.d/cassandra.conf.
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Processing triggers for python-support (1.0.15) ...
root@restbase1001:~#

current status, restarting cassandra works however it seems that metrics are being pushed to graphite only for (only?) one period (60s) and then stop. a thread dump shows some metrics threads waiting

"metrics-graphite-reporter-thread-1" daemon prio=10 tid=0x00007f85c469b000 nid=0x7279 waiting on condition [0x00007f7dc5882000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000684e4a0c0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1079)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
"metrics-meter-tick-thread-2" daemon prio=10 tid=0x00007f85c4862800 nid=0x6fb5 waiting on condition [0x00007f85a4bc0000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000675a38228> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1085)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"metrics-meter-tick-thread-1" daemon prio=10 tid=0x00007f85c485d000 nid=0x6fb4 waiting on condition [0x00007f85a4c01000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000675a38228> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

which in turn might indicate a deadlock? note that reporting metrics works fine on the test cluster

scratch the deadlock part, threads are in the same state even on a working cassandra 2.1.3 instance

Change 219236 had a related patch set uploaded (by Filippo Giunchedi):
cassandra: add restbase1008 to cassandra::seeds

https://gerrit.wikimedia.org/r/219236

Change 219236 merged by Filippo Giunchedi:
cassandra: add restbase1008 to cassandra::seeds

https://gerrit.wikimedia.org/r/219236

The latest bootstrap attempt of 1008 has failed as well. The exact reason is unclear because a hardware issue prevents reliably getting at the log data.

...
[832404.241192] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[832404.241194] sd 1:0:0:0: [sdb] CDB: 
[832404.241195] Read(10): 28 00 2c e1 fc 00 00 00 08 00
[832404.241201] end_request: I/O error, dev sdb, sector 753007616
[832409.938536] sd 1:0:0:0: [sdb] Unhandled error code
[832409.938541] sd 1:0:0:0: [sdb]  
[832409.938543] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[832409.938544] sd 1:0:0:0: [sdb] CDB: 
[832409.938546] Read(10): 28 00 2c e1 fc 00 00 00 08 00
[832409.938553] end_request: I/O error, dev sdb, sector 753007616
[832409.945133] sd 1:0:0:0: [sdb] Unhandled error code
[832409.945135] sd 1:0:0:0: [sdb]  
[832409.945137] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[832409.945138] sd 1:0:0:0: [sdb] CDB: 
[832409.945139] Read(10): 28 00 2c e1 fc 00 00 00 08 00
[832409.945146] end_request: I/O error, dev sdb, sector 753007616
[832411.116554] sd 1:0:0:0: [sdb] Unhandled error code
[832411.116558] sd 1:0:0:0: [sdb]  
[832411.116560] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[832411.116562] sd 1:0:0:0: [sdb] CDB: 
[832411.116563] Read(10): 28 00 2c e1 fc 00 00 00 08 00
[832411.116571] end_request: I/O error, dev sdb, sector 753007616
[832411.123150] sd 1:0:0:0: [sdb] Unhandled error code
[832411.123152] sd 1:0:0:0: [sdb]  
[832411.123153] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[832411.123155] sd 1:0:0:0: [sdb] CDB: 
[832411.123156] Read(10): 28 00 2c e1 fc 00 00 00 08 00
[832411.123162] end_request: I/O error, dev sdb, sector 753007616
[832449.398075] sd 1:0:0:0: [sdb] Unhandled error code
[832449.398081] sd 1:0:0:0: [sdb]  
[832449.398084] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[832449.398086] sd 1:0:0:0: [sdb] CDB: 
[832449.398088] Read(10): 28 00 2c e1 fc 00 00 00 08 00
[832449.398099] end_request: I/O error, dev sdb, sector 753007616
[832449.404855] sd 1:0:0:0: [sdb] Unhandled error code
[832449.404858] sd 1:0:0:0: [sdb]  
[832449.404860] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[832449.404862] sd 1:0:0:0: [sdb] CDB: 
[832449.404863] Read(10): 28 00 2c e1 fc 00 00 00 08 00
[832449.404872] end_request: I/O error, dev sdb, sector 753007616

That makes this two out of three new nodes having sdb hw issues.

First errors in dmesg:

[831652.740509] ata2.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
[831652.749174] ata2.00: failed command: READ FPDMA QUEUED
[831652.755063] ata2.00: cmd 60/00:00:00:a8:d8/04:00:50:00:00/40 tag 0 ncq 524288 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)                              
[831652.771830] ata2.00: status: { DRDY }
[831652.776049] ata2.00: failed command: READ FPDMA QUEUED
[831652.781891] ata2.00: cmd 60/00:08:00:0c:ab/04:00:50:00:00/40 tag 1 ncq 524288 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)                              
[831652.798586] ata2.00: status: { DRDY }

Worth checking if dmesg on 1009 shows similar issues.

we've seen more sdb errors on restbase1008 too, but nothing on sda so far. To rule out further things like driver issues/bugs and so on I've upgraded 1007/8/9 to 3.19 which we have to do anyway and launched smart extended tests on all disks, not that it will be conclusive but at least the disk self-scans itself

Regarding the two non-hardware blockers, (the bootstrap/streaming failures, and metrics reporting):

https://issues.apache.org/jira/browse/CASSANDRA-9624
https://issues.apache.org/jira/browse/CASSANDRA-9625

Change 219496 had a related patch set uploaded (by Filippo Giunchedi):
cassandra: add restbase1009 to cassandra::seeds

https://gerrit.wikimedia.org/r/219496

Change 219496 merged by Filippo Giunchedi:
cassandra: add restbase1009 to cassandra::seeds

https://gerrit.wikimedia.org/r/219496

With 2.1.7 on both 1006 and 1009 the first ~400G stream succeeds, but for some reason some other part of the stream fails right after:

INFO  [StreamReceiveTask:55] 2015-06-20 23:48:37,574 StreamResultFuture.java:180 - [Stream #b9e17e80-16db-11e5-8044-b35b54affa2e] Session with /10.64.48.100 is complete
INFO  [CompactionExecutor:482] 2015-06-20 23:48:37,575 ColumnFamilyStore.java:906 - Enqueuing flush of compactions_in_progress: 404 (0%) on-heap, 348 (0%) off-heap
INFO  [MemtableFlushWriter:286] 2015-06-20 23:48:37,576 Memtable.java:346 - Writing Memtable-compactions_in_progress@999280685(270 serialized bytes, 11 ops, 0%/0% of on/off-heap limit)
WARN  [StreamReceiveTask:55] 2015-06-20 23:48:37,577 StreamResultFuture.java:207 - [Stream #b9e17e80-16db-11e5-8044-b35b54affa2e] Stream failed
ERROR [main] 2015-06-20 23:48:37,578 CassandraDaemon.java:541 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1137) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:927) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:723) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:605) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) [apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524) [apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613) [apache-cassandra-2.1.7.jar:2.1.7]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.jar:na]
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:409) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:682) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:645) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:139) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_79]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_79]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_79]

During the second bootstrap attempt restbase1009 now suffered (almost) the same disk failure fate as its brethren; only difference is that this time sda is failing:

[186963.425817] blk_update_request: I/O error, dev sda, sector 1317209424
[186963.433194] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[186963.433197] sd 0:0:0:0: [sda] CDB: 
[186963.433199] Read(10): 28 00 4e 83 05 50 00 00 08 00

To me this is starting to look a lot like a bad batch of SSDs.

At this point, 2.1.7 is overall looking like a regression (still can't bootstrap, metrics don't work); I suggest we downgrade to 2.1.3, and regroup after the new hardware has been sorted.

agreed, 2.1.3 will give us the metrics back (and/or for longer at least)

I've downgraded restbase100[1-6] back to 2.1.3

Now that restbase1008 is fitted with different disks, I propose the following course of action:

  1. We attempt a bootstrap of 1008 while running 2.1.3 (since we cannot rule out faulty hardware on all of the prior failed attempts).
  2. If bootstrapping fails (w/o any obvious link to hardware), choose from one of:
    • retry bootstrap w/ 2.1.7
    • goto #3 (my preferred, since there is little reason to think 2.1.7 should solve this issue).
  3. If issues persist, I propose we ramp up our efforts to debug (using 2.1.3, since that version does not introduce a metrics reporting regression)
    1. patch the affected code to produce more meaningful errors / log additional context at the point of failure (may require multiple iterations)
    2. as a weapon of last resort, attach a remote debugger to the production instance (means suspending the vm), first the stream destination, then if necessary, the stream source (existing/joined production nodes).

Obviously, in an ideal world, we'd never go near production with 3a or 3b, but without the ability to reproduce this problem in the test environment, we're left with little option.

I went ahead and started cassandra 2.1.3 on restbase1008, the bootstrap/streaming failure is there from 1004

restbase1004

ERROR [STREAM-IN-/10.64.32.178] 2015-06-24 21:10:37,750 StreamSession.java:477 - [Stream #6858a1f0-1ab5-11e5-9171-3fcfb96209d9] Streaming error occurred
java.lang.NullPointerException: null
        at org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:1277) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.getSSTableSectionsForRanges(StreamSession.java:313) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:266) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:493) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:425) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:251) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
INFO  [STREAM-IN-/10.64.32.178] 2015-06-24 21:10:37,755 StreamResultFuture.java:180 - [Stream #6858a1f0-1ab5-11e5-9171-3fcfb96209d9] Session with /10.64.32.178 is complete
WARN  [STREAM-IN-/10.64.32.178] 2015-06-24 21:10:37,755 StreamResultFuture.java:207 - [Stream #6858a1f0-1ab5-11e5-9171-3fcfb96209d9] Stream failed
INFO  [GossipStage:1] 2015-06-24 21:10:39,764 Gossiper.java:930 - InetAddress /10.64.32.178 is now DOWN

and restbase1008

INFO  [STREAM-IN-/10.64.32.160] 2015-06-24 21:10:37,750 StreamResultFuture.java:180 - [Stream #6858a1f0-1ab5-11e5-9171-3fcfb96209d9] Session with /10.64.32.160 is complete
WARN  [STREAM-IN-/10.64.32.160] 2015-06-24 21:10:37,752 StreamResultFuture.java:207 - [Stream #6858a1f0-1ab5-11e5-9171-3fcfb96209d9] Stream failed
ERROR [main] 2015-06-24 21:10:37,760 CassandraDaemon.java:482 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1036) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:839) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:635) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:518) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338) [apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:465) [apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:554) [apache-cassandra-2.1.3.jar:2.1.3]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.jar:na]
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:387) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.sessionFailed(StreamSession.java:592) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:447) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:251) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_79]
INFO  [StorageServiceShutdownHook] 2015-06-24 21:10:37,763 Gossiper.java:1341 - Announcing shutdown
INFO  [StorageServiceShutdownHook] 2015-06-24 21:10:39,763 MessagingService.java:707 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.64.32.178] 2015-06-24 21:10:39,765 MessagingService.java:955 - MessagingService has terminated the accept() thread

The most recent attempt at bootstrapping 1008 failed early this morning, after/during the first streaming session with 1004. Both nodes were running Cassandra 2.1.7.

From 1008:

INFO  [StreamReceiveTask:57] 2015-06-26 06:48:53,061 StreamResultFuture.java:180 - [Stream #8bdeb1b0-1ad2-11e5-abd8-3fcfb96209d9] Session with /10.64.32.160 is complete
WARN  [StreamReceiveTask:57] 2015-06-26 06:48:53,062 StreamResultFuture.java:207 - [Stream #8bdeb1b0-1ad2-11e5-abd8-3fcfb96209d9] Stream failed
INFO  [CompactionExecutor:2885] 2015-06-26 06:48:53,062 ColumnFamilyStore.java:906 - Enqueuing flush of compactions_in_progress: 428 (0%) on-heap, 379 (0%) off-heap
INFO  [MemtableFlushWriter:959] 2015-06-26 06:48:53,063 Memtable.java:346 - Writing Memtable-compactions_in_progress@1203013482(294 serialized bytes, 12 ops, 0%/0% of on/off-heap limit)
ERROR [main] 2015-06-26 06:48:53,063 CassandraDaemon.java:541 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1137) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:927) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:723) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:605) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) [apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524) [apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613) [apache-cassandra-2.1.7.jar:2.1.7]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.jar:na]
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:409) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:682) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:645) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:139) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_79]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_79]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_79]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_79]

From 1004:

INFO  [STREAM-IN-/10.64.32.178] 2015-06-26 06:48:53,060 StreamResultFuture.java:180 - [Stream #8bdeb1b0-1ad2-11e5-abd8-3fcfb96209d9] Session with /10.64.32.178 is complete
INFO  [STREAM-IN-/10.64.32.178] 2015-06-26 06:48:53,064 StreamResultFuture.java:212 - [Stream #8bdeb1b0-1ad2-11e5-abd8-3fcfb96209d9] All sessions completed

This mirrors the past 2.1.7-to-2.1.7 attempt.

Neither machine has any disk errors that I can see.

OK, so catching up with where we are according to The Plan:

  1. We attempt a bootstrap of 1008 while running 2.1.3 (since we cannot rule out faulty hardware on all of the prior failed attempts).
  2. If bootstrapping fails (w/o any obvious link to hardware), choose from one of:
    • retry bootstrap w/ 2.1.7
    • goto #3 (my preferred, since there is little reason to think 2.1.7 should solve this issue).
  3. If issues persist, I propose we ramp up our efforts to debug (using 2.1.3, since that version does not introduce a metrics reporting regression)
    1. patch the affected code to produce more meaningful errors / log additional context at the point of failure (may require multiple iterations)
    2. as a weapon of last resort, attach a remote debugger to the production instance (means suspending the vm), first the stream destination, then if necessary, the stream source (existing/joined production nodes).

Having failed to bootstrap with 2.1.3 and 2.1.7, we are now at step #3.

Originally I proposed that we troubleshoot the bootstrapping issue using 2.1.3, on the basis that we could defer tackling the metrics regression. I'm amending that recommendation now to 2.1.7. My rationale:

  • After downgrading to 2.1.3 on the 23rd, we immediately began to experience higher than usual storage latencies. I'm not sure how to explain this, the only correlating factor (other than the timing of the downgrade), is 'tombestones scanned'. In the absence of a good explanation, moving back to 2.1.7 seems like a reasonable next step.
  • Working the bootstrap issue against the latest/current stable release is more advantageous with respect to getting opinions from the Cassandra community.

This of course means that we are going to need an alternative to the graphite metrics reporter.

Change 221763 had a related patch set uploaded (by Filippo Giunchedi):
cassandra: testing a higher a reporting interval

https://gerrit.wikimedia.org/r/221763

Change 221763 merged by Filippo Giunchedi:
cassandra: testing a higher a reporting interval

https://gerrit.wikimedia.org/r/221763

One reoccurring issue seems to manifest as sharply increasing pending tasks for memtable flush and mutation stages:

http://grafana.wikimedia.org/#/dashboard/db/restbase-cassandra-thread-pools?from=1436296798752&to=1436297988835&var-node=restbase1005

Eventually this results in increasing GC activity, followed by death.

This would seem to be IO related, but I'd like to get a better idea what is happening, and would request that anyone encountering a node in such a condition attempt to obtain stack and heap dumps, prior to restarting.

JAVA_VERSION=8
sudo -u cassandra /usr/lib/jvm/java-$JAVA_VERSION-openjdk-amd64/bin/jstack -J-d64 -l -m `pgrep -f CassandraDaemon` > /path/to/thread-dump-`date +%s`.log
sudo -u cassandra /usr/lib/jvm/java-$JAVA_VERSION-openjdk-amd64/bin/jmap -dump:format=b,file=/path/to/heap-`date +%s`.bin `pgrep -f CassandraDaemon`

It would appear that with gc_grace_seconds set to 0, and deletes happening at consistency ONE, that some deletes aren't sticking.

$ zcat wikipedia_T_parsoid_html.log.1435873391.gz | tail -n 25
-- deleting:  ["sh.wikipedia.org","San_Juan_del_Valle,_Chilchotla",13830580] Fri May 08 2015 12:23:55 GMT+0000 (UTC)
-- deleting:  ["sh.wikipedia.org","San_Juan_del_Valle,_Chilchotla",13790428] Fri May 08 2015 04:48:32 GMT+0000 (UTC)
-- deleting:  ["sh.wikipedia.org","San_Juan_del_Valle,_Chilchotla",13620095] Wed May 06 2015 19:07:52 GMT+0000 (UTC)
-- deleting:  ["sh.wikipedia.org","San_Juan_del_Valle,_Chilchotla",13532196] Wed May 06 2015 01:10:53 GMT+0000 (UTC)
-- deleting:  ["sh.wikipedia.org","San_Juan_del_Valle,_Chilchotla",13434471] Tue May 05 2015 04:29:43 GMT+0000 (UTC)
-- deleting:  ["sh.wikipedia.org","San_Juan_del_Valle,_Chilchotla",13274729] Sun May 03 2015 21:01:47 GMT+0000 (UTC)
-- deleting:  ["en.wikipedia.org","1953_Iranian_coup_d'état",657303897] Tue May 12 2015 00:12:05 GMT+0000 (UTC)
-- deleting:  ["en.wikipedia.org","Vagus_nerve",654533006] Tue May 05 2015 09:50:03 GMT+0000 (UTC)
...
cassandra@cqlsh:local_group_wikipedia_T_parsoid_html> select key,rev,dateOf(tid),tid from data where "_dmain" = 'sh.wikipedia.org' and key = 'San_Juan_del_Valle,_Chilchotla' and rev = 13790428;

 key                            | rev      | dateOf(tid)              | tid
--------------------------------+----------+--------------------------+--------------------------------------
 San_Juan_del_Valle,_Chilchotla | 13790428 | 2015-05-08 04:48:32+0000 | 7a40b404-f53d-11e4-bb02-e72a65a01436

(1 rows)
cassandra@cqlsh:local_group_wikipedia_T_parsoid_html> CONSISTENCY ALL 
Consistency level set to ALL.
cassandra@cqlsh:local_group_wikipedia_T_parsoid_html> select key,rev,dateOf(tid),tid from data where "_dmain" = 'sh.wikipedia.org' and key = 'San_Juan_del_Valle,_Chilchotla' and rev = 13790428 and tid = 7a40b04-f53d-11e4-bb02-e72a65a01436;

 _domain | key | rev | tid | _del | content-location | content-sha256 | content-type | latestTid | tags | value
---------+-----+-----+-----+------+------------------+----------------+--------------+-----------+------+-------

(0 rows)

Of the dozen or so I've checked so far, at least half were like this.

A lot of the issues we encountered were caused by T105509. Since deploying the RESTBase hotfix to that issue, compaction has started to catch up, latencies are recovering, timeouts are rare again and storage space is starting to shrink.

We still need to reduce the instance size going forward to make sure that we can bootstrap & can actually drive the hardware fully without hitting GC limits. However, we have also improved the setup a bit (switched to G1GC, hardened concurrency limits against overload), which might have possibly pushed up the absolute limit very slightly for our use case.

Change 227475 had a related patch set uploaded (by Filippo Giunchedi):
cassandra: add restbase1007

https://gerrit.wikimedia.org/r/227475

Change 227475 merged by Filippo Giunchedi:
cassandra: add restbase1007

https://gerrit.wikimedia.org/r/227475

bootstrapping restbase1007 was successful!

15:47 -icinga-wm:#wikimedia-operations- RECOVERY - Cassanda CQL query interface on restbase1007 is OK: TCP OK - 0.005 second response time on port 9042

restbase1007:~$ nodetool status
Datacenter: eqiad
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns    Host ID                               Rack
UN  10.64.32.160  822.2 GB   256     ?       798ff758-8c91-46e0-b85e-dad356c46f20  b
UN  10.64.48.99   757.26 GB  256     ?       325e01e8-debe-45f0-a8c2-93b3baa58968  d
UN  10.64.48.100  714.7 GB   256     ?       2abf437d-a16d-406b-a6de-8d28b7dda808  d
UN  10.64.0.220   725.13 GB  256     ?       c021a198-b7f1-4dc2-94d7-9cb8b8a8df28  a
UN  10.64.0.221   795.92 GB  256     ?       fc041cc8-cd28-4030-b29a-05b9a632cafc  a
UN  10.64.32.159  673.18 GB  256     ?       88d9ef9f-d81b-466e-babf-6a283b13f648  b
UN  10.64.0.223   437.24 GB  256     ?       c1b5a012-4840-4096-9a71-ce4d3afb0029  a

/me high-fives @fgiunchedi and @Eevans!

It took a bit longer than planned to add that node, but it's all good in the end ;)

Change 227994 had a related patch set uploaded (by Filippo Giunchedi):
cassandra: add restbase1008

https://gerrit.wikimedia.org/r/227994

Change 227994 merged by Filippo Giunchedi:
cassandra: add restbase1008

https://gerrit.wikimedia.org/r/227994

1008 has now joined the cluster as well:

Datacenter: eqiad
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                   Load       Tokens  Owns    Host ID                               Rack
UN  restbase1005.eqiad.wmnet  763.29 GB  256     ?       325e01e8-debe-45f0-a8c2-93b3baa58968  d
UN  restbase1003.eqiad.wmnet  672.61 GB  256     ?       88d9ef9f-d81b-466e-babf-6a283b13f648  b
UN  restbase1007.eqiad.wmnet  390.78 GB  256     ?       c1b5a012-4840-4096-9a71-ce4d3afb0029  a
UN  restbase1002.eqiad.wmnet  796.21 GB  256     ?       fc041cc8-cd28-4030-b29a-05b9a632cafc  a
UN  restbase1006.eqiad.wmnet  711.87 GB  256     ?       2abf437d-a16d-406b-a6de-8d28b7dda808  d
UN  restbase1001.eqiad.wmnet  725.37 GB  256     ?       c021a198-b7f1-4dc2-94d7-9cb8b8a8df28  a
UN  restbase1008.eqiad.wmnet  500.7 GB   256     ?       e9ab408a-309e-4e93-b145-9ac5f2365523  b
UN  restbase1004.eqiad.wmnet  812.06 GB  256     ?       798ff758-8c91-46e0-b85e-dad356c46f20  b

@fgiunchedi, @Eevans: Should we start the bootstrap for 1009 today?

@fgiunchedi, @Eevans: Should we start the bootstrap for 1009 today?

That's the plan AFAIK; @fgiunchedi?

@fgiunchedi, @Eevans: Should we start the bootstrap for 1009 today?

That's the plan AFAIK; @fgiunchedi?

Filippo's out today, it's a holiday in Ireland.

Change 229095 had a related patch set uploaded (by Filippo Giunchedi):
cassandra: add restbase1009

https://gerrit.wikimedia.org/r/229095

Change 229095 merged by Filippo Giunchedi:
cassandra: add restbase1009

https://gerrit.wikimedia.org/r/229095

@fgiunchedi, @Eevans: Should we start the bootstrap for 1009 today?

That's the plan AFAIK; @fgiunchedi?

Filippo's out today, it's a holiday in Ireland.

indeed!

restbase1009 is bootstrapping

root@restbase1009:~# nodetool netstats  | grep -v 100%
Mode: JOINING
Bootstrap 6b6ac0f0-3a93-11e5-8b42-63ed2935445e
    /10.64.0.221
    /10.64.32.178
    /10.64.48.100
        Receiving 2662 files, 266574138067 bytes total. Already received 10 files, 414282277 bytes total
            /var/lib/cassandra/data/local_group_wiktionary_T_parsoid_dataW4ULtxs1oMq/data-8198e0d0fe3111e492d527cac5e63599/local_group_wiktionary_T_parsoid_dataW4ULtxs1oMq-data-tmp-ka-10-Data.db 8844856/12859749 bytes(68%) received from idx:0/10.64.48.100
    /10.64.0.220
    /10.64.0.223
    /10.64.32.159
    /10.64.32.160
    /10.64.48.99
        Receiving 2899 files, 296637105677 bytes total. Already received 6 files, 435622852 bytes total
            /var/lib/cassandra/data/local_group_wiktionary_T_parsoid_dataW4ULtxs1oMq/data-8198e0d0fe3111e492d527cac5e63599/local_group_wiktionary_T_parsoid_dataW4ULtxs1oMq-data-tmp-ka-7-Data.db 83140935/137981086 bytes(60%) received from idx:0/10.64.48.99
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0              9
Responses                       n/a         0           9011
root@restbase1009:~#

..aand we are complete:

Datacenter: eqiad                                                                                 
=================                                                                                 
Status=Up/Down                                                                                    
|/ State=Normal/Leaving/Joining/Moving                                                            
--  Address                   Load       Tokens  Owns    Host ID                               Rack                                                                                                 
UN  restbase1004.eqiad.wmnet  803.24 GB  256     ?       798ff758-8c91-46e0-b85e-dad356c46f20  b  
UN  restbase1008.eqiad.wmnet  478.13 GB  256     ?       e9ab408a-309e-4e93-b145-9ac5f2365523  b  
UN  restbase1005.eqiad.wmnet  771.36 GB  256     ?       325e01e8-debe-45f0-a8c2-93b3baa58968  d  
UN  restbase1006.eqiad.wmnet  709.21 GB  256     ?       2abf437d-a16d-406b-a6de-8d28b7dda808  d  
UN  restbase1001.eqiad.wmnet  721.87 GB  256     ?       c021a198-b7f1-4dc2-94d7-9cb8b8a8df28  a  
UN  restbase1002.eqiad.wmnet  789.45 GB  256     ?       fc041cc8-cd28-4030-b29a-05b9a632cafc  a  
UN  restbase1009.eqiad.wmnet  523.07 GB  256     ?       3ee62592-ef41-445a-863f-b05be9ae5ac8  d  
UN  restbase1003.eqiad.wmnet  666.9 GB   256     ?       88d9ef9f-d81b-466e-babf-6a283b13f648  b  
UN  restbase1007.eqiad.wmnet  414.1 GB   256     ?       c1b5a012-4840-4096-9a71-ce4d3afb0029  a

/me high-fives @fgiunchedi and @Eevans

I'm really glad that we finally made it.

..aand we are complete:

Datacenter: eqiad                                                                                 
=================                                                                                 
Status=Up/Down                                                                                    
|/ State=Normal/Leaving/Joining/Moving                                                            
--  Address                   Load       Tokens  Owns    Host ID                               Rack                                                                                                 
UN  restbase1004.eqiad.wmnet  803.24 GB  256     ?       798ff758-8c91-46e0-b85e-dad356c46f20  b  
UN  restbase1008.eqiad.wmnet  478.13 GB  256     ?       e9ab408a-309e-4e93-b145-9ac5f2365523  b  
UN  restbase1005.eqiad.wmnet  771.36 GB  256     ?       325e01e8-debe-45f0-a8c2-93b3baa58968  d  
UN  restbase1006.eqiad.wmnet  709.21 GB  256     ?       2abf437d-a16d-406b-a6de-8d28b7dda808  d  
UN  restbase1001.eqiad.wmnet  721.87 GB  256     ?       c021a198-b7f1-4dc2-94d7-9cb8b8a8df28  a  
UN  restbase1002.eqiad.wmnet  789.45 GB  256     ?       fc041cc8-cd28-4030-b29a-05b9a632cafc  a  
UN  restbase1009.eqiad.wmnet  523.07 GB  256     ?       3ee62592-ef41-445a-863f-b05be9ae5ac8  d  
UN  restbase1003.eqiad.wmnet  666.9 GB   256     ?       88d9ef9f-d81b-466e-babf-6a283b13f648  b  
UN  restbase1007.eqiad.wmnet  414.1 GB   256     ?       c1b5a012-4840-4096-9a71-ce4d3afb0029  a

/me high-fives @fgiunchedi and @Eevans

I'm really glad that we finally made it.

indeed, I've pooled restbase100[789] in pybal for restbase too so they are fully in service now