
upgrade RESTBase cluster to Cassandra 2.1.8
Closed, Resolved; Public

Description

Cassandra 2.1.8 was released earlier this month with a number of bug fixes; no major regressions have been reported since.

One important new feature added is the logging of partitions that exceed a configurable threshold during routine compaction. This should come in handy in determining the source of our largest partitions (see: T94121).

Event Timeline

Eevans raised the priority of this task to Medium.
Eevans updated the task description.
Eevans added a project: RESTBase-Cassandra.
Eevans added a subscriber: Eevans.

proposed plan:

  • upgrade cassandra to 2.1.8 via deb upgrades on the staging cluster
  • benchmark/stresstest
  • upload package to apt.w.o and upgrade production cluster

+1

the test cluster has been upgraded to cassandra 2.1.8 (+latest openjdk in T104888)

upgrade plan, starting today:

  • upgrade row A machines (restbase100[127]) with nodetool drain && sudo apt-get -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install cassandra
  • check regressions, http://grafana.wikimedia.org/#/dashboard/db/cassandra-restbase-eqiad
  • if no regressions proceed with row B (restbase[348]) on wed
  • if no regressions proceed with row C (restbase[569]) on thurs

edit: changed nodetool flush to nodetool drain
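
For reference, the per-node step in the plan above could be wrapped roughly like this. It is only a sketch: the run and upgrade_node helpers and the DRY_RUN variable are made up for illustration and are not part of any existing tooling, and it assumes the script is run on the Cassandra node itself. The nodetool and apt-get invocations are the ones from the plan.

```shell
#!/bin/sh
# Hypothetical wrapper around the per-node upgrade step from the plan.
# DRY_RUN=1 (the default in this sketch) only prints the commands;
# set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_node() {
    # Drain first (stop accepting writes and flush all column families),
    # then upgrade the package while keeping the existing config files.
    run nodetool drain &&
    run sudo apt-get -o Dpkg::Options::="--force-confdef" \
        -o Dpkg::Options::="--force-confold" install cassandra
}

upgrade_node
```

Because the two steps are chained with &&, the package upgrade is skipped if the drain fails, same as in the one-liner from the plan.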

root@carbon:~# reprepro --noskipold --restrict cassandra update
aptmethod 'http' seems to have a obsoleted redirect handling which causes
reprepro to request files multiple times. Work-around activated, but better
only use it for targets not redirecting (or upgrade to apt >= 0.9.4 if
that is the http method from apt)!
Calculating packages to get...
Getting packages...
Installing (and possibly deleting) packages...
Exporting indices...
Deleting files no longer referenced...
root@carbon:~# reprepro list jessie-wikimedia cassandra
jessie-wikimedia|thirdparty|amd64: cassandra 2.1.8
jessie-wikimedia|thirdparty|source: cassandra 2.1.8
root@carbon:~#

Cosmetic issue spotted while looking at the logs: the output contains a literal %s. Benign.

restbase1002:~$ grep %s /var/log/cassandra/system.log
INFO  [MemtableFlushWriter:1] 2015-08-11 10:31:36,380 Memtable.java:393 - Completed flushing %s; nothing needed to be retained.  Commitlog position was /var/lib/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/system-batchlog-tmp-ka-1-Data.db
INFO  [MemtableFlushWriter:1] 2015-08-11 10:32:07,021 Memtable.java:393 - Completed flushing %s; nothing needed to be retained.  Commitlog position was /var/lib/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/system-batchlog-tmp-ka-2-Data.db
INFO  [MemtableFlushWriter:2] 2015-08-11 10:33:07,032 Memtable.java:393 - Completed flushing %s; nothing needed to be retained.  Commitlog position was /var/lib/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/system-batchlog-tmp-ka-3-Data.db
INFO  [MemtableFlushWriter:1] 2015-08-11 10:34:07,041 Memtable.java:393 - Completed flushing %s; nothing needed to be retained.  Commitlog position was /var/lib/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/system-batchlog-tmp-ka-4-Data.db

Re the upgrade plan above: very sensible; +1

Re the %s in the flush log lines above: I'll follow up with upstream on this.

One minor nit on the upgrade plan: you might consider using drain instead of flush:

$ nodetool help drain
NAME
        nodetool drain - Drain the node (stop accepting writes and flush all
        column families)

SYNOPSIS
        ...

OPTIONS
   ...

The %s logging issue has already been reported, and will be fixed in 2.1.9.

Using drain instead of flush looks good, thanks! Tomorrow I'll complete the upgrade.