
test Cassandra 2.1.7
Closed, ResolvedPublic

Description

Cassandra 2.1.6 has been released with a number of bug fixes.

Event Timeline

Eevans raised the priority of this task to Needs Triage.
Eevans updated the task description.
Eevans subscribed.
chasemp triaged this task as Medium priority. Jun 8 2015, 6:11 PM
chasemp set Security to None.
chasemp subscribed.

What is needed from SRE here? Packaging?

We are using the Apache debs, so no packaging is needed on our end. We can test by directly pulling in those debs from upstream (in beta labs, then staging), and then update the packages in our apt repo once we are ready to make the switch.

See also the related T102015 about Cassandra 2.1.5 being in apt. I've imported 2.1.6 on carbon.

root@carbon:/srv/wikimedia/conf# reprepro --restrict cassandra checkupdate
aptmethod 'http' seems to have a obsoleted redirect handling which causes
reprepro to request files multiple times. Work-around activated, but better
only use it for targets not redirecting (or upgrade to apt >= 0.9.4 if
that is the http method from apt)!
Calculating packages to get...
Updates needed for 'jessie-wikimedia|thirdparty|source':
'cassandra': '2.1.5' will be upgraded to '2.1.6' (from 'cassandra'):
 files needed: pool/thirdparty/c/cassandra/cassandra_2.1.6.dsc pool/thirdparty/c/cassandra/cassandra_2.1.6.orig.tar.gz pool/thirdparty/c/cassandra/cassandra_2.1.6.diff.gz
Updates needed for 'jessie-wikimedia|thirdparty|amd64':
'cassandra': '2.1.5' will be upgraded to '2.1.6' (from 'cassandra'):
 files needed: pool/thirdparty/c/cassandra/cassandra_2.1.6_all.deb
'cassandra-tools': '2.1.5' will be upgraded to '2.1.6' (from 'cassandra'):
 files needed: pool/thirdparty/c/cassandra/cassandra-tools_2.1.6_all.deb
Updates needed for 'trusty-wikimedia|thirdparty|source':
Updates needed for 'trusty-wikimedia|thirdparty|amd64':
  nothing new for 'precise-wikimedia|thirdparty|source' (use --noskipold to process anyway)
Updates needed for 'precise-wikimedia|thirdparty|amd64':
  nothing new for 'lucid-wikimedia|thirdparty|source' (use --noskipold to process anyway)
  nothing new for 'lucid-wikimedia|thirdparty|amd64' (use --noskipold to process anyway)
root@carbon:/srv/wikimedia/conf# reprepro --restrict cassandra update jessie-wikimedia
aptmethod 'http' seems to have a obsoleted redirect handling which causes
reprepro to request files multiple times. Work-around activated, but better
only use it for targets not redirecting (or upgrade to apt >= 0.9.4 if
that is the http method from apt)!
Calculating packages to get...
Getting packages...
Installing (and possibly deleting) packages...
Exporting indices...
Deleting files no longer referenced...
root@carbon:/srv/wikimedia/conf# reprepro list jessie-wikimedia cassandra
jessie-wikimedia|thirdparty|amd64: cassandra 2.1.6
jessie-wikimedia|thirdparty|source: cassandra 2.1.6
root@carbon:/srv/wikimedia/conf# 
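For anyone who wants to script the check before running `reprepro update`, here is a minimal sketch that extracts the pending upgrades from `checkupdate` output. It assumes the output looks exactly like the transcript above; the line patterns are inferred from that one transcript rather than from the reprepro documentation, and the `pending_upgrades` helper is a hypothetical name.

```python
import re

# Section header, e.g.:
#   Updates needed for 'jessie-wikimedia|thirdparty|amd64':
# (assumed format, taken from the transcript above)
TARGET_RE = re.compile(r"^Updates needed for '(?P<target>[^']+)':")

# Upgrade line, e.g.:
#   'cassandra': '2.1.5' will be upgraded to '2.1.6' (from 'cassandra'):
# (assumed format, taken from the transcript above)
UPGRADE_RE = re.compile(
    r"^'(?P<package>[^']+)': '(?P<old>[^']+)' "
    r"will be upgraded to '(?P<new>[^']+)'"
)


def pending_upgrades(output):
    """Return (target, package, old, new) tuples found in checkupdate output."""
    upgrades = []
    target = None
    for line in output.splitlines():
        header = TARGET_RE.match(line)
        if header:
            # Remember which distribution|component|architecture we are in.
            target = header.group("target")
            continue
        upgrade = UPGRADE_RE.match(line)
        if upgrade and target is not None:
            upgrades.append(
                (
                    target,
                    upgrade.group("package"),
                    upgrade.group("old"),
                    upgrade.group("new"),
                )
            )
    return upgrades


# Excerpt of the checkupdate output pasted above.
SAMPLE = """\
Calculating packages to get...
Updates needed for 'jessie-wikimedia|thirdparty|source':
'cassandra': '2.1.5' will be upgraded to '2.1.6' (from 'cassandra'):
 files needed: pool/thirdparty/c/cassandra/cassandra_2.1.6.dsc
Updates needed for 'jessie-wikimedia|thirdparty|amd64':
'cassandra': '2.1.5' will be upgraded to '2.1.6' (from 'cassandra'):
 files needed: pool/thirdparty/c/cassandra/cassandra_2.1.6_all.deb
'cassandra-tools': '2.1.5' will be upgraded to '2.1.6' (from 'cassandra'):
 files needed: pool/thirdparty/c/cassandra/cassandra-tools_2.1.6_all.deb
Updates needed for 'trusty-wikimedia|thirdparty|source':
Updates needed for 'trusty-wikimedia|thirdparty|amd64':
  nothing new for 'precise-wikimedia|thirdparty|source' (use --noskipold to process anyway)
"""

for target, package, old, new in pending_upgrades(SAMPLE):
    print(f"{target}: {package} {old} -> {new}")
```

On the excerpt this reports upgrades only for the jessie-wikimedia targets, which matches why the update above was restricted to that distribution.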

I have updated the staging cluster from 2.1.4 (!) to 2.1.6. After upgrading the first node, I tested bootstrapping from the other nodes. I did not see the streaming failure we saw with the 2.1.3 / 2.1.5 pair in production, but the data set was fairly small.

I think it's worth upgrading a single production node to 2.1.6 once we are done with the bootstrapping of the first node. We could try bootstrapping the second of the new nodes with 2.1.6.

We have moved from 2.1.6 to the pre-release 2.1.7 deb: http://people.apache.org/~jake/cassandra_2.1.7_all.deb

So far things are looking good:

  • metrics are working
  • no issues were found during stress testing in staging, and it's looking good so far in prod

Sadly, the metrics all died over the weekend. Restarting the instances brings them back, but clearly their half-life is drastically reduced compared to 2.1.3.

Otherwise 2.1.7 is looking promising. A bootstrap attempt managed to finish the stream from the first node, but then failed on the second node. The retry was thwarted by what looks like SSD failures (again).

Since we're running 2.1.7 in production, I've imported it into apt.wikimedia.org.

fgiunchedi renamed this task from "begin testing Cassandra 2.1.6" to "test Cassandra 2.1.7". Jun 30 2015, 5:38 PM
GWicke claimed this task.

We have been running 2.1.7 in production for a while now. It has been mostly working fine, but it did not resolve the bootstrap issues we are seeing with large (>3T data) instances. Metrics reporting is more likely to stop under large storage load than in 2.1.3, but somewhat less likely than in 2.1.6.