Related Objects
- Mentioned In
- T102015: put new restbase servers in service
- Mentioned Here
- T102015: put new restbase servers in service
Event Timeline
We are using the Apache debs, so no packaging is needed on our end. We can test by directly pulling in those debs from upstream (in beta labs, then staging), and then update the packages in our apt repo once we are ready to make the switch.
See also the related T102015 about Cassandra 2.1.5 being in apt. I've imported 2.1.6 on carbon.
root@carbon:/srv/wikimedia/conf# reprepro --restrict cassandra checkupdate
aptmethod 'http' seems to have a obsoleted redirect handling which causes reprepro to request files multiple times.
Work-around activated, but better only use it for targets not redirecting (or upgrade to apt >= 0.9.4 if that is the http method from apt)!
Calculating packages to get...
Updates needed for 'jessie-wikimedia|thirdparty|source':
'cassandra': '2.1.5' will be upgraded to '2.1.6' (from 'cassandra'):
 files needed: pool/thirdparty/c/cassandra/cassandra_2.1.6.dsc pool/thirdparty/c/cassandra/cassandra_2.1.6.orig.tar.gz pool/thirdparty/c/cassandra/cassandra_2.1.6.diff.gz
Updates needed for 'jessie-wikimedia|thirdparty|amd64':
'cassandra': '2.1.5' will be upgraded to '2.1.6' (from 'cassandra'):
 files needed: pool/thirdparty/c/cassandra/cassandra_2.1.6_all.deb
'cassandra-tools': '2.1.5' will be upgraded to '2.1.6' (from 'cassandra'):
 files needed: pool/thirdparty/c/cassandra/cassandra-tools_2.1.6_all.deb
Updates needed for 'trusty-wikimedia|thirdparty|source':
Updates needed for 'trusty-wikimedia|thirdparty|amd64':
nothing new for 'precise-wikimedia|thirdparty|source' (use --noskipold to process anyway)
Updates needed for 'precise-wikimedia|thirdparty|amd64':
nothing new for 'lucid-wikimedia|thirdparty|source' (use --noskipold to process anyway)
nothing new for 'lucid-wikimedia|thirdparty|amd64' (use --noskipold to process anyway)
root@carbon:/srv/wikimedia/conf# reprepro --restrict cassandra update jessie-wikimedia
aptmethod 'http' seems to have a obsoleted redirect handling which causes reprepro to request files multiple times.
Work-around activated, but better only use it for targets not redirecting (or upgrade to apt >= 0.9.4 if that is the http method from apt)!
Calculating packages to get...
Getting packages...
Installing (and possibly deleting) packages...
Exporting indices...
Deleting files no longer referenced...
root@carbon:/srv/wikimedia/conf# reprepro list jessie-wikimedia cassandra
jessie-wikimedia|thirdparty|amd64: cassandra 2.1.6
jessie-wikimedia|thirdparty|source: cassandra 2.1.6
root@carbon:/srv/wikimedia/conf#
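The checkupdate output above is fairly verbose, so a small filter can make the pending version bumps easier to review before running the actual update. The following is only a sketch: `summarize_upgrades` is a hypothetical helper name, and it assumes the "will be upgraded to" lines keep the format shown in the transcript above, one per line.

```shell
# Hypothetical helper (not part of reprepro): condense `reprepro checkupdate`
# output, read from stdin, to one "package old -> new" line per pending upgrade.
# Assumes lines of the form:
#   'cassandra': '2.1.5' will be upgraded to '2.1.6' (from 'cassandra'):
summarize_upgrades() {
  # Capture package name, old version and new version; drop everything else.
  sed -n "s/.*'\([^']*\)': '\([^']*\)' will be upgraded to '\([^']*\)'.*/\1 \2 -> \3/p" |
    LC_ALL=C sort -u   # the same upgrade is listed once per distribution/architecture
}
```

It could be used as `reprepro --restrict cassandra checkupdate 2>/dev/null | summarize_upgrades`, which for the run above would be expected to list one line each for cassandra and cassandra-tools.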
I have updated the staging cluster from 2.1.4 (!) to 2.1.6. After upgrading the first node, I tested bootstrapping from the other nodes. I did not see the streaming failure we saw with the 2.1.3 / 2.1.5 pair in production, but the data set was fairly small.
I think it's worth upgrading a single production node to 2.1.6 once we are done with the bootstrapping of the first node. We could try bootstrapping the second of the new nodes with 2.1.6.
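While nodes are upgraded one at a time, the cluster runs mixed versions, so it can help to confirm what each node actually reports before bootstrapping against it. The sketch below is an assumption-laden illustration rather than anything we run: `release_matches` is a hypothetical helper, and it assumes input in the form printed by `nodetool version`, for example "ReleaseVersion: 2.1.6".

```shell
# Hypothetical helper: succeed only if the release version read from stdin
# equals the expected version given as $1.
# Assumes input like the output of `nodetool version`: "ReleaseVersion: 2.1.6".
release_matches() {
  expected="$1"
  # Strip the "ReleaseVersion:" label and keep the first matching line only.
  actual=$(sed -n 's/^ReleaseVersion:[[:space:]]*//p' | head -n 1)
  [ "$actual" = "$expected" ]
}
```

On a node this could be run as `nodetool version | release_matches 2.1.6 && echo "node is on 2.1.6"`.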
We have moved from 2.1.6 to the pre-release 2.1.7 deb: http://people.apache.org/~jake/cassandra_2.1.7_all.deb
So far things are looking good:
- metrics are working
- no issues were found during stress testing in staging, and it's looking good so far in prod
Sadly, the metrics all died over the weekend. Restarting the instances brings them back, but clearly their half-life is drastically reduced compared to 2.1.3.
Otherwise 2.1.7 is looking promising. A bootstrap attempt managed to finish the stream from the first node, but then failed on the second node. The retry was thwarted by what looks like SSD failures (again).
We have been running 2.1.7 in production for a while now. It has mostly been working fine, but it did not resolve the bootstrap issues we are seeing with large (>3T data) instances. With large storage load, metrics reporting is more likely to stop than in 2.1.3, but somewhat less likely than in 2.1.6.