Cassandra upgrades in staging attempted to start root instance
Closed, Resolved · Public
Assigned To

None

Authored By

	Eevans
	Feb 18 2016, 8:56 PM

Description

During the staging upgrade to 2.1.13, the packaging post-install invoked the sysv init script to start the root instance (the one based out of /var/lib/cassandra and /etc/cassandra). On several of the nodes, it actually succeeded. This could be Very Bad if it were to happen in production, particularly if it went unnoticed and the aberrant instance were to bootstrap.
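One way to notice a stray root instance after an upgrade is to check the pidfile that the packaged init script uses. This is a minimal sketch, not something taken from the task: the function name `root_instance_running` is hypothetical, and the default path is the `PIDFILE` value that appears in the init script trace in this description.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a process recorded in the root instance's
# pidfile is still alive. Run as root, since kill -0 fails for processes
# owned by another user.
root_instance_running() {
    pidfile="${1:-/var/run/cassandra/cassandra.pid}"
    # No readable pidfile means the init script never started anything.
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

A post-upgrade check could then call `root_instance_running && echo "root instance is up" >&2` on each node.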

Of the nodes that failed (meaning, where the aberrant instance did not start up)...

restbase-test2001.codfw.wmnet did not start an instance, because cassandra.yaml was missing and the init script exits early in that case:

eevans@restbase-test2001:~$ bash -x /etc/init.d/cassandra status
+ DESC=Cassandra
+ NAME=cassandra
+ PIDFILE=/var/run/cassandra/cassandra.pid
+ SCRIPTNAME=/etc/init.d/cassandra
+ CONFDIR=/etc/cassandra
+ WAIT_FOR_START=10
+ CASSANDRA_HOME=/usr/share/cassandra
+ FD_LIMIT=100000
+ '[' -e /usr/share/cassandra/apache-cassandra.jar ']'
+ '[' -e /etc/cassandra/cassandra.yaml ']'
+ exit 0
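The trace suggests the init script tests for two files and exits with status 0 when one is absent, which is the only reason this node was spared. A rough sketch of that guard follows; it is reconstructed from the trace rather than copied from the packaged script, and the function name `cassandra_prereqs_present` is made up for illustration.

```shell
#!/bin/sh
# Sketch of the guard implied by the trace (not the packaged script itself).
# Succeeds only when both the Cassandra jar and cassandra.yaml exist.
cassandra_prereqs_present() {
    home_dir="${1:-/usr/share/cassandra}"
    conf_dir="${2:-/etc/cassandra}"
    [ -e "$home_dir/apache-cassandra.jar" ] && [ -e "$conf_dir/cassandra.yaml" ]
}

# In an init script, the traced behaviour would correspond to:
#   cassandra_prereqs_present || exit 0
```

Because the guard exits 0 rather than reporting an error, nothing in the package post-install output would indicate whether the start was skipped or attempted.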

Three others failed only because the data under /var/lib/cassandra predates the cluster rename from "Test Cluster" to "services-test":

ERROR [main] 2016-02-18 18:04:23,351 CassandraDaemon.java:294 - Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Saved cluster name Test Cluster != configured name services-test
        at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:613) ~[apache-cassandra-2.1.13.jar:2.1.13]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:290) [apache-cassandra-2.1.13.jar:2.1.13]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564) [apache-cassandra-2.1.13.jar:2.1.13]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:653) [apache-cassandra-2.1.13.jar:2.1.13]
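The exception comes from Cassandra comparing the cluster name saved in its system keyspace with the `cluster_name` set in cassandra.yaml. The comparison can be sketched in shell as below. This is an illustration only: both function names are hypothetical, the saved name is passed in as an argument because reading it from the on-disk data is out of scope here, and the YAML parsing assumes a simple single-line `cluster_name:` entry.

```shell
#!/bin/sh
# Print the cluster_name value from a cassandra.yaml, with quotes stripped.
# Assumes a plain "cluster_name: 'name'" line at the start of a line.
configured_cluster_name() {
    sed -n 's/^cluster_name:[[:space:]]*//p' "$1" | tr -d "'\""
}

# Compare a saved cluster name ($1) with the one configured in the YAML ($2).
# Mirrors the failure mode in the log: a mismatch is reported and is fatal.
cluster_name_matches() {
    saved="$1"
    configured=$(configured_cluster_name "$2")
    if [ "$saved" != "$configured" ]; then
        echo "Saved cluster name $saved != configured name $configured" >&2
        return 1
    fi
}
```

For example, `cluster_name_matches "Test Cluster" /etc/cassandra/cassandra.yaml` would fail on a node configured as services-test, which is what kept these three instances from starting.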

What is not clear to me is why this hasn't been an issue before.

And obviously, going forward we need a concrete (non-accidental) way of disabling these root instances.

Details

	Subject	Repo	Branch	Lines +/-
	disable package-installed initscript	operations/puppet	production	+15 -0

Related Objects

Status	Assigned	Task
Invalid	None	T93751 RFC: Next steps for long-term revision storage -- space needs, storage hierarchies
Declined	Eevans	T93496 Improve revision compression in Cassandra / Brotli or LZMA support
Declined	Eevans	T125904 Brotli compression for Cassandra
Declined	None	T120171 RFC: Differentiate storage strategies for archival storage vs. hot current data
Declined	None	T122028 RFC: Chunked storage algorithms for archival data vs. large-window brotli compression
Declined	Eevans	T125906 Evaluate Brotli compression for Cassandra
Invalid	None	T126582 Log input from cassandra caused logstash process to crash repeatedly
Resolved	GWicke	T111746 [future] Keep an eye on materialized views in Cassandra 3.0
Resolved	Eevans	T126629 Cassandra 2.2.6
Resolved	None	T127365 Cassandra upgrades in staging attempted to start root instance

Event Timeline

Eevans created this task. Feb 18 2016, 8:56 PM

Restricted Application added subscribers: StudiesWorld, Aklapper. Feb 18 2016, 8:56 PM

Eevans mentioned this in T126629: Cassandra 2.2.6. Feb 18 2016, 8:58 PM

Eevans added a parent task: T126629: Cassandra 2.2.6.

Change 272612 had a related patch set uploaded (by Eevans):
disable package-installed initscript

https://gerrit.wikimedia.org/r/272612

gerritbot added a project: Patch-For-Review. Feb 22 2016, 10:10 PM

Feels like a bit of a kludge, but I submitted https://gerrit.wikimedia.org/r/#/c/272612/, which overwrites /etc/init.d/cassandra with a no-op script. TTBMK, we're now fully using systemd units (put in place as part of the multi-instance changeset), and I can't think of any scenario where it would be acceptable to run the package-installed initscript.
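To illustrate the idea of replacing the init script with a no-op, here is a minimal sketch. It is not the content of the Gerrit change, which is managed through Puppet; the helper name `write_noop_initscript` and the message text are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: write a do-nothing init script to the given path.
# Whatever action is requested (start, stop, status, ...), the generated
# script prints a notice and exits 0, so package maintainer scripts that
# invoke it still succeed but never start a root instance.
write_noop_initscript() {
    target="$1"
    cat > "$target" <<'EOF'
#!/bin/sh
# No-op replacement for the package-installed init script.
# Cassandra instances are managed by systemd units instead.
echo "cassandra: package init script is disabled; use the systemd units" >&2
exit 0
EOF
    chmod 0755 "$target"
}
```

Calling `write_noop_initscript /etc/init.d/cassandra` as root would apply it by hand. Keeping the file under configuration management matters here, since a package upgrade could otherwise restore the original script.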

Change 272612 merged by Filippo Giunchedi:
disable package-installed initscript

https://gerrit.wikimedia.org/r/272612

This has been deployed to all nodes; Resolving.
