Move deinitialization from c-foreach-restart to Cassandra's systemd unit
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	Eevans
	Apr 14 2023, 4:01 PM

Description

As a work-around for T327954, de-initialization of Cassandra needs to occur in the systemd unit, so that it can be applied uniformly to both host shutdown/reboots, and service restarts. Since the deinit sequence will conflict with what is being done in c-foreach-restart, those commands will need to be removed (prior to the unit changes).

Example systemd unit (untested):

[Unit]
Description=distributed storage system for structured data
After=network.target
# On bootstrap / provisioning, don't attempt to start all instances,
# wait instead for the guard file to exist, see also T214166
ConditionPathExists=/etc/cassandra-a/service-enabled

[Service]
User=cassandra
PIDFile=/var/run/cassandra/cassandra-a.pid
LimitNOFILE=100000
LimitMEMLOCK=infinity
Environment="CASSANDRA_INCLUDE=/etc/cassandra.in.sh"
Environment="CASSANDRA_CONF=/etc/cassandra-a"
Environment="CASSANDRA_INSTANCE=sessionstore1001-a"
ExecStart=/usr/sbin/cassandra -p /var/run/cassandra/cassandra-a.pid

# Deinit on shutdown (see: https://phabricator.wikimedia.org/T327954)
ExecStop=-/usr/local/bin/nodetool-a disablethrift
ExecStop=-/usr/local/bin/nodetool-a disablebinary
ExecStop=-/usr/local/bin/nodetool-a disablegossip
ExecStop=-/usr/local/bin/nodetool-a drain
ExecStop=/usr/local/bin/nodetool-a stopdaemon

Details

Subject	Repo	Branch	Lines +/-
sessionstore: disable sessionstore1001 native transport	operations/puppet	production	+1 -0
cassandra: add de-init to systemd unit file	operations/puppet	production	+14 -1
Breakfixes from the 2to3 conversion	operations/debs/cassandra-tools-wmf	master	+5 -5
Do not de-init node prior to restart	operations/debs/cassandra-tools-wmf	master	+2 -8
Missing aqs cluster secrets	labs/private	master	+493 -1

Customize query in gerrit

Related Objects
Search...

		Status	Subtype	Assigned	Task
		Resolved		Eevans	T327954 session storage: dissonant cluster status after reboot (was: 'cannot achieve consistency level' errors)
		Resolved		Eevans	T334754 Move deinitialization from c-foreach-restart to Cassandra's systemd unit

Event Timeline

Eevans triaged this task as High priority.Apr 14 2023, 4:01 PM

Eevans created this task.

Eevans updated the task description. (Show Details)Apr 18 2023, 12:38 AM

I tested the unit as shown in the description, on cassandra-dev2001, and it seems to work well.

INFO  [RMI TCP Connection(4)-127.0.0.1] 2023-04-18 00:40:24,609 Server.java:179 - Stop listening for CQL clients
WARN  [RMI TCP Connection(6)-127.0.0.1] 2023-04-18 00:40:25,515 StorageService.java:360 - Stopping gossip by operator request
INFO  [RMI TCP Connection(6)-127.0.0.1] 2023-04-18 00:40:25,516 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(6)-127.0.0.1] 2023-04-18 00:40:25,520 StorageService.java:2484 - Node /10.192.16.14 state jump to shutdown
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:28,453 StorageService.java:1568 - DRAINING: starting drain process
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:28,456 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:28,457 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:28,460 StorageService.java:2484 - Node /10.192.16.14 state jump to shutdown
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:30,464 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.192.16.14] 2023-04-18 00:40:30,465 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [ACCEPT-/10.192.16.14] 2023-04-18 00:40:30,465 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:30,758 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:30,769 StorageService.java:1568 - DRAINED
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-18 00:40:32,027 CassandraDaemon.java:576 - Cassandra shutting down...

Change 909403 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/debs/cassandra-tools-wmf@master] Do not de-init node prior to restart

https://gerrit.wikimedia.org/r/909403

gerritbot added a project: Patch-For-Review.Apr 18 2023, 1:03 AM

Silly question: is this the sort of change we should be getting included upstream?

In T334754#8788532, @MatthewVernon wrote:

Silly question: is this the sort of change we should be getting included upstream?

None of this should be technically necessary; we're doing this to work around some sort of race condition on shutdown (obviously we do need to get that race condition fixed upstream though).

Change 909737 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] cassandra: add de-init to systemd unit file

https://gerrit.wikimedia.org/r/909737

Change 909997 had a related patch set uploaded (by Eevans; author: Eevans):

[labs/private@master] Missing aqs cluster secrets

https://gerrit.wikimedia.org/r/909997

Change 909997 merged by Eevans:

[labs/private@master] Missing aqs cluster secrets

https://gerrit.wikimedia.org/r/909997

Eevans mentioned this in rLPRIda92a7f237e1: Missing aqs cluster secrets.Apr 19 2023, 2:12 PM

Change 909403 merged by Eevans:

[operations/debs/cassandra-tools-wmf@master] Do not de-init node prior to restart

https://gerrit.wikimedia.org/r/909403

Eevans mentioned this in rODCTW13c83576744a: Do not de-init node prior to restart.Apr 20 2023, 3:30 PM

Eevans mentioned this in rODCTWc9acf6242f8c: Release Debian package 1.2.0-1.

Change 910524 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/debs/cassandra-tools-wmf@master] Breakfixes from the 2to3 conversion

https://gerrit.wikimedia.org/r/910524

Change 910524 merged by Eevans:

[operations/debs/cassandra-tools-wmf@master] Breakfixes from the 2to3 conversion

https://gerrit.wikimedia.org/r/910524

Eevans mentioned this in rODCTW7339894a5761: Breakfixes from the 2to3 conversion.Apr 20 2023, 4:04 PM

Change 909737 merged by Eevans:

[operations/puppet@production] cassandra: add de-init to systemd unit file

https://gerrit.wikimedia.org/r/909737

Maintenance_bot removed a project: Patch-For-Review.Apr 20 2023, 4:29 PM

Mentioned in SAL (#wikimedia-operations) [2023-04-20T16:31:36Z] <eevans@cumin1001> START - Cookbook sre.cassandra.roll-restart for nodes matching aqs10[10,13,16,19].eqiad.wmnet: Testing rolling restart (rack1) — T334754 - eevans@cumin1001

Eevans mentioned this in rODCTW9bef6f20819d: Release Debian package 1.2.1-1.Apr 20 2023, 4:39 PM

Mentioned in SAL (#wikimedia-operations) [2023-04-20T17:02:42Z] <eevans@cumin1001> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs10[10,13,16,19].eqiad.wmnet: Testing rolling restart (rack1) — T334754 - eevans@cumin1001

Change 910584 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] sessionstore: disable sessionstore1001 native transport

https://gerrit.wikimedia.org/r/910584

gerritbot added a project: Patch-For-Review.Apr 20 2023, 8:15 PM

Change 910584 merged by Eevans:

[operations/puppet@production] sessionstore: disable sessionstore1001 native transport

https://gerrit.wikimedia.org/r/910584

Maintenance_bot removed a project: Patch-For-Review.Apr 20 2023, 8:30 PM

For reasons I do not understand, this deinitialization sequence is not working (reliably), despite having done so well in ad-hoc testing. The log output makes it pretty clear that all of the commands are executing as intended, and that it does result in two separate gossip shutdown messages, and yet...

Cassandra system logs (reboot #1)

...
INFO  [RMI TCP Connection(6)-127.0.0.1] 2023-04-20 20:22:30,930 Server.java:179 - Stop listening for CQL clients
WARN  [RMI TCP Connection(8)-127.0.0.1] 2023-04-20 20:22:32,090 StorageService.java:360 - Stopping gossip by operator request
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-20 20:22:32,091 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-20 20:22:32,094 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:35,192 StorageService.java:1568 - DRAINING: starting drain process
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:35,196 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:35,197 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:35,201 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:37,204 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:22:37,206 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:22:37,207 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:37,577 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:37,592 StorageService.java:1568 - DRAINED
INFO  [RMI TCP Connection(12)-127.0.0.1] 2023-04-20 20:22:38,999 CassandraDaemon.java:576 - Cassandra shutting down...
...

sessionstore1001 status after reboot (split-brain)

Datacenter: codfw
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
DN  sessionstore2003-a.codfw.wmnet  27.34 GiB  256          100.0%            8d062d5f-b31f-4f58-bde2-ec0e33a489fd  d
DN  sessionstore2002-a.codfw.wmnet  17.11 GiB  256          100.0%            31a83e9e-63a6-4f51-99fb-0040b74cb51d  c
DN  sessionstore2001-a.codfw.wmnet  28.18 GiB  256          100.0%            b461130b-258f-4a93-81e5-63bd31d12406  b
Datacenter: eqiad
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  sessionstore1001-a.eqiad.wmnet  35.07 GiB  256          100.0%            59918268-6b49-45bb-a29d-9ede6385bdce  a
DN  sessionstore1003-a.eqiad.wmnet  34.28 GiB  256          100.0%            bcdedfef-87e6-4f54-bc33-6e149da90115  d
DN  sessionstore1002-a.eqiad.wmnet  25.09 GiB  256          100.0%            5af82e95-c935-4492-9d8c-04542c0c6ede  c

Cassandra system logs (reboot #2)

INFO  [RMI TCP Connection(552)-127.0.0.1] 2023-04-20 20:47:48,517 Server.java:179 - Stop listening for CQL clients
WARN  [RMI TCP Connection(554)-127.0.0.1] 2023-04-20 20:47:49,494 StorageService.java:360 - Stopping gossip by operator request
INFO  [RMI TCP Connection(554)-127.0.0.1] 2023-04-20 20:47:49,495 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(554)-127.0.0.1] 2023-04-20 20:47:49,501 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:52,499 StorageService.java:1568 - DRAINING: starting drain process
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:52,502 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:52,504 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:52,507 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:54,510 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:47:54,512 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:47:54,585 MessagingService.java:1346 - MessagingService has terminated the accept() thread
ERROR [MessagingService-Incoming-/10.192.32.15] 2023-04-20 20:47:54,729 CassandraDaemon.java:244 - Exception in thread Thread[MessagingService-Incoming-/10.192.32.15,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:58)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:148)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1035)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:215)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:195)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:98)
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:56,356 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:56,371 StorageService.java:1568 - DRAINED
INFO  [RMI TCP Connection(558)-127.0.0.1] 2023-04-20 20:47:57,765 CassandraDaemon.java:576 - Cassandra shutting down...

sessionstore1001 status after reboot (good)

Datacenter: codfw
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  sessionstore2003-a.codfw.wmnet  27.34 GiB  256          100.0%            8d062d5f-b31f-4f58-bde2-ec0e33a489fd  d
UN  sessionstore2002-a.codfw.wmnet  17.3 GiB   256          100.0%            31a83e9e-63a6-4f51-99fb-0040b74cb51d  c
UN  sessionstore2001-a.codfw.wmnet  28.37 GiB  256          100.0%            b461130b-258f-4a93-81e5-63bd31d12406  b
Datacenter: eqiad
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  sessionstore1001-a.eqiad.wmnet  35.13 GiB  256          100.0%            59918268-6b49-45bb-a29d-9ede6385bdce  a
UN  sessionstore1003-a.eqiad.wmnet  34.28 GiB  256          100.0%            bcdedfef-87e6-4f54-bc33-6e149da90115  d
UN  sessionstore1002-a.eqiad.wmnet  25.09 GiB  256          100.0%            5af82e95-c935-4492-9d8c-04542c0c6ede  c

Cassandra system logs (reboot #3)

INFO  [RMI TCP Connection(196)-127.0.0.1] 2023-04-20 20:57:29,979 Server.java:179 - Stop listening for CQL clients
WARN  [RMI TCP Connection(198)-127.0.0.1] 2023-04-20 20:57:30,942 StorageService.java:360 - Stopping gossip by operator request
INFO  [RMI TCP Connection(198)-127.0.0.1] 2023-04-20 20:57:30,942 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(198)-127.0.0.1] 2023-04-20 20:57:30,944 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:33,909 StorageService.java:1568 - DRAINING: starting drain process
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:33,912 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:33,914 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:33,917 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:35,921 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:57:35,923 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:57:35,924 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:37,215 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:37,228 StorageService.java:1568 - DRAINED
INFO  [RMI TCP Connection(202)-127.0.0.1] 2023-04-20 20:57:38,536 CassandraDaemon.java:576 - Cassandra shutting down...

sessionstore1001 status after reboot (split-brain)

Datacenter: codfw
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
DN  sessionstore2003-a.codfw.wmnet  27.53 GiB  256          100.0%            8d062d5f-b31f-4f58-bde2-ec0e33a489fd  d
DN  sessionstore2002-a.codfw.wmnet  17.3 GiB   256          100.0%            31a83e9e-63a6-4f51-99fb-0040b74cb51d  c
DN  sessionstore2001-a.codfw.wmnet  28.37 GiB  256          100.0%            b461130b-258f-4a93-81e5-63bd31d12406  b
Datacenter: eqiad
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  sessionstore1001-a.eqiad.wmnet  35.14 GiB  256          100.0%            59918268-6b49-45bb-a29d-9ede6385bdce  a
DN  sessionstore1003-a.eqiad.wmnet  34.28 GiB  256          100.0%            bcdedfef-87e6-4f54-bc33-6e149da90115  d
DN  sessionstore1002-a.eqiad.wmnet  25.09 GiB  256          100.0%            5af82e95-c935-4492-9d8c-04542c0c6ede  c

From sessionstore1002, despite the shutdown, you can see this node immediately reconnect (twice).

Cassandra (sessionstore1002) system logs (reboot #3)

INFO  [GossipStage:1] 2023-04-20 20:57:30,938 Gossiper.java:1177 - InetAddress /10.64.0.144 is now DOWN
INFO  [HintsDispatcher:53] 2023-04-20 20:57:30,942 HintsDispatchExecutor.java:289 - Finished hinted handoff of file 59918268-6b49-45bb-a29d-9ede6385bdce-1682023396569-1.hints to endpoint /10.64.0.144: 59918268-6b49-45bb-a29d-9ede6385bdce, partially
INFO  [GossipStage:1] 2023-04-20 20:57:30,945 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [HANDSHAKE-sessionstore1001-a.eqiad.wmnet/10.64.0.144] 2023-04-20 20:57:30,969 OutboundTcpConnection.java:561 - Handshaking version with sessionstore1001-a.eqiad.wmnet/10.64.0.144
INFO  [HANDSHAKE-sessionstore1001-a.eqiad.wmnet/10.64.0.144] 2023-04-20 20:57:31,279 OutboundTcpConnection.java:561 - Handshaking version with sessionstore1001-a.eqiad.wmnet/10.64.0.144
INFO  [GossipStage:1] 2023-04-20 20:57:33,913 Gossiper.java:1177 - InetAddress /10.64.0.144 is now DOWN
INFO  [GossipStage:1] 2023-04-20 20:57:33,917 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [HANDSHAKE-sessionstore1001-a.eqiad.wmnet/10.64.0.144] 2023-04-20 20:57:34,280 OutboundTcpConnection.java:561 - Handshaking version with sessionstore1001-a.eqiad.wmnet/10.64.0.144
INFO  [HANDSHAKE-sessionstore1001-a.eqiad.wmnet/10.64.0.144] 2023-04-20 20:57:35,276 OutboundTcpConnection.java:561 - Handshaking version with sessionstore1001-a.eqiad.wmnet/10.64.0.144

😭

The underlying condition (the race condition on shutdown) that we were attempting to work around no longer exists in Cassandra 4.1.1 (which we're now running everywhere).

That said, the deinit commands were successfully relocated from c-foreach-restart to the systemd unit (the raison d'etre for this ticket), so either way we can close this issue now.

Move deinitialization from c-foreach-restart to Cassandra's systemd unitClosed, ResolvedPublicActions

Description

Details

Related ObjectsSearch...

Event Timeline

Move deinitialization from c-foreach-restart to Cassandra's systemd unit
Closed, ResolvedPublic
Actions

Related Objects
Search...