Page MenuHomePhabricator

Move deinitialization from c-foreach-restart to Cassandra's systemd unit
Closed, ResolvedPublic

Description

As a work-around for T327954, de-initialization of Cassandra needs to occur in the systemd unit, so that it can be applied uniformly to both host shutdown/reboots, and service restarts. Since the deinit sequence will conflict with what is being done in c-foreach-restart, those commands will need to be removed (prior to the unit changes).

Example systemd unit (untested):

[Unit]
Description=distributed storage system for structured data
After=network.target
# On bootstrap / provisioning, don't attempt to start all instances,
# wait instead for the guard file to exist, see also T214166
ConditionPathExists=/etc/cassandra-a/service-enabled

[Service]
User=cassandra
PIDFile=/var/run/cassandra/cassandra-a.pid
LimitNOFILE=100000
LimitMEMLOCK=infinity
Environment="CASSANDRA_INCLUDE=/etc/cassandra.in.sh"
Environment="CASSANDRA_CONF=/etc/cassandra-a"
Environment="CASSANDRA_INSTANCE=sessionstore1001-a"
ExecStart=/usr/sbin/cassandra -p /var/run/cassandra/cassandra-a.pid

# Deinit on shutdown (see: https://phabricator.wikimedia.org/T327954)
ExecStop=-/usr/local/bin/nodetool-a disablethrift
ExecStop=-/usr/local/bin/nodetool-a disablebinary
ExecStop=-/usr/local/bin/nodetool-a disablegossip
ExecStop=-/usr/local/bin/nodetool-a drain
ExecStop=/usr/local/bin/nodetool-a stopdaemon

Event Timeline

Eevans triaged this task as High priority.Apr 14 2023, 4:01 PM
Eevans created this task.

I tested the unit as shown in the description, on cassandra-dev2001, and it seems to work well.

INFO  [RMI TCP Connection(4)-127.0.0.1] 2023-04-18 00:40:24,609 Server.java:179 - Stop listening for CQL clients
WARN  [RMI TCP Connection(6)-127.0.0.1] 2023-04-18 00:40:25,515 StorageService.java:360 - Stopping gossip by operator request
INFO  [RMI TCP Connection(6)-127.0.0.1] 2023-04-18 00:40:25,516 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(6)-127.0.0.1] 2023-04-18 00:40:25,520 StorageService.java:2484 - Node /10.192.16.14 state jump to shutdown
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:28,453 StorageService.java:1568 - DRAINING: starting drain process
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:28,456 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:28,457 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:28,460 StorageService.java:2484 - Node /10.192.16.14 state jump to shutdown
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:30,464 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.192.16.14] 2023-04-18 00:40:30,465 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [ACCEPT-/10.192.16.14] 2023-04-18 00:40:30,465 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:30,758 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-18 00:40:30,769 StorageService.java:1568 - DRAINED
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-18 00:40:32,027 CassandraDaemon.java:576 - Cassandra shutting down...

Change 909403 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/debs/cassandra-tools-wmf@master] Do not de-init node prior to restart

https://gerrit.wikimedia.org/r/909403

Silly question: is this the sort of change we should be getting included upstream?

Silly question: is this the sort of change we should be getting included upstream?

None of this should be technically necessary; we're doing this to work around some sort of race condition on shutdown (obviously we do need to get that race condition fixed upstream though).

Change 909737 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] cassandra: add de-init to systemd unit file

https://gerrit.wikimedia.org/r/909737

Change 909997 had a related patch set uploaded (by Eevans; author: Eevans):

[labs/private@master] Missing aqs cluster secrets

https://gerrit.wikimedia.org/r/909997

Change 909997 merged by Eevans:

[labs/private@master] Missing aqs cluster secrets

https://gerrit.wikimedia.org/r/909997

Change 909403 merged by Eevans:

[operations/debs/cassandra-tools-wmf@master] Do not de-init node prior to restart

https://gerrit.wikimedia.org/r/909403

Change 910524 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/debs/cassandra-tools-wmf@master] Breakfixes from the 2to3 conversion

https://gerrit.wikimedia.org/r/910524

Change 910524 merged by Eevans:

[operations/debs/cassandra-tools-wmf@master] Breakfixes from the 2to3 conversion

https://gerrit.wikimedia.org/r/910524

Change 909737 merged by Eevans:

[operations/puppet@production] cassandra: add de-init to systemd unit file

https://gerrit.wikimedia.org/r/909737

Mentioned in SAL (#wikimedia-operations) [2023-04-20T16:31:36Z] <eevans@cumin1001> START - Cookbook sre.cassandra.roll-restart for nodes matching aqs10[10,13,16,19].eqiad.wmnet: Testing rolling restart (rack1) — T334754 - eevans@cumin1001

Mentioned in SAL (#wikimedia-operations) [2023-04-20T17:02:42Z] <eevans@cumin1001> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs10[10,13,16,19].eqiad.wmnet: Testing rolling restart (rack1) — T334754 - eevans@cumin1001

Change 910584 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] sessionstore: disable sessionstore1001 native transport

https://gerrit.wikimedia.org/r/910584

Change 910584 merged by Eevans:

[operations/puppet@production] sessionstore: disable sessionstore1001 native transport

https://gerrit.wikimedia.org/r/910584

For reasons I do not understand, this deinitialization sequence is not working (reliably), despite having done so well in ad-hoc testing. The log output makes it pretty clear that all of the commands are executing as intended, and that it does result in two separate gossip shutdown messages, and yet...

Cassandra system logs (reboot #1)
...
INFO  [RMI TCP Connection(6)-127.0.0.1] 2023-04-20 20:22:30,930 Server.java:179 - Stop listening for CQL clients
WARN  [RMI TCP Connection(8)-127.0.0.1] 2023-04-20 20:22:32,090 StorageService.java:360 - Stopping gossip by operator request
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-20 20:22:32,091 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(8)-127.0.0.1] 2023-04-20 20:22:32,094 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:35,192 StorageService.java:1568 - DRAINING: starting drain process
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:35,196 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:35,197 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:35,201 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:37,204 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:22:37,206 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:22:37,207 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:37,577 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(10)-127.0.0.1] 2023-04-20 20:22:37,592 StorageService.java:1568 - DRAINED
INFO  [RMI TCP Connection(12)-127.0.0.1] 2023-04-20 20:22:38,999 CassandraDaemon.java:576 - Cassandra shutting down...
...
sessionstore1001 status after reboot (split-brain)
Datacenter: codfw
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
DN  sessionstore2003-a.codfw.wmnet  27.34 GiB  256          100.0%            8d062d5f-b31f-4f58-bde2-ec0e33a489fd  d
DN  sessionstore2002-a.codfw.wmnet  17.11 GiB  256          100.0%            31a83e9e-63a6-4f51-99fb-0040b74cb51d  c
DN  sessionstore2001-a.codfw.wmnet  28.18 GiB  256          100.0%            b461130b-258f-4a93-81e5-63bd31d12406  b
Datacenter: eqiad
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  sessionstore1001-a.eqiad.wmnet  35.07 GiB  256          100.0%            59918268-6b49-45bb-a29d-9ede6385bdce  a
DN  sessionstore1003-a.eqiad.wmnet  34.28 GiB  256          100.0%            bcdedfef-87e6-4f54-bc33-6e149da90115  d
DN  sessionstore1002-a.eqiad.wmnet  25.09 GiB  256          100.0%            5af82e95-c935-4492-9d8c-04542c0c6ede  c

Cassandra system logs (reboot #2)
INFO  [RMI TCP Connection(552)-127.0.0.1] 2023-04-20 20:47:48,517 Server.java:179 - Stop listening for CQL clients
WARN  [RMI TCP Connection(554)-127.0.0.1] 2023-04-20 20:47:49,494 StorageService.java:360 - Stopping gossip by operator request
INFO  [RMI TCP Connection(554)-127.0.0.1] 2023-04-20 20:47:49,495 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(554)-127.0.0.1] 2023-04-20 20:47:49,501 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:52,499 StorageService.java:1568 - DRAINING: starting drain process
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:52,502 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:52,504 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:52,507 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:54,510 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:47:54,512 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:47:54,585 MessagingService.java:1346 - MessagingService has terminated the accept() thread
ERROR [MessagingService-Incoming-/10.192.32.15] 2023-04-20 20:47:54,729 CassandraDaemon.java:244 - Exception in thread Thread[MessagingService-Incoming-/10.192.32.15,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:58)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:148)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1035)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:215)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:195)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:98)
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:56,356 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(556)-127.0.0.1] 2023-04-20 20:47:56,371 StorageService.java:1568 - DRAINED
INFO  [RMI TCP Connection(558)-127.0.0.1] 2023-04-20 20:47:57,765 CassandraDaemon.java:576 - Cassandra shutting down...
sessionstore1001 status after reboot (good)
Datacenter: codfw
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  sessionstore2003-a.codfw.wmnet  27.34 GiB  256          100.0%            8d062d5f-b31f-4f58-bde2-ec0e33a489fd  d
UN  sessionstore2002-a.codfw.wmnet  17.3 GiB   256          100.0%            31a83e9e-63a6-4f51-99fb-0040b74cb51d  c
UN  sessionstore2001-a.codfw.wmnet  28.37 GiB  256          100.0%            b461130b-258f-4a93-81e5-63bd31d12406  b
Datacenter: eqiad
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  sessionstore1001-a.eqiad.wmnet  35.13 GiB  256          100.0%            59918268-6b49-45bb-a29d-9ede6385bdce  a
UN  sessionstore1003-a.eqiad.wmnet  34.28 GiB  256          100.0%            bcdedfef-87e6-4f54-bc33-6e149da90115  d
UN  sessionstore1002-a.eqiad.wmnet  25.09 GiB  256          100.0%            5af82e95-c935-4492-9d8c-04542c0c6ede  c

Cassandra system logs (reboot #3)
INFO  [RMI TCP Connection(196)-127.0.0.1] 2023-04-20 20:57:29,979 Server.java:179 - Stop listening for CQL clients
WARN  [RMI TCP Connection(198)-127.0.0.1] 2023-04-20 20:57:30,942 StorageService.java:360 - Stopping gossip by operator request
INFO  [RMI TCP Connection(198)-127.0.0.1] 2023-04-20 20:57:30,942 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(198)-127.0.0.1] 2023-04-20 20:57:30,944 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:33,909 StorageService.java:1568 - DRAINING: starting drain process
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:33,912 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:33,914 Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:33,917 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:35,921 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:57:35,923 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [ACCEPT-/10.64.0.144] 2023-04-20 20:57:35,924 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:37,215 HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(200)-127.0.0.1] 2023-04-20 20:57:37,228 StorageService.java:1568 - DRAINED
INFO  [RMI TCP Connection(202)-127.0.0.1] 2023-04-20 20:57:38,536 CassandraDaemon.java:576 - Cassandra shutting down...
sessionstore1001 status after reboot (split-brain)
Datacenter: codfw
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
DN  sessionstore2003-a.codfw.wmnet  27.53 GiB  256          100.0%            8d062d5f-b31f-4f58-bde2-ec0e33a489fd  d
DN  sessionstore2002-a.codfw.wmnet  17.3 GiB   256          100.0%            31a83e9e-63a6-4f51-99fb-0040b74cb51d  c
DN  sessionstore2001-a.codfw.wmnet  28.37 GiB  256          100.0%            b461130b-258f-4a93-81e5-63bd31d12406  b
Datacenter: eqiad
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  sessionstore1001-a.eqiad.wmnet  35.14 GiB  256          100.0%            59918268-6b49-45bb-a29d-9ede6385bdce  a
DN  sessionstore1003-a.eqiad.wmnet  34.28 GiB  256          100.0%            bcdedfef-87e6-4f54-bc33-6e149da90115  d
DN  sessionstore1002-a.eqiad.wmnet  25.09 GiB  256          100.0%            5af82e95-c935-4492-9d8c-04542c0c6ede  c

From sessionstore1002, despite the shutdown, you can see this node immediately reconnect (twice).

Cassandra (sessionstore1002) system logs (reboot #3)
INFO  [GossipStage:1] 2023-04-20 20:57:30,938 Gossiper.java:1177 - InetAddress /10.64.0.144 is now DOWN
INFO  [HintsDispatcher:53] 2023-04-20 20:57:30,942 HintsDispatchExecutor.java:289 - Finished hinted handoff of file 59918268-6b49-45bb-a29d-9ede6385bdce-1682023396569-1.hints to endpoint /10.64.0.144: 59918268-6b49-45bb-a29d-9ede6385bdce, partially
INFO  [GossipStage:1] 2023-04-20 20:57:30,945 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [HANDSHAKE-sessionstore1001-a.eqiad.wmnet/10.64.0.144] 2023-04-20 20:57:30,969 OutboundTcpConnection.java:561 - Handshaking version with sessionstore1001-a.eqiad.wmnet/10.64.0.144
INFO  [HANDSHAKE-sessionstore1001-a.eqiad.wmnet/10.64.0.144] 2023-04-20 20:57:31,279 OutboundTcpConnection.java:561 - Handshaking version with sessionstore1001-a.eqiad.wmnet/10.64.0.144
INFO  [GossipStage:1] 2023-04-20 20:57:33,913 Gossiper.java:1177 - InetAddress /10.64.0.144 is now DOWN
INFO  [GossipStage:1] 2023-04-20 20:57:33,917 StorageService.java:2484 - Node /10.64.0.144 state jump to shutdown
INFO  [HANDSHAKE-sessionstore1001-a.eqiad.wmnet/10.64.0.144] 2023-04-20 20:57:34,280 OutboundTcpConnection.java:561 - Handshaking version with sessionstore1001-a.eqiad.wmnet/10.64.0.144
INFO  [HANDSHAKE-sessionstore1001-a.eqiad.wmnet/10.64.0.144] 2023-04-20 20:57:35,276 OutboundTcpConnection.java:561 - Handshaking version with sessionstore1001-a.eqiad.wmnet/10.64.0.144

😭

Eevans closed this task as Resolved.EditedSep 19 2023, 5:03 PM

The underlying condition (the race condition on shutdown) that we were attempting to work around no longer exists in Cassandra 4.1.1 (which we're now running everywhere).

That said, the deinit commands were successfully relocated from c-foreach-restart to the systemd unit (the raison d'etre for this ticket), so either way we can close this issue now.