Issue with c-foreach-restart on Cassandra version 3.11.11
Closed, Duplicate (Public)

Description

I've attempted to run the sre.cassandra.roll-restart cookbook to perform a rolling restart of the aqs_next Cassandra cluster.

The cookbook fails with the following error message from c-foreach-restart:

===== NODE GROUP =====
(1) aqs1010.eqiad.wmnet
----- OUTPUT of 'c-foreach-restart -d 10 -a 20 -r 12' -----
2022-06-08 09:46:57,533 INFO     [a] Disabling client ports...
nodetool: Failed to connect to 'localhost:7189' - URISyntaxException: 'Malformed IPv6 address at index 7: rmi://[localhost]:7189'.
Traceback (most recent call last):
  File "/usr/bin/c-foreach-restart", line 60, in <module>
    main()
  File "/usr/bin/c-foreach-restart", line 29, in main
    post_shutdown=args.execute_post_shutdown
  File "/usr/share/cassandra-tools-wmf/cassandra/tools/instances.py", line 59, in restart
    self.nodetool.run("disablebinary")
  File "/usr/share/cassandra-tools-wmf/cassandra/tools/nodetool.py", line 37, in run
    retcode)
cassandra.tools.nodetool.NodetoolCommandException: nodetool command returned exit code 1
================

This does not happen when I run the cookbook against the older aqs hosts, which are running an older version of Cassandra, e.g.:

sudo cookbook -d sre.cassandra.roll-restart -r T309526 --query aqs100*

This leads me to believe that the problem only occurs with the latest version of Cassandra.

Event Timeline

Based on the mention of nodetool in the error log it's quite likely to be related to https://phabricator.wikimedia.org/T309736.

Right, it's actually a newer version of the JVM causing this (stricter JNDI URL parsing). I've queued up 3.11.13, which contains a fix for this (see: r803903).
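
For illustration, the rule behind the error can be approximated in Python. This is only a sketch, not the JVM's actual parsing code: it assumes the check boils down to "square brackets in a URL host are reserved for IPv6 literals", and the function name `bracketed_host_is_valid` is hypothetical.

```python
import ipaddress
import re

# Matches "<scheme>://[<something>]" at the start of a URL and captures
# whatever sits between the square brackets.
_BRACKETED_HOST = re.compile(r"^[a-z][a-z0-9+.-]*://\[([^\]]*)\]", re.IGNORECASE)


def bracketed_host_is_valid(url: str) -> bool:
    """Return False if the URL has a [bracketed] host that is not an IPv6 literal.

    URLs without a bracketed host are not checked and return True.
    """
    match = _BRACKETED_HOST.match(url)
    if match is None:
        return True  # no bracketed host, nothing to validate
    try:
        ipaddress.IPv6Address(match.group(1))
    except ValueError:
        # e.g. "localhost" is a hostname, not an IPv6 address
        return False
    return True


if __name__ == "__main__":
    for candidate in (
        "rmi://[localhost]:7189",  # the form from the nodetool error
        "rmi://[::1]:7189",        # a real IPv6 literal in brackets
        "rmi://localhost:7189",    # plain hostname, no brackets
    ):
        print(candidate, bracketed_host_is_valid(candidate))
```

Under this assumption, `rmi://[localhost]:7189` is rejected because `localhost` is not an IPv6 address, which is consistent with the "Malformed IPv6 address at index 7" message in the task description (index 7 is the first character after `rmi://[`).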

T309736 has a work-around that might help in the interim.

Ah, thanks @Eevans - I thought you'd have a handle on it. Feel free to merge this ticket in as a duplicate if it makes sense.