Cassandra (logstash) logging broken
Closed, Resolved (Public)

Description

Logstash logging no longer works for the RESTBase & AQS clusters. The timing corresponds with the upgrades to Bullseye; Logstash logs stop for each node immediately after its upgrade. Interestingly, logging still works for the Sessionstore cluster (the first to be upgraded to Bullseye).

The logstash logback encoder hands off JSON messages to rsyslog on localhost:11514 (UDP). A tcpdump shows that Cassandra is not sending packets to rsyslog on the affected nodes.
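
To confirm that rsyslog's UDP input is reachable independently of Cassandra, one could send a hand-built JSON datagram to the same port while tcpdump is running. A minimal sketch, assuming the port from the description (11514); the function name and the payload fields are illustrative and are not the encoder's actual schema:

```python
import json
import socket


def send_test_event(host: str, port: int, payload: dict) -> int:
    """Send one JSON-encoded datagram over UDP and return the bytes sent."""
    data = json.dumps(payload).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(data, (host, port))


# Example call (hypothetical payload, port taken from the description):
# send_test_event("127.0.0.1", 11514, {"message": "udp test", "level": "INFO"})
```

If tcpdump shows this datagram but still nothing from Cassandra, the problem is likely on the sending side rather than in rsyslog.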


See also:

T328490: restbase cluster: decommission end-of-life hosts
T331713: Migrate restbase servers to Bullseye
T347738: Upgrade AQS cluster to Bullseye

Event Timeline

Eevans triaged this task as Medium priority. Dec 15 2023, 8:21 PM
Eevans updated the task description.

From the journalctl output:

Dec 15 15:08:17 restbase2032 cassandra[1809747]: 15:08:17,879 |-ERROR in ch.qos.logback.core.joran.action.AppenderAction - Could not create an Appender of type [net.logstash.logback.appender.LogstashSocketAppender]. ch.qos.logback.core.util.DynamicClassLoadingException: Failed to instantiate type net.logstash.logback.appender.LogstashSocketAppender
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at ch.qos.logback.core.util.DynamicClassLoadingException: Failed to instantiate type net.logstash.logback.appender.LogstashSocketAppender
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.util.OptionHelper.instantiateByClassNameAndParameter(OptionHelper.java:69)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.util.OptionHelper.instantiateByClassName(OptionHelper.java:45)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.util.OptionHelper.instantiateByClassName(OptionHelper.java:34)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.joran.action.AppenderAction.begin(AppenderAction.java:52)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.joran.spi.Interpreter.callBeginAction(Interpreter.java:269)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.joran.spi.Interpreter.startElement(Interpreter.java:145)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.joran.spi.Interpreter.startElement(Interpreter.java:128)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.joran.spi.EventPlayer.play(EventPlayer.java:50)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:165)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:152)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:110)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:53)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.classic.util.ContextInitializer.configureByResource(ContextInitializer.java:65)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.classic.util.ContextInitializer.autoConfig(ContextInitializer.java:140)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at org.slf4j.impl.StaticLoggerBinder.init(StaticLoggerBinder.java:84)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:55)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at org.slf4j.LoggerFactory.bind(LoggerFactory.java:150)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:124)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:412)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:357)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:383)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at org.apache.cassandra.service.CassandraDaemon.<clinit>(CassandraDaemon.java:135)
Dec 15 15:08:17 restbase2032 cassandra[1809747]: Caused by: java.lang.ClassNotFoundException: net.logstash.logback.appender.LogstashSocketAppender
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         at ch.qos.logback.core.util.OptionHelper.instantiateByClassNameAndParameter(OptionHelper.java:56)
Dec 15 15:08:17 restbase2032 cassandra[1809747]:         at         ... 21 common frames omitted
[ ... ]
Dec 15 15:08:17 restbase2032 cassandra[1809747]: 15:08:17,883 |-ERROR in ch.qos.logback.core.joran.action.AppenderRefAction - Could not find an appender named [UDP]. Did you define it below instead of above in the configuration file?
Dec 15 15:08:17 restbase2032 cassandra[1809747]: 15:08:17,883 |-ERROR in ch.qos.logback.core.joran.action.AppenderRefAction - See http://logback.qos.ch/codes.html#appender_order for more details.
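
A ClassNotFoundException like the one above means the JVM could not load the class from any jar on its classpath. One way to check whether a candidate jar actually provides the class is sketched below in Python; the function name is hypothetical, and the check relies only on a jar being a zip archive with one `.class` entry per class:

```python
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if jar_path is a readable jar that contains class_name."""
    # A class a.b.C is stored in the archive as a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    if not zipfile.is_zipfile(jar_path):
        # Missing file, or a file that is not a zip archive at all.
        return False
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

A `False` result for the appender class named in the error would point at the jar itself rather than at the logback configuration.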

And finally:

eevans@restbase2032:~$ file /srv/deployment/cassandra/logstash-logback-encoder/lib/logstash-logback-encoder-4.2.jar
/srv/deployment/cassandra/logstash-logback-encoder/lib/logstash-logback-encoder-4.2.jar: ASCII text
eevans@restbase2032:~$

So it would appear that the git-fat jar isn't "hydrated", and the jar file is essentially missing.
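
The same check that `file` performs can be sketched in Python by looking at the first bytes of the file. This assumes that a real jar starts with the zip local-file-header magic and that an un-hydrated git-fat placeholder is a short text file starting with `#$# git-fat `; the placeholder prefix is my recollection of git-fat's format and should be verified against an actual stub. The function and constant names are hypothetical:

```python
ZIP_MAGIC = b"PK\x03\x04"          # start of a non-empty zip/jar archive
GIT_FAT_PREFIX = b"#$# git-fat "   # assumed prefix of a git-fat placeholder


def classify_artifact(path: str) -> str:
    """Classify a file as 'jar', 'git-fat placeholder', or 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(len(GIT_FAT_PREFIX))
    if head.startswith(ZIP_MAGIC):
        return "jar"
    if head.startswith(GIT_FAT_PREFIX):
        return "git-fat placeholder"
    return "unknown"
```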

Change 1003509 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/software/logstash-logback-encoder@master] Bring restbase & aqs targets up to current

https://gerrit.wikimedia.org/r/1003509

Change 1003509 merged by Eevans:

[operations/software/logstash-logback-encoder@master] Bring restbase & aqs targets up to current

https://gerrit.wikimedia.org/r/1003509

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:27:31Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@5c2dd00]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:28:12Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@5c2dd00]: Deploying to updated target list — T353550 (duration: 00m 41s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:30:57Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@5c2dd00]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:31:17Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@5c2dd00]: Deploying to updated target list — T353550 (duration: 00m 20s)

Change 1003512 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/software/logstash-logback-encoder@master] Fix canary name typo

https://gerrit.wikimedia.org/r/1003512

Change 1003512 merged by Eevans:

[operations/software/logstash-logback-encoder@master] Fix canary name typo

https://gerrit.wikimedia.org/r/1003512

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:35:59Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:36:13Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 16s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:38:06Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:39:23Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 01m 17s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:41:23Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:42:08Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 45s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:43:14Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:43:29Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 14s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:46:08Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:46:42Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 34s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:50:24Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:50:29Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 05s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:50:54Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:50:58Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 04s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:51:07Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:51:11Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 03s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:53:06Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:53:13Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 07s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:53:46Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:53:52Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 06s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:53:59Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:54:05Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 05s)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T19:59:14Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "a"} and A:restbase and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002

Change 1003526 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] cassandra: install git-fat to satisfy scap requirement

https://gerrit.wikimedia.org/r/1003526

Mentioned in SAL (#wikimedia-operations) [2024-02-14T20:52:28Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "a"} and A:restbase and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002

I attempted a scap deploy for the restbase cluster (after updating the list of targets), but it failed because git-fat was missing. Installing git-fat and rerunning the deploy (in some cases rerunning it again with -f, because some jars still failed to hydrate) ultimately worked. After a Cassandra restart, log messages are now showing up in OpenSearch.
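
Since some jars failed to hydrate on the first pass, it may be worth verifying a deployment directory after each run. A minimal sketch, assuming that any `.jar` file which is not a valid zip archive has not been hydrated; the function name is hypothetical and the directory to scan is whatever the deploy target uses:

```python
import os
import zipfile
from typing import List


def unhydrated_jars(root: str) -> List[str]:
    """Return sorted paths of .jar files under root that are not zip archives."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            if not zipfile.is_zipfile(path):
                bad.append(path)
    return sorted(bad)
```

An empty list after a deploy would suggest that no forced re-run is needed on that host.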

Ostensibly the solution would then be to have Puppet ensure git-fat is installed, but I was surprised to see that it still requires python2:

$ apt-cache depends git-fat
git-fat
  Depends: <python:any>
    python-is-python2
  Depends: <git-core>
    git
  Depends: rsync
$
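
To audit for this kind of dependency across several packages, the `apt-cache depends` output can be parsed mechanically. A rough sketch, assuming the layout shown in the transcript above (a `Depends:` line indented by two spaces, with the packages satisfying a virtual dependency listed beneath it at four spaces); the function name is hypothetical:

```python
from typing import Dict, List


def parse_depends(output: str) -> Dict[str, List[str]]:
    """Map each 'Depends:' target to the provider packages listed under it."""
    deps: Dict[str, List[str]] = {}
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Depends:"):
            current = stripped.split(":", 1)[1].strip()
            deps[current] = []
        elif current is not None and line.startswith("    ") and stripped:
            # Indented line under a Depends entry: a providing package.
            deps[current].append(stripped)
    return deps


def pulls_in_python2(deps: Dict[str, List[str]]) -> bool:
    """True if any dependency or provider name mentions python2."""
    names = list(deps) + [p for providers in deps.values() for p in providers]
    return any("python2" in name for name in names)
```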

@MoritzMuehlenhoff I know we're trying to be done with python2 (and that python-is-python2 is something of a hack), is there an alternative I should be aware of? With debmonitor showing 421 installs I'm guessing not, but thought I should ask. :)

Mentioned in SAL (#wikimedia-operations) [2024-02-14T20:57:03Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-14T21:36:42Z] <eevans@cumin1002> END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-14T21:41:21Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1032.eqiad.wmnet: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-14T21:51:59Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1032.eqiad.wmnet: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-14T22:13:16Z] <urandom> restarting Cassandra: restbase/codfw, row b — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-14T22:39:30Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "c"} and A:restbase and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-14T23:32:27Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "c"} and A:restbase and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T00:45:22Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002

Change 1003576 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/software/logstash-logback-encoder@master] Updated deployement targets

https://gerrit.wikimedia.org/r/1003576

Change 1003576 merged by Eevans:

[operations/software/logstash-logback-encoder@master] Updated deployement targets

https://gerrit.wikimedia.org/r/1003576

Mentioned in SAL (#wikimedia-operations) [2024-02-15T01:37:24Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002

No one has currently taken on the work to port git-fat to Python 3; we have https://phabricator.wikimedia.org/T279509 to track this. As a workaround we can still enable it for now (I'll follow up on the patch). But there needs to be a solution eventually, as Python 2 is completely gone in Bookworm.

Change 1003526 merged by Eevans:

[operations/puppet@production] cassandra: install git-fat to satisfy scap requirement

https://gerrit.wikimedia.org/r/1003526

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:43:01Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@162f72f] (aqs): Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:43:38Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@162f72f] (aqs): Deploying to updated target list — T353550 (duration: 00m 37s)

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:45:12Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@162f72f] (cassandra-dev): Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:45:27Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@162f72f] (cassandra-dev): Deploying to updated target list — T353550 (duration: 00m 15s)

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:45:55Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@162f72f] (ml-cache): Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:46:10Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@162f72f] (ml-cache): Deploying to updated target list — T353550 (duration: 00m 15s)

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:46:18Z] <eevans@deploy2002> Started deploy [cassandra/logstash-logback-encoder@162f72f] (sessionstore): Deploying to updated target list — T353550

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:46:33Z] <eevans@deploy2002> Finished deploy [cassandra/logstash-logback-encoder@162f72f] (sessionstore): Deploying to updated target list — T353550 (duration: 00m 15s)

Mentioned in SAL (#wikimedia-operations) [2024-02-15T16:51:40Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "rack1"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T17:24:41Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "rack1"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002

The jars are deployed everywhere, all that remains are restarts:

  • restbase
  • aqs
  • cassandra-dev
  • sessionstore
  • ml-cache

Mentioned in SAL (#wikimedia-operations) [2024-02-15T19:31:45Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "rack2"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T20:06:43Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "rack2"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T20:08:30Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "rack3"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T20:41:48Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "rack3"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T20:47:32Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "a_c"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T21:20:46Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "a_c"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T21:26:02Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "b_e"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T21:59:48Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "b_e"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T21:59:56Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "c_f"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T22:34:13Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "c_f"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T22:47:18Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-15T23:26:55Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-16T00:06:00Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-16T00:27:59Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Restart to pickup logging jars — T353550 - eevans@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-02-16T00:49:13Z] <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache: Restart to pickup logging jars — T353550 - eevans@cumin1002

I've opened T357739 to track a permanent fix for the git-fat dependency on Python 2.

Mentioned in SAL (#wikimedia-operations) [2024-02-16T01:26:17Z] <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache: Restart to pickup logging jars — T353550 - eevans@cumin1002

Eevans claimed this task.

The jars are deployed everywhere, all that remains are restarts:

  • restbase
  • aqs
  • cassandra-dev
  • sessionstore
  • ml-cache

Done.