User Details
- User Since
- Feb 27 2015, 10:47 PM (478 w, 8 h)
- Availability
- Available
- IRC Nick
- urandom
- LDAP User
- Eevans
- MediaWiki User
- EEvans (WMF) [ Global Accounts ]
Today
Restarted via the DRAC, and everything seems OK now. I skimmed the logs and didn't see anything unusual prior to the event.
Also unable to log in via the serial console.
Yesterday
The first device is done rebuilding:
Thu, Apr 25
Ok, the rebuild is complete.
Wed, Apr 24
2:23 PM <jclark-ctr> i am swapping sdf again
2:24 PM <jclark-ctr> swapped with one that was just erased
Having some trouble adding sdf2 back into the array: mdadm: Cannot open /dev/sdf2: Device or resource busy :/
Tue, Apr 23
Here is a transcript of everything done (for posterity's sake):
Hey @Jclark-ctr: I hope it's OK to assign this one to you as well.
Fri, Apr 19
Thu, Apr 18
Wed, Apr 17
Tue, Apr 16
We've made the upgrade to 4.x already, and we did so without a migration. If I've understood the context above, that was the reason for elevating the priority, so I'm going to drop it down now. Please feel free to readjust if that's wrong.
We've encountered a problem enabling TLS certificate verification for gocql-based clients (see: T352647#9715110). We'll need to implement a custom HostDialer for Cassandra-connecting golang services before this work can continue.
Mon, Apr 15
{"msg":"error: failed to connect to \"[HostInfo hostname=\\\"10.192.48.54\\\" connectAddress=\\\"10.192.48.54\\\" peer=\\\"10.192.48.54\\\" rpc_address=\\\"10.192.48.54\\\" broadcast_address=\\\"\u003cnil\u003e\\\" preferred_ip=\\\"\u003cnil\u003e\\\" connect_addr=\\\"10.192.48.54\\\" connect_addr_source=\\\"connect_address\\\" port=9042 data_centre=\\\"codfw\\\" rack=\\\"A_D\\\" host_id=\\\"5bfa3453-48f8-4c3c-82ea-478c460b6ee5\\\" version=\\\"v4.1.1\\\" state=UP num_tokens=256]\" due to error: x509: cannot validate certificate for 10.192.48.54 because it doesn't contain any IP SANs","appname":"sessionstore","time":"2024-04-15T18:25:35Z","level":"WARNING"}
Thu, Apr 11
Wed, Apr 10
Mon, Apr 8
Fri, Apr 5
The original scope of this ticket was a very specific request to retrieve data, and that request has been met, so I'll close this ticket now.
Thu, Apr 4
Wed, Apr 3
I'm not sure what to make of the results of disabling read-repair. It did not stop the errors entirely, but we can't say there was no change either. The decommissions are now complete, which makes further experimentation difficult. I think CASSANDRA-19120 is the most promising lead, so I propose that we upgrade to Cassandra 4.1.5 when it becomes available, and leave this issue open until the next decommission is needed.
Done!