Certs from cassandra-ca-manager should have the FQDN in cert's CN
Closed, Declined, Public

Description

Certs presented over the network by Cassandra have CN=hostname, not CN=FQDN. I _think_ it should be enough to tell cassandra-ca-manager to switch to the FQDN.

restbase2008:/etc/cassandra-a$ openssl s_client -connect restbase2008-a.codfw.wmnet:7001
CONNECTED(00000003)
depth=1 CN = rootCa, OU = services, O = WMF, C = US
verify error:num=19:self signed certificate in certificate chain
---
Certificate chain
 0 s:/C=US/O=WMF/OU=services/CN=restbase2008-a
   i:/CN=rootCa/OU=services/O=WMF/C=US
 1 s:/CN=rootCa/OU=services/O=WMF/C=US
   i:/CN=rootCa/OU=services/O=WMF/C=US
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDAjCCAeoCCQCKluomP1N31jANBgkqhkiG9w0BAQUFADA/MQ8wDQYDVQQDDAZy
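The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming the subject is available as the one-line string that `openssl s_client` prints in the certificate chain (without the leading `s:`). The helper names `parse_subject` and `cn_matches_fqdn` are hypothetical, and the naive split on `/` would break on attribute values that themselves contain a slash.

```python
def parse_subject(subject: str) -> dict:
    """Parse an OpenSSL one-line subject such as '/C=US/O=WMF/CN=host'."""
    fields = {}
    for part in subject.strip().split("/"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def cn_matches_fqdn(subject: str, fqdn: str) -> bool:
    """Return True only when the subject's CN equals the full FQDN."""
    cn = parse_subject(subject).get("CN", "")
    return cn.lower() == fqdn.lower()


# Values taken from the s_client output in the description.
subject = "/C=US/O=WMF/OU=services/CN=restbase2008-a"
fqdn = "restbase2008-a.codfw.wmnet"
print(cn_matches_fqdn(subject, fqdn))  # False: the CN is the short hostname
```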

Event Timeline

The CN comes from the keystore name attribute in the manifest. Is it enough to just change that the next time we generate certs?

Eevans renamed this task from certs from cassandra-ca-manager should have the FQDN in cert's CN to Certs from cassandra-ca-manager should have the FQDN in cert's CN. Aug 15 2016, 8:03 PM
Eevans triaged this task as Medium priority.
Eevans updated the task description.

I think so, yeah. That should be enough.

GWicke edited projects, added Services (later); removed Services.
Eevans lowered the priority of this task from Medium to Low. Jun 7 2021, 7:57 PM

Change 724061 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] cassandra: use FQDN in CN name for future instances

https://gerrit.wikimedia.org/r/724061

Change 724061 merged by Hnowlan:

[operations/puppet@production] cassandra: use FQDN in CN name for future instances

https://gerrit.wikimedia.org/r/724061

The attempted FQDN-use method appears to have failed - Cassandra claims there is an issue with the keystore format despite it being the same format/method as before:

INFO  [main] 2021-10-05 11:43:09,282 IndexSummaryManager.java:80 - Initializing index summary manager with a memory pool size of 614 MB and a resize interval of 60 minutes
ERROR [main] 2021-10-05 11:43:09,297 CassandraDaemon.java:749 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Unable to create ssl socket
        at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:701) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:681) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:665) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:796) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
Caused by: java.io.IOException: Error creating the initializing the SSL Context
        at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:697) ~[apache-cassandra-3.11.4.jar:3.11.4]
        ... 8 common frames omitted
Caused by: java.io.IOException: Invalid keystore format
        at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:666) ~[na:1.8.0_302]
        at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:57) ~[na:1.8.0_302]
        at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224) ~[na:1.8.0_302]
        at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:71) ~[na:1.8.0_302]
        at java.security.KeyStore.load(KeyStore.java:1445) ~[na:1.8.0_302]
        at org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:179) ~[apache-cassandra-3.11.4.jar:3.11.4]
        ... 10 common frames omitted

Could there be an issue with a mismatch between hostname and FQDN? The FQDN certificates still exist on the puppet master for investigation, but keytool doesn't show many discrepancies between them beyond the obvious change of CN. For now I am reverting.
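One quick check for an "Invalid keystore format" error is to look at the first bytes of the keystore file, since the JKS loader rejects files whose header it does not recognise. The sketch below is an illustration, not something that was run as part of this task. It assumes the well-known magic numbers `0xFEEDFEED` (JKS) and `0xCECECECE` (JCEKS), and treats a leading `0x30` byte as a hint that the file may be DER-encoded PKCS12. The function name `sniff_keystore_format` is hypothetical.

```python
import struct

JKS_MAGIC = 0xFEEDFEED    # magic number at the start of a JKS keystore
JCEKS_MAGIC = 0xCECECECE  # magic number at the start of a JCEKS keystore


def sniff_keystore_format(data: bytes) -> str:
    """Guess a keystore's on-disk format from its leading bytes."""
    if len(data) < 4:
        return "too short or empty"
    (magic,) = struct.unpack(">I", data[:4])  # big-endian 32-bit header
    if magic == JKS_MAGIC:
        return "JKS"
    if magic == JCEKS_MAGIC:
        return "JCEKS"
    if data.startswith(b"-----BEGIN"):
        return "PEM text"
    if data[0] == 0x30:
        # A DER SEQUENCE tag; PKCS12 files start this way, but so do
        # other DER structures, so this is only a hint.
        return "possibly PKCS12 (DER)"
    return "unknown"


print(sniff_keystore_format(bytes.fromhex("feedfeed00000002")))  # JKS
```

To apply it to a real file, read the first few bytes in binary mode (`open(path, "rb").read(16)`) and pass them in. A result other than `JKS` for a file that Cassandra is configured to load as JKS would point at the file contents rather than at the CN change.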

I've seen this exception once or twice before, but only ever when there was something wrong with the file itself. Not to say this couldn't be a red herring of some sort, though...

Could it be file permissions or something?

The permissions were most likely exactly the same: they're the same in the secrets repo and are deployed in the same manner on the hosts. I think the only option is to try again on a single instance to find the source of the issue, perhaps without remerging the larger puppet changes. keytool etc. couldn't find real differences between the files either.
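For a retry on a single instance, one way to rule out permissions or a damaged copy is to record the same facts about the keystore in the secrets repo and on the host, then compare them. This is a minimal sketch under that assumption. The helper name `describe_file` is hypothetical, and no real paths are implied.

```python
import hashlib
import os
import stat


def describe_file(path: str) -> dict:
    """Collect mode, ownership, size and SHA-256 for side-by-side comparison."""
    st = os.stat(path)
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return {
        "mode": stat.filemode(st.st_mode),  # e.g. '-r--r-----'
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "sha256": digest,
    }
```

If the size or digest differs between the source copy and the deployed copy, the file was altered or truncated in transit. If they match, the format error is more likely in how the keystore was generated.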

Eevans claimed this task.

In light of T288470: Replace cassandra-ca-manager with PKI (and considering this task has remained unresolved for 7 years), I'm going to boldly close this as declined.