
Puppet sslcert::ca does not refresh the certificate symlinks when a .crt is updated
Closed, Resolved · Public

Description

Via https://logstash-beta.wmflabs.org/ I found that CirrusSearch yields "unknown: Unknown error:60". I even found one occurrence where it is raised on a hit of Special:Version.

Logstash fields:

channel CirrusSearch
message unknown: Unknown error:60
normalized_message Search backend error during {queryType} search for '{query}' after {took}: {message}

There are also a few errors related to not being able to fetch the ElasticSearch version:

Search backend error during fetching elasticsearch version after {took}: {message}

Beta Logstash link for last four hours: https://logstash-beta.wmflabs.org/goto/ae4151bf2267c2aae5d87c4e1b02ef15


Reproduction

strace -e stat -f curl https://deployment-elastic06.deployment-prep.eqiad.wmflabs:9243

Check ssl hash

openssl x509 -subject_hash  -noout < /etc/ssl/certs/Puppet_Internal_CA.pem
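
The hash printed above is the name OpenSSL-based clients look for in /etc/ssl/certs. A minimal sketch for checking whether a matching link exists; hash_link_status is a hypothetical helper name, and the ".0" suffix assumes there is no hash collision in the directory:

```shell
#!/bin/sh
# Hypothetical helper: report whether a hashed link exists for a subject hash.
#   $1 = certificate directory (for example /etc/ssl/certs)
#   $2 = subject hash as printed by `openssl x509 -subject_hash -noout`
# Assumption: the link is named "<hash>.0" (no hash collision).
hash_link_status() {
    certs_dir=$1
    subject_hash=$2
    # -e follows symlinks, so a dangling link is reported as missing too.
    if [ -e "$certs_dir/$subject_hash.0" ]; then
        echo "present"
    else
        echo "missing"
    fi
}
```

It could be called as: hash_link_status /etc/ssl/certs "$(openssl x509 -subject_hash -noout < /etc/ssl/certs/Puppet_Internal_CA.pem)"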

Fix

Redo all symlinks:

update-ca-certificates --verbose --fresh

Details from T145609#2636308: a change of the CA via Puppet should trigger update-ca-certificates with --fresh. Otherwise the subject-hash symlinks are stale and the SSL certificates cannot be found.

Event Timeline

hashar created this task. Sep 14 2016, 8:05 AM
Restricted Application added a project: Discovery. Sep 14 2016, 8:05 AM
Restricted Application added a subscriber: Aklapper.

All messages come from deployment-mediawiki04, a new MW app server we added yesterday that runs Jessie (vs Trusty). This might be related.

elukey added a subscriber: elukey. Sep 14 2016, 8:28 AM
hashar added a subscriber: dcausse. Sep 14 2016, 8:52 AM

[8:38:39] <dcausse> yes curl error codes are not very explicit: https://curl.haxx.se/libcurl/c/libcurl-errors.html
[8:38:47] <dcausse> 60 seems to be CURLE_SSL_CACERT

Tweaks needed to be done:
curl settings appear to be broken and https is failing.
The trick was to download the cert file from relforge1001:/usr/local/share/ca-certificates/Puppet_Internal_CA.crt
copy this file to lxc-vagrant box
make sure that curl will load this file
(identify the cert file opened by default strace curl https://relforge1001.eqiad.wmnet:9243 2>&1 | grep stat)
symlink /etc/ssl/certs/c5aaad6f.0 to this file
It's unclear why curl refused to use the bundle /etc/ssl/certs/ca-certificates.crt ...

Copy pasting from IRC investigations with dcausse / gehel / elukey. To repro:

root@deployment-mediawiki04: strace -e stat -f curl https://deployment-elastic06.deployment-prep.eqiad.wmflabs:9243
stat("/etc/ssl/certs/7abfb60b.0", 0x7ffe7ec372a0) = -1 ENOENT (No such file or directory)
curl: (60) SSL certificate problem: unable to get local issuer certificate

Hashes are different though:

ls -l /etc/ssl/certs/ | grep -i puppet
[11:01:40]  <elukey>	lrwxrwxrwx 1 root root     22 Sep 12 13:50 a861a8e4.0 -> Puppet_Internal_CA.pem
[11:01:43]  <elukey>	lrwxrwxrwx 1 root root     22 Sep 12 13:50 c934af94.0 -> Puppet_Internal_CA.pem
[11:01:45]  <elukey>	lrwxrwxrwx 1 root root     55 Sep 12 13:50 Puppet_Internal_CA.pem -> /usr/local/share/ca-certificates/Puppet_Internal_CA.crt
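
To compare the name curl asked for with the links listed above, the failed lookups can be pulled out of the strace output. A small sketch; missing_cert_hashes is a hypothetical helper name, and it assumes the lookup shows up as a stat() call returning ENOENT, as in the trace above:

```shell
#!/bin/sh
# Hypothetical helper: read strace output on stdin and print the hashed
# file names under /etc/ssl/certs whose stat() lookup failed with ENOENT.
# Assumption: names look like "<8 hex digits>.<number>", as in the trace above.
missing_cert_hashes() {
    sed -n 's|.*stat("/etc/ssl/certs/\([0-9a-f]\{8\}\.[0-9][0-9]*\)".*ENOENT.*|\1|p'
}
```

It could be used as: strace -e stat -f curl https://deployment-elastic06.deployment-prep.eqiad.wmflabs:9243 2>&1 | missing_cert_hashes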

hashar updated the task description. Sep 14 2016, 9:35 AM

From the syslog on deployment-mediawiki04

update-ca-certificates is run, then Puppet_Internal_CA.crt is updated, since the beta cluster uses its own puppet master. The subject hash changes but the symlinks are not updated:

First puppet run:

Sep 12 13:50:58 deployment-mediawiki04 puppet-agent[1436]: (/Stage[main]/Sslcert/Exec[update-ca-certificates]) Triggered 'refresh' from 6 events

The Puppet_Internal_CA file is then later changed:

Sep 12 14:03:17 (/Stage[main]/Base::Certificates/Sslcert::Ca[Puppet_Internal_CA]
  /File[/usr/local/share/ca-certificates/Puppet_Internal_CA.crt])
  Filebucketed /usr/local/share/ca-certificates/Puppet_Internal_CA.crt to puppet
  with sum 9f3978d4816ae16ad737cf46ca10af19

Sep 12 14:03:17 (/Stage[main]/Base::Certificates/Sslcert::Ca[Puppet_Internal_CA]
  /File[/usr/local/share/ca-certificates/Puppet_Internal_CA.crt]/content) content changed
  '{md5}9f3978d4816ae16ad737cf46ca10af19' to
  '{md5}05f38ae56b7395fa123c2b14658702c9'

Sep 12 14:03:17 (/Stage[main]/Base::Certificates/Sslcert::Ca[Puppet_Internal_CA]
  /File[/usr/local/share/ca-certificates/Puppet_Internal_CA.crt])
  Scheduling refresh of Exec[update-ca-certificates]

Sep 12 14:03:34 (/Stage[main]/Sslcert/Exec[update-ca-certificates]) Triggered 'refresh' from 1 events

It seems update-ca-certificates fails to notice that the .crt file has changed (and hence the subject hash) and does not update the symlinks.

I have saved the syslog files on deployment-mediawiki04 in /root/syslog.T145609.

It seems to me that whenever the .crt file is modified by sslcert::ca, the symlinks should be rebuilt with update-ca-certificates --fresh.

debt added a subscriber: debt.

Removing the Search tag from this ticket - please let us know if there is anything else for us to investigate.

hashar renamed this task from On Beta cluster, CirrusSearch yields: unknown: Unknown error:60 to Puppet sslcert::ca does not refresh the certificate symlinks when a .crt is updated. Sep 26 2016, 10:05 AM
hashar triaged this task as Normal priority.
hashar added a project: Operations.
hashar updated the task description.
hashar moved this task from To Triage to Backlog on the Beta-Cluster-Infrastructure board.

Mentioned in SAL (#wikimedia-releng) [2016-10-14T11:30:05Z] <dcausse> deployment-prep running sudo update-ca-certificates --fresh on deployment-ton to fix curl error code 60 in cirrus maint script (T145609)

Change 315934 had a related patch set uploaded (by Gehel):
ssl - make sure certificates are updated "fresh"

https://gerrit.wikimedia.org/r/315934

Change 315934 abandoned by Gehel:
ssl - make sure certificates are updated "fresh"

Reason:
I knew I was missing something. Thanks Faidon.

I'm dropping this, I have no idea how to improve on it.

https://gerrit.wikimedia.org/r/315934

As pointed out by @faidon, when running update-ca-certificates with -f (--fresh) there is a small window of time during which the certificate symlinks disappear before being recreated. Ideally a changed certificate should be shipped under a different file name, but this is not always practical.
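
For illustration only, one way to avoid that window would be to put the new links in place before removing the stale ones. This is a rough sketch, not what update-ca-certificates does and not the fix adopted for this task; refresh_hash_links is a hypothetical helper. It takes one or more hashes because the listing earlier in this thread shows two links pointing at the same .pem, and it assumes ".0" link names (no hash collisions):

```shell
#!/bin/sh
# Hypothetical helper, NOT part of update-ca-certificates.
#   $1     = certificate directory
#   $2     = link target name inside that directory (for example Puppet_Internal_CA.pem)
#   $3...  = subject hashes that should point at the target
refresh_hash_links() {
    certs_dir=$1
    pem_name=$2
    shift 2
    # Step 1: create or replace each wanted link through a rename, so an
    # existing link is swapped in place rather than removed first.
    for wanted in "$@"; do
        tmp="$certs_dir/.tmp.$wanted.$$"
        ln -s "$pem_name" "$tmp" && mv -f "$tmp" "$certs_dir/$wanted.0"
    done
    # Step 2: only now remove links to the same target with an unwanted hash.
    for link in "$certs_dir"/*.0; do
        [ -L "$link" ] || continue
        [ "$(readlink "$link")" = "$pem_name" ] || continue
        name=${link##*/}
        keep=no
        for wanted in "$@"; do
            if [ "$name" = "$wanted.0" ]; then
                keep=yes
            fi
        done
        if [ "$keep" = no ]; then
            rm -f "$link"
        fi
    done
}
```

Links that point at other certificates are left untouched, so this only narrows the problem for the one file that changed.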

fgiunchedi closed this task as Resolved. Nov 29 2016, 11:55 PM
fgiunchedi claimed this task.
fgiunchedi added a subscriber: fgiunchedi.

Resolving this in favour of T150823: Puppet CA rollover which handles the CA rollover.