
tracking task: Globalsign OCSP unhappiness 2020-03-12
Closed, Resolved · Public

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 579459 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] switch esams & eqsin to lets-encrypt; globalsign OCSP unhappy

https://gerrit.wikimedia.org/r/579459

Sample error log:

Mar 12 05:42:01 cp3050 CRON[9853]: (root) CMD (/usr/local/sbin/update-ocsp-all 2>&1 | logger -t update-ocsp-all)
[...]
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]: Traceback (most recent call last):
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]:   File "/usr/local/sbin/update-ocsp", line 290, in <module>
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]:     main()
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]:   File "/usr/local/sbin/update-ocsp", line 283, in main
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]:     certs_fetch_ocsp(out_tempfile, args)
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]:   File "/usr/local/sbin/update-ocsp", line 208, in certs_fetch_ocsp
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]:     (ocsp_text, ocsp_err) = check_output_errtext(cmd)
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]:   File "/usr/local/sbin/update-ocsp", line 101, in check_output_errtext
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]:     (" ".join(args), p.returncode, p_err))
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]: Exception: Command openssl ocsp -resp_text -respout /var/cache/ocsp/update-ocsp-TpDAUp.tmp/globalsign-2019-ecdsa-unified.ocsp -issuer /etc/ssl/certs/7cfeae01.0 -verify_other /etc/ssl/certs/7cfeae01.0 -path http://ocsp.globalsign.com/gseccovsslca2018 -host webproxy.esams.wmnet:8080 -cert /etc/ssl/localcerts/globalsign-2019-ecdsa-unified.crt failed with exit code 1, stderr:
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]: Error querying OCSP responder
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]: 139996330263680:error:27076072:OCSP routines:parse_http_line1:server response error:../crypto/ocsp/ocsp_ht.c:260:Code=503,Reason=Service Unavailable
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]: 
Mar 12 05:42:46 cp3050 update-ocsp-all[9863]: OCSP update failed for /etc/update-ocsp.d/globalsign-2019-ecdsa-unified.conf
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]: Traceback (most recent call last):
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]:   File "/usr/local/sbin/update-ocsp", line 290, in <module>
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]:     main()
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]:   File "/usr/local/sbin/update-ocsp", line 283, in main
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]:     certs_fetch_ocsp(out_tempfile, args)
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]:   File "/usr/local/sbin/update-ocsp", line 208, in certs_fetch_ocsp
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]:     (ocsp_text, ocsp_err) = check_output_errtext(cmd)
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]:   File "/usr/local/sbin/update-ocsp", line 101, in check_output_errtext
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]:     (" ".join(args), p.returncode, p_err))
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]: Exception: Command openssl ocsp -resp_text -respout /var/cache/ocsp/update-ocsp-uWwQjw.tmp/globalsign-2019-rsa-unified.ocsp -issuer /etc/ssl/certs/036624bb.0 -verify_other /etc/ssl/certs/036624bb.0 -path http://ocsp.globalsign.com/gsrsaovsslca2018 -host webproxy.esams.wmnet:8080 -cert /etc/ssl/localcerts/globalsign-2019-rsa-unified.crt failed with exit code 1, stderr:
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]: Error querying OCSP responder
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]: 140218955576448:error:27076072:OCSP routines:parse_http_line1:server response error:../crypto/ocsp/ocsp_ht.c:260:Code=503,Reason=Service Unavailable
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]: 
Mar 12 05:42:48 cp3050 update-ocsp-all[9863]: OCSP update failed for /etc/update-ocsp.d/globalsign-2019-rsa-unified.conf
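When many hosts log errors like the ones above, it can help to reduce them to which configs failed and which HTTP status the responder returned. The following is a minimal sketch, not part of update-ocsp: the function name `summarize` and both regular expressions are assumptions written against the two log line shapes shown in this sample.

```python
import re

# Summary line that update-ocsp-all logs for each config that failed.
FAILED_RE = re.compile(r"OCSP update failed for (\S+)")
# HTTP status that openssl reports from the OCSP responder.
HTTP_RE = re.compile(r"Code=(\d{3}),Reason=(.+)$")


def summarize(lines):
    """Return (failed_configs, http_errors) found in syslog lines."""
    failed, errors = [], []
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            failed.append(match.group(1))
        match = HTTP_RE.search(line)
        if match:
            errors.append((int(match.group(1)), match.group(2).strip()))
    return failed, errors


# Two lines copied from the sample error log above.
sample = [
    "Mar 12 05:42:46 cp3050 update-ocsp-all[9863]: 139996330263680:error:27076072:"
    "OCSP routines:parse_http_line1:server response error:"
    "../crypto/ocsp/ocsp_ht.c:260:Code=503,Reason=Service Unavailable",
    "Mar 12 05:42:46 cp3050 update-ocsp-all[9863]: OCSP update failed for "
    "/etc/update-ocsp.d/globalsign-2019-ecdsa-unified.conf",
]

failed, errors = summarize(sample)
print(failed)
print(errors)
```

A 503 from the responder, as opposed to a verification error, suggests the failure is on the CA side rather than in the local certificate setup.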
CDanis triaged this task as High priority. (Mar 13 2020, 4:49 AM)

This seems to have been triggered by the outage that GlobalSign reported at https://www.globalsign.com/en/status:

Updated 12 March 2020, 5:25 pm EDT

We are still working on recovery measures for the outages reported yesterday. We will share and publish an incident report as soon as we have completed our investigation.

We deeply apologize for any inconvenience this may cause to our customers. Please do not hesitate to contact us with any questions.

Impacted services:

- Certificate application and issue
- Certificate revocation procedure
- Confirmation of certificate revocation information by OCSP / CRL is intermittent
- Time stamping

I'm going to trigger OCSP response updates in esams & eqsin first; this should get rid of the issue.

Mentioned in SAL (#wikimedia-operations) [2020-03-13T06:05:45Z] <vgutierrez> triggering OCSP response updates in esams - T247584

Mentioned in SAL (#wikimedia-operations) [2020-03-13T06:12:45Z] <vgutierrez> triggering OCSP response updates in eqsin - T247584

Mentioned in SAL (#wikimedia-operations) [2020-03-13T06:16:17Z] <vgutierrez> triggering OCSP response updates in eqiad,codfw and ulsfo - T247584

Vgutierrez claimed this task.

Change 579459 abandoned by Vgutierrez:
switch esams & eqsin to lets-encrypt; globalsign OCSP unhappy

https://gerrit.wikimedia.org/r/579459

For future reference, an OCSP response update can be triggered like this:

sudo -i cumin -b1 'A:cp-eqiad' "/usr/local/sbin/update-ocsp-all 2>&1 | logger -t update-ocsp-all"
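To repeat the update across all sites in the order used during this incident (esams, eqsin, then eqiad, codfw and ulsfo), the per-site commands could be generated as in the sketch below. Only the alias `A:cp-eqiad` appears in this task; the `A:cp-<site>` pattern for the other sites is an assumption, and `cumin_command` is a hypothetical helper. The script only prints the commands and does not run them.

```python
import shlex

# Command from this task; cumin runs it on each cache host.
UPDATE_CMD = "/usr/local/sbin/update-ocsp-all 2>&1 | logger -t update-ocsp-all"

# Order used in this incident, per the SAL entries.
SITES = ["esams", "eqsin", "eqiad", "codfw", "ulsfo"]


def cumin_command(site):
    """Build the cumin invocation for one site as an argument list."""
    # Assumption: every site has an alias shaped like A:cp-eqiad.
    alias = f"A:cp-{site}"
    # -b1 is kept from the original command.
    return ["sudo", "-i", "cumin", "-b1", alias, UPDATE_CMD]


for site in SITES:
    # Print only; review the output before running any of it by hand.
    print(shlex.join(cumin_command(site)))
```

Printing rather than executing keeps the sketch safe to run anywhere, and the alias names should be checked against the real cumin configuration first.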