certspotter: Error retrieving STH from log
Closed, Resolved (Public)

Description

certspotter occasionally has trouble fetching https://ctlog.wosign.com/ct/v1/get-sth:

/usr/bin/certspotter: ctlog.wosign.com: 2017/02/19 04:02:13 Error retrieving STH from log: Get https://ctlog.wosign.com/ct/v1/get-sth: net/http: request canceled while waiting for connection
/usr/bin/certspotter: ct1.digicert-ct.com/log: 2017/02/19 16:06:43 Error retrieving STH from log: Get https://ct1.digicert-ct.com/log/ct/v1/get-sth: net/http: request canceled while waiting for connection
/usr/bin/certspotter: ctlog.wosign.com: 2017/02/25 19:06:20 Error retrieving STH from log: Get https://ctlog.wosign.com/ct/v1/get-sth: net/http: request canceled while waiting for connection
/usr/bin/certspotter: ctlog.wosign.com: 2017/02/26 21:02:11 Error retrieving STH from log: Get https://ctlog.wosign.com/ct/v1/get-sth: net/http: request canceled while waiting for connection

Event Timeline

Yeah, WoSign's CT server seems to be occasionally flaky; I saw someone else complaining about this somewhere. Not sure what we can do about that :/

For a couple of days now, both einsteinium and tegmen have been spamming root@ every hour with certspotter errors. This time it seems that the DigiCert service is responding with 400 to the consistency proof requests:

/usr/bin/certspotter: ct1.digicert-ct.com/log: 2017/04/09 21:01:01 Error fetching consistency proof between 1523926 and 1524028 (if this error persists, it should be construed as misbehavior by the log): GET https://ct1.digicert-ct.com/log/ct/v1/get-sth-consistency?first=1523926&second=1524028: 400 BAD REQUEST ()

It's probably just an issue on their side that will hopefully be fixed on Monday. One option could be to fall back to the mirrors hosted by the https://www.certificate-transparency.org/ project, such as https://digicert.mirror.certificate-transparency.org/ct/v1/get-sth, if the primary fails, but I'm not sure whether certspotter supports this.
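
For illustration only, here is a rough Python sketch of what "try the primary, then the mirror" could look like for the get-sth endpoint. This is not something certspotter is known to support; the helper names (`http_get_json`, `fetch_sth`) are made up for this example, and the two URLs are simply the ones mentioned in this task. A mirror may also lag behind the primary log, so treat this as a sketch of the idea rather than a drop-in replacement.

```python
import json
import urllib.request

# URLs taken from this task; whether the mirror is usable this way is an assumption.
PRIMARY = "https://ct1.digicert-ct.com/log/ct/v1/get-sth"
MIRROR = "https://digicert.mirror.certificate-transparency.org/ct/v1/get-sth"


def http_get_json(url, timeout=10):
    """Fetch a URL and decode the body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_sth(urls, fetch=http_get_json):
    """Try each URL in order; return (url, sth) for the first one that works.

    Network errors (OSError, which covers URLError, HTTPError and timeouts)
    and malformed JSON (ValueError) cause a fall-through to the next URL.
    """
    failures = []
    for url in urls:
        try:
            return url, fetch(url)
        except (OSError, ValueError) as exc:
            failures.append(f"{url}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(failures))
```

Usage would be something like `used_url, sth = fetch_sth([PRIMARY, MIRROR])`. The `fetch` parameter exists only so the fallback logic can be exercised without touching the network.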

Was this resolved or are we still getting failures here?

We get occasional failures depending on the availability of the CT log servers. I don't see a way around this unless we make our cronjobs quite a bit more sophisticated (e.g. ignore transient errors but complain when we get more than X errors within N hours).
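
To make the "more than X errors in N hours" idea concrete, here is a minimal Python sketch of such a filter. It assumes the cron output looks like the lines quoted in this task (`/usr/bin/certspotter: <log>: YYYY/MM/DD HH:MM:SS <message>`) and that the error lines are being collected somewhere between runs; `parse_errors`, `logs_to_alert` and the threshold parameters are hypothetical names, not part of certspotter.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches the line format quoted in this task:
# /usr/bin/certspotter: <log>: YYYY/MM/DD HH:MM:SS <message>
LINE_RE = re.compile(
    r"^/usr/bin/certspotter: (?P<log>\S+): "
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?P<msg>.*)$"
)


def parse_errors(lines):
    """Group error timestamps by CT log name; non-matching lines are skipped."""
    errors = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
            errors[m.group("log")].append(ts)
    return errors


def logs_to_alert(errors, now, max_errors, window_hours):
    """Return the logs with more than max_errors errors in the last window_hours."""
    cutoff = now - timedelta(hours=window_hours)
    return sorted(
        log
        for log, stamps in errors.items()
        if sum(1 for ts in stamps if cutoff <= ts <= now) > max_errors
    )
```

A wrapper around the cronjob could then mail root@ only when `logs_to_alert(...)` returns a non-empty list, so a single flaky fetch stays quiet while a log that keeps failing still gets reported.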

BBlack claimed this task.

Ok I'm gonna say it's not a pressing issue for now then. To revisit the next time it really bothers us!