
certspotter on einsteinium has issues talking to external
Closed, Declined (Public)

Description

We are getting cronspam from einsteinium with content like:

/usr/bin/certspotter: ctlog.wosign.com: 2017/04/04 09:02:31 Error retrieving STH from log: Get https://ctlog.wosign.com/ct/v1/get-sth: net/http: request canceled while waiting for connection
/usr/bin/certspotter: vega.ws.symantec.com: 2017/04/04 20:01:56 Error retrieving STH from log: Get https://vega.ws.symantec.com/ct/v1/get-sth: dial tcp 216.168.252.217:443: i/o timeout
/usr/bin/certspotter: ctlog.wosign.com: 2017/04/05 05:02:34 Error fetching consistency proof between 1088952 and 1090451 (if this error persists, it should be construed as misbehavior by the log): Get https://ctlog.wosign.com/ct/v1/get-sth-consistency?first=1088952&second=1090451: net/http: timeout awaiting response headers

Event Timeline

I can't reproduce it when running the command manually. It looks like an intermittent failure on the remote side:

root@einsteinium:/etc# sudo -u certspotter /usr/bin/certspotter -watchlist /etc/certspotter/watchlist -state_dir /var/lib/certspotter/state
root@einsteinium:/etc#

faidon moved this task from Up next to In progress on the observability board.

This is basically an artifact of the CT logs failing to respond every now and then, which certspotter complains about. It doesn't happen often.

We can do 2>/dev/null on the cronjob, but that runs the risk of certspotter being completely broken and us never hearing about it. Ideally, we'd have a way to only output errors if they have appeared more than N times in X days, but given how rudimentary the check is right now (just a cronjob), it's not going to be easy.
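
For illustration only, here is a rough sketch of what such an "N times in X days" filter could look like as a wrapper around the cron command. Nothing here exists on einsteinium: the state file path, the thresholds, and the function names are all assumptions, and it treats any stderr output from a run as one failure rather than tracking each CT log separately.

```python
#!/usr/bin/env python3
"""Hypothetical cron wrapper: forward a command's stderr only when failures persist."""
import json
import subprocess
import sys
import time
from pathlib import Path

# Assumed values, not taken from the task; adjust as needed.
STATE_FILE = Path("/var/lib/certspotter/stderr-failures.json")
MAX_FAILURES = 3             # the "N times"
WINDOW_SECONDS = 2 * 86400   # the "X days"


def recent_failures(timestamps, now, window):
    """Keep only the failure timestamps that fall inside the sliding window."""
    return [t for t in timestamps if now - t <= window]


def should_report(timestamps, now, window=WINDOW_SECONDS, limit=MAX_FAILURES):
    """True once more than `limit` failing runs happened within `window` seconds."""
    return len(recent_failures(timestamps, now, window)) > limit


def load_state(path):
    """Read the stored failure timestamps; a missing or corrupt file means none."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, ValueError):
        return []


def main(argv):
    now = time.time()
    # stdout is inherited unchanged; only stderr is captured and filtered.
    proc = subprocess.run(argv, stderr=subprocess.PIPE, text=True)
    failures = recent_failures(load_state(STATE_FILE), now, WINDOW_SECONDS)
    if proc.stderr.strip():
        failures.append(now)
        if should_report(failures, now):
            # Failures are persistent: let cron mail the latest errors.
            sys.stderr.write(proc.stderr)
    STATE_FILE.write_text(json.dumps(failures))
    return proc.returncode


# Only act when a command was actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

The cronjob would then invoke the wrapper with the existing certspotter command line as its arguments. Unlike a blanket 2>/dev/null, a certspotter that fails on every run would still surface once the threshold is crossed, while an occasional timeout from a single log would stay quiet.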

I'm inclined to decline this -- if it becomes a larger problem, we can always revisit. Let me know if you disagree :)