In January 2017 we started using certspotter (T155807: Monitor Certificate Transparency (CT) logs) to monitor CT logs for things being issued for our domains.
At some point, CT servers started failing a lot and this generated cronspam (T162327: certspotter on einsteinium has issues talking to external, T159137: certspotter: Error retrieving STH from log).
Eventually the cron was disabled in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/428367/
Unfortunately the list of CT servers was hardcoded. Newer certspotter versions have apparently fixed this, so we should update to a new version that doesn't have the issue and re-enable.
(opened per discussion in #wikimedia-traffic earlier)
Details
Project | Branch | Lines +/- | Subject |
---|---|---|---|
operations/puppet | production | +0 -3 | Revert "certspotter: temporarily disable cron job" |
Status | Assigned | Task |
---|---|---|
Open | None | T204994 Integrate certspotter with certcentral to avoid certspotter notifying us on legitimate certs generated by our certcentral boxes |
Open | None | T204993 Update certspotter |
Event Timeline
Adding the Debian maintainer :-) This seems fixed in 0.9-1 so updating stretch-backports to 0.9 could fix this.
Actually it looks like it wasn't in stretch, so stretch-backports has highest priority anyway. So the host just needs the package upgraded..?
einsteinium and tegmen still run jessie and I didn't build a version for jessie-wikimedia. I believe they're being migrated to stretch as we speak, so maybe we should just wait for that.
For einsteinium that is T202782: upgrade icinga server to stretch and replace einsteinium. Not sure about tegmen.
The Icinga servers in production are now running 0.9-1~bpo9+1, but the cron job still needs to be reinstated.
Change 475453 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] Revert "certspotter: temporarily disable cron job"
What is the status of this nowadays? I ran across it in a different matter while looking for absented crons and found the TODO and link in code that points over here and to the pending patch above. It's been some time since November 2018. Can we just re-enable it or is there more to it?
Change 475453 had a related patch set uploaded (by Dzahn; owner: Alex Monk):
[operations/puppet@production] Revert "certspotter: temporarily disable cron job"
Change 475453 merged by Dzahn:
[operations/puppet@production] Revert "certspotter: temporarily disable cron job"
on icinga1001, alert1001: crons reactivated:
Notice: /Stage[main]/Certspotter/Cron[certspotter]/ensure: created
What I know here is:
- meanwhile icinga servers are on buster
- reviewers approved reverting the "disable cron job"
- I merged it, crons are enabled again and announced it on IRC
- on icinga1001 there is certspotter 0.9-1~bpo9+1
- this ticket is from 2018
So based on that I am now assuming we can call it resolved.
After seeing quite a bit of cronspam from this, I reverted the change again and am reopening this.
The list of servers needs to be updated, it seems. There are "misbehaving" and non-existent servers, e.g.:
Get https://ct.ws.symantec.com/ct/v1/get-sth: dial tcp: lookup ct.ws.symantec.com on 10.3.0.1:53: server misbehaving
Get https://ctlog-gen2.api.venafi.com/ct/v1/get-sth: dial tcp: lookup ctlog-gen2.api.venafi.com on 10.3.0.1:53: server misbehaving
Get https://ct2.digicert-ct.com/log/ct/v1/get-sth: dial tcp: lookup ct2.digicert-ct.com on 10.3.0.1:53: no such host
See details in the root mail from icinga1001 of today, Nov 18th 2020.
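To get a quick overview of which log hosts are affected, the errors in the cron mail can be grouped by failure reason. This is only a rough sketch: the regex covers just the two error shapes quoted above, and `ERROR_RE`, `group_failures` and `SAMPLE` are names made up for illustration, not part of certspotter or our puppet code.

```python
import re
from collections import defaultdict

# Matches one fetch error of the shape quoted in this task, e.g.
#   Get https://host/path: dial tcp: lookup host on 10.3.0.1:53: no such host
# Assumption: only these two failure reasons are of interest here.
ERROR_RE = re.compile(
    r"Get (?P<url>https://\S+?): dial tcp: lookup (?P<host>\S+) on \S+: "
    r"(?P<reason>server misbehaving|no such host)"
)


def group_failures(mail_text):
    """Return {reason: [hostnames]} for every DNS failure found in the text."""
    failures = defaultdict(list)
    for match in ERROR_RE.finditer(mail_text):
        failures[match.group("reason")].append(match.group("host"))
    return dict(failures)


# The three errors quoted in this comment, used as sample input.
SAMPLE = (
    "Get https://ct.ws.symantec.com/ct/v1/get-sth: dial tcp: lookup "
    "ct.ws.symantec.com on 10.3.0.1:53: server misbehaving\n"
    "Get https://ctlog-gen2.api.venafi.com/ct/v1/get-sth: dial tcp: lookup "
    "ctlog-gen2.api.venafi.com on 10.3.0.1:53: server misbehaving\n"
    "Get https://ct2.digicert-ct.com/log/ct/v1/get-sth: dial tcp: lookup "
    "ct2.digicert-ct.com on 10.3.0.1:53: no such host\n"
)

if __name__ == "__main__":
    for reason, hosts in group_failures(SAMPLE).items():
        print(f"{reason}: {', '.join(hosts)}")
```

Hosts under "no such host" are the more likely candidates for removal from the list. "server misbehaving" may also point at a resolver-side problem, so those are worth rechecking before dropping them.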