In January 2017 we started using certspotter (T155807: Monitor Certificate Transparency (CT) logs) to monitor CT logs for certificates being issued for our domains.
At some point, CT servers started failing a lot and this generated cronspam (T162327: certspotter on einsteinium has issues talking to external, T159137: certspotter: Error retrieving STH from log).
Eventually the cron was disabled in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/428367/
Unfortunately the list of CT servers was hardcoded. Newer certspotter versions have apparently fixed this, so we should update to a new version that doesn't have the issue and re-enable.
(opened per discussion in #wikimedia-traffic earlier)
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T204994 Integrate certspotter with certcentral to avoid certspotter notifying us on legitimate certs generated by our certcentral boxes
Open | | None | T204993 Update certspotter
Resolved | | ssingh | T303593 increase of network errors on alert1001 after certspotter has been enabled
Event Timeline
Adding the Debian maintainer :-) This seems fixed in 0.9-1 so updating stretch-backports to 0.9 could fix this.
Actually, it looks like it wasn't in stretch, so stretch-backports has the highest priority anyway. So the host just needs the package upgraded..?
einsteinium and tegmen still run jessie and I didn't build a version for jessie-wikimedia. I believe they're being migrated to stretch as we speak, so maybe we should just wait for that.
T202782: upgrade icinga server to stretch and replace einsteinium covers einsteinium; not sure about tegmen.
The Icinga servers in production are now running 0.9-1~bpo9+1, but the cron job still needs to be reinstated.
Change 475453 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] Revert "certspotter: temporarily disable cron job"
What is the status of this nowadays? I ran across it in a different matter while looking for absented crons and found the TODO and link in code that points over here and to the pending patch above. It's been some time since November 2018. Can we just re-enable it or is there more to it?
Change 475453 had a related patch set uploaded (by Dzahn; owner: Alex Monk):
[operations/puppet@production] Revert "certspotter: temporarily disable cron job"
Change 475453 merged by Dzahn:
[operations/puppet@production] Revert "certspotter: temporarily disable cron job"
on icinga1001, alert1001: crons reactivated:
Notice: /Stage[main]/Certspotter/Cron[certspotter]/ensure: created
What I know here is:
- meanwhile icinga servers are on buster
- reviewers approved reverting the "disable cron job"
- I merged it, crons are enabled again and announced it on IRC
- on icinga1001 there is certspotter 0.9-1~bpo9+1
- this ticket is from 2018
So based on that I am now assuming we can call it resolved.
After seeing quite a bit of cronspam from this, I reverted the change again and am reopening this.
It seems the list of servers needs to be updated: some servers are "misbehaving" and others no longer exist. For example:
Get https://ct.ws.symantec.com/ct/v1/get-sth: dial tcp: lookup ct.ws.symantec.com on 10.3.0.1:53: server misbehaving
Get https://ctlog-gen2.api.venafi.com/ct/v1/get-sth: dial tcp: lookup ctlog-gen2.api.venafi.com on 10.3.0.1:53: server misbehaving
Get https://ct2.digicert-ct.com/log/ct/v1/get-sth: dial tcp: lookup ct2.digicert-ct.com on 10.3.0.1:53: no such host
See the details in today's root mail from icinga1001 (Nov 18th 2020).
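To get a quick overview of which log hosts are affected, the cronspam could be summarized with something like the sketch below. It is only an illustration, not something that runs on the Icinga hosts: the function name `group_failures` and the regular expression are my own, and it assumes the errors keep the exact "Get <url>: dial tcp: lookup <host> on <resolver>: <reason>" shape quoted above.

```python
import re
from collections import defaultdict

# Assumed error shape, taken from the three lines quoted in the root mail:
#   Get <url>: dial tcp: lookup <host> on <resolver>: <reason>
ERROR_RE = re.compile(
    r"Get (?P<url>https://\S+): dial tcp: lookup (?P<host>\S+) "
    r"on (?P<resolver>\S+): (?P<reason>server misbehaving|no such host)"
)


def group_failures(text):
    """Return {reason: [host, ...]} for every DNS failure found in text."""
    failures = defaultdict(list)
    for match in ERROR_RE.finditer(text):
        failures[match.group("reason")].append(match.group("host"))
    return dict(failures)


# The three errors quoted in this task, as they appear in the mail.
SAMPLE = (
    "Get https://ct.ws.symantec.com/ct/v1/get-sth: dial tcp: lookup "
    "ct.ws.symantec.com on 10.3.0.1:53: server misbehaving\n"
    "Get https://ctlog-gen2.api.venafi.com/ct/v1/get-sth: dial tcp: lookup "
    "ctlog-gen2.api.venafi.com on 10.3.0.1:53: server misbehaving\n"
    "Get https://ct2.digicert-ct.com/log/ct/v1/get-sth: dial tcp: lookup "
    "ct2.digicert-ct.com on 10.3.0.1:53: no such host\n"
)

if __name__ == "__main__":
    for reason, hosts in group_failures(SAMPLE).items():
        print(f"{reason}: {', '.join(hosts)}")
```

Hosts under "no such host" are likely gone for good and are candidates for removal from the list; "server misbehaving" may also be a resolver-side problem, so those would need a second look.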
The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!
Change 768065 had a related patch set uploaded (by Ssingh; author: Ssingh):
[operations/puppet@production] certspotter: update package and replace cron with systemd timer
Mentioned in SAL (#wikimedia-operations) [2022-03-10T15:33:23Z] <sukhe> upload certspotter 0.10-1wm1 to apt.wm.o - T204993
Change 768065 merged by Ssingh:
[operations/puppet@production] certspotter: update package and replace cron with systemd timer
Change 776217 had a related patch set uploaded (by Ssingh; author: Ssingh):
[operations/puppet@production] certspotter: switch to a local CT logs list
Change 776217 merged by Ssingh:
[operations/puppet@production] certspotter: switch to a local CT logs list
0.15 has been uploaded to Debian and certspotter is now a proper daemon:
https://tracker.debian.org/news/1419591/accepted-certspotter-0150-1-source-into-unstable/