Update certspotter
Open, Medium, Public

Description

In January 2017 we started using certspotter (T155807: Monitor Certificate Transparency (CT) logs) to monitor CT logs for things being issued for our domains.
At some point, CT servers started failing a lot and this generated cronspam (T162327: certspotter on einsteinium has issues talking to external, T159137: certspotter: Error retrieving STH from log).
Eventually the cron was disabled in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/428367/
Unfortunately the list of CT servers was hardcoded. Newer certspotter versions have apparently fixed this, so we should update to a new version that doesn't have the issue and re-enable.
(opened per discussion in #wikimedia-traffic earlier)

Event Timeline

Krenair created this task.Sep 20 2018, 5:58 PM
Restricted Application added a project: SRE.Sep 20 2018, 5:58 PM
Restricted Application added a subscriber: Aklapper.

Adding the Debian maintainer :-) This seems fixed in 0.9-1 so updating stretch-backports to 0.9 could fix this.

MoritzMuehlenhoff triaged this task as Medium priority.Sep 24 2018, 10:10 AM
ema moved this task from Triage to TLS on the Traffic board.Oct 1 2018, 9:13 AM

> Adding the Debian maintainer :-) This seems fixed in 0.9-1 so updating stretch-backports to 0.9 could fix this.

This is now done :)

So now we just pin the certspotter package to release a=stretch-backports?

Krenair added a comment.EditedOct 6 2018, 5:47 PM

Actually it looks like it wasn't in stretch, so stretch-backports has highest priority anyway. So the host just needs the package upgraded..?

einsteinium and tegmen still run jessie and I didn't build a version for jessie-wikimedia. I believe they're being migrated to stretch as we speak, so maybe we should just wait for that.

@faidon: Is this now done?

The Icinga servers in production are now running 0.9-1~bpo9+1, but the Cron job still needs to be re-instated.

Change 475453 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] Revert "certspotter: temporarily disable cron job"

https://gerrit.wikimedia.org/r/475453

Dzahn added a comment.Oct 23 2020, 9:47 PM

What is the status of this nowadays? I ran across it in a different matter while looking for absented crons and found the TODO and link in code that points over here and to the pending patch above. It's been some time since November 2018. Can we just re-enable it or is there more to it?

Change 475453 had a related patch set uploaded (by Dzahn; owner: Alex Monk):
[operations/puppet@production] Revert "certspotter: temporarily disable cron job"

https://gerrit.wikimedia.org/r/475453

Change 475453 merged by Dzahn:
[operations/puppet@production] Revert "certspotter: temporarily disable cron job"

https://gerrit.wikimedia.org/r/475453

Dzahn added a comment.Nov 18 2020, 6:59 PM

on icinga1001, alert1001: crons reactivated:

Notice: /Stage[main]/Certspotter/Cron[certspotter]/ensure: created
Dzahn closed this task as Resolved.Nov 18 2020, 7:32 PM
Dzahn claimed this task.

What I know here is:

  • meanwhile icinga servers are on buster
  • reviewers approved reverting the "disable cron job"
  • I merged it, crons are enabled again and announced it on IRC
  • on icinga1001 there is certspotter 0.9-1~bpo9+1
  • this ticket is from 2018

So based on that I am now assuming we can call it resolved.

Dzahn reopened this task as Open.Nov 19 2020, 12:00 AM

After seeing quite a bit of cronspam from this, I reverted the change again and am reopening this.

The list of servers needs to be updated, it seems. There are "misbehaving" and nonexistent servers, e.g.:

Get https://ct.ws.symantec.com/ct/v1/get-sth: dial tcp: lookup ct.ws.symantec.com on 10.3.0.1:53: server misbehaving

Get https://ctlog-gen2.api.venafi.com/ct/v1/get-sth: dial tcp: lookup ctlog-gen2.api.venafi.com on 10.3.0.1:53: server misbehaving

Get https://ct2.digicert-ct.com/log/ct/v1/get-sth: dial tcp: lookup ct2.digicert-ct.com on 10.3.0.1:53: no such host

see details in root mail from icinga1001 of today, Nov 18th 2020.
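
To work out which log servers would have to be dropped from the list, the lookup errors in the cron mail could be grouped by failure reason. Below is a minimal sketch that is not part of certspotter or our puppet repo. The function name `group_lookup_failures` and the regular expression are hypothetical, and the pattern assumes the error lines look exactly like the three quoted above.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the three errors quoted above:
#   Get <url>: dial tcp: lookup <host> on <resolver>: <reason>
LOOKUP_ERROR = re.compile(
    r"Get (?P<url>https://\S+): dial tcp: "
    r"lookup (?P<host>\S+) on (?P<resolver>\S+): (?P<reason>.+)$"
)


def group_lookup_failures(lines):
    """Group failing CT log hostnames by DNS failure reason.

    Lines that do not match the assumed shape are ignored.
    Returns a dict mapping reason -> sorted list of hostnames.
    """
    failures = defaultdict(set)
    for line in lines:
        match = LOOKUP_ERROR.search(line.strip())
        if match:
            failures[match.group("reason")].add(match.group("host"))
    return {reason: sorted(hosts) for reason, hosts in failures.items()}


# The three error lines from the root mail quoted in this comment.
SAMPLE = [
    "Get https://ct.ws.symantec.com/ct/v1/get-sth: dial tcp: "
    "lookup ct.ws.symantec.com on 10.3.0.1:53: server misbehaving",
    "Get https://ctlog-gen2.api.venafi.com/ct/v1/get-sth: dial tcp: "
    "lookup ctlog-gen2.api.venafi.com on 10.3.0.1:53: server misbehaving",
    " Get https://ct2.digicert-ct.com/log/ct/v1/get-sth: dial tcp: "
    "lookup ct2.digicert-ct.com on 10.3.0.1:53: no such host",
]

if __name__ == "__main__":
    for reason, hosts in group_lookup_failures(SAMPLE).items():
        print(f"{reason}: {', '.join(hosts)}")
```

Hosts reported as "no such host" are the likeliest candidates for removal. "server misbehaving" only says the resolver returned an error, so those hosts would need a second look before being dropped.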

Dzahn removed Dzahn as the assignee of this task.Nov 19 2020, 12:00 AM