improve cron spam visibility
Open, Low, Public

Description

Right now cron spam mostly goes unread; however, there might be real issues lurking in it.

I think one of the problems is that there is no aggregation/deduplication, so the same issue generates many emails and is thus ignored. One way to tackle this would be to provide aggregation/deduplication so that the real issue is easy to spot (and possibly fix). For example, experimenting with https://github.com/KMNR/raven-cron would be a good start. In short, cron wouldn't report output via mail but rather via Sentry, which would then collate based on e.g. the command line.
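To illustrate what collating on the command line could look like, here is a minimal sketch. It assumes the mails carry the usual cron subject of the form `Cron <user@host> command`; the `fingerprint` and `aggregate` helpers and the sample hosts and commands are hypothetical, and this is not how raven-cron or Sentry group events internally.

```python
import re
from collections import Counter

# Assumption: cron mails use a subject like "Cron <user@host> command".
CRON_SUBJECT = re.compile(
    r"^Cron <(?P<user>[^@>]+)@(?P<host>[^>]+)>\s+(?P<command>.*)$"
)


def fingerprint(subject):
    """Return the command part of a cron subject, or the subject itself if it does not match."""
    match = CRON_SUBJECT.match(subject.strip())
    return match.group("command") if match else subject.strip()


def aggregate(subjects):
    """Count mails per command line, ignoring which user and host sent them."""
    return Counter(fingerprint(s) for s in subjects)


# Hypothetical sample subjects, for illustration only.
subjects = [
    "Cron <root@host1001> /usr/local/bin/example-job --daily",
    "Cron <root@host1002> /usr/local/bin/example-job --daily",
    "Cron <www-data@host1003> /usr/local/bin/other-job",
]
counts = aggregate(subjects)
for command, count in counts.most_common():
    print(count, command)
```

Grouping on the command alone makes the same failure on many hosts show up as one entry with a count, rather than as one mail per host.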

Event Timeline

fgiunchedi raised the priority of this task from to Needs Triage.
fgiunchedi updated the task description.
fgiunchedi added a project: ops-core.
fgiunchedi changed Security from none to None.
fgiunchedi subscribed.

Also note that to test this easily we could use a dummy mailbox that receives cronspam and turns it into Sentry messages; this way existing mails wouldn't be touched.

@Volans This seems to be what you mentioned in the last monitoring meeting when you suggested an Icinga alert for cronspam.

Volans added a project: observability.
Volans added a subscriber: faidon.

@Dzahn thanks for pointing this out, I've merged the other task I had opened into this one as a duplicate.

An option we discussed recently was to ingest mail generated by the servers into Logstash by either pulling events from a mailbox or piping off events at the mail servers. Once in ES, queries could be run and aggregated emails generated as a daily report and/or alerts generated via log alerting.
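As a rough sketch of the mailbox-pulling variant, the snippet below turns one raw mail into a JSON event that a log pipeline could ingest. The field names, the `mail_to_event` helper and the sample message are assumptions for illustration; the actual Logstash input and schema would need to be decided separately.

```python
import json
from email import message_from_string, policy
from email.utils import parsedate_to_datetime


def mail_to_event(raw):
    """Convert one raw RFC 5322 message into a flat dict suitable for JSON logging."""
    msg = message_from_string(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    sent = parsedate_to_datetime(msg["Date"])
    return {
        "timestamp": sent.isoformat(),
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "message": body.get_content().strip() if body else "",
    }


# Hypothetical sample mail, for illustration only.
RAW = """\
From: root@host1001.example.org
To: root@example.org
Subject: Cron <root@host1001> /usr/local/bin/example-job
Date: Mon, 01 Feb 2021 06:25:01 +0000

example-job: something failed
"""

event = mail_to_event(RAW)
print(json.dumps(event))
```

Emitting one JSON object per mail keeps the subject and body queryable, so the daily report or alert can be built from ordinary aggregation queries.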

@herron has asked OIT if a bot inbox is possible.

Ingesting events from a mailbox will also help T230835. If the mx pipe option is chosen, it wouldn't be too difficult to adapt to that paradigm.

Since this ticket was written, most cron jobs have been converted to systemd timers. Maybe that made all of this obsolete? T273673

Cron spam is more of a generic term here; the fact that the spam now comes from jobs spawned by systemd timers doesn't really change the problem :-)

But the difference is that now it's not sent to root@ but to the actual teams. I would hope that means the mails no longer get ignored, which is the original problem statement in this ticket. That being said, I think root mail should also be read by someone, and that is a topic actually planned for the SRE summit.