We get emails for some failures, and Kibana catches others. There is an in-between category of MediaWiki failures that are only logged to disk, and we should keep track of those too. A cron job that runs every few hours and checks the dump run logs for recent exception information would be useful. It would provide more information than the failure emails and could give advance notice of regressions.
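As a rough illustration of the idea, a checker along these lines could be run from cron. This is only a sketch and not the script from the patches listed below: the `*.log` file naming, the `EXCEPTION_RE` pattern, the four-hour window, and the function names `find_recent_exceptions` and `format_report` are all assumptions made for the example.

```python
import re
import time
from pathlib import Path

# Hypothetical pattern: the real dump log format is not described in this task.
EXCEPTION_RE = re.compile(r"exception|traceback|fatal error", re.IGNORECASE)


def find_recent_exceptions(log_dir, max_age_hours=4, now=None):
    """Return {log path: [matching lines]} for logs modified within the window."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    found = {}
    # Assumes logs end in ".log" somewhere under log_dir (an assumption).
    for path in sorted(Path(log_dir).rglob("*.log")):
        if path.stat().st_mtime < cutoff:
            continue  # not modified since the previous check window
        with path.open(errors="replace") as handle:
            hits = [line.rstrip("\n") for line in handle
                    if EXCEPTION_RE.search(line)]
        if hits:
            found[str(path)] = hits
    return found


def format_report(found):
    """Build a plain-text report; an empty string means nothing to report."""
    chunks = []
    for path, lines in found.items():
        chunks.append(path + ":\n" + "\n".join("  " + line for line in lines))
    return "\n\n".join(chunks)
```

If the report is printed only when it is non-empty, cron's usual mail-on-output behaviour would deliver it without any extra mail handling in the script.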
Event Timeline
Change 528995 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] look at dumps logs every so often for exceptions and report them
Change 528995 merged by ArielGlenn:
[operations/puppet@production] look at dumps logs every so often for exceptions and report them
Change 529356 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] dump exception checker uses python rather than bash
Change 529356 merged by ArielGlenn:
[operations/puppet@production] dump exception checker uses python rather than bash