Setup alerting for l10nupdate
Closed, Declined, Public


Automated deploys should have a means of alerting humans when something is wrong. l10nupdate currently has none, other than a "failed" message in its SAL (Server Admin Log) entry. This has been a problem in the past: l10nupdate was not completing successfully for more than a week, yet no one noticed or reported the issue until @greg saw a failure message in his Twitter feed (where he subscribes to @wikimediatech).

An Icinga (or otherwise) alert based on the exit status of the l10nupdate run is needed.
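
As a rough illustration of what such a check could look like, here is a minimal sketch in Python. It assumes the exit status of the last l10nupdate run is recorded somewhere a check script can read it; that recording mechanism and the function name `check_l10nupdate` are hypothetical and not part of the existing l10nupdate tooling. The only fixed convention used is the standard Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_l10nupdate(last_exit_status):
    """Map the exit status of the last l10nupdate run to a plugin result.

    `last_exit_status` is an int, or None if no run has been recorded.
    How the status gets recorded is an assumption left to the caller.
    Returns a (plugin_exit_code, message) tuple.
    """
    if last_exit_status is None:
        # No data is reported as UNKNOWN rather than silently passing.
        return UNKNOWN, "UNKNOWN: no l10nupdate run recorded"
    if last_exit_status != 0:
        return CRITICAL, (
            f"CRITICAL: last l10nupdate run exited with status {last_exit_status}"
        )
    return OK, "OK: last l10nupdate run succeeded"
```

A wrapper script would print the message and exit with the returned code, which is how Icinga picks up the state. Treating a missing status as UNKNOWN is a design choice here, so that a run that never happens does not look healthy.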

Event Timeline

greg raised the priority of this task to Needs Triage.
greg updated the task description.
greg added a project: Deployments.
greg added subscribers: Legoktm, thcipriani, demon and 4 others.
thcipriani triaged this task as Medium priority. Aug 5 2015, 4:07 PM
thcipriani moved this task from To Triage to Next: Maintenance on the Deployments board.

What does this task mean? Can someone please expand the task description or close this task?

greg renamed this task from "Setup alerts for l10nupdate" to "Setup alerting for l10nupdate". Mar 7 2017, 6:24 PM
greg updated the task description.
greg added a project: observability.

Thanks for the task description update.

We could, uhh, e-mail wikitech-l on failure? :-) Personally, it kind of feels like logging to the server admin log is (should be?) sufficient. If nobody notices a breakage for a week or longer, it seems the issue is perhaps self-prioritizing.