
MediaWikiCronJobFailed
Open, Needs Triage · Public · PRODUCTION ERROR

Description

Common information

  • alertname: MediaWikiCronJobFailed
  • label_cronjob: echo-mail-batch
  • label_team: notifications-echo
  • prometheus: k8s
  • severity: task
  • site: codfw
  • source: prometheus
  • team: notifications-echo

Firing alerts


Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Production Error". · Sep 23 2025, 2:10 AM
A_smart_kitten subscribed.

Hey Growth-Team, although Notifications (Echo) looks like it's been removed from your Herald rule, are you still willing to triage cron-job failures related to the extension? Asking as you still appear as its listed maintainer, and as there's a limited amount that the majority of volunteers can do in response to these tasks themselves (at least, without asking an SRE to grab the stack trace of the failure).

(rerouting back to Growth-Team inbox, it looks like this task may have been edited rather than a new one being filed, to reflect (I believe) an error occurring with this cron-job again)

Michael added a subscriber: fgiunchedi.

> (rerouting back to Growth-Team inbox, it looks like this task may have been edited rather than a new one being filed, to reflect (I believe) an error occurring with this cron-job again)

Thank you, @A_smart_kitten!

At first glance this looks like an intermittent failure, given that everything seems to be working fine now.

@fgiunchedi Under what circumstances is the bot creating/updating these tasks? I'm not seeing similar tasks for other maintenance script errors. Could we somehow adjust the output of the bot to include more specific information? Like, the exact timestamp of the failure, the wiki against which the failing script was running, a link to the specific error document on logstash, or something like that.

> Could we somehow adjust the output of the bot to include more specific information? Like, the exact timestamp of the failure, the wiki against which the failing script was running, a link to the specific error document on logstash, or something like that.

xref T419229#11692934 & some discussion in T410764

(Also, FWIW, the bot also creates these sorts of tasks for at least some other mw-cron maintenance script failures -- e.g. https://phabricator.wikimedia.org/maniphest/query/yIUXnaCeVRWC/#R)