
MediaWiki periodic job update-flaggedrev-stats failed
Closed, Resolved · Public · PRODUCTION ERROR

Description

Common information

  • alertname: MediaWikiCronJobFailed
  • label_cronjob: update-flaggedrev-stats
  • label_team: flaggedrevs
  • prometheus: k8s
  • severity: task
  • site: eqiad
  • source: prometheus
  • team: flaggedrevs

Firing alerts


Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Production Error". · Apr 5 2026, 12:15 AM
Restricted Application added a subscriber: Aklapper.

Can we get a bit more information? The task description, as-is, is usable only for people who have access to Logstash and/or Prometheus. (I know it’s hard to write a bot like @phaultfinder so that it delivers useful information while also making sure nothing confidential gets leaked, but then the tasks should be assigned to a group/person that amends the descriptions – and the auto-generated descriptions should make this clear.)

ServiceOps should hopefully be able to fetch the stack trace & error time (I believe) :)
(T417020 & T341555#10760093 onwards might have a little bit more context fwiw)

The nearest (timestamp-wise) log entry I found points to a temporary network problem. I don't know how long it takes @phaultfinder to create a task, so I can't be absolutely sure this is the one.

image.png (880×2 px, 379 KB)

FWICS that logstash entry is dated 2026-04-05 00:08:54, and this task was filed on 2026-04-05 00:15. In my mind (admittedly based solely on what I remember from anecdotal experience in doing things with these automatically-filed tasks), I would therefore feel comfortable assuming that this task was filed about that error.
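For reference, the gap between those two timestamps can be worked out with a minimal sketch like the one below. It assumes both timestamps are in the same timezone and that the task filing time is only known to the minute; the variable names are illustrative and not taken from any tooling.

```python
from datetime import datetime

# Timestamps quoted in the comment above. Assumption: both are in the
# same timezone, and the task time is only known to minute precision.
log_entry = datetime(2026, 4, 5, 0, 8, 54)
task_filed = datetime(2026, 4, 5, 0, 15)

gap = task_filed - log_entry
gap_seconds = int(gap.total_seconds())

print(f"Task filed {gap_seconds} s ({gap}) after the log entry")
```

A gap of roughly six minutes is consistent with the assumption made above, though it does not by itself prove that the alert was triggered by that specific log entry.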

attempting to copy down some of the errors in that screenshot (for future searchability)
dewikiquote Fatal error: Uncaught MediaWiki\Config\ConfigException: Failed to load configuration from etcd: (curl error: 28) Timeout was reached in /srv/mediawiki/php-1.46.0-wmf.22/includes/Config/EtcdConfig.php:207
dewikiquote Warning: EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached in /srv/mediawiki/php-1.46.0-wmf.22/includes/Config/EtcdConfig.php on line 180
dewikiquote Warning: dns_get_record(): A temporary server error occurred. in /srv/mediawiki/php-1.46.0-wmf.22/includes/libs/DnsSrvDiscoverer.php on line 55
dewikiquote Warning: EtcdConfig failed to fetch data: (curl error: 6) Couldn't resolve host name in /srv/mediawiki/php-1.46.0-wmf.22/includes/Config/EtcdConfig.php on line 180
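To make entries like these easier to group when comparing against other cron-job failure tasks, here is a rough sketch that splits each copied line into wiki, severity, message, file and line number, and pulls out the curl error code where there is one. The regular expression and helper name are my own assumptions based only on the four lines above; they are not part of MediaWiki or Logstash and may not fit other log formats.

```python
import re

# The four lines as transcribed from the screenshot (raw strings keep
# the backslashes in the PHP namespace intact).
LINES = [
    r"dewikiquote Fatal error: Uncaught MediaWiki\Config\ConfigException: Failed to load configuration from etcd: (curl error: 28) Timeout was reached in /srv/mediawiki/php-1.46.0-wmf.22/includes/Config/EtcdConfig.php:207",
    r"dewikiquote Warning: EtcdConfig failed to fetch data: (curl error: 28) Timeout was reached in /srv/mediawiki/php-1.46.0-wmf.22/includes/Config/EtcdConfig.php on line 180",
    r"dewikiquote Warning: dns_get_record(): A temporary server error occurred. in /srv/mediawiki/php-1.46.0-wmf.22/includes/libs/DnsSrvDiscoverer.php on line 55",
    r"dewikiquote Warning: EtcdConfig failed to fetch data: (curl error: 6) Couldn't resolve host name in /srv/mediawiki/php-1.46.0-wmf.22/includes/Config/EtcdConfig.php on line 180",
]

# Hypothetical pattern: "<wiki> <level>: <message> in <file>" followed by
# either ":<line>" (fatal errors) or " on line <line>" (warnings).
ENTRY = re.compile(
    r"^(?P<wiki>\S+) (?P<level>Fatal error|Warning): (?P<message>.*?)"
    r" in (?P<file>/\S+?)(?::(?P<line_a>\d+)| on line (?P<line_b>\d+))$"
)
CURL_CODE = re.compile(r"\(curl error: (\d+)\)")


def parse_entry(text):
    """Return a dict describing one log line, or None if it does not match."""
    match = ENTRY.match(text)
    if match is None:
        return None
    curl = CURL_CODE.search(match["message"])
    return {
        "wiki": match["wiki"],
        "level": match["level"],
        "message": match["message"],
        "file": match["file"],
        "line": int(match["line_a"] or match["line_b"]),
        # libcurl codes seen here: 28 = operation timed out,
        # 6 = could not resolve host; None when no curl error is present.
        "curl_code": int(curl.group(1)) if curl else None,
    }


parsed = [parse_entry(line) for line in LINES]
curl_codes = [entry["curl_code"] for entry in parsed]

for entry in parsed:
    print(entry["level"], entry["curl_code"], entry["file"].rsplit("/", 1)[-1], entry["line"])
```

Grouping by `curl_code` shows two timeouts (28) and one name-resolution failure (6), alongside the `dns_get_record()` warning, which all point at DNS or network trouble rather than at the job itself.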

IIUC this is tracked as T346971: Uncaught ConfigException: Failed to load configuration from etcd. This is something that seems to occur for mw-cron jobs occasionally: https://phabricator.wikimedia.org/maniphest/query/JPdsxxgWXcKR/#R includes a (possibly-incomplete) list of other cron-job failure tasks containing a similar error.


If I'm reading this line in the mw-cron definition correctly, it seems like this job runs every day. So I guess all that might be left to do here is ServiceOps deleting the failed job (to prevent another task for it from being filed), and then resolving this task as having been a transient error.

jijiki claimed this task.

failed jobs have been deleted, closing this too for T422486