From the Logstash mw-cron dashboard, we can see the blameStartupRegistry.php maintenance script invocation has been failing since Tue Nov 4, as follows:
Script '/srv/mediawiki/php-1.46.0-wmf.1/extensions/WikimediaMaintenance/blameStartupRegistry.php' not found (tried path '/srv/mediawiki/php-1.46.0-wmf.1/extensions/WikimediaMaintenance/blameStartupRegistry.php' and class '/srv/mediawiki/php-1\46\0-wmf\1/extensions/WikimediaMaintenance/blameStartupRegistry\php').
Running it manually via mwscript on the deployment serve shows that it does return an error code:
[22:02 UTC] krinkle at deploy1003.eqiad.wmnet $ mwscript extensions/WikimediaMaintenance/blameStartupRegistry.php testwiki [22:02 UTC] krinkle at deploy1003.eqiad.wmnet (exit=1) $ echo $? 1
We normally get tasks like T404730: MediaWiki periodic job startupregistrystats failed when a job like this fails.
For those curious about this specific instance (not the subject of this task):
- Caused by these patches on Oct 29:
- Went out on Tue 4 Nov with the train: https://sal.toolforge.org/production?p=0&q=%221.46.0-wmf.1%22&d=
- Fixed on Sat 8 Nov by Taavi shortly after I reported this breakage on IRC (mediawiki-core): https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMaintenance/+/1203232
Other information
- The data in Grafana showed no gap or alert either, due to an unresolved Graphite>Prometheus regression: T394956: Restore support for gauge metrics from MediaWiki PHP (post-Prometheus migration)