Lately we have been having quite a few cronjob failures, with @phaultfinder opening various tasks.
On March 25th @ 15:00 we switched all jobs to eqiad, as part of T413974: Northward Datacenter Switchover (March 2026; codfw to eqiad)
On April 2nd @ 13:17 we pooled codfw back for reads
I updated the mw-cron (MediaWiki Periodic Jobs on k8s) dashboard, in a effort to extract more information. Due to multiline logs, the MediaWiki Maintenance Jobs - k8s search returns a lot of results, making filtering a bit difficult.
Culprit #1
Related to T422455: Massive increase in "EtcdConfig failed to fetch data: Timeout was reached" warnings and errors since March 17th.
{
"query": {
"regexp": {
"log.keyword": ".*curl error: 28.*Timeout was reached.*"
}
}
}Culprit #2
Additionally, there are some jobs which seem to fail due to inability to talk to the DB
{
"query": {
"regexp": {
"log.keyword": ".*Error.+2006.+MySQL server has gone away.*"
}
}
}






