PHP Notice: Undefined offset in rdbms/loadbalancer/LoadBalancer.php
Closed, Resolved | Public | PRODUCTION ERROR

Description

Error
labels.normalized_message
[{reqId}] {exception_url}   PHP Notice: Undefined offset: 10
error.stack_trace
from /srv/mediawiki/php-1.42.0-wmf.25/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1849)
#0 /srv/mediawiki/php-1.42.0-wmf.25/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1849): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.42.0-wmf.25/includes/api/ApiMain.php(1571): Wikimedia\Rdbms\LoadBalancer->getMaxLag()
#2 /srv/mediawiki/php-1.42.0-wmf.25/includes/api/ApiMain.php(1608): ApiMain->getMaxLag()
#3 /srv/mediawiki/php-1.42.0-wmf.25/includes/api/ApiMain.php(1934): ApiMain->checkMaxLag(Wikibase\Repo\Api\GetEntities, array)
#4 /srv/mediawiki/php-1.42.0-wmf.25/includes/api/ApiMain.php(922): ApiMain->executeAction()
#5 /srv/mediawiki/php-1.42.0-wmf.25/includes/api/ApiMain.php(893): ApiMain->executeActionWithErrorHandling()
#6 /srv/mediawiki/php-1.42.0-wmf.25/includes/api/ApiEntryPoint.php(158): ApiMain->execute()
#7 /srv/mediawiki/php-1.42.0-wmf.25/includes/MediaWikiEntryPoint.php(199): MediaWiki\Api\ApiEntryPoint->execute()
#8 /srv/mediawiki/php-1.42.0-wmf.25/api.php(44): MediaWiki\MediaWikiEntryPoint->run()
#9 /srv/mediawiki/w/api.php(3): require(string)
#10 {main}
Notes

The error started around 05:00 UTC on 2024-04-03 (Wednesday) with wmf.25 at group0 and has affected both wmf.24 and wmf.25. Change https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1004792 modified the function getLagTimes, which returns the indices that are currently causing the out-of-range error. However, the change was merged on 2024-03-22 and had already gone out with the wmf.24 train. All of this makes me think some other factor started triggering the error on Wednesday.

The issue mainly affects Wikidata and happens in bursts, which points at some kind of data dump/snapshot.

Details

Request URL
https://www.wikidata.org/w/api.php?action=wbgetentities&format=*&maxlag=*&sites=*&titles=*

Event Timeline

The exception has actually been happening with different offset indices for at least a week, e.g. on 2024-03-28 (Thursday of last week) during the wmf.24 train. This aligns better with the deployment of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1004792

jnuche renamed this task from "PHP Notice: Undefined offset: 10 in rdbms/loadbalancer/LoadBalancer.php" to "PHP Notice: Undefined offset in rdbms/loadbalancer/LoadBalancer.php". Apr 4 2024, 12:29 PM

It's a combination of that patch, the cache being stale, and replicas being depooled for maintenance. It should be a rather small number from what I'm seeing (3K in a day). I can take care of it once I'm back, or someone can make an easy patch.
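
To make the interaction described above concrete, here is a minimal Python model of how a cached lag map can outlive the server list it was built from. It is only a sketch, not MediaWiki code: the function name, the dictionary layout and the sample values are all hypothetical, and it assumes that a depooled replica's index simply disappears from the live load table while the cached lag times still mention it. Python's KeyError stands in for PHP's "Undefined offset" notice.

```python
def max_lag_unguarded(loads, cached_lag_times):
    """Return (server_index, lag) for the most lagged server with nonzero load.

    Hypothetical model: `loads` is the live per-server load table and
    `cached_lag_times` is a possibly stale map of server index to lag.
    """
    worst_index, worst_lag = None, -1
    for index, lag in cached_lag_times.items():
        # This lookup assumes every cached index still exists in `loads`.
        if loads[index] > 0 and lag > worst_lag:
            worst_index, worst_lag = index, lag
    return worst_index, worst_lag


# Hypothetical data: the replica at index 10 was depooled, so it is gone from
# the live load table but still present in the cached lag times.
current_loads = {0: 0, 1: 100, 2: 100}
stale_lag_times = {0: 0, 1: 2, 2: 1, 10: 5}

try:
    max_lag_unguarded(current_loads, stale_lag_times)
    failed_index = None
except KeyError as err:
    # The missing key is the stale server index, here 10.
    failed_index = err.args[0]
```

With a cache that matches the live table the same function works fine, which would be consistent with the error appearing in bursts rather than continuously.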

I'm on train duty this week so I'm pinging on this ticket. LoadBalancer:1849 PHP Notice: Undefined offset: 8 is the top error in logspam-watch output.

Change #1020924 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdbms: Protect against stale cache in LB::getMaxLag()

https://gerrit.wikimedia.org/r/1020924
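
For readers who have not opened the change, the general idea of protecting a lookup against a stale cache can be sketched as follows. This is a hedged Python illustration, not the content of change 1020924: the function name, data layout and sample values are hypothetical, and the actual patch may guard the lookup differently.

```python
def max_lag_guarded(loads, cached_lag_times):
    """Return (server_index, lag) for the most lagged server with nonzero load.

    Hypothetical model: cached indices that are missing from the live load
    table are skipped instead of being looked up.
    """
    worst_index, worst_lag = None, -1
    for index, lag in cached_lag_times.items():
        if index not in loads:
            # Stale cache entry for a server that is no longer configured.
            continue
        if loads[index] > 0 and lag > worst_lag:
            worst_index, worst_lag = index, lag
    return worst_index, worst_lag


# Hypothetical data: index 10 exists only in the stale cache.
current_loads = {0: 0, 1: 100, 2: 100}
stale_lag_times = {0: 0, 1: 2, 2: 1, 10: 5}

result = max_lag_guarded(current_loads, stale_lag_times)
```

Skipping unknown indices trades a little accuracy for robustness: the result may briefly ignore a server, but the request no longer emits a notice while the cache catches up.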

Ladsgroup triaged this task as Medium priority.
Ladsgroup added a project: DBA.
Ladsgroup moved this task from Triage to In progress on the DBA board.

Change #1020924 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Protect against stale cache in LB::getMaxLag()

https://gerrit.wikimedia.org/r/1020924

Change #1025178 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.43.0-wmf.2] rdbms: Protect against stale cache in LB::getMaxLag()

https://gerrit.wikimedia.org/r/1025178

Change #1025178 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.2] rdbms: Protect against stale cache in LB::getMaxLag()

https://gerrit.wikimedia.org/r/1025178

Mentioned in SAL (#wikimedia-operations) [2024-04-29T09:18:13Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:1025178|rdbms: Protect against stale cache in LB::getMaxLag() (T361824)]]

Mentioned in SAL (#wikimedia-operations) [2024-04-29T09:20:44Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:1025178|rdbms: Protect against stale cache in LB::getMaxLag() (T361824)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-04-29T09:38:28Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:1025178|rdbms: Protect against stale cache in LB::getMaxLag() (T361824)]] (duration: 20m 15s)

Ladsgroup moved this task from In progress to Done on the DBA board.