Page MenuHomePhabricator

Error: Call to a member function audienceCan() on null
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
labels.normalized_message
[{reqId}] {exception_url}   Error: Call to a member function audienceCan() on null
error.stack_trace
from /srv/mediawiki/php-1.43.0-wmf.13/extensions/CirrusSearch/includes/Api/QueryBuildDocument.php(66)
#0 /srv/mediawiki/php-1.43.0-wmf.13/includes/api/ApiQuery.php(707): CirrusSearch\Api\QueryBuildDocument->execute()
#1 /srv/mediawiki/php-1.43.0-wmf.13/includes/api/ApiMain.php(1953): ApiQuery->execute()
#2 /srv/mediawiki/php-1.43.0-wmf.13/includes/api/ApiMain.php(929): ApiMain->executeAction()
#3 /srv/mediawiki/php-1.43.0-wmf.13/includes/api/ApiMain.php(900): ApiMain->executeActionWithErrorHandling()
#4 /srv/mediawiki/php-1.43.0-wmf.13/includes/api/ApiEntryPoint.php(158): ApiMain->execute()
#5 /srv/mediawiki/php-1.43.0-wmf.13/includes/MediaWikiEntryPoint.php(200): MediaWiki\Api\ApiEntryPoint->execute()
#6 /srv/mediawiki/php-1.43.0-wmf.13/api.php(44): MediaWiki\MediaWikiEntryPoint->run()
#7 /srv/mediawiki/w/api.php(3): require(string)
#8 {main}
Impact
  • Top error in logspam-watch
Notes

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

I saw this during a deployment earlier and looked a tiny bit into it. The code is:

foreach ( $this->getRevisionIDs() as $pageId => $revId ) {
	$rev = $revisionStore->getRevisionById( $revId );
	if ( $rev->audienceCan( $rev::DELETED_TEXT, $rev::FOR_PUBLIC ) ) {

And I think this was just due to replication lag, when this request was reading from a replica that hadn’t seen the new revision yet, and that’s why $rev was null? (The RevisionStore uses a replica by default if IDBAccessObject::READ_LATEST isn’t used, if I understand correctly.)

I saw this during a deployment earlier and looked a tiny bit into it. The code is:

foreach ( $this->getRevisionIDs() as $pageId => $revId ) {
	$rev = $revisionStore->getRevisionById( $revId );
	if ( $rev->audienceCan( $rev::DELETED_TEXT, $rev::FOR_PUBLIC ) ) {

And I think this was just due to replication lag, when this request was reading from a replica that hadn’t seen the new revision yet, and that’s why $rev was null? (The RevisionStore uses a replica by default if IDBAccessObject::READ_LATEST isn’t used, if I understand correctly.)

yes we're suspecting replag issue as well but what we don't quite get yet is that \ApiPageSet::getRevisionIDs() should return revisions that exist, could ApiBase use a different replica than the one obtained by RevisionStore?

Hm, I think ApiBase::getDB() (ApiPageSet’s $mDbSource) uses the api group, and RevisionStore uses the default group? (I’m not sure if we still use DB groups in production, but even if we don’t, that might be enough to not reuse the connection and instead give you a second randomly-selected server from the same pool?)

Just noting that this is causing quite a bit of logspam today. Since it was already an issue with prior versions, I won't block the train on it.

dcausse triaged this task as High priority.

This issue started to rise after we optimized the index latency on wikidata in T365831. As pointed out by Lucas the issue is likely caused by a race between the indexing events and mariadb replication, the code did not account for the fact that ApiBase that should identify missing revision might not use the same db replica than the one used by RevisionStore. The solution is probably to not take for granted what ApiPageSet::initFromRevIDs is doing but double check the output RevisionStore::getRevisionById and mark the revision as missing. We could potentially add an artificial delay in the stream processors to give more chances to mariadb replicas to win the race if proven necessary, but the undesirable effects of adding an extra delay vs retrying a couple queries should be pondered.

Change #1056442 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Check the output of RevisionStore::getRevisionById

https://gerrit.wikimedia.org/r/1056442

Change #1056442 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Check the output of RevisionStore::getRevisionById

https://gerrit.wikimedia.org/r/1056442

Change #1056578 had a related patch set uploaded (by Ebernhardson; author: DCausse):

[mediawiki/extensions/CirrusSearch@wmf/1.43.0-wmf.14] Check the output of RevisionStore::getRevisionById

https://gerrit.wikimedia.org/r/1056578

Change #1056580 had a related patch set uploaded (by Ebernhardson; author: DCausse):

[mediawiki/extensions/CirrusSearch@wmf/1.43.0-wmf.15] Check the output of RevisionStore::getRevisionById

https://gerrit.wikimedia.org/r/1056580

Change #1056578 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@wmf/1.43.0-wmf.14] Check the output of RevisionStore::getRevisionById

https://gerrit.wikimedia.org/r/1056578

Change #1056580 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@wmf/1.43.0-wmf.15] Check the output of RevisionStore::getRevisionById

https://gerrit.wikimedia.org/r/1056580

Mentioned in SAL (#wikimedia-operations) [2024-07-24T21:20:43Z] <ebernhardson@deploy1002> Started scap sync-world: Backport for [[gerrit:1056580|Check the output of RevisionStore::getRevisionById (T370770)]]

Mentioned in SAL (#wikimedia-operations) [2024-07-24T21:26:30Z] <ebernhardson@deploy1002> ebernhardson: Backport for [[gerrit:1056580|Check the output of RevisionStore::getRevisionById (T370770)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-07-24T21:32:50Z] <ebernhardson@deploy1002> Finished scap: Backport for [[gerrit:1056580|Check the output of RevisionStore::getRevisionById (T370770)]] (duration: 12m 07s)

Checking the logs this morning, i don't see any new logs since this was deployed. They had already reduced in volume naturally (perhaps slightly less db lag, or slightly more updater lag) without the change, but they now seem to be completely gone.