Page MenuHomePhabricator

GrowthExperiments\Maintenance\UpdateMenteeData: Cannot execute query from GrowthExperiments\Maintenance\UpdateMenteeData while transaction status is ERROR
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\Rdbms\DBTransactionError: GrowthExperiments\Maintenance\UpdateMenteeData: Commit failed on server(s) db1118: Cannot execute query from GrowthExperiments\Maintenance\UpdateMenteeData while transaction status is ERROR
exception.trace
from /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1850)
#0 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/lbfactory/LBFactory.php(249): Wikimedia\Rdbms\LoadBalancer->commitPrimaryChanges(string, integer)
#1 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/lbfactory/LBFactoryMulti.php(236): Wikimedia\Rdbms\LBFactory::Wikimedia\Rdbms\{closure}(Wikimedia\Rdbms\LoadBalancer, string, array)
#2 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/lbfactory/LBFactory.php(251): Wikimedia\Rdbms\LBFactoryMulti->forEachLB(Closure, array)
#3 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/lbfactory/LBFactory.php(319): Wikimedia\Rdbms\LBFactory->forEachLBCallMethod(string, array)
#4 /srv/mediawiki/php-1.38.0-wmf.7/maintenance/includes/Maintenance.php(1248): Wikimedia\Rdbms\LBFactory->commitPrimaryChanges(string)
#5 /srv/mediawiki/php-1.38.0-wmf.7/maintenance/doMaintenance.php(130): Maintenance->shutdown()
#6 /srv/mediawiki/php-1.38.0-wmf.7/extensions/GrowthExperiments/maintenance/updateMenteeData.php(189): require_once(string)
#7 /srv/mediawiki/multiversion/MWScript.php(116): require_once(string)
#8 {main}
Notes

On wmf.7 there were four errors (the last timestamp - Nov 9, 2021 @ 00:19:18.962).

Event Timeline

Etonkovidova renamed this task from Wikimedia\Rdbms\DBTransactionError: GrowthExperiments\Maintenance\UpdateMenteeData: Commit failed on server(s) db1118: Cannot execute query from GrowthExperiments\Maintenance\UpdateMenteeData while transaction status is ERROR to GrowthExperiments\Maintenance\UpdateMenteeData: Commit failed on server(s) db1118: Cannot execute query from GrowthExperiments\Maintenance\UpdateMenteeData while transaction status is ERROR.Tue, Nov 9, 1:33 AM
Etonkovidova renamed this task from GrowthExperiments\Maintenance\UpdateMenteeData: Commit failed on server(s) db1118: Cannot execute query from GrowthExperiments\Maintenance\UpdateMenteeData while transaction status is ERROR to GrowthExperiments\Maintenance\UpdateMenteeData: Cannot execute query from GrowthExperiments\Maintenance\UpdateMenteeData while transaction status is ERROR.
Urbanecm_WMF triaged this task as High priority.

This definitely shouldn't happen. I'll investigate. Thanks for reporting!

Looking at the detailed job logs (mwmaint1002:/var/log/mediawiki):

Logs of enwikis updateMenteeData
Nov  9 00:15:00 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: -----------------------------------------------------------------
Nov  9 00:15:00 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: enwiki
Nov  9 00:15:00 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: -----------------------------------------------------------------
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: MWException from line 3271 of /srv/mediawiki/php-1.38.0-wmf.7/includes/user/User.php: CAS update failed on user_touched. The version of the user to be saved is older than the current version.
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: #0 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/database/Database.php(4803): User->{closure}(Object(Wikimedia\Rdbms\DatabaseMysqli), 'User::saveSetti...')
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: #1 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->doAtomicSection('User::saveSetti...', Object(Closure))
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: #2 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/database/DBConnRef.php(668): Wikimedia\Rdbms\DBConnRef->__call('doAtomicSection', Array)
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: #3 /srv/mediawiki/php-1.38.0-wmf.7/includes/user/User.php(3283): Wikimedia\Rdbms\DBConnRef->doAtomicSection('User::saveSetti...', Object(Closure))
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: #4 /srv/mediawiki/php-1.38.0-wmf.7/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/MenteeOverviewDataUpdater.php(179): User->saveSettings()
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: #5 /srv/mediawiki/php-1.38.0-wmf.7/extensions/GrowthExperiments/maintenance/updateMenteeData.php(123): GrowthExperiments\MentorDashboard\MenteeOverview\MenteeOverviewDataUpdater->updateDataForMentor(Object(User))
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: #6 /srv/mediawiki/php-1.38.0-wmf.7/maintenance/doMaintenance.php(108): GrowthExperiments\Maintenance\UpdateMenteeData->execute()
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: #7 /srv/mediawiki/php-1.38.0-wmf.7/extensions/GrowthExperiments/maintenance/updateMenteeData.php(189): require_once('/srv/mediawiki/...')
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: #8 /srv/mediawiki/multiversion/MWScript.php(116): require_once('/srv/mediawiki/...')
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: #9 {main}
Nov  9 00:19:18 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[4491]: [249abec38982fd415b376278] [no req]   Wikimedia\Rdbms\DBTransactionError: GrowthExperiments\Maintenance\UpdateMenteeData: Commit failed on server(s) db1118: Cannot execute query from GrowthExperiments\Maintenance\UpdateMenteeData while transaction status is ERROR
[...]

I also see things like:

Oct 16 04:15:00 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[26942]: -----------------------------------------------------------------
Oct 16 04:15:00 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[26942]: enwiki
Oct 16 04:15:00 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[26942]: -----------------------------------------------------------------
Oct 20 06:35:14 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[26942]: Wikimedia\Rdbms\DBQueryError from line 1809 of /srv/mediawiki/php-1.38.0-wmf.4/includes/libs/rdbms/database/Database.php: Error 1053: Server shutdown in progress (db1106)
Oct 20 06:35:14 mwmaint1002 mediawiki_job_growthexperiments-updateMenteeData-s1[26942]: Function: GrowthExperiments\MentorDashboard\MenteeOverview\UncachedMenteeOverviewDataProvider::getEditCountsForUsers

but that's from Oct, and apparently a DB server was shut down during updateMenteeData.php run, which is fine. The CAS validation failure is a real problem though (and as a consequence causes the error you reported; MW refuses to continue with writes, as one of them failed), and it happens quite frequently. That issue is logged as a warning in Logstash: https://logstash.wikimedia.org/goto/3b788816daf227330a7fb1ad0e43575c.

Since 1.38, UserOptionsManager::saveOptions is available (it was internal before that). This method appears to only save preferences it changed (which is what we want), while User::saveSettings saves everything (including preferences).

It appears switching to UserOptionsManager::saveOptions will fix this.

Change 737623 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] MenteeOverviewDataUpdater: Use UserOptionsManager::saveOptions

https://gerrit.wikimedia.org/r/737623

This seems like bad error handling somewhere - when the transaction switches to error state, an exception is thrown, how can we make more queries after that? We should either rollback or abort (probably just abort since application code doesn't own the transaction).

The CAS error is basically an edit conflict for user preferences. Using saveOptions will replace that with silent overwriting of changes. I suppose for this specific use case that's quite fine.

Change 737623 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] MenteeOverviewDataUpdater: Use UserOptionsManager::saveOptions

https://gerrit.wikimedia.org/r/737623

[...]
The CAS error is basically an edit conflict for user preferences. Using saveOptions will replace that with silent overwriting of changes. I suppose for this specific use case that's quite fine.

AFAICS it should not overwrite everything; only the user preference we explicitly changed (which is why I feel it's fine).

Change 738626 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.7] MenteeOverviewDataUpdater: Use UserOptionsManager::saveOptions

https://gerrit.wikimedia.org/r/738626

Change 738626 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.38.0-wmf.7] MenteeOverviewDataUpdater: Use UserOptionsManager::saveOptions

https://gerrit.wikimedia.org/r/738626

Mentioned in SAL (#wikimedia-operations) [2021-11-15T10:35:27Z] <urbanecm@deploy1002> Synchronized php-1.38.0-wmf.7/extensions/GrowthExperiments/: 05d6550218f21f89171fcb8c73230e0855cf41a4: MenteeOverviewDataUpdater: Use UserOptionsManager::saveOptions (T295339) (duration: 00m 56s)

I checked the job's job, and no similar errors happened since recently.