Page MenuHomePhabricator

Getting a full list of mentors timeouts for long-term mentors (Wikimedia\RequestTimeout\RequestTimeoutException: The maximum execution time of {limit} seconds was exceeded)
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   Wikimedia\RequestTimeout\RequestTimeoutException: The maximum execution time of {limit} seconds was exceeded
FrameLocationCall
from/srv/mediawiki/php-1.43.0-wmf.21/vendor/wikimedia/request-timeout/src/Detail/ExcimerTimerWrapper.php(59)
#0/srv/mediawiki/php-1.43.0-wmf.21/vendor/wikimedia/request-timeout/src/Detail/ExcimerRequestTimeout.php(34)Wikimedia\RequestTimeout\Detail\ExcimerTimerWrapper->exitCriticalSection(int)
#1/srv/mediawiki/php-1.43.0-wmf.21/vendor/wikimedia/request-timeout/src/CriticalSectionProvider.php(134)Wikimedia\RequestTimeout\Detail\ExcimerRequestTimeout->exitCriticalSection(int)
#2/srv/mediawiki/php-1.43.0-wmf.21/vendor/wikimedia/request-timeout/src/CriticalSectionScope.php(57)Wikimedia\RequestTimeout\CriticalSectionProvider->Wikimedia\RequestTimeout\{closure}(int)
#3/srv/mediawiki/php-1.43.0-wmf.21/includes/libs/rdbms/database/Database.php(3117)Wikimedia\RequestTimeout\CriticalSectionScope->exit()
#4/srv/mediawiki/php-1.43.0-wmf.21/includes/libs/rdbms/database/Database.php(719)Wikimedia\Rdbms\Database->completeCriticalSection(string, Wikimedia\RequestTimeout\CriticalSectionScope)
#5/srv/mediawiki/php-1.43.0-wmf.21/includes/libs/rdbms/database/Database.php(634)Wikimedia\Rdbms\Database->executeQuery(Wikimedia\Rdbms\Query, string, int)
#6/srv/mediawiki/php-1.43.0-wmf.21/includes/libs/rdbms/database/Database.php(1339)Wikimedia\Rdbms\Database->query(Wikimedia\Rdbms\Query, string)
#7/srv/mediawiki/php-1.43.0-wmf.21/includes/libs/rdbms/database/DBConnRef.php(127)Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#8/srv/mediawiki/php-1.43.0-wmf.21/includes/libs/rdbms/database/DBConnRef.php(351)Wikimedia\Rdbms\DBConnRef->__call(string, array)
#9/srv/mediawiki/php-1.43.0-wmf.21/includes/libs/rdbms/querybuilder/SelectQueryBuilder.php(746)Wikimedia\Rdbms\DBConnRef->select(array, array, array, string, array, array)
#10/srv/mediawiki/php-1.43.0-wmf.21/includes/user/UserSelectQueryBuilder.php(269)Wikimedia\Rdbms\SelectQueryBuilder->fetchResultSet()
#11[internal function]MediaWiki\User\UserSelectQueryBuilder->fetchUserIdentities()
#12/srv/mediawiki/php-1.43.0-wmf.21/extensions/GrowthExperiments/includes/Mentorship/Store/DatabaseMentorStore.php(127)iterator_to_array(Generator)
#13/srv/mediawiki/php-1.43.0-wmf.21/extensions/GrowthExperiments/includes/Api/ApiQueryMentorMentee.php(50)GrowthExperiments\Mentorship\Store\DatabaseMentorStore->getMenteesByMentor(MediaWiki\User\UserIdentityValue, string)
#14/srv/mediawiki/php-1.43.0-wmf.21/includes/api/ApiQuery.php(711)GrowthExperiments\Api\ApiQueryMentorMentee->execute()
#15/srv/mediawiki/php-1.43.0-wmf.21/includes/api/ApiMain.php(1948)ApiQuery->execute()
#16/srv/mediawiki/php-1.43.0-wmf.21/includes/api/ApiMain.php(926)ApiMain->executeAction()
#17/srv/mediawiki/php-1.43.0-wmf.21/includes/api/ApiMain.php(897)ApiMain->executeActionWithErrorHandling()
#18/srv/mediawiki/php-1.43.0-wmf.21/includes/api/ApiEntryPoint.php(153)ApiMain->execute()
#19/srv/mediawiki/php-1.43.0-wmf.21/includes/MediaWikiEntryPoint.php(200)MediaWiki\Api\ApiEntryPoint->execute()
#20/srv/mediawiki/php-1.43.0-wmf.21/api.php(44)MediaWiki\MediaWikiEntryPoint->run()
#21/srv/mediawiki/w/api.php(3)require(string)
#22{main}
Impact

It is not possible to see who s mentored by a given long-term mentor.

Notes

Details

Request URL
https://en.wikipedia.org/w/api.php?action=query&format=xml&gemmmentor=Nick_Moyes&list=growthmentormentee
Related Changes in Gerrit:

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
> $start = microtime(true); $res = \MediaWiki\MediaWikiServices::getInstance()->get('GrowthExperimentsMentorStore')->getMenteesByMentor(User::newFromName('Nick_Moyes'), 'primary'); $end = microtime(true);
= 1727370895.2964

> $end - $start
= 75.591993093491

>

Apparently, just getting the list of mentees takes a good 75 seconds. For another mentor, Blaze_Wolf (T330071#10165865), the situation is even worse (82 seconds):

Psy Shell v0.12.3 (PHP 7.4.33 — cli) by Justin Hileman
> $start = microtime(true); $res = \MediaWiki\MediaWikiServices::getInstance()->get('GrowthExperimentsMentorStore')->getMenteesByMentor(User::newFromName('Blaze_Wolf'), 'primary'); $end = microtime(true); $end - $start;
= 82.123007059097

>

Both times are well over what can finish within a single webrequest. It is also a significant portion of the time allowance for jobs (200 seconds), and quite possibly (one of the) root cause(s) for T354222: reassignMenteesJob is not able to finish in time when a mentor has too many mentees assigned.

Urbanecm_WMF renamed this task from For long-term mentors, getting the set of mentees timeouts (Wikimedia\RequestTimeout\RequestTimeoutException: The maximum execution time of {limit} seconds was exceeded) to Getting a full list of mentors timeouts for long-term mentors (Wikimedia\RequestTimeout\RequestTimeoutException: The maximum execution time of {limit} seconds was exceeded).Sep 26 2024, 7:14 PM
Urbanecm_WMF updated the task description. (Show Details)

A significant part of this appears to be taken by UserSelectQueryBuilder, see callgraph.

image.png (1×1 px, 239 KB)

Change #1076016 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] DatabaseMentorStore: Cast user IDs to integers before looking them up

https://gerrit.wikimedia.org/r/1076016

Base query used by UserSelectQueryBuilder:

I found that a significant portion of the delay is caused by typing – $ids is an array of strings, not integers. Once that is fixed, the same logic runs within few seconds – verified via mwdebug. Moving to sprint for Code Review.

@Urbanecm_WMF not that the images are not publicly visible (and also not to me), is that intentional?

I found that a significant portion of the delay is caused by typing – $ids is an array of strings, not integers. Once that is fixed, the same logic runs within few seconds

I'm curious: how did you find out that this was the cause?

@Urbanecm_WMF not that the images are not publicly visible (and also not to me), is that intentional?

Not intentional, I forgot to explicitly attach them to this task to ensure visibility. T310833 strikes yet again. Fixed, thanks for the note!

I found that a significant portion of the delay is caused by typing – $ids is an array of strings, not integers. Once that is fixed, the same logic runs within few seconds

I'm curious: how did you find out that this was the cause?

At first, I measured the duration of individual steps DatabaseMentorStore::getMenteesByMentor() does. I did that via a lot of shell.php/eval.php fiddling (T375784#10180969 shows the general pattern I used). This led me to the conclusion that the majority of the time is used by UserSelectQueryBuilder.

Then, I used eval.php with the --d 2 parameter to get the actual SQL query UserSelectQueryBuilder uses (dumped in T375784#10181095). I ran it at one of the replicas, verifying the query itself takes the most time.

Once I opened the query, I realised the IDs are strings (not integers). Since I previously worked on a performance problem with exactly this cause, this was the first thing I checked (I executed the query again, this time with numerical IDs, and it finished much faster [seconds rather than more than a minute]). If interested, more details about the other performance problem can be found in a Techblog post I wrote about it.

Change #1076016 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] DatabaseMentorStore: Cast user IDs to integers before looking them up

https://gerrit.wikimedia.org/r/1076016

Change #1076969 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.43.0-wmf.24] DatabaseMentorStore: Cast user IDs to integers before looking them up

https://gerrit.wikimedia.org/r/1076969

I'll probably backport this later today to all wikis, so we can observe the effect. But functionally, this should be fixed now.

Change #1076969 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.43.0-wmf.24] DatabaseMentorStore: Cast user IDs to integers before looking them up

https://gerrit.wikimedia.org/r/1076969

Mentioned in SAL (#wikimedia-operations) [2024-10-01T08:39:45Z] <urbanecm@deploy2002> Started scap sync-world: Backport for [[gerrit:1076969|DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784)]]

Mentioned in SAL (#wikimedia-operations) [2024-10-01T08:46:43Z] <urbanecm@deploy2002> Finished scap sync-world: Backport for [[gerrit:1076969|DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784)]] (duration: 06m 58s)

I can now successfully download Nick's mentees:

urbanecm@wmf3345 ~ % time curl -s 'https://en.wikipedia.org/w/api.php?action=query&format=xml&gemmmentor=Nick_Moyes&list=growthmentormentee' > /dev/null
curl -s  > /dev/null  0.04s user 0.04s system 0% cpu 11.435 total
urbanecm@wmf3345 ~ %

Still takes a considerable amount of time, but significantly less than a minute. Hopefully this would also help with the other mentorship-related timeouts.

Etonkovidova subscribed.

Checked for wmf.24 - no recent errors (the last timestamp Sep 30, 2024 @ 09:18:26.971 ( logstash link) . Will check it on Oct 04/2024.

Just noting that with one of the original examples resolved in another task, an example with >25000 mentees IS completing on a basic api fetch: https://en.wikipedia.org/w/api.php?action=query&format=jsonfm&list=growthmentormentee&formatversion=2&gemmmentor=Dreamyshade

Just noting that with one of the original examples resolved in another task, an example with >25000 mentees IS completing on a basic api fetch: https://en.wikipedia.org/w/api.php?action=query&format=jsonfm&list=growthmentormentee&formatversion=2&gemmmentor=Dreamyshade

Thank you, @Xaosflux! After re-checking for the error on logstash link - all seems to be normal.