Page MenuHomePhabricator

Inefficient queries to groupstats table in Translate
Open, HighPublic

Description

From https://logstash.wikimedia.org/goto/7c69a7e35b06b42d80485ae5b1771520

Expectation (readQueryRows <=) 10000 by ApiMain::setRequestExpectations not met (actual: 10432):
query-m: SELECT * FROM `translate_groupstats` WHERE tgs_lang = 'X'  [TRX#1e7c07]
from /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/TransactionProfiler.php(444)
#0 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/TransactionProfiler.php(272): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated(string, Wikimedia\Rdbms\GeneralizedSql, integer)
#1 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/database/Database.php(1516): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion(Wikimedia\Rdbms\GeneralizedSql, double, boolean, integer)
#2 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/database/Database.php(1398): Wikimedia\Rdbms\Database->executeQueryAttempt(string, string, boolean, string, integer)
#3 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/database/Database.php(1323): Wikimedia\Rdbms\Database->executeQuery(string, string, integer)
#4 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/database/Database.php(2033): Wikimedia\Rdbms\Database->query(string, string, integer)
#5 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->select(string, string, array, string)
#6 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/rdbms/database/DBConnRef.php(324): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#7 /srv/mediawiki/php-1.38.0-wmf.7/extensions/Translate/utils/MessageGroupStats.php(420): Wikimedia\Rdbms\DBConnRef->select(string, string, array, string)
#8 /srv/mediawiki/php-1.38.0-wmf.7/extensions/Translate/utils/MessageGroupStats.php(335): MessageGroupStats::selectRowsIdLang(NULL, array, integer)
#9 /srv/mediawiki/php-1.38.0-wmf.7/extensions/Translate/utils/MessageGroupStats.php(117): MessageGroupStats::forLanguageInternal(string, array, integer)
#10 /srv/mediawiki/php-1.38.0-wmf.7/extensions/Translate/api/ApiQueryLanguageStats.php(35): MessageGroupStats::forLanguage(string, integer)
#11 /srv/mediawiki/php-1.38.0-wmf.7/extensions/Translate/api/ApiStatsQuery.php(41): ApiQueryLanguageStats->loadStatistics(string, integer)
#12 /srv/mediawiki/php-1.38.0-wmf.7/includes/api/ApiQuery.php(629): ApiStatsQuery->execute()
#13 /srv/mediawiki/php-1.38.0-wmf.7/includes/api/ApiMain.php(1888): ApiQuery->execute()
#14 /srv/mediawiki/php-1.38.0-wmf.7/includes/api/ApiMain.php(867): ApiMain->executeAction()
#15 /srv/mediawiki/php-1.38.0-wmf.7/includes/api/ApiMain.php(838): ApiMain->executeActionWithErrorHandling()
#16 /srv/mediawiki/php-1.38.0-wmf.7/api.php(90): ApiMain->execute()
#17 /srv/mediawiki/php-1.38.0-wmf.7/api.php(45): wfApiMain()
#18 /srv/mediawiki/w/api.php(3): require(string)
#19 {main}

Event Timeline

Change 737748 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/Translate@master] Add group filtering to languagestats

https://gerrit.wikimedia.org/r/737748

Nikerabbit subscribed.

Fixing this properly requires a redesign our Special:LanguageStats and action=query&meta=languagestats api. That is considerable amount of work. There are a few cases where we don't need statistics for all the groups, which provides low hanging fruit for optimization.

One short-term alternative could be caching the values for an hour in WAN cache. Would that work?

Is the query so slow that it causes issues? Per my understanding 10k rows is an arbitrary limit, and this simple query should be fast even when exceeding that limit.

I'm sure people would complain about outdated statistics very quickly, as they want to see groups that are completely translated to show as such.

Is the query so slow that it causes issues? Per my understanding 10k rows is an arbitrary limit, and this simple query should be fast even when exceeding that limit.

Yes and no, It's currently basically overwhelms the report atm (possibly majority of the current logs is this) so finding other problematic queries is harder now. There are ways around it but I like to keep these logs clean.

The other problem is not the db being slow but the traffic caused by this can cause issues, slow down replication, saturate network, etc. etc.

I'm sure people would complain about outdated statistics very quickly, as they want to see groups that are completely translated to show as such.

Yeah.

The other problem is not the db being slow but the traffic caused by this can cause issues, slow down replication, saturate network, etc. etc.

My napkin estimate is 20 MB per query, and it's somewhat frequent, but not so frequent that I think there is imminent threat.

Personally I am more worried about T292143: Inefficient query in TranslatablePage::getTranslationPages. Fortunately it seems to have a relatively simple fix.

I will be advocating for the aforementioned refactorings to address this task, but it will take time to fully address this issue. I assume this will be like T204026: DBPerformance warning "Query returned XXXX rows: query: SELECT * FROM `translate_metadata`" which took months to fix in an incremental fashion.

My initial thoughts for an incremental plan:

  • Address the common case of Special:Translate when group selector is not opened (just single statsbar)
  • Hot-fix message group selector. We could add config to disable showing stats for it on meta.
  • Address languagestats API: enforce result limits and use query-continue. Unfortunately this requires adding paging support for MessageGroupStats class, which is already complicated.
  • Potentially make message group selector to use query-continue. This just makes the queries smaller, but still transfers same amount of data.
  • Fix message group selector to not show all groups by default. We can use the three structure to show only top-level groups and load expansions on demand.
  • Fix Special:LanguageStats to not show all groups by default. We can use the three structure to show only top-level groups and load expansions on demand.

Even two last ones may not be enough in the long term, and we may need to add paging support for the message group selector and Special:LanguageStats.

Sounds good to me, I leave it to you and the team to decide how they want to get there as long as we reduce queries.

Nikerabbit renamed this task from Translate extension has suboptimal query to Inefficient queries to groupstats table in Translate.Jan 4 2022, 9:19 AM

Hi, I just want to mention that this is the leading cause of rows > 10000 error in production by two orders of magnitude in number of occurrences: https://logstash.wikimedia.org/goto/c7bc0e733e88f3f90f19e9ccac3bd497

Even something to reduce number of occurrences (e.g. lazy loading the 10k rows until the user click on groups list) would be greatly appreciated. Let me know if I can help on anything.

Change 737748 abandoned by Nikerabbit:

[mediawiki/extensions/Translate@master] Add group filtering to languagestats

Reason:

Work continues in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/766312

https://gerrit.wikimedia.org/r/737748