
Expectation (readQueryRows <= 10000) not met [from TranslatablePage::isSourcePage]
Open, Medium · Public · PRODUCTION ERROR

Description

Error message: Expectation (readQueryRows <= 10000) … not met (actual: 12307) in …
SELECT rt_page FROM `revtag` JOIN `page` ON ((rt_page = page_id) AND (rt_revision = page_latest) AND rt_type IN ('X') ) GROUP BY rt_page
Frame | Location | Call

from /srv/mediawiki/php-1.44.0-wmf.20/includes/libs/rdbms/TransactionProfiler.php(558)
#0
#9  /srv/mediawiki/php-1.44.0-wmf.20/extensions/Translate/src/MessageGroupProcessing/RevTagStore.php(235): Wikimedia\Rdbms\SelectQueryBuilder->fetchResultSet()
#10 /srv/mediawiki/php-1.44.0-wmf.20/extensions/Translate/src/PageTranslation/TranslatablePage.php(545): …\Translate\…\RevTagStore::getTranslatableBundleIds(string, string)
#11 /srv/mediawiki/php-1.44.0-wmf.20/includes/libs/objectcache/WANObjectCache.php(1825): TranslatablePage::getCacheValue(string, int, array, float, array)
#12
#13 /srv/mediawiki/php-1.44.0-wmf.20/extensions/Translate/src/PageTranslation/TranslatablePage.php(486): WANObjectCache->getWithSetCallback(string, int, array, array)
#14
#15 /srv/mediawiki/php-1.44.0-wmf.20/extensions/Translate/src/PageTranslation/TranslatablePage.php(490): BagOStuff->getWithSetCallback(string, int, Closure, int)
#16 /srv/mediawiki/php-1.44.0-wmf.20/extensions/Translate/src/PageTranslation/Hooks.php(339): TranslatablePage::isSourcePage(MediaWiki\Title\Title)
#17 /srv/mediawiki/php-1.44.0-wmf.20/includes/HookContainer/HookContainer.php(155): …\Translate\…\Hooks::onBeforePageDisplay(OutputPage, SkinVector22)
#18
#19 /srv/mediawiki/php-1.44.0-wmf.20/includes/Output/OutputPage.php(3190): HookRunner->onBeforePageDisplay(OutputPage, SkinVector22)
#20 /srv/mediawiki/php-1.44.0-wmf.20/includes/actions/ActionEntryPoint.php(162): OutputPage->output(bool)
#21 /srv/mediawiki/php-1.44.0-wmf.20/includes/MediaWikiEntryPoint.php(202): ActionEntryPoint->execute()
#22 /srv/mediawiki/php-1.44.0-wmf.20/index.php(58): MediaWikiEntryPoint->run()
#23
Impact

When simply viewing https://meta.wikimedia.org/wiki/Wikimedia_servers, the Translate extension sometimes fetches 12,307 rows from its revtag database table during the synchronous portion of a GET request.

This is above the generous threshold of 10K rows and thus triggers a DBPerformance warning in Logstash.

The above is a sample from when both getWithSetCallback calls are a cache miss.

Notes

See also " Design for cache miss" at https://wikitech.wikimedia.org/wiki/MediaWiki_Engineering/Guides/Backend_performance_practices.

Depending on the usage pattern for this data, there may be a more scalable solution.

The following are my assumptions, which I am asking Language and Product Localization to confirm/reject:

  • on wikis where the Translate extension is installed, the TranslatablePage::isSourcePage function is called on most or all page views.
  • the TranslatablePage::isSourcePage function needs this dataset on most or all such calls.
  • the TranslatablePage::isSourcePage function is generally called only once, or for only one specific page, on a given pageview (i.e. it is not called in a loop with 20 different pages).
  • the TranslatablePage::isSourcePage function requires information from the revtag table. That is, any page could be considered a source page; there is nothing we can guess/compute in PHP from the Title object, PageProps, or LinkCache information. Instead, the only way to know is to check whether the page has an entry in this table.

Exploring some naive ideas:

  • Query just this one row, with a dedicated cache key (a rough sketch follows this list).
    • Downside: on the long tail, this would usually cache-miss, and thus add 1 extra database query to each pageview, T347123. Currently the query is avoided on most requests at the cost of 1) a massive query in the critical path on miss, and 2) a ton of bandwidth and memory that loads the entire dataset into memory, T366455.
  • Consider storing this information as a page property instead (a combined sketch for this and the ParserOutput idea follows this list).
    • This means it can be pre-fetched for the current page in a batch and would thus already be in memory. This data is fetched on page views regardless and thus would not cost extra. (This assumes isSourcePage is mostly called for the current page.)
  • Consider caching this information inside the ParserOutput.
    • This means the Translate onBeforePageDisplay hook can consult it in memory, as the page view will have already computed or retrieved it from the ParserCache. (This assumes isSourcePage is mostly called for the current page.)
  • Consider a deterministic split such that only one slice is needed (a rough sketch follows this list).
    • This is different from the ideas in T366455, which depended on a source-side split where all slices are needed by all callers, and which was reverted as equal to or worse than the status quo. If the data is split such that the caller can be satisfied with a single slice at both the cache and DB layer, then it would make a big difference. Mapping a page title or page ID to a small distributed set is fairly easy (e.g. mod 2 or mod 3 would already cut MySQL rows, network bandwidth, and PHP memory by 50-67%, bias notwithstanding; or e.g. the first letter of an MD5 hash of the title or ID for a cut to 1/16th).
    • The difficulty will be in how to formulate this as a performant database query on the revtag table.
    • One way to get there would be to take the query out of the critical path entirely and use a JobQueue job instead. For example, you could have a job that queries this table, where in jobs we tolerate reads up to 100K rows, and either way you could paginate here since it isn't blocking user latency. The job would then split the result into e.g. 10-20 different keys in the MainStash DB (instead of Memcached, so that it is more strongly persisted). The function could then be rewritten as LocalServerCache->getWithSet(one slice) -> get 1 pre-computed slice via MainStash. If the MainStash row is missing, fall back to a cheap 1-row query on revtag, and queue a job to compute MainStash for next time.
  • If there is a latency-sensitive use case for answering isSourcePage for many different pages in the same web request, then it may be worth providing a batch lookup method for this purpose. Assuming that this will not be on article pageviews, it might not even warrant caching, since a single batch query for 10-20 random-access titles is likely fine and not a notable addition to the queries that use case already makes.
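
To make the first idea concrete, here is a minimal sketch of a per-title lookup with a dedicated WANObjectCache key and a one-row existence check. The key name, TTL, tag value, and function name are placeholders rather than the extension's actual code; the real logic would presumably live in RevTagStore.

use MediaWiki\MediaWikiServices;
use MediaWiki\Title\Title;

/**
 * Sketch only: answer "is this one page a source page?" with a per-page cache
 * key, so a cache miss costs a single indexed one-row query instead of a
 * 12K-row scan of revtag.
 */
function isSourcePageSketch( Title $title ): bool {
    $pageId = $title->getArticleID();
    if ( $pageId === 0 ) {
        return false;
    }
    $services = MediaWikiServices::getInstance();
    $cache = $services->getMainWANObjectCache();
    return (bool)$cache->getWithSetCallback(
        $cache->makeKey( 'translate-is-source-page', $pageId ), // hypothetical key
        $cache::TTL_HOUR, // placeholder TTL
        static function () use ( $services, $pageId ) {
            $dbr = $services->getConnectionProvider()->getReplicaDatabase();
            $found = $dbr->newSelectQueryBuilder()
                ->select( '1' )
                ->from( 'revtag' )
                ->join( 'page', null, [ 'rt_page = page_id', 'rt_revision = page_latest' ] )
                ->where( [ 'rt_page' => $pageId, 'rt_type' => 'tp:mark' ] ) // tag value assumed
                ->caller( __METHOD__ )
                ->fetchField();
            // Cache 0/1 rather than false, since false is indistinguishable from a miss.
            return $found ? 1 : 0;
        }
    );
}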
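
For the page-property and ParserOutput ideas, a rough sketch of what the two halves could look like; the property name, the hook wiring, and how $isSource gets computed are assumptions for illustration only.

use MediaWiki\MediaWikiServices;
use MediaWiki\Title\Title;

// 1) At parse/LinksUpdate time, record the state as a page prop (name is
//    hypothetical), so it ends up in page_props next to other per-page metadata.
//    This would run in whatever hook already touches the ParserOutput.
function recordSourcePageProp( ParserOutput $parserOutput, bool $isSource ): void {
    if ( $isSource ) {
        $parserOutput->setPageProperty( 'translate-is-source-page', '1' );
    }
}

// 2) On page view, answer isSourcePage() from page_props via PageProps
//    instead of querying revtag at all.
function isSourcePageViaPageProps( Title $title ): bool {
    $pageProps = MediaWikiServices::getInstance()->getPageProps();
    return $pageProps->getProperties( $title, 'translate-is-source-page' ) !== [];
}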
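
And a very rough sketch of the slice-plus-job variant. The slice count, key names, TTL, tag value, and function names are all invented for illustration, and the cheap-fallback path is only indicated in a comment.

use MediaWiki\MediaWikiServices;
use MediaWiki\Title\Title;

const TRANSLATE_SOURCE_PAGE_SLICES = 16; // hypothetical slice count

// Deterministic slice per page, so any given caller only ever needs one slice.
function sourcePageSlice( int $pageId ): int {
    return $pageId % TRANSLATE_SOURCE_PAGE_SLICES;
}

// Job side: run the big revtag query outside the critical path (jobs tolerate
// larger reads, and this could also be paginated), then store one MainStash
// key per slice so web requests never need the full list.
function refreshSourcePageSlices(): void {
    $services = MediaWikiServices::getInstance();
    $dbr = $services->getConnectionProvider()->getReplicaDatabase();
    $stash = $services->getMainObjectStash();
    $pageIds = $dbr->newSelectQueryBuilder()
        ->select( 'rt_page' )
        ->distinct()
        ->from( 'revtag' )
        ->join( 'page', null, [ 'rt_page = page_id', 'rt_revision = page_latest' ] )
        ->where( [ 'rt_type' => 'tp:mark' ] ) // tag value assumed
        ->caller( __METHOD__ )
        ->fetchFieldValues();
    $slices = array_fill( 0, TRANSLATE_SOURCE_PAGE_SLICES, [] );
    foreach ( $pageIds as $pageId ) {
        $slices[ sourcePageSlice( (int)$pageId ) ][] = (int)$pageId;
    }
    foreach ( $slices as $i => $ids ) {
        $stash->set( $stash->makeKey( 'translate-source-page-slice', $i ), $ids, $stash::TTL_WEEK );
    }
}

// Web-request side: fetch only the one slice the given page can be in.
function isSourcePageViaSlice( Title $title ): bool {
    $stash = MediaWikiServices::getInstance()->getMainObjectStash();
    $pageId = $title->getArticleID();
    $slice = $stash->get( $stash->makeKey( 'translate-source-page-slice', sourcePageSlice( $pageId ) ) );
    if ( !is_array( $slice ) ) {
        // Slice missing: do the cheap 1-row revtag check and queue the refresh job.
        return false; // placeholder for that fallback
    }
    return in_array( $pageId, $slice, true );
}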

See also:

Details

Request URL
https://meta.wikimedia.org/wiki/Wikimedia_servers

Event Timeline

Restricted Application added a subscriber: Aklapper.
Ladsgroup subscribed.

This is not really DBA work, but I stay subscribed in case we are needed.

For context, code search link https://codesearch.wmcloud.org/search/?q=isSourcePage&files=&excludeFiles=&repos=Extension%3ATranslate

I think that a common use case is to check it for the currently viewed page, but an audit of callers would be needed to see other uses. Most uses seem to be in hooks.

We also have an unfinished migration to fully use translate_translatable_bundles as the source of truth. We already set some properties in the ParserOutput, but I don't think that is available in most of the locations where we check this. Adding a page prop feels simple (apart from the migration) and could quickly take load off the critical page view paths.

Finally, we will also have a similar issue with MessageBundles... it's not apparent yet because we only have one or two.

Nikerabbit renamed this task from "Expectation (readQueryRows <= 10000) not by" to "Expectation (readQueryRows <= 10000) not met [from TranslatablePage::isSourcePage]". Mar 13 2025, 11:43 AM

I did an audit for a few levels of callers. Almost all of them only check for one title (and often for the current page view title, but not always). A few cases check multiple titles:

  • Hooks for filtering out titles on ReplaceText. Preloading easily possible.
  • LinksUpdateComplete hook handler that checks all of the page's categories. Preloading easily possible.
  • Hook to replace transclusion targets in the parser, preloading not possible.

For the latter, I am not sure if doing a query per page is okay. Are there already safe limits on template transclusions?

I think we could switch right now to querying only one page at a time without negative consequences. There should be a short-lived process cache to avoid duplicate queries within one request, but not something that breaks the job queue with stale data when running via a long-lived runJobs.php.
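
For what it's worth, a minimal sketch of that shape, assuming MapCacheLRU's maxAge parameter is what covers the "short-lived" part; the class wrapper, method names, and 5-second cutoff here are illustrative guesses, not a proposal for the actual implementation:

use MediaWiki\Title\Title;

class SourcePageLookup {
    private MapCacheLRU $cache;

    public function __construct() {
        // Small in-process cache; the maxAge passed to has() below keeps a
        // long-lived runJobs.php process from serving stale answers forever.
        $this->cache = new MapCacheLRU( 100 );
    }

    public function isSourcePage( Title $title ): bool {
        $key = (string)$title->getArticleID();
        if ( !$this->cache->has( $key, 5 ) ) { // 5s max age is an arbitrary example
            $this->cache->set( $key, $this->queryOneTitle( $title ) );
        }
        return (bool)$this->cache->get( $key );
    }

    private function queryOneTitle( Title $title ): bool {
        // Single-row revtag existence check, as in the sketch under "Notes" above;
        // omitted here for brevity.
        return false;
    }
}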