Capturing via verbose Logstash and Excimer UI in two (separate) profiles using WikimediaDebug after 1 warmup request that has neither of the options enabled, by viewing en.wikipedia.org's featured article at https://en.wikipedia.org/wiki/Robert_Howard_Hodgkin in private browsing (logged-out).
Prior art:
List of queries
- Logs: https://logstash.wikimedia.org/app/dashboards#/view/x-debug?_g=(time:(from:now-1h,mode:quick,to:now))&_a=(query:(query_string:(query:%27reqId:%22d15a6bb9-27ac-44ac-a1ed-9887c872fbe2%22%27)))
- Flame graph with stack traces for each database query: https://performance.wikimedia.org/excimer/profile/9f93148c22caf7dc
I reduced the logs by using message:SELECT which yields 19 queries, roughly in order:
- [0.001s] Wikimedia\Rdbms\Replication\MysqlReplicationReporter::fetchSecondsSinceHeartbeat SELECT … FROM heartbeat where shard = 's1' …
- Who: rdbms.
- Why: Preflight required once for any new db connection, as requested by one or more later.
- [0.001s] WikiPage::pageData SELECT … FROM page WHERE page_namespace = 0 AND page_title = 'Robert_Howard_Hodgkin' LIMIT 1
- Who: MediaWiki.php.
- Why: This single query feeds many consumers all in one efficient row fetch. E.g. to determine the rendered title (is it a redirect that we need to resolve first?), RequestContext (does the page exist?), ActionFactory (which ContentHandler and Action will hande this request?), Title and PageRecord (page ID), ViewAction/OutputPage/Skin (current revision ID etc.)
- [0.002s] MediaWiki\Output\OutputPage::addCategoryLinksToLBAndGetResult SELECT … FROM page LEFT JOIN page_props ON (pp_propname = 'hiddencat' AND (pp_page = page_id)) WHERE ((page_namespace = 14 AND page_title IN (…,'Featured_articles',…,'1877_births','1951_deaths',…,'British_historians',…) ))
- Who: OutputPage, on behalf of Skin.
- Why: Show category links at the bottom of the page. Interestingly, there is no categorylinks table query, because that data is obtained from the ParserOutput from the ParserCache. Doing so saves a DB query but also ensure integrity and internal concistency with the rendered wikitext revision (i.e. you wouldn't want a category associated with a template like "citation needed", when it is no longer in the content or vice versa). This query is to help separate the "hidden" categories from the regular ones. This can change at any time through edits to category pages and rather than cache this in ParserCache and require refresh links to propagate on the entire site after (unprotected) Category edits, we instead query this at the last-minute.
- [0.001s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'notalk'
- [0s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'newsectionlink'
- [0s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'archivedtalk'
- Who: DiscussionTools extension.
- Why: To determine whether this is a talk page, via isAvailableForTitle() from DiscussionTools\PageHooks::onOutputPageBeforeHTML.
- [0.001s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'templatedata'
- Who: TemplateData extension.
- Why: To localise template documentation, in case this is a page view (which it might not be) for a Template page (which it might not be) with templatedata docs in the parser output (which it might not have).
- [0.001s] MediaWiki\Extension\PageTriage\QueueLookup::getByPageId SELECT ptrp_page_id,ptrp_reviewed,ptrp_created,ptrp_deleted,ptrp_tags_updated,ptrp_reviewed_updated,ptrp_last_reviewed_by FROM pagetriage_page WHERE ptrp_page_id = 54249587 LIMIT 1
- Who: PageTriage extension.
- Why: To decide whether to add its user interface to the article, via isPageUnreviewed() from PageTriage\Hooks::onArticleViewFooter.
- [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT page_id,rd_from FROM redirect JOIN page ON ((rd_namespace=page_namespace) AND (rd_title=page_title)) WHERE rd_from = 54249587
- [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT pp_page FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'disambiguation'
- Who: RelatedArticles extension.
- Why: To decide whether to output a "related articles" below the article. Via MediaWiki\Extension\Disambiguator\Hooks::isDisambiguationPage from RelatedArticles\Hooks::onSkinAfterContent.
- [0.001s] PageImages\PageImages::fetchPageImage SELECT pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname IN ('page_image','page_image_free') ORDER BY pp_propname LIMIT 1
- Who: PageImages extension.
- Why: To insert <meta property="og:image"> into the HTML <head>
- [0.001s] LinkBatch::doQuery (for Skin::preloadExistence) SELECT … FROM page WHERE ((page_namespace = 1 AND page_title = 'Robert_Howard_Hodgkin') OR (page_namespace = 2 AND page_title = 'X.X.X.X/sandbox'))
- Who: Skin.
- Why: For links in the skin interface sidebar, personal tools, page actions, etc. all batched together to decide whether they are blue or red.
- [0s] MediaWiki\User\TalkPageNotificationManager::dbCheckNewUserMessages SELECT user_ip FROM user_newtalk WHERE user_ip = 'X.X.X.X' LIMIT 1
- Who: Skin.
- Why: "Orange bar of doom", to say "You have new messages".. Note that this for any edit session, including for unregistered users for whom messages are left on a User_talk page based on their IP-address.
- [0.001s] MediaWiki\Page\PageStore::getPageByNameViaLinkCache SELECT … FROM page WHERE page_namespace = 12 AND page_title = 'Category' LIMIT 1
- Who: Skin.
- Why: The Help:Categories link, labelled "Categories", for the category box at the bottom of article.
- [0.001s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT page_id,rd_from FROM redirect JOIN page ON ((rd_namespace=page_namespace) AND (rd_title=page_title)) WHERE rd_from = 54249587
- [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT pp_page FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'disambiguation'
- Who: RelatedArticles extension.
- Why: To decide whether the already decided on "related articles" below the article, should load its CSS/JS module as well. Via MediaWiki\Extension\Disambiguator\Hooks::isDisambiguationPage from RelatedArticles\Hooks::onBeforePageDisplay.
- [0.001s] Wikimedia\Rdbms\Replication\MysqlReplicationReporter::fetchSecondsSinceHeartbeat SELECT … FROM heartbeat.heartbeat WHERE shard = 's8' …
- Who: rdbms.
- Why: Preflight for one or more Wikidata queries after this.
- [0.011s] Wikibase\Lib\Store\Sql\Terms\DatabaseTermInLangIdsResolver::resolveTermsViaJoin SELECT … FROM wbt_term_in_lang JOIN … WHERE wbtl_type_id = 2 AND wbxl_language = 'en' AND wbit_item_id = 30153087
- Who: Wikidata (WikibaseClient extension)
- Why: To create the <script type="application/ld+json"> metadata element and add it to the end of the HTML output stream for machine readable association between Wikipedia articles and Wikidata.org entities (e.g. for automated clients, bots, and crawlers). Via Wikibase\Client\ClientHooks::onSkinAfterBottomScripts hook.
- [0.002s] MediaWiki\Revision\RevisionStore::fetchRevisionRowFromConds SELECT … FROM revision … JOIN actor comment comment_rev_comment page ON … LEFT JOIN user ON … WHERE page_namespace = 0 AND page_title = 'Robert_Howard_Hodgkin' ORDER BY rev_timestamp ASC,rev_id ASC LIMIT 1
- Who: Wikidata (WikibaseClient extension)
- Why: To compute the datePublished property within the <script type="application/ld+json"> element. Via RevisionStore::getFirstRevision as indirectly caleld from the same Wikibase\Client\ClientHooks::onSkinAfterBottomScripts hook handler.
Performance analysis
- SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'notalk'
- [0s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'newsectionlink'
- [0s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'archivedtalk'
- Who: DiscussionTools extension.
- Why: To determine whether this is a talk page, via isAvailableForTitle() from DiscussionTools\PageHooks::onOutputPageBeforeHTML.
At glance one would think that articles can be more cheaply ruled out as a potential talk page by other means before doing a page props query for notalk. However, I suspect that this is yet another victim of wgExtraSignatureNamespaces and how its thus inheritently expensive for DiscussionTools to determine what a "discussion" page is (ref T336020, T249293, T245890, T249036; for various prior art and trade offs).
Looking more closely, one thing that looks odd to me is the inline cache in DiscussionTools\HooksUtils::hasPagePropCached. This is caching the result of MediaWiki core's PageProps->getProperties service, which already has an inline cache.
But above all, if this is specific to page views (i.e. not when viewing action=edit, action=history, or action=info), then you can assume a ParserOutput object, which already has page props in it. Fetching them by page ID should not be needed at all. The PageProps service can't do that, since its contract is bound to the latest revision in the database, not the revision shown. However, perhaps DiscussionTools could make some of this hook decision via the onOutputPageParserOutput hook, which runs just before the onOutputPageBeforeHTML hook, and has access to the ParserOutput object and all its rich metadata.
- [0.001s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'templatedata'
- Who: TemplateData extension.
- Why: To localise template documentation, […]
This was introduced Nov 2022 last year as part of T316759 and seems to introduce a mechanism not unlike core's <mw:editsection> placeholder. But, because it hooks into OutputPage, this means these placeholders are leak outside the content layer and not taken care of until here in the Skin-level with OutputPage and its browser-facing hooks. In other words, Parsoid and other APIs likely still leak the placeholders through non-standard HTML.
A more appropiate hook would be onParserOutputPostCacheTransform which happens inside ParserOutput::getText just like where <mw:editsection> happens. These was introduced in 2017 (T171797).
Adopting this would have three major benefits:
- Reverts the added database query above in favour of free access via the in-memory ParserOutput object, provided to this hook.
- Avoids race conditions where template pages are missing localisation or needlessly running this code when page_props and ParserCache return a different revision ID (e.g. when under load via PoolCounter).
- Avoids leaking non-standard HTML to APIs.
- [0.001s] LinkBatch::doQuery (for Skin::preloadExistence) SELECT … FROM page WHERE ((page_namespace = 1 AND page_title = 'Robert_Howard_Hodgkin') OR (page_namespace = 2 AND page_title = 'X.X.X.X/sandbox'))
- Who: Skin.
Seems fine as-is. This is batched and heavily optimised to a bare minimum. The Article/Talk button (page actions tabs) and user talk page (personal portlet link) need to have the correct URL (redlink=1), tooltip ("Page doesn't exist") and blue/red coloring as they are highly visible and relevant to editor worklflow and engagement.
For speed it doesn't matter much whether we query 1 or 20 titles in a batch, but for what its worth, we did reduce the query size a lot here in the past through T299099: Question: Is Skin::preloadExistence still relevant?.
- [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT page_id,rd_from FROM redirect JOIN page ON ((rd_namespace=page_namespace) AND (rd_title=page_title)) WHERE rd_from = 54249587
- [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT pp_page FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'disambiguation'
- Who: RelatedArticles extension.
- Why: To decide whether to output a "related articles" below the article. Via MediaWiki\Extension\Disambiguator\Hooks::isDisambiguationPage from RelatedArticles\Hooks::onSkinAfterContent.
- [0.001s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT page_id,rd_from FROM redirect JOIN page ON ((rd_namespace=page_namespace) AND (rd_title=page_title)) WHERE rd_from = 54249587
- [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT pp_page FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'disambiguation'
- Who: RelatedArticles extension.
- Why: To decide whether the already decided on "related articles" below the article, should load its CSS/JS module as well. Via MediaWiki\Extension\Disambiguator\Hooks::isDisambiguationPage from RelatedArticles\Hooks::onBeforePageDisplay.
There's a lot to unpack here.
First of all, the RelatedArticles extension doesn't actually output anything outside the Minerva skin or MobileFrontend. That fact is realized about three method calls after two full database roundtrips through a static check on the skin name. Those checks need to be re-ordered and that will immediately remove 21% of all databae queries on all page views in production.
Secondly, the RelatedArticles extension is performing two queries where 1 would suffice. It apparently calls the Disambiguator lookup service with $includeRedirects = true. I have no idea why this parameter exists or why it is true by default. There are only 3 calls across production codebases and it not one appears to intentionally follow redirects. This was introduced in the Disambiguator extension in T88305 as an unused parameter for a use case relating to onGetLinkColours where links to disambig pages and links to redirects to disambig pages get the same colours. That's fine, except this hook calls the underlying logic direclty and so doesn't even need to expose the $includeRedirects paremeter, much less set it to true by default, even less set it to true for the call from the RelatedArticles extension.
Thirdly, the RelatedArticles extension is performing the exact same database queries twice on every single page view. Once to decide whether to insert an empty <div> element, and then the whole suite of database queries and computations a second time to decide whether to queue a CSS/JS module.
Conclusion
The situation in the RelatedArticles extension reminds us that it's important to periodically run your dev environment with $wgDebugToolbar and $wgDebugDumpSql enabled (see DevelopmentSettings.php for an example). The number of queries on plain MediaWiki core is around 9 by default. Small enough to become familiar with and remember, and more importantly, to stand out if it goes and stays up, for someone in your team to notice. Even if it goes up by only 1, that's a big deal.
Given the scale of Wikipedia and our budget, the quota for a feature is generallys 0 database queries added per (cached) page view. The fact that we have hundreds of extension deployed and yet only 10-15 database queries on page views means this is generally feasible with relatively little effort. It's not about how hard it is, it's about noticing it.
The above 19 are exceptional features where there was no other way and a trade-off was made in consultation with SRE, or, where we budgeted for it by saving something else, or where it benefits multiple features through a single query (e.g. batched, or re-usable information), or.... it flew under the radar temporarily... which works if only 1 or 2 extension do so and if we notice and revert or pay back within a few months!