
Reduce database queries on parsercached logged-out page views (Sep 2023)
Open, Needs Triage, Public

Description

Capturing via verbose Logstash and Excimer UI in two (separate) profiles using WikimediaDebug after 1 warmup request that has neither of the options enabled, by viewing en.wikipedia.org's featured article at https://en.wikipedia.org/wiki/Robert_Howard_Hodgkin in private browsing (logged-out).

Prior art:

List of queries

I reduced the logs by using message:SELECT which yields 19 queries, roughly in order:

  1. [0.001s] Wikimedia\Rdbms\Replication\MysqlReplicationReporter::fetchSecondsSinceHeartbeat SELECT … FROM heartbeat where shard = 's1' …
    • Who: rdbms.
    • Why: Preflight query, required once for any new DB connection; issued on behalf of one or more of the later queries.
  2. [0.001s] WikiPage::pageData SELECT … FROM page WHERE page_namespace = 0 AND page_title = 'Robert_Howard_Hodgkin' LIMIT 1
    • Who: MediaWiki.php.
    • Why: This single query feeds many consumers all in one efficient row fetch. E.g. to determine the rendered title (is it a redirect that we need to resolve first?), RequestContext (does the page exist?), ActionFactory (which ContentHandler and Action will handle this request?), Title and PageRecord (page ID), ViewAction/OutputPage/Skin (current revision ID etc.)
  3. [0.002s] MediaWiki\Output\OutputPage::addCategoryLinksToLBAndGetResult SELECT … FROM page LEFT JOIN page_props ON (pp_propname = 'hiddencat' AND (pp_page = page_id)) WHERE ((page_namespace = 14 AND page_title IN (…,'Featured_articles',…,'1877_births','1951_deaths',…,'British_historians',…) ))
    • Who: OutputPage, on behalf of Skin.
    • Why: Show category links at the bottom of the page. Interestingly, there is no categorylinks table query, because that data is obtained from the ParserOutput from the ParserCache. Doing so saves a DB query but also ensures integrity and internal consistency with the rendered wikitext revision (i.e. you wouldn't want a category associated with a template like "citation needed" when it is no longer in the content, or vice versa). This query is to help separate the "hidden" categories from the regular ones. This can change at any time through edits to category pages, and rather than cache this in ParserCache and require refreshLinks to propagate across the entire site after (unprotected) Category edits, we instead query it at the last minute.
  4. [0.001s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'notalk'
  5. [0s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'newsectionlink'
  6. [0s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'archivedtalk'
    • Who: DiscussionTools extension.
    • Why: To determine whether this is a talk page, via isAvailableForTitle() from DiscussionTools\PageHooks::onOutputPageBeforeHTML.
  7. [0.001s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'templatedata'
    • Who: TemplateData extension.
    • Why: To localise template documentation, in case this is a page view (which it might not be) for a Template page (which it might not be) with templatedata docs in the parser output (which it might not have).
  8. [0.001s] MediaWiki\Extension\PageTriage\QueueLookup::getByPageId SELECT ptrp_page_id,ptrp_reviewed,ptrp_created,ptrp_deleted,ptrp_tags_updated,ptrp_reviewed_updated,ptrp_last_reviewed_by FROM pagetriage_page WHERE ptrp_page_id = 54249587 LIMIT 1
    • Who: PageTriage extension.
    • Why: To decide whether to add its user interface to the article, via isPageUnreviewed() from PageTriage\Hooks::onArticleViewFooter.
  9. [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT page_id,rd_from FROM redirect JOIN page ON ((rd_namespace=page_namespace) AND (rd_title=page_title)) WHERE rd_from = 54249587
  10. [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT pp_page FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'disambiguation'
    • Who: RelatedArticles extension.
    • Why: To decide whether to output a "related articles" section below the article. Via MediaWiki\Extension\Disambiguator\Hooks::isDisambiguationPage from RelatedArticles\Hooks::onSkinAfterContent.
  11. [0.001s] PageImages\PageImages::fetchPageImage SELECT pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname IN ('page_image','page_image_free') ORDER BY pp_propname LIMIT 1
    • Who: PageImages extension.
    • Why: To insert <meta property="og:image"> into the HTML <head>.
  12. [0.001s] LinkBatch::doQuery (for Skin::preloadExistence) SELECT … FROM page WHERE ((page_namespace = 1 AND page_title = 'Robert_Howard_Hodgkin') OR (page_namespace = 2 AND page_title = 'X.X.X.X/sandbox'))
    • Who: Skin.
    • Why: For links in the skin interface sidebar, personal tools, page actions, etc. all batched together to decide whether they are blue or red.
  13. [0s] MediaWiki\User\TalkPageNotificationManager::dbCheckNewUserMessages SELECT user_ip FROM user_newtalk WHERE user_ip = 'X.X.X.X' LIMIT 1
    • Who: Skin.
    • Why: The "orange bar of doom", to say "You have new messages". Note that this applies to any edit session, including unregistered users, for whom messages are left on a User_talk page based on their IP address.
  14. [0.001s] MediaWiki\Page\PageStore::getPageByNameViaLinkCache SELECT … FROM page WHERE page_namespace = 12 AND page_title = 'Category' LIMIT 1
    • Who: Skin.
    • Why: The Help:Categories link, labelled "Categories", for the category box at the bottom of article.
  15. [0.001s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT page_id,rd_from FROM redirect JOIN page ON ((rd_namespace=page_namespace) AND (rd_title=page_title)) WHERE rd_from = 54249587
  16. [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT pp_page FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'disambiguation'
    • Who: RelatedArticles extension.
    • Why: To decide whether the "related articles" section (already decided on above) should load its CSS/JS module as well. Via MediaWiki\Extension\Disambiguator\Hooks::isDisambiguationPage from RelatedArticles\Hooks::onBeforePageDisplay.
  17. [0.001s] Wikimedia\Rdbms\Replication\MysqlReplicationReporter::fetchSecondsSinceHeartbeat SELECT … FROM heartbeat.heartbeat WHERE shard = 's8' …
    • Who: rdbms.
    • Why: Preflight for one or more Wikidata queries after this.
  18. [0.011s] Wikibase\Lib\Store\Sql\Terms\DatabaseTermInLangIdsResolver::resolveTermsViaJoin SELECT … FROM wbt_term_in_lang JOIN … WHERE wbtl_type_id = 2 AND wbxl_language = 'en' AND wbit_item_id = 30153087
    • Who: Wikidata (WikibaseClient extension)
    • Why: To create the <script type="application/ld+json"> metadata element and add it to the end of the HTML output stream, for a machine-readable association between Wikipedia articles and Wikidata.org entities (e.g. for automated clients, bots, and crawlers). Via the Wikibase\Client\ClientHooks::onSkinAfterBottomScripts hook.
  19. [0.002s] MediaWiki\Revision\RevisionStore::fetchRevisionRowFromConds SELECT … FROM revision … JOIN actor comment comment_rev_comment page ON … LEFT JOIN user ON … WHERE page_namespace = 0 AND page_title = 'Robert_Howard_Hodgkin' ORDER BY rev_timestamp ASC,rev_id ASC LIMIT 1
    • Who: Wikidata (WikibaseClient extension)
    • Why: To compute the datePublished property within the <script type="application/ld+json"> element. Via RevisionStore::getFirstRevision, as indirectly called from the same Wikibase\Client\ClientHooks::onSkinAfterBottomScripts hook handler.

Performance analysis

  1. [0.001s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'notalk'
  2. [0s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'newsectionlink'
  3. [0s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'archivedtalk'
    • Who: DiscussionTools extension.
    • Why: To determine whether this is a talk page, via isAvailableForTitle() from DiscussionTools\PageHooks::onOutputPageBeforeHTML.

At a glance one would think that articles could be more cheaply ruled out as a potential talk page by other means before doing a page props query for notalk. However, I suspect that this is yet another victim of wgExtraSignatureNamespaces and how it is thus inherently expensive for DiscussionTools to determine what a "discussion" page is (ref T336020, T249293, T245890, T249036; for various prior art and trade-offs).

Looking more closely, one thing that looks odd to me is the inline cache in DiscussionTools\HooksUtils::hasPagePropCached. This is caching the result of MediaWiki core's PageProps->getProperties service, which already has an inline cache.

But above all, if this is specific to page views (i.e. not when viewing action=edit, action=history, or action=info), then you can assume a ParserOutput object, which already has page props in it. Fetching them by page ID should not be needed at all. The PageProps service can't do that, since its contract is bound to the latest revision in the database, not the revision shown. However, perhaps DiscussionTools could make some of this hook decision via the onOutputPageParserOutput hook, which runs just before the onOutputPageBeforeHTML hook, and has access to the ParserOutput object and all its rich metadata.

ACTION (DiscussionTools): Consider if checks can work without database queries by using ParserOutput.
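A minimal sketch of what that could look like, assuming an onOutputPageParserOutput handler (the handler body is illustrative only, not the actual DiscussionTools code):

```php
// Hypothetical sketch: read page props from the in-memory ParserOutput
// rather than querying the page_props table. ParserOutput::getPageProperty()
// reflects the revision actually being rendered, and returns null when absent.
public function onOutputPageParserOutput( $out, $parserOutput ): void {
	$notalk = $parserOutput->getPageProperty( 'notalk' );
	$newSectionLink = $parserOutput->getPageProperty( 'newsectionlink' );
	$archivedTalk = $parserOutput->getPageProperty( 'archivedtalk' );
	// ... feed these into the isAvailableForTitle()-style decision,
	// without any database access ...
}
```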
  1. [0.001s] MediaWiki\Page\PageProps::getProperties SELECT pp_page,pp_propname,pp_value FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'templatedata'
    • Who: TemplateData extension.
    • Why: To localise template documentation, […]

This was introduced in Nov 2022 as part of T316759 and seems to introduce a mechanism not unlike core's <mw:editsection> placeholder. But, because it hooks into OutputPage, these placeholders leak outside the content layer and are not taken care of until here at the Skin level, with OutputPage and its browser-facing hooks. In other words, Parsoid and other APIs likely still leak the placeholders through non-standard HTML.

A more appropriate hook would be onParserOutputPostCacheTransform, which runs inside ParserOutput::getText, the same place where <mw:editsection> is handled. This hook was introduced in 2017 (T171797).

Adopting this would have three major benefits:

  • Removes the added database query above in favour of free access via the in-memory ParserOutput object provided to this hook.
  • Avoids race conditions where template pages are missing localisation or needlessly running this code when page_props and ParserCache return a different revision ID (e.g. when under load via PoolCounter).
  • Avoids leaking non-standard HTML to APIs.
ACTION (TemplateData): Consider if checks can work without database queries by using ParserOutput.
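As a rough illustration of the proposal (hypothetical, not the actual TemplateData code; the placeholder-replacement step is elided):

```php
// Hypothetical sketch: handle TemplateData's placeholders during the
// post-cache transform, the same stage where core handles <mw:editsection>.
public function onParserOutputPostCacheTransform( $parserOutput, &$text, &$options ): void {
	// The templatedata page prop is available in-memory; no page_props query.
	if ( $parserOutput->getPageProperty( 'templatedata' ) === null ) {
		return;
	}
	// ... replace the placeholder markup in $text with localised docs ...
}
```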
  1. [0.001s] LinkBatch::doQuery (for Skin::preloadExistence) SELECT … FROM page WHERE ((page_namespace = 1 AND page_title = 'Robert_Howard_Hodgkin') OR (page_namespace = 2 AND page_title = 'X.X.X.X/sandbox'))
    • Who: Skin.

Seems fine as-is. This is batched and heavily optimised to a bare minimum. The Article/Talk button (page actions tabs) and user talk page (personal portlet link) need to have the correct URL (redlink=1), tooltip ("Page doesn't exist"), and blue/red coloring, as they are highly visible and relevant to editor workflow and engagement.

For speed it doesn't matter much whether we query 1 or 20 titles in a batch, but for what it's worth, we did reduce the query size a lot here in the past through T299099: Question: Is Skin::preloadExistence still relevant?.

  1. [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT page_id,rd_from FROM redirect JOIN page ON ((rd_namespace=page_namespace) AND (rd_title=page_title)) WHERE rd_from = 54249587
  2. [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT pp_page FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'disambiguation'
    • Who: RelatedArticles extension.
    • Why: To decide whether to output a "related articles" below the article. Via MediaWiki\Extension\Disambiguator\Hooks::isDisambiguationPage from RelatedArticles\Hooks::onSkinAfterContent.
  3. [0.001s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT page_id,rd_from FROM redirect JOIN page ON ((rd_namespace=page_namespace) AND (rd_title=page_title)) WHERE rd_from = 54249587
  4. [0s] MediaWiki\Extension\Disambiguator\Lookup::filterDisambiguationPageIds SELECT pp_page FROM page_props WHERE pp_page = 54249587 AND pp_propname = 'disambiguation'
    • Who: RelatedArticles extension.
    • Why: To decide whether the already decided on "related articles" below the article, should load its CSS/JS module as well. Via MediaWiki\Extension\Disambiguator\Hooks::isDisambiguationPage from RelatedArticles\Hooks::onBeforePageDisplay.

There's a lot to unpack here.

First of all, the RelatedArticles extension doesn't actually output anything outside the Minerva skin or MobileFrontend. That fact is realized about three method calls in, after two full database round trips, through a static check on the skin name. Those checks need to be re-ordered, and that will immediately remove 21% of all database queries on all page views in production.

ACTION (RelatedArticles): Fix order of operations in RelatedArticles\Hooks::hasRelatedArticles to check skins before making database queries.
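A sketch of the intended order of operations (hypothetical method shape, assuming a skin-name check; the real logic lives in RelatedArticles\Hooks::hasRelatedArticles):

```php
// Hypothetical sketch: run the cheap, static skin check first so that the
// Disambiguator lookup (and its database queries) is skipped on most views.
private static function hasRelatedArticles( Skin $skin ): bool {
	// RelatedArticles only renders output on Minerva / MobileFrontend views
	// (the MobileFrontend condition is elided here for brevity).
	if ( $skin->getSkinName() !== 'minerva' ) {
		return false; // no DB queries spent on desktop page views
	}
	// Only now pay for the disambiguation check and its queries.
	return !self::isDisambiguationPage( $skin->getTitle() );
}
```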

Secondly, the RelatedArticles extension is performing two queries where one would suffice. It apparently calls the Disambiguator lookup service with $includeRedirects = true. I have no idea why this parameter exists or why it is true by default. There are only 3 calls across production codebases, and not one appears to intentionally follow redirects. This was introduced in the Disambiguator extension in T88305 as an unused parameter for a use case relating to onGetLinkColours, where links to disambig pages and links to redirects to disambig pages get the same colours. That's fine, except this hook calls the underlying logic directly, and so doesn't even need to expose the $includeRedirects parameter, much less set it to true by default, even less set it to true for the call from the RelatedArticles extension.

ACTION (RelatedArticles): Fix RelatedArticles\Hooks::isDisambiguationPage to set $includeRedirects = false to skip a pointless database query.
ACTION (Disambiguator): Consider changing DisambiguatorLookup service to remove or change default of the $includeRedirects parameter.
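To illustrate the effect (a hypothetical call shape, based only on the description above that filterDisambiguationPageIds takes page IDs and an $includeRedirects flag):

```php
// Hypothetical sketch: with $includeRedirects = false, the redirect-table
// JOIN query disappears and a single page_props query answers the question.
$disambigIds = $lookup->filterDisambiguationPageIds(
	[ $title->getArticleID() ],
	false // do not chase redirects; skips the redirect/page JOIN round trip
);
$isDisambiguationPage = $disambigIds !== [];
```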

Thirdly, the RelatedArticles extension is performing the exact same database queries twice on every single page view. Once to decide whether to insert an empty <div> element, and then the whole suite of database queries and computations a second time to decide whether to queue a CSS/JS module.

ACTION (RelatedArticles): Fix RelatedArticles\Hooks to either use a single hook for both purposes, or to let one re-use information from the other. For example, use onBeforePageDisplay to communicate with the Skin in a way that removes the need for an onSkinAfterContent hook, or that leaves information that the onSkinAfterContent hook can use. Or the other way around: let onBeforePageDisplay send a signal to OutputPage, for example via OutputPage::setProperty, which the onSkinAfterContent hook can read, drawing its separation of concerns so that it bases the decision only on that predetermined boolean signal instead of repeating the full computation from scratch and having to keep the two in sync.
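For example (a hypothetical sketch; the module name, property key, and markup are assumptions, not the actual RelatedArticles code):

```php
// Hypothetical sketch: compute the decision once in onBeforePageDisplay and
// leave a boolean signal on OutputPage for the later onSkinAfterContent hook.
public static function onBeforePageDisplay( $out, $skin ): void {
	$show = self::hasRelatedArticles( $skin ); // full computation, exactly once
	$out->setProperty( 'relatedarticles-show', $show );
	if ( $show ) {
		$out->addModules( 'ext.relatedArticles.readMore' ); // assumed module name
	}
}

public static function onSkinAfterContent( &$data, $skin ): void {
	// No recomputation: read the predetermined signal instead.
	if ( $skin->getOutput()->getProperty( 'relatedarticles-show' ) ) {
		$data .= '<div class="read-more-container"></div>'; // assumed markup
	}
}
```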

Conclusion

The situation in the RelatedArticles extension reminds us that it's important to periodically run your dev environment with $wgDebugToolbar and $wgDebugDumpSql enabled (see DevelopmentSettings.php for an example). The number of queries on plain MediaWiki core is around 9 by default. That's small enough to become familiar with and remember, and more importantly, small enough that it stands out when the number goes up and stays up, for someone on your team to notice. Even if it goes up by only 1, that's a big deal.
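For reference, enabling this in a development wiki is a two-line LocalSettings.php change:

```php
// LocalSettings.php (development only): surface every SQL query per page view.
$wgDebugToolbar = true; // on-page toolbar with query list, timings, log entries
$wgDebugDumpSql = true; // include the full SQL text in the debug log
```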

Given the scale of Wikipedia and our budget, the quota for a feature is generally 0 database queries added per (cached) page view. The fact that we have hundreds of extensions deployed and yet only 10-15 database queries on page views means this is generally feasible with relatively little effort. It's not about how hard it is; it's about noticing it.

The above 19 are exceptional features where there was no other way and a trade-off was made in consultation with SRE; or where we budgeted for it by saving something else; or where it benefits multiple features through a single query (e.g. batched, or re-usable information); or… it flew under the radar temporarily, which works if only one or two extensions do so, and if we notice and revert or pay back within a few months!

Event Timeline

Krinkle added a parent task: Restricted Task.Sep 22 2023, 3:06 AM
Krinkle renamed this task from Reduce database queries on cached logged-out page views (Sep 2023) to Reduce database queries on parsercached logged-out page views (Sep 2023).Sep 22 2023, 3:13 AM

Thanks for the detailed analysis. I would be more than happy to review any incoming patches.

  1. [0.001s] LinkBatch::doQuery (for Skin::preloadExistence) SELECT … FROM page WHERE ((page_namespace = 1 AND page_title = 'Robert_Howard_Hodgkin') OR (page_namespace = 2 AND page_title = 'X.X.X.X/sandbox'))
    • Who: Skin.

Seems fine as-is. This is batched and heavily optimised to a bare minimum. The Article/Talk button (page actions tabs) and user talk page (personal portlet link) need to have the correct URL (redlink=1), tooltip ("Page doesn't exist"), and blue/red coloring, as they are highly visible and relevant to editor workflow and engagement.

Even if batching makes the effect negligible, why is User:X.X.X.X/sandbox queried when only named users do actually see the sandbox link? (Please note that the talk page link, which anons do see, points to Special:MyTalk and is always blue, probably to make the output more cacheable.) rESAL includes/Hooks.php should be fixed so that whenever onSkinTemplateNavigation__Universal returns early, onSkinPreloadExistence doesn’t add the sandbox title to the preload array.

  1. [0.001s] MediaWiki\Page\PageStore::getPageByNameViaLinkCache SELECT … FROM page WHERE page_namespace = 12 AND page_title = 'Category' LIMIT 1
    • Who: Skin.
    • Why: The Help:Categories link, labelled "Categories", for the category box at the bottom of article.

This is the result of local customization of https://en.wikipedia.org/wiki/MediaWiki:Pagecategorieslink (the default is a special page link). Skin could preload this title, use a known link and assume the help page always exists, or omit LinkRenderer as done in 1fa903a2.

Change 960133 had a related patch set uploaded (by Majavah; author: Majavah):

[mediawiki/extensions/SandboxLink@master] Do not preload info for link which would not be used

https://gerrit.wikimedia.org/r/960133

Change 960134 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/extensions/RelatedArticles@master] Simplify Hooks::hasRelatedArticles

https://gerrit.wikimedia.org/r/960134

Change 960134 merged by jenkins-bot:

[mediawiki/extensions/RelatedArticles@master] Simplify Hooks::hasRelatedArticles

https://gerrit.wikimedia.org/r/960134

Change 960133 merged by jenkins-bot:

[mediawiki/extensions/SandboxLink@master] Do not preload info for link which would not be used

https://gerrit.wikimedia.org/r/960133

DiscussionTools

At a glance one would think that articles could be more cheaply ruled out as a potential talk page by other means before doing a page props query for notalk. However, I suspect that this is yet another victim of wgExtraSignatureNamespaces and how it is thus inherently expensive for DiscussionTools to determine what a "discussion" page is (ref T336020, T249293, T245890, T249036; for various prior art and trade-offs).

Actually it may be possible! It can't avoid these queries on some wikis which are configured so that any page can be a discussion page, but on wikis with a more reasonable config, like Wikipedia, it probably can. Never mind, I forgot how it works already… It always needs to check at least for 'newsectionlink'.

Looking more closely, one thing that looks odd to me is the inline cache in DiscussionTools\HooksUtils::hasPagePropCached. This is caching the result of MediaWiki core's PageProps->getProperties service, which already has an inline cache.

This is necessary, because PageProps only caches positive results (when the page has the prop), but not negative ones (when the page doesn't have the prop). We complained about this at T297300: PageProps doesn't cache result when it is false.
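That makes the wrapper's purpose clearer. Roughly (a hypothetical sketch of the idea, not the actual HookUtils code):

```php
// Hypothetical sketch: cache negative lookups too, since
// PageProps::getProperties() only caches pages that *have* the prop (T297300).
private array $pagePropCache = [];

private function hasPagePropCached( PageIdentity $page, string $prop ): bool {
	$key = $page->getId() . '|' . $prop;
	if ( !array_key_exists( $key, $this->pagePropCache ) ) {
		$props = $this->pageProps->getProperties( $page, $prop );
		// Remember "absent" as well, so repeated checks don't re-query.
		$this->pagePropCache[$key] = isset( $props[$page->getId()] );
	}
	return $this->pagePropCache[$key];
}
```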

But above all, if this is specific to page views (i.e. not when viewing action=edit, action=history, or action=info), then you can assume a ParserOutput object, which already has page props in it. Fetching them by page ID should not be needed at all. The PageProps service can't do that, since its contract is bound to the latest revision in the database, not the revision shown. However, perhaps DiscussionTools could make some of this hook decision via the onOutputPageParserOutput hook, which runs just before the onOutputPageBeforeHTML hook, and has access to the ParserOutput object and all its rich metadata.

ACTION (DiscussionTools): Consider if checks can work without database queries by using ParserOutput.

This seems like a good idea, but very difficult. I've been thinking about this for like an hour, and I'm still not sure how to do it. The problem is that it calls this function all over the place, including in places that do not have access to the ParserOutput, e.g. in the onGetActionName hook handler (on every page view), or in maintenance scripts elsewhere. So everything would need to be refactored to check the props either in the database or in parser output, or maybe to avoid checking them based on some external knowledge (this may be feasible for onGetActionName – it only needs to override the action for non-existent pages, which can't have any props). And even once you do that, you may still discover that it's still doing unavoidable queries in some code path, or you may discover that the props from the database and from parser output are out-of-sync and breaking something. On top of that, you can't just move the code to onOutputPageParserOutput, because unlike onOutputPageBeforeHTML it doesn't allow changing the HTML to be added; another hook would be needed. I'm not sure if this is worth the effort.

It could at least fetch the 3 props it needs in one query.
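PageProps::getProperties() accepts an array of property names, so the three lookups can collapse into one query, roughly like this (sketch):

```php
// Sketch: one page_props query instead of three, by passing all needed
// property names to PageProps::getProperties() at once.
$props = $this->pageProps->getProperties(
	$title,
	[ 'notalk', 'newsectionlink', 'archivedtalk' ]
);
// Resulting SQL (roughly):
//   SELECT pp_page,pp_propname,pp_value FROM page_props
//   WHERE pp_page = 54249587
//   AND pp_propname IN ('notalk','newsectionlink','archivedtalk')
```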

TemplateData

This was introduced in Nov 2022 as part of T316759 and seems to introduce a mechanism not unlike core's <mw:editsection> placeholder. But, because it hooks into OutputPage, these placeholders leak outside the content layer and are not taken care of until here at the Skin level, with OutputPage and its browser-facing hooks. In other words, Parsoid and other APIs likely still leak the placeholders through non-standard HTML.

A more appropriate hook would be onParserOutputPostCacheTransform, which runs inside ParserOutput::getText, the same place where <mw:editsection> is handled. This hook was introduced in 2017 (T171797).

(…)

ACTION (TemplateData): Consider if checks can work without database queries by using ParserOutput.

onParserOutputPostCacheTransform can't be used here, because it doesn't provide information about the user interface language, which it needs to localize the links. It could fetch it from the global context, but that seems worse.

TemplateData code comments don't explain that (maybe they should), but funnily enough there's an identical situation in DiscussionTools, which has the following complaint about it:

// ParserOutputPostCacheTransform hook would be a better place to do this,
// so that when the ParserOutput is used directly without using this hook,
// we don't leave half-baked interface elements in it (see e.g. T292345, T294168).
// But that hook doesn't provide parameters that we need to render correctly
// (including the page title, interface language, and current user).

Change 960158 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@master] HookUtils: Fetch all needed props in one query

https://gerrit.wikimedia.org/r/960158

Change 960158 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] HookUtils: Fetch all needed props in one query

https://gerrit.wikimedia.org/r/960158

It seems like @matmarex has handled the DiscussionTools-related facets of this task, and the TemplateData work depends on Parsoid-related work that has not yet been implemented.

With the above in mind, the Editing Team is assuming there is nothing left for us to do on this task.

Of course, please let us know if you see it differently.

Change 967301 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] skins: Preload page from message "pagecategorieslink"

https://gerrit.wikimedia.org/r/967301

Change 967301 merged by jenkins-bot:

[mediawiki/core@master] skins: Preload page from message "pagecategorieslink"

https://gerrit.wikimedia.org/r/967301

  1. [0.002s] MediaWiki\Revision\RevisionStore::fetchRevisionRowFromConds SELECT … FROM revision … JOIN actor comment comment_rev_comment page ON … LEFT JOIN user ON … WHERE page_namespace = 0 AND page_title = 'Robert_Howard_Hodgkin' ORDER BY rev_timestamp ASC,rev_id ASC LIMIT 1
    • Who: Wikidata (WikibaseClient extension)
    • Why: To compute the datePublished property within the <script type="application/ld+json"> element. Via RevisionStore::getFirstRevision, as indirectly called from the same Wikibase\Client\ClientHooks::onSkinAfterBottomScripts hook handler.

FWIW, we don’t need all the information this query returns – all we actually access is the rev_timestamp, we don’t need to join actor, comment and user for that. Do you think it would be worth simplifying that query? (It would probably make the code a bit uglier because we have to either bypass RevisionLookup::getFirstRevision() or somehow enhance it with “omit X joins” flags, but perhaps it’s still a good tradeoff for a query that runs on parsercached logged-out page views.)
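A sketch of what the slimmed-down query could look like with a plain SelectQueryBuilder (hypothetical; the variable names are assumed):

```php
// Hypothetical sketch: fetch only rev_timestamp for datePublished, avoiding
// the actor/comment/user joins that RevisionLookup::getFirstRevision() adds.
$timestamp = $dbr->newSelectQueryBuilder()
	->select( 'rev_timestamp' )
	->from( 'revision' )
	->where( [ 'rev_page' => $title->getArticleID() ] )
	->orderBy( [ 'rev_timestamp', 'rev_id' ], SelectQueryBuilder::SORT_ASC )
	->limit( 1 )
	->caller( __METHOD__ )
	->fetchField();
```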

You may be able to use RevisionStore::newSelectQueryBuilder directly if that helps.

I would just use a plain SelectQueryBuilder tbh, even the RevisionStore one already adds joins we don’t need. But that makes the code uglier, so I’m wondering if it’s worth it (how expensive the joins are for the database).

Are you sure? RevisionSelectQueryBuilder only joins to actor. Other tables like comment, user, or page are added when you explicitly call ->joinPage() or something like that.