
Explore removing WikiModuleTitleInfo in ResourceLoader, in favour of standard LinkCache
Open, In Progress, Medium, Public

Description

Background

These use cases and requirements are met and implemented since 2011. They are described here for context when we remove/replace this implementation.

ResourceLoader has a "WikiModuleTitleInfo" subsystem to track the existence and current revision ID of pages whose content is bundled inside one or more modules. These are used by user-generated resources that are created and edited on-wiki. Examples include Gadgets, site scripts, user scripts, and the GlobalCssJs extension.

Use cases:

  • In order to support bundling pages like MediaWiki:Common.css, User:Example/vector.js, and MediaWiki:Gadget-example.js in a ResourceLoader module, we need a version hash so that changes actually propagate to browsers. This version hash should change based on the composition (page titles and order) and the content (i.e. the last edit's revision ID or last purge, such as page_latest or page_touched). The version hash is computed on the load.php?modules=startup request, which is strongly cached, including for logged-in users, and has a low cardinality, so this has a relatively low backend request rate.
  • In order to avoid making <link> stylesheet requests in the browser for modules that aren't enabled on the wiki, we need to know whether these pages actually exist (Module::isKnownEmpty). This happens on every page view. Page views have a long tail with high cardinality, thus a high backend request rate, including for logged-in users.
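
The two use cases above can be sketched roughly as follows. This is a minimal illustration, not the WikiModule implementation: the function names computeVersionHash and isKnownEmpty, the 8-character hash length, and the sample revision IDs are all assumptions made for the example.

```php
<?php
declare( strict_types=1 );

/**
 * Hypothetical helper: derive a version hash from an ordered map of
 * page titles to their latest revision ID (null = page does not exist).
 *
 * @param array<string,int|null> $titleInfo Title => page_latest, in bundle order
 */
function computeVersionHash( array $titleInfo ): string {
	// json_encode keeps key order, so both composition and order affect the hash.
	$summary = json_encode( $titleInfo );
	// The truncation length is arbitrary in this sketch.
	return substr( sha1( $summary ), 0, 8 );
}

/**
 * Hypothetical stand-in for an "is known empty" check:
 * true when none of the module's pages exist.
 *
 * @param array<string,int|null> $titleInfo
 */
function isKnownEmpty( array $titleInfo ): bool {
	foreach ( $titleInfo as $revId ) {
		if ( $revId !== null ) {
			return false;
		}
	}
	return true;
}

// Sample data (made up for illustration).
$before = [ 'MediaWiki:Common.css' => 101, 'MediaWiki:Gadget-example.js' => 205 ];
$afterEdit = [ 'MediaWiki:Common.css' => 102, 'MediaWiki:Gadget-example.js' => 205 ];
$reordered = [ 'MediaWiki:Gadget-example.js' => 205, 'MediaWiki:Common.css' => 101 ];
$notCreated = [ 'MediaWiki:Testing.css' => null ];
```

In this sketch, an edit (a new revision ID) or a reordering yields a different hash, while a module whose pages all have a null revision ID is reported as empty, so no stylesheet request would be needed for it.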

Requirements:

  • Constant time. There can be any number of such pages on a given wiki. To ensure our backend webserver latency doesn't grow uncontrollably, and to ensure database load stays relatively constant, we need to batch these and retrieve them in a single query whenever possible.
  • Caching. Since we need this information on pageviews, we need to actually perform zero queries in most cases, which is only possible if we cache it in Memcached. This in turn means we need cache invalidation, i.e. after every edit and/or a cache key that naturally caches. See also MediaWiki Engineering practices, T302623, T347123, and T302538.

Today, we meet the above through three local functions in the MediaWiki\ResourceLoader\WikiModule class: fetchTitleInfo, invalidateModuleCache, and preloadTitleInfo.

Problem

The implementation has been stable with largely no defect or need to change since 2015. Recently, in adopting Domain Events within MediaWiki core, we found that ResourceLoader consumes one hook that would need to adopt Domain Events. This is fine, but it made me take a new look at this subsystem.

It seems to me that we might not need this subsystem anymore, because MediaWiki core nowadays provides the LinkCache service, which already keeps track of the same information in a way that is much more dependency-free. It works quite differently from our implementation, so it's not obvious that it can fit our needs, but it seems worth exploring. If successful, it would allow us to remove this dependency and, with it, a significant number of indirect dependencies and coupling between ResourceLoader and other MediaWiki core components.

See also:

Event Timeline

Restricted Application added a subscriber: Aklapper.
Krinkle updated the task description.

Change #1132095 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] ResourceLoader: Test for WikiModule titleinfo cache and purge

https://gerrit.wikimedia.org/r/1132095

Change #1132141 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] LinkCache: Improve high-level docs

https://gerrit.wikimedia.org/r/1132141

Hokwelum changed the task status from Open to In Progress. Jun 23 2025, 1:33 PM

Change #1132095 merged by jenkins-bot:

[mediawiki/core@master] ResourceLoader: Test for WikiModule titleinfo cache and purge

https://gerrit.wikimedia.org/r/1132095

Krinkle triaged this task as Medium priority. Jul 18 2025, 1:06 AM

Change #1170451 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] page: Move LinkCache from includes/cache/ to includes/page

https://gerrit.wikimedia.org/r/1170451

Change #1132141 merged by jenkins-bot:

[mediawiki/core@master] LinkCache: Improve high-level docs

https://gerrit.wikimedia.org/r/1132141

Update @Krinkle:

This is what I'm experiencing:

When I populate the cache with LinkBatch::execute(), it always calls $this->doQuery(). I don't want this; I want it to try to fetch from the cache first before running the query.

If I use PageStore::getPageForLink, it tries to get the page from the cache and, if it's not there, fetches it from the database. But it doesn't do this as a batch: it runs a separate query for each page not in the cache. So with 1000 uncached pages, it will run 1000 queries.

LinkCache::addLinkObj is deprecated, and PageStore::getPageForLink is what we should use now.

Hannah and I discussed this in a meeting, but I'll summarise it here as well.

> This is what I'm experiencing:
>
> When I populate the cache with linkbatch->execute()
>
> It always does $this->doQuery(); I don't want this, I want it to try to fetch from the cache first before running the query.

This surprised me but is indeed how it works. It turns out that the main use cases we have today for LinkCache (per the docs: 1. page titles, such as from Parser and Skin, and 2. template and message key lookups from the Parser and MessageBlobStore) fall into two buckets that, interestingly, have never needed a batch with caching.

The first use case primarily involves page links to namespaces that don't qualify for persistent caching, because they'd pollute the cache and be ineffective anyway. Chances are at least one of the page titles in the batch won't be in the cache, and once we do a DB query we might as well fetch them all in a single batch. This is actually an important optimisation, because performing cache lookups for something that won't yield results is slower than not having a cache. So for this use case we not only don't need a cache, we explicitly don't want one, because the caller knows the lookups are unlikely to be hits.

The second use case involves single lookups, where the caller logically never knows more than one title in advance. There is simply never a need for a batch here, and all relevant code paths (e.g. Title::newFromText, PageStore) naturally involve LinkCache and do a lookup in the process-cache and persistent-cache before a DB query. It just works.

All that means that, while LinkBatch feeds its results into LinkCache, it is entirely oblivious to the process-cache and persistent-cache and doesn't check them. With today's use cases, that seems fair.

For the TitleInfo cache in ResourceLoader WikiModule, we use titles that do qualify for the persistent cache, and we know them in advance. This means we want to 1) use the cache first, and 2) use a batch for the rest. This isn't something LinkBatch does today, and it may justify adding a new cache-aware option for us: do a batch read from the cache first, and then fetch anything remaining from the database.
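
To make the "cache first, then one batch for the rest" idea concrete, here is a minimal sketch. It is not the LinkBatch or LinkCache API: the function lookupWithCacheFirst, the plain array standing in for the persistent cache, and the $batchQuery callable standing in for a single database query are all hypothetical.

```php
<?php
declare( strict_types=1 );

/**
 * Hypothetical cache-aware batch lookup (not the LinkBatch API).
 *
 * @param string[] $titles Titles to resolve
 * @param array<string,int> $cache Stand-in for a persistent cache, passed by reference
 * @param callable $batchQuery Takes string[] titles, returns array<string,int> in one query
 * @return array<string,int> Title => revision ID, or 0 if the page does not exist
 */
function lookupWithCacheFirst( array $titles, array &$cache, callable $batchQuery ): array {
	$found = [];
	$missing = [];
	// Step 1: answer as much as possible from the cache.
	foreach ( $titles as $title ) {
		if ( array_key_exists( $title, $cache ) ) {
			$found[$title] = $cache[$title];
		} else {
			$missing[] = $title;
		}
	}
	// Step 2: one batch query for everything the cache did not have.
	if ( $missing ) {
		$rows = $batchQuery( $missing );
		foreach ( $missing as $title ) {
			// Store negative results (0) too, so missing pages are not re-queried.
			$revId = $rows[$title] ?? 0;
			$cache[$title] = $revId;
			$found[$title] = $revId;
		}
	}
	return $found;
}

// Usage with made-up data: count how many "queries" are issued.
$queryCount = 0;
$db = [ 'MediaWiki:Common.css' => 101, 'MediaWiki:Common.js' => 102 ];
$batchQuery = static function ( array $titles ) use ( &$queryCount, $db ): array {
	$queryCount++;
	return array_intersect_key( $db, array_flip( $titles ) );
};
$cache = [ 'MediaWiki:Common.css' => 101 ];
$wanted = [ 'MediaWiki:Common.css', 'MediaWiki:Common.js', 'MediaWiki:Vector.css' ];
$result = lookupWithCacheFirst( $wanted, $cache, $batchQuery );
```

With these inputs, the first call issues a single batch query for the two uncached titles, and a repeat call with the same titles issues none. A real implementation would also need invalidation on edit, which this sketch leaves out.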

> If I use PageStore::getPageForLink, it tries to get from the cache, and if not there, it fetches, but it doesn’t do this for a batch. It will run the query again for all pages not in the cache. So if this was 1000, it will do 1000 queries for each of those.
>
> LinkCache:addLinkObj is deprecated, and PageStore::getPageForLink is what we should use now.

I was confused here at first but I understand now that you went into these methods as a way to access the LinkCache internals from the outside, to try and do this without touching LinkCache, but that's indeed unsupported since it should be a blind cache that "just works" on the side.

LinkCache::addLinkObj, even in its non-deprecated form, doesn't help us here since it would not use a batch. It's not meant for this purpose. The same goes for PageStore::getPageForLink.

The other finding is that ResourceLoader WikiModule generally involves two kinds of titles in its preload batch: NS_MEDIAWIKI and NS_USER. NS_MEDIAWIKI titles, such as MediaWiki:Common.css, qualify for the persistent cache of LinkCache and should work as-is (well, after we add support for try-cache-first in LinkBatch). NS_USER titles, such as User:Example/common.css, do not qualify. We should probably not enable persistent caching for everything in NS_USER, as that would pollute Memcached with too many unpopular items. However, the subset of titles used for user scripts and WikiModules might make sense to cache, since that's what we do today for resourceloader-titleinfo as well, and that seems to be working well. If we remove this, logged-in page views will always incur an extra DB lookup instead of having a chance at some short-term caching for repeat views.

I recommend validating how effective this is today. In the Grafana dashboard for WANObjectCache by Key group / resourceloader-titleinfo, the cache hit rate is reported as 99%, but this might be due to the site-wide batch dominating over the user-specific batch. We can temporarily patch the cache key to give the user-specific batch from OutputPage a slightly different cache key, like resourceloader-titleinfo-user, and see if that is effective at all.
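
As a rough illustration of why a separate key group helps, the sketch below counts hits and misses per key group. HitRateTracker is a hypothetical class invented for this example (it is not part of WANObjectCache or its stats), and the numbers are made up.

```php
<?php
declare( strict_types=1 );

/** Hypothetical hit/miss counter, grouped by cache key group. */
final class HitRateTracker {
	/** @var array<string,array{hits:int,misses:int}> */
	private array $counts = [];

	public function record( string $keyGroup, bool $hit ): void {
		$this->counts[$keyGroup] ??= [ 'hits' => 0, 'misses' => 0 ];
		$this->counts[$keyGroup][$hit ? 'hits' : 'misses']++;
	}

	/** @return float|null Hit rate between 0 and 1, or null if nothing was recorded */
	public function hitRate( string $keyGroup ): ?float {
		$c = $this->counts[$keyGroup] ?? null;
		if ( $c === null ) {
			return null;
		}
		return $c['hits'] / ( $c['hits'] + $c['misses'] );
	}
}

// Made-up traffic: the site-wide batch mostly hits, the user batch mostly misses.
$tracker = new HitRateTracker();
foreach ( [ true, true, true, false ] as $hit ) {
	$tracker->record( 'resourceloader-titleinfo', $hit );
}
foreach ( [ true, false, false, false ] as $hit ) {
	$tracker->record( 'resourceloader-titleinfo-user', $hit );
}
$siteRate = $tracker->hitRate( 'resourceloader-titleinfo' );
$userRate = $tracker->hitRate( 'resourceloader-titleinfo-user' );
```

If both batches shared one key group, only a blended rate would be visible. Splitting them shows each rate on its own, which is the point of the temporary key change.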

If not, then we can transition that one to a normal uncached LinkBatch. If yes, then we may want to create a way to allow some NS_USER titles to be added with a persistent cache. For example, LinkBatch::add() could take an option to indicate persistent caching, which doQuery would then somehow use to override LinkCache::usePersistentCache.

Change #1178009 had a related patch set uploaded (by Hokwelum; author: Hokwelum):

[mediawiki/core@master] ResourceLoader: Temporily track cache usage of preloaded NS_USER title info

https://gerrit.wikimedia.org/r/1178009

Change #1178835 had a related patch set uploaded (by Hokwelum; author: Hokwelum):

[mediawiki/core@wmf/1.45.0-wmf.14] ResourceLoader: Temporily track cache usage of preloaded NS_USER title info

https://gerrit.wikimedia.org/r/1178835

Change #1178835 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.14] ResourceLoader: Temporily track cache usage of preloaded NS_USER title info

https://gerrit.wikimedia.org/r/1178835

Mentioned in SAL (#wikimedia-operations) [2025-08-14T13:26:40Z] <lucaswerkmeister-wmde@deploy1003> Started scap sync-world: Backport for [[gerrit:1178835|ResourceLoader: Temporily track cache usage of preloaded NS_USER title info (T393835)]]

Mentioned in SAL (#wikimedia-operations) [2025-08-14T13:28:52Z] <lucaswerkmeister-wmde@deploy1003> lucaswerkmeister-wmde, hokwelum: Backport for [[gerrit:1178835|ResourceLoader: Temporily track cache usage of preloaded NS_USER title info (T393835)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-08-14T13:43:46Z] <lucaswerkmeister-wmde@deploy1003> Finished scap sync-world: Backport for [[gerrit:1178835|ResourceLoader: Temporily track cache usage of preloaded NS_USER title info (T393835)]] (duration: 17m 06s)

Hello @Krinkle, this is what it looks like for titleinfo-user cache hit.

We see about a 99% hit rate and close to 6k req/s over the last 24 hours.

titleinfo.png (1×2 px, 486 KB)

Tgr renamed this task from "Explore removing WikModuleTitleInfo in ResourceLoader, in favour of standard LinkCache" to "Explore removing WikiModuleTitleInfo in ResourceLoader, in favour of standard LinkCache". Sep 8 2025, 2:25 PM

Change #1172700 had a related patch set uploaded (by Hokwelum; author: Hokwelum):

[mediawiki/core@master] ResourceLoader: Use Linkcache to store and get information of Wikimodule pages

https://gerrit.wikimedia.org/r/1172700

How I tested foreign wiki pages: for subclasses with a foreign $this->getDB(), getting page info from Title results in null, because it looks at the local DB.

I have two wikis, enwiki and igwiki sharing one codebase.

**Resources.php**

'testing' => ['class' => TestModule::class ],

**TestModule.php**

protected function getPages( Context $context ) {
	$pages = [];
	// Only bundle the page when site CSS is enabled.
	if ( $this->getConfig()->get( MainConfigNames::UseSiteCss ) ) {
		$pages['MediaWiki:Testing.css'] = [ 'type' => 'style' ];
	}
	return $pages;
}

/**
 * Read page info from the foreign wiki (igwiki) instead of the local one.
 *
 * @return IDatabase
 */
protected function getDB() {
	$lbFactory = MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
	$lb = $lbFactory->getMainLB( 'igwiki' );
	return $lb->getConnection( DB_REPLICA, [], 'igwiki' );
}

public function getType() {
	return self::LOAD_STYLES;
}

public function getGroup() {
	return self::GROUP_SITE;
}

In OutputPage.php, add "testing" to $this->addModuleStyles( [...] ). On igwiki, create the MediaWiki:Testing.css page. With the previous patch, enwiki wouldn't pull any result from the DB, because it checked the local DB.

Change #1224863 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] ResourceLoader: Minor cleanup after LinkCache refactor

https://gerrit.wikimedia.org/r/1224863

Change #1172700 merged by jenkins-bot:

[mediawiki/core@master] ResourceLoader: Replace WikiModule title info cache with LinkCache

https://gerrit.wikimedia.org/r/1172700

Change #1237239 had a related patch set uploaded (by Hokwelum; author: Hokwelum):

[mediawiki/core@master] Resourceloader: follow-up to Id1722baeabcb4908e52c1a9aeb24263beb67577f

https://gerrit.wikimedia.org/r/1237239