
Phase out SqlModuleDependencyStore
Open, Medium, Public

Description

Background

Context: https://www.mediawiki.org/wiki/ResourceLoader/Architecture

Each module has a version hash. The startup module ships a registry of all module names and their hashes to the browser client. The server builds this registry in a single response (for the "startup" module). Given ~1000 modules, this leaves relatively little time for each module. As such, we employ broadly two mechanisms today to speed this up (compared to the naive approach of building all 1000 modules serially and then hashing their output):

  1. When responding with the startup registry, instead of fully building each module into a temporary variable and hashing its content, we take only the list of CSS and JS input file paths and hash the MD4 hashes of those files. Because the build step is designed to be deterministic, unchanged inputs imply unchanged output. A software-defined version constant is hashed in to account for changes to the build itself (e.g. a changed bundle format, or an updated minifier or Less compiler).
  2. To account for module output variance by indirect means, we wait for the first request for the actual module, take note of the files read, and the next time the startup registry is built, the version hash takes those into account. Thus, after a deployment, some modules take one extra 5-minute cache window before reaching their final cache key. The important thing is that caches are always invalidated within 5 minutes; invalidating one extra time before settling on a cache key is fairly harmless.
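
The two mechanisms above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not ResourceLoader's actual PHP code: `BUILD_VERSION`, `file_hash`, and `module_version` are hypothetical names, and SHA-256 stands in for MD4 because MD4 is often unavailable in Python's `hashlib`.

```python
import hashlib
import os
import tempfile

# Hypothetical software-defined constant; bump it when the bundle format,
# minifier, or Less compiler changes.
BUILD_VERSION = "v1"


def file_hash(path):
    """Hash one file's contents (SHA-256 here; the task describes MD4)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def module_version(direct_paths, indirect_paths=()):
    """Combine per-file hashes and the build constant into one version hash.

    direct_paths: the module's declared CSS/JS input files.
    indirect_paths: files recorded during an earlier real build of the
    module (empty until the module has been requested once).
    """
    h = hashlib.sha256(BUILD_VERSION.encode())
    for path in sorted(set(direct_paths) | set(indirect_paths)):
        h.update(path.encode())
        h.update(file_hash(path).encode())
    return h.hexdigest()[:8]
```

With no recorded indirect files, the hash reflects only the declared inputs. Once a real build has recorded, say, an imported file, passing it as `indirect_paths` yields a different hash, which corresponds to the one extra invalidation described above.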

Examples of indirect file references:

  • A CSS stylesheet referencing a versioned SVG image. When the SVG file changes, the URL will change from background: url(foo.svg?11111) to background: url(foo.svg?22222). This means the stylesheet output has changed without the CSS file changing.
  • A Less stylesheet is generally quite short in itself, with most of its code being imported from other .less files, which are then flattened into a single large CSS file when compiled. A change to an imported file therefore changes the output without the entry file changing.
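
The SVG case can be illustrated with a small sketch. It is a simplified stand-in, not the real CSS remapping code: `version_urls` and its `read_file` parameter are hypothetical, the hash is SHA-256 truncated to five characters, and quoted `url("...")` values are not handled.

```python
import hashlib
import re


def version_urls(css, read_file):
    """Append a short content hash to each unquoted url(...) reference.

    read_file is a callable mapping a referenced path to its bytes; it is
    injected so the sketch runs without touching the disk.
    Returns the rewritten CSS and the list of files that were read.
    """
    seen = []

    def repl(match):
        path = match.group(1)
        seen.append(path)
        digest = hashlib.sha256(read_file(path)).hexdigest()[:5]
        return f"url({path}?{digest})"

    return re.sub(r"url\(([^)?]+)\)", repl, css), seen
```

The returned list of files read is the kind of information that would be recorded as a module's indirect dependencies: the CSS source is identical between two calls, yet the output differs once the referenced image changes.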

Status quo

When this feature first launched, we wrote this metadata into the core MediaWiki tables (the module_deps table). This was problematic at WMF scale for several reasons, including violating the best practice of not performing writes to the core DB cluster on GET requests, and being incompatible with Multi-DC. Details at T113916: Switch ResourceLoader file dependency tracking to MultiDC-friendly backend.

Short story: We introduced the MainStash service based on BagOStuff, and created a new DependencyStore implementation that uses this instead.

This is significantly faster for small wikis using SQLite as well, as it allows writes to go to a separate, non-conflicting database file rather than the main one, which is likely locked.
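
As a rough picture of what a key-value-backed dependency store involves, here is a minimal sketch. The class name, method names, key format, and the `variant` parameter are all hypothetical and do not mirror MediaWiki's actual DependencyStore interface; a plain dict stands in for the object cache.

```python
import json


class KeyValueDepStore:
    """Hypothetical dependency store on top of a dict-like backend."""

    def __init__(self, backend):
        self.backend = backend  # a dict here; an object cache in practice

    def _key(self, module, variant):
        return f"deps:{module}:{variant}"

    def retrieve(self, module, variant):
        """Return the recorded indirect file paths, or [] if none yet."""
        raw = self.backend.get(self._key(module, variant))
        return json.loads(raw) if raw else []

    def store(self, module, variant, paths):
        """Record paths, skipping the write when nothing changed."""
        new = sorted(set(paths))
        if new != self.retrieve(module, variant):
            self.backend[self._key(module, variant)] = json.dumps(new)
```

Skipping unchanged writes is a design choice of this sketch: it keeps GET requests read-only in the common case where the recorded file list is already up to date.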

To do

  • Turn on wgResourceLoaderUseObjectCacheForDeps in MediaWiki 1.41 by default. (It is already turned on for Wikipedia at WMF.)
  • Remove the wgResourceLoaderUseObjectCacheForDeps setting in MediaWiki 1.42.
  • Remove the SqlModuleDependencyStore class.
  • Simplify DependencyStore by consolidating it with the KeyValueDependencyStore class.

Event Timeline


Turn on wgResourceLoaderUseObjectCacheForDeps in MediaWiki 1.41 by default. (It is already turned on for Wikipedia at WMF.)

The branch cut is rather soon. Wanna make it a blocker for the release?

This is scheduled for FY2024-25 Q1 (Jul-Sep).