
Redesign ResourceLoader's file dependency tracking (module_deps)
Open, Medium, Public

Description

Status quo

Following T90001, we spawned T113092 for the msg_resource table. This task is for the module_deps table.

The contents of the module_deps table are deterministic. They are currently stored in the main database because regenerating them is costly, so we want high persistence.

The "startup" module queries this data for thousands of modules at once. Generating all of it on the fly would be impossible within our target HTTP response time (it would take tens of seconds).

Instead, it is typically populated in a distributed fashion, e.g. by separate on-demand module requests to load.php.

We handle a missing entry by generating a temporary placeholder version hash. When a user actually needs the module and requests it with the placeholder hash, that request performs the in-depth computation and stores the result in the database. From then onwards, the "startup" module contains the correct version hash.
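A toy model may help make the placeholder flow concrete. This is a simplified sketch under stated assumptions, not ResourceLoader's implementation: `DependencyTracker`, `startup_version`, `load_request` and the `PLACEHOLDER` value are hypothetical, and the version hash is assumed to be a hash over the tracked files' paths and contents.

```python
import hashlib
from typing import Dict, List

# Hypothetical marker; the real placeholder format is not described in this task.
PLACEHOLDER = "placeholder"


class DependencyTracker:
    """Toy model of the module_deps flow. All names here are hypothetical."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = files  # path -> file contents (stands in for the filesystem)
        self.module_deps: Dict[str, List[str]] = {}  # module -> tracked file paths

    def _hash(self, paths: List[str]) -> str:
        # Assumed version hash: SHA-1 over the sorted paths and their contents.
        h = hashlib.sha1()
        for path in sorted(paths):
            h.update(path.encode())
            h.update(self.files[path].encode())
        return h.hexdigest()[:7]

    def startup_version(self, module: str) -> str:
        # "startup" module: cheap lookup only, never discovers dependencies itself.
        deps = self.module_deps.get(module)
        return PLACEHOLDER if deps is None else self._hash(deps)

    def load_request(self, module: str, discovered: List[str]) -> str:
        # load.php request for the module itself: does the in-depth work and
        # persists the result (this is the write that happens during a GET request).
        self.module_deps[module] = sorted(discovered)
        return self._hash(discovered)


tracker = DependencyTracker({"skin.less": "a { color: red; }", "mixins.less": "@x: 1;"})
before = tracker.startup_version("skins.example")  # no row yet, so the placeholder
tracker.load_request("skins.example", ["skin.less", "mixins.less"])
after = tracker.startup_version("skins.example")  # real hash from now on
```

In this model, `before` and `after` differ, so any client that cached the module under the placeholder hash fetches it a second time: the two-step invalidation this task accepts as a trade-off.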

This means that after a deployment, modules whose version hash is expensive to compute will first be invalidated to a temporary hash, and then invalidated again a few minutes later to the final one. This is somewhat wasteful, but it is an intentional design decision in ResourceLoader. Improving or avoiding this aspect is outside the scope of this task.

Problem statement

Because this data is stored in the main MySQL databases, load.php has to write rows to the table during GET requests. This is a performance and availability anti-pattern.

The objective is to store this data outside the main databases, ideally in a way that preserves as much of its persistence as possible.

Ideas

@Catrope and I had a brainstorming session last week (in the context of T102578) and identified the following known issues:

  • The table stores absolute paths, which means that when a wmf branch rolls over, it loses track of some files, causing a needless cache invalidation. Since old wmf branches are not removed immediately (in part because multiple versions are deployed at any one time), the old file paths are not obviously wrong. As a result, the table can even end up containing both the old and new versions of the same file. This and more is tracked under T111481.
  • Lots of old data is left in module_deps from modules that no longer exist in recent versions of MediaWiki core and extensions, because there is no TTL and no garbage collection.
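The task does not prescribe a fix for the absolute-path issue (that is tracked under T111481). One possible mitigation, assumed here purely for illustration, is to store paths relative to the installation directory and expand them against whichever branch is currently deployed. `to_relative`, `to_absolute` and the branch directory names below are hypothetical. This sketch does nothing about the missing garbage collection.

```python
from pathlib import PurePosixPath
from typing import List


def to_relative(paths: List[str], install_root: str) -> List[str]:
    """Store paths relative to the install root so they survive a branch switch.

    Raises ValueError for a path outside install_root (e.g. one left over
    from an older branch directory) rather than silently storing it.
    """
    root = PurePosixPath(install_root)
    # A set drops repeated entries for the same file.
    return sorted({str(PurePosixPath(p).relative_to(root)) for p in paths})


def to_absolute(rel_paths: List[str], install_root: str) -> List[str]:
    """Expand stored relative paths against the currently deployed root."""
    root = PurePosixPath(install_root)
    return [str(root / rel) for rel in rel_paths]


# Hypothetical branch directories, for illustration only.
old_root = "/srv/mediawiki/php-1.27.0-wmf.1"
new_root = "/srv/mediawiki/php-1.27.0-wmf.2"

old = to_relative([old_root + "/resources/src/foo.less"], old_root)
new = to_relative([new_root + "/resources/src/foo.less"], new_root)
```

Both branches produce the same stored entry, `resources/src/foo.less`, so rolling over to a new branch would not by itself change what is tracked.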

Also, since the values are deterministic, we do not need a store that is replicated across data centres. A DC-local store is sufficient.

Event Timeline

Krinkle created this task. · Sep 27 2015, 11:25 PM
Krinkle raised the priority of this task to Needs Triage.
Krinkle updated the task description.
Krinkle added subscribers: Krinkle, Catrope.
Restricted Application added a subscriber: Aklapper. · Sep 27 2015, 11:25 PM
Krinkle triaged this task as Medium priority. · Sep 30 2015, 12:14 AM
Krinkle set Security to None.
Krinkle moved this task from Inbox to Backlog: Small & Maintenance on the Performance-Team board.
Krinkle moved this task from Inbox to Backlog on the MediaWiki-ResourceLoader board.
Krinkle added a subscriber: ori. · Nov 24 2015, 6:52 AM

The current aim is to keep making module builds faster, with the intent of eventually enabling run-time "module content versioning" for all modules, at which point this tracking system becomes obsolete.

The recently added caching layer for LESS compilation (thanks to @ori) has brought us much closer to being able to compute the versions of all 3000+ modules ad hoc in the "startup" module. This progress makes me hopeful we'll be able to do this within a quarter or two.
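A rough sketch of why caching makes ad-hoc versioning plausible: if each file's digest is cached, recomputing every module's version only rehashes files that changed. This is an illustration under assumptions, not the actual LESS cache: `ContentVersioner` is hypothetical, the cache key of (path, modification time) is assumed, and the caller is assumed to already know each module's file list (for LESS, discovering indirect imports is exactly the work the compilation cache is meant to make cheap).

```python
import hashlib
from typing import Callable, Dict, List, Tuple

FileRef = Tuple[str, float]  # (path, mtime); the mtime-based key is an assumption


class ContentVersioner:
    """Hypothetical sketch: compute a module's version from file content on demand."""

    def __init__(self, read: Callable[[str], bytes]) -> None:
        self._read = read
        # (path, mtime) -> digest; stands in for a compilation/hash cache.
        self._cache: Dict[FileRef, str] = {}
        self.hash_calls = 0  # how many files were actually read and hashed

    def file_hash(self, ref: FileRef) -> str:
        if ref not in self._cache:
            self.hash_calls += 1
            self._cache[ref] = hashlib.sha1(self._read(ref[0])).hexdigest()
        return self._cache[ref]

    def module_version(self, refs: List[FileRef]) -> str:
        # Combine per-file digests in a stable order.
        combined = hashlib.sha1()
        for ref in sorted(refs):
            combined.update(self.file_hash(ref).encode())
        return combined.hexdigest()[:7]


disk = {"a.js": b"one", "b.css": b"two"}
versioner = ContentVersioner(disk.__getitem__)
v1 = versioner.module_version([("a.js", 1.0), ("b.css", 1.0)])
v2 = versioner.module_version([("a.js", 1.0), ("b.css", 1.0)])  # served from cache
disk["a.js"] = b"changed"
v3 = versioner.module_version([("a.js", 2.0), ("b.css", 1.0)])  # new mtime, rehash
```

Here the second call reads no files at all, and the third rereads only `a.js`, which is the property that would let the "startup" module afford to compute versions on every request.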

Krinkle claimed this task. · Dec 6 2016, 12:52 AM

(From Offsite) Using BagOStuff could work for this. It has an SQL-backed subclass (SqlBagOStuff) that we could keep as the default, while allowing users to configure a different backend.
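The shape of that idea can be sketched in Python under stated assumptions: `DepStore`, `SqlDepStore`, `MemoryDepStore`, `make_store` and the key names are hypothetical and do not reproduce MediaWiki's actual BagOStuff or SqlBagOStuff API. The sketch keeps an SQL-backed store as the default and lets configuration select another backend. The per-entry TTL and `purge_expired` are added assumptions that would also address the missing TTL and garbage collection noted earlier in this task.

```python
import json
import sqlite3
import time
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class DepStore(ABC):
    """Hypothetical key-value interface, loosely inspired by the BagOStuff idea."""

    @abstractmethod
    def get(self, key: str) -> Optional[List[str]]: ...

    @abstractmethod
    def set(self, key: str, value: List[str], ttl: float) -> None: ...


class SqlDepStore(DepStore):
    """Default SQL-backed store (SQLite here, purely for illustration)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT, exp REAL)")

    def get(self, key: str) -> Optional[List[str]]:
        row = self.conn.execute("SELECT v, exp FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None or row[1] < time.time():
            return None  # missing and expired entries look the same to callers
        return json.loads(row[0])

    def set(self, key: str, value: List[str], ttl: float) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v, exp) VALUES (?, ?, ?)",
            (key, json.dumps(value), time.time() + ttl),
        )
        self.conn.commit()

    def purge_expired(self) -> int:
        # Garbage collection: remove rows not rewritten within their TTL.
        cur = self.conn.execute("DELETE FROM kv WHERE exp < ?", (time.time(),))
        self.conn.commit()
        return cur.rowcount


class MemoryDepStore(DepStore):
    """Alternative backend an operator could configure instead."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[List[str], float]] = {}

    def get(self, key: str) -> Optional[List[str]]:
        item = self._data.get(key)
        if item is None or item[1] < time.time():
            return None
        return list(item[0])

    def set(self, key: str, value: List[str], ttl: float) -> None:
        self._data[key] = (list(value), time.time() + ttl)


def make_store(backend: str = "sql") -> DepStore:
    # The SQL store is the default; configuration may select another backend.
    if backend == "sql":
        return SqlDepStore(sqlite3.connect(":memory:"))
    if backend == "memory":
        return MemoryDepStore()
    raise ValueError(f"unknown backend: {backend}")
```

Callers depend only on `get` and `set`, so swapping the default for a DC-local backend would not touch the code that records or reads dependencies.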

Krinkle removed Krinkle as the assignee of this task. · Feb 8 2017, 6:08 PM
Krinkle updated the task description. · Mar 3 2019, 4:10 PM
Krinkle assigned this task to aaron. · Jun 10 2019, 1:45 PM

Change 519741 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] [WIP] resourceloader: move indirect module dependency path tracking to BagOStuff

https://gerrit.wikimedia.org/r/519741

Change 519746 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] objectcache: optimize lock() and unlock() for SqlBagOStuff and clean up base method

https://gerrit.wikimedia.org/r/519746

Change 519766 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] bagostuff: optimize SqlBagOStuff and fix failing segmentation tests

https://gerrit.wikimedia.org/r/519766

Change 520148 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] objectcache: clean up RedisBagOStuff and optimize changeTTLMulti()

https://gerrit.wikimedia.org/r/520148

Change 519746 merged by jenkins-bot:
[mediawiki/core@master] objectcache: optimize lock() and unlock() methods in SqlBagOStuff

https://gerrit.wikimedia.org/r/519746

Change 519766 merged by jenkins-bot:
[mediawiki/core@master] bagostuff: optimize SqlBagOStuff and fix failing segmentation tests

https://gerrit.wikimedia.org/r/519766

Change 520148 merged by jenkins-bot:
[mediawiki/core@master] objectcache: clean up RedisBagOStuff and optimize changeTTLMulti()

https://gerrit.wikimedia.org/r/520148