The contents of the module_deps table are deterministic. The data is currently stored in the main DB because it is expensive to regenerate, so we want it to be highly persistent.
The data is queried for thousands of modules at once from the "startup" module. Generating it all at once would take tens of seconds, which is far beyond our desired HTTP response time.
Instead, it's typically populated in a distributed fashion, e.g. from separate on-demand module requests to load.php.
We deal with absent data by generating a temporary placeholder version hash. Then, once a user actually needs the module and requests it with the temporary version hash, that request does the in-depth computation and stores the result in the database. From then onwards, the "startup" module will contain the correct version hash.
This means that after a deployment, modules for which version hash computation is expensive will first get invalidated to a temporary hash, and then invalidated again a few minutes later to the eventual one. This is a bit wasteful, but it is an intentional design decision for ResourceLoader. Improving or avoiding this aspect is outside the scope of this task.
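The two-step invalidation described above can be sketched as follows. This is only an illustration, not ResourceLoader's actual code: `deps_store`, `startup_version`, `load_module`, `expensive_dependency_scan`, the deployment identifier and the example file names are all hypothetical, and an in-memory dict stands in for the module_deps table.

```python
import hashlib

# Hypothetical in-memory stand-in for the module_deps table.
deps_store = {}

# Hypothetical result of the in-depth scan, keyed by module name.
EXAMPLE_FILES = {"ext.example": ["resources/b.png", "resources/a.css"]}


def short_hash(text):
    """Return a short, deterministic hash of the given string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]


def expensive_dependency_scan(module):
    """Placeholder for the in-depth computation done on demand."""
    return sorted(EXAMPLE_FILES.get(module, []))


def startup_version(module, deploy_id):
    """Version hash as the "startup" module would report it."""
    deps = deps_store.get(module)
    if deps is None:
        # No stored data yet: fall back to a cheap temporary hash.
        return short_hash("placeholder:" + module + ":" + deploy_id)
    return short_hash(module + ":" + "|".join(deps))


def load_module(module):
    """On-demand module request: compute and store the data if absent."""
    if module not in deps_store:
        deps_store[module] = expensive_dependency_scan(module)
    return deps_store[module]


before = startup_version("ext.example", "deploy-1")  # temporary hash
load_module("ext.example")                           # first real request
after = startup_version("ext.example", "deploy-1")   # eventual hash
```

In this sketch `before` and `after` differ, which corresponds to the second invalidation; later calls to `startup_version` keep returning `after`.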
Because this data is stored in the main MySQL databases, load.php has to write rows to the table from GET requests. This is a performance and availability anti-pattern.
The objective is to store this data elsewhere, outside the databases, ideally in a way that preserves as much of the persistence as possible.
- The table stores absolute paths, which means that when a wmf branch rolls over, it loses track of some files, causing a needless cache invalidation. Since old wmf branches are not immediately removed (in part because we have multiple versions in deployment at any one time), the old file paths are not obviously wrong. As such, the table can even end up including both old and new versions of the same file. This and more is tracked under T111481.
- Lots of old data is left in module_deps from modules that no longer exist in recent versions of MediaWiki core and extensions, because there is no TTL and no garbage collection.
Also, since the values are deterministic, we do not need a store that is replicated across data centres. A dc-local store is sufficient.
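To make the requirements above more concrete, here is a minimal sketch of a store that keeps paths relative to the install root and expires entries after a TTL. It is an illustration under stated assumptions, not a proposed implementation: the `DepStore` class, its methods, the TTL value and the example directory names are hypothetical, and a plain dict stands in for whatever dc-local backend is eventually chosen.

```python
import posixpath
import time


class DepStore:
    """Hypothetical dc-local key-value store with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised in tests
        self.entries = {}

    def set(self, module, abs_paths, install_root):
        # Store paths relative to the install root, so that the same file
        # under a new branch directory maps to the same stored value.
        rel = sorted(posixpath.relpath(p, install_root) for p in abs_paths)
        self.entries[module] = (self.clock() + self.ttl, rel)

    def get(self, module):
        entry = self.entries.get(module)
        if entry is None:
            return None
        expires, paths = entry
        if self.clock() >= expires:
            # Expired entries are dropped, so data for removed modules
            # does not accumulate forever.
            del self.entries[module]
            return None
        return paths
```

Because the values are deterministic, a miss in such a store (after expiry or a restart) only costs a recomputation on the next on-demand request, which is also why replication across data centres is not needed.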