Even when visiting just a single translation unit page with action=edit, the extension does a full table scan of the translate_metadata table. That is almost certainly unnecessary, probably hurts performance, and probably won't scale in the long term. Places that genuinely need a full scan should keep doing one; others, such as action=edit on a single page like this, shouldn't.
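A minimal sketch of the difference between the two access patterns, using SQLite's EXPLAIN QUERY PLAN rather than the production database. The table layout is a simplified assumption (three text columns with a composite primary key on tmd_group, tmd_key), and the group IDs are made up for illustration.

```python
import sqlite3

# Simplified, hypothetical stand-in for translate_metadata. The composite
# primary key on (tmd_group, tmd_key) is an assumption for this sketch.
SCHEMA = """
CREATE TABLE translate_metadata (
    tmd_group TEXT NOT NULL,
    tmd_key   TEXT NOT NULL,
    tmd_value TEXT,
    PRIMARY KEY (tmd_group, tmd_key)
)
"""


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Loading everything: the planner has to scan the whole table.
full = query_plan(conn, "SELECT tmd_group, tmd_key, tmd_value FROM translate_metadata")

# Loading only the groups one page needs: the primary-key index can be used.
keyed = query_plan(
    conn,
    "SELECT tmd_group, tmd_key, tmd_value FROM translate_metadata "
    "WHERE tmd_group IN (?, ?)",
    ("page-Example", "page-Other"),  # hypothetical group IDs
)

print(full)
print(keyed)
```

The first plan reports a SCAN of the table, the second a SEARCH through the index, which is the distinction the task is asking for on single-page requests.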
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Avoid table scans of translate_metadata when possible | mediawiki/extensions/Translate | master | +96 -53
Event Timeline
Originally the table was so small that it was easier and faster to just load everything instead of doing multiple queries. That is most likely no longer the case.
It could be replaced with a simple caching layer that supports batching. Callers should then be checked to make sure they batch their lookups where needed.
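A minimal sketch of what "a simple caching layer that supports batching" could look like, assuming callers can say up front which groups they will need. `MetadataCache`, `preload`, and the group IDs are hypothetical names for illustration, not the Translate extension's actual API, and SQLite stands in for the real database.

```python
import sqlite3
from typing import Dict, Iterable, Optional


class MetadataCache:
    """Hypothetical batching cache in front of translate_metadata.

    Callers announce which groups they need (preload), so the cache can
    fetch them with one WHERE tmd_group IN (...) query instead of loading
    the whole table.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # group -> {key: value}; an empty dict means "loaded, nothing stored".
        self.cache: Dict[str, Dict[str, str]] = {}
        self.queries = 0  # number of database round trips, for illustration

    def preload(self, groups: Iterable[str]) -> None:
        missing = sorted({g for g in groups if g not in self.cache})
        if not missing:
            return
        placeholders = ", ".join("?" for _ in missing)
        rows = self.conn.execute(
            "SELECT tmd_group, tmd_key, tmd_value FROM translate_metadata "
            f"WHERE tmd_group IN ({placeholders})",
            missing,
        ).fetchall()
        self.queries += 1
        for group in missing:
            self.cache[group] = {}
        for group, key, value in rows:
            self.cache[group][key] = value

    def get(self, group: str, key: str) -> Optional[str]:
        # Fall back to a single-group query if the caller did not preload.
        if group not in self.cache:
            self.preload([group])
        return self.cache[group].get(key)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translate_metadata (tmd_group TEXT, tmd_key TEXT, "
    "tmd_value TEXT, PRIMARY KEY (tmd_group, tmd_key))"
)
conn.executemany(
    "INSERT INTO translate_metadata VALUES (?, ?, ?)",
    [
        ("page-A", "prioritylangs", "de,fr"),
        ("page-A", "maxid", "12"),
        ("page-B", "maxid", "3"),
        ("page-C", "maxid", "7"),
    ],
)

cache = MetadataCache(conn)
cache.preload(["page-A", "page-B"])  # one query covers both groups
langs_a = cache.get("page-A", "prioritylangs")
langs_b = cache.get("page-B", "prioritylangs")  # missing key, no extra query
```

Recording "loaded but empty" groups matters here: since most lookups hit keys that may not exist, a miss must not trigger another query.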
Some interesting query data:

SELECT tmd_key, COUNT(*) AS n, AVG(CHAR_LENGTH(tmd_value)) AS ave, SUM(CHAR_LENGTH(tmd_value)) AS total FROM translate_metadata GROUP BY tmd_key ORDER BY n DESC;

tmd_key | n | ave | total
---|---|---|---
prioritylangs | 17656 | 0.8527 | 15056
maxid | 6018 | 1.5254 | 9180
priorityforce | 278 | 2.7518 | 765
priorityreason | 278 | 18.6151 | 5175
description | 161 | 33.3106 | 5363
name | 161 | 24.3043 | 3913
subgroups | 161 | 1609.7391 | 259168

SELECT tmd_key, COUNT(*) AS n, AVG(CHAR_LENGTH(tmd_group)) AS ave, SUM(CHAR_LENGTH(tmd_group)) AS total FROM translate_metadata GROUP BY tmd_key ORDER BY n DESC;

tmd_key | n | ave | total
---|---|---|---
prioritylangs | 17656 | 51.1121 | 902435
maxid | 6018 | 46.4762 | 279694
priorityforce | 278 | 50.7302 | 14103
priorityreason | 278 | 50.7302 | 14103
description | 161 | 27.4845 | 4425
name | 161 | 27.4845 | 4425
subgroups | 161 | 27.4845 | 4425

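Adding up the per-key figures from the two queries gives a rough idea of how much data a full scan pulls in on every request that loads the whole table. This counts characters, not bytes, and ignores the tmd_key column itself.

```python
# Per-key figures from the two queries above:
# (rows, total tmd_value characters, total tmd_group characters)
STATS = {
    "prioritylangs": (17656, 15056, 902435),
    "maxid": (6018, 9180, 279694),
    "priorityforce": (278, 765, 14103),
    "priorityreason": (278, 5175, 14103),
    "description": (161, 5363, 4425),
    "name": (161, 3913, 4425),
    "subgroups": (161, 259168, 4425),
}

rows = sum(r for r, _, _ in STATS.values())
value_chars = sum(v for _, v, _ in STATS.values())
group_chars = sum(g for _, _, g in STATS.values())

print(f"rows={rows} value_chars={value_chars} group_chars={group_chars}")
# prints rows=24713 value_chars=298620 group_chars=1223610
```

So a full scan reads about 24.7k rows and roughly 1.5 million characters, and the group names alone account for about four times as much text as the values.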
I'm not seeing a simple way to cache anything.
How many rows are typically needed in such a request? And how many TranslateMetadata::get calls does it make?
Change 494146 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/Translate@master] Avoid table scan translate_metadata queries by using batched preloading
Change 494146 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Avoid table scans of translate_metadata when possible
Some cases still do a table scan, but those are pages or API modules that aggregate everything, so it's not avoidable there.