
TranslateMetadata::get does full table scans unnecessarily
Closed, Resolved (Public)

Description

Even when visiting a single translation unit page with action=edit, TranslateMetadata::get does a full table scan of the translate_metadata table. That is almost certainly unnecessary, probably hurts performance, and probably won't scale in the long term. Code paths that genuinely need every row can keep doing a full scan; others, such as action=edit on a single page, shouldn't.
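
To illustrate the difference between the two access patterns, here is a minimal sketch using an in-memory SQLite database rather than the production database. The column names come from the queries quoted later in this task; the composite primary key on (tmd_group, tmd_key) and the group id "page-Example" are assumptions made only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are taken from the queries in this task; the composite
# primary key is an assumption made for illustration.
conn.execute(
    "CREATE TABLE translate_metadata ("
    " tmd_group TEXT NOT NULL,"
    " tmd_key TEXT NOT NULL,"
    " tmd_value TEXT NOT NULL,"
    " PRIMARY KEY (tmd_group, tmd_key))"
)


def plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return [row[-1] for row in rows]


# Loading everything: the planner has to walk the whole table.
full_scan = plan("SELECT tmd_group, tmd_key, tmd_value FROM translate_metadata")

# Loading one group ("page-Example" is a made-up id): the planner can
# use the primary key index instead.
targeted = plan(
    "SELECT tmd_key, tmd_value FROM translate_metadata WHERE tmd_group = ?",
    ("page-Example",),
)

print(full_scan)
print(targeted)
```

In SQLite the first plan is reported as a SCAN and the second as a SEARCH using the index. The exact wording varies between SQLite versions, and MySQL/MariaDB report the same distinction differently in their EXPLAIN output.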

Event Timeline

Originally the table was so small that it was easier and faster to load everything than to issue multiple queries. That is most likely no longer the case.

It could be replaced with a simple caching layer that supports batching. Callers should be audited so that appropriate batching is done where required.
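
A rough sketch of what such a caching layer with batching might look like, written in Python against SQLite so that it is self-contained. The class name `MetadataCache`, its `preload` and `get` methods, the `queries` counter and the sample rows are all hypothetical; this is not the extension's actual API.

```python
import sqlite3


class MetadataCache:
    """Hypothetical per-request cache; not the extension's actual class."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}   # group id -> {key: value}
        self.queries = 0  # counts database round trips, for illustration

    def preload(self, groups):
        """Fetch all metadata for the given groups in a single query."""
        missing = [g for g in dict.fromkeys(groups) if g not in self.cache]
        if not missing:
            return
        placeholders = ",".join("?" * len(missing))
        rows = self.conn.execute(
            "SELECT tmd_group, tmd_key, tmd_value FROM translate_metadata "
            f"WHERE tmd_group IN ({placeholders})",
            missing,
        ).fetchall()
        self.queries += 1
        # Remember misses too, so groups without metadata are not re-queried.
        for group in missing:
            self.cache[group] = {}
        for group, key, value in rows:
            self.cache[group][key] = value

    def get(self, group, key):
        """Return the cached value, or None if the group has no such key."""
        self.preload([group])
        return self.cache[group].get(key)


# Made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translate_metadata ("
    " tmd_group TEXT, tmd_key TEXT, tmd_value TEXT,"
    " PRIMARY KEY (tmd_group, tmd_key))"
)
conn.executemany(
    "INSERT INTO translate_metadata VALUES (?, ?, ?)",
    [
        ("page-A", "maxid", "12"),
        ("page-A", "prioritylangs", "fi,de"),
        ("page-B", "maxid", "3"),
    ],
)

cache = MetadataCache(conn)
cache.preload(["page-A", "page-B", "page-C"])  # one batched query
print(cache.get("page-A", "maxid"), cache.get("page-C", "maxid"), cache.queries)
```

A caller that knows up front which groups it will touch calls `preload` once; a caller that does not still works through `get`, at the cost of one query per previously unseen group.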

Nikerabbit triaged this task as Medium priority. Jun 25 2016, 1:04 PM

Some interesting query data:

SELECT tmd_key, COUNT(*) AS n, AVG(CHAR_LENGTH(tmd_value)) AS ave, SUM(CHAR_LENGTH(tmd_value)) AS total FROM translate_metadata GROUP BY tmd_key ORDER BY n DESC;
stdClass Object
(
    [tmd_key] => prioritylangs
    [n] => 17656
    [ave] => 0.8527
    [total] => 15056
)
stdClass Object
(
    [tmd_key] => maxid
    [n] => 6018
    [ave] => 1.5254
    [total] => 9180
)
stdClass Object
(
    [tmd_key] => priorityforce
    [n] => 278
    [ave] => 2.7518
    [total] => 765
)
stdClass Object
(
    [tmd_key] => priorityreason
    [n] => 278
    [ave] => 18.6151
    [total] => 5175
)
stdClass Object
(
    [tmd_key] => description
    [n] => 161
    [ave] => 33.3106
    [total] => 5363
)
stdClass Object
(
    [tmd_key] => name
    [n] => 161
    [ave] => 24.3043
    [total] => 3913
)
stdClass Object
(
    [tmd_key] => subgroups
    [n] => 161
    [ave] => 1609.7391
    [total] => 259168
)


> SELECT tmd_key, COUNT(*) AS n, AVG(CHAR_LENGTH(tmd_group)) AS ave, SUM(CHAR_LENGTH(tmd_group)) AS total FROM translate_metadata GROUP BY tmd_key ORDER BY n DESC;
stdClass Object
(
    [tmd_key] => prioritylangs
    [n] => 17656
    [ave] => 51.1121
    [total] => 902435
)
stdClass Object
(
    [tmd_key] => maxid
    [n] => 6018
    [ave] => 46.4762
    [total] => 279694
)
stdClass Object
(
    [tmd_key] => priorityforce
    [n] => 278
    [ave] => 50.7302
    [total] => 14103
)
stdClass Object
(
    [tmd_key] => priorityreason
    [n] => 278
    [ave] => 50.7302
    [total] => 14103
)
stdClass Object
(
    [tmd_key] => description
    [n] => 161
    [ave] => 27.4845
    [total] => 4425
)
stdClass Object
(
    [tmd_key] => name
    [n] => 161
    [ave] => 27.4845
    [total] => 4425
)
stdClass Object
(
    [tmd_key] => subgroups
    [n] => 161
    [ave] => 27.4845
    [total] => 4425
)
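
As a rough tally of what a load-everything query pulls in, the two result sets above can be added up. This is only a sketch: it assumes CHAR_LENGTH is a fair proxy for payload size and ignores the tmd_key column and per-row overhead.

```python
# (key, row count, total chars in tmd_value, total chars in tmd_group),
# copied from the two result sets above.
stats = [
    ("prioritylangs", 17656, 15056, 902435),
    ("maxid", 6018, 9180, 279694),
    ("priorityforce", 278, 765, 14103),
    ("priorityreason", 278, 5175, 14103),
    ("description", 161, 5363, 4425),
    ("name", 161, 3913, 4425),
    ("subgroups", 161, 259168, 4425),
]

total_rows = sum(n for _, n, _, _ in stats)
value_chars = sum(v for _, _, v, _ in stats)
group_chars = sum(g for _, _, _, g in stats)

print(total_rows, value_chars, group_chars)
```

That comes to 24713 rows, 298620 characters of values and 1223610 characters of group ids. By this measure prioritylangs accounts for about 71% of the rows, and its group ids alone for roughly 59% of the characters, while its values average less than one character each.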

I'm not seeing a simple way to cache anything.

How many rows are typically needed in such a request? And how many TranslateMetadata::get calls does it make?

Change 494146 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/Translate@master] Avoid table scan translate_metadata queries by using batched preloading

https://gerrit.wikimedia.org/r/494146

Is this the same as T204026?

More or less.

Change 494146 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Avoid table scans of translate_metadata when possible

https://gerrit.wikimedia.org/r/494146

aaron claimed this task.

Some cases still do a table scan, but those are pages or API modules that aggregate everything, so it's not avoidable there.