Summary
The query to cu_changes in CheckUserTemporaryAccountsByIPLookup::getTempAccountsFromIPAddress isn't efficient and can trigger TransactionProfiler warnings. We should look at how this query performance can be improved, as it is called when viewing Special:Contributions and Special:IPContributions
Background
- Logstash is reporting that the query performed by CheckUserTemporaryAccountsByIPLookup::getTempAccountsFromIPAddress isn't efficient, as it has triggered TransactionProfiler warnings
- For example, this logstash log says that the query took 6.7 seconds to execute
- Pasting that query into production English Wikipedia and running an EXPLAIN produces the following
- The "Using filesort" suggests that the query isn't efficient enough
describe SELECT actor_name,MAX(cuc_timestamp) AS `timestamp` FROM `cu_changes` JOIN `actor` ON ((actor_id=cuc_actor)) WHERE (actor_name LIKE '~2%' ESCAPE '`') AND ((cuc_ip_hex >= 'v6-X' AND cuc_ip_hex <= 'v6-X')) GROUP BY actor_name ORDER BY timestamp DESC LIMIT 100; +------+-------------+------------+--------+---------------------------------------------------------+-----------------+---------+-----------------------------+-------+--------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+------------+--------+---------------------------------------------------------+-----------------+---------+-----------------------------+-------+--------------------------------------------------------+ | 1 | SIMPLE | cu_changes | range | cuc_ip_hex_time,cuc_actor_ip_time,cuc_actor_ip_hex_time | cuc_ip_hex_time | 258 | NULL | 74040 | Using index condition; Using temporary; Using filesort | | 1 | SIMPLE | actor | eq_ref | PRIMARY,actor_name | PRIMARY | 8 | enwiki.cu_changes.cuc_actor | 1 | Using where | +------+-------------+------------+--------+---------------------------------------------------------+-----------------+---------+-----------------------------+-------+--------------------------------------------------------+
- Actually running the query on production showed that the query took 2.2 seconds to run
- Repeating the query again makes it run in 0.2 seconds, which suggests that the replica MariaDB instance is storing some kind of data that caches the expensive part of the operation
- However, I could not get the query to run slowly on the special page on production English Wikipedia
- Given the above point, it's possible the replica DBs are storing some kind of data to make the query faster over multiple calls
Acceptance criteria
- The query to cu_changes is more efficient, so that TransctionProfiler warnings are not created and ideally that the EXPLAIN does not indicate that filesorts are being used

