Page MenuHomePhabricator

list=logevents slow or times out for users with many log entries but no recent activity
Open, LowPublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:
Query either takes 20+ seconds to resolve, or it times out with an internal_api_error_DBQueryTimeoutError error.

What should have happened instead?:
Query should resolve in a few seconds, like https://en.wikipedia.org/w/index.php?title=Special:Log/ST47&limit=1 does.

Other information (browser name/version, screenshots, etc.):
Raised at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Logevents_API_slow_for_inactive_users%3F

Per @Cryptic and @Anomie in that VPT thread, while Special:Log uses the log_actor_time index, API queries (as well as quarry queries such as https://quarry.wmcloud.org/query/97383) are just searching through the log_times index. Adding an letype, e.g. letype=block, does seem to force it to use the log_actor_type_time index which returns results almost instantly.

T71222 was reporting this same issue back in 2019, but the workaround that resolved that bug was undone in 2021 in https://gerrit.wikimedia.org/r/q/I9198c29a. T303089 reported the same issue for Special:Log, but that fix didn't seem to apply to API or quarry queries.

Event Timeline

Soon we will upgrade s1 to 10.11 (T394371) maybe it's fixed in that version. The reason the type hint removed and then added was very likely similar, after upgrade mariadb could have picked the right index, etc. These things change a lot based on mariadb versions and table stats.

These things change a lot based on mariadb versions and table stats.

In other words, it wasn't really fixed when the index hint was removed, someone just saw it working once and jumped the gun. Then it was added back in one place but not the other because the people involved in that didn't think there might be another place.

Soon we will upgrade s1 to 10.11 (T394371) maybe it's fixed in that version.

That seems overly optimistic given the history here. More likely is that it might work if the table stats are just right after the upgrade, then break again later on.

Soon we will upgrade s1 to 10.11 (T394371) maybe it's fixed in that version. The reason the type hint removed and then added was very likely similar, after upgrade mariadb could have picked the right index, etc. These things change a lot based on mariadb versions and table stats.

There are a few hosts running 10.11 on s1 already - so if there's a way for you to test on those we can check that.

Atieno triaged this task as Low priority.Oct 7 2025, 2:58 PM

I've tried a few times the URLs on the task description but they seem to work to me, any chances anyone has the query handy so I can test on the prompt?

@Anomie gives a query against the real tables in T71222#742492, though that's now very out of date (actor table, at minimum).

But perhaps an example with a longer time would be more illustrative?

https://en.wikipedia.org/w/api.php?action=query&list=logevents&leprop=timestamp&ledir=older&leuser=DGG&lelimit=1 times out after 30 seconds. Similar query with on toolforge with the redacted views:

MariaDB [enwiki_p]> SELECT * FROM logging_userindex JOIN actor_logging ON actor_id = log_actor WHERE actor_name = 'DGG' ORDER BY log_timestamp DESC LIMIT 1\G
*************************** 1. row ***************************
        log_id: 144862893
      log_type: patrol
    log_action: patrol
 log_timestamp: 20230329153142
     log_actor: 9406
 log_namespace: 118
     log_title: Weizenbaum_Institute_for_the_Networked_Society_-_The_German_Internet_Institute
log_comment_id: 10
    log_params: a:3:{s:8:"4::curid";s:10:"1147057086";s:9:"5::previd";s:1:"0";s:7:"6::auto";i:0;}
   log_deleted: 0
      log_page: 73412827
      actor_id: 9406
    actor_user: 2157954
    actor_name: DGG
1 row in set (1 min 5.702 sec)

Concurrent explain:

MariaDB [enwiki_p]> SHOW EXPLAIN FOR 49146868\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: actor
         type: const
possible_keys: PRIMARY,actor_name
          key: actor_name
      key_len: 257
          ref: const
         rows: 1
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: logging
         type: index
possible_keys: log_actor_type_time,log_type_action,log_type_time,log_actor_time,log_actor_deleted
          key: log_times
      key_len: 14
          ref: NULL
         rows: 112457196
        Extra: Using where
*************************** 3. row ***************************
           id: 4
  select_type: SUBQUERY
        table: logging
         type: range
possible_keys: log_actor_type_time,log_type_action,log_type_time,log_actor_time,log_actor_deleted
          key: log_actor_type_time
      key_len: 42
          ref: NULL
         rows: 165187
        Extra: Using index condition; Using where
3 rows in set, 1 warning (0.002 sec)

On-wiki Special:Log, and the same API query with &letype=patrol added, finish practically instantly.

Yeah, that URL gives the timeout and the query is (this is MariaDB 10.11) - our current production version:

SELECT  log_id,log_type,log_action,log_timestamp,log_deleted  FROM `logging` JOIN `actor` ON ((actor_id=log_actor))   WHERE (log_type NOT IN ('spamblacklist','titleblacklist','urlshortener','abusefilterprivatedetails','abusefilterblockeddomainhit','abusefilter-protected-vars','oath','checkuser-temporary-account','checkuser-private-event','ipinfo','suppress')) AND actor_name = 'DGG' AND ((log_deleted & 4) != 4)  ORDER BY log_timestamp DESC,log_id DESC LIMIT 2

cumin2024@db1169.eqiad.wmnet[enwiki]> show explain for 2640969;
+------+-------------+---------+-------+------------------------------------------------------------------+------------+---------+-------+-----------+-------------+
| id   | select_type | table   | type  | possible_keys                                                    | key        | key_len | ref   | rows      | Extra       |
+------+-------------+---------+-------+------------------------------------------------------------------+------------+---------+-------+-----------+-------------+
|    1 | SIMPLE      | actor   | const | PRIMARY,actor_name                                               | actor_name | 257     | const |         1 | Using index |
|    1 | SIMPLE      | logging | index | log_actor_type_time,log_type_action,log_type_time,log_actor_time | log_times  | 14      | NULL  | 112332602 | Using where |
+------+-------------+---------+-------+------------------------------------------------------------------+------------+---------+-------+-----------+-------------+
2 rows in set, 1 warning (0.001 sec)

the explain is so optimistic:

cumin2024@db1169.eqiad.wmnet[enwiki]> explain SELECT  log_id,log_type,log_action,log_timestamp,log_deleted  FROM `logging` JOIN `actor` ON ((actor_id=log_actor))   WHERE (log_type NOT IN ('spamblacklist','titleblacklist','urlshortener','abusefilterprivatedetails','abusefilterblockeddomainhit','abusefilter-protected-vars','oath','checkuser-temporary-account','checkuser-private-event','ipinfo','suppress')) AND actor_name = 'DGG' AND ((log_deleted & 4) != 4)  ORDER BY log_timestamp DESC,log_id DESC LIMIT 2;
+------+-------------+---------+-------+------------------------------------------------------------------+------------+---------+-------+------+-------------+
| id   | select_type | table   | type  | possible_keys                                                    | key        | key_len | ref   | rows | Extra       |
+------+-------------+---------+-------+------------------------------------------------------------------+------------+---------+-------+------+-------------+
|    1 | SIMPLE      | actor   | const | PRIMARY,actor_name                                               | actor_name | 257     | const | 1    | Using index |
|    1 | SIMPLE      | logging | index | log_actor_type_time,log_type_action,log_type_time,log_actor_time | log_times  | 14      | NULL  | 1281 | Using where |
+------+-------------+---------+-------+------------------------------------------------------------------+------------+---------+-------+------+-------------+
2 rows in set (0.003 sec)

Ignoring the index definitely makes the query fly vs my manual query runtime of 2mins:

cumin2024@db1169.eqiad.wmnet[enwiki]> SELECT  log_id,log_type,log_action,log_timestamp,log_deleted  FROM `logging` ignore index (log_times) JOIN `actor` ON ((actor_id=log_actor))   WHERE (log_type NOT IN ('spamblacklist','titleblacklist','urlshortener','abusefilterprivatedetails','abusefilterblockeddomainhit','abusefilter-protected-vars','oath','checkuser-temporary-account','checkuser-private-event','ipinfo','suppress')) AND actor_name = 'DGG' AND ((log_deleted & 4) != 4)  ORDER BY log_timestamp DESC,log_id DESC LIMIT 2
    -> ;
+-----------+----------+------------+----------------+-------------+
| log_id    | log_type | log_action | log_timestamp  | log_deleted |
+-----------+----------+------------+----------------+-------------+
| 144862893 | patrol   | patrol     | 20230329153142 |           0 |
| 144862891 | move     | move       | 20230329153140 |           0 |
+-----------+----------+------------+----------------+-------------+
2 rows in set (0.002 sec)

On the MW side of things I don't know how painful is to revert (and maintain) https://gerrit.wikimedia.org/r/c/mediawiki/core/+/700515 vs how bad is having this feature not working. I will leave that to @Ladsgroup.

Wow, the stats are so out of date, no wonder why the optimizer is going nuts - let's see how this works with a refresh on the stats. :

+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| enwiki        | logging    | log_times  | 2022-05-26 17:07:07 | n_diff_pfx01 |   87385704 |          20 | log_timestamp                     |
| enwiki        | logging    | log_times  | 2022-05-26 17:07:07 | n_diff_pfx02 |  112955046 |          20 | log_timestamp,log_id              |
| enwiki        | logging    | log_times  | 2022-05-26 17:07:07 | n_leaf_pages |     167888 |        NULL | Number of leaf pages in the index |
| enwiki        | logging    | log_times  | 2022-05-26 17:07:07 | size         |     192064 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
4 rows in set (0.003 sec)

Even after refreshing the stats, the optimizer seems to be choosing the wrong plan. I've filed a MariaDB bug (https://jira.mariadb.org/browse/MDEV-38072) but I believe we should reconsider adding the ignore hint back to the code if this feature is important and worth the possible pain of maintaining it in code as I've mentioned at T405476#11363865