list=logevents slow for users with last log action long time ago
Open, NormalPublic

Description

Loading https://de.wikipedia.org/w/api.php?action=query&list=logevents&lelimit=1&leprop=timestamp&leuser=Gardini&format=jsonfm takes for me 114 seconds (try to append some bogus parameter if you hit the cache), and I have similar results for other users which are inactive for a long time. OTOH https://de.wikipedia.org/wiki/Spezial:Logbuch/Gardini is fast, though it provides even more information.


Version: 1.24rc
Severity: normal
URL: https://de.wikipedia.org/w/api.php?action=query&list=logevents&lelimit=1&leprop=timestamp&leuser=Gardini&format=jsonfm

Details

Reference
bz69222
bzimport raised the priority of this task from to Normal.
bzimport set Reference to bz69222.
Schnark created this task.Aug 7 2014, 10:23 AM
Anomie added a comment.EditedAug 7 2014, 12:35 PM

Sean, this seems to be a database issue. The database query here is:

SELECT log_id,log_type,log_action,log_timestamp,log_deleted  FROM `logging` LEFT JOIN `user` ON ((user_id=log_user)) LEFT JOIN `page` ON ((log_namespace=page_namespace) AND (log_title=page_title))  WHERE (log_type != 'suppress') AND log_user = '103726' AND ((log_deleted & 12) != 12)  ORDER BY log_timestamp DESC,log_id DESC LIMIT 2

This seems to run fine on db1026, db1021, and db1045; EXPLAIN says it's using the user_time index, which makes sense. On db1005, db1037, and db1049, EXPLAIN says it uses the times index and executes slowly.

db1005
stdClass Object
(
    [id] => 1
    [select_type] => SIMPLE
    [table] => logging
    [type] => index
    [possible_keys] => type_time,user_time,log_user_type_time
    [key] => times
    [key_len] => 16
    [ref] => 
    [rows] => 6716
    [Extra] => Using where
)
db1026
stdClass Object
(
    [id] => 1
    [select_type] => SIMPLE
    [table] => logging
    [type] => ref
    [possible_keys] => type_time,user_time,log_user_type_time
    [key] => user_time
    [key_len] => 4
    [ref] => const
    [rows] => 21108
    [Extra] => Using where
)
db1021
stdClass Object
(
    [id] => 1
    [select_type] => SIMPLE
    [table] => logging
    [type] => ref
    [possible_keys] => type_time,user_time,log_user_type_time
    [key] => user_time
    [key_len] => 4
    [ref] => const
    [rows] => 21108
    [Extra] => Using where
)
db1037
stdClass Object
(
    [id] => 1
    [select_type] => SIMPLE
    [table] => logging
    [type] => index
    [possible_keys] => type_time,user_time,log_user_type_time
    [key] => times
    [key_len] => 16
    [ref] => 
    [rows] => 6194
    [Extra] => Using where
)
db1045
stdClass Object
(
    [id] => 1
    [select_type] => SIMPLE
    [table] => logging
    [type] => ref
    [possible_keys] => type_time,user_time,log_user_type_time
    [key] => user_time
    [key_len] => 4
    [ref] => const
    [rows] => 21108
    [Extra] => Using where
)
db1049
stdClass Object
(
    [id] => 1
    [select_type] => SIMPLE
    [table] => logging
    [type] => index
    [possible_keys] => type_time,user_time,log_user_type_time
    [key] => times
    [key_len] => 16
    [ref] => 
    [rows] => 6497
    [Extra] => Using where
)
Anomie moved this task from Unsorted to Non-Code on the MediaWiki-API board.Feb 19 2015, 6:33 PM

Now I get an error for the query (and for other similar [= where a user has been inactive for a long time] queries):

{
    "servedby": "mw1203",
    "error": {
        "code": "internal_api_error_DBQueryError",
        "info": "[c0f53a23] Database query error"
    }
}

That's because a database timeout was added at some point, to kill such long-running queries rather than allowing them to continue to consume resources.

Krenair added a subscriber: Krenair.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 4 2015, 2:54 AM
Krenair removed Springle as the assignee of this task.Oct 4 2015, 2:54 AM

Has anybody fixed this recently? The above query and similar ones are now much faster than they were some days ago. I regularly test my user scripts, one of them calling the API with the above query. While this was consistently slow ever since I reported the issue, starting last week (still before the update to 1.28.0-wmf.2) it became fast.

Anomie added a subscriber: jcrespo.May 23 2016, 4:57 PM

Nothing was done on the PHP side. I see @jcrespo recently (7d2f575) switched s5 API queries from db1045 (which currently chooses the wrong index for the query, according to EXPLAIN) to mostly db1070 (which currently chooses the right index). db1071 (the backup api node) is currently choosing the wrong index, though.

Poking at db1045 and db1071 a bit more, even simple queries like SELECT * FROM logging WHERE log_user = 103726 ORDER BY log_timestamp LIMIT 10 that seem like a perfect fit for the user_time index aren't using it.

Volans added a subscriber: Volans.May 23 2016, 5:11 PM

@jcrespo I took a quick look and db1070 has few indexes less than db1071:

KEY `log_user_type_time` (`log_user`,`log_type`,`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

And the explain on db1071 uses log_user_type_time instead of user_time for the query @Anomie reported. This causes it to use filesort for the sorting instead of the index.

The above commit was done due to the loss of an s5 server, that got fried. I reorganized the servers among the ones that made more sense, if this fixes things we need to understand why (it could be the MariaDB 10 upgrade, that is now exposed to certain amount of new queries, or it could be the special partitioning).

The problem with this is that for each query that is fixed, others are "broken", so we need to have a more scientific method for this- I will try to add some monitoring in the future.

As a short term thing, I will try to understand the differences between db1045 and db1071 an the others (probably it is the partitioning needed for other queries to go fast).