
Review special replica partitioning of certain tables by `xx_user`
Open, Normal, Public

Description

I came across the fact today that the logging table is partitioned by log_user on certain replicas. Reasonably soon that column will be dropped as part of T188327: Deploy refactored actor storage.

As of this morning we shouldn't have any queries reading log_user anymore (they're using log_actor instead), and the writes will be turned off over the next few weeks (leaving the column with its default 0 in new rows). If turning off the writes goes well, the task for dropping log_user will hopefully be filed in early June.

I see we have partitioning of revision by rev_user on some replicas as well. That table is in a worse situation with respect to reads (queries formerly targeting rev_user are now targeting the revision_actor_temp table) and is the same for writes. The final dropping of rev_user will not be in June; I have no timeframe for T215466 yet.

If we have other core tables partitioned by an xx_user column, they're likely in the same situation as logging.

I have no idea whether we should partition by something else (log_actor?) or just remove the partitioning. The reasoning for separate replicas in 085e3563ed may still apply even without the partitioning.

Note this task doesn't necessarily require that anything be done now. I just wanted to raise the issue with some lead time before the schema change request.

Event Timeline

Anomie created this task. May 13 2019, 5:50 PM
Marostegui moved this task from Triage to In progress on the DBA board. May 13 2019, 7:58 PM
Marostegui added a subscriber: Marostegui.

Thanks for raising this.
Keep in mind that revision and logging are, as you pointed out, only partitioned on the special slaves (those in the recentchanges, watchlist, etc. groups). But to make it even more interesting, they are only partitioned on certain wikis (normally the big ones). For instance, it is partitioned on enwiki but not on small wikis like the ones on s3, as those are small enough not to run into problems. The baseline for deciding which wikis need it and which don't is probably (and this is a guess, as I wasn't here when it was decided) query performance once a table grows past a certain number of rows.
And if that wasn't enough, some wikis have fewer partitions than others, for example revision on enwiki vs eswiki.

And of course, those special slaves also have different PKs for revision and logging.
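For anyone comparing how a given replica is laid out, a couple of standard MySQL/MariaDB queries (illustrative only, not taken from this task) show both the PK and the partition scheme:

-- Shows the full definition, including the PK and any PARTITION BY clause.
SHOW CREATE TABLE enwiki.logging;

-- Lists the individual partitions (PARTITION_NAME is NULL for a non-partitioned table).
SELECT PARTITION_NAME, PARTITION_METHOD, PARTITION_EXPRESSION, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'enwiki' AND TABLE_NAME = 'logging';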

I am not sure whether we should partition by log_actor, but it is probably worth checking the performance of the queries that will be arriving at those special slaves (which I guess is what was done for 085e3563ed at the time).

@Marostegui moved this task from In progress to Blocked external/Not db team on the DBA board.

I guess we should ask Performance-Team then?

I moved it into that column of the DBA dashboard because we have already given our opinion and we are blocked on a decision about how/when we can move forward.

So, I guess we'd need to decide what to do with the logging table on the special replicas as we are already at T233135: Schema change for refactored actor and comment storage.
While 085e3563ed0c might still apply, it is also true that we have more powerful hardware (more buffer pool) and especially faster storage (SSDs) across the fleet.
Removing the partitions isn't the only issue; the problem is that we also have to modify the PK of the logging table, as currently it is:

PRIMARY KEY (`log_id`,`log_user`),

That can lead to all sorts of issues with the optimizer, though not necessarily bad ones anymore. Maybe what we can try is to place a normal replica (non-partitioned, and with a log_id PK) into the enwiki logging section and see how it performs.

I will try to place a non-partitioned host on the enwiki logging section tomorrow for some time and enable the slow log to see what I get for the logging table, so we can see how it would be affected and whether we need to partition by something else or change the PK.

Marostegui triaged this task as Normal priority. Wed, Sep 18, 3:19 PM

Two options: a) the partitioning is no longer needed due to the new schema (preferred), or b) the partitioning is still needed, in which case partition by actor id. Check on a large enwiki host with low weight.
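If option (b) were chosen, a rough sketch of the change might look like this (the HASH method, the partition count, and the use of log_actor here are assumptions rather than the current production layout; the ordering matters because the partitioning column has to be part of the primary key):

ALTER TABLE logging REMOVE PARTITIONING;
ALTER TABLE logging DROP PRIMARY KEY, ADD PRIMARY KEY (log_id, log_actor);
ALTER TABLE logging PARTITION BY HASH (log_actor) PARTITIONS 16;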

Marostegui added a comment (edited). Thu, Sep 19, 6:00 AM

I have captured 30 minutes of read queries on db1105 (rc on enwiki) and replayed them on db1089 (non-partitioned enwiki slave), and nothing has stuck out, especially nothing related to the logging table.
So I am going to pool db1089 into the logging service and see what we get, leaving it there for a few hours.

Mentioned in SAL (#wikimedia-operations) [2019-09-19T06:04:41Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Temporarily pool db1089 into enwiki logpager T223151', diff saved to https://phabricator.wikimedia.org/P9130 and previous config saved to /var/cache/conftool/dbconfig/20190919-060440-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2019-09-19T07:22:35Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Give more logpager weight to db1089 T223151', diff saved to https://phabricator.wikimedia.org/P9131 and previous config saved to /var/cache/conftool/dbconfig/20190919-072234-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2019-09-19T13:08:49Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db1089 into contributions service T223151', diff saved to https://phabricator.wikimedia.org/P9133 and previous config saved to /var/cache/conftool/dbconfig/20190919-130848-marostegui.json

db1089 has now been serving the logpager service for almost 8 hours, handling 50% of its traffic:

"logpager": {
   "db1089": 2,
   "db1099:3311": 1,
   "db1105:3311": 1

There have been no slow queries reported, nothing has changed regarding the query response time either, and finally, there is no trace of errors in the logs. So that is looking good. I also found no issues when replaying the traffic captured on db1105 (T223151#5505180).

I have now included it in the contributions service as well, and I will leave it like that until tomorrow.

"contributions": {
  "db1089": 1,
  "db1099:3311": 1,
  "db1105:3311": 1

If everything keeps looking good, I will go ahead and alter one of the existing special slaves to remove its partitioning and change its PK from:
PRIMARY KEY (log_id,log_user),

To:

PRIMARY KEY (log_id),
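A minimal sketch of that alter, assuming it is applied directly on one special slave (in practice it would go through the normal schema-change tooling); the partitioning has to be removed first, since a partitioned table's PK must include the partitioning column:

ALTER TABLE logging REMOVE PARTITIONING;
ALTER TABLE logging DROP PRIMARY KEY, ADD PRIMARY KEY (log_id);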

I would create a subtask to track that schema change, to be done before (or along with) T233135: Schema change for refactored actor and comment storage.

To give more context: when partitioning was still needed, query timeouts/long-running queries happened almost immediately for queries like recentchanges/user contributions, I believe especially for enwiki, commonswiki and wikidata. Refactoring, index hints, the MariaDB version and schema changes may have made it obsolete (that would be great news: 14 fewer special servers to maintain)?

After leaving db1089 there all night, this is the only query reported involving the logging table that shows some differences:

SELECT /* IndexPager::buildQueryInfo (LogPager) */ /*! STRAIGHT_JOIN */ log_id, log_type, log_action, log_timestamp, log_namespace, log_title, log_params, log_deleted, user_id, user_name, user_editcount, comment_log_comment.comment_text AS `log_comment_text`, comment_log_comment.comment_data AS `log_comment_data`, comment_log_comment.comment_id AS `log_comment_cid`, actor_log_user.actor_user AS `log_user`, actor_log_user.actor_name AS `log_user_text`, log_actor, (SELECT GROUP_CONCAT(ctd_name SEPARATOR ', ') FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id)) WHERE ct_log_id=log_id ) AS `ts_tags` FROM `logging` JOIN `comment` `comment_log_comment` ON ((comment_log_comment.comment_id = log_comment_id)) JOIN `actor` `actor_log_user` ON ((actor_log_user.actor_id = log_actor)) LEFT JOIN `user` ON ((user_id=actor_log_user.actor_user)) WHERE (log_type NOT IN ('xx', 'xx', 'xx', 'xx', 'xx')) AND ((log_actor = 'xx')) AND ((log_deleted & 4) = 0) AND (log_type != 'review') AND (log_type != 'xx') AND (log_type != 'xx') AND (log_type != 'xx') ORDER BY log_timestamp DESC LIMIT 51;

This query on db1089 (non-partitioned) has a different plan from the one on db1105 (partitioned). However, none of the differences involve the log_user field or the PK; it is just the times vs actor_time index.
The query takes 8 seconds on db1089 and 0.01 seconds on db1105.
These are the EXPLAINs:

db1105

+------+--------------------+---------------------+--------+----------------------------------------------------------+-----------------------+---------+----------------------------------+------+-------------+
| id   | select_type        | table               | type   | possible_keys                                            | key                   | key_len | ref                              | rows | Extra       |
+------+--------------------+---------------------+--------+----------------------------------------------------------+-----------------------+---------+----------------------------------+------+-------------+
|    1 | PRIMARY            | logging             | ref    | type_time,actor_time,log_actor_type_time,log_type_action | actor_time            | 8       | const                            |   10 | Using where |
|    1 | PRIMARY            | comment_log_comment | eq_ref | PRIMARY                                                  | PRIMARY               | 8       | enwiki.logging.log_comment_id    |    1 |             |
|    1 | PRIMARY            | actor_log_user      | const  | PRIMARY                                                  | PRIMARY               | 8       | const                            |    1 |             |
|    1 | PRIMARY            | user                | eq_ref | PRIMARY                                                  | PRIMARY               | 4       | enwiki.actor_log_user.actor_user |    1 | Using where |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_log_tag_id,change_tag_tag_id_id               | change_tag_log_tag_id | 5       | enwiki.logging.log_id            |   15 | Using index |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                                  | PRIMARY               | 4       | enwiki.change_tag.ct_tag_id      |    1 |             |
+------+--------------------+---------------------+--------+----------------------------------------------------------+-----------------------+---------+----------------------------------+------+-------------+
6 rows in set (0.00 sec)

db1089

+------+--------------------+---------------------+--------+----------------------------------------------------------+-----------------------+---------+----------------------------------+-------+-------------+
| id   | select_type        | table               | type   | possible_keys                                            | key                   | key_len | ref                              | rows  | Extra       |
+------+--------------------+---------------------+--------+----------------------------------------------------------+-----------------------+---------+----------------------------------+-------+-------------+
|    1 | PRIMARY            | logging             | index  | type_time,actor_time,log_actor_type_time,log_type_action | times                 | 16      | NULL                             | 58752 | Using where |
|    1 | PRIMARY            | comment_log_comment | eq_ref | PRIMARY                                                  | PRIMARY               | 8       | enwiki.logging.log_comment_id    |     1 |             |
|    1 | PRIMARY            | actor_log_user      | const  | PRIMARY                                                  | PRIMARY               | 8       | const                            |     1 |             |
|    1 | PRIMARY            | user                | eq_ref | PRIMARY                                                  | PRIMARY               | 4       | enwiki.actor_log_user.actor_user |     1 | Using where |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_log_tag_id,change_tag_tag_id_id               | change_tag_log_tag_id | 5       | enwiki.logging.log_id            |    23 | Using index |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                                  | PRIMARY               | 4       | enwiki.change_tag.ct_tag_id      |     1 |             |
+------+--------------------+---------------------+--------+----------------------------------------------------------+-----------------------+---------+----------------------------------+-------+-------------+
6 rows in set (0.00 sec)

This same query on a codfw host (non-partitioned) uses the log_actor_type_time index, which could mean the optimizer is being dumb again in some cases.

db2102:

+------+--------------------+---------------------+--------+----------------------------------------------------------+-----------------------+---------+----------------------------------+-------+----------------------------------------------------+
| id   | select_type        | table               | type   | possible_keys                                            | key                   | key_len | ref                              | rows  | Extra                                              |
+------+--------------------+---------------------+--------+----------------------------------------------------------+-----------------------+---------+----------------------------------+-------+----------------------------------------------------+
|    1 | PRIMARY            | logging             | range  | type_time,actor_time,log_actor_type_time,log_type_action | log_actor_type_time   | 42      | NULL                             | 56299 | Using index condition; Using where; Using filesort |
|    1 | PRIMARY            | comment_log_comment | eq_ref | PRIMARY                                                  | PRIMARY               | 8       | enwiki.logging.log_comment_id    |     1 |                                                    |
|    1 | PRIMARY            | actor_log_user      | const  | PRIMARY                                                  | PRIMARY               | 8       | const                            |     1 |                                                    |
|    1 | PRIMARY            | user                | eq_ref | PRIMARY                                                  | PRIMARY               | 4       | enwiki.actor_log_user.actor_user |     1 | Using where                                        |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_log_tag_id,change_tag_tag_id_id               | change_tag_log_tag_id | 5       | enwiki.logging.log_id            |    14 | Using index                                        |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                                  | PRIMARY               | 4       | enwiki.change_tag.ct_tag_id      |     1 |                                                    |
+------+--------------------+---------------------+--------+----------------------------------------------------------+-----------------------+---------+----------------------------------+-------+----------------------------------------------------+
6 rows in set (0.04 sec)

This could be a matter of refreshing the stats on that table anyway, which is what I will try next, first on db2102 and later on db1089.
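For reference, the stats refresh in question is just the following (as also recorded in the SAL entry below):

ANALYZE TABLE enwiki.logging;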

Mentioned in SAL (#wikimedia-operations) [2019-09-20T05:27:27Z] <marostegui> Analyze table enwiki.logging on db2102 - T223151

Mentioned in SAL (#wikimedia-operations) [2019-09-20T05:29:04Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Remove db1089 from logpager and contributions after testing, repool back with normal weight on main traffic T223151', diff saved to https://phabricator.wikimedia.org/P9136 and previous config saved to /var/cache/conftool/dbconfig/20190920-052902-marostegui.json

So, running the ANALYZE didn't change things on db2102; it kept choosing log_actor_type_time.
However, an ALTER TABLE ... ENGINE=InnoDB, FORCE (a full table rebuild) made the optimizer choose the times index on that host, which is an even worse plan. The optimizer being dumb again!

So I don't think running the analyze on db1089 will change things, but I can try next week.

On db1089 the optimizer is definitely being dumb as well; this is the real scan it does:

root@db1089.eqiad.wmnet[enwiki]> show explain for 7320686887;
+------+--------------------+---------------------+--------+--------------------------------------------+-----------------------+---------+-------------------------------+----------+----------------+
| id   | select_type        | table               | type   | possible_keys                              | key                   | key_len | ref                           | rows     | Extra          |
+------+--------------------+---------------------+--------+--------------------------------------------+-----------------------+---------+-------------------------------+----------+----------------+
|    1 | PRIMARY            | actor_log_user      | const  | PRIMARY                                    | PRIMARY               | 8       | const                         |        1 | Using filesort |
|    1 | PRIMARY            | user                | const  | PRIMARY                                    | PRIMARY               | 4       | const                         |        1 |                |
|    1 | PRIMARY            | logging             | index  | NULL                                       | times                 | 16      | NULL                          | 78474403 | Using where    |
|    1 | PRIMARY            | comment_log_comment | eq_ref | PRIMARY                                    | PRIMARY               | 8       | enwiki.logging.log_comment_id |        1 |                |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_log_tag_id,change_tag_tag_id_id | change_tag_log_tag_id | 5       | enwiki.logging.log_id         |       23 | Using index    |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                    | PRIMARY               | 4       | enwiki.change_tag.ct_tag_id   |        1 |                |
+------+--------------------+---------------------+--------+--------------------------------------------+-----------------------+---------+-------------------------------+----------+----------------+
6 rows in set, 1 warning (0.00 sec)

If hinted to use the actor_time index, the query runs in 0.01s on db1089, just like on db1105.

This query seems to be choosing between three indexes depending on the host: times (the bad one), actor_time (the good one), and log_actor_type_time (which doesn't seem as bad as times; the query runs in 0.01s with it too).
@Anomie should we try to hint the optimizer to choose actor_time, or to IGNORE times, for this query, given that either actor_time or log_actor_type_time makes the query run faster? That would allow us to get rid of the partitioning and the different PK.
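To make the two hinting options concrete, a hedged sketch of what the hinted query would look like (columns trimmed and the actor id is a placeholder; the real change would go through LogPager's query builder rather than raw SQL):

-- Option 1: steer the optimizer away from the bad index.
SELECT log_id, log_type, log_timestamp
FROM logging IGNORE INDEX (times)
WHERE log_actor = 12345   -- placeholder actor id
ORDER BY log_timestamp DESC
LIMIT 51;

-- Option 2: force the known-good index instead.
SELECT log_id, log_type, log_timestamp
FROM logging FORCE INDEX (actor_time)
WHERE log_actor = 12345   -- placeholder actor id
ORDER BY log_timestamp DESC
LIMIT 51;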

Host and indexes used:

db2130.codfw.wmnet:3306
log_actor_type_time
db2116.codfw.wmnet:3306
log_actor_type_time
db2112.codfw.wmnet:3306
log_actor_type_time
db2103.codfw.wmnet:3306
log_actor_type_time
db2102.codfw.wmnet:3306
times
db2097.codfw.wmnet:3311
log_actor_type_time
db2094.codfw.wmnet:3311
times
db2092.codfw.wmnet:3306
times
db2088.codfw.wmnet:3311
actor_time
db2085.codfw.wmnet:3311
actor_time
db2072.codfw.wmnet:3306
times
db2071.codfw.wmnet:3306
times
db2048.codfw.wmnet:3306
times
db1139.eqiad.wmnet:3311
log_actor_type_time
db1134.eqiad.wmnet:3306
actor_time
db1124.eqiad.wmnet:3311
times
db1119.eqiad.wmnet:3306
times
db1118.eqiad.wmnet:3306
times
db1114.eqiad.wmnet:3306
actor_time
db1106.eqiad.wmnet:3306
times
db1105.eqiad.wmnet:3311
actor_time
db1099.eqiad.wmnet:3311
actor_time
db1089.eqiad.wmnet:3306
times
db1083.eqiad.wmnet:3306
times
db1080.eqiad.wmnet:3306
times
db1067.eqiad.wmnet:3306
times

It might be a coincidence, but the 10.3 host (db1114) does use the best index.
This is the exact query that was reported: P9137

I have also been testing https://mariadb.com/kb/en/library/server-system-variables/#optimizer_use_condition_selectivity, which is what the MariaDB people recommended to me during our chat at Percona Live, but that change needs a lot more research than just a couple of hours, as it can impact all query plans, performance, etc., so I would leave it aside from this particular issue.
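Just for reference, optimizer_use_condition_selectivity is a dynamic server/session variable; a minimal example of trying it on a single connection (level 4 is only an illustration, not the value actually evaluated here):

-- Levels range from 1 to 5; higher levels let the optimizer use more
-- condition selectivity information when choosing a plan.
SET SESSION optimizer_use_condition_selectivity = 4;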

For what it's worth, and to give some perspective, the above query has only been reported 13 times in 24h, so it is not such a big deal.

All that reminds me of T71222#4994266. We might have to add index ignoring to LogPager like we did for ApiQueryLogEvents in rMWe6021abc9c16: ApiQueryLogEvents: Add IGNORE INDEX to avoid MariaDB optimizer bug.

Yeah, I believe so too, so +1 to include the IGNORE.

I am going to create a page on wikitech to document these cases (this and the one you mentioned) so we can check and remove those once 10.3 is deployed (if the optimizer keeps looking good there).

Change 538326 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] LogPager: Add IGNORE INDEX to avoid MariaDB optimizer bug

https://gerrit.wikimedia.org/r/538326

Change 538326 merged by jenkins-bot:
[mediawiki/core@master] LogPager: Add IGNORE INDEX to avoid MariaDB optimizer bug

https://gerrit.wikimedia.org/r/538326

@Marostegui: This patch should be deployed on all Wikimedia wikis by Thursday, somewhere around 19:00–21:00 UTC. If it's blocking something, let me know and we can backport it.

Thanks - no, nothing is blocked. I can start with codfw.

I assume this was finally deployed, right? I am asking because I was reviewing the slow queries just in case the IGNORE was, for some reason, causing something (even though we tested it), and I haven't seen anything related to this specific query (which is good).

Yes, it's included in wmf/1.34.0-wmf.24. As of about 19:23 UTC on 2019-09-26, according to https://wikitech.wikimedia.org/wiki/Server_Admin_Log, that was deployed to all Wikimedia wikis.