Page MenuHomePhabricator

Discrepancies with logging table on different wikis
Closed, ResolvedPublic

Description

Author: metatron

Description:
Queries on logging-table perform different on different wikis.
Atm there are different key definitions on eg. enwiki and dewiki

  • enwiki has a key log_user_text_time(ok to have), dewiki doesn't
  • dewiki has a key log_user_type_time(important to have), enwiki doesn't
  • same for log_title_time / log_page_id_time
enwiki | CREATE TABLE `logging`
...
PRIMARY KEY (`log_id`), 
  KEY `type_time` (`log_type`,`log_timestamp`), 
  KEY `user_time` (`log_user`,`log_timestamp`), 
  KEY `page_time` (`log_namespace`,`log_title`,`log_timestamp`),
  KEY `times` (`log_timestamp`), 
  KEY `log_page_id_time` (`log_page`,`log_timestamp`), 
  KEY `log_user_text_time` (`log_user_text`(16),`log_timestamp`), 
  KEY `log_title_time` (`log_title`(16),`log_timestamp`) 
) ENGINE=TokuDB AUTO_INCREMENT=58015836 DEFAULT CHARSET=binary `compression`='tokudb_zlib'
dewiki | CREATE TABLE `logging`
...
PRIMARY KEY (`log_id`),
  KEY `type_time` (`log_type`,`log_timestamp`),
  KEY `user_time` (`log_user`,`log_timestamp`),
  KEY `page_time` (`log_namespace`,`log_title`,`log_timestamp`),
  KEY `times` (`log_timestamp`),
  KEY `log_user_type_time` (`log_user`,`log_type`,`log_timestamp`),
  KEY `log_page_id_time` (`log_page`,`log_timestamp`)
) ENGINE=TokuDB AUTO_INCREMENT=62684641 DEFAULT CHARSET=binary `compression`='tokudb_zlib'

Schema change progress

  • wikitech
    • labswiki
    • labtestwiki

Details

Reference
bz69127

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:36 AM
bzimport added a project: Cloud-VPS.
bzimport set Reference to bz69127.
bzimport created this task.Aug 4 2014, 9:39 PM
Krenair added a subscriber: Krenair.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 18 2015, 3:51 AM
Springle set Security to None.Aug 28 2015, 4:22 AM
Springle added a subscriber: jcrespo.
Krinkle moved this task from Triage to Backlog on the DBA board.Sep 23 2015, 4:27 AM
Krinkle moved this task from Backlog to Triage on the DBA board.Sep 23 2015, 7:02 AM
chasemp triaged this task as Low priority.Nov 30 2015, 4:18 PM
chasemp added a subscriber: chasemp.
bd808 removed Springle as the assignee of this task.Jun 8 2017, 4:39 AM
Marostegui added a subscriber: Marostegui.EditedJun 9 2017, 1:34 PM

For the record, this is the current status of the discrepancies of this table (based on tables.sql: https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql

s1 (enwiki )and s2(bgwiki bgwiktionary cswiki enwikiquote enwiktionary eowiki fiwiki idwiki itwiki nlwiki nowiki plwiki ptwiki svwiki thwiki trwiki zhwiki), s4 (commonswiki):
missing:

KEY `type_action` (`log_type`,`log_action`,`log_timestamp`),
KEY `log_user_text_type_time` (`log_user_text`(191),`log_type`,`log_timestamp`),

Extra indexes:

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

s5:
dewiki all the hosts are missing:

KEY `type_action` (`log_type`,`log_action`,`log_timestamp`),
KEY `log_user_text_type_time` (`log_user_text`(191),`log_type`,`log_timestamp`),

And some hosts have 1 or 2 extra indexes:

db1092.eqiad.wmnet
  KEY `log_title_time` (`log_title`(16),`log_timestamp`)
db1063.eqiad.wmnet
  KEY `log_title_time` (`log_title`(16),`log_timestamp`),
  KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

wikidatawiki is missing:

KEY `log_user_text_type_time` (`log_user_text`(191),`log_type`,`log_timestamp`),

And have the two extra indexes:

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

s6 (frwiki, jawiki, ruwiki):
They are missing:

KEY `log_user_text_type_time` (`log_user_text`(191),`log_type`,`log_timestamp`),

And they have the two extra indexes:

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

s7 (arwiki cawiki eswiki fawiki frwiktionary hewiki huwiki kowiki metawiki rowiki ukwiki viwiki):
Missing:

KEY `log_user_text_type_time` (`log_user_text`(191),`log_type`,`log_timestamp`),

Has the two extra indexes:

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

I have not checked s3.
So why do we have those two extra indexes everywhere? I haven't found anything on gerrit about them - found this though: T68961

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`)

Should we unify this table to what tables.sql says?
If so, we'd need to first check if removing those indexes will have any impact and I suspect it would as otherwise why would they be there?
If they are important why aren't them on tables.sql?
Should we add the other two missing indexes where they are indeed missing?

MusikAnimal added a subscriber: MusikAnimal.

FYI, the lack of log indexes on some wikis is causing issues for XTools. See, for example, T171264#3468493.

kaldari raised the priority of this task from Low to Normal.Jul 25 2017, 12:22 AM

Yeah, the log_title_time index seems especially helpful when doing things like looking up a block log. You could use log_page instead of log_title, using the ID of the user's userpage, but that doesn't work if they haven't created a userpage :( Also we'd have to look up the page ID first. If there's another, quicker cross-wiki way of checking the block log, please enlighten me :)

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.
jcrespo renamed this task from Database upgrade MariaDB 10: Discrepancies with logging table on different wikis to Discrepancies with logging table on different wikis.Aug 7 2017, 8:19 AM

Nothing to do with MariaDB 10, I assume at some point was a blocker. This is probably a multi-month maintenance such as T132416.

As far as I can see, mediawiki doesn't define log_title_time nor log_title_type_time ( see https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql ). Many hosts lack log_user_text_type_time (log_user_text, log_type, log_timestamp), and some tables have some string-containing keys only indexing the first 16 bytes/characters.

It is important to separate production from wikireplicas- wikireplicas can have additional indexes, production should not- or we should patch mediawiki first.

Peachey88 updated the task description. (Show Details)Jan 2 2019, 8:23 AM
Marostegui moved this task from Backlog to Next on the DBA board.Feb 25 2019, 2:23 PM
Marostegui moved this task from Next to In progress on the DBA board.

I am going to try to work on this, as this has bitten us already a few times already, the most recent time on a global user rename.
This is in a not very good shape (T71127#3335238), I will try to combine this with T51199

After analyzing the indexes on T217397 and testing a few hosts (T217397#4997997 and T217397#5010633) with the exact table definition on tables.sql, we are going to unify logging table based on tables.sql indexes.

Marostegui added a comment.EditedMar 11 2019, 8:51 AM

The following hosts on s1 eqiad have the indexes unified with tables.sql (T217397#5010633)

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1003
  • dbstore1001
  • db1124
  • db1119
  • db1118
  • db1106
  • db1105
  • db1099
  • db1089
  • db1083
  • db1080
  • db1067
Marostegui added a comment.EditedMar 11 2019, 8:54 AM

The following hosts on s4 eqiad have the indexes unified with tables.sql (T217397#5010633)

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • db1125
  • db1121
  • db1103
  • db1102
  • db1097
  • db1091
  • db1084
  • db1081
  • db1068
Marostegui added a comment.EditedMar 11 2019, 8:56 AM

The following hosts on s5 eqiad have the indexes unified with tables.sql (T217397#5010633)

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1003
  • db1124
  • db1113
  • db1110
  • db1102
  • db1100
  • db1097
  • db1096
  • db1082
  • db1070
Marostegui updated the task description. (Show Details)Mar 11 2019, 9:00 AM
Marostegui updated the task description. (Show Details)Mar 11 2019, 9:11 AM
Marostegui updated the task description. (Show Details)Mar 11 2019, 9:13 AM

Mentioned in SAL (#wikimedia-operations) [2019-03-12T07:18:34Z] <marostegui> Deploy schema change on db2052 (s5 codfw master), this will generate lag on codfw T71127 T51199

Marostegui updated the task description. (Show Details)Mar 12 2019, 8:02 AM
Marostegui updated the task description. (Show Details)Mar 12 2019, 12:10 PM
Marostegui updated the task description. (Show Details)Mar 12 2019, 2:04 PM
Marostegui added a comment.EditedMar 12 2019, 2:58 PM

s6 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1005
  • dbstore1001
  • db1125
  • db1113
  • db1098
  • db1096
  • db1093
  • db1088
  • db1085
  • db1061
Marostegui updated the task description. (Show Details)Mar 12 2019, 3:30 PM
Marostegui updated the task description. (Show Details)Mar 14 2019, 8:43 AM
Marostegui updated the task description. (Show Details)Mar 14 2019, 9:26 AM
Marostegui updated the task description. (Show Details)EditedMar 18 2019, 6:37 AM

s8 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1005
  • db1124
  • db1116
  • db1109
  • db1104
  • db1101
  • db1099
  • db1092
  • db1087
  • db1071
Marostegui updated the task description. (Show Details)Mar 18 2019, 6:38 AM
Marostegui updated the task description. (Show Details)Mar 18 2019, 9:33 AM
Marostegui updated the task description. (Show Details)Mar 20 2019, 9:20 AM
Marostegui updated the task description. (Show Details)Mar 20 2019, 10:13 AM
Marostegui added a comment.EditedMar 20 2019, 11:02 AM

s7 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1003
  • db1125
  • db1116
  • db1101
  • db1098
  • db1094
  • db1090
  • db1086
  • db1079
  • db1062
Marostegui updated the task description. (Show Details)Mar 22 2019, 7:08 AM
Marostegui updated the task description. (Show Details)
Marostegui updated the task description. (Show Details)Mar 25 2019, 7:20 AM
Marostegui updated the task description. (Show Details)Mar 26 2019, 8:08 AM
Marostegui added a comment.EditedMar 26 2019, 8:21 AM

s2 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • db1125
  • db1122
  • db1105
  • db1103
  • db1095
  • db1090
  • db1076
  • db1074
  • db1066
Marostegui updated the task description. (Show Details)Mar 26 2019, 8:36 AM
Marostegui updated the task description. (Show Details)Mar 28 2019, 6:20 AM
Marostegui updated the task description. (Show Details)Mar 28 2019, 7:16 AM
Marostegui added a comment.EditedMar 28 2019, 8:01 AM

s3 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • db1124
  • db1123
  • db1095
  • db1075
  • db1077
  • db1078
Marostegui updated the task description. (Show Details)Mar 28 2019, 8:01 AM
Marostegui closed this task as Resolved.Apr 1 2019, 7:01 AM
Marostegui updated the task description. (Show Details)

All done! One less drift between production and tables.sql