Page MenuHomePhabricator

Add index to mediawiki_page_create_1 table
Closed, ResolvedPublic

Description

In order to facilitate T170850, the mediawiki_page_create_1 table (used for eventlogging on the stat servers) needs a two column index for: database, rev_timestamp (or something similar).

Event Timeline

kaldari created this task.Jul 18 2017, 9:06 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 18 2017, 9:06 PM
kaldari updated the task description. (Show Details)Jul 18 2017, 9:15 PM

I see that T170925 already exists and is almost the same thing. Will follow up there and likely close this as a duplicate.

We can add extra indexes to MySQL tables, but they'll be one offs. That is, other EventBus event tables that we might import will not automatically have these indexes when EventLogging creates the new tables. This includes new tables created because of schema version changes. There is about to be a change to add the parsedcomment field, which will be a schema version change. This will result in a new table called mediawiki_page_create_2. We'd have to manually add these indexes there too.

I don't mind doing this, since I think you only care about the page_create table(s), and I think you only care for the short term.

The long term solution is to stop using MySQL for this :p

We can add extra indexes to MySQL tables, but they'll be one offs

@Ottomata: All of the existing EventLogging tables (not from EventBus) have wiki, timestamp 2-column indexes. It seems like this would be an obvious index to add to all the EventBus tables as well, as it will certainly be the most common type of query. I would be happy with it either being automatic or a one-off, but it seems like making it automatic (like the indexes added in https://gerrit.wikimedia.org/r/#/c/365999/1/eventlogging/jrm.py) would make more sense.

The long term solution is to stop using MySQL for this :p

Now you're just trolling me ;)

Ottomata added a subscriber: elukey.EditedJul 19 2017, 5:21 PM

All of the existing EventLogging tables (not from EventBus) have wiki, timestamp 2-column indexes

KEY `wiki_timestamp` (`wiki`,`timestamp`)

Huh. I have no idea what creates this index. EventLogging doesn't do it. I don't see this in puppet.

@elukey, @mforns, any ideas?

fdans assigned this task to Ottomata.Jul 24 2017, 4:04 PM
fdans edited projects, added Analytics-Kanban; removed Analytics.
fdans moved this task from Next Up to In Progress on the Analytics-Kanban board.
elukey edited projects, added Analytics; removed Analytics-Kanban.Aug 3 2017, 3:45 PM
elukey moved this task from Incoming to Dashiki on the Analytics board.
Nuria added a subscriber: Nuria.Aug 14 2017, 4:05 PM

Looks like indexes are there:

UNIQUE KEY ix_mediawiki_page_create_2_meta_id (meta_id),

KEY `ix_mediawiki_page_create_2_performer_user_registration_dt` (`performer_user_registration_dt`),
KEY `ix_mediawiki_page_create_2_meta_dt` (`meta_dt`),
KEY `ix_mediawiki_page_create_2_rev_timestamp` (`rev_timestamp`)

) ENGINE=TokuDB DEFAULT CHARSET=binary compression='tokudb_zlib' |

Resolving. Please re-open if needed

Nuria closed this task as Resolved.Aug 14 2017, 4:05 PM

@Nuria, they were asking about an index on wiki database, not on these more generic fields.

Nuria reopened this task as Open.Aug 14 2017, 5:08 PM

@Ottomata: @Nettrom (who is working as a contract analyst for the CommTech team) is looking into creating a dashboard from the mediawiki_page_create_2 table. The table already has 6 million rows (and it's only a couple of months old), so this index is going to be vital to the feasibility of running regular queries against it. Could you just manually create a new database, rev_timestamp 2-column index on the table? I understand that this will not automatically be recreated if the table is versioned again, but that's OK.

Mentioned in SAL (#wikimedia-analytics) [2017-08-21T13:26:19Z] <ottomata> adding index on (database, rev_timestamp) on mediawiki_page_create_2 table on dbstore1002: T170990

Mentioned in SAL (#wikimedia-operations) [2017-08-21T13:26:23Z] <ottomata> adding index on (database, rev_timestamp) on mediawiki_page_create_2 table on dbstore1002: T170990

Mentioned in SAL (#wikimedia-analytics) [2017-08-21T13:46:16Z] <ottomata> adding index on (database, rev_timestamp) on mediawiki_page_create_2 table on db1047: T170990

Mentioned in SAL (#wikimedia-operations) [2017-08-21T13:46:18Z] <ottomata> adding index on (database, rev_timestamp) on mediawiki_page_create_2 table on db1047: T170990

Allright, done! I've added this index to mediawiki_page_create_2 on both 'slaves' dbstore1002 (analytics-store) and db1047.

On analytics-store, I was able to run:

alter table `mediawiki_page_create_2` add index `ix_mediawiki_page_create_2_database_rev_timestamp_T170990` (`database`, `rev_timestamp`);

On db1047, the above alter statement resulted in 'Specified key was too long; max key length is 3072 bytes', so I chose to index a 64 char prefix of database instead, as:

alter table `mediawiki_page_create_2` add index `ix_mediawiki_page_create_2_database_rev_timestamp_T170990` (`database`(64), `rev_timestamp`);

I don't know of any database names more than 64 chars, so hopefully this won't be a problem for you, especially since you will likely be using analytics-store to query this table (right?)

kaldari closed this task as Resolved.Aug 21 2017, 5:26 PM

@Ottomata: Awesome! Thank you!

DannyH moved this task from New & TBD Tickets to Archive on the Community-Tech board.