In order to facilitate T170850, the mediawiki_page_create_1 table (used for eventlogging on the stat servers) needs a two column index for: database, rev_timestamp (or something similar).
Description
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Resolved | Ottomata | T150369 Record an event every time a new content namespace page is created | |||
Resolved | kaldari | T170850 Visualize page create events for all wikis | |||
Resolved | Ottomata | T170990 Add index to mediawiki_page_create_1 table |
Event Timeline
I see that T170925 already exists and is almost the same thing. Will follow up there and likely close this as a duplicate.
We can add extra indexes to MySQL tables, but they'll be one offs. That is, other EventBus event tables that we might import will not automatically have these indexes when EventLogging creates the new tables. This includes new tables created because of schema version changes. There is about to be a change to add the parsedcomment field, which will be a schema version change. This will result in a new table called mediawiki_page_create_2. We'd have to manually add these indexes there too.
I don't mind doing this, since I think you only care about the page_create table(s), and I think you only care for the short term.
The long term solution is to stop using MySQL for this :p
We can add extra indexes to MySQL tables, but they'll be one offs
@Ottomata: All of the existing EventLogging tables (not from EventBus) have wiki, timestamp 2-column indexes. It seems like this would be an obvious index to add to all the EventBus tables as well, as it will certainly be the most common type of query. I would be happy with it either being automatic or a one-off, but it seems like making it automatic (like the indexes added in https://gerrit.wikimedia.org/r/#/c/365999/1/eventlogging/jrm.py) would make more sense.
The long term solution is to stop using MySQL for this :p
Now you're just trolling me ;)
Looks like indexes are there:
UNIQUE KEY ix_mediawiki_page_create_2_meta_id (meta_id),
KEY `ix_mediawiki_page_create_2_performer_user_registration_dt` (`performer_user_registration_dt`), KEY `ix_mediawiki_page_create_2_meta_dt` (`meta_dt`), KEY `ix_mediawiki_page_create_2_rev_timestamp` (`rev_timestamp`)
) ENGINE=TokuDB DEFAULT CHARSET=binary compression='tokudb_zlib' |
Resolving. Please re-open if needed
@Nuria, they were asking about an index on wiki database, not on these more generic fields.
@Ottomata: @Nettrom (who is working as a contract analyst for the CommTech team) is looking into creating a dashboard from the mediawiki_page_create_2 table. The table already has 6 million rows (and it's only a couple of months old), so this index is going to be vital to the feasibility of running regular queries against it. Could you just manually create a new database, rev_timestamp 2-column index on the table? I understand that this will not automatically be recreated if the table is versioned again, but that's OK.
Mentioned in SAL (#wikimedia-analytics) [2017-08-21T13:26:19Z] <ottomata> adding index on (database, rev_timestamp) on mediawiki_page_create_2 table on dbstore1002: T170990
Mentioned in SAL (#wikimedia-operations) [2017-08-21T13:26:23Z] <ottomata> adding index on (database, rev_timestamp) on mediawiki_page_create_2 table on dbstore1002: T170990
Mentioned in SAL (#wikimedia-analytics) [2017-08-21T13:46:16Z] <ottomata> adding index on (database, rev_timestamp) on mediawiki_page_create_2 table on db1047: T170990
Mentioned in SAL (#wikimedia-operations) [2017-08-21T13:46:18Z] <ottomata> adding index on (database, rev_timestamp) on mediawiki_page_create_2 table on db1047: T170990
Allright, done! I've added this index to mediawiki_page_create_2 on both 'slaves' dbstore1002 (analytics-store) and db1047.
On analytics-store, I was able to run:
alter table `mediawiki_page_create_2` add index `ix_mediawiki_page_create_2_database_rev_timestamp_T170990` (`database`, `rev_timestamp`);
On db1047, the above alter statement resulted in 'Specified key was too long; max key length is 3072 bytes', so I chose to index a 64 char prefix of database instead, as:
alter table `mediawiki_page_create_2` add index `ix_mediawiki_page_create_2_database_rev_timestamp_T170990` (`database`(64), `rev_timestamp`);
I don't know of any database names more than 64 chars, so hopefully this won't be a problem for you, especially since you will likely be using analytics-store to query this table (right?)