Page MenuHomePhabricator

Script adds indices to the Edit schema on analytics-store [5 pts] {lion}
Closed, ResolvedPublic

Description

For performance reasons, we need a few indices on Edit_*

The way staging.milimetric_edit is set up works pretty well, the full history can be computed in 2 minutes with the query we merged. So the script should basically copy those indices. Sean asked for this script to live somewhere that requires code review, so probably in the data-warehouse repository we should make a well named folder for it.

Event Timeline

Milimetric claimed this task.
Milimetric raised the priority of this task from to Needs Triage.
Milimetric updated the task description. (Show Details)
Milimetric added a project: Analytics-Kanban.
Milimetric moved this task to Next Up on the Analytics-Kanban board.
Milimetric subscribed.
kevinator renamed this task from Script adds indices to the Edit schema on analytics-store to Script adds indices to the Edit schema on analytics-store {bear}.Feb 12 2015, 2:10 AM
kevinator triaged this task as Medium priority.
kevinator set Security to None.
kevinator renamed this task from Script adds indices to the Edit schema on analytics-store {bear} to Script adds indices to the Edit schema on analytics-store [5 pts] {bear}.Feb 12 2015, 4:34 PM
kevinator updated the task description. (Show Details)
kevinator subscribed.
  • staging.milimetric_edit has the indices that seem to work
  • on data-warehouse repo on an specific folder. (analytics-store/log/Edit_12345_indices.sql)

staging.milimetric_edit has the following indexes:

KEY ix_milimetric_edit_session (event_editingSessionId) USING HASH,
KEY ix_milimetric_edit_action (event_action),
KEY ix_milimetric_edit_ip (clientIp) USING HASH,
KEY ix_milimetric_edit_time (timestamp)

Considering that normal EL tables already have an index set like:

PRIMARY KEY (id),
UNIQUE KEY ix_Edit_10676603_uuid (uuid),
KEY ix_Edit_10676603_timestamp (timestamp),
KEY wiki_timestamp (wiki,timestamp)

only these 3 indexes should be added:

KEY ix_milimetric_edit_session (event_editingSessionId) USING HASH,
KEY ix_milimetric_edit_action (event_action),
KEY ix_milimetric_edit_ip (clientIp) USING HASH

However, I didn't find any use of 'clientIp' index. The field appeared in a group by clause in https://gerrit.wikimedia.org/r/#/c/188601/3/edit/save_rates_low_noise.sql but I think it has no effect in the query.
Also, didn't find any other query that uses it in https://gerrit.wikimedia.org/r/#/admin/projects/analytics/limn-edit-data.

So, I would use indexes just for event_action and event_editingSessionId columns. There are other columns that could offer nice breakdowns like event_action.init.type, event_failure.type, event_abort.type, etc. But they are currently not used by scheduled reports; and as EL is write-intensive, it's better to have less indexes.

The names of the indexes have been chosen to be consistent with current EL indexes, and the type is HASH, because these columns will be not queried with order operators. Thus, the final index creation code:

CREATE INDEX ix_Edit_*_event_action ON Edit_* (event_action) USING HASH;
CREATE INDEX ix_Edit_*_event_editingSessionId ON Edit_* (event_editingSessionId) USING HASH;
gerritbot subscribed.

Change 190404 had a related patch set uploaded (by Mforns):
Add SQL script to create indexes in EL Edit tables

https://gerrit.wikimedia.org/r/190404

Patch-For-Review

Change 190404 merged by Milimetric:
Add SQL script to create indexes in EL Edit tables

https://gerrit.wikimedia.org/r/190404

kevinator renamed this task from Script adds indices to the Edit schema on analytics-store [5 pts] {bear} to Script adds indices to the Edit schema on analytics-store [5 pts] {lion}.Feb 19 2015, 2:09 AM