Add index to `comment_id` field in `comment` table (all wikis)
Copying data from labs-db onto the analytics Hadoop cluster would benefit from an index on the `comment_id` field of the `comment` table. This applies to all wikis hosted in labs.
There are two ways we could copy the data:
- Joining tables in MySQL: the index would speed up the join.
- Copying the full table: the index would make it more efficient to retrieve the min and max ids, as well as chunks of ids. Sqoop uses a two-step approach: it first queries the min and max ids, then splits the main query into subsets of ids.
For performance reasons, we will probably prefer the second approach: it avoids joining the `comment` table in MySQL to each of the tables that need it, and the join is done in Hadoop instead.
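To make the full-table copy concrete, here is a minimal Python sketch of the two-step pattern: one query for the id bounds, then one range query per chunk. It only illustrates the idea and is not Sqoop's actual code, and Sqoop's own split arithmetic may differ. The function names `split_ranges` and `chunk_queries` and the `SELECT *` projection are assumptions made for the example.

```python
# Illustrative sketch only: hypothetical helpers, not Sqoop's implementation.

# Step 1: a single query to find the id bounds.
BOUNDS_QUERY = "SELECT MIN(comment_id), MAX(comment_id) FROM comment"


def split_ranges(min_id, max_id, num_splits):
    """Split the inclusive range [min_id, max_id] into contiguous inclusive chunks."""
    total = max_id - min_id + 1
    if total <= 0 or num_splits <= 0:
        return []
    # Never create more chunks than there are ids.
    num_splits = min(num_splits, total)
    base, extra = divmod(total, num_splits)
    ranges = []
    lo = min_id
    for i in range(num_splits):
        # The first `extra` chunks take one additional id each.
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def chunk_queries(min_id, max_id, num_splits):
    """Step 2: build one range query per chunk of ids."""
    return [
        f"SELECT * FROM comment WHERE comment_id >= {lo} AND comment_id <= {hi}"
        for lo, hi in split_ranges(min_id, max_id, num_splits)
    ]


if __name__ == "__main__":
    print(BOUNDS_QUERY)
    # Example bounds; in practice these come from running BOUNDS_QUERY.
    for query in chunk_queries(1, 10, 4):
        print(query)
```

Both kinds of query are served well by an index on `comment_id`: the min and max can be read from the ends of the index, and each chunk becomes an index range scan rather than a full table scan.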