Page MenuHomePhabricator

Remove duplicate comment_temp indexes
Closed, DeclinedPublic

Description

✗ "imgcomment_name" index can be removed as redundant (covered by "PRIMARY")
✗ "revcomment_rev" index can be removed as redundant (covered by "PRIMARY")

Event Timeline

CREATE TABLE /*_*/image_comment_temp (
  -- Key to img_name (ugh)
  imgcomment_name varchar(255) binary NOT NULL,
  -- Key to comment_id
  imgcomment_description_id bigint unsigned NOT NULL,
  PRIMARY KEY (imgcomment_name, imgcomment_description_id)
) /*$wgDBTableOptions*/;
-- Ensure uniqueness
CREATE UNIQUE INDEX /*i*/imgcomment_name ON /*_*/image_comment_temp (imgcomment_name);

CREATE TABLE /*_*/revision_comment_temp (
  -- Key to rev_id
  revcomment_rev int unsigned NOT NULL,
  -- Key to comment_id
  revcomment_comment_id bigint unsigned NOT NULL,
  PRIMARY KEY (revcomment_rev, revcomment_comment_id)
) /*$wgDBTableOptions*/;
-- Ensure uniqueness
CREATE UNIQUE INDEX /*i*/revcomment_rev ON /*_*/revision_comment_temp (revcomment_rev);

Hmm. Do these actually count as duplicate? I'm guessing not, as the PK is doing unique for the two columns as a pair, but would allow multiple revcomment_comment_id per revcomment_rev. Whereas the unique index will make sure only 1 of revcomment_rev...

You guess correctly. Technically we could probably just make those unique indexes be the primary keys, but according to T153333#3283613 the PK should cover both columns for associative tables like these (presumably to be able to "Using index").

We can ask the DBAs if you want to be sure about it.

Obviously they're only temporary tables, so it doesn't matter too much if they're "not perfect" longer term....

With an optional unique index on the opposite direction if the reverse join is needed (but often it is not).

The question, I guess, is whether we need the UNIQUE index of just that one column...? Even more so if we don't need it the other way round as per this comment

While MediaWiki shouldn't ever insert rows that violate the "one comment_id per revision" constraint, it's helpful to check it at the database level too.

Primary key being over the 2 columns is only for n:n relationship. If one of the tables have uniqe keys, it means it is a 1:n relationship, not the same case (not a general case associative table)- the unique key should adapt to the several use cases and the physical optimization- set the PK as the most common way of accessing the table (left to right or right to left)- as always, we optimize for what is actually being run, not the theoretical/book example.

You lost me in there, @jcrespo ;) I'll tell you what's being run and hopefully you can tell us what the indexes should look like.

The queries in MediaWiki to these tables are always left-to-right. It's 1:n in that the left side should only ever have 1 match, hence the UNIQUE indexes defined on the left columns. Eventually we want to merge these tables into revision and image.

For revision_comment_temp, most accesses are going to be selecting revision and comment fields FROM revision LEFT JOIN revision_comment_temp ON (rev_id = revcomment_rev) LEFT JOIN comment ON (revcomment_comment_id = comment_id). When we get to step 6 of T166733, those will change to JOINs instead. Some legacy code paths may do SELECT comment_id, comment_text, comment_data FROM revision_comment_temp JOIN comment ON (revcomment_comment_id = comment_id) WHERE revcomment_rev = '...', although I think we cleaned all those up already as of c2f432625. There will also be some INSERTs and DELETEs, the latter always specifying only the left column (usually with multiple values, i.e. revcomment_rev IN (...)).

For image_comment_temp it will be largely the same. There will also be some INSERT SELECT statements that directly select imgcomment_description_id (along with a bunch of image table fields), FROM image LEFT JOIN image_comment_temp ON (img_name = imgcomment_name), with no join to comment.

Tool Forge's databases will probably need to add an index to handle the right-to-left case for the filtering that has to be done there.

Based on my limited understanding of your description, I would do PRIMARY (left, right) and UNIQUE(left).

Thanks. That's what we're already doing, so I'm going to decline this request to change it.