Add artificial primary key to flow_wiki_ref and flow_ext_ref
Closed, ResolvedPublic

Description

We used to have a UNIQUE constraint, but it was removed at rEFLW873cbbca3e61: Remove unique constraints from flow_*_ref. See also T92284: S9. LQT: Unique index flow_ext_ref_pk blocking false positives.

We should discuss that, as well as alternatives like an artificial primary key (as we have on most of the Flow tables).

Related Objects

Mattflaschen-WMF updated the task description. (Show Details)
Mattflaschen-WMF raised the priority of this task from to Needs Triage.
Mattflaschen-WMF assigned this task to Springle.
Restricted Application added a project: Collaboration-Team-Triage. · View Herald TranscriptAug 20 2015, 2:13 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Mattflaschen-WMF removed Springle as the assignee of this task.Aug 20 2015, 2:14 AM
Mattflaschen-WMF set Security to None.

Allow me to help with this at any time (ping me for support). If the real primary key is unsuitable, because it is too large, or does not cover the uniqueness properly because of index limitations, an artificial primary key is certainly an alternative.

Please note that for the particular engine we are using, there is not such a thing as a table without a primary key: if we do not define one explicitly, and there is no other unique constraint, innodb will create a hidden column for it, but that cannot be accessed by the user. This means that certain maintenance tasks, like altering a table online, checksuming it or even row based replication, in some cases, will be not possible or have worse performance.

Those things will affect you directly, because they can be the difference between taking months (so a failover is necessary) to perform an alter or taking hours.

Catrope triaged this task as High priority.Aug 27 2015, 12:11 AM
Mattflaschen-WMF renamed this task from Add primary key or unique constraint to flow_wiki_ref and flow_ext_ref to Add artificial primary key to flow_wiki_ref and flow_ext_ref.Aug 28 2015, 10:01 PM

We'll probably go with an artificial primary key (which I think should be a UUID), both to reduce index size and avoid collisions (which can be caused by prefixing even if the full URL is unique).

Change 238393 had a related patch set uploaded (by Matthias Mullie):
Add artificial primary key to flow_wiki_ref and flow_ext_ref

https://gerrit.wikimedia.org/r/238393

In T109676#1585731, @Mattflaschen wrote:

We'll probably go with an artificial primary key (which I think should be a UUID)

@Mattflaschen: is there a particular reason you thought it should be a UUID? (see comment on https://gerrit.wikimedia.org/r/#/c/238393/)
I mostly made it a UUID for consistency with the rest of the Flow tables & because you had also already expressed a preference for UUID :p

@jcrespo When shall we schedule this for?

In T109676#1585731, @Mattflaschen wrote:

We'll probably go with an artificial primary key (which I think should be a UUID)

@Mattflaschen: is there a particular reason you thought it should be a UUID? (see comment on https://gerrit.wikimedia.org/r/#/c/238393/)
I mostly made it a UUID for consistency with the rest of the Flow tables

Consistency (e.g. with workflow_id) was my primary reason as well.

I also thought it might cause some small issues. Of course, now that I go looking for it, I can't seem to find any code that specifically *needs* a UUID as PK (except that strings that were not UUIDs but are the right length would be Bad due to convertUUIDs).

I still think it would be best unless it causes significant performance problems (probably not, since Jcrespo +1'ed on Gerrit after we explicitly discussed this). If we standardize things like coding conventions, we should probably standardize this as well, absent a good reason to make an exception.

@Mattflaschen, I am ok with you explanation. Just remember that when there is no other reason, in general, an auto_increment would be preferred for pure size and monotonic increase reasons (which has some advantages for InnoDB). Not an issue for small tables.

@Catrope We should make T111084 a blocker, so after that is deployed.

jcrespo moved this task from Triage to Backlog on the DBA board.Sep 25 2015, 9:08 AM
jcrespo moved this task from Backlog to Blocked external/Not db team on the DBA board.

Now that T111084 has been applied, please ping me to proceed with this when you are ready, so that we can comply with the recently approved T112637.

Mattflaschen-WMF added a comment.EditedOct 28 2015, 9:19 PM

Now that T111084 has been applied, please ping me to proceed with this when you are ready, so that we can comply with the recently approved T112637.

@jcrespo We're ready for you to move ahead on https://gerrit.wikimedia.org/r/#/c/238393/ .

In T109676#1763501, @Mattflaschen wrote:

Now that T111084 has been applied, please ping me to proceed with this when you are ready, so that we can comply with the recently approved T112637.

@jcrespo We're ready for you to move ahead on https://gerrit.wikimedia.org/r/#/c/238393/ .

Actually, I retested and found an issue, so hold off for now.

Mattflaschen-WMF added a comment.EditedOct 30 2015, 1:28 AM
In T109676#1764634, @Mattflaschen wrote:
In T109676#1763501, @Mattflaschen wrote:

Now that T111084 has been applied, please ping me to proceed with this when you are ready, so that we can comply with the recently approved T112637.

@jcrespo We're ready for you to move ahead on https://gerrit.wikimedia.org/r/#/c/238393/ .

Actually, I retested and found an issue, so hold off for now.

@jcrespo See updates at https://gerrit.wikimedia.org/r/#/c/238393/ and T117138. I think all the glitches I saw were caused by T117138, which was not introduced by this patch.

After Monday morning (we have a morning SWAT then for T117138), this is ready for you to look at again.

@Mattflaschen, let's try to do formally once the new proposed workflow:

  • Keep owning this ticket, and create a substask (on Monday) with the projects DBA and #blocked-by-schema-change (the first one doesn't exist, but it will be a rename of #Database). I will own that task, but in the future it will be assigned to one of the 2 DBAs
  • As you will want to do 2 phases at separate times, I will ask you to create another ticket for the second phase (that way you know what exactly has been done and what is left to do)
  • Please add the following information to the subtask: What (point me to the git repo with the schema change), Where (officewiki on the main servers and flowdb on x1), when (point me to the merge of https://gerrit.wikimedia.org/r/#/c/238393/ and the supposed date- Monday), why? (this ticket) and that you have checked that application will work after the change (that should be a yes)

I think this has little overhead, in exchange I will reward you with a faster ticket processing (among other reasons because I will notice it faster).

@Mattflaschen, let's try to do formally once the new proposed workflow:

  • Keep owning this ticket, and create a substask (on Monday) with the projects DBA and #blocked-by-schema-change (the first one doesn't exist, but it will be a rename of #Database). I will own that task, but in the future it will be assigned to one of the 2 DBAs

Done, except DBA apparently doesn't exist yet, so I used Database. This is T117783: Add ref_id to flow_wiki_ref and flow_ext_ref (DBA phase 1).

  • As you will want to do 2 phases at separate times, I will ask you to create another ticket for the second phase (that way you know what exactly has been done and what is left to do)

Done. This is T117787: Make ref_id of flow_wiki_ref and flow_ext_ref not null and PK (DBA phase 2).

  • Please add the following information to the subtask: What (point me to the git repo with the schema change), Where (officewiki on the main servers and flowdb on x1), when (point me to the merge of https://gerrit.wikimedia.org/r/#/c/238393/ and the supposed date- Monday), why? (this ticket) and that you have checked that application will work after the change (that should be a yes)

Done. To be clear, I was not saying this had to be done Monday, just sometime after Monday morning.

The current deployment plan is based on doing T117783: Add ref_id to flow_wiki_ref and flow_ext_ref (DBA phase 1) before merge (or at least before deployment, but before merge is less error-prone). Let me know if that is acceptable.

Done, except DBA apparently doesn't exist yet, so I used Database.

You did well. It requires discussion for the rename, which is like an unnecessary overhead, but anyway... :-)

To be clear, I was not saying this had to be done Monday, just sometime after Monday morning.

I understanded correctly. I only asked to have time to schedule it.

Let me know if that is acceptable.

This is more than ok. Thanks for taking the time.

Change 238393 merged by jenkins-bot:
Add artificial primary key to flow_wiki_ref and flow_ext_ref

https://gerrit.wikimedia.org/r/238393

Change 251415 had a related patch set uploaded (by Mattflaschen):
Add artificial primary key to flow_wiki_ref and flow_ext_ref

https://gerrit.wikimedia.org/r/251415

Change 251416 had a related patch set uploaded (by Mattflaschen):
Add artificial primary key to flow_wiki_ref and flow_ext_ref

https://gerrit.wikimedia.org/r/251416

Change 251415 abandoned by Alex Monk:
Add artificial primary key to flow_wiki_ref and flow_ext_ref

Reason:
Branch no longer in use.

https://gerrit.wikimedia.org/r/251415

Change 251416 merged by jenkins-bot:
Add artificial primary key to flow_wiki_ref and flow_ext_ref

https://gerrit.wikimedia.org/r/251416

Change 251558 had a related patch set uploaded (by Mattflaschen):
Add isset to handle when ref_id is not in row returned from MC index

https://gerrit.wikimedia.org/r/251558

Change 251558 merged by jenkins-bot:
Add isset to handle when ref_id is not in row returned from MC index

https://gerrit.wikimedia.org/r/251558

Change 259444 had a related patch set uploaded (by Catrope):
Make ref_id the primary key for DbStorage purposes

https://gerrit.wikimedia.org/r/259444

I don't think think this is Blocked anymore. Does anyone still want to do https://gerrit.wikimedia.org/r/259444 ?

In T109676#2074343, @Mattflaschen wrote:

I don't think think this is Blocked anymore. Does anyone still want to do https://gerrit.wikimedia.org/r/259444 ?

Is it needed/valuable? I admit I didn't really know what I was doing there.

I don't think so.

Matthias said in 9ca081f7675cd3a1e23bac1eb9c04d539fb47f6d, "The newly added ref_id is not used in the code, it's only meant to make DB maintenance easier."

Change 259444 abandoned by Catrope:
Make ref_id the primary key for DbStorage purposes

https://gerrit.wikimedia.org/r/259444

Catrope closed this task as Resolved.Mar 1 2016, 2:49 AM

OK, cool. This is done then.