Page MenuHomePhabricator

Add autoincrement id to EventLogging MySQL tables. {oryx}
Closed, ResolvedPublic8 Estimated Story Points

Description

DOH!

The eventlogging_sync.sh script that is used to do custom replication of EventLogging tables from m4-master will not work with parallelized MySQL EventLogging consumer processes. There is a race condition in that script that causes the 'slave' database from not receiving all events. This has been happening at least since we started running parallel MySQL consumers back on Dec 3.

eventlogging_sync.sh needs something more reliable to sync on other than timestamps. We should bring back autoincrement ids on all EL tables, and then adapt eventlogging_sync.sh to use the id instead of a timestamp.

Event Timeline

Ottomata raised the priority of this task from to Medium.
Ottomata updated the task description. (Show Details)
Ottomata added subscribers: Ottomata, jcrespo, Nuria, mforns.

I just tested on the Echo_7731316 table in beta. Adding an autoincrement key is transparent to EventLogging if the table already exist. https://gerrit.wikimedia.org/r/#/c/188270 should also be reverted so that new tables are always created with this field.

Milimetric renamed this task from Add autoincrement id to EventLogging MySQL tables. to Add autoincrement id to EventLogging MySQL tables. {oryx} [8 pts].Feb 8 2016, 6:09 PM

Change 269347 had a related patch set uploaded (by Madhuvishy):
Add autoincrement IDs to mysql tables

https://gerrit.wikimedia.org/r/269347

Change 269347 merged by Nuria:
Add autoincrement IDs to mysql tables

https://gerrit.wikimedia.org/r/269347

Stashbot subscribed.

Mentioned in SAL [2016-02-16T18:27:21Z] <madhuvishy> Deployed and restarted eventlogging with Auto-increment ID and Update config/schemas submodule changes (T124741 and T125135)

@jcrespo The auto-increment id change went live on our side - new schemas or tables that get created should have an ID column now on. I think we are ready for the next steps - to migrate existing tables to have the ID, and then modifying the sync script to use the ID. (cc: @Nuria @Ottomata)

@jcrespo: Tagging this task with dba so it appears in your queue

jcrespo added a subscriber: madhuvishy.

Taking it from here, will analyze current state to see what changes are needed and the changes on the replication script.

Milimetric renamed this task from Add autoincrement id to EventLogging MySQL tables. {oryx} [8 pts] to Add autoincrement id to EventLogging MySQL tables. {oryx}.Feb 22 2016, 8:58 PM
Milimetric set the point value for this task to 8.

Ping @jcrespo , let us know if now that edit table is trimmed we can proceed with this.

Ping @jcrespo , let us know if now that edit table is trimmed we can proceed with this.

I will probably be able to work on this on one months' time(?)

I will probably be able to work on this on one months' time(?)

All right!

jcrespo removed jcrespo as the assignee of this task.

@Aklapper I am not working on this, but people keep assigning it to me. Please mediate.

Removing jcrespo, Sorry, in our team we keep tickets assigned to ourselves even if it is "paused" work we are hoping to do later.

Hi

Is this ticket a consequence of: https://phabricator.wikimedia.org/T87661?
So removing the autoincrement key in the end broke the script? (T87661#1978162)?

I might lack a bit of history here but I have some questions so I can understand better what's going on :-)

  1. The autoincrement keys were removed by someone who's not a DBA, why is a DBA needed to get them added back?
  2. Does the script absolutely need the autoincrement back to be able to work? I mean, can it be done in some other way to avoid that blocker?

Thanks and sorry if this has already been discussed!

Hah, hilarious. Ok, IF all of the current tables that eventlogging is writing to have auto-increment IDs, then yeah, no action from DBA needed. If not, then the current (latest revision) tables will need to have auto-increment IDs added by DBA.

We need them back in order to parallelize the process that writes to MySQL. eventlogging_sync.sh only syncs events where master ts > than the latest slave ts. If multiple processes insert events into the master, the order in which events are inserted will no longer necessarily be by ts (it isn't even with one process, but in practice one process usually inserts in ts order).

Once we are sure that all current tables have auto-increment IDs, eventlogging_sync.sh can be modified to sync where master id > latest slave id, instead of relying on timestamp.

Or, we could just do away with this crazy eventlogging_sync.sh replication, and use normal MySQL replication. If we did that, we wouldn't have this problem. :)

and use normal MySQL replication

Which I'd love, but this application refuses to be compatible with, and I got 0 help when I raised it before.

@Marostegui : +1 to what @Ottomata said

The prior incarnation of the system (which has been heavily upgraded now) required removal of autoincrement ID due to performance issues when writing but that should no longer be relevant.

Hah, hilarious. Ok, IF all of the current tables that eventlogging is writing to have auto-increment IDs, then yeah, no action from DBA needed. If not, then the current (latest revision) tables will need to have auto-increment IDs added by DBA.

So that is to be reviewed then, right?

We need them back in order to parallelize the process that writes to MySQL. eventlogging_sync.sh only syncs events where master ts > than the latest slave ts. If multiple processes insert events into the master, the order in which events are inserted will no longer necessarily be by ts (it isn't even with one process, but in practice one process usually inserts in ts order).

Once we are sure that all current tables have auto-increment IDs, eventlogging_sync.sh can be modified to sync where master id > latest slave id, instead of relying on timestamp.

Thanks for the explanation, I appreciate it :-)

Or, we could just do away with this crazy eventlogging_sync.sh replication, and use normal MySQL replication. If we did that, we wouldn't have this problem. :)

This is key, is that somehow doable from the application side? It would help to reduce this snowflake we have now :-)

Thanks

This is key, is that somehow doable from the application side?

Note analytics has refused to acknowledge/decline this problem several times: Here: T87661#1978162, here: T124307#1952728 , here: T124306#2257169 and here: T124306#2776927

Note analytics has refused to acknowledge/decline this problem several times

Not sure this is an accurate representation of what I read on those tickets buuuuuuuut I don't think Analytics was aware that action was needed on our part for T124307. T124307 looks like the parent ticket for all of these little things. We'd love to make all this better.

T124306 is about eventlogging lag monitoring, not about the replication process itself itself.

I don't really know the history around T87661 and the TokuDB migration, but yeah, removing the auto-increment seems like a bad idea in hindsight. Apparently Sean asked us to do this at the time. Nuria has more context.

This is key, is that somehow doable from the application side? It would help to reduce this snowflake we have now :-)

T124307 is probably the better ticket to deal with. This task should be a subticket of that. Will change.

@Marostegui let's talk about the overall EventLogging replication issue there.

Ok, IF all of the current tables that eventlogging is writing to have auto-increment IDs, then yeah, no action from DBA needed. If not, then the current (latest revision) tables will need to have auto-increment IDs added by DBA.

So that is to be reviewed then, right?

Yeah, we should review that. Will bring this back into Analytics world for review.

This is key, is that somehow doable from the application side? It would help to reduce this snowflake we have now :-)

T124307 is probably the better ticket to deal with. This task should be a subticket of that. Will change.

@Marostegui let's talk about the overall EventLogging replication issue there.

Sounds good!

Ok, IF all of the current tables that eventlogging is writing to have auto-increment IDs, then yeah, no action from DBA needed. If not, then the current (latest revision) tables will need to have auto-increment IDs added by DBA.

So that is to be reviewed then, right?

Yeah, we should review that. Will bring this back into Analytics world for review.

Great - thanks. Once you've got that we can see how to move on

Just a note: now that EL tables were renamed, and active tables recreated in T160454, they should all have an auto-increment id key. We could potentially now modify the eventlogging_sync.sh replication to replicate based on id instead of timestamp.

That would be a huge win! And it would make the terbium back-filling unnecessary finally!