Page MenuHomePhabricator

Investigate Re-evaluate the use of INSERT IGNORE on ipblocks [8H]
Closed, ResolvedPublic

Description

Following an incident (T263842) where there was a duplicate entry on ipblocks table, it was seeing that the writes uses INSERT IGNORE:

INSERT /* MediaWiki\Block\DatabaseBlockStore::insertBlock  */ IGNORE INTO `ipblocks`

INSERT IGNORE will silently ignore errors and the code might re-trigger the same action, which can lead to duplicate entries errors.
We are not fully sure if this was the cause of the above incident, but it is definitely recommended to re-evaluate if we really need to use INSERT IGNORE or it can be switched to a normal INSERT.

Event Timeline

Marostegui added a subscriber: Niharika.

We've had another issue with this table and duplicate keys on some hosts in codfw (T277632). Replication broke and we need to rebuild them
We aren't fully sure what caused it, but having INSERT IGNORE is an unsafe statement and can cause lots of data drifts.
Can we please re-evaluate the use of this statement and if there are good reasons against turning it into a INSERT?

Majavah added a subscriber: Majavah.

Untagging CentralAuth since it doesn't directly write to the ipblocks table, if the other uses of INSERT IGNORE in CA are a problem please file separate tasks for those.

ARamirez_WMF renamed this task from Re-evaluate the use of INSERT IGNORE on ipblocks to Investigate Re-evaluate the use of INSERT IGNORE on ipblocks [8H].Apr 21 2021, 4:28 PM

INSERT IGNORE is primarily used to try prevent duplicates, while INSERT will return an error on duplicate entries INSERT IGNORE will issue a warning and discard the entry.

Technically insert ignore should have stopped all the duplicates, an alternative can be using a REPLACE however since this deletes the previous duplicate row before inserting it might cause on cascade/referential constraint issues.

We could try one one the 2 in code,

  1. Manually check if row exists before inserting then skip if it does (essentially INSERT IGNORE that is code driven) or
  2. Try to run an INSERT but put it in a try catch and just issue a continue on duplicate key error and log other errors.

Tested using try catch doesn't work since we get this error `Cannot execute query from MediaWiki\Block\DatabaseBlockStore::insertBlock while transaction status is ERROR```
so probably the reason we used ignore, since it will not generate any DB errors.

Tchanders added a subscriber: Tchanders.

Manually check if row exists before inserting then skip if it does (essentially INSERT IGNORE that is code driven) or

It sounds like this is a safe option for getting rid of the INSERT IGNORE. @Niharika - anything to add?

I discussed this task with @LSobanski and they are inclined towards us picking up this work at our leisure (preferably after SecurePoll).
We can repurpose this into an actionable ticket and add it to our backlog.

Follow-up work to be done in T283603. Closing this investigation task.