Page MenuHomePhabricator

Ensure we are coping with deadlock Exceptions properly & our q doesn't corrupt at high volume, reduce unecessary potentially blocking queries
Open, MediumPublic

Description

Somehow we're getting foreign key constraint violations in Civi when inserting address or email records. The contact insert succeeds enough to return an ID, but somehow it's rolled back before we do the address or email insert.

Event Timeline

DStrine triaged this task as Medium priority.Dec 11 2017, 8:40 PM
DStrine moved this task from Triage to Sprint C Feb 2nd to Feb 16th on the Fundraising-Backlog board.

@Ejegg I found it -the exception one & second time around am commenting on the right phab :-) - @DStrine merged the exceptions one into this one. I've pulled it in since it;'s the thing we have both been trying to figure out - ie. how to get rid of problems at volume - editing title to reflect that

Eileenmcnaughton renamed this task from Weird FK constraint problems in Civi to Ensure we are coping with deadlock Exceptions properly & our q doesn't corrupt at high volume.Dec 11 2019, 9:23 AM

Change 556335 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm/civicrm@master] Remove unnecessary query when updating org, if no name udpate.

https://gerrit.wikimedia.org/r/556335

Eileenmcnaughton renamed this task from Ensure we are coping with deadlock Exceptions properly & our q doesn't corrupt at high volume to Ensure we are coping with deadlock Exceptions properly & our q doesn't corrupt at high volume, reduce unecessary potentially blocking queries.Dec 11 2019, 9:36 AM

Change 556335 merged by jenkins-bot:
[wikimedia/fundraising/crm/civicrm@master] Remove unnecessary query when updating org, if no name udpate.

https://gerrit.wikimedia.org/r/556335

Change 559250 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Handle constraint errors on location insert as deadlocks.

https://gerrit.wikimedia.org/r/559250

Change 559251 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Handle constraint errors on location insert as deadlocks.

https://gerrit.wikimedia.org/r/559251

Change 559250 abandoned by Eileen:
Handle constraint errors on location insert as deadlocks.

https://gerrit.wikimedia.org/r/559250

Change 559251 abandoned by Eileen:
Handle constraint errors on location insert as deadlocks.

https://gerrit.wikimedia.org/r/559251

Change 559292 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Throw database contention exception on constraint errors in places they are known to be deadlock caused

https://gerrit.wikimedia.org/r/559292

Change 558312 had a related patch set uploaded (by Ejegg; owner: Eileen):
[wikimedia/fundraising/crm/civicrm@master] Add error code for deadlocks

https://gerrit.wikimedia.org/r/558312

Change 558315 had a related patch set uploaded (by Ejegg; owner: Eileen):
[wikimedia/fundraising/crm@master] Handle deadlocks as WMFExceptions.

https://gerrit.wikimedia.org/r/558315

Change 558315 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Handle deadlocks as WMFExceptions.

https://gerrit.wikimedia.org/r/558315

Change 559292 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Throw database contention exception on constraint errors

https://gerrit.wikimedia.org/r/559292

Change 559627 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Another constraint / deadlock catch

https://gerrit.wikimedia.org/r/559627

Change 559627 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Another constraint / deadlock catch

https://gerrit.wikimedia.org/r/559627

I deployed some patches today and then put the server under load to see what happened with deadlocks. I tracked 7 of them as follows

1 ) Resolved on retry - (no change from today's patch) deadlock on merge "UPDATE IGNORE civicrm_group_contact_cache SET contact_id = 10571027 WHERE con..." - it seems it worked on retry & as it was simple there appears to be nothing not copied

  1. Unresolved RecurringQueueConsumer-::importSubscriptionPayment contact id 40347371 created but contribution rolled back - "DB Error: deadlock" "INSERT INTO civicrm_address (contact_id , location_type_id https://phabricator.wikimedia.org/T178104
  2. Resolved by further patch redacted@gmail.com - I requeued - & put up gerrit - now recovered & fix deployed - checked contact & looks good
  3. Resolved - patch worked contribution for 1228509 failed on insert to contribution_extra - patch worked - this was requeued & reimported
  4. Resolved on retry - (no change from today's patch) one Nora entered that seemed to retry & work
  5. Resolved by further patch Another like 3 but since the patch had gone out it went into the pending queue
  6. Resolved - another insert on email

As can be seen 6/7 were resolved fine, 2 of which were unaffected by today's deploy & the last 2 were fixed by it.

The unresolved one is troubling though. It reached

if (empty($ctRecord['contribution_id'])) {
  // TODO: this scenario should be handled by the wmf_civicrm_contribution_message_import function.

  // Map the tracking record to the CiviCRM contribution
  wmf_civicrm_message_update_contribution_tracking($msg, $contribution);

  // update the contact
  $contact = wmf_civicrm_message_contact_update($msg, $recur_record->contact_id);

  // Insert the location record
  // This will be duplicated in some cases in the main message_import, but should
  // not have a negative impact. Longer term it should be removed from here in favour of there.
  wmf_civicrm_message_location_update($msg, $contact);
}

And successfully updated the contribution tracking -

select * FROM contribution_tracking where id = 76566677 order by id desc LIMIT 1;

However, when it got to the duplicate update function the deadlock in that caused the contribution stored to that table to be rolled back. I think we need to more deadlock retries higher in the stack. We could stop retrying in the DataBaseObject class & that would probably be more reliable @Ejegg

Change 558312 abandoned by Eileen:
Add error code for deadlocks

https://gerrit.wikimedia.org/r/558312

Change 560363 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Add indexes to civicrm_acl_cache.modified_date. * * Despite being basically empty this table is involved in deadlocks * frequently. * * I've added this index upstream * https://github.com/civicrm/civicrm-core/pull/16144 * It should be super quick to run so won't require an outage. * * I'm looking into further ways to improve contention on this table * https://lab.civicrm.org/dev/core/issues/1486

https://gerrit.wikimedia.org/r/560363

Change 560363 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Add indexes to civicrm_acl_cache.modified_date

https://gerrit.wikimedia.org/r/560363