Page MenuHomePhabricator

Spike: Civi Failmail Error: IMPORT_CONTACT Couldn't store email for the contact.
Closed, ResolvedPublic2 Estimated Story Points

Description

We've seen 50 failmails in the past three hours for Ingenico donations that get this error. A sample message is below, in which I've replaced the donor's email address with ***** These process as successful donations at the portal, but don't appear in Civi.

A message was removed from ActiveMQ due to the following error(s):

Error: IMPORT_CONTACT Couldn't store email for the contact. Source: array ( 'error_code' => 'unknown error', 'sql' => 'INSERT INTO civicrm_email (contact_id , location_type_id , email , is_primary , is_billing ) VALUES ( 15756069 , 1 , \'***\' , 1 , 1 ) [nativecode=1205 Lock wait timeout exceeded; try restarting transaction]', 'tip' => 'add debug=1 to your API call to have more info about the error', 'is_error' => 1, 'error_message' => 'DB Error: unknown error', 'debug_information' => 'INSERT INTO civicrm_email (contact_id , location_type_id , email , is_primary , is_billing ) VALUES ( 15756069 , 1 , \'**\' , 1 , 1 ) [nativecode=1205 Lock wait timeout exceeded; try restarting transaction]', )

---Source---

https://civicrm.wikimedia.org/queue/donations-damaged/globalcollect-3781646641

Event Timeline

MBeat33 triaged this task as Unbreak Now! priority.Aug 14 2016, 9:52 PM

Setting this as unbreak now as these donors will not receive TY emails, so the pipeline's broken in a way donors will see.

@Ejegg suggested this may possibly be due to the new dedupe process running in the background, which can safely be switched off.

I turned off the dedupe jenkins job but they are still happening. The error rate does not seem regular or predictable. The message only lists the blocked query and not the blocking one so I'm not sure how to track down the problem.

Normally the mysql command

SHOW ENGINE INNODB STATUS

reveals the query causing the hang - but I know in the past when we looked @Jgreen said it didn't show anything

Possibly relevant....

InnoDB: transactions deadlock detected, dumping detailed information.
160814 19:36:37

  • (1) TRANSACTION:

TRANSACTION 84A0560D, ACTIVE 24 sec inserting
mysql tables in use 4, locked 4
LOCK WAIT 8 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 4
MySQL thread id 8159654, OS thread handle 0x7fb6283f4700, query id 812645971 10.64.40.109 civicrm update
/* https://civicrm.wikimedia.org/user/1 */ INSERT INTO civicrm_email (contact_id , location_type_id , email , is_primary , is_billing ) VALUES ( 15756122 , 1 , '<censored>' , 1 , 1 )

  • (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 1702876 page no 81414 n bits 608 index UI_email of table civicrm.civicrm_email trx id 84A0560D lock_mode X locks gap before rec insert intention waiting

  • (2) TRANSACTION:

TRANSACTION 849EEB08, ACTIVE 6026 sec fetching rows
mysql tables in use 2, locked 2
209765 lock struct(s), heap size 26196408, 41641049 row lock(s)
MySQL thread id 8157576, OS thread handle 0x7fb628704700, query id 812216189 10.64.40.109 civicrm Copying to tmp table on disk
/* https://civicrm.wikimedia.org/user/1 */ INSERT INTO dedupe (id1, id2, weight) SELECT t1.contact_id id1, t2.contact_id id2, 5 weight FROM civicrm_email t1 JOIN civicrm_email t2 USING (email) WHERE t1.contact_id < t2.contact_id AND t1.email IS NOT NULL AND t1.email <> '' GROUP BY id1, id2 ON DUPLICATE KEY UPDATE weight = weight + VALUES(weight)

  • (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 1702876 page no 81414 n bits 608 index UI_email of table civicrm.civicrm_email trx id 849EEB08 lock mode S

  • (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 1702876 page no 178924 n bits 136 index PRIMARY of table civicrm.civicrm_email trx id 849EEB08 lock mode S waiting

  • WE ROLL BACK TRANSACTION (1)

This query should not happen

INSERT INTO dedupe (id1, id2, weight) SELECT t1.contact_id id1, t2.contact_id id2, 5 weight FROM civicrm_email t1 JOIN civicrm_email t2 USING (email) WHERE t1.contact_id < t2.contact_id AND t1.email IS NOT NULL AND t1.email <> '' GROUP BY id1, id2 ON DUPLICATE KEY UPDATE weight = weight + VALUES(weight)

That would only be triggered with no limit

OK I've got a handle on what is happening here. Each time the dedupe runs it runs on the next block of contacts - it's currently up to id 260156 so next time it's selecting contacts from 260156 to 260356

However, it turns out these ids don't exist and so it is dropping the criteria.

I can fix this fairly easily

Change 304926 had a related patch set uploaded (by Eileen):
Fix criteria range selection on dedupe.

https://gerrit.wikimedia.org/r/304926

Change 304926 merged by jenkins-bot:
Fix criteria range selection on dedupe.

https://gerrit.wikimedia.org/r/304926

Change 305407 had a related patch set uploaded (by Eileen):
Further fix to criteria range on dedupe - exclude deleted

https://gerrit.wikimedia.org/r/305407

Very small additional patch for review

Change 305407 merged by jenkins-bot:
Further fix to criteria range on dedupe - exclude deleted

https://gerrit.wikimedia.org/r/305407

Eileenmcnaughton moved this task from Doing to Done on the Fundraising Sprint Qwerty Thwacking board.

probably should have closed this last sprint