
Sandra can't dedupe Name + address matching contacts (no email match)
Open, Needs Triage, Public

Description

Sandra's goal is to be able to dedupe contacts where first name + last name + street address match.

Generally they start with a group & apply the rule above, but it often spins off big queries - groups to merge....
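To make the matching rule concrete, here is a minimal sketch of grouping contacts on first name + last name + street address. It is an in-memory illustration only, not how the CiviCRM dedupe rule is actually implemented; the function names, the field names (`first_name`, `last_name`, `street_address`) and the sample contacts are all assumptions made up for this example.

```python
from collections import defaultdict


def normalize(value):
    """Lower-case and collapse whitespace so trivially different spellings compare equal."""
    return " ".join((value or "").lower().split())


def find_duplicate_groups(contacts):
    """Return lists of contact ids whose first name, last name and street address all match."""
    buckets = defaultdict(list)
    for contact in contacts:
        key = (
            normalize(contact.get("first_name")),
            normalize(contact.get("last_name")),
            normalize(contact.get("street_address")),
        )
        # Skip contacts missing any of the three fields:
        # an empty value should not count as a match.
        if all(key):
            buckets[key].append(contact["id"])
    return [ids for ids in buckets.values() if len(ids) > 1]


# Hypothetical sample data, for illustration only.
contacts = [
    {"id": 1, "first_name": "Ada", "last_name": "Lovelace", "street_address": "12 Main St"},
    {"id": 2, "first_name": "ada", "last_name": "Lovelace ", "street_address": "12  Main St"},
    {"id": 3, "first_name": "Ada", "last_name": "Lovelace", "street_address": "99 Other Rd"},
    {"id": 4, "first_name": "Ada", "last_name": "Lovelace", "street_address": None},
]
print(find_duplicate_groups(contacts))  # [[1, 2]]
```

Note that no email is involved in the key, which is the case this task is about. The more contacts share a name, the larger each candidate set becomes before the address narrows it down.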

Eileen merge -picks
DR Language queries
DR Recurring 7 days

  • also do (e.g.) Country instead of group

Notes are from original description

Starting at ~2:28 UTC on 2023-12-22, a user dedupe query ran long, which eventually caused deadlocks and stopped user activity in Civi.

When the deadlocks were noticed with failmail, we would stop the offending queries using a combo of innotop to find the query and civicrm_query_killer to terminate the process. More documentation on that here: https://wikitech.wikimedia.org/wiki/Fundraising/Data_and_flow/Failmail_zoo#Fail_Mail_Storm_-_lots_of_different_jobs_failing_at_once
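As a rough illustration of the "find the query, then terminate it" step, the sketch below filters process-list rows for long-running queries and builds the matching `KILL` statements. It does not describe how innotop or civicrm_query_killer work internally. The row keys mirror the `ID`, `COMMAND`, `TIME` and `INFO` columns of MySQL's `information_schema.PROCESSLIST`; the function name, the 600-second threshold and the sample rows are assumptions for this example.

```python
def long_running_queries(process_rows, min_seconds=600):
    """Return rows for active queries running at least min_seconds, longest first."""
    candidates = [
        row
        for row in process_rows
        # Idle connections report COMMAND "Sleep"; only active queries are of interest.
        if row["COMMAND"] == "Query" and row["TIME"] >= min_seconds
    ]
    return sorted(candidates, key=lambda row: row["TIME"], reverse=True)


# Hypothetical rows, shaped like the output of
# SELECT ID, COMMAND, TIME, INFO FROM information_schema.PROCESSLIST
rows = [
    {"ID": 101, "COMMAND": "Query", "TIME": 5400, "INFO": "SELECT ... (dedupe)"},
    {"ID": 102, "COMMAND": "Sleep", "TIME": 9000, "INFO": None},
    {"ID": 103, "COMMAND": "Query", "TIME": 3, "INFO": "SELECT 1"},
    {"ID": 104, "COMMAND": "Query", "TIME": 700, "INFO": "SELECT ... (dedupe)"},
]

# Build the statements for review; nothing is executed here.
kill_statements = [f"KILL {row['ID']}" for row in long_running_queries(rows)]
print(kill_statements)  # ['KILL 101', 'KILL 104']
```

Printing the statements rather than executing them keeps a human in the loop, which seems prudent when the queries belong to users who are mid-workflow.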

This happened again with two more users who were deduping; they were all running queries they had been running all week. Here is a snapshot of the graph showing the contention and InnoDB wait times. Direct link to the board.

frdb1005_contention_with_dedupe.png (527×954 px, 74 KB)

The queries would be logged to the MySQL slow log after they completed or were terminated. They are found on frlog1002 at /var/log/remote/fundraisingdb-mysql-slow, and the previous days' compressed logs are in /srv/archive/frlog1002/logs.

With help from Eileen we narrowed the issue down to the specific dedupe rule Individual (General) Address and name (General). Using different rules works as normal.

The current working theory is that, with the rule above and such a large range of contacts needed to get results (100k, 150k), it's just working on data where there are lots of possible matches, or getting stuck on some data that is not what we expected.

DR is pausing use of the above rule and has tested other rules. Sandra will msg us if she is looking at testing a new rule. There is more discussion in this thread: https://wikimedia.slack.com/archives/G015YLP3BLP/p1703266028217519

Event Timeline

Cstone updated the task description.

@SHust - can you and I have a check in about this so I can understand the workflows of the team a bit better on dedupes?

Hi @AKanji-WMF of course! Please send me an invite whenever is convenient for you.

Eileenmcnaughton renamed this task from "User initiated deduping query never finishing and causing deadlocks" to "Sandra can't dedupe Name + address matching contacts (no email match)". Feb 6 2024, 9:45 PM
Eileenmcnaughton updated the task description.

I have an update on this & am working on a fix. The reason it changed is that it does the smaller tables first, & at some point the address table began to have more rows than the contact table.

Exciting news, thank you! Let me know when we can test it out :)