Sandra's goal is to be able to dedupe contacts where first name + last name + street address match.
Generally they start with a group and apply the rule above, but it often spins off big queries - groups to merge....
Eileen merge -picks
DR Language queries
DR Recurring 7 days
- also do (e.g.) Country instead of group
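To make the matching goal concrete, here is a minimal sketch of grouping contacts whose first name, last name and street address all match. It does not reflect how CiviCRM implements the rule; the field names (`first_name`, `last_name`, `street_address`, `id`) and the normalization (lowercase, collapsed whitespace) are assumptions made for illustration.

```python
from collections import defaultdict


def match_key(contact):
    """Return a normalized (first, last, street) key, or None if a part is missing."""
    raw = (
        contact.get("first_name"),
        contact.get("last_name"),
        contact.get("street_address"),
    )
    # Assumed normalization: lowercase and collapse runs of whitespace.
    parts = tuple(" ".join((p or "").lower().split()) for p in raw)
    return parts if all(parts) else None


def find_duplicate_groups(contacts):
    """Group contact ids by match key and keep only keys shared by 2+ contacts."""
    groups = defaultdict(list)
    for contact in contacts:
        key = match_key(contact)
        if key is not None:
            groups[key].append(contact["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Contacts missing any of the three fields are skipped rather than matched against each other, which keeps blank addresses from forming one giant group.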
Notes are from original description
Starting at ~2:28 UTC 2023-12-22, a user dedupe query ran long, which eventually caused deadlocks and stopped user activity in Civi.
When failmail alerted us to the deadlocks, we stopped the offending queries using a combination of innotop to find the query and civicrm_query_killer to terminate the process. More documentation on that here: https://wikitech.wikimedia.org/wiki/Fundraising/Data_and_flow/Failmail_zoo#Fail_Mail_Storm_-_lots_of_different_jobs_failing_at_once
This happened again with two more users who were deduping. All of them were running queries they had been running all week. Here is a snapshot of the graph showing the contention and InnoDB wait times. Direct link to the board.
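The find-then-terminate step can be sketched as a filter over process list rows. This is not how innotop or civicrm_query_killer work internally; it only assumes rows shaped like the output of MySQL's `SHOW FULL PROCESSLIST` (columns `Id`, `Command`, `Time`, `Info`). The function names and the 300-second threshold are hypothetical.

```python
def long_running(processlist, min_seconds=300):
    """Return rows for active queries running at least min_seconds, longest first."""
    hits = [
        row
        for row in processlist
        if row.get("Command") == "Query"
        and row.get("Info")
        and (row.get("Time") or 0) >= min_seconds
    ]
    return sorted(hits, key=lambda row: row["Time"], reverse=True)


def kill_statement(row):
    """Build the KILL statement for one row; it is returned, not executed."""
    return f"KILL {int(row['Id'])};"
```

Returning the statement instead of executing it leaves room for a person to confirm that the query really is the offending dedupe before terminating it.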
The queries are logged to the MySQL slow log after they complete or are terminated. The logs are found on frlog1002 at /var/log/remote/fundraisingdb-mysql-slow, and the previous days' compressed logs are in /srv/archive/frlog1002/logs.
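A minimal sketch for pulling the slowest statements out of a slow log follows. It assumes the standard MySQL slow log layout, where a `# Query_time: ...` header line is followed by the SQL statement; logs shipped to a remote host may carry extra prefixes on each line, in which case the patterns would need adjusting. The function name `slowest_queries` is hypothetical.

```python
import re

# Matches the standard header line, e.g. "# Query_time: 12.5  Lock_time: ..."
QUERY_TIME = re.compile(r"^# Query_time:\s*([\d.]+)")


def slowest_queries(lines, top=5):
    """Return up to `top` (seconds, sql) tuples, slowest first."""
    entries = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        match = QUERY_TIME.match(line)
        if match:
            current = [float(match.group(1)), []]
            entries.append(current)
        elif line.startswith("#"):
            # A header after SQL was collected means the next entry has begun.
            if current is not None and current[1]:
                current = None
        elif current is not None and line.strip():
            current[1].append(line.strip())
    ranked = sorted(entries, key=lambda entry: entry[0], reverse=True)
    return [(seconds, " ".join(sql)) for seconds, sql in ranked[:top]]
```

The function accepts any iterable of lines, so an open file handle can be passed directly, and the compressed archives can be read the same way through `gzip.open(path, "rt")`.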
With help from Eileen, we narrowed the issue down to the specific dedupe rule "Individual (General) Address and name (General)". Using different rules works as normal.
The current working theory is that, with the rule above and such a large range of contacts needed to get results (100k, 150k), the query is either working on data with lots of possible matches or getting stuck on some data that is not what we expected.
DR is pausing use of the above rule and has tested other rules. Sandra will message us if she is looking at testing a new rule. There is more discussion in this thread: https://wikimedia.slack.com/archives/G015YLP3BLP/p1703266028217519
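The "lots of possible matches" half of the theory is easier to see with some arithmetic: if n contacts share the same matching values, they form n(n-1)/2 candidate pairs, so the work grows quadratically with cluster size. The sketch below only illustrates that growth. Whether the dedupe query actually enumerates every pair is an assumption, and `candidate_pairs` is a hypothetical helper.

```python
def candidate_pairs(cluster_sizes):
    """Total pairwise matches if every contact in a cluster matches every other."""
    return sum(n * (n - 1) // 2 for n in cluster_sizes)
```

Under that assumption, a single cluster of 1,000 contacts with the same name and address would contribute 499,500 candidate pairs on its own, far more than thousands of ordinary two-contact duplicates.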