
Fix dedupe merge wrapper script to track modified date
Open, Needs Triage, Public

Description

Our automated dedupe script tracks the contact ID it reached on each run & then processes the next chunk on the next run. This used to be fine: we didn't dedupe on ingress, so we were only ever adding contacts. Now it feels like we should consider looking at modified_date, so that when a contact is updated it is queued for dedupe again.

This could be a bit tricky when we do large contact updates, e.g. at fiscal year end, because many contacts will then have a recent modified date.
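
A minimal sketch of the modified_date idea, in Python rather than the crm repo's own code. The names `Contact` and `next_chunk` and the `(modified_date, id)` cursor are hypothetical and not taken from the actual wrapper script. It assumes the cursor stores the contact ID alongside the modified date, so that a bulk update which stamps many contacts with the same date can still be worked through in limited chunks without skipping or repeating any.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Contact:
    """Hypothetical stand-in for a contact row."""
    id: int
    modified_date: datetime


# Assumption: the cursor is (modified_date, id) of the last contact handled.
Cursor = Tuple[datetime, int]


def next_chunk(
    contacts: List[Contact], cursor: Optional[Cursor], limit: int
) -> Tuple[List[Contact], Optional[Cursor]]:
    """Return up to `limit` contacts modified after the cursor, plus the new cursor."""
    # Order by modified_date, breaking ties on id so the order is stable.
    ordered = sorted(contacts, key=lambda c: (c.modified_date, c.id))
    if cursor is not None:
        # Tuple comparison: later date, or same date with a higher id.
        ordered = [c for c in ordered if (c.modified_date, c.id) > cursor]
    chunk = ordered[:limit]
    # Keep the old cursor when there is nothing new to process.
    new_cursor = (chunk[-1].modified_date, chunk[-1].id) if chunk else cursor
    return chunk, new_cursor
```

With a cursor like this, a contact that is updated after it was processed sorts past the cursor again and is picked up on a later run. Whether the real job orders and filters this way is not stated in this task.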

Event Timeline

I pushed out a fix to the limit & restarted the job, but since it starts right back at the earliest modified_date it's not processing anything yet. I'll check back in a while.

Change #1202838 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Further queue_task fixes

https://gerrit.wikimedia.org/r/1202838

Change #1202838 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Further queue_task fixes

https://gerrit.wikimedia.org/r/1202838

Change #1211245 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Respect permissions on dedupe criteria

https://gerrit.wikimedia.org/r/1211245

Change #1211246 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Fix query retrieving max id

https://gerrit.wikimedia.org/r/1211246

Change #1211245 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Respect permissions on dedupe criteria

https://gerrit.wikimedia.org/r/1211245

Change #1211246 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Fix query retrieving max id

https://gerrit.wikimedia.org/r/1211246

This appears to be working OK now: around 600 contacts have merged since I rolled it out an hour ago.

Currently 3 jobs are running: one starting from early 2024, one from the start of the db, and the main one, which is deduping new contacts in real time.

The expectation is that the other 2 will be deleted once they catch up.

Change #1211332 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Debug output to generate a command to try for replication

https://gerrit.wikimedia.org/r/1211332

Change #1211332 merged by Eileen:

[wikimedia/fundraising/crm@master] Debug output to generate a command to try for replication

https://gerrit.wikimedia.org/r/1211332

Change #1213132 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Improve dedupe logging

https://gerrit.wikimedia.org/r/1213132

Change #1213132 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Improve dedupe logging

https://gerrit.wikimedia.org/r/1213132

Change #1213560 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Avoid group_concat when fetching ids

https://gerrit.wikimedia.org/r/1213560

Change #1213561 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Cleaner / more complete command output

https://gerrit.wikimedia.org/r/1213561

Change #1213560 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Avoid group_concat when fetching ids

https://gerrit.wikimedia.org/r/1213560

Change #1213561 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Cleaner / more complete command output

https://gerrit.wikimedia.org/r/1213561

Change #1214105 had a related patch set uploaded (by Lars SG; author: Lars SG):

[wikimedia/fundraising/crm@master] Clarify test comment

https://gerrit.wikimedia.org/r/1214105

Change #1214105 merged by Eileen:

[wikimedia/fundraising/crm@master] Clarify test comment

https://gerrit.wikimedia.org/r/1214105