Page MenuHomePhabricator

Restore target smart data lost on legacy merge screen
Closed, ResolvedPublic

Description

Target Smart data is being lost in the current merge process. There are ~7000 records containing this type of data (I searched on Net Worth) that have been deleted during the merge process. Not everyone had a Net Worth in the Target Smart update so this could be a higher number.  I also don't know overall which ones were merged which way (donor services, MG&E, or through the script). Is there a way to restore this data to donor records? Or reverse the merging done on these records?

Event Timeline

So having checked a couple -

19540112 was merged prior to the import
28299802 has data input at 2019-08-29 07:19:33 = merged 2019-08-30 20:35:16 - by a user - so that was after

I then did some mysql queries and none of the affected rows were merged by the dedupe script - they were all merged by people. This would also mean they were merged on the old screen I believe, as the Deduper uses the script code.

I think what we can say is that the old merge interface is not keeping this data reliably - and this is not a recent regression - but also there are a bunch where they were merged between when we sent the data to targetsmart & when we got it back & it was added on the old ID.

select count(*) FROM civicrm_value_1_prospect_5 INNER JOIN civicrm_contact c ON entity_id = c.id WHERE c.is_deleted=1 AND (estimated_net_worth_144 <> '' OR family_composition_173 <> '' OR income_range <> '' );
8733

Details

Select query
SELECT * FROM civicrm_value_1_prospect_5 INNER JOIN civicrm_contact c ON entity_id = c.id LEFT JOIN civicrm_activity_contact ac ON ac.contact_id = c.id LEFT JOIN civicrm_activity a ON a.id = ac.activity_id WHERE c.is_deleted=1 AND (estimated_net_worth_144 <> '' OR family_composition_173 <> '' OR income_range <> '' ) AND activity_type_id = 87

Less detail, incl merger

SELECT c.id as deleted_contact_id, merger.contact_id as merger_contact_id, YEAR(activity_date_time), MONTH(activity_date_time)
FROM civicrm_value_1_prospect_5 INNER JOIN civicrm_contact c ON entity_id = c.id
LEFT JOIN civicrm_activity_contact ac ON ac.contact_id = c.id AND ac.record_type_id =3
LEFT JOIN civicrm_activity a ON a.id = ac.activity_id
LEFT JOIN civicrm_activity_contact merger ON merger.activity_id = a.id AND merger.record_type_id = 2
WHERE c.is_deleted=1
AND (estimated_net_worth_144 <> '' OR family_composition_173 <> '' OR income_range <> '' ) AND activity_type_id = 87;

SELECT YEAR(activity_date_time) as year, MONTH(activity_date_time) as month, merger.contact_id as merger_contact_id, count(*) FROM civicrm_value_1_prospect_5 INNER JOIN civicrm_contact c ON entity_id = c.id LEFT JOIN civicrm_activity_contact ac ON ac.contact_id = c.id AND ac.record_type_id =3 LEFT JOIN civicrm_activity a ON a.id = ac.activity_id LEFT JOIN civicrm_activity_contact merger ON merger.activity_id = a.id AND merger.record_type_id = 2 WHERE c.is_deleted=1 AND (estimated_net_worth_144 <> '' OR family_composition_173 <> '' OR income_range <> '' ) AND activity_type_id = 87 GROUP BY year, month, merger_contact_id;

Here is how it breaks down

+------+-------+-------------------+----------+

yearmonthmerger_contact_idcount(*)

+------+-------+-------------------+----------+

2017618601171
2017634450221
20176110271701
20176172151412
201762064564858
2017763537431
20177187202351
2017720645648156
2018250396281
20182157576641
20182206456481
2018350396281
2018363537434
20183206456481
20183206903472
201832104185818
20183260343743
2018448461561
2018450396281
2018463537435
201841061738722
20184151245751
20184206456482
201842104185815
20184260343745
2018548461561
2018550396281
20185106173872
20185151245751
20185157576641
20185206903473
20185210418582
20185260343741
2018650396282
20186635374312
201861061738729
20186206456481
20186206903474
201862104185819
20186260343746
2018763537435
201871061738716
20187206903473
201872104185813
2018850396281
20188106173872
20188110271701
20188151245751
20188210418581
2018936861981
20189106173871
20189110271701
20189151245752
20189206456481
20181048461561
20181050396283
201810106173879
201810151245751
201810206456481
201810206903471
201810210418581
201810217881821
20181148461562
20181150396282
20181163537433
201811106173873
201811110271701
201811157576646
201811206456485
201811210418581
201812484615613
20181263537432
201812106173874
201812110271701
201812157576641
201812206456481
201812206903474
201812260343741
2019150396282
20191106173874
20191157576643
20191210418582
20192106173879
2019350396281
2019363537432
20193106173872
20193157576641
20193206456481
20193260343749
2019436861987
20194106173872
20194110271701
20194157576641
20194210418582
20194278047651
20194291324871
20194291461771
20194309967561
201943177934910
201953686198189
2019550396281
201951061738710
201952064564821
20195210418581
20195260343743
201953177934931
20196368619823
2019650396283
2019663537432
201961061738739
20196110271702
20196151245751
201962064564838
20196210418584
201962603437437
2019631779349327
201973686198431
201975039628132
2019763537431
201971061738770
201971082723241
20197110271701
20197131668371
20197151245752
20197157576641
20197202677471
2019720645648161
2019721041858317
20197217881821
20197228688501
2019726034374104
20197280196712
20197317793491504
20197319239261
20197319279411
2019734488457500
2019734488458124
201983686198154
201985039628579
201986353743250
2019810617387298
20198108272323
20198131668371
201982064564815
2019821041858247
20198260343743
20198280196716
20198317793491659
2019834488457399
201983448845850
20199368619828
2019963537436
201991061738746
20199206456481
201992104185819
2019931779349208
20199344884572
20199344884582

@Eileenmcnaughton Do you have an idea of how many records with the Target Smart data were deleted?

@NNichols so 7k if you go right back but around $2k recently - but in many cases the data was also on the kept contact or has since been added toit

Any sense on how we restore this data? Michael has a phab task in asking that all prospect information be automatically merged which will save the data from this in the future. However, we are starting to put together lists of donors we (Major Gifts & Endowment) want to suppress from fundraising emails by end of next week. We're using the Net Worth data points as criteria for this list.

It would need to be done by fr-tech I think - I don't think you can.

Eileenmcnaughton renamed this task from Restoring Deleted Target Smart data to Prevent target smart data from being lost on legacy merge screen.Oct 14 2019, 8:39 PM
Eileenmcnaughton renamed this task from Prevent target smart data from being lost on legacy merge screen to Restore target smart data from being lost on legacy merge screen.Oct 14 2019, 10:42 PM
Eileenmcnaughton renamed this task from Restore target smart data from being lost on legacy merge screen to Restore target smart data lost on legacy merge screen.Oct 15 2019, 3:11 AM

@NNichols I just ran an update which ensured that any prospecting data was moved to the merged-to contact where there was prospecting data on the deleted contact but not the merged to contact. I think this is probably the majority of them & most of those remaining have valid prospecting data on both but I'll do some more queries

Extended (but sadly slow) query for finding any others

SELECT v.entity_id as deleted_contact, v2.entity_id as kept_contact, v.estimated_net_worth_144, v.income_range, v.biography_66, v.charitable_contributions_decile, 
       v.disc_income_decile, v.family_composition_173, v.voter_party, v.occupation_175
FROM civicrm_value_1_prospect_5 v
INNER JOIN civicrm_contact deleted_contact ON v.entity_id = deleted_contact.id AND deleted_contact.is_deleted = 1
INNER JOIN civicrm_activity_contact ac ON ac.contact_id = deleted_contact.id AND ac.record_type_id = 3
INNER JOIN civicrm_activity a ON a.id = ac.activity_id AND activity_type_id = 87
INNER JOIN civicrm_activity_contact merger ON merger.activity_id = a.id AND merger.record_type_id = 2
    AND activity_type_id = 87
LEFT JOIN civicrm_value_1_prospect_5 v2 ON v2.entity_id = merger.id
WHERE
   (v.estimated_net_worth_144 IS NOT NULL AND v.estimated_net_worth_144 != v2.estimated_net_worth_144)
   OR (v.income_range IS NOT NULL AND v.income_range != v2.income_range)
   OR (v.biography_66 IS NOT NULL AND v.biography_66 != v2.biography_66)
   OR (v.charitable_contributions_decile IS NOT NULL AND v.charitable_contributions_decile != v2.charitable_contributions_decile)
   OR (v.disc_income_decile IS NOT NULL AND v.disc_income_decile != v2.disc_income_decile)
   OR (v.family_composition_173 IS NOT NULL AND v.family_composition_173 != v2.family_composition_173)
   OR (v.voter_party IS NOT NULL AND v.voter_party != v2.voter_party)
   OR (v.occupation_175 IS NOT NULL AND v.occupation_175 != v2.occupation_175)
   LIMIT 10;

@LeanneS @NNichols the scope of this task is to restore Targetsmart data where the checkboxes may have been missed during merge. My current thinking is to find all deleted contacts with Targetsmart data where the merged to contact does not have TS data & move the data across. I previously did a restore for the 'low-hanging fruit' but there are more for a second round.

However I have 2 questions

  1. if some fields have been moved across but not others (e.g income but not charitable decile) should I move across the not-moved field. The alternative is to assume that indicates a deliberate decision was made & to skip
  2. Which of the fields in the prospect tab should I recover. Only targetsmart or some others too. The fields are:

TargetSmart fields

Field
TIncome_Range
TEstimated_Net_Worth
TCharitable_Contributions_Decile
TDisc_Income_Decile
TFamily_Composition
TVoter_Party
TOccupation

Not TS fields

Reviewed
Stage
Capacity
Steward
Solicitor
Affinity
Philanthropic_Interests
Philanthropic_History
Biography
Interests
Origin
On_Hold
Subject_Area_Interest
University_Affiliation
next_step
Notes
Board_Affiliations
Endowment_Stage
PG_Stage
Survey_Responses

I checked again & there are

  • 443 rows where there is targetsmart data in the deleted prospect record & no prospect record
  • 498 rows with ts data in the deleted record & a prospect record on the kept contact

Query 1 - no record (note the v2.id condition is hacked onto the generic query hence some stuff could go but it does no harm

SELECT
       v.entity_id as deleted_contact, v2.entity_id as kept_contact_id,
       v.estimated_net_worth_144 as lost_net_worth, v2.estimated_net_worth_144,
       v.income_range as lost_income_range, v2.income_range,
       v.charitable_contributions_decile as lost_charitible_decile, v2.charitable_contributions_decile,
       v.disc_income_decile as lost_income_decile, v2.disc_income_decile,
       v.family_composition_173 as lost_family, v2.family_composition_173,
       v.voter_party as lost_party, v.voter_party,
       v.occupation_175 as lost_occupation, v.occupation_175
FROM
    civicrm_value_1_prospect_5 v
        INNER JOIN civicrm_contact c ON entity_id = c.id 
        INNER JOIN civicrm_activity_contact ac ON ac.contact_id = c.id AND ac.record_type_id =3
        INNER JOIN civicrm_activity a ON a.id = ac.activity_id  AND activity_type_id = 87
        INNER JOIN civicrm_activity_contact mergedto USE INDEX(index_record_type) ON mergedto.activity_id = a.id AND mergedto.record_type_id = 1
        LEFT JOIN civicrm_value_1_prospect_5 v2 ON mergedto.contact_id = v2.entity_id
WHERE c.is_deleted=1
AND
  v2.id IS NULL AND (
            (v.estimated_net_worth_144 IS NOT NULL AND v.estimated_net_worth_144 != '')
            AND (v2.estimated_net_worth_144 = '' OR v2.estimated_net_worth_144 IS NULL)
        OR ((v.income_range IS NOT NULL AND v.income_range != '') AND (v2.income_range IS NULL OR v2.income_range = ''))
        OR ((v.charitable_contributions_decile IS NOT NULL AND v.charitable_contributions_decile != '') AND
            (v2.charitable_contributions_decile IS NULL OR v2.charitable_contributions_decile = ''))
        OR ((v.disc_income_decile IS NOT NULL AND v.disc_income_decile != '') AND
            (v2.disc_income_decile IS NULL OR v2.disc_income_decile = ''))
        OR ((v.family_composition_173 IS NOT NULL AND v.family_composition_173 != '') AND
            (v2.family_composition_173 IS NULL OR v2.family_composition_173 = ''))
        OR ((v.voter_party IS NOT NULL AND v.voter_party != '') AND (v2.voter_party IS NULL OR v2.voter_party = ''))
        OR ((v.occupation_175 IS NOT NULL AND v.occupation_175 != '') AND
            (v2.occupation_175 IS NULL OR v2.occupation_175 = ''))
    )

Query - with a different id

SELECT
       v.entity_id as deleted_contact, v2.entity_id as kept_contact_id,
       v.estimated_net_worth_144 as lost_net_worth, v2.estimated_net_worth_144,
       v.income_range as lost_income_range, v2.income_range,
       v.charitable_contributions_decile as lost_charitible_decile, v2.charitable_contributions_decile,
       v.disc_income_decile as lost_income_decile, v2.disc_income_decile,
       v.family_composition_173 as lost_family, v2.family_composition_173,
       v.voter_party as lost_party, v.voter_party,
       v.occupation_175 as lost_occupation, v.occupation_175
FROM
    civicrm_value_1_prospect_5 v
        INNER JOIN civicrm_contact c ON entity_id = c.id 
        INNER JOIN civicrm_activity_contact ac ON ac.contact_id = c.id AND ac.record_type_id =3
        INNER JOIN civicrm_activity a ON a.id = ac.activity_id  AND activity_type_id = 87
        INNER JOIN civicrm_activity_contact mergedto USE INDEX(index_record_type) ON mergedto.activity_id = a.id AND mergedto.record_type_id = 1
        LEFT JOIN civicrm_value_1_prospect_5 v2 ON mergedto.contact_id = v2.entity_id
WHERE c.is_deleted=1
AND
  v2.id IS NOT NULL AND (
            (v.estimated_net_worth_144 IS NOT NULL AND v.estimated_net_worth_144 != '')
            AND (v2.estimated_net_worth_144 = '' OR v2.estimated_net_worth_144 IS NULL)
        OR ((v.income_range IS NOT NULL AND v.income_range != '') AND (v2.income_range IS NULL OR v2.income_range = ''))
        OR ((v.charitable_contributions_decile IS NOT NULL AND v.charitable_contributions_decile != '') AND
            (v2.charitable_contributions_decile IS NULL OR v2.charitable_contributions_decile = ''))
        OR ((v.disc_income_decile IS NOT NULL AND v.disc_income_decile != '') AND
            (v2.disc_income_decile IS NULL OR v2.disc_income_decile = ''))
        OR ((v.family_composition_173 IS NOT NULL AND v.family_composition_173 != '') AND
            (v2.family_composition_173 IS NULL OR v2.family_composition_173 = ''))
        OR ((v.voter_party IS NOT NULL AND v.voter_party != '') AND (v2.voter_party IS NULL OR v2.voter_party = ''))
        OR ((v.occupation_175 IS NOT NULL AND v.occupation_175 != '') AND
            (v2.occupation_175 IS NULL OR v2.occupation_175 = ''))
    )
;

@NNichols @LeanneS I previously restored the lion's share of these but left it open as there were some trickier ones. On revisiting I see that there are

  1. 443 rows where the deleted contact has a prospect record and the remaining contact has no prospect record at all. This is the type of record we previously restored & we can fairly easily do so.
  2. 498 records which have prospect data but there is data on the deleted one that is not on the new one. It's hard with these to be sure whether it's appropriate to restore the data since in some cases a choice might have been made between a 'set' of data - e.g 493554 was merged to 334574 and the net worth went from L (high) to C (low) in the process so it probably doesn't make sense to recover their charitable decile.

I'm inclined to think that it doesn't make sense to record the 498 records in set 2 as there is a return on effort issue for what might not be great data. However, there are a small handful where the lost data says they were high-ish net worth individuals - maybe you want to manually check some or all of these. They are sorted by net_worth so by the time you get to K you are down to $1million

deleted_contact kept_contact_id lost_net_worth
147173 93272 N
248558 2291310 M
23214605 3440517 M
31740878 5621620 L
598171 210473 L
633155 1127867 L
5981874 2043342 L
9803561 15631782 L
10443509 1376192 L
2376682 27982838 L
29742092 3187832 L
14129499 2363058 L
30758007 13963971 L
16478090 1703054 L
17051388 3497599 L
18483 435975 L
29729759 14625804 L
9697452 rMW32809209463b L
776436 7977417 L
29741978 6058530 L
2841451 1004191 L
6638664 3192541 K
8838784 1640623 K
11042322 3192541 K
4473469 333347 K
7027529 31012245 K
23470010 28576891 K
796967 28883022 K
521850 26501897 K
18586976 13887660 K
23619321 28892656 K
30580954 2229229 K

@LeanneS @NNichols - I've restored what I think makes sense to restore - see https://phabricator.wikimedia.org/T232636#5818528 for what is not included in that