Page MenuHomePhabricator

Conflicting fields in some Acoustic contact records
Closed, ResolvedPublic

Description

I ran a query to check which fund lifetime_usd_total points to, when I noticed discrepancies in the individual contact records.

Query:
WHEN lifetime_usd_total is greater than 0
AND (latest_donation_date is blank
OR latest_usd_amount is equal to 0)
AND endowment_last_donation_date is not blank

I expected this query to return 0 contacts because my inclination is that lifetime_usd_total only applies to annual fund donations, not endowment. But instead, 21 contacts populated this query. I looked at the first two contact records and found that latest_donation_date is blank, but the rest of their record shows recent donations to the annual fund within the last 5 years.

ContactID 30996345
ContactID 1764053

I compared what I saw in Acoustic to Civi and found out:

  • ContactID 30996345 has no record for ever donating to the annual fund in civi, yet this donor has a lifetime_usd_total over $100 and shows a donation every year since 2014 in Acoustic. Acoustic also shows that their first donation was in 2007 when the latest_donation_date is blank.
  • ContactID 1764053 matches what I found in civi, although the following fields are blank or zero in Acoustic when they shouldn't be: highest_donation_date, highest_native_amount, highest_native_currency, highest_usd_amount, latest_native_amount. This contact's first donation date is 2008.

Essentially, my concern is that annual fund donors appeared in a query meant to remove all annual fund donors. We heavily rely on latest_don_date and first_donation_date as prime fields for targeting/segmentation. This shows me that it's possible we have been including contacts we don’t want for our email sends.

Can fr-tech identify what's happening here? Should we refrain from using "is blank" in queries and rely on "is equal to 0" instead?

Event Timeline

CCogdill_WMF added a subscriber: Ppena.

Thanks so much for catching this, Katie! This seems like a big deal so I am setting the priority at High.

Katie noted that one possible implication is we've been sending emails to people we didn't mean to, and she's right. But another implication, and one that would affect revenue even more, is that we haven't been emailing people we should because they didn't meet latest_donation_date criteria, which is essential in our Acoustic queries. We need to know ASAP why this data doesn't line up.

cc @Ppena @MSuijkerbuijk_WMF

It looks like stats for CID 30996345 have been incompletely combined with stats from CID 51099 for the silverpop export. CID 30996345 has a single primary email address in Civi. CID 51099 has a different primary email address, but has a non-primary email address that is the same as CID 30996345. The names are completely different between those two contacts, and CID 51099's name doesn't correspond to either of their email addresses. I see no rows in the export for the CID 51099's primary email address.

I see two ways to fix this:

  • combine the stats completely and continue to export a single record for those two contact records with the name on the later one
  • add missing filters on email.is_primary when deduping and combining stats so that we export two contact records, one with the primary email of CID 50199 and one with the primary email of CID 30996345

Thanks for looking into that, Elliott. Are you able to confirm the same issue occurred for ContactID 1764053? I wonder if there's multiple contacts that replicate this issue.

As far as those proposed solutions go, if CID 51099 has a different primary email address, then it would make the most sense to me to have CID 51099 and CID 30996345 separated out completely. Let me know what would be needed on our side to help with this.

Overall for CID 30996345, it sounds sort of like a bad dedupe, but I see you mentioned the two contact records were "incompletely combined" - do you have a sense of scale here for this issue, or is than edge case?

@KHaggard Looking at 1764053 - this is a contact who is no longer being exported & is on the suppression list. So it's a different issue & seems to be 'by design'

Ok, it's good to know that cid 1764053 will never be mailed to. Would it be helpful if I included the CIDs from the other 19 contacts from my query in the description? I listed two as a starting point, but maybe it would be helpful to list the rest in order to figure out if the others are incompletely combined like CID 30996345. I'm still not clear on if this issue is widespread across the database or just contained in a select few contacts.

Also, if you think it will help, I have some available space in my calendar next week if we want to hop on a call to talk through this.

@KHaggard yes - please provide the others - I've spotted something in the silverpop script that looks a bit wrong to me & I'm wondering if it might relate

Great, here's the rest of the CIDs. There's actually 20 in total:
40331964
5011493
40803102
40872242
12395037
40824161
40633943
5007349
1476323
39329944
40520065
37036568
2376823
1693790
40189493
40187588
29506294
40900834
40685899
40855524

Of the list these are all cases where they were added to the suppression list but not updated due to their opt out flags. This is by design

Merged to Opted out On suppression list
40331964 yes yes
40803102 14623589 yes yes
12395037 4871126 yes yes
40824161 17599424 yes yes
5007349 2204764 yes yes
1476323 11348 yes yes
40520065 yes yes
2376823 48110 yes yes
1693790 261507 yes yes
40187588 19524161 yes yes
40900834 yes yes
40685899 yes yes
40855524 38260 yes yes

This one is not updated due to an opt out & would be on the suppression list, but he is a staff member
37036568 13166837 yes no *

These 6 need more digging

cidmerged to
5011493
4087224219672658
40633943
39329944
40189493
2950629447911
DStrine lowered the priority of this task from High to Medium.Jun 23 2020, 8:58 PM
DStrine moved this task from Sprint +2 to Sprint +1 on the Fundraising-Backlog board.

Just adding in one more cid here, pertaining to the annual fund latest_donation_date field conversation we had yesterday in the Donor Services call. This donor is on the suppression list, which explains why both_funds_latest_donation_date is blank, but curious on why the annual fund latest_donation_date is blank when there's a first donation date populated. CID 1707407

That last contact id is deleted now - they have no donation dates in our system but they won't update that through as they are deleted

Per our convo in T260708 - moving my comment to here.

"I ran a query to check for any contacts in en6C and FR who donated within the last 5 years and are not opted out that have blanks for all_funds_latest_currency / all_funds_currency_symbol. The count wasn't huge, but it returned a bit over 6k records. I'm pasting a few cids from that query below for you to look at. Let me know what you think.

1768225 (dedupe contact - is on suppression list)
1769498
1765981"

I think this is worth looking into, but we also know that our database in IBM has a lot of "junk drawer" contacts that are either on the suppression list or just don't make sense. I could see it being worth eventually creating a new yes/no field that is "Suppressed" so we can clean up my query counts a bit.

@CCogdill_WMF @MNoorWMF @KHaggard I believe you were going to contact legal on this. did you have any discussions about this or do you have any information to add?

@CCogdill_WMF Were you going to contact legal? I'm not sure what we need to reach Legal for - but this task still stands and I was planning on creating a new task to link to this. It seems to be pointing to a bigger issue that our database is carrying contacts that don't match what's in civi, and so I cannot verify with fr-tech when new fields are successfully backfilled. We have quite a few contacts in IBM with nonsensical data and it meddles with our query counts.

Will make a new phab soon and will link it here.

New phab for investigating the weird contacts we can't place: T261705

Adding in some new mystery contacts from another phab (T265700) into this one to investigate why a small number of contacts did not get modified by a full database update and are also not on the Master Suppression List.

39304535
22660624
28077406
18203378
43893568
45911670
39916155
29396677
25388232
39385003

@Eileenmcnaughton started investigating a few of these, so I'm dropping in her comments from the other task:

So looking at the first one - he has 2 references in silverpop

This is for his current primary address
https://engage4.silverpop.com/searchRecipient.do?action=edit&recipientId=482245963665&listId=9644238

And this is for is old primary address (with the old style numbers)
https://engage4.silverpop.com/searchRecipient.do?action=edit&recipientId=310967082458&listId=9644238

The old primary address would be added to the suppression list I believe (@Ejegg @mepps) - even though he has what looks like another duplicate record in 24487031

This seems like this issue was fixed as expected but there are still some unclear things around the suppression list still

The second one

contact id 22660624

https://engage4.silverpop.com/lists.do?action=listSummary&listId=9644238

has been merged in the silverpop export into 42965336 - so there are no updates going out on 22660624
(they are not actually merged in civi)

The record in silverpop has a different - presumably old - email - which is the same as 16773866

@KHaggard should we close this out? Is there another issue we need to create as a follow on?

KHaggard claimed this task.

Wow this ticket is so old it has our original silverpop field names! We can close this out @Eileenmcnaughton
Out of curiosity I ran that query again and it returns zero as expected so all data is sorted out and clear.