Page MenuHomePhabricator

Opt-in numbers in IBM not as expected
Closed, ResolvedPublic4 Estimated Story Points

Description

Hi there! Yesterday, using IBM's querying tools, I was trying to give myself an updated sense of what percentage of our list has given us a clear opt-in response and what percentage shows a NULL response. The numbers shook out as follows:
https://docs.google.com/spreadsheets/d/1JY8Zz4awTu7STBVq7oDOeIytxmXkgHlY4VgqwQjuf7E/edit#gid=1637312242

Some of these look reasonable to the naked eye: 43% of enGB have opted-in or out seems sane, for example. But others are very obviously incorrect: itIT has 41k opt-ins when according to my email performance data, 100k itIT donors have opted-in (optin = 1 / YES) through emails alone. Considering banners also ran, that opt-in number should be well over 100k.

svSE also looks off. I'm expecting about 16k - 20k opt-ins (optin = 1 / YES) but only see 3.

Why does this data seem out of date? Is it importing properly into IBM? Does it take longer to update than other fields? I need to know ASAP because our repermissioning testing is slated to start next week and I need to make sure I'm not emailing folks who have already opted-in.

Related Objects

Event Timeline

CCogdill_WMF created this task.

Bumping this ticket as I think this might be an important bug!

I mentioned comparing the data in IBM to "email performance data". I'm basing that on the statler query we run to collect A/B test results. Here is a sample query--this is one for only one series of Italian tests:
./statler "sp5845" -s 20190201000000 -e 20190415000000 --optin

We input the results from these queries into our email stats doc: https://docs.google.com/spreadsheets/d/1WqOlWFWe0-wuD70j3e4C6b7-uSx-J06lOePK_6_yET8/edit?ts=5c36497c#gid=1273793918. If you scroll to column AH, you can check the totals for opt-in=1 and it's over 115k for Italy.

I just reran the queries in IBM per @Ejegg's suggestion and am seeing basically the same numbers as before:

itIT full list (IBM)itIT optin NO (IBM)itIT optin NO (statler)itIT optin YES (IBM)itIT optin YES (statler)
697,4253,4317,49241,597116,297

FR-tech, the approach to this would be to first check the numbers in Civi, looking at the civicrm_value_1_communication_4.opt_in field for donors with preferred_language=it.

If those numbers look like @CCogdill_WMF's iIT optin NO (IBM) and iIT optin YES (IBM) numbers, we know some donors with the optin YES flag set in a banner or in donatewiki aren't having that opt-in field saved in Civi.
@Pcoombe had the foresight to both send opt_in=0 or opt_in=1 on the querystring, and ALSO add that flag to the utm_source.

So donors who have utm_source like '%optin0%' should also have civicrm_value_1_communication_4.opt_in = 0, and donors who have utm_source like '%optin1%' should have civicrm_value_1_communication_4.opt_in = 1. If you find a big chunk of donors who aren't getting that saved, it's time to start looking for their records in the logs (payments-* for the front-end, fundraising-drupal and/or process-control for the import into Civi) to see where we lose the info.

Just to clarify, the optIn0 / optIn1 is only added to the utm_source for banners, not for donors coming from donatewiki (i.e. emails)

@CCogdill_WMF Is it difficult to rerun the IBM queries? If it's not, I'd like to see the up to date numbers. It would help me track down what's going on here.

Hey @XenoRyet, yeah, it's pretty easy to rerun the queries. Looks like the numbers haven't really changed.

Updated numbers for itIT from IBM

full listopt-in Yesopt-in No
697235415843430

Ok, so I'm seeing 221,652 contacts in Civi with opt-in 1 and language preference of 'it-IT', which seems more in line with what we were expecting. I'll take a look at the export next, see what might be going wrong there.

@XenoRyet does grouping by email address reduce those numbers much? The Silverpop export does its own de-duping by email.

@Ejegg, no not much. Drops by a few hundred.

This is looking like it must be something on IBM's side somewhere, maybe in their import? Here's what I have so far:

Our database has 200,000ish opt-ins for it_IT, and the latest file we've exported to IBM matches that.
Our file has about 650,000 total rows for it_IT, and that also matches what IBM has for a query on it_IT

Our file has the following counts on our opt-in field
58795 NO
204589 YES
390375 ""

The IBM query for all of it_IT has the following counts for their opt-in field
3430 "NO"
41569 "YES"
652052 ""

So it looks like the opt-ins are getting lost somewhere between our export and IBM's database. Only other possibility is that we're querying for the wrong thing on IBM's side, but that seems unlikely.

Sigh... so this definitely seems to be a field mapping issue on the IBM side. My apologies! I think I know how the problem originated: the field that civi passes over in the export is called opted_in which is the same name as an IBM system field. When we set up new imports (i.e. if it breaks and we need to kick the job, or if we add a new column and need to remap fields), IBM defaults to overwriting whatever data is in this field in place of their system data. There is a manual way to override this, but it is easy to miss and with more people joining the team, the likelihood this mistake repeats itself is high.

So I am changing my ask: is it possible to rename this field in the Civi export so it doesn't match IBM's column header? We call this field latest_optin_response when we manually rename it.

Sorry again for not catching this sooner!

cc @KHaggard, I'm going to tag you in a trello card with some next steps but here is the context if you're curious.

@CCogdill_WMF I think renaming the field in the civi export should be a doable thing, but we should probably do that work under a new ticket. I'll go ahead and create that.

As far as this ticket goes, do you think we can call it done?

Yeah, I think so. Thank you so much!

XenoRyet moved this task from Doing to Done on the Fundraising Sprint King Kong vs. Mozilla board.
XenoRyet set the point value for this task to 4.