Page MenuHomePhabricator

Civi to Acoustic export: Confirming value for Total Duplicate Addresses
Closed, ResolvedPublic

Description

After the Civi maintenance concluded, I compared Acoustic's import job reports for the week. While everything seems to be working normally again, I wanted to flag one change in case it's an issue.

From Mar 9th to Mar 10th, the "Total Duplicate Addresses" column has decreased from 5,294,310 to 119 and has maintained 119 since.

This seems like a good thing, but I want to confirm with fr-tech that duplicate addresses in the Unsubscribes file was worked on during the maintenance?

I'm going to be OOO from Mar 15 - 19, so please reach out to @TSkaff and @nisrael in my absence.

Screenshot of reports:

Screen Shot 2021-03-13 at 7.30.32 AM.png (566×1 px, 202 KB)

Event Timeline

Hi email folks.

I can't see an obvious reason for this change. The data in the unsubscribe file we're looking at from today (15th) looks good. Could you paste in the acoustic stats for today's import to see if this is fixed?

I also can't access the file sent over on the 10th as we clear out the "used" export files prior to generating new ones :(

Thanks!

This comment was removed by nisrael.

So if we upload the same 5 million addresses each day & each day we upload a handful of new ones then almost all *should* be duplicates - the change suggests that we are not updating the same ones - but I don't know how to get a view into what is happening at the silverpop end -

  • is the list growing?
  • can we see any of the ones that were duplicates in the past

@bsisolak do you have any ideas? The change at our end was to change the encoding to utf8mb4 (from utf8) - but that should have minimal impact on emails which don't tend to have characters that would change

From email discussion:

Brian:

I figured out a way to sanity check this in Acoustic. Agree that the import process has added people to the MSL and _All database as of yesterday. So seems to me like you could send.

Now whey the numbers changed… that’s a mystery. And what other implication could there be? (that was rhetorical, as I’m not sure)

One idea, tomorrow turn off all the recurring imports and then turn it back on the following day with the new files. I wonder if it will figure out the difference. If you look at any of the import mapping files they all start: <?xml version="1.0" encoding="UTF-8"?>

Me:

Thanks Brian

So am I right in thinking you have ruled out

  1. that updates are not getting through and
  2. that we are doing a tonne of new updates & potentially adding people who should not be added

Regarding the import mapping files - we sftp up this data as a csv - are you referring to other updates that we send or something directly configured in Silverpop?

Brian

When a recurring import job is created, all it’s doing is setting up a nightly API call, which requires a Mapping file and Data File. Once the imports run those are saved in Acoustic (for up to 30 days). You can go download them and look at them. That is how I pulled the mapping files.
As there is not PII in these or anything sensitive, attached them for reference.

I’m wondering of it will auto-detect a different encoding when you restart them. Test the MSL import first to start with.

This also helps explain when you rename a column in Acoustic why the recurring import fail (as the mapping files no longer match the database).

We did some testing today doing the export from a view where email was cast back to utf8() and the file generated this way was identical (we compared sha sums) to the one generated where we exported from the table which used utf8mb4

To validate this locally I did the following

git checkout master

mysql>> CREATE DATABASE silverpop 

 env PYTHONPATH=~/dev/fundraising-dev/src/tools python3 silverpop_export/update.py
env PYTHONPATH=~/dev/fundraising-dev/src/tools python3 silverpop_export/export.py

sha1sum1 /tmpUnsubscribes-20210318170611.csv

versus

git checkout 6ed4eb9c1024544d3c2960f3543c681846

mysql>> DROP DATABASE silverpop;
mysql>> CREATE DATABASE silverpop COLLATE utf8_unicode_ci CHARSET utf8;

 env PYTHONPATH=~/dev/fundraising-dev/src/tools python3 silverpop_export/update.py
env PYTHONPATH=~/dev/fundraising-dev/src/tools python3 silverpop_export/export.py

sha1sum1 /tmp/Unsubscribes-20210318172121.csv

The sha1sum was identifcal as was the case on production.

I think we can rule out the files uploaded to silverpop being affected by the database characterset change.
So whatever is causing the duplicate message to be different must be something else

FR-tech note - I was able to get the following to run

  1. silverpop tests to run within my civicrm docker container
  2. silverpop update script to run on my localhost

I couldn't get either to work in both places. My notes are here on what I did https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/673176

testing from @bsisolak concurs

Brian : "Did come up with something else to test. I took the Master Suppression List and exported it, and then turned around and imported back into itself using an Add/Update import.

And I now think this is a total coincidence and we are barking up the wrong tree. When I ran that import the number of duplicates was 0."

Hey everyone I'm back! I'm meeting with Brian tomorrow to talk through everything that happened last week, so I'll comment here again after that to wrap up and resolve this ticket.

In sum, from what I'm hearing so far today, it looks like Acoustic changed it's definition of "Total Duplicate Addresses" the exact same day the Civi maintenance started. This is to say that even if the Civi maintenance did not happen, I would have still found this weird change in the Acoustic reports. With that said, nothing appears to be broken and we can continue sending emails. Sorry for the confusion, but I'm happy we were able to figure out what was going on. Thank you!

I'll ask Brian how we can learn about these kinds of Acoustic changes well in advance to avoid this from happening again.

KHaggard claimed this task.

Following up that my previous comment is still accurate and I can resolve this task now. It seems this was a one-time occurrence and unfortunately there's nothing we can do to ensure that we will get external communications ahead of time from Acoustic.

Before March 10th, DUPLICATES = TOTAL_VALID-(SQL_ADDED + SQL_UPDATED)
After March 10th, Duplicates are now just counting how many email address appear more than once in the Civi export file alone.

Thank you all again for your hard work on this and at least we are walking away with better knowledge of Acoustic report definitions than what we had before.