Page MenuHomePhabricator

SPIKE: Unsubscribes import decreasing in size over 2 days
Closed, ResolvedPublic4 Estimated Story PointsSpike

Description

From 5/16 to 5/18, the unsubscribes import into IBM has been decreasing: ~5k records from 5/16-5/17 and ~65k records from 5/17-5/18. That seems like suspicious behavior, especially because unsubscribes generally tend to increase in size every day. Can we make sure we aren't removing people we shouldn't?

Screen Shot 2019-05-20 at 3.47.56 PM.png (297×1 px, 113 KB)

Event Timeline

KHaggard created this task.
DStrine moved this task from Triage to Current Sprint on the Fundraising-Backlog board.
DStrine subscribed.

adding to sprint. Someone should take some time to look into this.

So, we only keep on disk the last two files uploaded, and today's Unsubscribes file is bigger than yesterday's. Checking to see if the uploads from the 16th-18th are still available in Silverpop's FTP site

Update on FTP: a week ago I tried to get in via various means and was unable. On the cluster, only the user which runs the upload is allowed to make that connection. I tried FTP-ing from my laptop via VPN (after setting the VPN address as allowed in the Silverpop UI) but it still wasn't letting me get in.

Hello!

This morning, I just saw a slight decrease again. Please see attached screenshot.

slight decrease.png (290×815 px, 90 KB)

@DStrine @XenoRyet the below query is working locally for me. It should give us a temp table full of records that show all contacts who went from unsubscribed to subscribed in 2019 accompanied by the dates they were resubscribed. As discussed we should then be able to group the results by month/day to chart the rates of subscribers over the year to confirm that we do see an increase in opt_ins (decrease in opt_outs). Hopefully that makes sense!

CREATE TEMPORARY TABLE unsubscribes_2019
SELECT 
    log.id AS contact_id,
    log.log_date AS update_date,
    log.is_opt_out AS is_opt_out,
    pre.log_date AS previous_change,
    pre.log_action AS old_log_action,
    pre.is_opt_out AS old_is_opt_out
FROM
    log_civicrm_contact log
        INNER JOIN
    (SELECT 
        id, log_action, log_date, is_opt_out
    FROM
        log_civicrm_contact) pre ON (pre.id = log.id
        AND pre.log_date <= log.log_date
        AND pre.is_opt_out = 1)
WHERE
    log.log_action = 'Update'
        AND log.is_opt_out = 0
        AND log.log_date >= '2019-01-01'
ORDER BY update_date ASC;

Unfortunately, I can't run that query as-is on frdev due to not having permission to create temp tables. I wonder if @Jeff_G or @cwdent could add that for us?

hmm something isn't right, either the query above isn't what we want? or the IBM data is exaggerating the unsubscribe reductions indicating something is up.

The above query results in a temp table being created with 2286 records who resubscribed for the whole of 2019 which is not enough to explain the reduction highlighted in the ticket description.

MariaDB [civicrm]> select count(*) from unsubscribes_2019;
+----------+

count(*)

+----------+

2286

+----------+

@Ejegg can you sanity check the approach & query, please?

Hi @jgleeson, that query looks pretty good, but there are a few other fields that also can get someone on the unsubscribes list. We're doing that for people whose email is marked as on hold (civicrm_email.on_hold=1), people marked as 'do not email' (civicrm_contact.do_not_email=1), or who are marked as 'do not solicit' in the communications table (civicrm_value_1_communication_4.do_not_solicit=1). I can run those queries and tally up the total.

Change 516464 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Ensure opt is is cast to a boolean

https://gerrit.wikimedia.org/r/516464

@jgleeson from email "could you elaborate a little more on how we can use the conn_id to work out what might be causing the silverpop unsubscribe discrepancies please?" - I am not totally sure what @Ejegg was referring to but regarding what log_conn_id means....

When a CiviCRM php process starts it sets the mysql connection id - https://www.w3schools.com/sql/func_mysql_connection_id.asp
When Civi writes to the log table this value goes into log_conn_id. So we can generally learn whether a single process or multiple processes created a change by checking if there is just one contact update with the given log_conn_id vs n.

There is an exception in dedupe as we deliberately manipulate the log_conn_id and change it for each dedupe pair. Civi has some revert functionality based on log_conn_id and this allows us to revert a single pair out of a long running process

Change 516464 abandoned by Eileen:
Ensure opt is is cast to a boolean

Reason:
this was only discussion code

https://gerrit.wikimedia.org/r/516464

I think I have the explanation - there was an artificial spike in the numbers on the unsubscribe list for the three days we were importing all the remind-me-later contacts, and then it went back to normal.

We build our export lists in a series of steps, all performed on a continually-updating copy of the live database.

  • select all the active contacts and their emails as the basis of the include list
  • munge data
  • munge data
  • munge data
  • select all the emails from the log tables as the basis of the exclude list (so we unsubscribe old email addresses when people change their address)
  • remove from the exclude list all the 'good (not opted out)' emails from our include list

When a record is created during all those "munge data" steps, it erroneously lands on our exclude list for that day's export. The next day it will land on the include list.

Between 5/14 and 5/17 we imported about 2 million records, fast enough that tens of thousands were being creating during the almost two hours of the "munge data" phase each day and landing erroneously on the exclude list.

We could fix this by taking the timestamp of the first (include) query and using it as an upper limit for the exclude query.

So does that mean a bunch of opt-ins have been added to the unsubscribe
list? When an email gets added to IBM's master suppression list (which is
where the unsubscribes import points), it stays there, even if the next
night's file doesn't include that email. Can I get a list of people who
have been erroneously added to the suppression list? I think I can purge a
whole csv from the list as a bulk job.

Oh darn! That could include all contacts created during almost 2 hours of each day for the last few years (as long as we've been doing the exports this way). Since those hours are generally 6AM UTC-7:45 AM UTC, I guess Eastern Europe and Israel would have the most issues.

We can definitely come up with an approximate list by hour of creation, and remove any addresses with any of the opt-out flags set.

Change 519079 had a related patch set uploaded (by Ejegg; owner: Ejegg):
[wikimedia/fundraising/tools@master] Fix mistaken opt-out of records created during export

https://gerrit.wikimedia.org/r/519079

We are redefining this task to stop the issue from continuing. We have another task for restoring the previous data: T226571

Change 519079 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Fix mistaken opt-out of records created during export

https://gerrit.wikimedia.org/r/519079

@CCogdill_WMF we put up a fix yesterday. It's a belt-and-suspenders fix that both moves the filling of the exclude table to right after the filling of the staging table, and also uses the max email ID in the staging table as an upper limit for populating the exclude table.

I was worried it would slow down the export, but this morning's seems to have taken the same time as yesterday's.

So assuming this morning's import worked on the IBM side, we should be good to use the current contents of the mailing list to purge mistakenly-added contacts from the suppression list.

Awesome! Am I right in thinking we should have seen decreases in... both export files last night? Or just the Main Database import? @KHaggard can you advise how the IBM imports looked from last night?

Katie, I'm adding a topic to our Monday check-in to go over the purge process Elliott mentioned. Let's do some data hygiene next week! :)

@CCogdill_WMF Looks like the unsubscribes list grew by 1000 and the main database import shrank by the same amount. I can't see why the SQL change would have cause it - is that normal traffic to the opt-out pages?

@CCogdill_WMF Hello! Yes, I'm attaching a screenshot of yesterday's import next to today's import. Looks like the main database decreased by ~1k and the unsubscribes increased by ~1k also.

Screen Shot 2019-06-28 at 8.45.20 AM.png (609×819 px, 133 KB)

Shouldn't be related to the opt-out pages. We aren't sending traffic there
until that tracking task is done.

Sorry, I meant the TY mail unsubscribe (linked in the footer) plus the Silverpop-hosted unsubscribe page (from which we now pull updates to Civi). Do those two combined normally generate 1,000 unsubscribes in a day?

We sent a big bulk email blast on Thursday, so yeah, that seems entirely
possible.

Last week, we performed the process I outlined above in silverpop and purged anyone from our Master Suppression List (MSL) who had been opted-out by the unsubscribes import. That removed 1.25M records. Then, the nightly import imported 936k new rows, which suggests that we successfully removed about 310k records from the MSL that were not supposed to be there. This feels like a big win!

We did notice through this process that Civi does not have a record of all unsubscribes made on the IBM side. I think these are probably unsubscribes that predate the APIs Eileen set up between IBM > Civi.

Is there anything else we should check or QA on the email side?

This is in the deployed column. Do we need to do anything else here?

Let's make a new task for the old opt-outs and call this one done.

XenoRyet set the point value for this task to 4.Jul 23 2019, 8:10 PM