Page MenuHomePhabricator

Discrepancy between Acoustic “Active Recurring” counts and Metabase/Civi (blocking 6ENC sends)
Open, HighPublic

Description

For the Active Recurring audience, Acoustic list sizes are much lower than the equivalent counts from Metabase/Civi. The gap is wider than normal and is blocking list delivery for 6C (English-speaking countries).

[[[ https://wikimedia.slack.com/archives/C045WH0QYS2/p1762849880003399 | Slack for reference ]]]

Historical reference: in August (Cookie email 1), Acoustic returned ~550k for Active Recurring in 6C; current counts appear well below that despite additional EN acquisition.

What’s Been Tried So Far

Civi queries

  • Context cohort (segments 500/600; statuses 20/25/30): 976,683 (excludes master suppression; doesn’t directly include opt-in/MGF suppression).
  • Closer to Active Recurring (segments 400/450; statuses 2/12; excludes master suppression): 314,748 — closer to Acoustic counts.

Acoustic spot checks

  • Using AF_has_active_recurring_donation = Yes but relaxing segment/status filters returned very low results (~3,100), suggesting field/definition misalignment.

Event Timeline

Lars triaged this task as High priority.
Lars moved this task from Triage to Chaos Crew Backlog on the Fundraising-Backlog board.
Lars moved this task from Backlog to In Progress on the Fundraising Tech - Chaos Crew board.

For the record, there are two bugs causing contacts to be incorrectly added to the master suppression list:

The Master Suppression List is generated by creating a list of all emails that CiviCRM has or has had, then removing from that list all of the current, primary emails for non-deleted contacts that aren't opted out. The result is uploaded as the MSL. To generate these two lists, we look for contacts who have been modified recently, rather than checking every record.

Unfortunately, the list of all emails was generated by looking at emails modified in the last 14 days, while the list of emails to remove from that list was generated by looking at emails modified in the last 7 days. This is normally fine, as the table of modified contacts is persistent between runs so we still subtract the older modified contacts. However, if we clear out this table (with drop_incremental_schema.sql) as we did on Oct 17, then the result is that we add emails for contacts modified between 7-14 days ago to the MSL.

Secondly, since the process takes about an hour to run and we generate the list of newly modified good emails at the start of the process and then subtract it from the list of all emails to generate the MSL at the end of the process, any emails for contacts who are modified during that hour end up on the MSL (unless they are new emails, which we guard against with a check on the max email id).

Both of these issues can be resolved by purging the MSL in Acoustic and re-uploading the full list.

I would like to re-factor the SQL that generates the master suppression list so that it is additive instead of subtractive (i.e. we explicitly select the emails that should be added to the MSL, rather than adding all emails and then subtracting those we believe are good). This should be simpler to read and debug and less likely to be subject to the kinds of bug that appeared in this case. It will also allow us to confidently (and frequently) purge the MSL in Acoustic, knowing we can easily upload a new, full list.

Here is what I understand the MSL should include (with all emails non '' and not NULL):

All emails for contacts who are opt_in = no, opted out, do not email, on hold or do not solicit, unless the email is attached as a primary email to a contact who is not (and not deleted).
All emails attached to deleted contacts, unless they are also attached as a primary email to a non-deleted contact (who is not opted as above).
All non-primary emails, unless they are a primary email for another non-deleted contact (who is not opted out as above).

And then to account for deleted contacts:
All emails from log_civicrm_email that aren't in civicrm_email.

Finally, remove any emails in civicrm_uf_match, because they are users that we never want to add to the MSL.

Thanks @MSuijkerbuijk_WMF. Just for clarity, the above is what I have pulled out of the existing queries, i.e. it is the existing intended behaviour of the MSL exports.

I'll also restate this in a different way which should be easier to read:

All emails (not '' or NULL), except those which are

  • primary emails for contacts
  • which are not on hold
  • who are not opt_in = no, not opted out, not do not email or not do not solicit
  • and who are not deleted.

And then to account for deleted contacts:
All emails from log_civicrm_email that aren't in civicrm_email.

Finally, remove any emails in civicrm_uf_match, because they are users that we never want to add to the MSL.

Hi @Lars - My request in Slack is to have this confirmed with Eileen or Brian as they are aware of the rules in Civi of how the MSL works.
I am not that close to how the logs and rules have been set up technically, so I cannot approve these changes myself.

@MSuijkerbuijk_WMF Yes, I've already shown this to Eileen and Elliott and one of them will approve it before it is put into use.

Change #1208422 had a related patch set uploaded (by Lars SG; author: Lars SG):

[wikimedia/fundraising/tools@master] Prevent contacts from being incorrectly added to the suppression list and opted out. This is done by making sure the date range we consider for modifications is consistent throughout the whole export process.

https://gerrit.wikimedia.org/r/1208422

Change #1208422 merged by jenkins-bot:

[wikimedia/fundraising/tools@master] Prevent contacts from being incorrectly added to the suppression list and opted out. This is done by making sure the date range we consider for modifications is consistent throughout the whole export process.

https://gerrit.wikimedia.org/r/1208422

Change #1215730 had a related patch set uploaded (by Lars SG; author: Lars SG):

[wikimedia/fundraising/tools@master] Add option of full rebuild of suppresion_list. No change to any current behaviour, but this gives us the option of rebuilding the suppression list from zero. This will make it feasible to check if opted out contacts in Acoustic (which we uploaded) should actually be opted on a large scale by looking for them in the silverpop_excluded. Added test shows that this behaves the same as the current incremental approach, at least as far as the cases tested. We can use this to re opt-in contacts that we removed, but not to contacts opted out directly in Acoustic as some of this data is currently missing in CiviCRM (T411896) and I don't think present in CiviCRM as we go back further in time, so that process will require additional work.

https://gerrit.wikimedia.org/r/1215730

Change #1215730 merged by jenkins-bot:

[wikimedia/fundraising/tools@master] Add option of full rebuild of suppresion_list.

https://gerrit.wikimedia.org/r/1215730

Change #1216707 had a related patch set uploaded (by Lars SG; author: Lars SG):

[wikimedia/fundraising/crm@master] Add BulkReOptIn API action for Acoustic contacts.

https://gerrit.wikimedia.org/r/1216707

Change #1216707 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Add BulkReOptIn API action for Acoustic contacts.

https://gerrit.wikimedia.org/r/1216707