CPS, our analytics consultants, found what appears to be a problem with the mailing data in CiviCRM. I asked CPS to pull data showing how interacting with our emails impacts future donations. They were pulling a huge number of rows using the query below and parsing the raw data to try and answer my questions:
SELECT mp.email, mp.event_type, mp.mailing_identifier, mp.recipient_action_datetime FROM civicrm.civicrm_mailing_provider_data mp WHERE recipient_action_datetime >= '2017-07-01 00:00:00' AND recipient_action_datetime <= '2018-01-01 00:00:00';
As I understood it, the civicrm_mailing_provider_data table would only retain up to one engagement event per mailing_identifier per email address, i.e. if a donor RECEIVED and OPENED an email, the table would only retain the OPENED event since it happened more recently. However, CPS' analysis showed not only did the table have multiple events per email/mailing_id, but it suggested that the same user received the same email more than once. 89% of mailings show they were SENT twice, once at the time of the send and once about 7 hours later. This document helps detail how single mailings receive multiple engagements.
I don't know how to troubleshoot this very well myself without breaking Civi/Excel by trying to deal with huge amounts of data, but what CPS found seems compelling. Are we double counting data somehow? It's important to us to be able to use this data to segment our lists along with the obvious analytical uses, so it would help to figure out what CPS is seeing.
Let me know how I can help and if you want CPS to come in on this task.