Page MenuHomePhabricator

Restrict MailingProvider data table to one row per action
Open, Needs TriagePublic

Description

The civicrm_mailing_provider table is 100GB.

The table is used

  • by @EYener for data analysis
  • by DS & Major gifts to see what emails people have been sent (MG have requested it be more searchable)

One feature of this table is it often stores multiple Opens etc for the same mailing - this seems to annoy most people & take up disk space. This phab is to consider only having one row per action

When we write to this table we use

 INSERT IGNORE INTO civicrm_mailing_provider_data
(`contact_identifier`, `mailing_identifier`, `email`, `event_type`, `recipient_action_datetime`, `contact_id`)

And any row where contact_identifier + recipient_action_datetime + event_type is unique will be inserted

We could consider changing that uniqueness check or having a follow up process to remove duplicates

Question - perhaps mostly for @EYener - are there times when the duplicates are good / needed - e.g could it be 2 sends? Would a uniqueness check on

contact_identifier + mailing_identifier + event_type

work instead? I assume that would prioritise the earliest

Event Timeline

Cool! The uniqueness check suggested sounds good and beneficial to me. I would like to double check and make sure that it captures the first event occurrence.

  • Would it be possible to get an event record total count? IE, if the contact identifier + event + mailing identifier has multiple instances (of opens, say) could there be a counter? I'm not sure this is necessary or requested, but for curiosity.
  • I don't think there would be multiples on the sent or received event - this probably only happens on the open but I would have to check
  • Does this table impact any others? I'm thinking specifically of civicrm.civicrm_mailing or civicrm.civicrm_mailing_stats. I'm not sure how civicrm.civicrm_mailing_stats gets populated, but if it is utilizing the mailing_provider table to roll up metrics this would impact other metrics as well.