The civicrm_mailing_provider table is 100GB.
The table is used
- by @EYener for data analysis
- by DS & Major gifts to see what emails people have been sent (MG have requested it be more searchable)
One feature of this table is it often stores multiple Opens etc for the same mailing - this seems to annoy most people & take up disk space. This phab is to consider only having one row per action
When we write to this table we use
INSERT IGNORE INTO civicrm_mailing_provider_data (`contact_identifier`, `mailing_identifier`, `email`, `event_type`, `recipient_action_datetime`, `contact_id`)
And any row where contact_identifier + recipient_action_datetime + event_type is unique will be inserted
We could consider changing that uniqueness check or having a follow up process to remove duplicates
Question - perhaps mostly for @EYener - are there times when the duplicates are good / needed - e.g could it be 2 sends? Would a uniqueness check on
contact_identifier + mailing_identifier + event_type
work instead? I assume that would prioritise the earliest