Civicrm's database uses log_% tables to store each change to corresponding tables. The log_% are populated by triggers that copy each write to a table to the corresponding log table. Row compression is in use. They grow without expiration, and with about four years of historical data they account for about 50% of the total disk footprint of the civicrm database.
Extrapolating from testing with log_civicrm_activity, if we were to trim these tables to records newer than 1 year, we would recover about 308GB or 37% the overall civicrm database footprint.
We've talked about expiration timelines and it sounds as though FR tech agrees that 1 year may be an acceptable max age for data in log_% tables. This task is to address that question and to develop a process to purge old data that can be automated. There are also several log tables that are not useful, which we should be able to disable altogether.
Here's a summary of action items:
- deploy civicrm patch to disable trigger/table creation for unnecessary tables
- drop trigger civicrm_financial_item_after_insert
- drop trigger civicrm_financial_item_after_update
- drop trigger civicrm_financial_item_after_delete
- drop table log_civicrm_financial_item
- drop trigger civicrm_entity_financial_trxn_after_insert
- drop trigger civicrm_entity_financial_trxn_after_update
- drop trigger civicrm_entity_financial_trxn_after_delete
- drop table log_civicrm_entity_financial_trxn
- delete from log_civicrm_entity_tag (actually this was done as a table swap and drop to recover the 14GB it consumed)
- drop trigger civicrm_financial_trxn_after_insert
- drop trigger civicrm_financial_trxn_after_update
- drop trigger civicrm_financial_trxn_after_delete
- drop table log_civicrm_financial_trxn
- drop trigger civicrm_line_item_after_insert
- drop trigger civicrm_line_item_after_update
- drop trigger civicrm_line_item_after_delete
- drop table log_civicrm_line_item
- drop trigger civicrm_mailing_after_insert
- drop trigger civicrm_mailing_after_update
- drop trigger civicrm_mailing_after_delete
- drop table log_civicrm_mailing
- drop trigger civicrm_mailing_job_after_insert
- drop trigger civicrm_mailing_job_after_update
- drop trigger civicrm_mailing_job_after_delete
- drop table log_civicrm_mailing_job
- drop trigger civicrm_subscription_history_after_insert
- drop trigger civicrm_subscription_history_after_update
- drop trigger civicrm_subscription_history_after_delete
- drop table log_civicrm_subscription_history
Additional things we haven't solved yet:
- decide on date range to keep (>1y seems feasible)
- map out a safe procedure to purge old data
- script the purge procedure and time it on frdev
- decide whether the purge procedure can happen live or requires an outage