Longer preservation of raw data in web analytics
Open, Needs Triage · Public · 3 Story Points

Description

The raw data Matomo collects is currently preserved for 14 days. This keeps the size of the database low, but makes it hard to use the user segmentation feature. Increasing the period of time for which Matomo keeps the raw data probably has a huge impact on the database. To find a middle ground, we need to know how big the impact would be. Ideally, the raw data can be kept for three months.

Acceptance Criteria

  • We have an estimate of the impact on the required disk space.

Notes

  • Keeping the raw data for a longer period of time might also result in reduced performance during report creation.
  • Increasing the retention period now and comparing the database sizes before and after would help us find a metric.
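
The comparison suggested in the notes can be turned into a rough estimate with a linear extrapolation. The following is a minimal sketch, assuming that raw data grows roughly linearly with the number of retained days and that both measurements are taken once the respective retention window has filled up. The function name and all sizes in the example are hypothetical placeholders, not measurements from this task.

```python
def estimate_db_size_gb(size_before_gb, size_after_gb,
                        days_before, days_after, target_days):
    """Linearly extrapolate the database size to a target retention period.

    Assumption: each additional day of retained raw data adds about the
    same amount of data. Campaign traffic can break this assumption.
    """
    if days_after <= days_before:
        raise ValueError("days_after must be greater than days_before")
    # Average growth per additional retained day between the two measurements.
    per_day_gb = (size_after_gb - size_before_gb) / (days_after - days_before)
    # Project from the later measurement out to the target retention period.
    return size_after_gb + per_day_gb * (target_days - days_after)


# Hypothetical example: 10.0 GB at 14 days, 11.75 GB at 21 days,
# extrapolated to roughly three months (90 days) of retention.
estimate = estimate_db_size_gb(10.0, 11.75, 14, 21, 90)
print(f"Estimated size at 90 days: {estimate:.2f} GB")
```

Because traffic is much higher during banner campaigns, a measurement taken during a campaign week would likely overestimate the off-season growth, and the other way around.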

Event Timeline

Restricted Application added a project: WMDE-FUN-Team. · Jul 18 2019, 10:27 AM
Restricted Application added a subscriber: Aklapper.
kai.nissen updated the task description. · Jul 18 2019, 12:01 PM
kai.nissen updated the task description. · Jul 18 2019, 12:23 PM
kai.nissen set the point value for this task to 3.

I have now increased the data retention from 14 to 21 days. We should revisit the database size page next Thursday (15th of August) and see whether we notice any change.

Note that a campaign starts tomorrow morning at 10 am. The difference between the database size now and the database size next Thursday would therefore be "roughly" equivalent to the data of one week with a 2% banner campaign.

@tmletzko @kai.nissen The estimated long-term database increase should be around 7 GB during peak season, and the server still has 35 GB available. So even if it turns out to be more than that, we should still have more than enough disk space on the tracking server.
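
The headroom behind this statement can be checked with simple arithmetic. This is a minimal sketch using the two figures from the comment above (7 GB estimated increase, 35 GB available); the function name and the safety factor are illustrative assumptions, not part of the original estimate.

```python
def remaining_headroom_gb(free_gb, estimated_increase_gb, safety_factor=1.0):
    """Return the free disk space left after the estimated increase.

    safety_factor is a hypothetical multiplier for the case that the
    real increase turns out larger than estimated.
    """
    return free_gb - estimated_increase_gb * safety_factor


# Figures from the comment: 35 GB available, about 7 GB estimated increase.
print(remaining_headroom_gb(35.0, 7.0))        # estimate holds
print(remaining_headroom_gb(35.0, 7.0, 2.0))   # increase is twice the estimate
print(35.0 / 7.0)                              # factor at which the disk would be full
```

Under these figures, the increase would have to be about five times the estimate before the available space is used up.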

I have now increased the data retention to 180 days. Since it is always a bit hard to judge how else this could affect Matomo, please let us know if you see any odd behavior, feel like data is missing, or notice Matomo becoming extremely slow.