
Evaluate Matomo deployment options for log import [2hr]
Closed, Resolved, Public

Description

After looking through Matomo as a group, we determined that the log import method fits our needs better than JavaScript tracking:

  • We want to explicitly avoid collecting many of the data points that JavaScript tracking enables, such as client IP addresses, geolocation, and detailed client system information
  • We don't want users to be able to opt out of the anonymous data points that we do wish to collect, which include very basic aggregate statistics such as unique daily visitor count and daily login count

We need to spend a little time exploring the best way to import the nginx access logs into Matomo. Options include:

  • sharing log files via NFS or syslog and keeping Matomo on a separate server, or
  • putting Matomo on the same server as TWLight's nginx service and importing logs directly, or
  • something else we haven't thought of yet

Question to answer:

  • What's the best way to get nginx access log data into matomo? Criteria include:
    • reliability/availability/performance
    • ease of deployment
    • ease of managing data retention
    • ease of maintenance

Event Timeline

The Matomo developers maintain a Python ETL script (import_logs.py) that sends data through the same HTTP Tracking API that the JavaScript tracker uses. The cleanest solution is to:

  • add a syslog container to TWLight
  • configure TWLight's nginx to ship its access logs to that container
  • run the ETL script continuously on the syslog container
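As a rough illustration of the syslog container's job, here is a minimal Python sketch under stated assumptions: nginx ships access-log lines over UDP using its RFC 3164-style syslog output (the `access_log syslog:server=...` directive), and the receiver strips the syslog envelope and writes the bare access-log lines to stdout so they can be piped into the import script. The function and class names (`extract_access_log_line`, `SyslogHandler`, `serve`) are hypothetical, and a production container would more likely use an existing syslog daemon.

```python
"""Hypothetical sketch of a syslog receiver for the proposed TWLight container.

Assumptions (not from the task): nginx sends RFC 3164-style syslog datagrams
over UDP, and the bare access-log lines go to stdout for the import script.
"""
import re
import socketserver
import sys
from typing import Optional

# <PRI>Mmm dd hh:mm:ss HOSTNAME TAG: MESSAGE
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\s]+): "
    r"(?P<message>.*)$"
)


def extract_access_log_line(datagram: bytes) -> Optional[str]:
    """Return the access-log line carried by one syslog datagram, or None."""
    text = datagram.decode("utf-8", errors="replace").rstrip("\r\n")
    match = SYSLOG_RE.match(text)
    if match is None:
        return None  # not a message in the expected syslog format
    return match.group("message")


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self) -> None:
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        line = extract_access_log_line(data)
        if line is not None:
            sys.stdout.write(line + "\n")
            sys.stdout.flush()  # hand each line to the importer immediately


def serve(host: str = "0.0.0.0", port: int = 514) -> None:
    """Listen for syslog datagrams until the process is stopped."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

The container's entrypoint would call `serve()` and pipe stdout into the import script, assuming its stdin mode (`-` as the log file argument); check the matomo-log-analytics README before relying on that.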

Our Matomo instance will treat this data exactly like incoming JavaScript client data, so no changes are needed on the Matomo side. Because we won't be writing log files to external storage, there are no new log retention concerns.
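To make "treated exactly like JavaScript client data" concrete, here is a hypothetical Python sketch (not the code of import_logs.py) that turns one nginx "combined"-format log line into Tracking API query parameters of the kind the JavaScript tracker also sends to `matomo.php`. It assumes the default combined log format and uses the documented `idsite`, `rec`, `apiv`, `url`, `cdt`, `ua`, and `urlref` parameters. It leaves out the client IP (`cip`) to match the goal above. Authentication (`token_auth`), which the real importer needs to set `cdt`, is also omitted. The helper names are illustrative.

```python
"""Hypothetical sketch: nginx combined log line -> Matomo Tracking API request."""
import re
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def tracking_params(line: str, site_id: int, base_url: str) -> Optional[dict]:
    """Map one access-log line to Tracking API parameters, or None."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    # nginx $time_local, e.g. "05/Jan/2021:10:00:00 +0000"
    ts = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    params = {
        "idsite": site_id,
        "rec": 1,  # required for the request to be recorded
        "apiv": 1,
        "url": base_url.rstrip("/") + match["path"],
        # cdt: the hit's original time in UTC, not the import time
        "cdt": ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
        "ua": match["user_agent"],
    }
    if match["referrer"] not in ("", "-"):
        params["urlref"] = match["referrer"]
    # match["ip"] is deliberately not forwarded as the `cip` parameter.
    return params


def tracking_url(matomo_url: str, params: dict) -> str:
    """Build the GET URL for a single tracking request."""
    return f"{matomo_url.rstrip('/')}/matomo.php?{urlencode(params)}"
```

For example, a 10:00 request logged with a `+0100` offset is recorded with `cdt` set to `09:00:00` UTC, so daily counts land on the day the visit actually happened rather than the day it was imported.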

https://github.com/matomo-org/matomo-log-analytics/tree/4.x-dev