
Ingest Cloud VPS audit logs into production logging pipeline
Open, Needs Triage, Public

Description

As part of T127717, Cloud VPS instances have been forwarding their audit logs to a syslog server since the end of 2022. This setup works well (though you may see how rating your own work raises a conflict of interest...).

However, one of its known limitations is that logs of Cloud VPS instances are stored on other Cloud VPS instances. As long as an incident is confined to instances in regular Cloud VPS projects (i.e. not the 'cloudinfra' project), storing the logs within Cloud VPS alone is fine. But if a threat actor has control over the Cloud VPS infrastructure and/or the whole cloud realm, the syslog servers are of no use. (A public task is not suitable for in-depth discussion of the threat scenarios, though.)

The Wikimedia Foundation does not seem to use truly off-site audit log storage (with measures against tampering, e.g. append-only logging), but from the Cloud VPS environment's point of view, the production realm is close to 'off-site', especially given all the ongoing work to steer management functions and network traffic away from production. The Observability team is responsible for the production logging pipeline. @fgiunchedi has suggested that pulling logs directly from the syslog servers makes sense, since it prevents cross-realm flows that enter production directly. We're not sure how this fits into the Cloud LB model.

  • Log volume: anyone from the Cloud Services team could gather the exact numbers; see T276291 for my ballpark estimate. It is definitely no more than 80 GB of net log data, although on-disk usage will be greater (OpenSearch overhead, replication, ...).
  • Retention: default of 90 days is fine.

Thoughts are welcome.

Event Timeline

Thank you for following up, @Southparkfan!

Some thoughts/clarifications: the pull model I proposed is indeed meant to avoid Cloud VPS-initiated flows towards production. The easiest implementation would be a periodic (r)sync of the audit log files (assuming they are stored as files on cloudinfra) onto the production syslog servers for archival purposes. In other words, the audit log files will not be ingested into the production logging pipelines, but will be available for consultation by root (or folks with access to the syslog servers). I believe this is a good trade-off between the goal (a redundant/safe place for audit logs) and ease of implementation.
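For illustration, here is a minimal sketch of the periodic pull described above, run from the production side so that the connection is always initiated from production towards cloudinfra. It assumes the audit logs sit as plain files in a directory on the cloudinfra syslog host and that SSH access from production to that host exists; the host name `syslog.cloudinfra.example`, the paths `/srv/syslog` and `/srv/cloudvps-audit-archive`, and the helper names are all hypothetical placeholders, not the actual setup.

```python
import shlex
import subprocess
from pathlib import Path

# HYPOTHETICAL values for illustration only: the real cloudinfra syslog host,
# its log directory and the production archive path are not specified in this task.
SOURCE_HOST = "syslog.cloudinfra.example"
REMOTE_DIR = "/srv/syslog"
LOCAL_DIR = Path("/srv/cloudvps-audit-archive")


def build_rsync_command(source_host: str, remote_dir: str, local_dir: Path) -> list[str]:
    """Build an rsync invocation that pulls remote_dir from source_host.

    The SSH connection is opened from the production host (pull model), so no
    Cloud VPS host has to initiate a flow towards production. --delete is
    deliberately NOT passed, so files removed on the source stay in the archive.
    """
    return [
        "rsync",
        "--archive",   # recurse and preserve timestamps/permissions
        "--compress",  # compress data in transit
        "-e", "ssh",   # transport over SSH, initiated from production
        # Trailing slashes: copy the directory's contents, not the directory itself.
        f"{source_host}:{remote_dir.rstrip('/')}/",
        f"{str(local_dir).rstrip('/')}/",
    ]


def pull_once(source_host: str = SOURCE_HOST,
              remote_dir: str = REMOTE_DIR,
              local_dir: Path = LOCAL_DIR) -> int:
    """Run a single sync and return rsync's exit code.

    Meant to be triggered periodically (e.g. by a systemd timer or cron job)
    on a production syslog host; it does not loop by itself.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    cmd = build_rsync_command(source_host, remote_dir, local_dir)
    print("running:", shlex.join(cmd))
    result = subprocess.run(cmd, check=False)
    return result.returncode
```

Leaving out `--delete` keeps files that disappear on cloudinfra, but this is not tamper-proof on its own: a file modified at the source would still be overwritten on the next run. Restricting the production SSH key on the cloudinfra side (for example with rsync's `rrsync` helper in `authorized_keys`) would be a reasonable complement, again as an assumption rather than an agreed design.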