The rdf-streaming-updater uses Flink, which writes its checkpoints to Thanos-swift object storage. A recent audit by @dcausse found ~1 TB of data there. After removing stale/unnecessary data, total usage dropped to ~20 GB.
This suggests that we need to be more aggressive about removing data, particularly because we will soon be moving the Search Update Pipeline to Flink.
Creating this ticket to:
- ~~Create monitoring/alerts for object storage usage~~ These were already created by @dcausse: see [[ https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater?orgId=1&viewPanel=14&from=1698844803863&to=1698866403863 | this dashboard ]] for an example of the metrics in use; the alerts live [[ https://github.com/wikimedia/operations-alerts/blob/master/team-search-platform/rdf_streaming_updater_global.yaml#L20 | here ]].
- Decide whether or not we need an automated cleanup process, and
- Design/implement cleanup if so.
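
As a starting point for the design discussion, here is a rough sketch of what an age-based retention rule could look like. Everything in it is hypothetical: the `StoredObject` type, the `select_stale` name, and the 30-day / keep-3 defaults are placeholders, not existing code or agreed values. It only decides which keys are candidates for deletion; listing and deleting objects in Thanos-swift is left out on purpose.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class StoredObject(NamedTuple):
    """Hypothetical view of one object in the container."""
    key: str                 # object name
    last_modified: datetime  # timezone-aware timestamp


def select_stale(objects, now, max_age=timedelta(days=30), keep_newest=3):
    """Return keys older than max_age, always sparing the keep_newest most recent objects.

    The defaults are placeholder values for illustration only.
    """
    newest_first = sorted(objects, key=lambda o: o.last_modified, reverse=True)
    # Never propose the most recent objects, however old they are.
    protected = {o.key for o in newest_first[:keep_newest]}
    cutoff = now - max_age
    return [
        o.key
        for o in newest_first
        if o.key not in protected and o.last_modified < cutoff
    ]


if __name__ == "__main__":
    now = datetime(2023, 11, 1, tzinfo=timezone.utc)
    objects = [
        StoredObject(f"checkpoints/chk-{age}", now - timedelta(days=age))
        for age in (1, 10, 40, 50, 60, 90)
    ]
    for key in select_stale(objects, now):
        print("candidate for deletion:", key)
```

Note that a purely age-based rule like this does not know which objects a running job still references, so a real implementation would need a way to confirm that before deleting anything.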