See parent task.
Empty and remove the rdf-streaming-updater-codfw container.
I may have to fire up swiftly but will use low concurrency.
| Status | Assigned | Task |
|---|---|---|
| Resolved | dcausse | T314835 wdqs space usage on thanos-swift |
| Resolved | bking | T316031 Clean up the rdf-streaming-updater-codfw container from thanos-swift |
Per conversation with @dcausse, we need to keep all objects within the T314835 (pseudo-)folder in the 'rdf-streaming-updater-codfw' container (not to be confused with the 'rdf-streaming-updater-codfw-T314835' container).
Using swiftly's fordo feature, I think we can accomplish this.
Example read-only command using head:
swiftly for -p commons/checkpoints/1475a2038f088807f9d695aea3e1c7e3/ rdf-streaming-updater-codfw do head "<item>"
Note that "<item>" is literal; swiftly substitutes each object name for it as it iterates.
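If a read-only preview of the affected objects is wanted first, plain swift can list them (a sketch assuming the standard python-swiftclient --prefix flag, not part of the original plan):

swift list rdf-streaming-updater-codfw --prefix commons/checkpoints/1475a2038f088807f9d695aea3e1c7e3/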
Mentioned in SAL (#wikimedia-operations) [2022-08-23T17:33:07Z] <inflatador> 'bking@cumin starting thanos-swift cleanup for wdqs T316031'
Swiftly is running in a tmux window on cumin1001. Command run:
swiftly --cache-auth --eventlet --concurrency=5 for -p commons/ rdf-streaming-updater-codfw do delete "<item>"
Using the command swiftly head rdf-streaming-updater-codfw, we can see the deletes happening at a rate of ~200 objects/minute. We could probably bump up concurrency, but only with approval from the Data Persistence team.
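A rough sketch of how that rate can be measured, assuming swiftly head prints the standard X-Container-Object-Count header (illustrative, not the exact check used):

# poll the object count a minute apart and diff the two readings
n1=$(swiftly head rdf-streaming-updater-codfw | awk -F': ' '/Object-Count/ {print $2}')
sleep 60
n2=$(swiftly head rdf-streaming-updater-codfw | awk -F': ' '/Object-Count/ {print $2}')
echo "deleted $((n1 - n2)) objects in the last minute"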
Per the output of swiftly head rdf-streaming-updater-codfw, the rdf-streaming-updater-codfw swift container is down to 2,853,739,944 bytes, which works out to about 2.7 GiB. Will verify with stakeholders and close with their approval.
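A quick sanity check on the conversion (binary units):

echo '2853739944 / 1024^3' | bc -l   # ≈ 2.66, i.e. roughly 2.7 GiB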
@bking thanks for running the cleanup!
I can confirm that the wikidata and commons pseudo-folders are empty; the flink_ha_storage pseudo-folder also needs to be emptied.
Something I don't fully understand yet is why https://thanos.wikimedia.org/graph?g0.deduplicate=1&g0.expr=swift_account_stats_bytes_total{account%3D%22AUTH_wdqs%22}&g0.max_source_resolution=0s&g0.partial_response=0&g0.range_input=8w&g0.stacked=0&g0.store_matches=[]&g0.tab=0 still reports 19 TB of usage.
Even adding up the stats of rdf-streaming-updater-codfw, rdf-streaming-updater-eqiad, and rdf-streaming-updater-staging, I don't see such usage:
rdf-streaming-updater-codfw:
Account: AUTH_wdqs
Container: rdf-streaming-updater-codfw
Objects: 1177
Bytes: 3031622935

rdf-streaming-updater-eqiad:
Account: AUTH_wdqs
Container: rdf-streaming-updater-eqiad
Objects: 2666
Bytes: 60190521403

rdf-streaming-updater-staging:
Account: AUTH_wdqs
Container: rdf-streaming-updater-staging
Objects: 1395
Bytes: 5870706686

I cleaned out the flink_ha_storage pseudo-folder from the rdf-streaming-updater-codfw bucket as requested above. However, per @dcausse's dashboard link above, it appears that there's still ~19 TB used on Thanos.
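For reference, summing the three byte counts above confirms the mismatch (a quick check of @dcausse's figures, not output from the original run):

echo '3031622935 + 60190521403 + 5870706686' | bc   # 69092851024 bytes, i.e. ~69 GB vs. the ~19 TB reported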
Will circle back with @fgiunchedi to see if the wdqs user is still using too much space on the Thanos cluster.
Thank you for following up! I think the culprit is the fact that the S3-compatible API stores chunks of big files in a separate container (suffixed with +segments). See also the audit below, which I ran while logged into swift as wdqs:flink:
# swift list | xargs -n1 swift stat | grep -e Container -e Objects -e Bytes
Container: rdf-streaming-updater-codfw
Objects: 125
Bytes: 336259981
Container: rdf-streaming-updater-codfw+segments
Objects: 3752575
Bytes: 19043034820255
Container: rdf-streaming-updater-codfw-T314835
Objects: 0
Bytes: 0
Container: rdf-streaming-updater-eqiad
Objects: 3079
Bytes: 61140716537
Container: rdf-streaming-updater-eqiad+segments
Objects: 13832
Bytes: 55454475470
Container: rdf-streaming-updater-staging
Objects: 1423
Bytes: 5878764535
Container: thanos-swift
Objects: 48
Bytes: 25856176
Container: updater
Objects: 2921
Bytes: 86172646251
Container: updater+segments
Objects: 552
Bytes: 2856364302
Container: updater-zbyszko
Objects: 31
Bytes: 1622577
Container: updater-zbyszko-v2
Objects: 36
Bytes: 120047231

Per @fgiunchedi's comment above, I started the delete task on cumin1001 again, this time targeting the swift container rdf-streaming-updater-codfw+segments.
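Presumably the same invocation as before, minus the -p prefix filter since the whole container is being emptied (a sketch of the likely command, not a verbatim log):

swiftly --cache-auth --eventlet --concurrency=5 for rdf-streaming-updater-codfw+segments do delete "<item>"

At the ~200 objects/minute observed earlier, the 3,752,575 segment objects would take roughly 13 days to clear.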
Update: we're down to 5 TB used. Swiftly is still running on cumin1001; I'll check again in a few days.
Swiftly dies every few days due to 404s (a fairly common response from Swift when you ask it to delete a file that's already gone), but picks back up when I re-run the same command. I just tried a fordo against the wikidata/ pseudo-folder structure and it finished within a minute or two.
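A hypothetical wrapper that would avoid those manual restarts, assuming swiftly exits non-zero when the 404s kill it:

# keep re-running the delete until swiftly finishes cleanly; 404s on
# objects that are already gone occasionally crash the client mid-run
until swiftly --cache-auth --eventlet --concurrency=5 \
    for rdf-streaming-updater-codfw+segments do delete "<item>"; do
  echo "swiftly exited non-zero (likely a 404), restarting..." >&2
  sleep 60
done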
I think that's good enough, so I'm closing this ticket. If anyone disagrees, feel free to re-open and ping us again.
@bking I see that the doc has been updated; can we move this ticket to the Needs reporting column?