See parent task.
Empty and remove the rdf-streaming-updater-codfw container.
I may have to fire up swiftly but will use low concurrency.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | dcausse | T314835 wdqs space usage on thanos-swift
Resolved | | bking | T316031 Clean up the rdf-streaming-updater-codfw container from thanos-swift.
Per conversation with @dcausse, we need to keep all objects within the T314835 (pseudo) folder in the 'rdf-streaming-updater-codfw' container (not to be confused with the 'rdf-streaming-updater-codfw-T314835' container).
Using swiftly's fordo feature, I think we can accomplish this.
Example read-only command using head:
swiftly for -p commons/checkpoints/1475a2038f088807f9d695aea3e1c7e3/ rdf-streaming-updater-codfw do head "<item>"
Note that "<item>" is literal.
Mentioned in SAL (#wikimedia-operations) [2022-08-23T17:33:07Z] <inflatador> 'bking@cumin starting thanos-swift cleanup for wdqs T316031'
Swiftly is running in a tmux window on cumin1001. Command run:
swiftly --cache-auth --eventlet --concurrency=5 for -p commons/ rdf-streaming-updater-codfw do delete "<item>"
Using the command swiftly head rdf-streaming-updater-codfw, we can see the deletes happening at a rate of ~200 objects/minute. We could probably bump up concurrency, but not without approval from the Data Persistence team.
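At ~200 objects/minute, large prefixes take a while to drain. As a rough sketch (the 1,000,000-object count is hypothetical, just to illustrate the arithmetic):

```shell
# Rough ETA for deleting 1,000,000 objects at ~200 deletes/minute
# (object count is a hypothetical example, not a measured figure)
minutes=$((1000000 / 200))
echo "$minutes"              # 5000 minutes
echo $((minutes / 60 / 24))  # ~3 days (integer division)
```

This is why bumping concurrency is tempting, but it multiplies load on the Swift cluster, hence the approval requirement.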
Per the output of swiftly head rdf-streaming-updater-codfw, the rdf-streaming-updater-codfw swift container is down to 2,853,739,944 bytes, which works out to about 2.85 GB. Will verify with stakeholders and close with their approval.
@bking thanks for running the cleanup!
I can confirm that the wikidata and commons pseudo-folders are empty, the flink_ha_storage folder also needs to be emptied.
Something I don't fully understand yet is why https://thanos.wikimedia.org/graph?g0.deduplicate=1&g0.expr=swift_account_stats_bytes_total{account%3D%22AUTH_wdqs%22}&g0.max_source_resolution=0s&g0.partial_response=0&g0.range_input=8w&g0.stacked=0&g0.store_matches=[]&g0.tab=0 still reports ~19 TB of usage.
Even adding up the stats of rdf-streaming-updater-codfw, rdf-streaming-updater-eqiad and rdf-streaming-updater-staging, I don't see anywhere near that usage:
rdf-streaming-updater-codfw:
Account: AUTH_wdqs Container: rdf-streaming-updater-codfw Objects: 1177 Bytes: 3031622935
rdf-streaming-updater-eqiad:
Account: AUTH_wdqs Container: rdf-streaming-updater-eqiad Objects: 2666 Bytes: 60190521403
rdf-streaming-updater-staging:
Account: AUTH_wdqs Container: rdf-streaming-updater-staging Objects: 1395 Bytes: 5870706686
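Summing the Bytes figures from the three containers above confirms the mismatch; a quick local sanity check (plain shell arithmetic, no Swift access needed):

```shell
# Sum the reported Bytes for the three rdf-streaming-updater containers
total=$((3031622935 + 60190521403 + 5870706686))
echo "$total"                 # 69092851024 bytes
echo $((total / 1000000000))  # ~69 GB, nowhere near the reported 19 TB
```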
I cleaned out the flink_ha_storage pseudofolder from the rdf-streaming-updater-codfw bucket as requested above. However, per @dcausse's dashboard link above, it appears that there's still ~19 TB used on Thanos.
Will circle back with @fgiunchedi to see if the wdqs user is still using too much space on Thanos cluster.
Thank you for following up. I think the culprit is that the S3-compatible API stores chunks of big files in a separate container (suffixed with +segments). See also the audit below, which I ran logged into swift as wdqs:flink:
# swift list | xargs -n1 swift stat | grep -e Container -e Objects -e Bytes
Container: rdf-streaming-updater-codfw Objects: 125 Bytes: 336259981
Container: rdf-streaming-updater-codfw+segments Objects: 3752575 Bytes: 19043034820255
Container: rdf-streaming-updater-codfw-T314835 Objects: 0 Bytes: 0
Container: rdf-streaming-updater-eqiad Objects: 3079 Bytes: 61140716537
Container: rdf-streaming-updater-eqiad+segments Objects: 13832 Bytes: 55454475470
Container: rdf-streaming-updater-staging Objects: 1423 Bytes: 5878764535
Container: thanos-swift Objects: 48 Bytes: 25856176
Container: updater Objects: 2921 Bytes: 86172646251
Container: updater+segments Objects: 552 Bytes: 2856364302
Container: updater-zbyszko Objects: 31 Bytes: 1622577
Container: updater-zbyszko-v2 Objects: 36 Bytes: 120047231
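The audit shows that rdf-streaming-updater-codfw+segments alone holds roughly 19 TB, which matches the Thanos graph. A quick local check of that figure:

```shell
# The +segments container's byte count from the audit, in decimal terabytes
echo $((19043034820255 / 1000000000000))  # ~19 TB: this is the "missing" usage
```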
Per @fgiunchedi's comment above, I started the delete task on cumin1001 again, this time targeting the swift container rdf-streaming-updater-codfw+segments.
Update: we're down to 5 TB used. Swiftly is still running on cumin1001; will check again in a few days.
Swiftly dies every few days due to 404s (a fairly common response from Swift when you ask it to delete a file that's already gone), but it picks back up when I rerun the same command. I just tried a fordo against the wikidata/ pseudo-folder structure and it finished within a minute or two.
I think that's good enough, so I'm closing this ticket. If anyone else disagrees, feel free to re-open and ping us again.
@bking I see that the doc has been updated, can we move this ticket to the Needs reporting column?