I noticed the "thanos compact errors" alert firing (and recovering shortly after):
PROBLEM - Thanos compact has high percentage of failures on alert1001 is CRITICAL: job=thanos-compact
And indeed, there have been sporadic errors over the last couple of days.
The timing aligns nicely with sdf failing on thanos-be1003 (T285664 + T285662).
The error on the thanos-compact side is related to multipart uploads, e.g.:
Jun 29 20:55:36 thanos-fe2001 thanos-compact[5024]: level=error ts=2021-06-29T20:55:36.083160442Z caller=compact.go:386 msg="retriable error" err="compaction: group 0@10531109435386935375: upload of 01F9CRN1CWYQJVEV8Q86EHRX4V failed: upload chunks: upload file /srv/thanos-compact/compact/0@10531109435386935375/01F9CRN1CWYQJVEV8Q86EHRX4V/chunks/000001 as 01F9CRN1CWYQJVEV8Q86EHRX4V/chunks/000001: upload s3 object: One or more of the specified parts could not be found. The part might not have been uploaded, or the specified entity tag might not have matched the part's entity tag.
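To check how well the failures line up with the disk problem, the matching log lines could be tallied per day and host. The following is only a rough sketch: the function name `count_upload_failures` and the regular expression are assumptions based solely on the shape of the log line above, not on anything thanos-compact guarantees about its output.

```python
import re
from collections import Counter

# Syslog-style prefix ("Jun 29 20:55:36 host thanos-compact[pid]:") followed,
# somewhere later in the line, by the ULID of the block whose upload failed.
# ULIDs are 26 characters of Crockford base32 (no I, L, O or U).
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\S+\s+(?P<host>\S+)\s+"
    r"thanos-compact\[\d+\]:.*?"
    r"upload of (?P<block>[0-9A-HJKMNP-TV-Z]{26}) failed"
)

# Fragment of the S3 error text seen in the log line above.
MISSING_PARTS = "specified parts could not be found"


def count_upload_failures(lines):
    """Count multipart upload failures, keyed by (month, day, host)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and MISSING_PARTS in line:
            key = (match["month"], int(match["day"]), match["host"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Read log lines from stdin and print one tally per day and host.
    for (month, day, host), n in sorted(count_upload_failures(sys.stdin).items()):
        print(f"{month} {day:2d} {host}: {n}")
```

Feeding it the thanos-compact log output on stdin gives a per-day count that can be compared against the timeline of the disk failure.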
I also found this Swift bug (fixed in 2.21; we're running 2.19 on buster), which looks like a plausible match: https://bugs.launchpad.net/swift/+bug/1636663