We had reports (T383034, T383034) of thumbnail failures for some users. Further digging showed that they were getting HTTP 401 (unauthorised) from codfw swift. The underlying objects still existed (and could be inspected with swift stat, which goes via the rings), but the container DB for the container wikipedia-commons-local-thumb.f8 was missing (examples in P71802). Indeed, attempting to swift stat wikipedia-commons-local-thumb.f8 resulted in Container 'wikipedia-commons-local-thumb.f8' not found.
This is unlikely to be the result of a correctly-issued deletion request, because (per our docs) deleting a container first deletes the contents, and those were still extant where inspected. Also, the account still "thought" it had a container named wikipedia-commons-local-thumb.f8 (per swift list).
The container DBs, however, were all missing - I checked all six locations in the output of sudo swift-get-nodes /etc/swift/container.ring.gz AUTH_mw wikipedia-commons-local-thumb.f8 and in no case was the containing directory extant, never mind the db file.
We were able to restore service by effectively re-creating the container: swift post wikipedia-commons-local-thumb.f8 --read-acl 'mw:thumbor,mw:media,.r:*' --write-acl 'mw:thumbor,mw:media', and thumbor largely coped with the extra load.
ms-fe2009 first returned a 401 to a request for something in wikipedia-commons-local-thumb.f8 at 07:20:50 on 2025-01-05, giving us an approximate timestamp for the deletion.
Inspecting swift logs for the day (sudo cumin -x --force --no-progress --no-color -o txt "A:codfw and P{O:swift::proxy}" "zgrep -F 'DELETE' /var/log/swift/proxy-access.log.1.gz | grep 'wikipedia-commons-local-thumb.f8'" >~/junk/T383023) produces 5208 DELETE requests for items in that container, but they all contain the string px-, meaning they were for objects within the container, not for the container itself.
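The px- check above is a heuristic. A stricter way to separate object DELETEs from a container DELETE is to look at the request path itself, as in the minimal sketch below. It assumes that each proxy access log line contains the method followed by the request path as whitespace-separated fields, and that paths have the form /v1/AUTH_mw/<container>[/<object>]; the function names delete_target and count_targets are hypothetical helpers, not part of swift.

```python
from collections import Counter
from urllib.parse import unquote, urlsplit

# Values taken from the incident; adjust for other accounts or containers.
ACCOUNT = "AUTH_mw"
CONTAINER = "wikipedia-commons-local-thumb.f8"


def delete_target(line, account=ACCOUNT, container=CONTAINER):
    """Classify one log line as a 'container' or 'object' DELETE.

    Returns None for lines that are not a DELETE against the given
    account and container. Assumes (hypothetically) that the path is
    the field immediately after the literal method name DELETE.
    """
    fields = line.split()
    try:
        i = fields.index("DELETE")
    except ValueError:
        return None
    if i + 1 >= len(fields):
        return None
    # Drop any query string, then decode percent-escapes.
    path = unquote(urlsplit(fields[i + 1]).path)
    # Expected layout: v1/<account>/<container>[/<object>]
    parts = path.lstrip("/").split("/", 3)
    if len(parts) < 3 or parts[:3] != ["v1", account, container]:
        return None
    if len(parts) == 3 or parts[3] == "":
        return "container"
    return "object"


def count_targets(lines):
    """Count container-level and object-level DELETEs in an iterable of lines."""
    return Counter(t for t in map(delete_target, lines) if t)
```

Feeding the lines saved in the cumin output file to count_targets would give the split between the two kinds of request; any non-zero "container" count would be the request to look at.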
It might be instructive to narrow that window down further by inspecting the other codfw frontends, and then checking frontend logs for the relevant time window. But this is a deeply concerning mystery at the moment. This was the only affected thumbnail container in codfw; a check of all containers (there are 43k of them) is ongoing and will take a few hours.
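A check of every container could look roughly like the sketch below. This is not the script actually used. It assumes the swift CLI is on the PATH with credentials for the account already in the environment, and that swift stat exits non-zero when a container is not found. The helper names list_containers, swift_ok and missing_containers are hypothetical.

```python
import subprocess


def list_containers(run=subprocess.run):
    """Return the container names the account believes it has (swift list)."""
    result = run(["swift", "list"], capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line]


def swift_ok(args, run=subprocess.run):
    """Return True if `swift <args>` exits with status 0."""
    result = run(["swift", *args], capture_output=True, text=True)
    return result.returncode == 0


def missing_containers(names, stat=None):
    """Return the names for which a container-level stat fails.

    `stat` is injectable so the logic can be exercised without a cluster;
    by default it shells out to `swift stat <name>`.
    """
    if stat is None:
        stat = lambda name: swift_ok(["stat", name])
    return [name for name in names if not stat(name)]
```

Calling missing_containers(list_containers()) would then list any container that the account still lists but whose container DB cannot be found, which is the condition seen for wikipedia-commons-local-thumb.f8. The check runs sequentially, one stat per container.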