
File not found: /v1/AUTH_mw/wikipedia-commons-local-public.7e/7/7e/EC02-0162-69_l_%2824374651802%29.jpg
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:
Error message: "File not found: /v1/AUTH_mw/wikipedia-commons-local-public.7e/7/7e/EC02-0162-69_l_%2824374651802%29.jpg"

What should have happened instead?:

The file should be displayed as it is on Flickr: https://live.staticflickr.com/1650/24374651802_4840d0fc74_o.jpg

Event Timeline

I've gone looking, and the problem is that only one Swift cluster has this object. The eqiad frontend returns a 404, while the second cluster returns the object's metadata:

root@ms-fe1009:/etc/swift# swift stat wikipedia-commons-local-public.7e '7/7e/EC02-0162-69_l_(24374651802).jpg'
Object HEAD failed: http://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-public.7e/7/7e/EC02-0162-69_l_%2824374651802%29.jpg 404 Not Found

root@ms-fe2009:~# swift stat wikipedia-commons-local-public.7e '7/7e/EC02-0162-69_l_(24374651802).jpg'
               Account: AUTH_mw
             Container: wikipedia-commons-local-public.7e
                Object: 7/7e/EC02-0162-69_l_(24374651802).jpg
          Content Type: image/jpeg
        Content Length: 3879987
         Last Modified: Tue, 10 Oct 2023 20:29:16 GMT
                  ETag: 4f2cb256727668ff5de11f7d6c6b04c3
       Meta Sha1Base36: 7pv41d7evsdntwmzcp84grgppsrkkbu
           X-Timestamp: 1696969755.85914
         Accept-Ranges: bytes
            X-Trans-Id: tx44c5b56c75ef4f82bfdd8-0065266628
X-Openstack-Request-Id: tx44c5b56c75ef4f82bfdd8-0065266628

Thankfully, the upload was recent enough that I can go looking in the Swift logs. I gathered all swift-frontend logs for this object as follows (NB: the object name has to be URL-quoted a second time to match the log entries):

sudo cumin -x --force --no-progress --no-color -o txt O:swift::proxy "zgrep -F '7/7e/EC02-0162-69_l_%252824374651802%2529.jpg' /var/log/swift/proxy-access.log /var/log/swift/proxy-access.log.1.gz" > ~/junk/T348586.txt
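The double quoting in that search string can be reproduced with a short sketch. It assumes only what the log line in this task shows: the proxy access log records the already-quoted request path with each `%` escaped again as `%25`. The helper name `log_search_pattern` is hypothetical, chosen for illustration.

```python
from urllib.parse import quote


def log_search_pattern(object_name: str) -> str:
    """Return the object name quoted twice, as it appears in the access log.

    Assumption: the log stores the URL-quoted path with its percent signs
    quoted once more, which matches the single log line seen in this task.
    """
    once = quote(object_name)   # '(' becomes %28, ')' becomes %29, '/' is kept
    twice = quote(once)         # each '%' becomes %25
    return twice


if __name__ == "__main__":
    print(log_search_pattern("7/7e/EC02-0162-69_l_(24374651802).jpg"))
```

The result can be passed to `zgrep -F` so that it is matched as a fixed string rather than as a regular expression.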

I can then look at the results. The critical finding is that no attempt was made to PUT this object into eqiad:

mvernon@cumin1001:~$ grep ' PUT ' junk/T348586.txt 
ms-fe2010.codfw.wmnet: /var/log/swift/proxy-access.log.1.gz:Oct 10 20:29:15 ms-fe2010 proxy-server: 10.192.0.161 10.192.16.76 10/Oct/2023/20/29/15 PUT /v1/AUTH_mw/wikipedia-commons-local-public.7e/7/7e/EC02-0162-69_l_%252824374651802%2529.jpg HTTP/1.0 201 - wikimedia/multi-http-client%20v1.1 AUTH_tk3e66e48ea... 3879987 - 4f2cb256727668ff5de11f7d6c6b04c3 tx775fbbaa8db04b98abb9c-006525b41b - 0.0702 - - 1696969755.856173992 1696969755.926355124 0

(those two IPs are mw2291 and ms-fe2010).
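For a larger batch of objects, the same check (did every site receive a successful PUT?) could be scripted. The following is a minimal sketch, not the tooling used here. It assumes each gathered line starts with `<fqdn>: `, as in the sample above, that the site is the second label of the hostname, and that the expected sites are `eqiad` and `codfw`. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches the request part of a proxy access log line: METHOD PATH HTTP/x.y STATUS
REQ_RE = re.compile(
    r" (?P<method>[A-Z]+) (?P<path>/v1/\S+) HTTP/\d\.\d (?P<status>\d{3}) "
)

# The PUT line found in this task, used as sample input.
SAMPLE = (
    "ms-fe2010.codfw.wmnet: /var/log/swift/proxy-access.log.1.gz:"
    "Oct 10 20:29:15 ms-fe2010 proxy-server: 10.192.0.161 10.192.16.76 "
    "10/Oct/2023/20/29/15 PUT /v1/AUTH_mw/wikipedia-commons-local-public.7e/"
    "7/7e/EC02-0162-69_l_%252824374651802%2529.jpg HTTP/1.0 201 - "
    "wikimedia/multi-http-client%20v1.1 AUTH_tk3e66e48ea... 3879987 - "
    "4f2cb256727668ff5de11f7d6c6b04c3 tx775fbbaa8db04b98abb9c-006525b41b - "
    "0.0702 - - 1696969755.856173992 1696969755.926355124 0"
)


def puts_by_site(lines):
    """Group PUT requests by site, returning {site: [(host, status), ...]}."""
    sites = defaultdict(list)
    for line in lines:
        # Assumed prefix: "<fqdn>: " before the log file name.
        host, sep, rest = line.partition(": ")
        match = REQ_RE.search(rest)
        if not sep or match is None or match.group("method") != "PUT":
            continue
        labels = host.split(".")
        site = labels[1] if len(labels) > 1 else "unknown"
        sites[site].append((host, int(match.group("status"))))
    return dict(sites)


def sites_missing_put(lines, expected=("eqiad", "codfw")):
    """Return the expected sites that logged no successful (2xx) PUT."""
    seen = puts_by_site(lines)
    ok = {
        site
        for site, hits in seen.items()
        if any(200 <= status < 300 for _, status in hits)
    }
    return [site for site in expected if site not in ok]


if __name__ == "__main__":
    print(sites_missing_put([SAMPLE]))
```

Fed the lines from the gathered log file, `sites_missing_put` would name the sites that never logged a successful PUT for the object.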

So something in the MediaWiki stack has failed to write the two copies of this image that it's supposed to, and that's why the object has been left in this bad state.

On Monday, the weekly rclone job should rectify this and copy the object over to eqiad, but in the meantime I think this bug report might usefully be sent to the MW folks.

As I mentioned in T341007, I get this error too and have collected the affected files here: https://commons.wikimedia.org/wiki/User:Beao/Images_with_upload_error

They are all replacement uploads. The odd thing is that the version before my upload is displayed as missing, while the version listed as my upload looks like the file I tried to replace. So I'm guessing the system assumed the old file had been moved and my upload had succeeded before it threw the error.

Today is the first day I haven't noticed any errors on large imports (~1000 files), so the issue seems to be fixed for new uploads.

Then let's close this to avoid further confusion. :)