
img_metadata missing
Open, Needs Triage, Public

Description

I parsed img_metadata in my GIF check report. I figured these errors would be fixed with newer software and an update script (T32961). Over the years more files appeared on the list. Currently, 3,211,696 files on Commons (8.89%) are missing metadata.

SELECT CONCAT(img_major_mime,"/",img_minor_mime) AS MIME, img_metadata,
       CONCAT("File:", REPLACE(img_name,"_"," ")) AS Example, COUNT(*) AS COUNT
FROM image
WHERE LENGTH(img_metadata)<9 /*smallest is 9 bytes: {"x";i:0} */
GROUP BY 1, 2 ORDER BY COUNT(*) DESC;
| MIME | img_metadata | Example | COUNT(*) |
| --- | --- | --- | --- |
| image/jpeg | 0 | File:"A Perspective View of Fort William" by Jan Van Ryne, 1754.jpg | 3,182,426 |
| image/jpeg | -1 | File:!-2013-wschowa-przyczyna-gorna-palac-abri.jpg | 16,140 |
| image/png | 0 | File:"Après le bain" (dessin par Georges A. Gardenty, 1893).png | 8,179 |
| audio/midi | Blank | File:"Bebop-rebop" early bop phrase.mid | 4,817 |
| image/gif | 0 | File:1. FCA Darmstadt.gif | 124 |
| application/pdf | b:0; | File:A imprensa em Goa nos séculos XVI, XVII e XVIII.pdf | 10 |
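
As a quick sanity check on the figures above, the following Python sketch sums the per-group counts and compares them with the total quoted in the description. The counts are copied from the result table. Treating "Blank" as an empty string is an assumption, and the implied size of the image table is only approximate because the 8.89% figure is rounded.

```python
# Counts copied from the result table above, keyed by (MIME, img_metadata).
counts = {
    ("image/jpeg", "0"): 3_182_426,
    ("image/jpeg", "-1"): 16_140,
    ("image/png", "0"): 8_179,
    ("audio/midi", ""): 4_817,  # "Blank" in the table; empty string is an assumption
    ("image/gif", "0"): 124,
    ("application/pdf", "b:0;"): 10,
}

missing = sum(counts.values())
share = 0.0889  # the 8.89% quoted in the description (rounded)

# Every listed value is shorter than 9 bytes, so LENGTH(img_metadata)<9 matches it.
assert all(len(value.encode("utf-8")) < 9 for _, value in counts)

# Implied number of rows in the image table; approximate because share is rounded.
implied_total = round(missing / share)

print(f"missing: {missing:,}")
print(f"implied total: about {implied_total:,}")
```

The sum matches the 3,211,696 files stated in the description, which suggests the table lists every group the query returned.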

Update from 2022-02-03

| MIME | img_metadata | Example | COUNT |
| --- | --- | --- | --- |
| image/jpeg | 0 | File:"Batavians defeating Romans on the Rhine" by Otto van Veen.jpg | 3109304 |
| image/png | 0 | File:(แรงเสียดทาน).png | 68819 |
| image/jpeg | -1 | File:"Friends for ever" (carte postale de propagande franco-américaine, janvier 1917).jpg | 15941 |
| audio/midi | | File:"Bebop-rebop" early bop phrase.mid | 4844 |
| application/sla | | File:(8567) 1996 HW1 3D model.stl | 2385 |
| image/gif | 0 | File:1. FCA Darmstadt.gif | 697 |
| application/pdf | | File:06.45 Management rep letter.pdf | 60 |
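
To see how the numbers moved between the original report and the 2022-02-03 update, the sketch below aggregates both result sets by MIME type and prints the change per type. The counts are copied from the two tables. Reading the last PDF count as 60 (ignoring the stray brace) is an assumption, and `by_mime` is a helper name introduced only for this illustration.

```python
from collections import Counter

# (MIME, count) rows copied from the original result table.
original = [
    ("image/jpeg", 3_182_426),
    ("image/jpeg", 16_140),
    ("image/png", 8_179),
    ("audio/midi", 4_817),
    ("image/gif", 124),
    ("application/pdf", 10),
]

# (MIME, count) rows copied from the 2022-02-03 update.
update = [
    ("image/jpeg", 3_109_304),
    ("image/png", 68_819),
    ("image/jpeg", 15_941),
    ("audio/midi", 4_844),
    ("application/sla", 2_385),
    ("image/gif", 697),
    ("application/pdf", 60),  # assumption: the trailing "}" is not part of the count
]


def by_mime(rows):
    """Sum counts per MIME type, merging the different img_metadata values."""
    totals = Counter()
    for mime, count in rows:
        totals[mime] += count
    return totals


before, after = by_mime(original), by_mime(update)

# A Counter returns 0 for a MIME type that appears in only one snapshot.
for mime in sorted(set(before) | set(after)):
    delta = after[mime] - before[mime]
    print(f"{mime:16} {before[mime]:>9,} -> {after[mime]:>9,} ({delta:+,})")

total_delta = sum(after.values()) - sum(before.values())
print(f"total: {sum(before.values()):,} -> {sum(after.values()):,} ({total_delta:+,})")
```

Under these assumptions the overall total drops slightly while the PNG group grows, so the per-type breakdown is more informative than the headline number.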

Event Timeline

Which is used as a value to mean our metadata extractor couldn't understand the file format.

Some of those files got deleted since then. Maybe it would be useful to rerun the query?


> Some of those files got deleted since then. Maybe it would be useful to rerun the query?

done

Thanks!

It is interesting to note that some files look valid; e.g., File:06.45 Management rep letter.pdf above does open for me in Firefox.

I fixed 06.45 Management rep letter.pdf using mutool.