List of steps to reproduce:
- Compare the rows for file [[ https://commons.wikimedia.org/wiki/File:1988_810810_Aielo_de_Malferit.pdf | 1988_810810_Aielo_de_Malferit.pdf ]] in the commonswiki-20220220-image.sql.gz and commonswiki-20220401-image.sql.gz dumps. Many other PDFs are similarly affected: their metadata in the newer dump is worse than in the older one.
What happens?:
20220220 has the following entry:
('1988_810810_Aielo_de_Malferit.pdf',71798,1239,1754,'{\"data\":{\"Producer\":\"iText 1.4.7 (by lowagie.com)\",\"CreationDate\":\"Mon May 5 17:54:06 2008 UTC\",\"ModDate\":\"Mon May 5 17:54:06 2008 UTC\",\"Tagged\":\"no\",\"UserProperties\":\"no\",\"Suspects\":\"no\",\"Form\":\"none\",\"JavaScript\":\"no\",\"Pages\":\"1\",\"Encrypted\":\"no\",\"pages\":{\"1\":{\"Page size\":\"595 x 842 pts (A4)\",\"Page rot\":\"0\"}},\"File size\":\"71798 bytes\",\"Optimized\":\"no\",\"PDF version\":\"1.4\",\"mergedMetadata\":{\"pdf-Producer\":\"iText 1.4.7 (by lowagie.com)\",\"pdf-Encrypted\":\"no\",\"pdf-PageSize\":[\"595 x 842 pts (A4)\"],\"pdf-Version\":\"1.4\"},\"text\":[\"\",\"\"]}}',0,'OFFICE','application','pdf',44,543926,'20121213213138','643t3fa39pqaw1zecd7ybn3r3g99isu')
Note that width and height are 1239 and 1754, respectively, and that the metadata contains "Pages": "1".
20220401 has the following entry:
('1988_810810_Aielo_de_Malferit.pdf',71798,0,0,'',0,'OFFICE','application','pdf',44,543926,'20121213213138','643t3fa39pqaw1zecd7ybn3r3g99isu')
Note that width and height are both 0 and that the metadata field is an empty string.
What should have happened instead?:
Both dumps should contain the same metadata for this PDF, because the file has not changed between them. Instead, the newer dump no longer has width, height, or page count information for it.
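
The comparison described above could be scripted roughly as follows. This is only a sketch: `parse_sql_row` and `summarize` are hypothetical helpers, not part of any MediaWiki tooling, the column positions (name, size, width, height, metadata) are inferred from the two rows quoted in this report, and the backslash handling is simplified to the `\"` escapes seen here. The metadata in `OLD_ROW` is abbreviated to two keys from the 20220220 entry to keep the example short.

```python
import json


def _finish(buf, was_str):
    """Turn the collected characters of one field into a str or an int."""
    text = "".join(buf)
    if was_str:
        return text
    try:
        return int(text.strip())
    except ValueError:
        return text.strip()  # e.g. NULL or a non-integer literal


def parse_sql_row(row):
    """Split one '(...)' tuple from an INSERT statement into Python values.

    Simplification: a backslash inside a quoted string just keeps the next
    character literally, which is enough for the \\" escapes in these rows.
    """
    assert row.startswith("(") and row.endswith(")")
    body = row[1:-1]
    fields, buf, in_str, was_str = [], [], False, False
    i = 0
    while i < len(body):
        ch = body[i]
        if in_str:
            if ch == "\\":
                i += 1
                buf.append(body[i])
            elif ch == "'":
                in_str = False
            else:
                buf.append(ch)
        elif ch == "'":
            in_str, was_str = True, True
        elif ch == ",":
            fields.append(_finish(buf, was_str))
            buf, was_str = [], False
        else:
            buf.append(ch)
        i += 1
    fields.append(_finish(buf, was_str))
    return fields


def summarize(row):
    """Extract the fields this report compares (positions are assumed)."""
    fields = parse_sql_row(row)
    name, width, height, metadata = fields[0], fields[2], fields[3], fields[4]
    pages = None
    if metadata:
        pages = json.loads(metadata).get("data", {}).get("Pages")
    return {"name": name, "width": width, "height": height,
            "pages": pages, "has_metadata": bool(metadata)}


# Metadata abbreviated from the 20220220 row quoted above.
OLD_ROW = r"""('1988_810810_Aielo_de_Malferit.pdf',71798,1239,1754,'{\"data\":{\"Pages\":\"1\",\"File size\":\"71798 bytes\"}}',0,'OFFICE','application','pdf',44,543926,'20121213213138','643t3fa39pqaw1zecd7ybn3r3g99isu')"""
# The 20220401 row, verbatim.
NEW_ROW = r"""('1988_810810_Aielo_de_Malferit.pdf',71798,0,0,'',0,'OFFICE','application','pdf',44,543926,'20121213213138','643t3fa39pqaw1zecd7ybn3r3g99isu')"""

if __name__ == "__main__":
    print(summarize(OLD_ROW))
    print(summarize(NEW_ROW))
```

Running `summarize` over every PDF row in both dumps and keeping the names where `has_metadata` flips from true to false would be one way to list the affected files.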
The metadata of many files was updated recently as part of the change to JSON for PDF and DjVu files.
Confirmed: no metadata, and no width or height.