
Python/Pillow/imagehash does not detect matching images
Open, Needs Triage · Public · Bug report

Description

Both pHash and dHash fail to detect that the images match. One of the images may have had additional processing applied (downscaling with poor resampling?) that confuses the hash calculation.
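For context, a difference hash (dHash) shrinks the image to a tiny grayscale thumbnail and records, for each pixel, whether its right-hand neighbour is brighter. The sketch below is a minimal pure-Python illustration of that idea, not imagehash's own code. It assumes the image has already been converted to grayscale and resized to 9x8 pixels, so the resizing step (where poor resampling would do its damage) is left out. The bit order (row by row, first bit most significant) is intended to match how imagehash prints its hashes.

```python
def dhash_bits(rows: list[list[int]]) -> list[int]:
    """Difference hash of a grayscale grid that is already (N+1) wide and N tall."""
    bits = []
    for row in rows:
        # One bit per adjacent pair: 1 if brightness increases from left to right.
        for left, right in zip(row, row[1:]):
            bits.append(1 if right > left else 0)
    return bits


def bits_to_hex(bits: list[int]) -> str:
    """Pack bits into a zero-padded hex string, first bit most significant."""
    value = int("".join(str(b) for b in bits), 2)
    return format(value, "0{}x".format((len(bits) + 3) // 4))


# Hypothetical 9x8 thumbnail: every row is a left-to-right brightness ramp.
ramp = [[10 * x for x in range(9)] for _ in range(8)]
print(bits_to_hex(dhash_bits(ramp)))  # ffffffffffffffff
```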

Image A:
https://commons.wikimedia.org/wiki/File:Changing_of_thoughts_between_Sweden_and_Finland_1941_(JOKAHBL3C_B67-2).tif

Image B:
https://finna.fi/Record/museovirasto.841efcc6-bbd3-47b1-b48a-8b502fc13d4e

The image was uploaded from B to A. Downloading both and comparing them with the Python module fails to match them: the hash difference is too large for what should be identical images.

This is likely a bug somewhere in the Python module.

Event Timeline

Phash diff: 30, image1: 94dbac9b8ac4c2cb, image2: 87f807f00ff007f8
Dhash diff: 26, image1: 701844a8a9a4a6a6, image2: 0000000000000040
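The diff values above are Hamming distances: the number of bit positions in which the two 64-bit hashes differ. A short sketch that recomputes them from the hex strings, assuming both hashes are hex strings of the same length as printed above:

```python
def hamming_hex(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex-encoded hashes."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 exactly where the bits differ; count those ones.
    return bin(int(a, 16) ^ int(b, 16)).count("1")


print(hamming_hex("94dbac9b8ac4c2cb", "87f807f00ff007f8"))  # 30 (pHash)
print(hamming_hex("701844a8a9a4a6a6", "0000000000000040"))  # 26 (dHash)
```

For comparison, two unrelated random 64-bit hashes differ in 32 bits on average, so a pHash distance of 30 is roughly what unrelated images would give rather than a borderline miss.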

The JPEG version uses 8-bit integer RGB/alpha, while the TIFF uses 16-bit integer grayscale.
Visually, the JPEG version seems to have more "noise" in the image (worse compression).
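The TIFF's bit depth can be checked from its header directly, without relying on how any imaging library decodes it. A minimal standard-library sketch, assuming a classic (non-BigTIFF) file and decoding only SHORT-typed tags from the first IFD; the tag numbers 258 (BitsPerSample), 262 (PhotometricInterpretation) and 277 (SamplesPerPixel) come from the TIFF 6.0 specification. For a 16-bit grayscale image one would expect BitsPerSample `(16,)`, SamplesPerPixel `(1,)` and PhotometricInterpretation 0 or 1.

```python
import struct

BITS_PER_SAMPLE = 258
PHOTOMETRIC_INTERPRETATION = 262
SAMPLES_PER_PIXEL = 277


def tiff_short_tags(data: bytes) -> dict:
    """Decode the SHORT-typed tags of the first IFD of a classic TIFF file."""
    byte_order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if byte_order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(byte_order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF uses 43)")
    (count,) = struct.unpack(byte_order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, n = struct.unpack(byte_order + "HHI", data[start:start + 8])
        if field_type != 3:  # 3 = SHORT (16-bit unsigned); other types are skipped
            continue
        if n * 2 <= 4:
            raw = data[start + 8:start + 8 + 2 * n]  # small values are stored inline
        else:
            (offset,) = struct.unpack(byte_order + "I", data[start + 8:start + 12])
            raw = data[offset:offset + 2 * n]  # larger arrays live elsewhere in the file
        tags[tag] = struct.unpack(byte_order + "H" * n, raw)
    return tags


def describe_tiff(path: str) -> dict:
    """Summarise the fields that determine how pixel values should be interpreted."""
    with open(path, "rb") as f:
        tags = tiff_short_tags(f.read())
    return {
        "BitsPerSample": tags.get(BITS_PER_SAMPLE),
        "SamplesPerPixel": tags.get(SAMPLES_PER_PIXEL, (1,)),  # TIFF default is 1
        "PhotometricInterpretation": tags.get(PHOTOMETRIC_INTERPRETATION),
    }
```

Calling `describe_tiff()` on the file downloaded from Commons (under whatever local name it was saved as) reports what the file itself declares, which can then be compared with the mode Pillow assigns.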

Python Pillow reports the image bands as ('I',) (32-bit signed integer) for the TIFF downloaded from Commons, which isn't correct: it should be 16-bit, since its SHA hash matches that of the Finna image.
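One possible explanation for the near-zero dHash (0000000000000040), offered as a hypothesis rather than a confirmed cause: imagehash converts its input to 8-bit grayscale (`"L"`) before hashing, and Pillow's conversion from mode `'I'` to `'L'` clamps values to 0..255 instead of rescaling them. A 16-bit image decoded as `'I'` would then have almost every pixel saturated to white, leaving a flat thumbnail with nothing for the hash to measure. The toy sketch below uses made-up pixel values and hypothetical helper names to contrast clamping with rescaling; for simplicity the conversion is applied to an already-downscaled thumbnail.

```python
def clamp_to_8bit(v: int) -> int:
    """Clamp to 0..255, like a naive integer-to-8-bit conversion."""
    return max(0, min(255, v))


def rescale_16_to_8bit(v: int) -> int:
    """Map the full 16-bit range 0..65535 onto 0..255 by dropping the low byte."""
    return v >> 8


def dhash_hex(rows: list[list[int]]) -> str:
    """Row-major difference hash of a grid that is already (N+1) wide and N tall."""
    bits = "".join("1" if right > left else "0"
                   for row in rows for left, right in zip(row, row[1:]))
    return format(int(bits, 2), "0{}x".format(len(bits) // 4))


# Hypothetical 9x8 thumbnail of a 16-bit grayscale scan: a left-to-right gradient.
thumb16 = [[1000 + 7000 * x for x in range(9)] for _ in range(8)]

clamped = [[clamp_to_8bit(v) for v in row] for row in thumb16]
rescaled = [[rescale_16_to_8bit(v) for v in row] for row in thumb16]

print(dhash_hex(clamped))   # 0000000000000000: every pixel saturates to 255
print(dhash_hex(rescaled))  # ffffffffffffffff: the gradient survives
```

If this is the cause, rescaling the 16-bit data to 0..255 before passing the image to imagehash should bring the TIFF's hashes back in line with the JPEG's; that still needs to be tested on the actual files.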