Project Name: imagehash
Developer account usernames of requestors: tuukka zache-tool
Purpose: Calculate perceptual hash values (phash and dhash) for Commons images, for duplicate detection and similar uses.
Brief description:
- Test Postgres and its indexed Hamming-distance queries for quickly finding candidate near-duplicate images.
- Expose the data over an API (hopefully SPARQL).
- Test how much faster a VPS can process images (processing is currently too slow on Toolforge).
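For reviewers unfamiliar with the approach, here is a minimal sketch of the comparison the indexed queries are meant to speed up. It is not the code from the Toolforge repository. The function names are hypothetical, the hash is computed on a tiny hand-written grayscale grid rather than a real downscaled image, and the linear scan stands in for the indexed Postgres lookup.

```python
from typing import Iterable, List, Sequence, Tuple


def dhash_bits(gray: Sequence[Sequence[int]]) -> int:
    """Difference hash of a small grayscale grid (illustrative only).

    Each bit records whether a pixel is brighter than its right-hand
    neighbour, so R rows of C columns yield R * (C - 1) bits.
    """
    value = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            value = (value << 1) | (1 if left > right else 0)
    return value


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def near_duplicates(
    target: int,
    hashes: Iterable[Tuple[str, int]],
    max_distance: int,
) -> List[Tuple[str, int]]:
    """Linear scan: (name, distance) pairs within max_distance, closest first."""
    hits = [(name, hamming(target, h)) for name, h in hashes]
    return sorted(
        (hit for hit in hits if hit[1] <= max_distance),
        key=lambda hit: hit[1],
    )
```

The linear scan costs one comparison per stored hash for every lookup, which is why an index that can answer "all hashes within distance d" directly is worth testing at Commons scale.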
Current implementation on Toolforge: https://github.com/Wikimedia-Suomi/ImageHash-Toolforge
Software we plan to install:
- the Ontop SPARQL engine: https://ontop-vkg.org/guide/
- the Postgres extension for fast Hamming-distance searches: https://github.com/fake-name/pg-spgist_hamming/
How soon you are hoping this can be fulfilled: as soon as possible (during the hackathon)