Request creation of imagehash VPS project
Closed, Resolved, Public

Description

Project Name: imagehash

Developer account usernames of requestors: tuukka zache-tool

Purpose: Calculating perceptual hash values (pHash and dHash) for Commons images, for duplicate detection and similar uses.

Brief description:

  1. Test Postgres and its indexed Hamming-distance queries for quickly finding candidate near-duplicate images.
  1. Expose the data over an API (hopefully SPARQL).
  1. Test how much faster a VPS can process images (processing is currently too slow on Toolforge).
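
For readers unfamiliar with the terms, here is a minimal sketch of a difference hash and the Hamming distance used to compare two hashes. It is not the code from the Toolforge implementation: the function names are hypothetical, the input is assumed to be a grayscale image already downscaled to 8 rows of 9 pixels, and the bit convention (left pixel brighter than right pixel gives a 1) is only one of several in use.

```python
def dhash(pixels):
    """Return a difference hash as an integer.

    `pixels` is assumed to be a list of rows of grayscale values, each row
    one pixel wider than the number of bits it contributes (8 rows of 9
    values give a 64-bit hash).
    """
    bits = 0
    for row in pixels:
        # Compare each pixel with its right-hand neighbour.
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")
```

Two images are treated as candidate near-duplicates when the Hamming distance between their hashes is below a chosen threshold; the threshold itself is a tuning decision not covered here.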

Current implementation on Toolforge: https://github.com/Wikimedia-Suomi/ImageHash-Toolforge

Software we plan to install:

  1. the Ontop SPARQL engine: https://ontop-vkg.org/guide/
  1. the Postgres extension for fast Hamming-distance searches: https://github.com/fake-name/pg-spgist_hamming/
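
One practical detail when storing 64-bit hashes in Postgres: the `bigint` type is signed, so an unsigned hash with the top bit set does not fit as-is. The sketch below assumes the hashes end up in a `bigint` column (the column type actually expected by the extension should be checked against its README) and uses hypothetical helper names. It reinterprets the same 64 bits as a signed value, which leaves Hamming distances unchanged.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed64(h):
    """Reinterpret an unsigned 64-bit hash as a signed 64-bit integer."""
    if not 0 <= h <= MASK64:
        raise ValueError("hash does not fit in 64 bits")
    return h - 2**64 if h >= 2**63 else h


def to_unsigned64(v):
    """Inverse of to_signed64: recover the unsigned 64-bit hash."""
    return v & MASK64
```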

How soon you are hoping this can be fulfilled: as soon as possible (during the hackathon)

Event Timeline

@taavi @bd808 Do you know whom I should ask, and how, to get 'https://imagehash-sparql.wmcloud.org/sparql' added to the query.wikidata.org query service whitelist? (Preferably during the hackathon.)