
Investigate International Standard Content Code (ISCC)
Open, Needs Triage, Public

Description

International Standard Content Code (ISCC) is a proposed standard for a content identifier for text, images, audio and video. An ISCC code contains three independent 64-bit similarity hash blocks (Metadata, Content, Binary) and a file checksum. All four can be compared independently. A full code looks like ISCC:KECVKU5SRORBR3PNWQSMT4ODTYWMOAL6CK3WINVIYE62IKUXVH5ATJI . The ISO 24138:2024 proposal was published in 2024-05 and the next step is review. The reference code and SDK are licensed under Apache 2.0 and written in Python.
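
The description does not say how two blocks are compared. Similarity hashes of this kind are commonly compared by Hamming distance, so the following is a minimal sketch under that assumption. It also assumes each block has already been extracted as a 64-bit integer (decoding the ISCC string is not shown), and the function names are illustrative, not part of the ISCC SDK.

```python
def hamming_distance(a: int, b: int, bits: int = 64) -> int:
    """Count the bit positions in which two hash blocks differ."""
    mask = (1 << bits) - 1
    return bin((a ^ b) & mask).count("1")


def similarity(a: int, b: int, bits: int = 64) -> float:
    """Map the Hamming distance to a score between 0.0 and 1.0 (1.0 = identical)."""
    return 1.0 - hamming_distance(a, b, bits) / bits
```

For example, `hamming_distance(0b1010, 0b0110)` is 2, because the two values differ in exactly two bit positions. Which distance counts as "similar enough" is a threshold that would have to be chosen per use case.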

Home pages

Wikidata property

Image hashing part
ISCC image hashing is similar to phash in Python's imagehash library, but adds preprocessing steps to ensure that the data is in a uniform format. It doesn't use NumPy in the core, but still uses Pillow for preprocessing (scaling, cropping, grayscaling etc.). It uses BICUBIC for scaling, which means that it is ~5% less accurate than when LANCZOS is used.

Steps for calculating ISCC imagehash

  1. Transpose image according to EXIF Orientation
  2. Add white background to image if it has alpha transparency
  3. Crop empty borders of image
  4. Convert image to grayscale
  5. Resize image to 32x32
  6. Flatten 32x32 matrix to an array of 1024 grayscale (uint8) pixel values
  7. Compute the 32x32 DCT
  8. Keep the top-left 8x8 of DCT (lowest frequencies)
  9. Compute the median DCT value
  10. Set the 64 hash bits to 0 or 1 depending on whether each of the 64 DCT values is above or below the median value
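
Steps 6 to 10 can be sketched in plain Python without NumPy. This is an illustration, not the ISCC reference code. It assumes steps 1 to 5 have already produced a flat list of 1024 grayscale values, that an unnormalized DCT-II is acceptable (a uniform scale factor does not change which values lie above the median), that a bit is 1 when its value is strictly above the median, and that bits are emitted in row-major order. The function names are hypothetical.

```python
import math
from statistics import median

SIZE = 32  # side length of the resized grayscale image (step 5)
KEEP = 8   # side length of the low-frequency block that is kept (step 8)


def dct_low_frequencies(pixels):
    """Return the top-left KEEP x KEEP block of the 2D DCT-II of the image.

    `pixels` is the flat, row-major list of SIZE * SIZE grayscale values
    (step 6). Only the coefficients that are needed are computed.
    """
    if len(pixels) != SIZE * SIZE:
        raise ValueError("expected %d pixel values" % (SIZE * SIZE))
    # cos_table[k][n] = cos(pi * (2n + 1) * k / (2 * SIZE))
    cos_table = [
        [math.cos(math.pi * (2 * n + 1) * k / (2 * SIZE)) for n in range(SIZE)]
        for k in range(KEEP)
    ]
    rows = [pixels[r * SIZE:(r + 1) * SIZE] for r in range(SIZE)]
    # 1D DCT along every row, keeping only the first KEEP frequencies.
    row_dct = [
        [sum(row[n] * cos_table[k][n] for n in range(SIZE)) for k in range(KEEP)]
        for row in rows
    ]
    # 1D DCT along every column of the intermediate result (steps 7 and 8).
    return [
        [sum(row_dct[n][v] * cos_table[u][n] for n in range(SIZE)) for v in range(KEEP)]
        for u in range(KEEP)
    ]


def image_hash_bits(pixels):
    """Return the 64-bit hash as an integer (steps 9 and 10)."""
    values = [c for row in dct_low_frequencies(pixels) for c in row]
    med = median(values)
    result = 0
    for value in values:
        # Assumption: strictly above the median gives 1, otherwise 0.
        result = (result << 1) | (1 if value > med else 0)
    return result
```

Because the hash only records whether each coefficient is above the median, multiplying every pixel value by the same factor leaves the bits unchanged, which is one reason this kind of hash tolerates brightness changes.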

More info

Event Timeline

Wow, thank you Zache for finding this. I think this is definitely the way to go.

Confirmed that if preprocessing is toggled off and scaling is changed to LANCZOS, the resulting image hashes are identical to those of the Python imagehash library's phash.

Zache updated the task description.

The CommonsDB pilot by Wikimedia Sverige and partners has started.

Open Future is leading the newly launched CommonsDB initiative, funded by the European Commission, to create a prototype registry of public domain and openly licensed works. We are collaborating with Liccium, the Europeana Foundation, Wikimedia Sverige and the Institute for Information Law to bring this vision to life. The CommonsDB registry will enable users to verify the rights status of content from multiple sources. It will be built using existing technologies and standards, consolidating ISCC codes, rights metadata, and verifiable credentials to make registry information available through public APIs.

https://openfuture.eu/blog/open-future-launches-commonsdb/

Thanks @Zache for linking to the project. Our meta page is perhaps more detailed with regard to what Wikimedia Sverige is working on. See https://meta.wikimedia.org/wiki/CommonsDB