International Standard Content Code (ISCC) is a proposed standard for content identifiers covering text, images, audio and video. An ISCC code combines three independent 64-bit similarity hash blocks (Metadata, Content, Binary) with a file checksum. All four parts can be compared independently. A complete code looks like `ISCC:KECVKU5SRORBR3PNWQSMT4ODTYWMOAL6CK3WINVIYE62IKUXVH5ATJI`. The [[ https://www.iso.org/standard/77899.html | ISO 24138:2024 ]] proposal was published in 2024-05 and the next step is review. The reference code and SDK are written in Python and released under the Apache 2.0 licence.
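Because each similarity block is a fixed-width bit string, two blocks of the same type can be compared by counting the bits in which they differ (Hamming distance). The following is a minimal sketch of that idea only. The 64-bit values are made up for illustration and are not decoded from a real ISCC code, and the function names `hamming_distance` and `similarity` are hypothetical, not part of the ISCC reference code.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hash blocks differ."""
    return bin(a ^ b).count("1")


def similarity(a: int, b: int, bits: int = 64) -> float:
    """Share of matching bits, from 0.0 (all differ) to 1.0 (identical)."""
    return 1 - hamming_distance(a, b) / bits


# Hypothetical 64-bit blocks (not taken from a real ISCC code).
unit_a = 0x5AD39B4C21F0E6A7
unit_b = unit_a ^ 0b1011  # same block with three bits flipped

print(hamming_distance(unit_a, unit_b))  # 3
print(similarity(unit_a, unit_b))
```

A small distance suggests similar content for that block type. What threshold counts as "similar" is an application decision and is not stated here.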
**Home pages**
* https://demo.iscc.io/
* https://github.com/iscc
* https://iscc.codes
**ISCC Image hashing part**
ISCC image hashing is similar to phash in Python's [[ https://pypi.org/project/ImageHash/ | imagehash ]] library, but adds preprocessing steps to ensure that the data is in a uniform format. The core does not use Numpy, but Pillow is still used for preprocessing (scaling, cropping, grayscale conversion etc.). Scaling uses BICUBIC, which means that it is ~5% less accurate than when LANCZOS is used.
Steps for calculating the ISCC image hash:
# Transpose image according to EXIF Orientation
# Add white background to image if it has alpha transparency
# Crop empty borders of image
# Convert image to grayscale
# Resize image to 32x32
# Flatten 32x32 matrix to an array of 1024 grayscale (uint8) pixel values
# Compute the 32x32 DCT
# Keep the top-left 8x8 of DCT (lowest frequencies)
# Compute the median DCT value
# Set the 64 hash bits to 0 or 1 depending on whether each of the 64 DCT values is above or below the median value
* https://github.com/iscc/iscc-core/blob/main/iscc_core/code_content_image.py
* https://github.com/iscc/iscc-sdk/blob/main/iscc_sdk/image.py
* https://github.com/iscc/iscc-sdk/blob/4a18a88ff49aa86817d01e527dad1a24215046bb/iscc_sdk/main.py#L164
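The last steps of the list above (from the flattened 1024-pixel array to the 64 hash bits) can be sketched in pure Python without Numpy. This is an illustrative sketch, not the iscc-core implementation: it uses a naive, unnormalised DCT-II, assumes a bit is set to 1 when the value is strictly above the median, and the names `dct_1d`, `image_hash_bits` and `bits_to_int` are hypothetical. The Pillow preprocessing (transpose, background fill, crop, grayscale, resize) is assumed to have happened already, so the output is not expected to be bit-identical to a real ISCC image hash.

```python
import math
from statistics import median


def dct_1d(values):
    """Naive, unnormalised DCT-II of a sequence (O(n^2))."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def image_hash_bits(pixels):
    """Return 64 hash bits for a flat, row-major 32x32 grayscale image."""
    if len(pixels) != 32 * 32:
        raise ValueError("expected 1024 grayscale pixel values")
    rows = [pixels[r * 32:(r + 1) * 32] for r in range(32)]
    # 2D DCT: transform every row, then every column of the result.
    row_dct = [dct_1d(row) for row in rows]
    col_dct = [dct_1d(col) for col in zip(*row_dct)]  # col_dct[c][r]
    # Keep the top-left 8x8 block (lowest frequencies), row by row.
    low = [col_dct[c][r] for r in range(8) for c in range(8)]
    med = median(low)
    # Assumption: 1 if strictly above the median, otherwise 0.
    return [1 if value > med else 0 for value in low]


def bits_to_int(bits):
    """Pack a list of 0/1 bits into one integer, first bit most significant."""
    result = 0
    for bit in bits:
        result = (result << 1) | bit
    return result


# Synthetic test pattern standing in for a preprocessed image.
pixels = [(x * 7 + y * 3) % 256 for y in range(32) for x in range(32)]
bits = image_hash_bits(pixels)
print(f"{bits_to_int(bits):016x}")
```

Comparing against the median rather than the mean keeps the hash roughly balanced: at most half of the 64 bits can be 1, whatever the image brightness.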
**More info**
* [[ https://posth.me/why-can-it-be-hard-to-explain-the-concept-of-content-derived-identifiers-and-the-iscc/ | Posth Werk, 2024/02/25, Why can it be hard to explain the concept of content derived identifiers and the ISCC ]]
* [[ https://blog.tib.eu/2024/07/05/the-international-standard-content-code-iscc-why-libraries-archives-and-museums-should-use-it/ | blog.tib.eu, 5.7.2024, The International Standard Content Code (ISCC) – why libraries, archives and museums should use it ]]
* [[ https://www.wipo.int/meetings/en/details.jsp?meeting_id=68848 | WIPO presentation (April 27, 2022) ]]
** [[ https://www.wipo.int/edocs/mdocs/mdocs/en/wipo_webinar_cr_2022_9/wipo_webinar_cr_2022_9_p1.pdf | slides only ]]