iscc-sci is a proof-of-concept implementation of a semantic Image-Code for the
ISCC (International Standard Content Code). Semantic Image-Codes are
designed to capture and represent the semantic content of images for improved similarity detection.
Caution
This is a proof of concept. All releases with version numbers below v1.0.0 may break backward
compatibility and produce incompatible Semantic Image-Codes. The algorithms of this iscc-sci
repository are experimental and not part of the official
ISO 24138:2024 standard.
The ISCC framework already includes an Image-Code that is based on perceptual hashing and can match near-duplicates. The ISCC Semantic Image-Code is planned as an additional ISCC-UNIT focused on capturing broader, more abstract semantic similarity. As such, the Semantic Image-Code is engineered to be robust against a wider range of variations than the perceptual Image-Code can match.
- Semantic Similarity: Leverages deep learning models to generate codes that reflect the semantic content of images.
- Bit-Length Flexibility: Supports generating codes of various bit lengths (up to 256 bits), allowing for adjustable granularity in similarity detection.
- ISCC Compatible: Generates codes that are fully compatible with the ISCC specification, facilitating integration with existing ISCC-based systems.
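Similarity-preserving codes such as ISCC-UNITs are typically compared by Hamming distance, the number of differing bits. The sketch below illustrates why bit length controls granularity: with a 64-bit code each differing bit lowers the normalized similarity by 1/64, with a 256-bit code by only 1/256. It is a minimal, hypothetical sketch that works on raw digest bytes rather than ISCC strings; the helper names hamming_distance and similarity are illustrative and not part of the iscc-sci API.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length code digests (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    # XOR each byte pair and count the set bits of the result
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def similarity(a: bytes, b: bytes) -> float:
    """Normalized similarity in [0, 1], where 1.0 means all bits are identical."""
    total_bits = len(a) * 8
    return 1.0 - hamming_distance(a, b) / total_bits


# Two made-up 64-bit digests that differ only in the last 3 bits
code_a = bytes.fromhex("f0f0f0f0f0f0f0f0")
code_b = bytes.fromhex("f0f0f0f0f0f0f0f7")
print(hamming_distance(code_a, code_b))  # 3
print(similarity(code_a, code_b))  # 0.953125
```

A distance threshold on these values decides whether two images count as similar; longer codes allow that threshold to be set more finely.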
Ensure you have Python 3.11 or newer installed on your system. The package requires an ONNX runtime that is selected via install extras. For CPU inference (works everywhere):
pip install "iscc-sci[cpu]"
For NVIDIA CUDA-accelerated inference (requires CUDA 12.x and cuDNN 9.x):
pip install "iscc-sci[gpu]"
Note
Install exactly one of the cpu/gpu extras. The underlying onnxruntime and
onnxruntime-gpu packages unpack into the same directory and overwrite each other, so installing
both silently disables GPU support. A plain pip install iscc-sci installs no ONNX runtime; importing
the package then fails with an error message explaining how to install one.
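To check which runtime actually ended up installed, a small script can query ONNX Runtime's execution providers. This is a minimal sketch, not part of iscc-sci: the helper names onnx_providers and cuda_available are hypothetical, and it relies only on onnxruntime.get_available_providers(). Note that a listed CUDAExecutionProvider only means the provider is built into the installed runtime; it does not guarantee that the required CUDA and cuDNN libraries load when a session is created.

```python
import importlib.util


def onnx_providers() -> list[str]:
    """Return the installed ONNX Runtime's execution providers, or [] if none is installed."""
    # Both onnxruntime and onnxruntime-gpu provide the same "onnxruntime" module
    if importlib.util.find_spec("onnxruntime") is None:
        return []
    import onnxruntime

    return onnxruntime.get_available_providers()


def cuda_available() -> bool:
    """True if the installed runtime was built with the CUDA execution provider."""
    return "CUDAExecutionProvider" in onnx_providers()


providers = onnx_providers()
if not providers:
    print('No ONNX runtime found: install "iscc-sci[cpu]" or "iscc-sci[gpu]"')
else:
    print("Available providers:", providers)
    print("CUDA provider present:", cuda_available())
```

If the gpu extra was installed but the CUDA provider is missing, a CPU runtime from the other extra has most likely overwritten it, as described in the note above.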
To generate a Semantic Image-Code for an image, use the code_image_semantic function. You can
specify the bit length of the code to control the level of granularity in the semantic
representation.
import iscc_sci as sci
# Generate a 64-bit ISCC Semantic Image-Code for an image file
image_file_path = "path/to/your/image.jpg"
semantic_code = sci.code_image_semantic(image_file_path, bits=64)
print(semantic_code)
iscc-sci uses a pre-trained deep learning model based on the 1st Place Solution of the Image
Similarity Challenge (ISC21) to create semantic embeddings of images. The model generates a feature
vector that captures the essential characteristics of the image. This vector is then binarized to
produce a Semantic Image-Code that is robust to variations in image presentation but sensitive to
content differences.
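One common way to binarize a feature vector is sign thresholding, where each dimension contributes one bit. The sketch below assumes exactly that, plus truncation to the first bits dimensions; both are illustrative assumptions, and iscc-sci's actual binarization may differ. The function name binarize and the toy 8-dimensional vectors are hypothetical. The sketch shows the intuition behind robustness: a small perturbation that keeps every value on the same side of zero yields the identical code.

```python
def binarize(embedding: list[float], bits: int = 64) -> bytes:
    """Sign-binarize the first `bits` dimensions of an embedding into bytes (hypothetical).

    A dimension >= 0 becomes a 1 bit, a negative dimension becomes a 0 bit.
    """
    if bits % 8 != 0 or bits > len(embedding):
        raise ValueError("bits must be a multiple of 8 and not exceed the embedding size")
    value = 0
    for x in embedding[:bits]:
        value = (value << 1) | (1 if x >= 0 else 0)
    return value.to_bytes(bits // 8, "big")


# Toy 8-dimensional "embeddings" (real model embeddings are much larger)
original = [0.8, -0.3, 0.1, -0.9, 0.5, -0.2, 0.7, -0.6]
variant = [x * 0.9 + 0.02 for x in original]  # small change, every sign preserved
different = [-x for x in original]  # every sign flipped

print(binarize(original, 8).hex())  # aa
print(binarize(variant, 8).hex())  # aa
print(binarize(different, 8).hex())  # 55
```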
This project is a proof of concept and welcomes contributions that improve its capabilities, efficiency, and
compatibility with the broader ISCC ecosystem. For development, install the project with
uv. By default, uv sync installs the test dependency group, which provides a CPU
ONNX runtime:
git clone https://github.com/iscc/iscc-sci.git
cd iscc-sci
uv sync
Contributions are welcome! If you have suggestions for improvements or bug fixes, please open an issue or pull request. For major changes, please open an issue first to discuss what you would like to change.