
Semantic image search for Wikimedia Commons
Open, Needs Triage, Public

Description

Searching for images on Wikimedia Commons mainly depends on titles, categories and descriptions. This makes it hard to find images when the metadata is missing, incomplete or doesn’t match the exact words used in the search.

For example, queries like:

  • “people cooking street food in Indian night markets”
  • “foggy mountains in the Western Ghats at sunrise”
  • “photos of 19th-century Indian temples during monsoon”

do not return good results today.

Proposed Idea

Introduce a new way to search images using natural language.

Instead of relying only on keywords, users can type full descriptions and the system will try to understand the meaning and show visually relevant images.

Benefits

  • Makes image search more intuitive and user-friendly
  • Helps discover images even when they are not well tagged
  • Supports more descriptive and creative searches
  • Improves overall accessibility of Wikimedia Commons

Possible Approach

  • Start as an experimental feature/tool
  • Test on a smaller set of images first
  • Compare results with current search
  • Gather feedback from users and contributors

Open Questions

  • How should this be introduced to users (separate search option or part of existing search)?
  • What is the best way to evaluate if results are actually better?
  • How can this work well across different languages?

Next Steps

  • Build a small prototype
  • Test with sample queries
  • Share results with the community
  • Iterate based on feedback

Some refs:

Details

Other Assignee
Gopavasanth

Event Timeline

Nice proposal! The infrastructure from T412338 (text embeddings on Lift Wing) could potentially be reused here for image embeddings with CLIP/OpenCLIP.

Combining CLIP visual similarity with existing SDC signals (depicts P180, captions) as a hybrid approach might be a good starting point.
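A minimal sketch of what such a hybrid score could look like, assuming each image already has an embedding vector and a list of depicts (P180) Q-ids. The function names, the `alpha` weight and the overlap measure are illustrative assumptions, not an existing API or an agreed design:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(query_vec, image_vec, query_qids, depicts_qids, alpha=0.7):
    """Blend visual similarity with SDC depicts overlap.

    alpha is a hypothetical weight: 1.0 means embeddings only,
    0.0 means depicts statements only.
    """
    visual = cosine(query_vec, image_vec)
    # Fraction of the query's Q-ids found in the image's depicts statements.
    if query_qids:
        sdc = len(set(query_qids) & set(depicts_qids)) / len(set(query_qids))
    else:
        sdc = 0.0
    return alpha * visual + (1 - alpha) * sdc


def rank(query_vec, query_qids, images, alpha=0.7):
    """Sort image records (dicts with 'vec' and 'depicts') by hybrid score."""
    return sorted(
        images,
        key=lambda img: hybrid_score(
            query_vec, img["vec"], query_qids, img["depicts"], alpha
        ),
        reverse=True,
    )
```

With `alpha=1.0` this reduces to pure embedding ranking, which gives a simple baseline for comparing the two signals.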

Multilingual models like SigLIP 2 could also help with the cross-language question.

I am one of the developers of the WISE software that is referenced here. I will be attending the Wikimedia Hackathon and I'm very interested in this. I have opened T424068 to request a Cloud VPS instance, with the goal of setting this up during the hackathon.

We are starting the initial discussions on our project. If you would like to join us, please find this table on the second floor, in the main hacking room, and talk to us.

telegram-cloud-photo-size-5-6316468721760800795-w.jpg (1×2 px, 527 KB)

@Gopavasanth

WikiSemanticImgSearch — Semantic Image Search for Wikimedia Commons

Live: https://wiki-semantic-img-search.vercel.app/
GitHub: https://github.com/dubeysanskar/WikiSemanticImgSearch

I built a Next.js tool that lets users search Commons images using natural language queries. It uses the Wikidata Vector DB API (AI embeddings via CLIP/OpenCLIP/SigLIP) to find semantically related Q-items, then fetches the Commons images whose SDC depicts (P180) statements point to those items, using haswbstatement:P180 search. Results are shown alongside standard keyword search for comparison.
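The second half of that pipeline can be sketched as follows, assuming the vector step has already returned a Q-id. The helper name `depicts_search_url` is hypothetical and this is not the tool's actual code; the query parameters are the standard MediaWiki Action API search parameters, and namespace 6 is the File namespace:

```python
import re
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def depicts_search_url(qid, limit=20):
    """Build a Commons search URL for files that depict (P180) the given item."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:P180={qid}",
        "srnamespace": 6,  # File namespace
        "srlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


# Example: one request URL per Q-id returned by the semantic step.
urls = [depicts_search_url(q) for q in ["Q144", "Q146"]]
```

Building the URL separately from the HTTP call keeps the query logic easy to test. The caller would still need to send a descriptive User-Agent header with each request.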

Features:

  • Natural language search powered by Wikidata Vector DB embeddings
  • Dual-path pipeline: Semantic (Vector DB → SDC P180) + Keyword (Commons API)
  • Tabbed results: Semantic | Keyword | Combined, for side-by-side comparison
  • Category targeting: Wiki Loves Monuments, Folklore and Birds presets, plus custom categories
  • Resolution filtering: preset (HD/FHD/QHD/4K) or custom width/height with pixel tolerance
  • Image selection + Excel export with full metadata (title, URL, author, license, resolution, description)
  • Image detail modal with Commons page link and full-resolution download
  • Proper User-Agent compliance via .env config

APIs: Wikidata Vector DB (/item/query/), MediaWiki Commons API, Wikidata API, Commons SDC search
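
As an illustration of the resolution filter with pixel tolerance, here is a minimal sketch. It assumes the filter keeps images whose width and height are each within the tolerance of the target size. That matching rule and the function names are assumptions, not taken from the tool's code; the preset sizes are the usual HD/FHD/QHD/4K dimensions:

```python
# Common pixel dimensions for the named presets.
PRESETS = {
    "HD": (1280, 720),
    "FHD": (1920, 1080),
    "QHD": (2560, 1440),
    "4K": (3840, 2160),
}


def matches_resolution(width, height, target, tolerance=0):
    """True if both dimensions are within `tolerance` pixels of the target."""
    target_w, target_h = target
    return abs(width - target_w) <= tolerance and abs(height - target_h) <= tolerance


def filter_by_resolution(images, target, tolerance=0):
    """Keep image records (dicts with 'width' and 'height') that match."""
    return [
        img
        for img in images
        if matches_resolution(img["width"], img["height"], target, tolerance)
    ]
```

A custom size is passed the same way as a preset, for example `filter_by_resolution(images, (1600, 900), tolerance=16)`.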

I have taken it down to rebuild it on a new instance with more memory and to add face search.