
Develop an Image Similarity Tool
Closed, Resolved, Public

Description

IMPORTANT: Make sure to read the Outreachy participant instructions and communication guidelines thoroughly before commenting on this task. This space is for project-specific questions, so avoid asking questions about getting started, setting up Gerrit, etc. When in doubt, ask your question on Zulip first!

Brief summary

The project will consist of designing and implementing a tool that, given an image, can retrieve the closest image from a large repository of freely-available visual content, the Wikimedia Commons [0].
Wikimedia Commons provides a common place for free visual knowledge. The vast majority of images on Wikipedia are stored and indexed in the Commons repository, which hosts around 75M multimedia files. While it is possible to search the Commons using text, the repository does not provide a “search by image” function similar to Google’s Reverse Image Search feature [1]. A tool that automatically retrieves sets of images similar to a query image would be of great help to anyone looking for open alternatives to proprietary visual content, and it would help overcome the limitations of text-based search. Traditional text-based image search algorithms depend on the quality of the images' textual metadata. Despite the tireless work of Commons editors and contributors, the sheer number of images in the repository means that textual and structured metadata are still scarce for Commons images. A way to automatically retrieve groups of similar images would also help contributors annotate images manually in bulk.

In this project, we will leverage the efficiency of distributed computing to generate, for each image in Commons, a compact image signature (or “embedding”). We will then use the resulting embeddings to pre-compute an index which maps the distances between all images on Commons. Finally, we will build a system that, given an image, calculates the corresponding embedding, and efficiently looks for the nearest neighbors in our corpus. The output of this project is an open API built on top of this system.
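The query step described above can be sketched as a brute-force nearest-neighbor search. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes the embeddings have already been produced by some image model (the model, the distributed pipeline, and the toy three-dimensional vectors and file titles below are hypothetical placeholders). The hypothetical `build_index` pre-computes unit-length embeddings keyed by file title, and the hypothetical `nearest_neighbors` ranks every indexed image by cosine similarity to the query embedding.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def build_index(embeddings: Dict[str, Vector]) -> Dict[str, Vector]:
    """Pre-compute step (hypothetical): store one unit-length embedding per file title."""
    return {title: normalize(vec) for title, vec in embeddings.items()}


def nearest_neighbors(
    index: Dict[str, Vector], query: Vector, k: int = 3
) -> List[Tuple[str, float]]:
    """Query step (hypothetical): return the k titles most cosine-similar to the query.

    This is a linear scan over every indexed embedding, i.e. O(N) per query.
    """
    q = normalize(query)
    scores = []
    for title, vec in index.items():
        if len(vec) != len(q):
            raise ValueError(f"dimension mismatch for {title}")
        scores.append((title, sum(a * b for a, b in zip(q, vec))))
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for real model outputs (hypothetical data).
    toy = {
        "File:Cat.jpg": [0.9, 0.1, 0.0],
        "File:Dog.jpg": [0.7, 0.3, 0.1],
        "File:Car.jpg": [0.0, 0.2, 0.9],
    }
    index = build_index(toy)
    for title, score in nearest_neighbors(index, [1.0, 0.0, 0.0], k=2):
        print(title, round(score, 3))
```

On the toy data, a query close to the "cat" vector ranks `File:Cat.jpg` first and `File:Dog.jpg` second. A linear scan is the simplest correct baseline; at the scale of around 75M Commons files, a real system would more plausibly rely on an approximate nearest-neighbor index, and nothing in this sketch reflects which approach the finished tool actually uses.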

[0] https://commons.wikimedia.org/wiki/Main_Page
[1] https://support.google.com/websearch/answer/1325808?hl=en&co=GENIE.Platform%3DDesktop

Skills required

Programming skills, specific technologies and Phabricator project tags

Possible mentor(s)

@Isaac
@fkaelin
@Miriam

Microtasks

See T291453 (Develop an Image Similarity API)

Event Timeline

Miriam changed the visibility from "Public (No Login Required)" to "acl*outreachy-mentors (Project)". Sep 15 2021, 10:34 AM
Miriam renamed this task from Develop an Image Similarity API to Develop an Image Similarity Tool. Sep 22 2021, 2:38 PM
Miriam updated the task description.
Miriam updated the task description.
Isaac changed the visibility from "acl*outreachy-mentors (Project)" to "Public (No Login Required)". Oct 18 2021, 7:14 PM

If I recall correctly, this project didn't pass the selection phase and we didn't select anyone, right? If there is any interest in promoting the project via the ongoing Round 25 or a future one, feel free to add the relevant tag to bring it to the organizers' attention.

Isaac claimed this task.

@srishakatux thanks for the ping -- this project was completed, though there might be future developments that could make good Outreachy projects.

Tool that was created: https://imagesimilarity.toolforge.org/