Run a computer vision challenge
Closed, Resolved
Public

Assigned To

	Miriam

Authored By

	leila
	Aug 18 2020, 12:08 AM

Description

Questions
in-progress

Budget
Approved.

Host
Kaggle

Timelines
For a contest to go live in June-August 2021, below are the timelines:

Challenge data validation, prep and documentation: 2-4 weeks
Prepare challenge content for launch: ~3 weeks
Run competition: ~2-3 months
Collect solutions, verify winners, administer prizes, and release results: ~4-6 weeks

TODOs

Email the DrivenData contact and ask for the following information: vendor contact email, vendor name, contract start date (tbd by Miriam), contract end date (tbd by Miriam), a phone number for the vendor, mailing address, name of the person on file, and an answer to the question "Are you a WMF employee, a former WMF employee, or a family member or business partner of a WMF employee?".
Send all the info to Leila so she can start the process in Coupa.
Once completed, notify Janna to follow up on next steps with the contract.
Give Legal a heads-up that the contract is in for their review so they're aware.
Once the contract is approved, contact Legal and share the dataset we will be using. Ask Google for the filters they have used to remove data from the dataset and share those with Legal (if they can share details).
Check with Fabian about engineering or infra requirements we can define for the submissions to bring them closer to our environment.
Launch the competition on Kaggle
Monitor competition progress and participation
Close the competition and award winners

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Miriam	T260634 Run a computer vision challenge
		Resolved		Miriam	T278217 Release image data for training

Event Timeline

leila created this task. Aug 18 2020, 12:08 AM

Restricted Application added a subscriber: Aklapper. Aug 18 2020, 12:08 AM

leila added a subscriber: Miriam. Aug 18 2020, 12:08 AM

What resources are needed to carry this out? GPU compute time?

@Harej, apologies for the delay, I was on holidays!

Resources will vary depending on the task. Given the heavy multimedia focus for this task, it is likely that participants will need GPU resources and substantial storage space to host the image datasets used for training. Participants might also need access to the Wikidata JSON dumps and the Wikipedia (and Commons?) XML dumps. I will get back to you with more details once we have figured out the specifics of the task.
Hope this helps!

leila assigned this task to Miriam. Jan 12 2021, 6:58 PM

leila moved this task from FY2020-21-Research-October-December to FY2020-21-Research-January-March on the Research board.

leila edited projects, added Research (FY2020-21-Research-January-March); removed Research (FY2020-21-Research-October-December).

@Miriam I updated the task based on what we discussed today. Do feel free to edit and expand, of course. Good luck!

Harej unsubscribed. Mar 19 2021, 2:43 PM

Weekly updates:

Estimated the language and geographic distribution of the WIT test data (thanks to Isaac's Wiki Region Groundtruth data: https://github.com/geohci/wiki-region-groundtruth).
Defined the legal constraints for image data publication.
Worked with the WIT team to figure out next steps and involvement on their end, set up continuous communication channels, and provided a detailed overview of timelines and commitments on our end.

Miriam updated the task description. Mar 23 2021, 9:33 AM

Weekly updates:

Met with the full team, including Google researchers, and identified the next steps and deadlines. On our end, we will work at full force on the data release and on setting up the contract with the organization responsible for setting up the challenge.
Started the process to generate the contract.
Started the process for the data release.

Weekly updates:

Progress on the data release: it was assessed as low risk by the security team, pending a few checks on the images to be released.
Progress on the baseline design: we tested the CLIP model on the WIT test data. Because the CLIP model has been trained on the same data, it is difficult to assess the difficulty of the task. Next, we will extend the notion of caption to "all surrounding text" and work toward a baseline that is closer to image-text retrieval.

Weekly updates:

Progress on the data release: working on polishing the list of images for public release.

leila moved this task from FY2020-21-Research-January-March to FY2020-21-Research-April-June on the Research board. Apr 29 2021, 1:00 AM

leila edited projects, added Research (FY2020-21-Research-April-June); removed Research (FY2020-21-Research-January-March).

Weekly updates:

Progress on the data release as per T278217
Progress on a multimodal-multilingual baseline based on a cross-modal network trained on WIT.

Weekly updates:

Progress on the contract front.
No other updates, as people were away on holiday or for other reasons.

Weekly updates:

Contract is signed
Dataset in preparation
We scoped the task as follows:

The task of this competition is the following: given an image, retrieve the closest text from a large pool of words and sentences.
Images will come from Wikipedia articles in many languages, and the target pieces of text will be taken from the image captions and the titles of the Wikipedia articles where the images are placed.
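
To make the scoped task concrete, here is a minimal sketch of image-to-text retrieval, assuming the image and every candidate text have already been mapped into a shared embedding space. The function names, the toy three-dimensional vectors, and the use of cosine similarity are illustrative assumptions, not the competition's actual baseline or evaluation setup.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_closest_texts(image_embedding, text_pool, top_k=1):
    """Rank candidate texts by similarity to the image and return the top_k.

    text_pool is a list of (text, embedding) pairs. In a real system the
    embeddings would come from a multimodal, multilingual model; here they
    are hypothetical hand-written vectors.
    """
    scored = [
        (text, cosine_similarity(image_embedding, embedding))
        for text, embedding in text_pool
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: made-up embeddings for one image and three candidate texts
# (an article title, a caption in another language, and an unrelated title).
image_embedding = [0.9, 0.1, 0.0]
text_pool = [
    ("Eiffel Tower", [1.0, 0.0, 0.0]),
    ("Tour Eiffel vue du Champ-de-Mars", [0.8, 0.3, 0.1]),
    ("Domestic cat", [0.0, 0.1, 1.0]),
]

best = retrieve_closest_texts(image_embedding, text_pool, top_k=2)
for text, score in best:
    print(f"{score:.3f}  {text}")
```

With a large pool of candidate texts, the exhaustive scan above would typically be replaced by an approximate nearest-neighbour index; the ranking logic stays the same.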

leila awarded a token. May 24 2021, 7:10 PM

leila updated the task description. May 25 2021, 2:51 PM

thcipriani subscribed. May 25 2021, 11:48 PM

Weekly updates:

Competition launch is on hold due to discussions about the nature and availability of the data.
We are putting together a workshop proposal for NeurIPS 2021 (deadline June 18th).

Weekly updates:

We submitted a proposal for a NeurIPS 2021 workshop titled "Wiki-M3L: Wikipedia and Multimodal & Multilingual Research - How can the two communities help each other?", about using Wikimedia data for multimodal ML and using multimodal ML technologies to serve the community's needs. The competition-related papers and awards are part of our workshop program.
We agreed on a playground competition. We are preparing data and details so that we can start running the competition in August.

Isaac mentioned this in T215825: Prepare modeling task for outside competition. Jun 29 2021, 8:29 PM

Miriam moved this task from FY2020-21-Research-April-June to FY2021-22-Research-July-Sept on the Research board. Jul 28 2021, 8:51 AM

Miriam edited projects, added Research (FY2021-22-Research-July-Sept); removed Research (FY2020-21-Research-April-June).

Closing this task: the competition was launched on Kaggle on September 12 and, 3 weeks after the launch, we already have 45 teams participating! https://www.kaggle.com/c/wikipedia-image-caption/leaderboard

Miriam closed subtask T278217: Release image data for training as Resolved. Oct 13 2021, 1:40 PM