
Run a computer vision challenge
Closed, Resolved, Public

Description

Questions
in-progress

Budget
Approved.

Host
Kaggle

Timelines
For a contest to go live in June-August 2021, below are the timelines:

  • Challenge data validation, prep and documentation: 2-4 weeks
  • Prepare challenge content for launch: ~3 weeks
  • Run competition: ~2-3 months
  • Collect solutions, verify winners, administer prizes, and release results: ~4-6 weeks

TODOs

  • Email the DrivenData contact and ask for the following information: vendor contact email, vendor name, contract start date (TBD by Miriam), contract end date (TBD by Miriam), a phone number for the vendor, mailing address, name of the person on file, and an answer to the question "Are you a WMF employee, a former WMF employee, or a family member or business partner of a WMF employee?".
  • Send all the info to Leila and she can start the process in Coupa.
  • Once completed, notify Janna to follow up for next steps with Contract.
  • Give Legal a heads-up that the contract is in for their review.
  • Once the contract is approved, contact Legal and share the dataset we will be using. Ask Google for the filters they used to remove data from the dataset and share those with Legal (if they can share the details).
  • Check with Fabian about any engineering or infrastructure requirements we can define for the submissions to bring them closer to our environment.
  • Launch the competition on Kaggle
  • Monitor competition progress and participation
  • Close the competition and award winners

Event Timeline

What resources are needed to carry this out? GPU compute time?

@Harej, apologies for the delay, I was on holidays!

Resources will vary depending on the task. Given the heavy multimedia focus for this task, it is likely that participants will need GPU resources and substantial storage space to host the image datasets used for training. Participants might also need access to the Wikidata JSON dumps and the Wikipedia (and Commons?) XML dumps. I will get back to you with more details once we have figured out the specifics of the task.
Hope this helps!

leila renamed this task from Explore the possibility of running a computer vision challenge to Run a computer vision challenge. Mar 19 2021, 5:17 AM
leila triaged this task as High priority.
leila updated the task description.

@Miriam I updated the task based on what we discussed today. Do feel free to edit and expand, of course. Good luck!

Weekly updates:

  • Estimated the language and geographic distribution of the WIT test data (thanks to Isaac's Wiki Region Groundtruth data: https://github.com/geohci/wiki-region-groundtruth).
  • Defined the legal constraints for image data publication.
  • Worked with the WIT team to figure out next steps and involvement on their end, set up continuous communication channels, and provided a detailed overview of timelines and commitments on our end.

Weekly updates:

  • Met with the full team, including Google researchers, and identified the next steps and deadlines. On our end, we will work full force on the data release and on setting up the contract with the org responsible for setting up the challenge.
  • Started process to generate the contract.
  • Started process for data release.

Weekly updates:

  • Progress on the data release: it was assessed as low risk by the security team, pending a few checks on the images to be released.
  • Progress on the baseline design: we tested the CLIP model on the WIT test data. Because the CLIP model has been trained on the same data, it is hard to use it to assess the difficulty of the task. Next, we will extend the notion of caption to "all surrounding text" and work towards a more image-text-retrieval-oriented baseline.

Weekly updates:

  • Progress on the data release: working on polishing the list of images for public release.

Weekly updates:

  • Progress on the data release as per T278217
  • Progress on a multimodal-multilingual baseline based on a cross-modal network trained on WIT.

Weekly updates:

  • Progress on the contract end
  • No other updates, as people are away for holidays or other reasons

Weekly updates:

  • Contract is signed
  • Dataset in preparation
  • We scoped the task as follows:
The task of this competition is the following: given an image, retrieve the closest text from a large pool of words and sentences.
Images will come from Wikipedia articles in many languages, and the target pieces of text will be taken from the image captions and the titles of the Wikipedia articles where the images are placed.
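To make the scoping above more concrete, here is a minimal sketch of what "retrieve the closest text" could look like once images and texts are embedded in a shared vector space. Everything in it is an assumption for illustration only: the function names, the toy embeddings, and the choice of cosine similarity are hypothetical and are not the competition's actual baseline or evaluation metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_closest_texts(image_embedding, text_pool, top_k=5):
    """Rank candidate texts by similarity to one image embedding.

    text_pool maps each candidate text (caption or page title) to its embedding.
    Returns the top_k (text, score) pairs, best match first.
    """
    scored = [
        (text, cosine_similarity(image_embedding, embedding))
        for text, embedding in text_pool.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: in practice the embeddings would come from a
# multimodal, multilingual model, not from hand-written vectors.
toy_image_embedding = [0.9, 0.1, 0.0]
toy_text_pool = {
    "Eiffel Tower": [1.0, 0.0, 0.0],
    "A cat sleeping on a sofa": [0.0, 1.0, 0.0],
    "Tour Eiffel vue du Champ-de-Mars": [0.8, 0.2, 0.1],
}

ranked = retrieve_closest_texts(toy_image_embedding, toy_text_pool, top_k=2)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

In this toy pool the two Eiffel Tower texts rank above the unrelated caption, which is the behaviour a retrieval model for this task would be expected to show across languages.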

Weekly updates:

  • Competition launch is on hold due to discussions about the nature and availability of the data.
  • We are putting together a workshop proposal for NeurIPS 2021 (deadline June 18th).

Weekly updates:

  • We submitted a proposal for a NeurIPS 2021 workshop titled "Wiki-M3L: Wikipedia and Multimodal & Multilingual Research - How can the two communities help each other?" about using Wikimedia data for multimodal ML, and using multimodal ML technologies to serve the community needs. The competition-related papers and awards are part of our workshop program.
  • We agreed on a playground competition. We are preparing data and details so that we can start running the competition in August.

Closing this task: the competition was launched on Kaggle on September 12 and, 3 weeks after the launch, 45 teams are already participating! https://www.kaggle.com/c/wikipedia-image-caption/leaderboard

Reopening as I will use this task to track the competition progress and closure.