
Run a computer vision challenge
Open, High, Public




Possible host

For a contest to go live in June-August 2021, the timelines are as follows:

  • Challenge data validation, prep and documentation: 2-4 weeks
  • Prepare challenge content for launch: ~3 weeks
  • Run competition: ~2-3 months
  • Collect solutions, verify winners, administer prizes, and release results: ~4-6 weeks
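The phase estimates above can be added up to get a rough end-to-end duration. This is only a back-of-the-envelope sketch: it assumes the phases run strictly one after another and converts months to weeks using an average month length, neither of which is stated in the task. The names `phases`, `WEEKS_PER_MONTH`, and `total_weeks` are illustrative.

```python
# Assumption: average month length in weeks (52 weeks / 12 months).
WEEKS_PER_MONTH = 52 / 12

# (phase, low estimate in weeks, high estimate in weeks), taken from the list above.
phases = [
    ("Challenge data validation, prep and documentation", 2, 4),
    ("Prepare challenge content for launch", 3, 3),
    ("Run competition", 2 * WEEKS_PER_MONTH, 3 * WEEKS_PER_MONTH),
    ("Collect solutions, verify winners, prizes, results", 4, 6),
]


def total_weeks(phases):
    """Sum the low and high estimates, assuming phases do not overlap."""
    low = sum(low_weeks for _, low_weeks, _ in phases)
    high = sum(high_weeks for _, _, high_weeks in phases)
    return low, high


low, high = total_weeks(phases)
print(f"End-to-end: roughly {low:.0f} to {high:.0f} weeks")
```

If some phases can overlap (for example, preparing launch content while data validation finishes), the real total would be shorter than this sum.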


  • Email the DrivenData contact and ask for the following information: vendor contact email, vendor name, contract start date (tbd by Miriam), contract end date (tbd by Miriam), a phone number for the vendor, mailing address, name of the person on file, and an answer to the question "Are you a WMF employee, a former WMF employee, or a family member or business partner of a WMF employee?".
  • Send all the info to Leila so she can start the process in Coupa.
  • Once completed, notify Janna to follow up on next steps with the contract.
  • Give Legal a heads up that the contract is in for their review so they're aware.
  • Once the contract is approved, contact Legal and share the dataset we will be using. Ask Google for the filters they used to remove data from the dataset and share those with Legal (if Google can share details).
  • Check with Fabian about engineering or infrastructure requirements we can define for the submissions to bring them closer to our environment.

Event Timeline

What resources are needed to carry this out? GPU compute time?

@Harej, apologies for the delay, I was on holidays!

Resources will vary depending on the task. Given the heavy multimedia focus for this task, it is likely that participants will need GPU resources and substantial storage space to host the image datasets used for training. Participants might also need access to the Wikidata JSON dumps and the Wikipedia (and Commons?) XML dumps. I will get back to you with more details once we have figured out the specifics of the task.
Hope this helps!

leila renamed this task from "Explore the possibility of running a computer vision challenge" to "Run a computer vision challenge". Mar 19 2021, 5:17 AM
leila triaged this task as High priority.
leila updated the task description.

@Miriam I updated the task based on what we discussed today. Do feel free to edit and expand, of course. Good luck!

Weekly updates:

  • Estimated the language and geographic distribution of the WIT test data (thanks to Isaac's Wiki Region Groundtruth data).
  • Defined the legal constraints for image data publication.
  • Worked with the WIT team to figure out next steps and involvement on their end, set up continuous communication channels, and provided a detailed overview of timelines and commitments on our end.

Weekly updates:

  • Met with the full team, including Google researchers, and identified the next steps and deadlines. On our end, we will work full force on the data release and on setting up the contract with the org responsible for running the challenge.
  • Started process to generate the contract.
  • Started process for data release.

Weekly updates:

  • Progress on data release: the security team assessed it as low risk, pending a few checks on the images to be released.
  • Progress on the baseline design: tested the CLIP model on the WIT test data. Because the CLIP model has been trained on the same data, it is difficult to assess the difficulty of the task from this test. Next we will extend the notion of caption to "all surrounding text" and work on a baseline that is closer to image-text retrieval.
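For illustration, an image-text retrieval baseline like the one mentioned above is often scored with recall@k. The update does not say which metric is planned, so this is a minimal sketch under that assumption. It takes a precomputed score matrix (for example, from a model such as CLIP, which is not run here) and assumes that query `i` matches candidate `i`. The function name `recall_at_k` and the toy scores are made up for the example.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose true match (same index) ranks in the top k.

    similarity[i][j] is the score between text query i and candidate image j.
    Assumption: the correct image for query i is candidate i.
    """
    hits = 0
    for query_idx, scores in enumerate(similarity):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if query_idx in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy 3x3 score matrix with made-up numbers: row = text query, column = image.
toy_scores = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.7, 0.6],
]

for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(toy_scores, k):.2f}")
```

A model that has already seen the test pairs during training would score unrealistically high on this kind of metric, which is why the result says little about how hard the task is.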

Weekly updates:

  • Progress on data release: working on polishing the list of images for public release.

Weekly updates:

  • Progress on the data release as per T278217
  • Progress on a multimodal-multilingual baseline based on a cross-modal network trained on WIT.