Prepare a machine-learning modeling task that can be hosted as a competition on an outside platform such as Kaggle.
Requirements:
- Test data is not public: releasing the test labels would make it too easy to train on the test set
- Test dataset is not easily manipulated: if the test set depends on future behavior, the task must not encourage editing that might negatively impact the community
- Task directly supports one of our current goals
- Task serves as a good introduction to Wikimedia research challenges and resources [nice to have]
Steps:
- Identify scope of challenge
- Identify venue for challenge
- Make sure we have agreement / support from any teams connected to the data
- Build the datasets for the task, and document them along with any other supporting resources
- Apply for acceptance of task (if necessary)
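
As an illustration of the dataset-building step and the requirement that test data stay private, here is a minimal sketch of one way to split labeled records into a public training set, a public test set with labels removed, and a private solution file held by the organizers. The function name, the field names (`id`, `label`), the 20% test fraction, and the salt are all hypothetical choices for illustration, not part of any existing Wikimedia or Kaggle tooling.

```python
import hashlib


def split_for_competition(records, id_key="id", label_key="label",
                          test_fraction=0.2, salt="hypothetical-private-salt"):
    """Split labeled records into train, public test, and private solution.

    Assumption: each record is a dict with a unique id and a label.
    The salt should be kept private so that participants cannot
    reproduce the split from the ids alone.
    """
    train, test_public, test_solution = [], [], []
    for rec in records:
        # Hash the salted id so the assignment is deterministic and
        # does not depend on the order of the records.
        digest = hashlib.sha256(f"{salt}:{rec[id_key]}".encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
        if bucket < test_fraction:
            # Public test rows carry every field except the label.
            test_public.append({k: v for k, v in rec.items() if k != label_key})
            # The solution file (id and label only) stays with the organizers.
            test_solution.append({id_key: rec[id_key], label_key: rec[label_key]})
        else:
            train.append(dict(rec))
    return train, test_public, test_solution
```

Only `train` and `test_public` would be released; `test_solution` would be uploaded privately to the hosting platform for scoring. This sketch does not address the manipulation concern for test sets that depend on future behavior, which needs to be handled in the task design itself.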