Reopening as I will use this task to track the competition progress and closure.
@marcmiquel could you post updates and docs about the ongoing conversations? Thanks!
@AniketArs welcome and looking forward to your contribution!
Presented the results yesterday at Release Engineering's lunch and learn:
- Slides: https://docs.google.com/presentation/d/12K7k0LkgxK8ovPM0zLaK7Q9az7jZca0tAV91uwSIib8/edit#slide=id.p
- Filenames that cause blockers spreadsheet: https://docs.google.com/spreadsheets/d/15xO-y0SopWvpjmztwvrvZf9z2V8NJ3gtufkCzjDApjE/edit#gid=1646264321
- Notebook with more insights: https://github.com/mirrys/release-engineering-data/blob/main/metrics_reng_data.ipynb
Wed, Oct 13
Wed, Oct 6
Closing this task: the competition was launched on Kaggle on September 12 and, 3 weeks after the launch, we already have 45 participating teams! https://www.kaggle.com/c/wikipedia-image-caption/leaderboard
Tue, Oct 5
@MMiller_WMF sorry for the late input on this! Would it be possible to store the following:
- image id
- article id
- match source
- user name
Wed, Sep 22
Sep 15 2021
Aug 27 2021
Hi Sarah, the competition is launching on September 9th. Would it be possible to wait until then for publication? Thanks again!
@srodlund thank you so much for your pass and for the detailed comments! You are the best :) I accepted most of your suggestions and responded to the comments.
Aug 25 2021
Hi @srodlund ! Yes, just finished today - you can find here the first draft for the blog post: https://docs.google.com/document/d/18TSGax5Xwo3mgDeCs5XliMFZDM6rezfLRvB2yykf6iU/edit
Feel free to add comments and suggestions! Thanks a lot!
Aug 24 2021
Aug 9 2021
Aug 6 2021
Aug 5 2021
Great, thanks @Trizek-WMF! So I compiled a list of infoboxes here: https://w.wiki/3nRd
@MMiller_WMF yes, we used the main "infobox" template, but there are more that we should consider. We are working on that!
Aug 3 2021
@Trizek-WMF thanks! Do you think there is an easy way to retrieve a list of all the major templates used to define infoboxes?
Jul 28 2021
Thanks @mmodell !
Jul 27 2021
Hi @Aklapper. As a Senior Research Scientist in the Research team, I would like to be able to create milestones to organize and structure our team's projects. Could you please add me to this group?
Jul 26 2021
Jun 25 2021
- survey-based and content-based metrics have been finalized and entered in the deliverable spreadsheet: https://docs.google.com/spreadsheets/d/1J-3yeAVGbBr5s8vj8qFNvTu98jV8S4tKlqiKKlRMZVE/edit#gid=0
- 75% of the metrics have been defined (target was 60%)
- We submitted a proposal for a NeurIPS 2021 workshop titled "Wiki-M3L: Wikipedia and Multimodal & Multilingual Research - How can the two communities help each other?" about using Wikimedia data for multimodal ML, and using multimodal ML technologies to serve the community needs. The competition-related papers and awards are part of our workshop program.
- We agreed on a playground competition. We are preparing data and details so that we can start running the competition in August.
Weekly updates: the dataset release to-dos are listed in this doc. Fabian and Tiziano will work on releasing the image pixels and the embeddings, together with image metadata and license URL, by the end of next week.
- Qualitative: the study "how much of Wikipedia do you know?" is live on labinthewild at: https://labinthewild.org/studies/wikipedia/. We will analyze the data and verify some of our hypotheses as part of next fiscal year's work.
- Quantitative: paper on the analysis of readers' interactions with images was submitted to EPJ Data Science. Now taking a step back, reading literature and looking at old experiments to finalize the research questions for the next project starting next FY.
Finalized the analysis of Android data, and helped the Growth team with the decision-making process around whether to deploy the "add an image" task as part of the newcomers structured tasks. They decided to go for it next fiscal year.
Jun 18 2021
Jun 16 2021
Hi @MMiller_WMF sorry for the delay on this. Please see our estimate of the number of articles having at least one "Citation Needed" tag in this spreadsheet: https://docs.google.com/spreadsheets/d/1-diGTFHnpOw5gHjmWrfIfaZhxAg23fxMtIIdRa3F9Pk/edit#gid=665212437
Jun 4 2021
@leila yes this makes sense. I am meeting next week with the Design Research team to see how to improve this basic prototype. I will then create tasks to describe how we move forward, and one will definitely include metric description refinement and translation as you suggested.
Still trying to understand why the Estimator's performance drops so sharply after moving from Keras to TF.Estimator. The investigation is ongoing and we are getting to the bottom of it.
There are different variables we are looking at:
(1) How the input data is formatted
(2) Whether the model is pre-trained or not
(3) The function used to transform the Keras model to an Estimator
- Competition launch is on hold due to discussion on the data nature and availability.
- We are putting together a workshop proposal for NeurIPS 2021 (deadline June 18th).
Weekly updates: discussions on the data nature and structure are still ongoing.
Weekly updates: some placeholder images escape the filters we put together based on categories. I manually went through the top-100 annotated images and found ~15 of those. We should add those to the list of images to filter out, but also think of more scalable solutions.
- Qualitative: the pilot experiment has launched, next step is that the whole team will give feedback on the pilot.
- Quantitative: final touches for paper submission (expected next week)
Analyzed the results of the Android POC with Aiko, to understand:
(1) The extent to which newcomers behave and annotate data differently
(2) The extent to which non-English users struggle with the POC
(3) The reliability of newcomers' annotations (via agreement)
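The task doesn't say which agreement statistic was used for point (3); Cohen's kappa is one standard choice, since it corrects raw agreement for chance. A minimal sketch (the function name and the two-annotator setup are assumptions, not from the actual analysis):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators
    who assigned categorical labels to the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed fraction of items where the two annotators agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

Kappa is 1.0 for perfect agreement, 0.0 for chance-level agreement, and negative when annotators agree less often than chance.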
May 24 2021
May 21 2021
- Contract is signed
- Dataset in preparation
- We scoped the task as follows:
- Extracted metrics on toy dataset for the following questions:
- What is the representation of each category for this gap in this project? -- Probability distribution for a gap in a language edition for a specific year: P = P(gap(year, language))
- What is the most represented category for this gap in this project? -- Max(P)
- What is the least represented category for this gap in this project? -- Min(P)
- How dominant is the most represented category with respect to the least represented one? -- Max(P)/Min(P)
- How dominant is the most represented category with respect to the second most represented one? -- Max(P)/2ndMax(P)
- How unbalanced is the representation of different categories? -- Gini(P)
- How diverse is this project with respect to this gap? -- Normalized-Entropy(P)
- How are gaps evolving over time? -- Cumulative distribution for a gap in a language edition over all years, example:
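The scalar metrics above can be sketched in a few lines of plain Python. The function name and the example distribution are illustrative, not taken from the deliverable spreadsheet:

```python
import math

def gap_metrics(p):
    """Compute the gap-representation metrics over a probability
    distribution p: a list of category probabilities summing to 1."""
    p_sorted = sorted(p, reverse=True)
    max_p, second_max_p, min_p = p_sorted[0], p_sorted[1], p_sorted[-1]
    n = len(p)
    # Gini coefficient: mean absolute pairwise difference, normalized
    gini = sum(abs(x - y) for x in p for y in p) / (2 * n * sum(p))
    # Shannon entropy normalized by log(n), so 1.0 == perfectly balanced
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    norm_entropy = entropy / math.log(n)
    return {
        "most_represented": max_p,
        "least_represented": min_p,
        "dominance_vs_least": max_p / min_p,
        "dominance_vs_second": max_p / second_max_p,
        "gini": gini,
        "normalized_entropy": norm_entropy,
    }

# e.g. a hypothetical gap distribution over four categories
m = gap_metrics([0.7, 0.2, 0.07, 0.03])
```

For a perfectly uniform distribution, Gini is 0 and the normalized entropy is 1; a heavily skewed distribution pushes Gini toward 1 and the entropy toward 0.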
As part of our conversations with Kaggle and the rest of the org team, we have figured out the schema for the data deliverable. We should be able to release our part next week.
- Ran accuracy diagnostics on the data from T273057. For the 2 largest sources, it looks like the algorithm has a 64-71% accuracy, meaning the majority of users say that the algorithm recommends a good match:
It looks like the time it takes a user to give a response is also inversely proportional to the goodness of the match:
- Next @AikoChou and I will break down these metrics by user expertise and topic.
May 14 2021
We solved the problem of the Parameter Server being the bottleneck for computation, for now, by increasing the batch size used for training. In the meantime, we are running several experiments to understand the difference between the Keras Model API and the TensorFlow Estimator, as prediction accuracy seems to be much lower using the Estimator, i.e. the API we need to use for large-scale distributed training.
- Preparing a presentation to summarize the efforts on this front to Tech Managers
- Working on plotting and rendering some of the metrics from tagged content
- Progress on the contract end
- No other updates as people are away for holidays or other reasons
- No updates
- Qualitative: no updates
- Quantitative: finalizing the paper for resubmission to EPJ Data Science
- Started analyzing the data in T273057 to check for early sign of algorithm accuracy by source of recommendation
May 7 2021
- With the help of @marcmiquel , we are figuring out the details of the questions for metrics deployment. I will start computing those on toy data coming from the cultural observatory.
We were able to get similar results across CPU and GPU computation. One major issue is that the worker dispatching weights to the GPU worker (the Parameter Server) is overloaded and saturates the network. We are investigating ways to reduce or re-distribute this load.
- Progress on the data release as per T278217
- Progress on a multimodal-multilingual baseline based on a cross-modal network trained on WIT.
- @tizianopiccardi computed the size of the image dataset based on face size. The idea is to remove all images where there is a face as primary subject. It looks like, even with a conservative approach of removing all images where the face is larger than 5% of the total image area, we can retain about 90% of the original image dataset
- Also, only about 4k images from the 7M in the WIT dataset are candidates for deletion on Commons. We will remove those as well.
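The filtering rule described above (drop any image whose largest detected face exceeds 5% of the image area) can be sketched as follows. The function name and bounding-box format are hypothetical, since the actual face-detection pipeline isn't shown here:

```python
def keep_image(image_w, image_h, face_boxes, max_face_fraction=0.05):
    """Conservative filter: drop the image if any detected face covers
    more than max_face_fraction of the total image area.
    face_boxes is a list of (x, y, w, h) bounding boxes in pixels."""
    image_area = image_w * image_h
    for (x, y, w, h) in face_boxes:
        if (w * h) / image_area > max_face_fraction:
            return False
    return True
```

With this threshold, an image with only small background faces is retained, which matches the observation that ~90% of the original dataset survives the filter.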
Weekly updates: none
Resolving this for now, as all experiments are done.
- Qualitative: fine-tuned the last details of the back-end data collection for the LabInTheWild experiment. Ready to pilot test.
- Quantitative: re-writing the paper for resubmission to EPJ Data Science
- No major updates.
Apr 30 2021
- Started working on extracting features from a real distribution (articles by number of images), based on the metrics questions we are defining. This is to understand the extent to which these metrics are interpretable and understandable by non-technical people.
- Data release details are sorted, figuring out the last details on the type of features we want to release.
- For baselines, we are fine-tuning a model on the WIT dataset, and we will probably release it as a baseline, together with its embeddings
- Working on putting together the workshop proposal for NeurIPS 2021.
Weekly updates: none
- Qualitative: finalized the results page for the LabInTheWild experiment. Ready to pilot test.
- Quantitative: re-writing the paper for resubmission to EPJ Data Science
- Image Recs on Android is now available in Beta. It will be in production next week if everything goes well.