User Details
- User Since
- Sep 25 2017, 10:36 AM (248 w, 21 h)
- Availability
- Available
- LDAP User
- Miriam
- MediaWiki User
- Miriam (WMF)
Wed, Jun 1
Thanks @KStoller-WMF !
Mon, May 30
Hi @KStoller-WMF, thanks for this!
Mar 22 2022
Feb 16 2022
Feb 10 2022
Status update:
- The skeleton for the viz is ready. We have different views for different knowledge gap scenarios.
- Link to the Figma page https://www.figma.com/file/SksDEinmLdyqTD4wWEJ3Mt/Wiki_KnowledgeGapsIndex?node-id=659%3A4740
- Link to the Meta page, from @marcmiquel https://meta.wikimedia.org/wiki/Research:Knowledge_Gaps_Index/Visualization
- What is missing: a landing page (how do you navigate to the actual visualization?). @nayoub will provide more info about the next steps here.
Feb 7 2022
This amazing work by Aiko was concluded in 2020! https://github.com/AikoChou/citationdetective
Thesis submitted and final exam brilliantly passed!!
Code has been published and most of the work is completed, thanks @Aklapper for the ping!
Thanks @Aklapper for the ping. Actually, we abandoned this task a while ago due to other problems that came up during the experimental phase. Thanks!
Jan 31 2022
@jhathaway could you double check that @AniketArs has LDAP access? They are not able to access the notebooks.
He is able to access the stat machines now, but is not able to log in to the notebooks (JupyterHub). It shows 'Invalid username or password'.
Thanks!
Jan 28 2022
Thanks @Ottomata! @AniketArs will be with us until the end of the fiscal year, so the expiry_date should be 2022-06-30.
And yes, could we please get Kerberos access too?
Jan 27 2022
Thanks @jhathaway , approved on my end!
Jan 24 2022
Jan 10 2022
Just chiming in to share previous work on detecting icons.
Oct 15 2021
Reopening as I will use this task to track the competition progress and closure.
@marcmiquel could you post updates and docs about the ongoing conversations? Thanks!
@AniketArs welcome and looking forward to your contribution!
Presented the results yesterday at Release Engineering's lunch and learn:
- Slides: https://docs.google.com/presentation/d/12K7k0LkgxK8ovPM0zLaK7Q9az7jZca0tAV91uwSIib8/edit#slide=id.p
- Filenames that cause blockers spreadsheet: https://docs.google.com/spreadsheets/d/15xO-y0SopWvpjmztwvrvZf9z2V8NJ3gtufkCzjDApjE/edit#gid=1646264321
- Notebook with more insights: https://github.com/mirrys/release-engineering-data/blob/main/metrics_reng_data.ipynb
Oct 13 2021
Oct 6 2021
Closing this task: the competition was launched on Kaggle on September 12 and, 3 weeks after the launch, we already have 45 teams who are participating! https://www.kaggle.com/c/wikipedia-image-caption/leaderboard
Oct 5 2021
@MMiller_WMF sorry for the late input on this! Would it be possible to store the following:
- image id
- article id
- match source
- decision
- user name
- timestamp
- wiki_db
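The fields requested above could be captured as one record per decision. This is a minimal sketch only: the class name `DecisionRecord`, the field types, and the example values are assumptions for illustration and do not describe the actual storage schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    """One stored decision; field names follow the list above, types are assumed."""
    image_id: str        # assumed to be the image file name
    article_id: int      # assumed to be the page ID on the wiki
    match_source: str    # where the image match came from
    decision: str        # the decision taken on the match
    user_name: str
    timestamp: datetime
    wiki_db: str         # e.g. "enwiki"


# Hypothetical example values, for illustration only.
record = DecisionRecord(
    image_id="Example.jpg",
    article_id=12345,
    match_source="wikidata",
    decision="accepted",
    user_name="ExampleUser",
    timestamp=datetime(2021, 10, 5, 12, 0, tzinfo=timezone.utc),
    wiki_db="enwiki",
)

# Flatten to a plain dict, e.g. for writing to a table or a log.
row = asdict(record)
```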
Sep 22 2021
Sep 15 2021
Aug 27 2021
Hi Sarah, the competition is launching on September 9th. Would it be possible to wait until then for publication? Thanks again!
@srodlund thank you so much for your pass and for the detailed comments! You are the best :) I accepted most of your suggestions and responded to the comments.
Aug 25 2021
Hi @srodlund ! Yes, just finished today - you can find here the first draft for the blog post: https://docs.google.com/document/d/18TSGax5Xwo3mgDeCs5XliMFZDM6rezfLRvB2yykf6iU/edit
Feel free to add comments and suggestions! Thanks a lot!
Aug 24 2021
Aug 9 2021
Aug 6 2021
Aug 5 2021
Great, thanks @Trizek-WMF! So I compiled a list of infoboxes here: https://w.wiki/3nRd
@MMiller_WMF yes, we used the main "infobox" template, but there are more that we should consider. We are working on that!
Aug 3 2021
@Trizek-WMF thanks! Do you think there is an easy way to retrieve a list of all the major templates used to define infoboxes?
Jul 28 2021
Thanks @mmodell !
Jul 27 2021
Hi @Aklapper. As a Senior Research Scientist in the Research team, I would like to be able to create milestones to organize and structure our team's projects. Could you please add me to this group?
Thanks!
Jul 26 2021
Jun 25 2021
Weekly updates:
- survey-based and content-based metrics have been finalized and entered in the deliverable spreadsheet: https://docs.google.com/spreadsheets/d/1J-3yeAVGbBr5s8vj8qFNvTu98jV8S4tKlqiKKlRMZVE/edit#gid=0
- 75% of the metrics have been defined (target was 60%)
Weekly updates:
- We submitted a proposal for a NeurIPS 2021 workshop titled "Wiki-M3L: Wikipedia and Multimodal & Multilingual Research - How can the two communities help each other?" about using Wikimedia data for multimodal ML, and using multimodal ML technologies to serve the community needs. The competition-related papers and awards are part of our workshop program.
- We agreed on a playground competition. We are preparing data and details so that we can start running the competition in August.
Weekly updates: the dataset release to-dos are listed in this doc. Fabian and Tiziano will work on releasing the image pixels and the embeddings, together with the image metadata and license URL, by the end of next week.
- Qualitative: the study "how much of Wikipedia do you know?" is live on labinthewild at: https://labinthewild.org/studies/wikipedia/. We will analyze the data and verify some of our hypotheses as part of next fiscal year's work.
- Quantitative: the paper on the analysis of readers' interactions with images was submitted to EPJ Data Science. Now taking a step back, reading literature and looking at old experiments to finalize the research questions for the next project, starting next FY.
Weekly updates:
Finalized the analysis of Android data, and helped the Growth team with the decision-making process around whether to deploy the "add an image" task as part of the newcomers structured tasks. They decided to go for it next fiscal year.
Jun 18 2021
Jun 16 2021
Hi @MMiller_WMF, sorry for the delay on this. Please see our estimate of the number of articles having at least one "Citation Needed" tag in this spreadsheet: https://docs.google.com/spreadsheets/d/1-diGTFHnpOw5gHjmWrfIfaZhxAg23fxMtIIdRa3F9Pk/edit#gid=665212437
Jun 4 2021
@leila yes this makes sense. I am meeting next week with the Design Research team to see how to improve this basic prototype. I will then create tasks to describe how we move forward, and one will definitely include metric description refinement and translation as you suggested.
Weekly updates:
Still trying to understand why the estimator's performance drops so much after moving from Keras to TF.Estimator. The investigation is ongoing and we are getting to the bottom of it.
There are different variables we are looking at:
(1) How the input data is formatted
(2) Whether the model is pre-trained or not
(3) The function used to transform the Keras model to an Estimator
Getting there!
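One way to organize an investigation over the three variables listed above is to enumerate every combination and run them one at a time. This is a minimal sketch only: the option labels below are hypothetical placeholders, not the settings actually tested.

```python
from itertools import product

# Hypothetical option labels for the three variables under investigation.
INPUT_FORMATS = ["format_a", "format_b"]          # (1) how the input data is formatted
PRETRAINED = [True, False]                        # (2) whether the model is pre-trained
CONVERSIONS = ["conversion_a", "conversion_b"]    # (3) Keras-to-Estimator conversion function


def experiment_grid():
    """Return one configuration dict per combination of the three variables."""
    return [
        {"input_format": fmt, "pretrained": pre, "conversion": conv}
        for fmt, pre, conv in product(INPUT_FORMATS, PRETRAINED, CONVERSIONS)
    ]


grid = experiment_grid()
for config in grid:
    print(config)
```

Comparing runs that differ in exactly one entry of the configuration helps isolate which variable is responsible for the drop.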
Weekly updates:
- Competition launch is on hold due to discussion on the data nature and availability.
- We are putting together a workshop proposal for NeurIPS 2021 (deadline June 18th).
Weekly updates: discussions on the data nature and structure are still ongoing.
Weekly updates: some placeholder images escape the category-based filters we put together. I manually went through the top 100 annotated images and found ~15 of those. We should add them to the list of images to filter out, but also think of more scalable solutions.
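Combining the category-based filter with a manually curated list could look like the sketch below. It is illustrative only: the function name, the category names, and the file names are all hypothetical, and the real filters may work differently.

```python
def filter_placeholders(images, placeholder_categories, manual_blocklist):
    """Return the image names that pass both filters.

    images: dict mapping image file name -> set of category names.
    placeholder_categories: categories that mark an image as a placeholder.
    manual_blocklist: file names flagged by hand that the categories miss.
    """
    kept = []
    for name, categories in images.items():
        if name in manual_blocklist:
            continue  # flagged during manual review
        if not categories.isdisjoint(placeholder_categories):
            continue  # caught by the category-based filter
        kept.append(name)
    return kept


# Hypothetical example data.
images = {
    "Photo_of_a_bridge.jpg": {"Bridges"},
    "No_image_available.svg": {"Placeholder images"},
    "Generic_silhouette.png": {"Icons"},
}
placeholder_categories = {"Placeholder images"}
manual_blocklist = {"Generic_silhouette.png"}

kept = filter_placeholders(images, placeholder_categories, manual_blocklist)
```

A manual list like this does not scale, which is why it is only a stopgap alongside the category filter.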