Miriam (Miriam Redi)
User

User Details

User Since
Sep 25 2017, 10:36 AM (211 w, 5 d)
Availability
Available
LDAP User
Miriam
MediaWiki User
Miriam (WMF)

Recent Activity

Fri, Oct 15

Miriam moved T260634: Run a computer vision challenge from FY2021-22-Research-July-Sept to FY2021-22-Research-Oct-Dec on the Research board.
Fri, Oct 15, 3:16 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam updated the task description for T260634: Run a computer vision challenge.
Fri, Oct 15, 3:16 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam reopened T260634: Run a computer vision challenge as "Open".

Reopening as I will use this task to track the competition progress and closure.

Fri, Oct 15, 3:15 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam added a comment to T287580: Explore Design Options for the Knowledge Gap Index Tool.

@marcmiquel could you post updates and docs about the ongoing conversations? Thanks!

Fri, Oct 15, 3:03 PM · Research (FY2021-22-Research-Oct-Dec), Design-Research
Miriam created T293477: Investigate the importance of images in free knowledge ecosystems (Q2).
Fri, Oct 15, 3:01 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam created T293476: Investigate best tools and methods to efficiently compute image similarity at scale.
Fri, Oct 15, 2:58 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam updated subscribers of T291453: Outreachy Application Task: Develop an Image Similarity API.
Fri, Oct 15, 2:56 PM · Outreachy (Round 23)
Miriam added a comment to T291453: Outreachy Application Task: Develop an Image Similarity API.

@AniketArs welcome and looking forward to your contribution!

Fri, Oct 15, 2:55 PM · Outreachy (Round 23)
Miriam closed T287581: Investigate the importance of images in free knowledge ecosystems (Q1) as Resolved.
Fri, Oct 15, 2:47 PM · Research (FY2021-22-Research-July-Sept)
Miriam moved T289567: Define Metrics for Change Failure Percentage from FY2021-22-Research-July-Sept to FY2021-22-Research-Oct-Dec on the Research board.
Fri, Oct 15, 2:47 PM · Research (FY2021-22-Research-Oct-Dec), User-brennen, Release-Engineering-Team (Radar)
Miriam moved T287583: Research Support for Image Suggestion Algorithm Deployment from FY2021-22-Research-July-Sept to FY2021-22-Research-Oct-Dec on the Research board.
Fri, Oct 15, 2:47 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam added a comment to T289567: Define Metrics for Change Failure Percentage.

Presented the results yesterday at Release Engineering's lunch and learn:

Fri, Oct 15, 2:46 PM · Research (FY2021-22-Research-Oct-Dec), User-brennen, Release-Engineering-Team (Radar)

Wed, Oct 13

Miriam moved T287580: Explore Design Options for the Knowledge Gap Index Tool from FY2021-22-Research-July-Sept to FY2021-22-Research-Oct-Dec on the Research board.
Wed, Oct 13, 2:58 PM · Research (FY2021-22-Research-Oct-Dec), Design-Research
Miriam closed T278217: Release image data for training, a subtask of T260634: Run a computer vision challenge, as Resolved.
Wed, Oct 13, 1:40 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam closed T278217: Release image data for training as Resolved.
Wed, Oct 13, 1:40 PM · Research (FY2020-21-Research-April-June)

Wed, Oct 6

Miriam created T292648: Story idea for Blog: Towards fully open-source distributed machine learning.
Wed, Oct 6, 4:11 PM · Technical-blog-posts
Miriam closed T260634: Run a computer vision challenge as Resolved.

Closing this task: the competition was launched on Kaggle on September 12 and, 3 weeks after the launch, we already have 45 teams who are participating! https://www.kaggle.com/c/wikipedia-image-caption/leaderboard

Wed, Oct 6, 1:01 PM · Research (FY2021-22-Research-Oct-Dec)

Tue, Oct 5

Miriam added a comment to T291224: Add an image: export image judgments for analysis.

@MMiller_WMF sorry for the late input on this! Would it be possible to store the following:

  • image id
  • article id
  • match source
  • decision
  • user name
  • timestamp
  • wiki_db
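The fields requested above could be captured in a record like the following. This is only a sketch: the class name `ImageJudgment`, the field types, and the example values are assumptions made for illustration, not the schema the Growth team uses.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class ImageJudgment:
    """One exported image judgment (hypothetical layout, types assumed)."""
    image_id: str        # assumed to be the image file name or identifier
    article_id: int      # assumed to be the page id on the wiki
    match_source: str    # which source produced the suggestion
    decision: str        # the user's decision on the suggestion
    user_name: str
    timestamp: datetime
    wiki_db: str         # e.g. "enwiki"


# Made-up example values, for illustration only.
example = ImageJudgment(
    image_id="Example.jpg",
    article_id=12345,
    match_source="source_a",
    decision="accept",
    user_name="ExampleUser",
    timestamp=datetime(2021, 10, 5, 17, 46, tzinfo=timezone.utc),
    wiki_db="enwiki",
)

# Flatten to a plain dict, e.g. before writing a row to a CSV or table.
row = asdict(example)
```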
Tue, Oct 5, 5:46 PM · Growth-Team (Current Sprint), Growth-Structured-Tasks

Wed, Sep 22

Miriam updated the task description for T291453: Outreachy Application Task: Develop an Image Similarity API.
Wed, Sep 22, 2:39 PM · Outreachy (Round 23)

Sep 15 2021

Miriam added a member for Outreachy Mentors: fkaelin.
Sep 15 2021, 2:52 PM

Aug 27 2021

Miriam added a comment to T288359: Story idea for Blog: Wikipedia Image Captioning Competition.

Hi Sarah, the competition is launching on September 9th. Would it be possible to wait until then for publication? Thanks again!

Aug 27 2021, 6:16 PM · Technical-blog-posts
Miriam added a comment to T288359: Story idea for Blog: Wikipedia Image Captioning Competition.

@srodlund thank you so much for your pass and for the detailed comments! You are the best :) I accepted most of your suggestions and responded to the comments.

Aug 27 2021, 4:50 PM · Technical-blog-posts

Aug 25 2021

Miriam added a comment to T288359: Story idea for Blog: Wikipedia Image Captioning Competition.

Hi @srodlund! Yes, I just finished today. You can find the first draft of the blog post here: https://docs.google.com/document/d/18TSGax5Xwo3mgDeCs5XliMFZDM6rezfLRvB2yykf6iU/edit
Feel free to add comments and suggestions! Thanks a lot!

Aug 25 2021, 5:50 PM · Technical-blog-posts

Aug 24 2021

thcipriani awarded T289567: Define Metrics for Change Failure Percentage a 100 token.
Aug 24 2021, 1:38 PM · Research (FY2021-22-Research-Oct-Dec), User-brennen, Release-Engineering-Team (Radar)
Miriam created T289567: Define Metrics for Change Failure Percentage.
Aug 24 2021, 11:28 AM · Research (FY2021-22-Research-Oct-Dec), User-brennen, Release-Engineering-Team (Radar)

Aug 9 2021

Miriam added a parent task for T287317: Add an image: count image suggestions without infoboxes: T287583: Research Support for Image Suggestion Algorithm Deployment.
Aug 9 2021, 9:56 AM · Image-Suggestions, Growth-Team (Current Sprint), Growth-Structured-Tasks
Miriam added a subtask for T287583: Research Support for Image Suggestion Algorithm Deployment: T287317: Add an image: count image suggestions without infoboxes.
Aug 9 2021, 9:56 AM · Research (FY2021-22-Research-Oct-Dec)

Aug 6 2021

Miriam created T288359: Story idea for Blog: Wikipedia Image Captioning Competition.
Aug 6 2021, 4:28 PM · Technical-blog-posts

Aug 5 2021

Miriam added a comment to T287317: Add an image: count image suggestions without infoboxes.

Great, thanks @Trizek-WMF! So I compiled a list of infoboxes here: https://w.wiki/3nRd
@MMiller_WMF yes, we used the main "infobox" template, but there are more that we should consider. We are working on that!

Aug 5 2021, 8:19 AM · Image-Suggestions, Growth-Team (Current Sprint), Growth-Structured-Tasks

Aug 3 2021

Miriam added a comment to T287317: Add an image: count image suggestions without infoboxes.

@Trizek-WMF thanks! Do you think there is an easy way to retrieve a list of all the major templates used to define infoboxes?

Aug 3 2021, 3:09 PM · Image-Suggestions, Growth-Team (Current Sprint), Growth-Structured-Tasks

Jul 28 2021

Miriam renamed T287583: Research Support for Image Suggestion Algorithm Deployment from Research support Image Suggestion Algorithm Deployment to Research Support for Image Suggestion Algorithm Deployment.
Jul 28 2021, 2:01 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam created T287583: Research Support for Image Suggestion Algorithm Deployment.
Jul 28 2021, 2:00 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam claimed T287580: Explore Design Options for the Knowledge Gap Index Tool.
Jul 28 2021, 1:58 PM · Research (FY2021-22-Research-Oct-Dec), Design-Research
Miriam claimed T287581: Investigate the importance of images in free knowledge ecosystems (Q1).
Jul 28 2021, 1:58 PM · Research (FY2021-22-Research-July-Sept)
Miriam created T287581: Investigate the importance of images in free knowledge ecosystems (Q1).
Jul 28 2021, 1:57 PM · Research (FY2021-22-Research-July-Sept)
Miriam created T287580: Explore Design Options for the Knowledge Gap Index Tool.
Jul 28 2021, 1:56 PM · Research (FY2021-22-Research-Oct-Dec), Design-Research
Miriam renamed Research (FY2021-22-Research-July-Sept) from FY2021-22-Research-July-September to FY2021-22-Research-July-Sept.
Jul 28 2021, 8:55 AM
Miriam created Research (FY2021-22-Research-April-June).
Jul 28 2021, 8:54 AM
Miriam created Research (FY2021-22-Research-Jan-March).
Jul 28 2021, 8:54 AM
Miriam created Research (FY2021-22-Research-Oct-Dec).
Jul 28 2021, 8:54 AM
Miriam moved T260634: Run a computer vision challenge from FY2020-21-Research-April-June to FY2021-22-Research-July-Sept on the Research board.
Jul 28 2021, 8:51 AM · Research (FY2021-22-Research-Oct-Dec)
Miriam renamed Research (FY2021-22-Research-July-Sept) from Research (FY2021-22-Research-July-September) to FY2021-22-Research-July-September.
Jul 28 2021, 8:50 AM
Miriam created Research (FY2021-22-Research-July-Sept).
Jul 28 2021, 8:50 AM
Miriam added a comment to T706: Requests for addition to the #acl*Project-Admins group (in comments).

Thanks @mmodell !

Jul 28 2021, 8:48 AM · Project-Admins

Jul 27 2021

Miriam added a comment to T706: Requests for addition to the #acl*Project-Admins group (in comments).

Hi @Aklapper. As a Senior Research Scientist in the Research team, I would like to be able to create milestones to organize and structure our team's projects. Could you please add me to this group?
Thanks!

Jul 27 2021, 4:57 PM · Project-Admins

Jul 26 2021

Miriam edited projects for T256081: Image matching algorithm, added: Research; removed Research (FY2020-21-Research-April-June).
Jul 26 2021, 12:58 PM · Research, Growth-Team-Filtering, Image-Suggestions, Growth-Team, Wikipedia-Android-App-Backlog
Miriam removed a project from T277828: Investigate placeholder image recommendation: Research (FY2020-21-Research-April-June).
Jul 26 2021, 12:58 PM · Growth-Team-Filtering, Image-Suggestions, Growth-Team
Miriam reassigned T287317: Add an image: count image suggestions without infoboxes from Miriam to AikoChou.
Jul 26 2021, 9:50 AM · Image-Suggestions, Growth-Team (Current Sprint), Growth-Structured-Tasks

Jun 25 2021

Miriam closed T273968: Define Metrics for Survey-Based Knowledge Gaps as Resolved.
Jun 25 2021, 4:45 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T273968: Define Metrics for Survey-Based Knowledge Gaps.

Weekly updates:

Jun 25 2021, 4:44 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T276407: An End-to-End Image Classification Pipeline.

Weekly updates:

Jun 25 2021, 4:41 PM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision
Miriam added a comment to T260634: Run a computer vision challenge.

Weekly updates:

  • We submitted a proposal for a NeurIPS 2021 workshop titled "Wiki-M3L: Wikipedia and Multimodal & Multilingual Research - How can the two communities help each other?" about using Wikimedia data for multimodal ML, and using multimodal ML technologies to serve the community needs. The competition-related papers and awards are part of our workshop program.
  • We agreed on a playground competition. We are preparing data and details so that we can start running the competition in August.
Jun 25 2021, 4:39 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam added a comment to T278217: Release image data for training.

Weekly updates: the dataset release to-dos are listed in this doc. Fabian and Tiziano will work on releasing the image pixels and the embeddings, together with image metadata and license URLs, by the end of next week.

Jun 25 2021, 4:35 PM · Research (FY2020-21-Research-April-June)
Miriam closed T266655: Quantifying the importance of images in Wikipedia as Resolved.
Jun 25 2021, 4:33 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T266655: Quantifying the importance of images in Wikipedia.
  • Qualitative: the study "how much of Wikipedia do you know?" is live on LabInTheWild at: https://labinthewild.org/studies/wikipedia/. We will analyze the data and verify some of our hypotheses as part of next fiscal year's work.
  • Quantitative: the paper on the analysis of readers' interactions with images was submitted to EPJ Data Science. Now taking a step back, reading literature and looking at old experiments to finalize the research questions for the next project, starting next FY.
Jun 25 2021, 4:33 PM · Research (FY2020-21-Research-April-June)
Miriam closed T278681: Image Matching Structured Task: Research Q3-Q4, a subtask of T256081: Image matching algorithm, as Resolved.
Jun 25 2021, 4:24 PM · Research, Growth-Team-Filtering, Image-Suggestions, Growth-Team, Wikipedia-Android-App-Backlog
Miriam closed T278681: Image Matching Structured Task: Research Q3-Q4 as Resolved.
Jun 25 2021, 4:24 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T278681: Image Matching Structured Task: Research Q3-Q4.

Weekly updates:
Finalized the analysis of Android data, and helped the Growth team with the decision-making process around whether to deploy the "add an image" task as part of the newcomers structured tasks. They decided to go for it next fiscal year.

Jun 25 2021, 4:24 PM · Research (FY2020-21-Research-April-June)

Jun 18 2021

Miriam claimed T215413: Image Classification Working Group.
Jun 18 2021, 11:13 AM · Analytics-Radar, Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Research
Miriam placed T215413: Image Classification Working Group up for grabs.
Jun 18 2021, 11:13 AM · Analytics-Radar, Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Research
Miriam updated the task description for T215413: Image Classification Working Group.
Jun 18 2021, 11:12 AM · Analytics-Radar, Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Research
Miriam updated the task description for T215413: Image Classification Working Group.
Jun 18 2021, 11:10 AM · Analytics-Radar, Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Research
Miriam closed T228441: Design a pipeline for image classification, a subtask of T215413: Image Classification Working Group, as Invalid.
Jun 18 2021, 11:10 AM · Analytics-Radar, Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Research
Miriam closed T228441: Design a pipeline for image classification, a subtask of T155538: General image classifier for commons, as Invalid.
Jun 18 2021, 11:09 AM · Wiki-Loves-Monuments, Wikimedia-Hackathon-2017, Research-Backlog, artificial-intelligence, Research ideas, Machine-Learning-Team
Miriam closed T228441: Design a pipeline for image classification as Invalid.
Jun 18 2021, 11:09 AM · Research, artificial-intelligence

Jun 16 2021

Miriam added a comment to T279606: Add a reference: usage of user-applied templates.

Hi @MMiller_WMF, sorry for the delay on this. Please see our estimate of the number of articles having at least one "Citation Needed" tag in this spreadsheet: https://docs.google.com/spreadsheets/d/1-diGTFHnpOw5gHjmWrfIfaZhxAg23fxMtIIdRa3F9Pk/edit#gid=665212437

Jun 16 2021, 10:50 AM · Growth-Team-Filtering, Growth-Team, Growth-Structured-Tasks

Jun 4 2021

Miriam added a comment to T273968: Define Metrics for Survey-Based Knowledge Gaps.

@leila yes this makes sense. I am meeting next week with the Design Research team to see how to improve this basic prototype. I will then create tasks to describe how we move forward, and one will definitely include metric description refinement and translation as you suggested.

Jun 4 2021, 2:14 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T276407: An End-to-End Image Classification Pipeline.

Weekly updates:
Still trying to understand why the model's performance drops so much after moving from Keras to TF.Estimator. Investigation is ongoing and we are getting to the bottom of it.
There are different variables we are looking at:
(1) How the input data is formatted
(2) Whether the model is pre-trained or not
(3) The function used to transform the Keras model to an Estimator
Getting there!

Jun 4 2021, 2:11 PM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision
Miriam added a comment to T260634: Run a computer vision challenge.

Weekly updates:

  • Competition launch is on hold due to discussion on the data nature and availability.
  • We are putting together a workshop proposal for NeurIPS 2021 (deadline June 18th).
Jun 4 2021, 2:04 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam added a comment to T278217: Release image data for training.

Weekly updates: discussions on the data nature and structure are still ongoing.

Jun 4 2021, 2:03 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T277828: Investigate placeholder image recommendation.

Weekly updates: some placeholder images escape the filters we put together based on categories. I manually went through the top 100 annotated images and found ~15 of those. We should add those to the list of images to filter out, but also think of more scalable solutions.

Jun 4 2021, 2:02 PM · Growth-Team-Filtering, Image-Suggestions, Growth-Team
Miriam added a comment to T266655: Quantifying the importance of images in Wikipedia.
  • Qualitative: the pilot experiment has launched. As a next step, the whole team will give feedback on the pilot.
  • Quantitative: final touches for paper submission (expected next week)
Jun 4 2021, 1:58 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T278681: Image Matching Structured Task: Research Q3-Q4.

Analyzed with Aiko the results of the Android POC, to understand:
(1) The extent to which newcomers behave and annotate data differently
(2) The extent to which non-English users struggle with the POC
(3) The reliability of newcomers' annotations (via agreement)

Jun 4 2021, 1:57 PM · Research (FY2020-21-Research-April-June)

May 24 2021

leila awarded T260634: Run a computer vision challenge a Cookie token.
May 24 2021, 7:10 PM · Research (FY2021-22-Research-Oct-Dec)

May 21 2021

Miriam added a comment to T260634: Run a computer vision challenge.

Weekly updates:

  • Contract is signed
  • Dataset in preparation
  • We scoped the task as follows:
May 21 2021, 5:23 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam added a comment to T273968: Define Metrics for Survey-Based Knowledge Gaps.

Weekly updates:

  • Extracted metrics on toy dataset for the following questions:
    • What is the representation of each category for this gap in this project? -- Probability distribution for a gap in a language edition for a specific year P = ( P ( gap ( year, language ) ) )
    • What is the most represented category for this gap in this project? -- Max ( P )
    • What is the least represented category for this gap in this project? -- Min ( P )
    • How dominant is the most represented category with respect to the least represented one? -- Max( P )/Min( P )
    • How dominant is the most represented category with respect to the second most represented one? -- Max( P )/2ndMax ( P )
    • How unbalanced is the representation of different categories? -- Gini ( P )
    • How diverse is this project with respect to this gap? -- Normalized-Entropy( P )
    • How are gaps evolving over time? -- Cumulative distribution for a gap in a language edition over all years, example:

Screenshot from 2021-05-21 18-19-13.png (477×1 px, 59 KB)
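
The per-distribution metrics listed above can be sketched in Python as follows. This is a minimal illustration, not the code used on the toy dataset: the function name `gap_metrics`, the input format (a dict of per-category counts for one gap in one language edition and year), and the mean-absolute-difference form of the Gini coefficient are all assumptions.

```python
import math


def gap_metrics(counts):
    """Summary metrics for one gap, from hypothetical per-category counts.

    Assumes at least one category and a positive total count.
    """
    total = sum(counts.values())
    # P: probability distribution over the categories of this gap.
    p = {category: value / total for category, value in counts.items()}
    ranked = sorted(p.values(), reverse=True)
    n = len(ranked)

    # Gini coefficient: mean absolute difference divided by twice the mean.
    # The mean of a probability distribution over n categories is 1/n.
    mean = 1.0 / n
    gini = sum(abs(a - b) for a in ranked for b in ranked) / (2 * n * n * mean)

    # Shannon entropy, normalized by its maximum value log(n).
    entropy = -sum(x * math.log(x) for x in ranked if x > 0)

    return {
        "P": p,
        "most": max(p, key=p.get),    # category attaining Max(P)
        "least": min(p, key=p.get),   # category attaining Min(P)
        "max_over_min": ranked[0] / ranked[-1] if ranked[-1] > 0 else math.inf,
        "max_over_second": (
            ranked[0] / ranked[1] if n > 1 and ranked[1] > 0 else math.inf
        ),
        "gini": gini,
        "normalized_entropy": entropy / math.log(n) if n > 1 else 0.0,
    }


# Made-up counts, for illustration only.
metrics = gap_metrics({"a": 50, "b": 30, "c": 20})
```

With these definitions, a perfectly balanced gap has a Gini coefficient of 0 and a normalized entropy of 1, which may make the two metrics easier to explain side by side.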

May 21 2021, 5:21 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T278217: Release image data for training.

Weekly updates:
As part of our conversations with Kaggle and the rest of the org team, we have figured out the schema for the data deliverable. We should be able to release our part next week.

Screenshot from 2021-05-21 18-09-21.png (393×831 px, 59 KB)

May 21 2021, 5:10 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T266655: Quantifying the importance of images in Wikipedia.

Weekly updates:

May 21 2021, 5:07 PM · Research (FY2020-21-Research-April-June)
Miriam updated subscribers of T278681: Image Matching Structured Task: Research Q3-Q4.

Weekly updates:

  • Ran accuracy diagnostics on the data from T273057. For the 2 largest sources, the algorithm appears to be 64-71% accurate, meaning the majority of users say that the algorithm recommends a good match:

Screenshot from 2021-05-21 17-59-27.png (384×1 px, 93 KB)

It looks like the time it takes a user to give a response is also inversely proportional to the goodness of the match:
Screenshot from 2021-05-21 18-00-15.png (312×685 px, 34 KB)

  • Next, @AikoChou and I will break down these metrics by user expertise and topic.
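
A per-source accuracy figure of the kind reported above could be computed roughly as follows. This is a sketch under assumptions: the function name `accuracy_by_source`, the `(source, is_good_match)` input format, and the source labels are hypothetical and do not reflect the actual layout of the T273057 data.

```python
from collections import defaultdict


def accuracy_by_source(judgments):
    """Share of 'good match' judgments for each recommendation source.

    `judgments` is an iterable of (source, is_good_match) pairs, where
    is_good_match is True when the user accepted the suggested image.
    """
    good = defaultdict(int)
    total = defaultdict(int)
    for source, is_good_match in judgments:
        total[source] += 1
        good[source] += int(is_good_match)
    return {source: good[source] / total[source] for source in total}


# Made-up judgments, for illustration only.
sample = [
    ("source_a", True), ("source_a", True), ("source_a", False),
    ("source_b", True), ("source_b", False),
]
accuracy = accuracy_by_source(sample)
```

The same grouping could be keyed on user expertise or topic instead of source to produce the breakdown mentioned in the last bullet.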
May 21 2021, 5:05 PM · Research (FY2020-21-Research-April-June)

May 14 2021

Miriam added a comment to T276407: An End-to-End Image Classification Pipeline.

Weekly updates:
We solved the problem of the Parameter Server being the bottleneck for computation, for now, by increasing the batch size used for training. In the meantime, we are running several experiments to understand the difference between the Keras Model API and the TensorFlow Estimator, as prediction accuracy seems to be much lower using the Estimator, i.e. the function which we need to use for large-scale distributed training.

May 14 2021, 6:36 PM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision
Miriam added a comment to T273968: Define Metrics for Survey-Based Knowledge Gaps.

Weekly updates:

  • Preparing a presentation to summarize the efforts on this front to Tech Managers
  • Working on plotting and rendering some of the metrics from tagged content
May 14 2021, 6:34 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T260634: Run a computer vision challenge.

Weekly updates:

  • Progress on the contract end
  • No other updates as people away for holidays or other reasons
May 14 2021, 6:32 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam added a comment to T278217: Release image data for training.

Weekly updates:

  • No updates
May 14 2021, 6:31 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T266655: Quantifying the importance of images in Wikipedia.

Weekly updates:

  • Qualitative: no updates
  • Quantitative: finalizing the paper for resubmission to EPJ Data Science
May 14 2021, 6:31 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T278681: Image Matching Structured Task: Research Q3-Q4.

Weekly updates:

  • Started analyzing the data in T273057 to check for early signs of algorithm accuracy by source of recommendation
May 14 2021, 6:30 PM · Research (FY2020-21-Research-April-June)

May 7 2021

Miriam added a comment to T273968: Define Metrics for Survey-Based Knowledge Gaps.

Weekly updates:

  • With the help of @marcmiquel , we are figuring out the details of the questions for metrics deployment. I will start computing those on toy data coming from the cultural observatory.
May 7 2021, 4:18 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T276407: An End-to-End Image Classification Pipeline.

Weekly update:
We were able to get similar results across CPU and GPU computation. One major issue is that the worker dispatching weights to the GPU worker (the Parameter Server) is overloaded and saturates the network. We are investigating ways to reduce or redistribute this load.

May 7 2021, 4:13 PM · Research (FY2020-21-Research-April-June), Structured-Data-Backlog, MachineVision
Miriam added a comment to T260634: Run a computer vision challenge.

Weekly updates:

  • Progress on the data release as per T278217
  • Progress on a multimodal-multilingual baseline based on a cross-modal network trained on WIT.
May 7 2021, 4:10 PM · Research (FY2021-22-Research-Oct-Dec)
Miriam updated subscribers of T278217: Release image data for training.

Weekly updates:

  • @tizianopiccardi computed the size of the image dataset based on face size. The idea is to remove all images where a face is the primary subject. It looks like, even with a conservative approach of removing all images where the face is larger than 5% of the total image area, we can retain about 90% of the original image dataset.

image.png (317×558 px, 21 KB)

image (1).png (317×558 px, 19 KB)

  • Also, only about 4k images from the 7M in the WIT dataset are candidates for deletion on Commons. We will remove those as well.
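
The face-size rule described above can be sketched as a simple filter. This is an illustration only: the function names, the `(x, y, width, height)` bounding-box format, and the assumption that a separate face detector has already produced the boxes are hypothetical; only the 5% threshold comes from the update.

```python
def largest_face_fraction(image_width, image_height, face_boxes):
    """Fraction of the image area covered by the largest detected face.

    `face_boxes` is a list of (x, y, width, height) tuples in pixels,
    assumed to come from a face detector that is not shown here.
    """
    image_area = image_width * image_height
    if image_area <= 0 or not face_boxes:
        return 0.0
    return max(w * h for (_x, _y, w, h) in face_boxes) / image_area


def keep_image(image_width, image_height, face_boxes, max_fraction=0.05):
    """Keep an image unless a face is larger than `max_fraction` of its area."""
    fraction = largest_face_fraction(image_width, image_height, face_boxes)
    return fraction <= max_fraction


# Made-up examples: a 1000x1000 image with a small and a large face.
small_face_kept = keep_image(1000, 1000, [(10, 10, 100, 100)])   # 1% of area
large_face_kept = keep_image(1000, 1000, [(10, 10, 300, 300)])   # 9% of area
```

Using the largest face rather than the summed face area is a design choice made here for simplicity; group photos with many small faces would pass this filter.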
May 7 2021, 4:09 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T277828: Investigate placeholder image recommendation.

Weekly updates: none

May 7 2021, 3:56 PM · Growth-Team-Filtering, Image-Suggestions, Growth-Team
Miriam closed T272109: Assess prevalence of Wikidata infoboxes as Resolved.

Resolving this for now, as all experiments are done.

May 7 2021, 3:56 PM · Research (FY2020-21-Research-April-June), Growth-Team-Filtering, Image-Suggestions, Growth-Team, Wikipedia-Android-App-Backlog
Miriam closed T272109: Assess prevalence of Wikidata infoboxes, a subtask of T256081: Image matching algorithm, as Resolved.
May 7 2021, 3:56 PM · Research, Growth-Team-Filtering, Image-Suggestions, Growth-Team, Wikipedia-Android-App-Backlog
Miriam added a comment to T266655: Quantifying the importance of images in Wikipedia.
  • Qualitative: fine-tuned the last details of the back-end data collection for the LabInTheWild experiment. Ready to pilot test.
  • Quantitative: re-writing the paper for resubmission to EPJ Data Science
May 7 2021, 3:54 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T278681: Image Matching Structured Task: Research Q3-Q4.

Weekly updates:

  • No major updates.
May 7 2021, 3:52 PM · Research (FY2020-21-Research-April-June)

Apr 30 2021

Miriam added a comment to T273968: Define Metrics for Survey-Based Knowledge Gaps.

Weekly updates:

  • Started working on extracting features from a real distribution (articles by number of images), based on the metrics questions we are defining. This is to understand the extent to which these metrics are interpretable and understandable by non-technical people.
Apr 30 2021, 5:03 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T278217: Release image data for training.

Weekly updates:

  • Data release details are sorted. We are now figuring out the last details on the type of features we want to release.
  • For baselines, we are fine-tuning a model on the WIT dataset, and we will probably release it as a baseline, with the corresponding embeddings.
  • Working on putting together the workshop proposal for NeurIPS 2021.
Apr 30 2021, 5:00 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T277828: Investigate placeholder image recommendation.

Weekly updates: none

Apr 30 2021, 4:44 PM · Growth-Team-Filtering, Image-Suggestions, Growth-Team
Miriam added a comment to T266655: Quantifying the importance of images in Wikipedia.
  • Qualitative: finalized the results page for the LabInTheWild experiment. Ready to pilot test.
  • Quantitative: re-writing the paper for resubmission to EPJ Data Science
Apr 30 2021, 4:44 PM · Research (FY2020-21-Research-April-June)
Miriam added a comment to T278681: Image Matching Structured Task: Research Q3-Q4.

Weekly updates:

  • Image Recs on Android is now available in Beta. It will be in production next week if everything goes well.
Apr 30 2021, 4:06 PM · Research (FY2020-21-Research-April-June)
Miriam merged T268352: Improve list of image candidates to discard into T277828: Investigate placeholder image recommendation.
Apr 30 2021, 3:56 PM · Growth-Team-Filtering, Image-Suggestions, Growth-Team