
Miriam (Miriam Redi)
User

User Details

User Since
Sep 25 2017, 10:36 AM (146 w, 1 d)
Availability
Available
LDAP User
Miriam
MediaWiki User
Miriam (WMF)

Recent Activity

Fri, Jun 26

Miriam added a comment to T242603: Investigate Knowledge Gaps in Multimedia.

Weekly updates:

  • Generated new metrics for image and text selection gender gap.
  • Brainstorming on new metrics for image/text framing gender gap.
  • Prepared the code for regression analysis to estimate the gender gap in a more statistically sound way.
Fri, Jun 26, 4:14 PM · Research (FY2019-20-Research-April-June)
Miriam closed T250150: Improve prototypes of image classifiers trained on images from Commons Categories, a subtask of T228441: Design a pipeline for image classification, as Resolved.
Fri, Jun 26, 4:12 PM · Research, artificial-intelligence
Miriam closed T250150: Improve prototypes of image classifiers trained on images from Commons Categories as Resolved.

Resolving this for now. I was not able to do the stretch goal but will leave that task open hoping to work on it soon :)

Fri, Jun 26, 4:12 PM · Research (FY2019-20-Research-April-June), artificial-intelligence
Miriam added a comment to T250154: Initiate the analysis of readers' engagement with images in Wikipedia.

Quantitative:

  • Mapped image CTR by country. Likely highly related to latency.
  • Mapped top topics by image CTR: visual arts, geography, transportation (?) (will look into that)
  • Identified country indicators which we want to use as predictors for CTR
Fri, Jun 26, 4:11 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250155: Provide a comprehensive write-up of the taxonomies developed during the year..

Weekly updates:

  • Almost there, filling up the last tables, polishing and restructuring parts of the text
Fri, Jun 26, 4:00 PM · Research (FY2019-20-Research-April-June), Epic
Miriam added a comment to T251246: Perform a large-scale analysis of citation quality on Wikipedia.

Weekly updates:

  • Discussed adding a ground truth to evaluate the effectiveness of the citation quality classifier
  • The ground truth would be based on the "articles with unsourced statements" category
  • Aiko passed her thesis defense :)
Fri, Jun 26, 3:59 PM · Research (FY2019-20-Research-April-June)

Tue, Jun 23

Miriam added a comment to T256081: Image suggestion proof-of-concept.
@Miriam -- here's the task that we talked about making so you could give this approach a try. What do you think? Does this sound doable? On what timeline would you prefer?
Tue, Jun 23, 2:51 PM · Research (FY2020-21-Research-July-September), Growth-Team, Wikipedia-Android-App-Backlog

Fri, Jun 19

Miriam added a comment to T250154: Initiate the analysis of readers' engagement with images in Wikipedia.
  • Finalized details of the first pilot for the "role of images in knowledge understanding" experiment: number of questions and variables to play with. We will manually design examples.
Fri, Jun 19, 4:31 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250155: Provide a comprehensive write-up of the taxonomies developed during the year..

Weekly updates:

  • Finalized pass on Methods, Future Work and Introduction
  • Working on tables
Fri, Jun 19, 4:28 PM · Research (FY2019-20-Research-April-June), Epic

Jun 12 2020

Miriam added a comment to T242603: Investigate Knowledge Gaps in Multimedia.

Weekly updates:

  • Finalized the paper narrative and upcoming to-dos for each member of the team. We need further computational results, which I will produce for now, when I have time, while we look for a student :)
Jun 12 2020, 4:35 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250154: Initiate the analysis of readers' engagement with images in Wikipedia.
  • Generated example questions for the experiment design. The experiment will contain multiple choice questions on Wikipedia articles, with both visual and textual components.
Jun 12 2020, 4:34 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250155: Provide a comprehensive write-up of the taxonomies developed during the year..

Weekly updates:

  • Finalized section 2 - related work
  • Working on making a pass on the new sections contributed by Martin and Leila
  • Working on adding comprehensive tables for better consumption of the taxonomy
Jun 12 2020, 4:29 PM · Research (FY2019-20-Research-April-June), Epic
Miriam added a comment to T251246: Perform a large-scale analysis of citation quality on Wikipedia.

Weekly updates:

  • Revised the metric to measure article quality, so as to consider as "correctly cited" a sentence that is in a paragraph with a citation.
  • Revised predictors to model "contribution inequality" using the Gini coefficient.
  • Found that citation quality is higher when few editors are contributing to the article, similar to previous work
  • Aiko defended her thesis today and will submit the final manuscript by the end of the month -- Congrats @AikoChou
  • After a break, we would like to wrap up these results in a paper for a conference
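
As an illustration of the "contribution inequality" predictor mentioned above, here is a minimal sketch of a Gini coefficient computed over per-editor contribution counts. The function name and the choice of input (edits per editor) are assumptions made for the example, not the project's actual feature code.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values, with ranks starting at 1.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical edit counts per editor for two articles.
evenly_edited = [5, 5, 5, 5]
one_dominant_editor = [0, 0, 0, 20]
print(gini(evenly_edited), gini(one_dominant_editor))
```

A value near 0 means edits are spread evenly across editors; a value near 1 means a single editor accounts for almost all contributions.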
Jun 12 2020, 4:28 PM · Research (FY2019-20-Research-April-June)

Jun 8 2020

Miriam closed T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles as Resolved.
Jun 8 2020, 2:57 PM · Research (FY2019-20-Research-April-June)

Jun 5 2020

Miriam updated the task description for T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles.
Jun 5 2020, 4:56 PM · Research (FY2019-20-Research-April-June)
Miriam updated subscribers of T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles.

Weekly updates:

  • We submitted the paper to ACM MM. \o/
  • We polished the code for the library, and published it here: https://github.com/OlehOnyshchak/pyWikiMM
  • For the part of "work on dataset release", most of the work is done, but we need to allocate time to download and store the data, and possibly run some baseline experiments on that. Everything is ready, but we won't be able to actually release data this quarter. @leila I would resolve this task if that works for you.
Jun 5 2020, 4:54 PM · Research (FY2019-20-Research-April-June)
Miriam updated the task description for T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles.
Jun 5 2020, 4:48 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250154: Initiate the analysis of readers' engagement with images in Wikipedia.

Weekly updates:

Jun 5 2020, 4:04 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250155: Provide a comprehensive write-up of the taxonomies developed during the year..

Weekly updates:

  • Finalized the "methods" section of the taxonomy
  • Gathering literature for the "related work section" which I will start soon
Jun 5 2020, 4:00 PM · Research (FY2019-20-Research-April-June), Epic

Jun 4 2020

Miriam added a comment to T234629: Move the Analytics infrastructure to Debian Buster.

Oh this is great, thanks so much @Ottomata !

Jun 4 2020, 2:34 PM · Analytics-Kanban, Analytics

Jun 1 2020

Miriam added a comment to T254191: Add sqooped imagelinks table to oozie load job for hive to show new snapshots.

Ignore the message above, I ran the queries again, and it indeed seems that the problem has been solved in the past few hours :) thanks so much!

Jun 1 2020, 8:36 PM · Analytics-Kanban, Analytics
Miriam added a comment to T254191: Add sqooped imagelinks table to oozie load job for hive to show new snapshots.

@JAllemandou thanks. However, if I query the mediawiki_imagelinks table in wmf_raw for pages more recent than December 2019, e.g. https://en.wikipedia.org/wiki/Coronavirus_disease_2019, I get an empty response. Am I missing something? Thanks!

Jun 1 2020, 8:26 PM · Analytics-Kanban, Analytics
Miriam updated subscribers of T254191: Add sqooped imagelinks table to oozie load job for hive to show new snapshots.
Jun 1 2020, 7:56 PM · Analytics-Kanban, Analytics
Miriam added a comment to T254191: Add sqooped imagelinks table to oozie load job for hive to show new snapshots.

Thanks for this @JAllemandou !

Jun 1 2020, 7:55 PM · Analytics-Kanban, Analytics

May 29 2020

Miriam added a comment to T242603: Investigate Knowledge Gaps in Multimedia.

Weekly updates:

  • Worked on finalizing the paper narrative with the rest of the team.
May 29 2020, 4:21 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250150: Improve prototypes of image classifiers trained on images from Commons Categories.

Weekly updates:

  • Refined the data, results are similar.
  • Computed the top-5 accuracy as the final metric on the classifiers. This metric is widely used in image classification competitions such as the ImageNet Large Scale Visual Recognition Challenge. It counts how many times the correct label is found among the top-5 predictions of the classifier.
  • Top-5 accuracy is around 80% for the first version, and 81.5% for the improved one, with major gains on classes we have worked on this quarter. https://docs.google.com/spreadsheets/d/18Er84wdWIme_KMOrOYZZQxq5z0d9O4L0nZMMibzQ_rc/edit?usp=sharing
  • I could close this task but I still hope to train a network from scratch by the end of the quarter :)
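
The top-5 accuracy described above can be sketched in a few lines. This is an illustrative implementation in plain Python; the function name and the toy scores are assumptions for the example and are not taken from the classifier code.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest predicted score.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example: 2 images, 3 classes, true classes 1 and 2.
scores = [[0.10, 0.90, 0.00],
          [0.80, 0.05, 0.15]]
labels = [1, 2]
print(top_k_accuracy(scores, labels, k=1))  # only the first image is a hit
print(top_k_accuracy(scores, labels, k=2))  # both images are hits
```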
May 29 2020, 4:19 PM · Research (FY2019-20-Research-April-June), artificial-intelligence
Miriam added a comment to T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles.

Weekly updates:

  • Added performance results, motivation, applications, and image/page statistics to the paper draft. It's almost ready to go!
May 29 2020, 4:16 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250154: Initiate the analysis of readers' engagement with images in Wikipedia.

Weekly updates:

  • Qualitative:
    • Analyzed the "reading comprehension" dataset from Allen AI. It contains questions about reading comprehension of Wikipedia paragraphs: https://allenai.org/data/quoref
    • The team decided to start crafting a few pilots for this experiment. Focusing on a few selected articles, we will take questions from existing QA datasets, and generate questions manually. We will also try different versions of the interface.
May 29 2020, 4:15 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250155: Provide a comprehensive write-up of the taxonomies developed during the year..

Weekly updates:

  • Finalized the "content" subsection of the taxonomy, missing tables and references which I will add after feedback
  • Will start working on the "rationale" section next week
May 29 2020, 4:08 PM · Research (FY2019-20-Research-April-June), Epic
Miriam added a comment to T251246: Perform a large-scale analysis of citation quality on Wikipedia.

Weekly updates:

  • Refined the regression analysis, AUC on the test set is around 0.7. Still finding some inconsistencies, probably due to features' collinearity with ORES' quality score
  • Aiko is defending soon, so the first wrap-up of all experiments is expected in the coming 2 weeks.
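
For readers unfamiliar with the AUC figure quoted above: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch, assuming binary labels (1 = positive, 0 = negative) and arbitrary real-valued scores; the function name is hypothetical and this is not the analysis code itself.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.7 indicates a moderately informative model. The pairwise form is quadratic in the number of examples; a rank-based implementation is preferable on large test sets.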
May 29 2020, 4:07 PM · Research (FY2019-20-Research-April-June)

May 26 2020

Miriam placed T184744: Improve access to Commons image data for research and development up for grabs.
May 26 2020, 2:58 PM · User-ArielGlenn, User-fgiunchedi
Miriam closed T179970: Create a 'State of Research' Slide deck as Declined.
May 26 2020, 2:57 PM · Research-Backlog

May 22 2020

Miriam added a comment to T242603: Investigate Knowledge Gaps in Multimedia.

Weekly updates:

  • Worked on logistic regression to predict presence of page/images from people characteristics - more details+plots coming next week
  • Worked on comparing gender gap to other gaps, such as occupational or geographic gap.
May 22 2020, 4:25 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250150: Improve prototypes of image classifiers trained on images from Commons Categories.

Weekly updates:

May 22 2020, 4:21 PM · Research (FY2019-20-Research-April-June), artificial-intelligence
Miriam updated the task description for T250150: Improve prototypes of image classifiers trained on images from Commons Categories.
May 22 2020, 4:13 PM · Research (FY2019-20-Research-April-June), artificial-intelligence
Miriam added a comment to T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles.

Weekly updates:

  • Paper draft almost finalized, working on the last details and contextualizing the release of the library in the MM community, and its role in supporting existing research and opening new areas of research
  • Link to the repository with the library: https://github.com/OlehOnyshchak/WikipediaMultimodalDownloader
May 22 2020, 4:13 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250154: Initiate the analysis of readers' engagement with images in Wikipedia.

Weekly updates:

May 22 2020, 4:09 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250155: Provide a comprehensive write-up of the taxonomies developed during the year..

Weekly update:

  • Finalized the "readers" subsection of the taxonomy, waiting for feedback
  • Started working on content subsection
May 22 2020, 4:07 PM · Research (FY2019-20-Research-April-June), Epic
Miriam added a comment to T251246: Perform a large-scale analysis of citation quality on Wikipedia.

Weekly updates:

  • Added regression analysis and discussed the role of kurtosis and skewness; suggested modifications to the way we sample editors (currently, Aiko uses only the top-10 editors to generate features)
  • Presented the work at the weekly meeting and discussed the feedback afterwards
May 22 2020, 4:06 PM · Research (FY2019-20-Research-April-June)

May 18 2020

Miriam closed T252129: Access to analytics-privatedata-users for Research intern Daniram as Resolved.

Thanks @colewhite ! Closing this task. Thanks a lot all for your help :)

May 18 2020, 9:36 AM · SRE-Access-Requests, Operations

May 15 2020

Miriam added a comment to T250150: Improve prototypes of image classifiers trained on images from Commons Categories.

Weekly updates:
Polished the Commons categories related to the 30 concepts for which we have lower accuracy. Downloaded the new data on stat1005. Ready for model re-training.

May 15 2020, 5:29 PM · Research (FY2019-20-Research-April-June), artificial-intelligence
Miriam added a comment to T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles.

Weekly updates:
First paper draft is ready, missing abstract and related work! Working on refining the sections, and packaging the software for release.

May 15 2020, 5:28 PM · Research (FY2019-20-Research-April-June)
Miriam updated subscribers of T250154: Initiate the analysis of readers' engagement with images in Wikipedia.

Weekly updates:

  • Qualitative: worked on exploring the questions in the AI2 diagram dataset: https://allenai.org/data/diagrams These are schoolbook questions about science that we could use to test how people learn through Wikipedia articles.
  • Quantitative: @Daniram3 is officially onboarded, with server and notebook access! We worked on exploring the data and on understanding which external data and classifications we need to complete the project (country characteristics, image classifiers)
May 15 2020, 5:27 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250155: Provide a comprehensive write-up of the taxonomies developed during the year..

Weekly updates:

  • Worked on creating the first paper draft, Sec 4 will be about the Taxonomy
  • We decided to work collectively on "Objectives" and drop the "causes" column
  • Built the structure for subsections of Section 4: Readers, Contributors, and Content
May 15 2020, 4:18 PM · Research (FY2019-20-Research-April-June), Epic
Miriam added a comment to T251246: Perform a large-scale analysis of citation quality on Wikipedia.

Weekly updates:

  • Aggregated editors' characteristics at page level, resulting in features such as editors' contribution skewness
  • Initiated the study of the impact of different factors (page length, quality, topic, and editors' features) on citation quality, based on logistic regression
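
A page-level feature such as "editors' contribution skewness" could be computed along the following lines. This is a sketch using the population (moment-based) skewness; the function name and the per-editor edit counts are assumptions for illustration, and the actual feature definition may differ.

```python
def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry to measure
    return m3 / m2 ** 1.5


# Hypothetical edit counts per editor on one page.
print(skewness([1, 2, 3]))      # symmetric distribution
print(skewness([1, 1, 1, 10]))  # one editor dominates: positive skew
```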
May 15 2020, 4:16 PM · Research (FY2019-20-Research-April-June)

May 14 2020

Miriam added a comment to T252129: Access to analytics-privatedata-users for Research intern Daniram.

Hi @colewhite yes! this is for SWAP access, see this task: https://phabricator.wikimedia.org/T199736

May 14 2020, 6:58 PM · SRE-Access-Requests, Operations
Miriam reopened T252129: Access to analytics-privatedata-users for Research intern Daniram as "Open".

Re-opening this temporarily for a quick follow-up!
@elukey could you add @Daniram3 to the LDAP-group so that he can access the notebooks? Grazie!

May 14 2020, 6:26 PM · SRE-Access-Requests, Operations
Miriam added a comment to T252129: Access to analytics-privatedata-users for Research intern Daniram.

Thank you so much @colewhite and all!

May 14 2020, 5:07 PM · SRE-Access-Requests, Operations

May 13 2020

Miriam added a comment to T252129: Access to analytics-privatedata-users for Research intern Daniram.

Yes, thanks @Nuria and @colewhite. @Daniram3's internship end date is July 26th. Many thanks!

May 13 2020, 3:51 PM · SRE-Access-Requests, Operations
Miriam added a comment to T252129: Access to analytics-privatedata-users for Research intern Daniram.

@KFrancis thanks for your kind confirmation!
And thanks @colewhite for helping out. According to your list, the last point should be @Nuria's approval. Please let me know if there is anything else I can help with!

May 13 2020, 8:43 AM · SRE-Access-Requests, Operations

May 12 2020

Miriam added a comment to T252129: Access to analytics-privatedata-users for Research intern Daniram.

@KFrancis thanks! We discussed this case over email, and my understanding was that the signed letter of agreement already contains an NDA, so we do not need an additional one. Could you please confirm?

May 12 2020, 9:35 PM · SRE-Access-Requests, Operations
Miriam updated the task description for T250154: Initiate the analysis of readers' engagement with images in Wikipedia.
May 12 2020, 1:52 PM · Research (FY2019-20-Research-April-June)
Miriam created T252539: Internship: Understanding Readers' engagement with Wikipedia through Traffic Logs.
May 12 2020, 1:52 PM · Research (FY2019-20-Research-April-June), Chinese-Sites

May 9 2020

Miriam updated the task description for T252129: Access to analytics-privatedata-users for Research intern Daniram.
May 9 2020, 11:53 AM · SRE-Access-Requests, Operations
Miriam added a comment to T252129: Access to analytics-privatedata-users for Research intern Daniram.

I don't think bastiononly has existed for years.

May 9 2020, 11:53 AM · SRE-Access-Requests, Operations

May 8 2020

Miriam added a comment to T250154: Initiate the analysis of readers' engagement with images in Wikipedia.

Weekly updates:

May 8 2020, 7:50 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T242603: Investigate Knowledge Gaps in Multimedia.

Weekly updates:

  • Worked on paper topic proposals
  • Worked on strengthening the statistical soundness through logistic regression-based analysis
  • Extracted image quality score from all images of people in Wikipedia for all languages
May 8 2020, 7:44 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250150: Improve prototypes of image classifiers trained on images from Commons Categories.

Weekly updates:
Started working on data refinement, checked the categories for which we get lower accuracy, and refined the Commons category list associated with those.

May 8 2020, 7:42 PM · Research (FY2019-20-Research-April-June), artificial-intelligence
Miriam added a comment to T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles.

Weekly updates:
none, deadline for paper submission postponed

May 8 2020, 7:41 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250155: Provide a comprehensive write-up of the taxonomies developed during the year..

Weekly updates: none for now

May 8 2020, 7:40 PM · Research (FY2019-20-Research-April-June), Epic
Miriam added a comment to T251246: Perform a large-scale analysis of citation quality on Wikipedia.

Weekly updates:

  • Downloaded data about editors' characteristics
  • Refined citation quality analysis over time
May 8 2020, 7:39 PM · Research (FY2019-20-Research-April-June)

May 7 2020

Miriam created T252129: Access to analytics-privatedata-users for Research intern Daniram.
May 7 2020, 4:10 PM · SRE-Access-Requests, Operations

May 4 2020

Miriam added a comment to T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles.

Weekly updates:
All to-dos from last week are finished. Working on the paper submission for Open Source Competition at ACM MM: https://2020.acmmm.org/osc-proposals.html

May 4 2020, 3:37 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250154: Initiate the analysis of readers' engagement with images in Wikipedia.

Weekly updates:

  • Quantitative: scoping down the project for the internship period. Focus on specific topics (education topic) and 1 or 2 research questions, leave others for later.
    • How are people engaging with images?
    • How does this change across different countries/different segments of countries having different development index levels?
  • Qualitative: collaborators retrieved lists of commonly asked questions. Next step is to match with QA datasets, and generate multiple choice answers. This will be the root content for our experiment.
May 4 2020, 3:31 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250155: Provide a comprehensive write-up of the taxonomies developed during the year..

Weekly update:

May 4 2020, 3:26 PM · Research (FY2019-20-Research-April-June), Epic
Miriam updated the task description for T250155: Provide a comprehensive write-up of the taxonomies developed during the year..
May 4 2020, 3:24 PM · Research (FY2019-20-Research-April-June), Epic
Miriam added a comment to T251246: Perform a large-scale analysis of citation quality on Wikipedia.

Weekly updates:

  • Computed citation quality by section and topics in English Wikipedia
  • Computed evolution of citation quality over time for different topics: Medicine, Politics, Economics. CQ improves substantially over time!
May 4 2020, 3:24 PM · Research (FY2019-20-Research-April-June)

Apr 28 2020

Miriam claimed T251246: Perform a large-scale analysis of citation quality on Wikipedia.
Apr 28 2020, 11:03 AM · Research (FY2019-20-Research-April-June)
Miriam created T251246: Perform a large-scale analysis of citation quality on Wikipedia.
Apr 28 2020, 11:02 AM · Research (FY2019-20-Research-April-June)

Apr 27 2020

Miriam closed T242971: A report on accuracy and performance of the classification models as Resolved.

https://meta.wikimedia.org/wiki/Research:Prototypes_of_Image_Classifiers_Trained_on_Commons_Categories

Apr 27 2020, 4:03 PM · Research (FY2019-20-Research-January-March), artificial-intelligence
Miriam closed T242971: A report on accuracy and performance of the classification models , a subtask of T242229: Test the feasibility of a classifier trained on Commons categories, as Resolved.
Apr 27 2020, 4:03 PM · Research, artificial-intelligence
Miriam added a comment to T242603: Investigate Knowledge Gaps in Multimedia.

Weekly updates:

Apr 27 2020, 4:01 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles.

Weekly updates:
working on the following:

  • Optimise how we handle icons so that the script works faster
  • Add possibility to download only fields specified by user
  • Create a docker container with the script
  • Start writing supporting-paper for the software
Apr 27 2020, 3:52 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250154: Initiate the analysis of readers' engagement with images in Wikipedia.

Weekly updates:

Apr 27 2020, 2:00 PM · Research (FY2019-20-Research-April-June)
Miriam updated the task description for T250154: Initiate the analysis of readers' engagement with images in Wikipedia.
Apr 27 2020, 1:57 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250155: Provide a comprehensive write-up of the taxonomies developed during the year..

Weekly update: reviewed existing taxonomies, and added a candidate taxonomy layout to the document: https://docs.google.com/document/d/1GG0cPB5bZALLAmqpZdNQtmooC1CcfOS8WoMZ_F2DkOw/edit?usp=sharing

Apr 27 2020, 1:52 PM · Research (FY2019-20-Research-April-June), Epic

Apr 22 2020

Miriam closed T242598: Organize Wiki Workshop 2020 as Resolved.
Apr 22 2020, 2:36 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T242598: Organize Wiki Workshop 2020.

Wiki Workshop was successfully held remotely on April 21st, 2020.

Apr 22 2020, 2:36 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles.

Weekly updates:
A script is ready that extracts, for a given list of articles:

  • Article text
  • Article images links
  • Image captions on article
  • Image descriptions from Commons
  • Image's section headers
  • Image features from ResNet

Everything is packed in a single "query" function, with tons of parameters to change the behaviour of the script if needed. Link: https://github.com/OlehOnyshchak/WikipediaDownloader

Apr 22 2020, 2:30 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250154: Initiate the analysis of readers' engagement with images in Wikipedia.

Weekly Updates:
Progressing on internship contract for the student who is going to work on the quantitative bit. Working with the collaborators on scoping down the project: https://docs.google.com/document/d/1d-sOWana7zOj26cPov4wpoUvsG9exDWI8S0Tl62KDoo/edit?usp=sharing

Apr 22 2020, 2:21 PM · Research (FY2019-20-Research-April-June)
Miriam added a comment to T250155: Provide a comprehensive write-up of the taxonomies developed during the year..

Weekly update: Literature review in progress: https://docs.google.com/document/d/1GG0cPB5bZALLAmqpZdNQtmooC1CcfOS8WoMZ_F2DkOw/edit?usp=sharing
Next up - learning about the existing taxonomy material

Apr 22 2020, 2:14 PM · Research (FY2019-20-Research-April-June), Epic

Apr 14 2020

Miriam claimed T250155: Provide a comprehensive write-up of the taxonomies developed during the year..
Apr 14 2020, 2:35 PM · Research (FY2019-20-Research-April-June), Epic
Miriam edited projects for T250155: Provide a comprehensive write-up of the taxonomies developed during the year., added: Research (FY2019-20-Research-April-June); removed Research.
Apr 14 2020, 11:30 AM · Research (FY2019-20-Research-April-June), Epic
Miriam created T250155: Provide a comprehensive write-up of the taxonomies developed during the year..
Apr 14 2020, 11:30 AM · Research (FY2019-20-Research-April-June), Epic
Miriam created T250154: Initiate the analysis of readers' engagement with images in Wikipedia.
Apr 14 2020, 11:27 AM · Research (FY2019-20-Research-April-June)
Miriam created T250151: Build a tool and dataset to annotate images with metadata from Commons and Wikipedia articles.
Apr 14 2020, 11:23 AM · Research (FY2019-20-Research-April-June)
Miriam edited projects for T250150: Improve prototypes of image classifiers trained on images from Commons Categories, added: Research (FY2019-20-Research-April-June); removed Research.
Apr 14 2020, 11:15 AM · Research (FY2019-20-Research-April-June), artificial-intelligence
Miriam updated the task description for T228441: Design a pipeline for image classification.
Apr 14 2020, 11:14 AM · Research, artificial-intelligence
Miriam created T250150: Improve prototypes of image classifiers trained on images from Commons Categories.
Apr 14 2020, 11:14 AM · Research (FY2019-20-Research-April-June), artificial-intelligence

Apr 9 2020

Miriam added a comment to T248574: GPUs are not correctly handling multitasking .

So we did a few tests with the latest ROCm version.

  • When the GPU saturates, there is no need to reboot, as killing the stalled processes is enough for the GPU to release the resources. This is a big improvement compared to the previous version!
  • We found that the saturation is related to a VRAM usage problem
  • We found a TensorFlow-native solution to dynamically allocate the memory used by a process on the GPU. Added to every TensorFlow script, it allows multiple users to run TensorFlow scripts on the GPU at the same time. More info here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/AMD_GPU#Configure_your_Tensorflow_script
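
For context, one common way to get dynamic GPU memory allocation in TensorFlow is the "memory growth" option. The sketch below assumes TensorFlow 2.1 or later and is not necessarily the exact recipe documented on the wikitech page linked above; the helper name is hypothetical. In TensorFlow 1.x the analogous setting is `gpu_options.allow_growth` on the session config.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of all at once.

    Returns the number of GPUs configured (0 if TensorFlow is not installed
    or no GPU is visible). Must be called before any GPU has been initialized.
    """
    try:
        import tensorflow as tf  # assumption: TensorFlow >= 2.1
    except ImportError:
        return 0
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Grow the allocation as needed rather than reserving all VRAM.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


configured = enable_gpu_memory_growth()
print(f"memory growth enabled on {configured} GPU(s)")
```

Calling this at the top of a script, before building any model, keeps one process from reserving the whole VRAM, which is what lets several users share the same GPU.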
Apr 9 2020, 3:57 PM · Analytics

Apr 1 2020

Miriam closed T242229: Test the feasibility of a classifier trained on Commons categories, a subtask of T228441: Design a pipeline for image classification, as Resolved.
Apr 1 2020, 11:50 AM · Research, artificial-intelligence
Miriam closed T242229: Test the feasibility of a classifier trained on Commons categories as Resolved.
Apr 1 2020, 11:50 AM · Research, artificial-intelligence
Miriam added a comment to T242229: Test the feasibility of a classifier trained on Commons categories.

Report available here: https://meta.wikimedia.org/wiki/Research:Prototypes_of_Image_Classifiers_Trained_on_Commons_Categories
It highlights milestones and areas of improvement to design our own in-house image classifiers, reports on accuracy and GPU performance, and links to some qualitative results of classification on a new set of images.

Apr 1 2020, 11:50 AM · Research, artificial-intelligence

Mar 30 2020

Miriam closed T242635: Start formal collaboration around the project "Understanding Readers' Image Usage in Wikipedia" as Resolved.

MOUs signed and formal collaboration announcement sent on wiki-research-l! Resolving this task.

Mar 30 2020, 10:09 AM · Research (FY2019-20-Research-January-March)

Mar 27 2020

Miriam created T248692: Train image classifiers based on Commons Categories from scratch..
Mar 27 2020, 3:59 PM · Research, artificial-intelligence
Miriam closed T242970: A set of prototypes of image classifiers trained on images from Commons Categories as Resolved.
Mar 27 2020, 3:55 PM · Research (FY2019-20-Research-January-March), artificial-intelligence
Miriam closed T242970: A set of prototypes of image classifiers trained on images from Commons Categories, a subtask of T242229: Test the feasibility of a classifier trained on Commons categories, as Resolved.
Mar 27 2020, 3:55 PM · Research, artificial-intelligence
Miriam added a comment to T242970: A set of prototypes of image classifiers trained on images from Commons Categories.

Closing this task as per our discussion.
Writing report here: https://meta.wikimedia.org/wiki/Research:Prototypes_of_Image_Classifiers_Trained_on_Commons_Categories

Mar 27 2020, 3:55 PM · Research (FY2019-20-Research-January-March), artificial-intelligence

Mar 26 2020

Miriam created T248574: GPUs are not correctly handling multitasking .
Mar 26 2020, 11:50 AM · Analytics

Mar 24 2020

Miriam added a comment to T245833: Enable layered data-access and sharing for a new form of collaboration.

Thanks @elukey for this summary.

Mar 24 2020, 12:57 PM · User-Elukey, Operations, WMF-Legal, Research, Analytics