Miriam (Miriam Redi)
User

User Details

User Since
Sep 25 2017, 10:36 AM (86 w, 4 d)
Availability
Available
LDAP User
Miriam
MediaWiki User
Miriam (WMF) [ Global Accounts ]

Recent Activity

Today

Miriam closed T211339: Write an "Image Analysis Component" proposal for the 3-5 year plan as Resolved.
Fri, May 24, 12:38 PM · Research
Miriam closed T211890: [TEC-10] Organize the Wiki Workshop 2019 as Resolved.
Fri, May 24, 12:38 PM · Research
Miriam updated the task description for T211890: [TEC-10] Organize the Wiki Workshop 2019.
Fri, May 24, 12:38 PM · Research
Miriam closed T221011: Publish list of accepted papers, a subtask of T211890: [TEC-10] Organize the Wiki Workshop 2019, as Resolved.
Fri, May 24, 12:38 PM · Research
Miriam closed T221011: Publish list of accepted papers as Resolved.
Fri, May 24, 12:38 PM · Research

Yesterday

Miriam added a comment to T222078: Analyze readers' engagement in countries affected by Singapore Data Center's switch.

@BBlack does the above help? Happy to steer the analysis in other directions if needed.

Thu, May 23, 1:53 PM · Research-consulting, Research
Miriam updated the task description for T222078: Analyze readers' engagement in countries affected by Singapore Data Center's switch.
Thu, May 23, 1:51 PM · Research-consulting, Research
Miriam added a comment to T222078: Analyze readers' engagement in countries affected by Singapore Data Center's switch.

Hello! Sorry for the slow response, I was traveling in between.

Thu, May 23, 1:50 PM · Research-consulting, Research

Fri, May 17

Miriam added a comment to T215250: Estimate size of Commons image corpus at given resolution.

@Gilles merci!

Fri, May 17, 3:27 PM · Commons

Fri, May 10

Miriam added a comment to T222078: Analyze readers' engagement in countries affected by Singapore Data Center's switch.

@BBlack Thank you, I'll get back to you with more insights soon!

Fri, May 10, 12:10 AM · Research-consulting, Research

Fri, May 3

Miriam updated the task description for T222078: Analyze readers' engagement in countries affected by Singapore Data Center's switch.
Fri, May 3, 4:45 PM · Research-consulting, Research
Miriam added a comment to T222078: Analyze readers' engagement in countries affected by Singapore Data Center's switch.

Thanks @chelsyx! Yes, the questions are somewhat similar -- how performance impacts engagement -- and we want to perform a robust analysis 1 year after switching to the Singapore data center.

Fri, May 3, 4:45 PM · Research-consulting, Research
Miriam added a comment to T215250: Estimate size of Commons image corpus at given resolution.

512 is the 41st most common size and 600 the 20th most common (as of 2017-04, last time we ran this analysis). 1024 would be a much better choice (6th most common), followed by a local resize, imho.

Fri, May 3, 11:28 AM · Commons

Tue, Apr 30

Miriam added a comment to T221761: Test GPUs with an end-to-end training task (Photo vs Graphics image classifier).

This was via the internet. But we should try to do this from the internal cluster, too, for comparison, if possible. I just need a few instructions on how to do this!

Tue, Apr 30, 11:35 AM · Analytics, Research-management
Miriam added a comment to T220811: Test Thumbor OpenCL smart cropping on stat1005.

Yes, both 3D rendering of STL files and "smart cropping" (face & feature detection).

Tue, Apr 30, 10:52 AM · User-jijiki, Thumbor, Performance-Team
Miriam added a comment to T221761: Test GPUs with an end-to-end training task (Photo vs Graphics image classifier).

Data collection is over: it took 271044 s (~75 hours) for 320815 images (~160k per class), i.e. about 0.85 sec/image.
I downloaded 600-px thumbnails, and used 4 parallel sessions, as suggested in T215250.

Tue, Apr 30, 10:47 AM · Analytics, Research-management
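
The figures in the comment above can be checked with a few lines of arithmetic. This is only a minimal sketch: the function name `collection_stats` is hypothetical, and the per-session figure assumes that the 271044 s is wall-clock time shared by the 4 parallel sessions, which the comment does not state explicitly.

```python
def collection_stats(total_seconds: float, n_images: int, sessions: int = 1) -> dict:
    """Summarize a download run from its total duration and image count."""
    if n_images <= 0 or sessions <= 0:
        raise ValueError("n_images and sessions must be positive")
    sec_per_image = total_seconds / n_images
    return {
        "hours": total_seconds / 3600,
        "sec_per_image": sec_per_image,
        # Assumption: total_seconds is wall-clock time and the sessions ran
        # concurrently, so each session spent `sessions` times longer per image.
        "sec_per_image_per_session": sec_per_image * sessions,
    }


stats = collection_stats(271044, 320815, sessions=4)
print({name: round(value, 2) for name, value in stats.items()})
```

The per-image time comes out just under 0.85 s, consistent with the figure quoted above.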
Miriam updated the task description for T221761: Test GPUs with an end-to-end training task (Photo vs Graphics image classifier).
Tue, Apr 30, 10:44 AM · Analytics, Research-management
Miriam updated subscribers of T222140: Check home leftovers of pirroh.

Thanks @elukey!
@tizianopiccardi anything in Michele's home that we should keep?

Tue, Apr 30, 9:09 AM · Analytics

Mon, Apr 29

Miriam added a comment to T222085: Revoke @pirroh's shell access.

Thank you so much @Dzahn!

Mon, Apr 29, 5:43 PM · Patch-For-Review, SRE-Access-Requests, Operations, Research
Miriam created T222085: Revoke @pirroh's shell access.
Mon, Apr 29, 4:45 PM · Patch-For-Review, SRE-Access-Requests, Operations, Research
Miriam created T222078: Analyze readers' engagement in countries affected by Singapore Data Center's switch.
Mon, Apr 29, 3:53 PM · Research-consulting, Research

Fri, Apr 26

Miriam added a comment to T220811: Test Thumbor OpenCL smart cropping on stat1005.

@Gilles do you plan to test Thumbor's face detection functionalities too?

Fri, Apr 26, 11:31 AM · User-jijiki, Thumbor, Performance-Team
Miriam created T221934: Visualize Wiki Commons Images.
Fri, Apr 26, 9:21 AM · Research

Apr 24 2019

Miriam created T221761: Test GPUs with an end-to-end training task (Photo vs Graphics image classifier).
Apr 24 2019, 11:16 AM · Analytics, Research-management

Apr 23 2019

Miriam added a comment to T213969: Citation Usage: run third round of data collection.

@bmansurov many thanks!! @RyanSteinberg @tizianopiccardi FYI, the data collection is over!

Apr 23 2019, 9:24 AM · Patch-For-Review, Analytics, Research, Knowledge-Integrity, Epic

Apr 15 2019

Miriam created T221011: Publish list of accepted papers.
Apr 15 2019, 3:58 PM · Research
Miriam closed T211891: Invite speakers to Wiki Workshop as Resolved.
Apr 15 2019, 3:57 PM · Research-outreach, Research
Miriam closed T211891: Invite speakers to Wiki Workshop, a subtask of T211890: [TEC-10] Organize the Wiki Workshop 2019, as Resolved.
Apr 15 2019, 3:57 PM · Research
Miriam updated the task description for T211891: Invite speakers to Wiki Workshop.
Apr 15 2019, 3:56 PM · Research-outreach, Research
Miriam updated the task description for T211891: Invite speakers to Wiki Workshop.
Apr 15 2019, 3:56 PM · Research-outreach, Research
Miriam added a project to T212225: Collect and analyze second round of data on citation usage: Research-2017-18-Q4.
Apr 15 2019, 3:56 PM · Research-2017-18-Q4, Research
Miriam updated the task description for T213927: Create a first prototype of the "map of verifiability in Wikipedia".
Apr 15 2019, 3:54 PM · Epic, Research
Miriam created T221009: Improve the verifiability maps and release the results.
Apr 15 2019, 3:54 PM · Research-2017-18-Q4, Research
Miriam closed T213930: First "Map of Verifiability" test as Resolved.
Apr 15 2019, 3:52 PM · Research
Miriam closed T213930: First "Map of Verifiability" test, a subtask of T213927: Create a first prototype of the "map of verifiability in Wikipedia", as Resolved.
Apr 15 2019, 3:52 PM · Epic, Research
Miriam closed T213927: Create a first prototype of the "map of verifiability in Wikipedia" as Resolved.

Will add one more task for the improvement and release of this.

Apr 15 2019, 3:52 PM · Epic, Research
Miriam closed T213927: Create a first prototype of the "map of verifiability in Wikipedia", a subtask of T199187: [1.1] Conduct and publish research to map the “state of verifiability” of free knowledge, as Resolved.
Apr 15 2019, 3:52 PM · Knowledge-Integrity, Epic
Miriam updated the task description for T213927: Create a first prototype of the "map of verifiability in Wikipedia".
Apr 15 2019, 3:52 PM · Epic, Research
Miriam closed T212122: Write a scientific paper about the unsourced statements work as Resolved.
Apr 15 2019, 3:50 PM · Research-2017-18-Q4, Research, Research-2017-18-Q3
Miriam closed T212122: Write a scientific paper about the unsourced statements work, a subtask of T186279: Prototype models for detecting unsourced statements in need of citations in Wikipedia, as Resolved.
Apr 15 2019, 3:50 PM · Research-2017-18-Q4, Research, Research-2017-18-Q3
Miriam added a comment to T212122: Write a scientific paper about the unsourced statements work.

The paper will be published at The Web Conference 2019. Arxiv version here: https://arxiv.org/abs/1902.11116

Apr 15 2019, 3:50 PM · Research-2017-18-Q4, Research, Research-2017-18-Q3
Miriam updated the task description for T197782: Create a model to detect reasons why a citation is needed in a sentence.
Apr 15 2019, 3:50 PM · Research-2017-18-Q4, Research, Research-2017-18-Q3
Miriam created T221006: Publish models and data for the Citation Reason classifier.
Apr 15 2019, 3:49 PM · Research-2017-18-Q4, Research, Research-2017-18-Q3
Miriam closed T197782: Create a model to detect reasons why a citation is needed in a sentence as Resolved.
Apr 15 2019, 3:48 PM · Research-2017-18-Q4, Research, Research-2017-18-Q3
Miriam closed T197782: Create a model to detect reasons why a citation is needed in a sentence, a subtask of T186279: Prototype models for detecting unsourced statements in need of citations in Wikipedia, as Resolved.
Apr 15 2019, 3:48 PM · Research-2017-18-Q4, Research, Research-2017-18-Q3
Miriam created T221005: Collect more data and retrain Citation Reason models.
Apr 15 2019, 3:48 PM · Research-2017-18-Q4, Research, Research-2017-18-Q3

Apr 11 2019

Miriam added a comment to T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models.

Hi all,

Apr 11 2019, 7:16 PM · Patch-For-Review, User-Elukey, Operations, Analytics, Research-management

Apr 3 2019

Miriam added a comment to T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models.

OK, I can prepare a task for this, or we can start from something like this maybe?
https://gist.github.com/omoindrot/dedc857cdc0e680dfb1be99762990c9c/

Apr 3 2019, 7:55 PM · Patch-For-Review, User-Elukey, Operations, Analytics, Research-management
Miriam added a comment to T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models.

Thanks @EBernhardson and all!! Would a CNN fine-tuning task, using only a few thousand images as input, work as a training task for testing performance?

Apr 3 2019, 11:30 AM · Patch-For-Review, User-Elukey, Operations, Analytics, Research-management

Mar 28 2019

Miriam added a comment to T215250: Estimate size of Commons image corpus at given resolution.

@Gilles, sounds good, thanks!

Mar 28 2019, 12:36 PM · Commons

Mar 26 2019

Miriam added a comment to T215250: Estimate size of Commons image corpus at given resolution.

Thank you @Gilles, I think they aimed for thumbnails sized 512 or 600. Do you think those are sizes reasonably widely used on wikis? I can advise them to keep the number of concurrent downloads below 5 just in case.

Mar 26 2019, 11:31 AM · Commons
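
One way to keep the number of concurrent downloads below 5, as suggested in the comment above, is a fixed-size worker pool. The following is a minimal sketch, not the method the researchers actually used: `download_all` and `fetch` are hypothetical names, and the caller supplies the fetch function (for example, a thin wrapper around `urllib.request.urlopen`). The demo uses a stand-in fetcher so that it runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 4,
) -> Dict[str, bytes]:
    """Fetch every URL with at most `max_workers` requests in flight."""
    # Assumption: the cap of "below 5" comes from the comment above.
    if not 1 <= max_workers < 5:
        raise ValueError("max_workers must be between 1 and 4")
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        bodies = list(pool.map(fetch, url_list))
    return dict(zip(url_list, bodies))


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP request (hypothetical, for demonstration)."""
    return url.encode("utf-8")


results = download_all([f"https://example.org/img/{i}" for i in range(10)], fake_fetch)
print(len(results))
```

The pool size is the only throttle here; a real client would probably also want retries and a pause between requests.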

Mar 25 2019

Miriam added a comment to T215250: Estimate size of Commons image corpus at given resolution.

Hi @Gilles @fgiunchedi, a group of researchers from HTW Berlin would be interested in doing Commons image visualization. They would like to download ~1M images to their machines, and they are asking what a reasonable number of parallel download requests would be.
Thanks! Grazie! Merci :)

Mar 25 2019, 12:59 PM · Commons

Mar 19 2019

Miriam added a comment to T212225: Collect and analyze second round of data on citation usage.

@leila, this task is tracking the analysis bit of the second round of data collection, too. The results for the first round of analysis are available here: https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Citation_Usage/First_Round_of_Analysis
However, for next quarter, we would like to perform a more in-depth analysis. So either we modify the task (as it involves different sub-tasks), or we keep it open?

Mar 19 2019, 5:38 PM · Research-2017-18-Q4, Research
Miriam updated the task description for T212225: Collect and analyze second round of data on citation usage.
Mar 19 2019, 5:36 PM · Research-2017-18-Q4, Research
Miriam added a comment to T213969: Citation Usage: run third round of data collection.

Yes, announcement just posted!

Mar 19 2019, 4:22 PM · Patch-For-Review, Analytics, Research, Knowledge-Integrity, Epic

Feb 27 2019

Miriam added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@mforns yes, that is correct :) Thanks!

Feb 27 2019, 11:23 AM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban

Feb 13 2019

Miriam added a comment to T213976: Workflow to be able to move data files computed in jobs from analytics cluster to production.

In the hundreds of megabytes I believe. @Halfak, @EBernhardson, @Miriam, @bmansurov, is this right? Will ML models be about this size for the foreseeable future?

@Ottomata yes, the size of the best image classification models is <1G

Feb 13 2019, 9:43 AM · Research, Operations, Discovery, Analytics

Feb 12 2019

Miriam added a subtask for T215413: Image Classification Working Group: T215250: Estimate size of Commons image corpus at given resolution.
Feb 12 2019, 4:42 PM · Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Analytics, Research
Miriam added a parent task for T215250: Estimate size of Commons image corpus at given resolution: T215413: Image Classification Working Group.
Feb 12 2019, 4:42 PM · Commons
Miriam added a subtask for T215413: Image Classification Working Group: T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models.
Feb 12 2019, 4:41 PM · Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Analytics, Research
Miriam added a parent task for T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models: T215413: Image Classification Working Group.
Feb 12 2019, 4:41 PM · Patch-For-Review, User-Elukey, Operations, Analytics, Research-management
Miriam added a comment to T213969: Citation Usage: run third round of data collection.

Hi @RyanSteinberg ! We have decided to turn on the data collection for a short period of time, so that you have real data samples to perform all the quality checks you might need on your side. @bmansurov suggested we can collect data for one day, at a sampling rate of maybe 1% for both schemas. Would that sound good to you? Would it be OK if we switch on this small data-collection in the coming days? Thanks.

Feb 12 2019, 4:20 PM · Patch-For-Review, Analytics, Research, Knowledge-Integrity, Epic

Feb 11 2019

SandraF_WMF awarded T215413: Image Classification Working Group a Burninate token.
Feb 11 2019, 10:34 AM · Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Analytics, Research

Feb 8 2019

Miriam added a comment to T215413: Image Classification Working Group.

@Gilles thanks for this! Images and graphics have very different underlying image statistics: it is therefore fairly easy for a classifier to tell them apart. So it should be feasible.

Feb 8 2019, 4:00 PM · Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Analytics, Research

Feb 7 2019

iamjessklein awarded T215413: Image Classification Working Group an Insectivore token.
Feb 7 2019, 6:25 PM · Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Analytics, Research

Feb 6 2019

leila awarded T215413: Image Classification Working Group a Love token.
Feb 6 2019, 9:23 PM · Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Analytics, Research
Miriam added a comment to T213969: Citation Usage: run third round of data collection.

Yes, it now works after starting a new session on the client. Thanks @Ottomata!

Feb 6 2019, 7:17 PM · Patch-For-Review, Analytics, Research, Knowledge-Integrity, Epic
Miriam added a comment to T213969: Citation Usage: run third round of data collection.

@Ottomata thanks, you're the best! I now see the table corresponding to the new version of the schema, but I don't see the events I generated :/.
I'll try to see if that works from another session, but if you say it's flaky let's maybe not spend too much time on it? I can just parse the log file for the test events I generate, to double check that on the server side everything looks good.

Feb 6 2019, 5:47 PM · Patch-For-Review, Analytics, Research, Knowledge-Integrity, Epic
fgiunchedi awarded T215413: Image Classification Working Group a Yellow Medal token.
Feb 6 2019, 4:49 PM · Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Analytics, Research
Miriam updated subscribers of T213969: Citation Usage: run third round of data collection.

I can see all events on the client side. I'll do some tests there.
On the server side, I can see in the client-side-events.log file all the events generated by my session_token. However, I can't find the same events in the MySQL log database. My understanding is that there should be a table (recording events from the latest version of the Citation Usage Schema) called CitationUsage_18810892, but I can't find it. Am I doing something wrong, @Ottomata @elukey? Thanks!

Feb 6 2019, 2:05 PM · Patch-For-Review, Analytics, Research, Knowledge-Integrity, Epic
Miriam created T215413: Image Classification Working Group.
Feb 6 2019, 1:44 PM · Reading-Admin, SDC General, Multimedia, Wikidata, Discovery-Search, Analytics, Research

Feb 5 2019

Miriam added a comment to T215288: Access to the Beta Cluster ( deployment-eventlog05 ).

Wow, thank you SO much, that works now :)

Feb 5 2019, 4:32 PM · User-Addshore, Release-Engineering-Team, Research
Miriam renamed T215288: Access to the Beta Cluster ( deployment-eventlog05 ) from Access to deployment-eventlog05 to Access to the Beta Cluster ( deployment-eventlog05 ).
Feb 5 2019, 4:25 PM · User-Addshore, Release-Engineering-Team, Research
Miriam created T215288: Access to the Beta Cluster ( deployment-eventlog05 ).
Feb 5 2019, 4:24 PM · User-Addshore, Release-Engineering-Team, Research
Miriam updated subscribers of T215250: Estimate size of Commons image corpus at given resolution.

Thanks @Gilles!
320px sounds like a good solution. In general, neural networks resize images to 256px before processing them.

Feb 5 2019, 11:33 AM · Commons

Jan 24 2019

Miriam added a comment to T213969: Citation Usage: run third round of data collection.

Sorry @bmansurov and thanks for the explanation ;) I think we should do the test in the beta cluster before deployment. @RyanSteinberg I might need your help to do these tests, as you are more familiar with the latest changes requested.

Jan 24 2019, 5:04 PM · Patch-For-Review, Analytics, Research, Knowledge-Integrity, Epic

Jan 17 2019

Miriam added a comment to T212937: Citation Usage instrumentation issues.

Hi @bmansurov, I discussed with the others, and we are OK with starting the data collection, no more issues on our side. Thanks!

Jan 17 2019, 2:25 PM · MW-1.33-notes (1.33.0-wmf.16; 2019-02-05), Research

Jan 16 2019

Miriam updated the task description for T213930: First "Map of Verifiability" test.
Jan 16 2019, 2:59 PM · Research
Miriam updated the task description for T213927: Create a first prototype of the "map of verifiability in Wikipedia".
Jan 16 2019, 2:59 PM · Epic, Research
Miriam triaged T213930: First "Map of Verifiability" test as Normal priority.
Jan 16 2019, 2:58 PM · Research
Miriam renamed T213927: Create a first prototype of the "map of verifiability in Wikipedia" from Create a first prototype of the "map of verfiability in Wikipedia! to Create a first prototype of the "map of verfiability in Wikipedia".
Jan 16 2019, 2:51 PM · Epic, Research
Miriam triaged T213927: Create a first prototype of the "map of verifiability in Wikipedia" as Normal priority.
Jan 16 2019, 2:51 PM · Epic, Research
Miriam closed T186354: Design machine learning models to detect unsrouced statments needing citation as Resolved.
Jan 16 2019, 2:50 PM · Research-2017-18-Q4, Research, Research-2017-18-Q3
Miriam closed T186354: Design machine learning models to detect unsrouced statments needing citation, a subtask of T186279: Prototype models for detecting unsourced statements in need of citations in Wikipedia, as Resolved.
Jan 16 2019, 2:50 PM · Research-2017-18-Q4, Research, Research-2017-18-Q3

Dec 18 2018

Miriam moved T212122: Write a scientific paper about the unsourced statements work from Staged to In Progress on the Research board.
Dec 18 2018, 5:09 PM · Research-2017-18-Q4, Research, Research-2017-18-Q3
Miriam moved T212225: Collect and analyze second round of data on citation usage from Staged to In Progress on the Research board.
Dec 18 2018, 5:09 PM · Research-2017-18-Q4, Research
Miriam moved T212228: Reader citation usage (quantitative) from Staged to In Progress on the Research board.
Dec 18 2018, 5:09 PM · Research
Miriam edited projects for T212228: Reader citation usage (quantitative), added: Research; removed Knowledge-Integrity, Epic.
Dec 18 2018, 5:07 PM · Research
Miriam edited projects for T212225: Collect and analyze second round of data on citation usage, added: Research; removed Epic, Research-Programs.
Dec 18 2018, 5:06 PM · Research-2017-18-Q4, Research
Miriam removed a parent task for T212225: Collect and analyze second round of data on citation usage: T199188: [1.2] Research study to understand how readers use citations.
Dec 18 2018, 3:54 PM · Research-2017-18-Q4, Research
Miriam removed a subtask for T199188: [1.2] Research study to understand how readers use citations: T212225: Collect and analyze second round of data on citation usage.
Dec 18 2018, 3:54 PM · Knowledge-Integrity, Epic
Miriam added a parent task for T212225: Collect and analyze second round of data on citation usage: T212228: Reader citation usage (quantitative).
Dec 18 2018, 3:54 PM · Research-2017-18-Q4, Research
Miriam added a subtask for T212228: Reader citation usage (quantitative): T212225: Collect and analyze second round of data on citation usage.
Dec 18 2018, 3:54 PM · Research
Miriam edited parent tasks for T190437: Analyze the first round of data about readers' usage of references, added: T212228: Reader citation usage (quantitative); removed: T199188: [1.2] Research study to understand how readers use citations.
Dec 18 2018, 3:53 PM · Research-2017-18-Q4, Epic, Research-Programs
Miriam removed a subtask for T199188: [1.2] Research study to understand how readers use citations: T190437: Analyze the first round of data about readers' usage of references.
Dec 18 2018, 3:53 PM · Knowledge-Integrity, Epic
Miriam added a subtask for T212228: Reader citation usage (quantitative): T190437: Analyze the first round of data about readers' usage of references.
Dec 18 2018, 3:52 PM · Research
Miriam triaged T212228: Reader citation usage (quantitative) as Normal priority.
Dec 18 2018, 3:51 PM · Research
Miriam added a parent task for T212225: Collect and analyze second round of data on citation usage: T199188: [1.2] Research study to understand how readers use citations.
Dec 18 2018, 3:48 PM · Research-2017-18-Q4, Research
Miriam added a subtask for T199188: [1.2] Research study to understand how readers use citations: T212225: Collect and analyze second round of data on citation usage.
Dec 18 2018, 3:48 PM · Knowledge-Integrity, Epic
Miriam added a parent task for T190437: Analyze the first round of data about readers' usage of references: T199188: [1.2] Research study to understand how readers use citations.
Dec 18 2018, 3:48 PM · Research-2017-18-Q4, Epic, Research-Programs