diego (Diego S-T)
User

User Details

User Since
Aug 8 2017, 10:56 AM (49 w, 18 h)
Availability
Available
LDAP User
Unknown
MediaWiki User
Diego (WMF)

Recent Activity

Mon, Jul 9

diego added a comment to T190443: Spark Jupyter Notebook integration.

Regarding the visualization problems (autocomplete, but especially the inline plots), here is a possible approach to explore:

Mon, Jul 9, 8:24 PM · Analytics-Kanban, Patch-For-Review, Analytics
diego added a comment to T198909: Errors with the new SWAP notebooks.

After restarting the notebook, I now get this error. I get the same error if I use pyspark from stat1005. The solution I found is to work with python2.7 in pyspark.

Mon, Jul 9, 7:30 PM · Patch-For-Review, Analytics
diego added a comment to T198909: Errors with the new SWAP notebooks.
Mon, Jul 9, 7:22 PM · Patch-For-Review, Analytics

Sat, Jul 7

diego added a comment to T195001: Support getting community involved section translation.

I'm OK with changing to another language. I don't have a strong preference between Korean and Vietnamese; I think we should select the one with the higher probability of getting good, fast translations.

Sat, Jul 7, 4:47 PM · CommRel-Specialists-Support (Jul-Sep-2018), Research

Thu, Jul 5

diego updated the task description for T198909: Errors with the new SWAP notebooks.
Thu, Jul 5, 7:54 PM · Patch-For-Review, Analytics
diego updated the task description for T198909: Errors with the new SWAP notebooks.
Thu, Jul 5, 7:50 PM · Patch-For-Review, Analytics
diego created T198909: Errors with the new SWAP notebooks.
Thu, Jul 5, 7:44 PM · Patch-For-Review, Analytics
diego added a comment to T195001: Support getting community involved section translation.

@Trizek-WMF , thanks for your efforts.
Can we push more for translators from/to Japanese? This is the main missing piece.

Thu, Jul 5, 10:04 AM · CommRel-Specialists-Support (Jul-Sep-2018), Research

Mon, Jul 2

diego added a project to T184213: Gather labels as ground truth for section synonym detection: Research-2017-18-Q4.
Mon, Jul 2, 6:26 PM · Research-2017-18-Q4, Research-2017-18-Q3, Research
diego moved T190770: Improve section translation classifier from In Progress to Done (current quarter) on the Research board.
Mon, Jul 2, 6:21 PM · Research-2017-18-Q4, Research
diego moved T190771: Improve section synonym classifier from In Progress to Done (current quarter) on the Research board.
Mon, Jul 2, 6:21 PM · Research-2017-18-Q4, Research

Mon, Jun 18

diego created T197623: Inconsistencies in the responses of the Pageview API.
Mon, Jun 18, 5:32 PM · Analytics

Jun 13 2018

diego added a comment to T195001: Support getting community involved section translation.

@Trizek-WMF, I agree with you that the instructions could be improved. I myself had to struggle a bit to translate them into Spanish. But we have already done many iterations to reach this quality of instructions, and although they are far from perfect, I don't think that we should be blocked on this. So, please, let's just use these instructions.

Jun 13 2018, 3:32 PM · CommRel-Specialists-Support (Jul-Sep-2018), Research

Jun 12 2018

diego added a comment to T195001: Support getting community involved section translation.

@Trizek-WMF : I have completed the translation to Spanish.

Jun 12 2018, 5:56 PM · CommRel-Specialists-Support (Jul-Sep-2018), Research

Jun 11 2018

diego added a comment to T195001: Support getting community involved section translation.

@Trizek-WMF : you mean a subpage in Meta? We can create a new one under this one:

Jun 11 2018, 12:23 PM · CommRel-Specialists-Support (Jul-Sep-2018), Research

Jun 1 2018

diego added a comment to T195001: Support getting community involved section translation.

Thanks for these stats @bmansurov

Jun 1 2018, 1:38 PM · CommRel-Specialists-Support (Jul-Sep-2018), Research

May 24 2018

diego added a comment to T195001: Support getting community involved section translation.

@Trizek-WMF : I've updated the list of candidates.
For candidates coming from the Babel template, I'm filtering out those with fewer than 11 revisions in each wiki.
For candidates coming from CX, I'm filtering out those with fewer than 11 translations. I'm also providing the number of translations that they have already done.
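
A minimal sketch of the two filters described above, assuming each candidate record carries its source, a per-wiki revision count, and a translation count. The `Candidate` class, its field names, and the sample usernames are hypothetical; the only value taken from the comment is the threshold of 11, and the reading that a Babel candidate must reach it in every listed wiki is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MIN_ACTIVITY = 11  # threshold mentioned in the comment


@dataclass
class Candidate:
    # Hypothetical record layout, for illustration only.
    username: str
    source: str  # "babel" or "cx"
    revisions_per_wiki: Dict[str, int] = field(default_factory=dict)
    translations: int = 0


def keep(candidate: Candidate) -> bool:
    """Return True if the candidate passes the activity filter for its source."""
    if candidate.source == "babel":
        counts = candidate.revisions_per_wiki.values()
        # Assumption: at least MIN_ACTIVITY revisions in every listed wiki.
        return bool(counts) and all(n >= MIN_ACTIVITY for n in counts)
    if candidate.source == "cx":
        return candidate.translations >= MIN_ACTIVITY
    return False


def filter_candidates(candidates: List[Candidate]) -> List[Candidate]:
    return [c for c in candidates if keep(c)]


# Made-up sample data.
sample = [
    Candidate("UserA", "babel", {"eswiki": 40, "enwiki": 12}),
    Candidate("UserB", "babel", {"eswiki": 40, "enwiki": 3}),
    Candidate("UserC", "cx", translations=11),
    Candidate("UserD", "cx", translations=10),
]
kept = [c.username for c in filter_candidates(sample)]
```

Keeping the threshold in one constant makes it easy to report how the candidate list changes if the cut-off is relaxed.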

May 24 2018, 9:10 PM · CommRel-Specialists-Support (Jul-Sep-2018), Research

May 23 2018

diego added a comment to T195001: Support getting community involved section translation.

@diego, makes sense to have a better interface and potentially wait, but I clearly remember you asked me to have it done by the end of the month, and the interface would take time to be created. That's why I've made those messages.

May 23 2018, 7:00 PM · CommRel-Specialists-Support (Jul-Sep-2018), Research
diego updated subscribers of T195001: Support getting community involved section translation.

Hi @Trizek-WMF! Thanks for all the work that you have done in so little time.

May 23 2018, 5:47 PM · CommRel-Specialists-Support (Jul-Sep-2018), Research

May 19 2018

diego added a comment to T195021: A/B Testing Measurements for Wikivoyage and Mobile Main Page.

I have no experience with A/B testing. I could help later with some data processing.

May 19 2018, 12:42 PM · Research-consulting, Research

May 16 2018

Neil_P._Quinn_WMF awarded T186559: Provide data dumps in the Analytics Data Lake a Love token.
May 16 2018, 11:35 PM · Research, Analytics
diego updated the task description for T186559: Provide data dumps in the Analytics Data Lake.
May 16 2018, 5:23 PM · Research, Analytics

May 14 2018

diego added a comment to T186558: Create a Historical Link Graph for Wikipedia.

We are considering collaborating on this with EPFL. To proceed, we need to:

May 14 2018, 6:11 PM · Data-release, Research
diego updated subscribers of T186558: Create a Historical Link Graph for Wikipedia.
May 14 2018, 5:32 PM · Data-release, Research

May 3 2018

diego added a project to T193759: Add legacy per-article pagecounts data (prior to 2015): Analytics.
May 3 2018, 4:15 PM · Analytics

May 2 2018

diego added a comment to T184212: Gather labels as ground truth for section translation.

@bmansurov , ok let's try to solve that.

May 2 2018, 6:02 PM · Research-2017-18-Q4, Research-2017-18-Q3, Research
diego added a comment to T184212: Gather labels as ground truth for section translation.

Here are the remaining languages:

May 2 2018, 2:27 PM · Research-2017-18-Q4, Research-2017-18-Q3, Research

May 1 2018

diego added a comment to T184212: Gather labels as ground truth for section translation.

Any reason for doing this manually? Is there no way to automate this process?

May 1 2018, 3:51 PM · Research-2017-18-Q4, Research-2017-18-Q3, Research
diego added a comment to T184212: Gather labels as ground truth for section translation.

en: Help request for mapping section titles
es: Solicitud de ayuda para mapear títulos de secciones
fr: Aidez-nous à associer des titres de section
ru: Просьба помочь с переводом заголовков разделов

May 1 2018, 2:37 PM · Research-2017-18-Q4, Research-2017-18-Q3, Research

Apr 23 2018

diego added a comment to T184212: Gather labels as ground truth for section translation.

Please find the list of people to be contacted here: https://docs.google.com/spreadsheets/d/1vmTvSFitmsbpFKagLVxR2c2VcY8mBIVRdd_KUa_cIc4/edit?usp=sharing

Apr 23 2018, 2:44 PM · Research-2017-18-Q4, Research-2017-18-Q3, Research

Apr 17 2018

diego added a comment to T184212: Gather labels as ground truth for section translation.

Thanks @bmansurov. Let's wait until tomorrow and see how it works. Then I'll give you 30 more usernames.

Apr 17 2018, 6:36 PM · Research-2017-18-Q4, Research-2017-18-Q3, Research
diego added a comment to T184212: Gather labels as ground truth for section translation.

@bmansurov, we need to change the spreadsheet permissions so that people who are not logged in can edit the document. Do you know how to do it?

Apr 17 2018, 1:26 PM · Research-2017-18-Q4, Research-2017-18-Q3, Research
diego added a comment to T175664: Analyze editor-editor interactions for instances of wikihounding.

Main results:
https://meta.wikimedia.org/wiki/Research:Topical_coverage_of_Edit_Wars

Apr 17 2018, 1:19 PM · Research-Archive, Research-2017-18-Q2, Anti-Harassment
diego moved T175664: Analyze editor-editor interactions for instances of wikihounding from In Progress to Done (current quarter) on the Research board.
Apr 17 2018, 1:16 PM · Research-Archive, Research-2017-18-Q2, Anti-Harassment

Apr 10 2018

diego added a comment to T184212: Gather labels as ground truth for section translation.
Apr 10 2018, 6:47 AM · Research-2017-18-Q4, Research-2017-18-Q3, Research

Mar 27 2018

diego added a comment to T184213: Gather labels as ground truth for section synonym detection.

We (@leila and I) have updated the labels; we will now use: same, overlap and different. We have translated these into Spanish and requested help from staff and the community to translate the labels into the other 4 languages.
We have also added 3 columns to collect the different assessments in case reviewers disagree.

Mar 27 2018, 6:13 PM · Research-2017-18-Q4, Research-2017-18-Q3, Research

Mar 22 2018

diego added a comment to T183009: Research Showcase March 2018.

@leila , done

Mar 22 2018, 6:12 PM · Research-Archive, Research-collaborations, Research-management

Mar 20 2018

diego updated subscribers of T184213: Gather labels as ground truth for section synonym detection.

Please find the data to be labeled here: https://drive.google.com/drive/folders/1pzR3P16ck7FyrE7QgIpcSx1TPumTGA9u?usp=sharing

Mar 20 2018, 4:20 PM · Research-2017-18-Q4, Research-2017-18-Q3, Research

Mar 19 2018

diego changed the status of T186558: Create a Historical Link Graph for Wikipedia from Open to Stalled.
Mar 19 2018, 5:45 PM · Data-release, Research

Mar 13 2018

diego added a comment to T186559: Provide data dumps in the Analytics Data Lake.

@JAllemandou : just for the record, in this case I meant the parquet partitions. See you on IRC

Mar 13 2018, 5:43 PM · Research, Analytics
diego added a comment to T186559: Provide data dumps in the Analytics Data Lake.

@JAllemandou : My understanding is that if you partition by a unique id, you sort by that key, and then all the contiguous ids are in the same partition, as explained here: https://hackernoon.com/managing-en,spark-partitions-with-coalesce-and-repartition-4050c57ad5c4

Mar 13 2018, 5:06 PM · Research, Analytics
diego added a comment to T186559: Provide data dumps in the Analytics Data Lake.

@JAllemandou : as we discussed on IRC, could you please add the timestamp for each revision?
Also, it would be good to have the data partitioned by revision_id, because this would make future joins to get additional information (e.g. user) easier.
Thanks.

Mar 13 2018, 2:15 PM · Research, Analytics
diego added a comment to T186776: Reduction of stat1005's disk space usage.

now 66G /home/dsaez/

Mar 13 2018, 11:50 AM · User-Elukey, Analytics-Kanban

Mar 12 2018

diego added a comment to T189497: pyspark2 different versions in Driver and Workers.

@Ottomata and @JAllemandou I found a workaround by creating a python2.7 virtualenv on stat1005.
I think that is the easiest solution right now. Updating python3 on the workers might be a good idea for the future :)

Mar 12 2018, 5:24 PM · Analytics
diego updated subscribers of T189497: pyspark2 different versions in Driver and Workers.
Mar 12 2018, 3:01 PM · Analytics
diego created T189497: pyspark2 different versions in Driver and Workers.
Mar 12 2018, 2:58 PM · Analytics
diego added a comment to T183037: Develop a standalone classifier for section synonym finder.

You can find the candidates for synonyms here:

Mar 12 2018, 2:25 PM · Research-Archive, Research-2017-18-Q3

Mar 11 2018

diego added a comment to T186559: Provide data dumps in the Analytics Data Lake.

@JAllemandou : just came back to this. The parquet version is amazing!! Thank you very much!

Mar 11 2018, 10:23 PM · Research, Analytics

Feb 27 2018

diego updated the task description for T187795: Give access to Wikimedia github account.
Feb 27 2018, 6:00 PM · GitHub-Mirrors, Research

Feb 26 2018

diego added a comment to T186558: Create a Historical Link Graph for Wikipedia.

@DarTar: Do you have any preference for the format of this dataset? I can think of two ways to present it:

Feb 26 2018, 11:23 PM · Data-release, Research

Feb 15 2018

diego added a comment to T186270: Research team process and technical blockers (items from our 2018 offsite).

to make stat1004 we need to solve this: https://phabricator.wikimedia.org/T187178

Feb 15 2018, 3:39 PM · Research
diego added a subtask for T186270: Research team process and technical blockers (items from our 2018 offsite): T187178: Mount XML dumps on stat1004.
Feb 15 2018, 3:37 PM · Research
diego added a parent task for T187178: Mount XML dumps on stat1004: T186270: Research team process and technical blockers (items from our 2018 offsite).
Feb 15 2018, 3:37 PM · Research, Analytics

Feb 14 2018

diego added a comment to T186819: Pageviews/Stats on research.wikimedia.org.
Feb 14 2018, 1:28 AM · Research-landing-page, Patch-For-Review, Analytics-Kanban

Feb 13 2018

diego added a comment to T182211: Develop a standalone classifier for section translation (alignment) across languages.

@diego would it be possible to generate each entry on a separate line and mappings bundled under the key "targets"? Something like this:

{key1: {"rank": n, "targets": { "l1": [candidate1, ..., candidate5], ..., "l5":[candidate1, ... candidate5]}}}
{key2: {"rank": n, "targets": { "l1": [candidate1, ..., candidate5], ..., "l5":[candidate1, ... candidate5]}}}

This would allow me to not load all data in memory in order to parse it.
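
A minimal sketch of why the one-entry-per-line layout helps, assuming each line is a complete JSON object with a single top-level key. The function name `iter_entries` and the sample section titles are hypothetical, made up for illustration; only the `rank` and `targets` keys come from the request above.

```python
import io
import json
from typing import Iterator, TextIO, Tuple


def iter_entries(stream: TextIO) -> Iterator[Tuple[str, dict]]:
    """Yield (key, payload) pairs one line at a time.

    Only the current line is parsed and held in memory, so the whole
    file never has to be loaded at once.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        for key, payload in record.items():
            yield key, payload


# Made-up sample in the requested shape; a real file would be opened
# with open(path, encoding="utf-8") instead of io.StringIO.
sample = io.StringIO(
    '{"History": {"rank": 1, "targets": {"es": ["Historia"], "fr": ["Histoire"]}}}\n'
    '{"Geography": {"rank": 2, "targets": {"es": ["Geografia"], "fr": ["Geographie"]}}}\n'
)
entries = list(iter_entries(sample))
```

In real use the caller would iterate over `iter_entries(...)` directly rather than building a list, which is what keeps memory use flat.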

Feb 13 2018, 4:51 PM · Research-Archive, Research-2017-18-Q3
diego created T187178: Mount XML dumps on stat1004.
Feb 13 2018, 12:50 PM · Research, Analytics

Feb 12 2018

diego added a comment to T182211: Develop a standalone classifier for section translation (alignment) across languages.

@bmansurov please find the candidates here: @stat1005:/home/dsaez/code/alignment/resultsMapping

Feb 12 2018, 3:53 PM · Research-Archive, Research-2017-18-Q3

Feb 10 2018

diego added a comment to T186559: Provide data dumps in the Analytics Data Lake.
Feb 10 2018, 5:43 PM · Research, Analytics

Feb 9 2018

diego added a comment to T186559: Provide data dumps in the Analytics Data Lake.

@bmansurov could you try to upload the 20180201 dumps for en, ru, ar, jp, fr, es in parquet?
This is not urgent but might be useful for the section recommendations project.

Feb 9 2018, 10:43 AM · Research, Analytics

Feb 8 2018

diego created T186819: Pageviews/Stats on research.wikimedia.org.
Feb 8 2018, 6:31 PM · Research-landing-page, Patch-For-Review, Analytics-Kanban
diego added a comment to T183043: Build an API and/or a tool to surface section recommendations.

quick comment: from these results http://gapfinder.wmflabs.org/fr.wikipedia.org/v1/section/article/Barack_Obama

Feb 8 2018, 6:27 PM · Research-Archive, Patch-For-Review, Research-2017-18-Q3

Feb 7 2018

diego updated the task description for T186245: January travel expense forms.
Feb 7 2018, 5:32 PM · Research-Archive, Research-management
diego added a comment to T186519: Request creation of "wmf-research-tools" VPS project.

@diego could you update "Brief description" with more info about how you're going to use labs instances?

Feb 7 2018, 3:36 PM · cloud-services-team (Kanban), Research, Cloud-VPS (Project-requests)

Feb 5 2018

diego created T186559: Provide data dumps in the Analytics Data Lake.
Feb 5 2018, 8:07 PM · Research, Analytics
diego triaged T186558: Create a Historical Link Graph for Wikipedia as Normal priority.
Feb 5 2018, 8:05 PM · Data-release, Research
diego created T186558: Create a Historical Link Graph for Wikipedia.
Feb 5 2018, 8:03 PM · Data-release, Research

Jan 18 2018

diego added a comment to T185160: Gather basic statistics on languages spoken by editors.

X is the number of people who speak 'en' and 'uz', Y is the number of people who speak 'en' and 'fr', etc.
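
A minimal sketch of how such pairwise counts could be computed, assuming the input is one list of language codes per editor. The function name `count_language_pairs` and the sample data are hypothetical and are not taken from the task.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def count_language_pairs(editor_languages: Iterable[List[str]]) -> Counter:
    """Count, for every unordered pair of languages, how many editors speak both."""
    pairs: Counter = Counter()
    for langs in editor_languages:
        # Deduplicate and sort so ("en", "uz") and ("uz", "en") share one key.
        for pair in combinations(sorted(set(langs)), 2):
            pairs[pair] += 1
    return pairs


# Made-up sample: four editors and the languages each one declares.
editors = [["en", "uz"], ["en", "fr"], ["fr", "en", "uz"], ["es"]]
pair_counts = count_language_pairs(editors)
x = pair_counts[("en", "uz")]  # editors speaking both 'en' and 'uz'
y = pair_counts[("en", "fr")]  # editors speaking both 'en' and 'fr'
```

Editors with a single language contribute no pairs, so they do not appear in the result.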

Jan 18 2018, 7:31 PM · Data-release, Research-2017-18-Q3, Research
diego added a comment to T185160: Gather basic statistics on languages spoken by editors.

this is great @bmansurov !

Jan 18 2018, 4:27 PM · Data-release, Research-2017-18-Q3, Research

Jan 3 2018

diego added a comment to T171249: [Objective 3.1.1] Characterize and model wikihounding.

The results are interesting for understanding the topic span of edit wars. Cross-topic edit wars are rare and usually associated with very active users.
The viability of applying this approach for detecting harassment or more specifically wikihounding requires deeper analysis.

Jan 3 2018, 7:43 AM · Anti-Harassment, Epic, Research-Programs
diego updated the task description for T171249: [Objective 3.1.1] Characterize and model wikihounding.
Jan 3 2018, 7:40 AM · Anti-Harassment, Epic, Research-Programs
diego updated the task description for T171249: [Objective 3.1.1] Characterize and model wikihounding.
Jan 3 2018, 7:39 AM · Anti-Harassment, Epic, Research-Programs
diego updated the task description for T171249: [Objective 3.1.1] Characterize and model wikihounding.
Jan 3 2018, 7:39 AM · Anti-Harassment, Epic, Research-Programs

Jan 2 2018

diego moved T182211: Develop a standalone classifier for section translation (alignment) across languages from Staged to In Progress on the Research board.
Jan 2 2018, 11:35 PM · Research-Archive, Research-2017-18-Q3

Dec 6 2017

diego created T182211: Develop a standalone classifier for section translation (alignment) across languages.
Dec 6 2017, 4:54 PM · Research-Archive, Research-2017-18-Q3

Nov 14 2017

diego updated the task description for T180001: Give talk in IWSC (Brazil, Niteroi).
Nov 14 2017, 12:54 PM · Research-Archive, Research-outreach

Nov 8 2017

diego triaged T180001: Give talk in IWSC (Brazil, Niteroi) as Normal priority.
Nov 8 2017, 3:54 AM · Research-Archive, Research-outreach
diego created T180001: Give talk in IWSC (Brazil, Niteroi).
Nov 8 2017, 3:54 AM · Research-Archive, Research-outreach
diego closed T174737: Build a prediction model to predict section ranks within category as Resolved.
Nov 8 2017, 3:35 AM · Research-Archive
diego closed T174737: Build a prediction model to predict section ranks within category, a subtask of T171224: [Objective 9.1.1] Article expansion recommendations, as Resolved.
Nov 8 2017, 3:35 AM · Epic, Research-Programs

Sep 27 2017

diego added a comment to T176091: Mount dumps on SWAP machines (notebook1003.eqiad.wmnet / notebook1004.eqiad.wmnet).

Installing sshfs would also be a good solution for this and for https://phabricator.wikimedia.org/T176093

Sep 27 2017, 6:57 PM · Patch-For-Review, Analytics-Kanban, Analytics

Sep 17 2017

diego added a comment to T176093: Give +w permission for users in /srv folder in SWAP Machines .

@Aklapper, sorry, it is analytics. Tagged.

Sep 17 2017, 9:04 PM · Analytics
diego added a project to T176093: Give +w permission for users in /srv folder in SWAP Machines : Analytics.
Sep 17 2017, 9:03 PM · Analytics
diego created T176093: Give +w permission for users in /srv folder in SWAP Machines .
Sep 17 2017, 6:08 PM · Analytics
diego created T176091: Mount dumps on SWAP machines (notebook1003.eqiad.wmnet / notebook1004.eqiad.wmnet).
Sep 17 2017, 6:01 PM · Patch-For-Review, Analytics-Kanban, Analytics

Sep 16 2017

diego updated the task description for T171249: [Objective 3.1.1] Characterize and model wikihounding.
Sep 16 2017, 11:08 PM · Anti-Harassment, Epic, Research-Programs
diego updated the task description for T171249: [Objective 3.1.1] Characterize and model wikihounding.
Sep 16 2017, 11:08 PM · Anti-Harassment, Epic, Research-Programs

Sep 6 2017

diego created T175220: Change prod uid from diego to dsaez, so it can match with the ldap uid.
Sep 6 2017, 11:50 PM · Patch-For-Review, SRE-Access-Requests, Operations

Sep 5 2017

diego updated the task description for T174737: Build a prediction model to predict section ranks within category.
Sep 5 2017, 7:40 PM · Research-Archive
diego added a comment to T174737: Build a prediction model to predict section ranks within category.

check https://meta.wikimedia.org/wiki/Research:Expanding_Wikipedia_stubs_across_languages/Ranking_sections_within_categories

Sep 5 2017, 7:31 PM · Research-Archive
diego updated the task description for T174737: Build a prediction model to predict section ranks within category.
Sep 5 2017, 7:31 PM · Research-Archive
diego moved T166778: Travel reimbursements for WikiCite 2017 participants from Blocked to Staged on the Research board.
Sep 5 2017, 7:28 PM · Research-Archive, Research-management, WikiCite

Sep 1 2017

diego added a comment to T174737: Build a prediction model to predict section ranks within category.
Sep 1 2017, 4:52 PM · Research-Archive

Aug 18 2017

diego created P5894 ssh access.
Aug 18 2017, 4:02 PM

Aug 17 2017

diego added a comment to T172891: Access for new Research Scientist: Diego Saez.

new production key:

Aug 17 2017, 11:37 PM · Patch-For-Review, Operations, SRE-Access-Requests, Research
diego added a comment to T172891: Access for new Research Scientist: Diego Saez.

my ssh config

Aug 17 2017, 7:29 PM · Patch-For-Review, Operations, SRE-Access-Requests, Research
diego created P5892 ssh problems.
Aug 17 2017, 7:13 PM

Aug 9 2017

diego added a comment to T172891: Access for new Research Scientist: Diego Saez.

@RobH: I would prefer to have just one account, with 'diego' as username. I can delete the personal one.

Aug 9 2017, 4:32 PM · Patch-For-Review, Operations, SRE-Access-Requests, Research
diego added a comment to T172891: Access for new Research Scientist: Diego Saez.

just FYI my username in wikitech is diego, but my 'Instance shell account name:' is dsaez (diego was not available)

Aug 9 2017, 2:58 PM · Patch-For-Review, Operations, SRE-Access-Requests, Research
diego added a comment to T172891: Access for new Research Scientist: Diego Saez.

L3 signed

Aug 9 2017, 2:42 PM · Patch-For-Review, Operations, SRE-Access-Requests, Research