
diego (Diego S-T)
User

User Details

User Since
Aug 8 2017, 10:56 AM (196 w, 4 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Diego (WMF)

Recent Activity

Yesterday

diego closed T252933: Exploration on content propagation across Wikimedia projects. as Resolved.
Fri, May 14, 3:08 PM · Research (FY2020-21-Research-April-June)
diego closed T252933: Exploration on content propagation across Wikimedia projects., a subtask of T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects, as Resolved.
Fri, May 14, 3:08 PM · Research (FY2020-21-Research-April-June), Epic
diego added a comment to T252933: Exploration on content propagation across Wikimedia projects..
  • We have shared the results in several places, such as the talk mentioned before, the research showcase, and
  • This work was published at ICWSM'21; the paper is publicly available.
Fri, May 14, 3:07 PM · Research (FY2020-21-Research-April-June)
diego closed T260566: Develop 1 model to identify Misinformation as Resolved.
Fri, May 14, 3:04 PM · Research (FY2020-21-Research-April-June)
diego added a comment to T260566: Develop 1 model to identify Misinformation.
Fri, May 14, 3:03 PM · Research (FY2020-21-Research-April-June)
diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
  • The work has been published, and the poster and presentation have been submitted.
Fri, May 14, 3:02 PM · Research (FY2020-21-Research-April-June), Epic

Tue, May 11

diego added a comment to T282532: Web proxies not working for VPS project research-collaborations-api.

Thanks @Majavah. I've added the new rules, and the problem is solved :)

Tue, May 11, 10:59 AM · cloud-services-team (Kanban), Cloud-VPS
diego created T282532: Web proxies not working for VPS project research-collaborations-api.
Tue, May 11, 10:42 AM · cloud-services-team (Kanban), Cloud-VPS

Fri, May 7

diego added a comment to T260566: Develop 1 model to identify Misinformation.
  • The camera-ready (CR) version was submitted.
  • We have uploaded the paper to arXiv; it should be available next week.
Fri, May 7, 7:19 PM · Research (FY2020-21-Research-April-June)
diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.

Update

Fri, May 7, 7:18 PM · Research (FY2020-21-Research-April-June), Epic

Thu, May 6

diego created T282178: Article missing from the Clickstream dataset.
Thu, May 6, 7:34 PM · Analytics-Kanban, Analytics

Fri, Apr 30

diego added a comment to T252933: Exploration on content propagation across Wikimedia projects..
Fri, Apr 30, 10:02 PM · Research (FY2020-21-Research-April-June)
diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
  • Preparing the ICWSM presentation.
Fri, Apr 30, 9:59 PM · Research (FY2020-21-Research-April-June), Epic
diego added a comment to T260566: Develop 1 model to identify Misinformation.

*Updates*

  • We are preparing the camera-ready version for SIGIR.
Fri, Apr 30, 9:58 PM · Research (FY2020-21-Research-April-June)

Apr 3 2021

diego closed T263860: Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia. as Resolved.
Apr 3 2021, 1:10 AM · Outreachy (Round 21), Outreach-Programs-Projects
diego added a comment to T263860: Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia..

Outcome: https://meta.wikimedia.org/wiki/Research:Wiki-Reliability:_A_Large_Scale_Dataset_for_Content_Reliability_on_Wikipedia

Apr 3 2021, 1:10 AM · Outreachy (Round 21), Outreach-Programs-Projects
diego added a comment to T263860: Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia..

@srishakatux The project finished successfully; more details here: T260566

Apr 3 2021, 1:09 AM · Outreachy (Round 21), Outreach-Programs-Projects
diego added a comment to T260566: Develop 1 model to identify Misinformation.
  • No updates
Apr 3 2021, 12:54 AM · Research (FY2020-21-Research-April-June)
diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
Apr 3 2021, 12:51 AM · Research (FY2020-21-Research-April-June), Epic

Mar 30 2021

diego added a comment to T272192: Migrate to new Wikidata Analytics.

I see. I was asking because we included these addresses in published papers, which cannot be changed. But if it is not possible, it is not possible.

Mar 30 2021, 8:23 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata-Bridge, Wikidata
diego added a comment to T272192: Migrate to new Wikidata Analytics.

Would it be possible to add redirects from the old URLs to the new ones?

Mar 30 2021, 7:15 PM · User-GoranSMilovanovic, WMDE-Analytics-Engineering, Wikidata-Bridge, Wikidata

Mar 26 2021

diego added a comment to T260566: Develop 1 model to identify Misinformation.
  • No updates.
Mar 26 2021, 11:21 PM · Research (FY2020-21-Research-April-June)
diego updated the task description for T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
Mar 26 2021, 11:20 PM · Research (FY2020-21-Research-April-June), Epic
diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
  • Finishing the camera-ready version of the ICWSM'21 paper.
  • We have trained a new model that uses content information to predict the likelihood of an item propagating to other projects.
  • We are working on documenting our new results.
Mar 26 2021, 11:20 PM · Research (FY2020-21-Research-April-June), Epic

Mar 19 2021

diego added a comment to T260566: Develop 1 model to identify Misinformation.
  • Information updated on Betterworks.
Mar 19 2021, 7:34 PM · Research (FY2020-21-Research-April-June)
diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
  • The paper with the dataset and first model has been accepted at ICWSM'21.
  • We are working on the camera-ready version.
Mar 19 2021, 7:34 PM · Research (FY2020-21-Research-April-June), Epic

Mar 13 2021

diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
  • We are making improvements to the model that predicts the next language of propagation.
  • We are working on modeling changes of content propagation behavior depending on the article reliability.
Mar 13 2021, 2:42 AM · Research (FY2020-21-Research-April-June), Epic

Mar 8 2021

diego updated the task description for T260566: Develop 1 model to identify Misinformation.
Mar 8 2021, 2:58 PM · Research (FY2020-21-Research-April-June)
diego added a comment to T260566: Develop 1 model to identify Misinformation.
Mar 8 2021, 2:57 PM · Research (FY2020-21-Research-April-June)

Feb 26 2021

diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
  • Adding pageviews improved content-propagation prediction by over 5%.
  • We are currently working on adding content-related information (article metadata) to the model. This will allow us to study the effects of content quality on spread patterns.
Feb 26 2021, 9:30 PM · Research (FY2020-21-Research-April-June), Epic
diego added a comment to T260566: Develop 1 model to identify Misinformation.
  • The datasets will be published next week.
Feb 26 2021, 9:27 PM · Research (FY2020-21-Research-April-June)

Feb 19 2021

diego added a comment to T260566: Develop 1 model to identify Misinformation.
  • We have announced the datasets in a presentation to the NLP group at the University of Cambridge.
  • Currently working on documenting the datasets.
Feb 19 2021, 7:04 PM · Research (FY2020-21-Research-April-June)
diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.

Updates

  • New experiments using pageviews as a feature to predict content propagation.
Feb 19 2021, 7:02 PM · Research (FY2020-21-Research-April-June), Epic
diego added a comment to T274400: Request creation of research-collaborations-api VPS project.

Great! Thx!

Feb 19 2021, 4:31 PM · Cloud-VPS (Project-requests)

Feb 13 2021

diego updated subscribers of T274304: Requesting access to Analytic Cluster for Research Intern (ChristineDeKock).
Feb 13 2021, 2:48 AM · SRE, SRE-Access-Requests

Feb 11 2021

diego updated subscribers of T260566: Develop 1 model to identify Misinformation.
Feb 11 2021, 4:57 PM · Research (FY2020-21-Research-April-June)

Feb 10 2021

diego added a comment to T274400: Request creation of research-collaborations-api VPS project.

Good idea! I'll do the same.

Feb 10 2021, 10:02 PM · Cloud-VPS (Project-requests)
diego added a comment to T274400: Request creation of research-collaborations-api VPS project.

Hi @bd808, I get your point. I can take responsibility for keeping track of all these instances and be the point of contact with you.

Feb 10 2021, 7:38 PM · Cloud-VPS (Project-requests)
diego created T274400: Request creation of research-collaborations-api VPS project.
Feb 10 2021, 6:49 PM · Cloud-VPS (Project-requests)
diego updated the task description for T274304: Requesting access to Analytic Cluster for Research Intern (ChristineDeKock).
Feb 10 2021, 3:15 PM · SRE, SRE-Access-Requests

Feb 9 2021

diego added a comment to T274304: Requesting access to Analytic Cluster for Research Intern (ChristineDeKock).

@ChristineDeKock please update the task description with your SSH key.

Feb 9 2021, 9:32 PM · SRE, SRE-Access-Requests
diego created T274304: Requesting access to Analytic Cluster for Research Intern (ChristineDeKock).
Feb 9 2021, 9:31 PM · SRE, SRE-Access-Requests

Jan 28 2021

diego added a comment to T273213: Effects of collaboration patterns on article quality.

From our previous meeting:

Jan 28 2021, 6:31 PM · Research
diego created T273213: Effects of collaboration patterns on article quality.
Jan 28 2021, 6:06 PM · Research

Jan 23 2021

diego added a comment to T260566: Develop 1 model to identify Misinformation.
  • New metadata has been added to the dataset: We are differentiating templates at article, section, and inline level.
Jan 23 2021, 1:52 AM · Research (FY2020-21-Research-April-June)
diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
  • The dataset to model content propagation has been published on Zenodo.
Jan 23 2021, 1:49 AM · Research (FY2020-21-Research-April-June), Epic

Jan 18 2021

diego added a comment to T269256: Story Idea for Blog: Automated detection of wikipedia censorship events .

Thanks everybody. Especially @Nuria for putting all this together.

Jan 18 2021, 12:07 AM · Technical-blog-posts

Jan 15 2021

diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
  • We submitted the propagation dataset to ICWSM.
  • We are building a new model considering content popularity (pageviews).
Jan 15 2021, 5:53 PM · Research (FY2020-21-Research-April-June), Epic
diego added a comment to T260566: Develop 1 model to identify Misinformation.
  • We have analyzed the impact of reverts on negative examples (cases where the reliability issue has been solved).
  • We have already created a heuristic to find negative examples.
  • We have created an initial dataset covering 80 templates.
  • We are currently identifying relevant metadata (i.e., pre-computed features) to add to the dataset.
Jan 15 2021, 5:41 PM · Research (FY2020-21-Research-April-June)
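The negative-example idea above (a reliability issue counts as solved when its template is removed, but reverts complicate this) can be illustrated with a minimal sketch. It assumes a hypothetical per-article revision history in which each revision records whether a given reliability template is present and whether the revision is a revert; the `Revision` type, the `window` parameter, and the rule itself are illustrative assumptions, not the heuristic used in this project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Revision:
    rev_id: int
    has_template: bool  # hypothetical: is the reliability template present?
    is_revert: bool     # hypothetical: was this revision flagged as a revert?


def find_negative_examples(history: List[Revision], window: int = 5) -> List[int]:
    """Return rev_ids where the template was removed and the removal 'stuck'.

    A removal is a transition has_template True -> False. It is skipped if the
    removing revision is itself a revert, or if the template reappears within
    the next `window` revisions (suggesting the removal was undone).
    """
    negatives = []
    for i in range(1, len(history)):
        prev, cur = history[i - 1], history[i]
        if not (prev.has_template and not cur.has_template):
            continue  # not a template removal
        if cur.is_revert:
            continue  # removal came from a revert; likely not a genuine fix
        upcoming = history[i + 1 : i + 1 + window]
        if any(r.has_template for r in upcoming):
            continue  # template came back soon after; removal did not stick
        negatives.append(cur.rev_id)
    return negatives
```

For example, in a history where revision 3 removes the template via a revert, revision 4 re-adds it, and revision 5 removes it for good, only revision 5 is returned. The `window` check trades recall for precision: a larger window discards more removals that were later undone.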

Jan 6 2021

diego added a comment to T204438: finding statements that need a reference.

https://dl.acm.org/doi/abs/10.1145/3366424.3383571

Jan 6 2021, 8:53 PM · patch-welcome, Wikidata

Dec 23 2020

diego added a comment to T90881: Framework for checking sources on Wikidata (Does the source actually say what we claim it says?).

Hi all

Dec 23 2020, 2:11 PM · Wikimedia-Hackathon-2018, Wikidata, patch-welcome

Dec 11 2020

diego updated the task description for T260566: Develop 1 model to identify Misinformation.
Dec 11 2020, 2:46 PM · Research (FY2020-21-Research-April-June)
diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
  • Depending on the outcome of the paper submission (which received a borderline evaluation), we are planning to publish the dataset separately from the model. If we publish the dataset separately, this will be done during Q3.
Dec 11 2020, 2:46 PM · Research (FY2020-21-Research-April-June), Epic
diego updated the task description for T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
Dec 11 2020, 2:44 PM · Research (FY2020-21-Research-April-June), Epic
diego added a comment to T260566: Develop 1 model to identify Misinformation.
  • Kay (Outreachy intern) has started her work based on the templates listed in this WikiProject.
  • We are exploring techniques to get negative examples (cases where the problem has been solved) for these templates.
Dec 11 2020, 2:44 PM · Research (FY2020-21-Research-April-June)
diego changed the status of T252450: Submit a paper about the model developed for detecting disinformation. , a subtask of T243256: Measuring the consistency of information between Wikipedia articles and Wikidata items., from Stalled to Open.
Dec 11 2020, 2:40 PM · Research (FY2019-20-Research-April-June)
diego changed the status of T252450: Submit a paper about the model developed for detecting disinformation.  from Stalled to Open.
Dec 11 2020, 2:40 PM · Research
diego added a comment to T252450: Submit a paper about the model developed for detecting disinformation. .
  • We have submitted one paper about self-contradictory content in Wikipedia articles.
Dec 11 2020, 2:40 PM · Research

Nov 16 2020

diego added a comment to T260566: Develop 1 model to identify Misinformation.
  • We have selected one Outreachy intern, who will start in December. The intern will help develop the machine-readable dataset.
Nov 16 2020, 11:34 AM · Research (FY2020-21-Research-April-June)
diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
  • The datasets are ready. We are waiting for the paper to be published before sharing the link publicly. Currently, the datasets are available on request via email.
Nov 16 2020, 11:31 AM · Research (FY2020-21-Research-April-June), Epic
diego added a comment to T252933: Exploration on content propagation across Wikimedia projects..
  • We started a preliminary analysis of the propagation of sources across wikis.
Nov 16 2020, 11:28 AM · Research (FY2020-21-Research-April-June)

Nov 2 2020

diego added a comment to T260566: Develop 1 model to identify Misinformation.
  • We are extending the list to other languages: es, pt, ca.
  • Reviewing Outreachy applications for an intern who will help create the machine-readable dataset.
Nov 2 2020, 12:11 PM · Research (FY2020-21-Research-April-June)
diego added a comment to T252933: Exploration on content propagation across Wikimedia projects..
  • We are exploring a follow-up to this project that, based on our results, will focus on how to model the spread of disinformation.
Nov 2 2020, 12:08 PM · Research (FY2020-21-Research-April-June)
diego updated the task description for T219903: Keep research.wikimedia.org landing page updated.
Nov 2 2020, 12:05 PM · Patch-For-Review, Research

Oct 29 2020

diego added a comment to T266426: Outreachy '21 Proposal: Create Machine Learning datasets to measure content reliability on Wikipedia.

For more details on the timeline recommendations please check Isaac's comment here: T263874#6589856

Oct 29 2020, 8:33 PM · Outreachy (Round 21)
diego added a comment to T266426: Outreachy '21 Proposal: Create Machine Learning datasets to measure content reliability on Wikipedia.

Got it. Yes, it looks good; please add it to the Outreachy application.

Oct 29 2020, 8:25 PM · Outreachy (Round 21)
diego added a comment to T263860: Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia..

Hello everybody!

Oct 29 2020, 7:38 PM · Outreachy (Round 21), Outreach-Programs-Projects
diego added a comment to T266426: Outreachy '21 Proposal: Create Machine Learning datasets to measure content reliability on Wikipedia.

@KemmieKemy thanks for submitting. You are making great progress.

Oct 29 2020, 7:33 PM · Outreachy (Round 21)

Oct 28 2020

diego added a comment to T266467: Check home/HDFS leftovers of rodolfovalentim.

@elukey, yes.

Oct 28 2020, 3:33 PM · Analytics
diego updated subscribers of T266467: Check home/HDFS leftovers of rodolfovalentim.

@Rvvalentim, could you please double-check whether you need any of those files?

Oct 28 2020, 12:09 PM · Analytics

Oct 26 2020

diego updated subscribers of T266180: Request increased quota for wmf-research-tools Cloud VPS project.
Oct 26 2020, 4:08 PM · cloud-services-team (Kanban), Cloud-VPS (Quota-requests)
diego updated the task description for T252933: Exploration on content propagation across Wikimedia projects..
Oct 26 2020, 2:52 PM · Research (FY2020-21-Research-April-June)
diego added a comment to T252933: Exploration on content propagation across Wikimedia projects..
  • The paper was submitted last week.
Oct 26 2020, 2:51 PM · Research (FY2020-21-Research-April-June)
diego updated the task description for T252933: Exploration on content propagation across Wikimedia projects..
Oct 26 2020, 2:50 PM · Research (FY2020-21-Research-April-June)

Oct 10 2020

diego added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.
# Wikidata JSON dump we'll start processing (56 GB in size, compressed) so far too large to process the whole thing right now
!ls -shH "{WIKIDATA_DIR}{WIKIDATA_DUMP_FN}"

I'm still going through the Wikidata example, do you know what the shH option might mean? I can't find it online

Here you go: https://man7.org/linux/man-pages/man1/ls.1.html

Oct 10 2020, 8:56 PM · Outreachy (Round 21)
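For reference, in GNU `ls`, `-s` prints the allocated size of each file, `-h` prints sizes in human-readable units (K, M, G), and `-H` follows symbolic links given on the command line, so a symlinked dump reports the size of the real file. A rough Python equivalent is sketched below, assuming a POSIX system where `os.stat` exposes `st_blocks` counted in 512-byte units; the helper names are hypothetical, and the formatting only approximates what `ls -h` prints.

```python
import os


def human_readable(num_bytes: int) -> str:
    """Format a byte count with 1024-based suffixes, roughly like `ls -h`."""
    if num_bytes < 1024:
        return f"{num_bytes}B"
    size = float(num_bytes)
    for unit in ["K", "M", "G"]:
        size /= 1024
        if size < 1024:
            return f"{size:.1f}{unit}"
    return f"{size / 1024:.1f}T"


def allocated_size(path: str) -> str:
    """Approximate `ls -sH`: disk space actually allocated to `path`.

    os.stat follows symlinks by default, mirroring -H for a path named on the
    command line. st_blocks counts 512-byte blocks (POSIX only).
    """
    st = os.stat(path)
    return human_readable(st.st_blocks * 512)
```

In the notebook above, `allocated_size(f"{WIKIDATA_DIR}{WIKIDATA_DUMP_FN}")` would report roughly the same figure as the `!ls -shH` cell.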
diego added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.
# Wikidata JSON dump we'll start processing (56 GB in size, compressed) so far too large to process the whole thing right now
!ls -shH "{WIKIDATA_DIR}{WIKIDATA_DUMP_FN}"

I'm still going through the Wikidata example, do you know what the shH option might mean? I can't find it online

Oct 10 2020, 5:57 PM · Outreachy (Round 21)

Oct 8 2020

diego added a comment to T263860: Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia..

Hi @Lisasiziba
Please check the instructions here: T263874.

Oct 8 2020, 8:08 AM · Outreachy (Round 21), Outreach-Programs-Projects

Oct 2 2020

diego added a comment to T252933: Exploration on content propagation across Wikimedia projects..
  • We are currently working on the paper, adding new analyses and improvements to the model published in the first round of analysis.
Oct 2 2020, 11:08 PM · Research (FY2020-21-Research-April-June)
diego closed T260567: Provide one dump of all Wikipedia articles and predicted topics, a subtask of T258804: Language-Agnostic Topic Modeling, as Resolved.
Oct 2 2020, 11:04 PM · Research, Epic
diego closed T260567: Provide one dump of all Wikipedia articles and predicted topics as Resolved.
Oct 2 2020, 11:04 PM · Research (FY2020-21-Research-July-September)
diego updated the task description for T260567: Provide one dump of all Wikipedia articles and predicted topics.
Oct 2 2020, 11:04 PM · Research (FY2020-21-Research-July-September)
diego added a comment to T260567: Provide one dump of all Wikipedia articles and predicted topics.
Oct 2 2020, 11:03 PM · Research (FY2020-21-Research-July-September)

Sep 30 2020

diego updated subscribers of T263885: Okapi: Fresher -> Safer Spectrum, please review!!.
Sep 30 2020, 7:36 PM · Okapi [Wikimedia Enterprise]

Sep 29 2020

diego updated subscribers of T263885: Okapi: Fresher -> Safer Spectrum, please review!!.
Sep 29 2020, 5:56 PM · Okapi [Wikimedia Enterprise]
diego updated subscribers of T263885: Okapi: Fresher -> Safer Spectrum, please review!!.
Sep 29 2020, 3:06 PM · Okapi [Wikimedia Enterprise]
diego added a comment to T263885: Okapi: Fresher -> Safer Spectrum, please review!!.

Hi @RBrounley_WMF, thanks for sharing this and for the great work you are doing. A few comments from my side:

Sep 29 2020, 9:17 AM · Okapi [Wikimedia Enterprise]

Sep 24 2020

diego added a comment to T155560: Linked fact checker.

@leila I see some overlap, although this task seems broader than the one I'm working on. Given that I don't see much documentation or code for this task, I prefer not to take responsibility for it.

Sep 24 2020, 1:28 PM · WikiCite, artificial-intelligence, Wikidata

Sep 4 2020

diego updated the task description for T252933: Exploration on content propagation across Wikimedia projects..
Sep 4 2020, 8:48 PM · Research (FY2020-21-Research-April-June)
diego added a comment to T252933: Exploration on content propagation across Wikimedia projects..
  • We are currently working on preparing a paper to be submitted at the end of October.
Sep 4 2020, 8:48 PM · Research (FY2020-21-Research-April-June)
diego updated the task description for T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
Sep 4 2020, 8:47 PM · Research (FY2020-21-Research-April-June), Epic
diego added a comment to T260564: Deliver 4 milestones towards 1 model to understand the diffusion of content on the Wikimedia projects.
  • Two datasets have been prepared:
    • One dataset with items that propagate across Wikipedias, with bot activity removed.
    • Another dataset about external references (links) across projects.
Sep 4 2020, 8:47 PM · Research (FY2020-21-Research-April-June), Epic
diego added a comment to T260567: Provide one dump of all Wikipedia articles and predicted topics.
  • A recent dump (including all articles existing as of Aug 31) has been created. In the following days I will upload it to a public repository.
Sep 4 2020, 8:43 PM · Research (FY2020-21-Research-July-September)

Aug 31 2020

diego closed T243256: Measuring the consistency of information between Wikipedia articles and Wikidata items. as Resolved.
Aug 31 2020, 1:52 PM · Research (FY2019-20-Research-April-June)
diego added a comment to T243256: Measuring the consistency of information between Wikipedia articles and Wikidata items..

Updates

  • We have finished the first model.
  • Report can be found here.
  • I will close this task, and continue reporting the progress on this line of research here: T260566
Aug 31 2020, 1:52 PM · Research (FY2019-20-Research-April-June)
diego added a comment to T252933: Exploration on content propagation across Wikimedia projects..

Updates

  • We have published the first round of analysis.
  • Some important highlights:
    • The size of a project (i.e., its number of articles) is not correlated with its likelihood of propagating content to other projects.
    • Initial results show a correlation between cultural similarity and the likelihood that two or more projects share similar content.
    • For long cascades (i.e., articles that exist in several languages), we are able to predict with reasonable accuracy the new languages that will create articles about the same topic.
Aug 31 2020, 1:40 PM · Research (FY2020-21-Research-April-June)
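To make the cascade-prediction idea concrete, here is a deliberately naive co-occurrence baseline; it is not the model described in these updates. It assumes a hypothetical cascade format, an ordered list of language codes in which an item gained articles, counts how often each language joined a cascade after the languages already present, and ranks the languages not yet in the cascade by that count.

```python
from collections import Counter
from typing import Dict, List


def build_cooccurrence(cascades: List[List[str]]) -> Dict[str, Counter]:
    """For each language L, count the languages that joined a cascade after L."""
    follows: Dict[str, Counter] = {}
    for cascade in cascades:
        for i, lang in enumerate(cascade):
            counter = follows.setdefault(lang, Counter())
            counter.update(cascade[i + 1 :])
    return follows


def predict_next(cascade: List[str], follows: Dict[str, Counter], k: int = 3) -> List[str]:
    """Rank languages not yet in `cascade` by their summed follow-counts."""
    scores: Counter = Counter()
    for lang in cascade:
        scores.update(follows.get(lang, Counter()))
    for lang in cascade:
        scores.pop(lang, None)  # already has an article, so it cannot be "new"
    return [lang for lang, _ in scores.most_common(k)]
```

With historical cascades `[["en", "es", "pt"], ["en", "es", "ca"], ["en", "fr"], ["es", "pt"]]`, a new cascade `["en", "es"]` is ranked `["pt", "ca", "fr"]`. A baseline like this ignores timing, pageviews, and content features, which is where a learned model would be expected to improve on it.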

Aug 17 2020

diego added a subtask for T258804: Language-Agnostic Topic Modeling: T260567: Provide one dump of all Wikipedia articles and predicted topics.
Aug 17 2020, 3:33 PM · Research, Epic
diego added a parent task for T260567: Provide one dump of all Wikipedia articles and predicted topics: T258804: Language-Agnostic Topic Modeling.
Aug 17 2020, 3:33 PM · Research (FY2020-21-Research-July-September)
diego created T260567: Provide one dump of all Wikipedia articles and predicted topics.
Aug 17 2020, 3:32 PM · Research (FY2020-21-Research-July-September)
diego triaged T260566: Develop 1 model to identify Misinformation as High priority.
Aug 17 2020, 3:23 PM · Research (FY2020-21-Research-April-June)
diego created T260566: Develop 1 model to identify Misinformation.
Aug 17 2020, 3:23 PM · Research (FY2020-21-Research-April-June)