
diego (Diego S-T)
Senior Research Scientist

User Details

User Since
Aug 8 2017, 10:56 AM (250 w, 1 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Diego (WMF) [ Global Accounts ]

Recent Activity

Yesterday

diego added a comment to T306810: Manually evaluate section topics for accuracy.

Done.
Note that the list of articles is new: the sample used to be random, and I have now fixed it so that we always get the same articles in case we need to fix something.

Wed, May 25, 5:38 PM · Research, Structured-Data-Backlog (Current Work)
diego updated the task description for T306810: Manually evaluate section topics for accuracy.
Wed, May 25, 4:20 PM · Research, Structured-Data-Backlog (Current Work)
diego added a comment to T306810: Manually evaluate section topics for accuracy.

Please find the required samples at this link.

Wed, May 25, 4:19 PM · Research, Structured-Data-Backlog (Current Work)

Fri, May 6

diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.

Updates

  • We are exploring cross-lingual article quality. Details on this task: T305390
Fri, May 6, 2:15 PM · Research (FY2021-22-Research-April-June)
diego added a parent task for T305390: Cross-Linngual Article Quality Exploration: T288333: Understanding the spread of disinformation on Wikipedia.
Fri, May 6, 2:14 PM · Research (FY2021-22-Research-April-June)
diego added a subtask for T288333: Understanding the spread of disinformation on Wikipedia: T305390: Cross-Linngual Article Quality Exploration.
Fri, May 6, 2:14 PM · Research (FY2021-22-Research-April-June)

Wed, Apr 27

diego closed T293511: Expand section aligment to more languages, and share dumps, a subtask of T276212: Improve section mapping for Section Translation, as Resolved.
Wed, Apr 27, 11:11 AM · SectionTranslation, Epic
diego closed T293511: Expand section aligment to more languages, and share dumps as Resolved.
Wed, Apr 27, 11:11 AM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)

Apr 25 2022

diego added a comment to T306114: Cloud VPS "wmf-research-tools" project Stretch deprecation.

Thanks @Isaac. I'll check this.

Apr 25 2022, 5:11 PM · Cloud-VPS (Debian Stretch Deprecation)

Apr 22 2022

diego added a comment to T293511: Expand section aligment to more languages, and share dumps.

Hi @santhosh, @MunizaA has created a new dump containing only pairs with probabilities > 0.9; you can find it here. We think that 0.9 might be too high, so let us know if you want to try other values between 0.5 and 0.9.

Apr 22 2022, 2:56 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)

Apr 20 2022

diego added a comment to T287655: Generate template parameter alignments for en > de wikis.
pyspark2 --master yarn --deploy-mode client --executor-memory 8g --driver-memory 8g --conf spark.dynamicAllocation.maxExecutors=128

but the result is the same.

In this case fasttext is running on the driver. You can increase the driver memory when calling the spark env, replacing this:

Apr 20 2022, 8:29 AM · Language-Team (Language-2022-April-June), ContentTranslation

Apr 15 2022

diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • We have published the documentation about this project here.
  • All code and data are available and linked from the documentation page.
Apr 15 2022, 11:40 AM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego closed T289492: Detecting Promotional Tone in Wikipedia Articles as Resolved.
Apr 15 2022, 11:31 AM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.
  • The results and dataset details have been published. You can find them here.
  • I'm marking this task as resolved, and also updating it in Betterworks.
Apr 15 2022, 11:30 AM · Research (FY2021-22-Research-April-June), Epic

Apr 12 2022

diego added a comment to T305888: Reference Quality in English Wikipedia / Internship.

Hi @diego. Could you please associate at least one active project with this task (via the Add Action...Change Project Tags dropdown)? This will allow others to get notified, or see this task when searching via projects. Thanks!

Apr 12 2022, 2:37 PM · Research (FY2021-22-Research-April-June)
diego added a project to T305888: Reference Quality in English Wikipedia / Internship: Research.
Apr 12 2022, 2:37 PM · Research (FY2021-22-Research-April-June)

Apr 11 2022

diego added a comment to T305390: Cross-Linngual Article Quality Exploration.
  • Last week we completed the onboarding on using PySpark and the cluster data.
  • As a first step, @paramita_das will work on obtaining the article quality distribution over time for enwiki.
Apr 11 2022, 9:17 PM · Research (FY2021-22-Research-April-June)
diego added a comment to T305888: Reference Quality in English Wikipedia / Internship.
  • Last week we completed the onboarding on using PySpark and the cluster data.
  • @Aitolkyn is starting to explore how to match bad references with pageviews.
Apr 11 2022, 9:14 PM · Research (FY2021-22-Research-April-June)
diego created T305888: Reference Quality in English Wikipedia / Internship.
Apr 11 2022, 9:12 PM · Research (FY2021-22-Research-April-June)

Apr 8 2022

diego updated the task description for T289492: Detecting Promotional Tone in Wikipedia Articles.
Apr 8 2022, 4:30 PM · Research (FY2021-22-Research-April-June), Epic
diego updated the task description for T293511: Expand section aligment to more languages, and share dumps.
Apr 8 2022, 4:30 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • We have published the alignments for 205 languages here.
  • Each folder contains the alignments from that language to all others. For example 'enwiki' contains the alignments from English to all the other wikis.
  • The format is SQLite. @santhosh could you confirm you are able to read the files?
  • We are working on the algorithm and output documentation.
Apr 8 2022, 4:30 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
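A minimal sketch of how one of the SQLite alignment files mentioned above could be inspected. The comment does not describe the schema, so the code only introspects whatever tables exist; the file name in the usage note and the helper names `list_tables` and `preview` are assumptions for illustration.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def preview(conn, table, limit=5):
    """Return (column_names, rows) for the first `limit` rows of `table`."""
    # Only accept table names that really exist, since identifiers
    # cannot be passed as bound SQL parameters.
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table!r}")
    escaped = table.replace('"', '""')  # escape embedded double quotes
    cursor = conn.execute(f'SELECT * FROM "{escaped}" LIMIT ?', (limit,))
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()
```

Usage would look like `conn = sqlite3.connect("enwiki.sqlite")` (hypothetical file name), followed by `list_tables(conn)` and `preview(conn, some_table)`.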
diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.
  • The dataset has been released here.
  • The paper will be published in May.
Apr 8 2022, 4:25 PM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T287946: Identifying controversial content in Wikidata.
  • We finished this project. The results can be found on Meta, and the code and models can be found on GitLab.
  • I'll discuss future work with @Lydia_Pintscher.
Apr 8 2022, 4:20 PM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata
diego closed T287946: Identifying controversial content in Wikidata as Resolved.
Apr 8 2022, 4:17 PM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata

Apr 7 2022

diego added a comment to T305390: Cross-Linngual Article Quality Exploration.

Thanks @Isaac for these inputs. There was a mistake in the title: this work is about article quality, not specifically about citations.

Apr 7 2022, 3:00 PM · Research (FY2021-22-Research-April-June)
diego renamed T305390: Cross-Linngual Article Quality Exploration from Cross-Linngual Citation Quality Exploration to Cross-Linngual Article Quality Exploration.
Apr 7 2022, 2:59 PM · Research (FY2021-22-Research-April-June)

Apr 4 2022

diego created T305390: Cross-Linngual Article Quality Exploration.
Apr 4 2022, 4:44 PM · Research (FY2021-22-Research-April-June)
diego added a comment to T305298: Requesting access to Analytic Cluster for Research Intern (paramita_das).

@diego: We also need the estimated end date of the internship (you'll be contacted two weeks before it expires whether to extend access or not).

The internship ends on June 24th.

Apr 4 2022, 2:40 PM · SRE, SRE-Access-Requests
diego added a comment to T305299: Requesting access to Analytic Cluster for Research Intern (Aitolkyn).

@diego: We also need the estimated end date of the internship (you'll be contacted two weeks before it expires whether to extend access or not).

Apr 4 2022, 2:39 PM · SRE, SRE-Access-Requests

Apr 2 2022

diego added a comment to T305299: Requesting access to Analytic Cluster for Research Intern (Aitolkyn).

@Aitolkyn please update the task description with your SSH key.

Apr 2 2022, 11:20 AM · SRE, SRE-Access-Requests
diego created T305299: Requesting access to Analytic Cluster for Research Intern (Aitolkyn).
Apr 2 2022, 11:19 AM · SRE, SRE-Access-Requests
diego added a comment to T305298: Requesting access to Analytic Cluster for Research Intern (paramita_das).

@paramita_das please update the task description with your SSH key.

Apr 2 2022, 11:13 AM · SRE, SRE-Access-Requests
diego created T305298: Requesting access to Analytic Cluster for Research Intern (paramita_das).
Apr 2 2022, 11:11 AM · SRE, SRE-Access-Requests

Apr 1 2022

diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • We have tested our model on the CX dataset (section translations done using the CX Tool).
  • The results show good performance. @MunizaA please report the precision@5 for the top-100 language pairs.
  • We are now running the alignments for all languages, and the results will be ready early next week.
Apr 1 2022, 6:19 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
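For reference, a minimal sketch of one common definition of precision@k, the metric requested above. The thread does not state which exact definition was used, so this is an assumption; the function names and the input layout (ranked candidate lists and sets of correct targets per source section) are hypothetical.

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked candidates that are in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for candidate in ranked[:k] if candidate in relevant)
    return hits / k


def mean_precision_at_k(predictions, ground_truth, k=5):
    """Average precision@k over the source sections that have predictions.

    predictions:  dict mapping source section -> ranked list of target sections
    ground_truth: dict mapping source section -> set of correct target sections
    """
    scores = [
        precision_at_k(predictions[source], targets, k)
        for source, targets in ground_truth.items()
        if source in predictions
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Note that when each source section has a single correct translation, this definition caps the score at 1/k; a "hit in the top k" rate is a frequent alternative in that setting.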

Mar 25 2022

diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.
  • I'm preparing a short presentation with the main findings of this project. I'm planning to present these results during the next disinformation working group meeting.
Mar 25 2022, 10:06 PM · Research (FY2021-22-Research-April-June)
diego added a comment to T293511: Expand section aligment to more languages, and share dumps.

Updates

  • We decided to go back to the XGBoost-based model, because the results were better than with the Spark implementation.
  • We noticed a decrease in precision when considering under-resourced languages. Our hypothesis is that the quality of the embeddings created by M-Bert is not very high for these languages. We decided to create a second, language-agnostic model and then compare the results. Our intuition is that for some languages the language-agnostic model will perform better.
  • We plan to release all these results at the end of next week.
Mar 25 2022, 9:59 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T287946: Identifying controversial content in Wikidata.

Updates

  • I have been comparing the results when adding anonymous edits; so far I haven't found major differences from the previous results. I'll continue working on this next week, before my next meeting with Lydia.
Mar 25 2022, 9:52 PM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata

Mar 18 2022

diego added a comment to T287946: Identifying controversial content in Wikidata.
  • I've presented the main results of this work during the Tuesday Research Sessions. The slides can be found here.
Mar 18 2022, 12:48 PM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata
diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • We are working on applying the model at scale. @MunizaA has been experimenting with native Spark libraries to see if it is possible to replace external dependencies. The quality of the first results is not satisfactory, so we are exploring alternatives.
Mar 18 2022, 12:47 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.
  • No updates.
Mar 18 2022, 12:45 PM · Research (FY2021-22-Research-April-June)
diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.
  • Our collaborators got their publication accepted; the dataset should be released in the following weeks.
Mar 18 2022, 12:44 PM · Research (FY2021-22-Research-April-June), Epic

Mar 7 2022

diego renamed T293511: Expand section aligment to more languages, and share dumps from Expand section aligment to more languages, and create an API to Expand section aligment to more languages, and share dumps.
Mar 7 2022, 11:33 AM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.

Updates

  • No updates
Mar 7 2022, 1:09 AM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • We are fine tuning the model.
Mar 7 2022, 1:09 AM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.
  • I'm currently studying the propagation of "climate change" related items (thanks Isaac for the dataset)
Mar 7 2022, 1:08 AM · Research (FY2021-22-Research-April-June)
diego added a comment to T287946: Identifying controversial content in Wikidata.
  • We met with Lydia and discussed the current results.
  • We reviewed the results, confirming that most co-edited items correspond to ongoing events, even when we change the time window considered.
  • Now, I'll be studying the relevance/prevalence of anonymous edits on popular content.
Mar 7 2022, 1:07 AM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata

Feb 18 2022

diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.

Updates

  • No update.
Feb 18 2022, 9:46 PM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • With @MunizaA we have annotated data in Spanish to English and Urdu to English.
    • We found that the popularity of sections (the number of articles in which they appear) has a huge impact on the quality of the results.
    • While popular sections have multiple possible translations, the least frequent ones usually have one or two.
    • We are trying to improve the model to address these issues.
  • We are also analyzing how to use MT to improve the results.
Feb 18 2022, 9:44 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.
  • We are shaping the paper and checking which new experiments would be required.
Feb 18 2022, 9:38 PM · Research (FY2021-22-Research-April-June)
diego added a comment to T287946: Identifying controversial content in Wikidata.
  • No updates
Feb 18 2022, 9:36 PM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata

Feb 12 2022

diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.
  • No updates.
Feb 12 2022, 2:50 AM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • @MunizaA has uploaded these sample files covering several languages. Each of them contains the top-200 most frequent sections in the source language.
  • @Pginer-WMF, please have a look at them. Keep in mind that we are focusing on recall more than precision. For now, we are showing the top-20 most similar target sections per source section.
  • I'll coordinate a meeting in the following days to discuss how to tune these results.
Feb 12 2022, 2:30 AM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.
  • No updates
Feb 12 2022, 2:24 AM · Research (FY2021-22-Research-April-June)
diego added a comment to T287946: Identifying controversial content in Wikidata.
  • I'm working on identifying collaborative edits on Wikidata items not related to current events.
Feb 12 2022, 2:24 AM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata

Feb 4 2022

diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.
  • We are in the process of writing the paper.
Feb 4 2022, 9:55 PM · Research (FY2021-22-Research-April-June)
diego updated the task description for T293511: Expand section aligment to more languages, and share dumps.
Feb 4 2022, 9:53 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.
  • No updates
Feb 4 2022, 9:53 PM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
Feb 4 2022, 9:53 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T287946: Identifying controversial content in Wikidata.
  • No updates
Feb 4 2022, 9:49 PM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata

Jan 22 2022

diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.
  • No updates.
Jan 22 2022, 12:38 AM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • We have done manual sanity checks on the data extraction pipeline, confirming that it is working properly.
  • The next step will be to run the model on 20 new languages.
Jan 22 2022, 12:38 AM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.
  • No updates
Jan 22 2022, 12:33 AM · Research (FY2021-22-Research-April-June)
diego added a comment to T287946: Identifying controversial content in Wikidata.
  • We are now focusing on understanding collaboration patterns: when and how more than one user edits the same item in a given period of time.
    • We found that in Wikidata such collaborations are less frequent than in other Wikimedia projects.
    • We also found that items edited by more than one user are usually related to ongoing events (awards, deaths, releases).
  • I'll present some of these findings:
    • At the research meeting (Tuesday) in March
    • And @Lydia_Pintscher will propose a date probably in April to present these results to the Wikidata folks.
Jan 22 2022, 12:32 AM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata
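The notion of collaboration described above (more than one user editing the same item in a given period of time) can be sketched as follows. This is an illustrative assumption, not the analysis code used in the project: the input layout `(item_id, user, timestamp)` and the function name `co_edited_items` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def co_edited_items(edits, window):
    """Return the ids of items edited by at least two distinct users
    within `window` (a timedelta) of each other.

    edits: iterable of (item_id, user, timestamp) tuples.
    """
    by_item = defaultdict(list)
    for item_id, user, timestamp in edits:
        by_item[item_id].append((timestamp, user))

    flagged = set()
    for item_id, item_edits in by_item.items():
        item_edits.sort(key=lambda edit: edit[0])
        # Checking consecutive edits is enough: if two edits by different
        # users fall within the window, the user must change somewhere
        # between them, and that adjacent pair is at least as close in time.
        for (t1, u1), (t2, u2) in zip(item_edits, item_edits[1:]):
            if u1 != u2 and t2 - t1 <= window:
                flagged.add(item_id)
                break
    return flagged
```

Varying `window` (for example one hour, one day, one week) is one way to check how sensitive the set of co-edited items is to the time period considered.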

Jan 17 2022

diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.
  • I'm gathering the latest results and organizing them to write the report.
Jan 17 2022, 1:52 AM · Research (FY2021-22-Research-April-June)
diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • We are analyzing the results shown above before deciding on the next steps.
Jan 17 2022, 1:51 AM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.

Updates

  • No updates
Jan 17 2022, 1:50 AM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T287946: Identifying controversial content in Wikidata.
  • I'm organizing the new results to be discussed with the stakeholder.
Jan 17 2022, 1:50 AM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata
diego moved T289492: Detecting Promotional Tone in Wikipedia Articles from FY2021-22-Research-Oct-Dec to FY2021-22-Research-Jan-March on the Research board.
Jan 17 2022, 1:49 AM · Research (FY2021-22-Research-April-June), Epic
diego moved T288333: Understanding the spread of disinformation on Wikipedia from FY2021-22-Research-Oct-Dec to FY2021-22-Research-Jan-March on the Research board.
Jan 17 2022, 1:49 AM · Research (FY2021-22-Research-April-June)
diego moved T293511: Expand section aligment to more languages, and share dumps from FY2021-22-Research-Oct-Dec to FY2021-22-Research-Jan-March on the Research board.
Jan 17 2022, 1:49 AM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego moved T287946: Identifying controversial content in Wikidata from FY2021-22-Research-Oct-Dec to FY2021-22-Research-Jan-March on the Research board.
Jan 17 2022, 1:49 AM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata

Jan 13 2022

diego added a comment to T297461: Check home/HDFS leftovers of christinedk.

yes.

Jan 13 2022, 7:03 PM · Data-Engineering-Kanban, Data-Engineering

Jan 8 2022

diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • @MunizaA has run the first experiments comparing the results of the new language model with those of our old FastText-based model, obtaining promising results. (@MunizaA please share the new results here.)
  • The next steps are:
    • Test the model for language pairs without training data.
    • Estimate the time required to run the model on the 100+ languages supported by this new approach.
Jan 8 2022, 1:00 AM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T287946: Identifying controversial content in Wikidata.
  • I'm focusing on modeling the relationship between topics and collaborations/controversies.
    • I'm working on graph representation of these components
Jan 8 2022, 12:56 AM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata
diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.
  • I've updated the task according to the new plan discussed with our collaborators.
Jan 8 2022, 12:54 AM · Research (FY2021-22-Research-April-June), Epic
diego updated the task description for T289492: Detecting Promotional Tone in Wikipedia Articles.
Jan 8 2022, 12:53 AM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.
  • No updates
Jan 8 2022, 12:52 AM · Research (FY2021-22-Research-April-June)

Jan 3 2022

diego added a comment to T287655: Generate template parameter alignments for en > de wikis.

I've updated the code here https://github.com/digitalTranshumant/templatesAlignment/blob/master/02alignmentsSpark.ipynb

Jan 3 2022, 12:17 PM · Language-Team (Language-2022-April-June), ContentTranslation

Dec 24 2021

diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • @MunizaA has developed the full pipeline to efficiently extract all the features used in the original model, such as link similarity and edit distance.
  • We are currently preparing the experiment to validate our results using the new language model (to replace FastText).
Dec 24 2021, 11:40 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
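As an illustration of the two features named above, here is a minimal sketch of an edit distance and a link similarity between two sections. The comment does not say how these features are computed in the actual pipeline, so Levenshtein distance on section titles and Jaccard similarity on sets of outgoing links are assumptions, as are the function names.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def jaccard(links_a, links_b):
    """Jaccard similarity between two sets of link targets.

    Two empty sets are given similarity 0.0 by convention here.
    """
    links_a, links_b = set(links_a), set(links_b)
    union = links_a | links_b
    if not union:
        return 0.0
    return len(links_a & links_b) / len(union)
```

For cross-language comparison, link targets would first need to be mapped to a shared identifier (for instance Wikidata item ids) so that the two sets are comparable.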
diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.
  • We have received the report from our collaborators with the description of the dataset, and the result of their model.
  • We will coordinate for releasing the dataset during this FY.
Dec 24 2021, 11:37 PM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T287946: Identifying controversial content in Wikidata.
  • We have seen that few items are edited by more than one user.
  • We are currently researching the item and user characteristics related to collaborative work.
Dec 24 2021, 11:35 PM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata
diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.
  • We are testing a new DL model to predict content propagation, using content reliability as one of the features.
Dec 24 2021, 11:33 PM · Research (FY2021-22-Research-April-June)

Dec 8 2021

diego added a comment to T287655: Generate template parameter alignments for en > de wikis.

Oh got it! This setup was changed around one year ago. Now we all use the Spark environments provided by JupyterHub.

Dec 8 2021, 5:55 PM · Language-Team (Language-2022-April-June), ContentTranslation
diego added a comment to T287655: Generate template parameter alignments for en > de wikis.

@KartikMistry this looks like a pyspark configuration issue, which kernel are you using?

Dec 8 2021, 10:15 AM · Language-Team (Language-2022-April-June), ContentTranslation

Dec 4 2021

diego updated the task description for T293511: Expand section aligment to more languages, and share dumps.
Dec 4 2021, 12:42 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)

Dec 3 2021

diego closed T252450: Submit a paper about the model developed for detecting disinformation. , a subtask of T243256: Measuring the consistency of information between Wikipedia articles and Wikidata items., as Resolved.
Dec 3 2021, 5:37 PM · Research (FY2019-20-Research-April-June)
diego closed T252450: Submit a paper about the model developed for detecting disinformation.  as Resolved.
Dec 3 2021, 5:37 PM · Research
diego added a comment to T252450: Submit a paper about the model developed for detecting disinformation. .
  • The paper can be found here, and this task is done.
Dec 3 2021, 5:37 PM · Research
diego added a comment to T287946: Identifying controversial content in Wikidata.
  • No updates this week. I'm going to meet with the stakeholder next week.
Dec 3 2021, 5:36 PM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata
diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.

Updates

  • No updates
Dec 3 2021, 5:35 PM · Research (FY2021-22-Research-April-June)
diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.
  • The dataset has been created. I'm coordinating with our collaborators to discuss the timing and format of the data release.
Dec 3 2021, 5:35 PM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • We obtained the first results with new language models. @MunizaA could you please report the numbers here?
Dec 3 2021, 5:34 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)

Nov 12 2021

diego added a comment to T293511: Expand section aligment to more languages, and share dumps.
  • @MunizaA is testing new language models that could be more efficient and possibly more accurate than the FastText embeddings used in the previous experiments.
Nov 12 2021, 6:19 PM · SectionTranslation, Language-Team (Language-2022-April-June), Research (FY2021-22-Research-April-June)
diego added a comment to T287946: Identifying controversial content in Wikidata.
  • I've been working on a classifier to predict reverts.
    • The current classifier uses article (item), revision and user information.
    • On a balanced test set, the current model achieves over 70% accuracy.
    • However, there is a set of caveats to be considered:
      • 'auto-reverts': users can revert themselves, and this shouldn't be considered a signal of controversy. We need to analyze this behavior further.
      • power-users: we need to take into account that a small set of users produces most of the edits and reverts, and this behavior could affect our results. We are working on different sampling methods to address this issue.
  • The meta page was updated with the results in Q1 and partial results in Q2.
Nov 12 2021, 6:18 PM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata
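One possible way to handle the caveats listed above when building a balanced evaluation set is sketched below. It is not the sampling method used in the project, which the comment describes as still under study: dropping self-reverts, capping the number of edits per user, and downsampling the majority class are all assumptions, and the record fields (`user`, `reverted`, `reverted_by`) are hypothetical.

```python
import random
from collections import defaultdict


def drop_self_reverts(edits):
    """Remove edits that were reverted by their own author."""
    return [
        edit for edit in edits
        if not (edit["reverted"] and edit["reverted_by"] == edit["user"])
    ]


def cap_per_user(edits, max_per_user, rng):
    """Keep at most `max_per_user` randomly chosen edits from each user."""
    by_user = defaultdict(list)
    for edit in edits:
        by_user[edit["user"]].append(edit)
    capped = []
    for user_edits in by_user.values():
        if len(user_edits) > max_per_user:
            user_edits = rng.sample(user_edits, max_per_user)
        capped.extend(user_edits)
    return capped


def balance_classes(edits, rng):
    """Downsample the larger class so reverted and kept edits are equal in number."""
    reverted = [edit for edit in edits if edit["reverted"]]
    kept = [edit for edit in edits if not edit["reverted"]]
    size = min(len(reverted), len(kept))
    return rng.sample(reverted, size) + rng.sample(kept, size)
```

Passing an explicit `random.Random(seed)` as `rng` keeps the sample reproducible across runs.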
diego updated the task description for T287946: Identifying controversial content in Wikidata.
Nov 12 2021, 6:10 PM · Research (FY2021-22-Research-April-June), Wikidata Analytics, Wikidata
diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.

Updates

  • Meta page was updated.
Nov 12 2021, 6:10 PM · Research (FY2021-22-Research-April-June), Epic
diego added a comment to T288333: Understanding the spread of disinformation on Wikipedia.

Updates

  • No updates
Nov 12 2021, 6:08 PM · Research (FY2021-22-Research-April-June)

Nov 5 2021

diego added a comment to T252450: Submit a paper about the model developed for detecting disinformation. .
  • The paper has been accepted at the IEEE BigData 2021 conference.
  • I'll upload the paper and write the documentation on Meta in the following weeks.
Nov 5 2021, 8:28 PM · Research
diego added a comment to T289492: Detecting Promotional Tone in Wikipedia Articles.
  • No updates this week.
Nov 5 2021, 8:25 PM · Research (FY2021-22-Research-April-June), Epic