
Pablo (Pablo Aragón)
Research Scientist

User Details

User Since
Feb 2 2021, 1:32 PM (253 w, 3 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Pablo (WMF) [ Global Accounts ]

Recent Activity

Thu, Nov 27

Pablo added a comment to T389809: Check home/HDFS leftovers of aitolkyn.

@Gehel: thank you for your understanding and support!

Thu, Nov 27, 7:04 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Essential-Work

Mon, Nov 24

Pablo updated subscribers of T410744: model reference-risk: reference_risk_score is always 0..

Thanks again, @OKarakaya-WMF!

Mon, Nov 24, 11:44 AM · Machine-Learning-Team
Pablo added a comment to T410744: model reference-risk: reference_risk_score is always 0..

Thank you, @OKarakaya-WMF!

Mon, Nov 24, 10:15 AM · Machine-Learning-Team

Nov 6 2025

Pablo added a comment to T389809: Check home/HDFS leftovers of aitolkyn.

Thank you, @Gehel! Is it possible to copy the data from home/HDFS leftovers of Aitolkyn to my home folder(s)?

Nov 6 2025, 10:29 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Essential-Work

Oct 29 2025

Pablo created T408728: Additional monthly snapshots for the editor history dataset.
Oct 29 2025, 6:02 PM · WMF-NDA, OKR-Work, Research, Knowledge-Integrity

Oct 24 2025

Pablo added a comment to T408203: Fix risk observatory dashboard.

Beyond the dashboard (which is sometimes used by the T&S Disinformation team), the risk observatory data has been used on multiple occasions, e.g., to calibrate the default thresholds of Automoderator (T358128) or to enrich the patrolling dataset (T392210).

Oct 24 2025, 2:45 PM · Research-engineering, Research

Jul 30 2025

Pablo added a comment to T399696: GitLab Private Repository Request for: research/npov-workstream-research.

That works. Thank you, @thcipriani!

Jul 30 2025, 3:16 PM · Essential-Work, Release-Engineering-Team (Doing 😎), GitLab (Support)

Jul 22 2025

Pablo added a comment to T398718: update the risk observatory usage of the ipblocks.

Thank you both!

Jul 22 2025, 9:49 AM · Research-engineering, Research

Jul 16 2025

Pablo updated the task description for T399696: GitLab Private Repository Request for: research/npov-workstream-research.
Jul 16 2025, 8:58 AM · Essential-Work, Release-Engineering-Team (Doing 😎), GitLab (Support)
Pablo created T399696: GitLab Private Repository Request for: research/npov-workstream-research.
Jul 16 2025, 8:55 AM · Essential-Work, Release-Engineering-Team (Doing 😎), GitLab (Support)

Jul 11 2025

Pablo closed T393472: Attend ICWSM 2025 conference as Resolved.
Jul 11 2025, 2:13 PM · Knowledge-Integrity, Research-outreach, Research
Pablo added a comment to T393472: Attend ICWSM 2025 conference.

weekly update

Jul 11 2025, 2:13 PM · Knowledge-Integrity, Research-outreach, Research

Jun 30 2025

Pablo closed T392210: [WE1.5.3] Wikipedia Patrolling Measurement as Resolved.
Jun 30 2025, 5:48 PM · OKR-Work, Research, Knowledge-Integrity
Pablo added a comment to T392210: [WE1.5.3] Wikipedia Patrolling Measurement.

Progress update on the hypothesis for the week
The report has been updated to incorporate feedback from various stakeholders. The hypothesis said:

If we develop heuristics-based data pipelines for measuring patrolling activity on Wikipedia, we can prototype a model to detect moderation gaps at scale.

To test this, data pipelines were developed to generate the dataset, which has been made accessible via a Superset dashboard, as specified here. Analysis with the dashboard has revealed moderation gaps, which have been shared with stakeholders and documented in the revised report. The dataset is expected to be used at T396493 to provide data on the retention rate of patrollers using FlaggedRevs and of reverting editors, as a comparable moderator retention rate metric to inform targets. Furthermore, additional opportunities to leverage this data have been identified at T398071. As a consequence, the hypothesis is supported.

Jun 30 2025, 5:48 PM · OKR-Work, Research, Knowledge-Integrity

Jun 27 2025

Pablo added a comment to T393472: Attend ICWSM 2025 conference.

weekly update

  • The work on Crowdsourced Content Moderation in Wikipedia: A Preliminary Look at Article Maintenance Templates across Language Editions was presented at the workshop.
  • A report summarizing interactions and insights from ICWSM will be prepared next week.
Jun 27 2025, 11:58 AM · Knowledge-Integrity, Research-outreach, Research

Jun 20 2025

Pablo added a comment to T391717: [Q4 FY 24-25 Applied Science] Knowledge Integrity Research.
  • Moderator Motivations: Synced with Daisy about upcoming motivations-related projects and cross-team collaboration. Sharing motivations work at Research meeting next Tuesday.
  • Metrics on Patrolling Work: This week’s work focused on two main areas. First, significant effort was dedicated to reviewing and refining both the dataset and the dashboard. This included adding a new table with metrics suggested by the Moderator Tools team. Second, a preliminary version of the report was completed, documenting the dataset, the dashboard, and key findings. The report has been shared with the stakeholders for feedback.
Jun 20 2025, 7:22 PM · Research (FY2024-25-Research-April-June)
Pablo added a comment to T392210: [WE1.5.3] Wikipedia Patrolling Measurement.

Progress update on the hypothesis for the week
This week’s work focused on two main areas. First, significant effort was dedicated to reviewing and refining both the dataset and the dashboard. This included adding a new table with metrics suggested by the Moderator Tools team. Second, a preliminary version of the report was completed, documenting the dataset, the dashboard, and key findings. The report has been shared with the stakeholders for feedback.

Jun 20 2025, 3:47 PM · OKR-Work, Research, Knowledge-Integrity
Pablo added a comment to T393472: Attend ICWSM 2025 conference.

weekly update

Jun 20 2025, 8:58 AM · Knowledge-Integrity, Research-outreach, Research

Jun 13 2025

Pablo added a comment to T392210: [WE1.5.3] Wikipedia Patrolling Measurement.

Progress update on the hypothesis for the week
This week's efforts primarily focused on interactions with the Moderator Tools and Product Analytics teams. During a joint meeting, I presented the current status of the dataset and dashboard, and gathered feedback (notes). Following the session, both teams engaged further to request additional details, particularly in relation to defining baselines for the projected increase in moderation actions for FY 25/26 WE 1.3 KR (T396493). In parallel, while awaiting input from stakeholders regarding specific product interventions to be tracked using this data, I began drafting a report for this project to document the dataset and a summary of findings.

Jun 13 2025, 10:41 AM · OKR-Work, Research, Knowledge-Integrity
Pablo added a comment to T393472: Attend ICWSM 2025 conference.

weekly update

Jun 13 2025, 9:10 AM · Knowledge-Integrity, Research-outreach, Research

Jun 6 2025

Pablo added a comment to T392210: [WE1.5.3] Wikipedia Patrolling Measurement.

Progress update on the hypothesis for the week
This week’s work focused on building a Superset dashboard to deliver the dataset, which also included modifications to the existing notebooks, e.g., moving the parsing of edit summaries (to extract links to the project namespace) into the data collection phase. The dashboard will be reviewed next week in meetings with colleagues from the Product Analytics and Moderator Tools teams.

Jun 6 2025, 10:34 AM · OKR-Work, Research, Knowledge-Integrity
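
The update above mentions parsing edit summaries to extract links to the project namespace. The notebook's actual parsing code is not shown in this feed, so the following is only a minimal sketch of that kind of extraction. The function name, the regular expression, and the list of prefixes are assumptions for illustration: `Wikipedia`, `WP`, and `Project` fit English Wikipedia, and other wikis would need their own local namespace names and aliases.

```python
import re

# Assumed prefixes for the project namespace on English Wikipedia.
# Other wikis use localized names and aliases, so this tuple is illustrative.
PROJECT_PREFIXES = ("Wikipedia", "WP", "Project")

# Matches [[Namespace:Title]], optionally followed by #section and |label.
_LINK = re.compile(
    r"\[\[\s*:?\s*([^\[\]|#:]+?)\s*:\s*([^\[\]|#]+?)\s*"
    r"(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]"
)


def project_links(summary, prefixes=PROJECT_PREFIXES):
    """Return the titles of project-namespace links found in an edit summary."""
    wanted = {prefix.lower() for prefix in prefixes}
    return [
        title
        for namespace, title in _LINK.findall(summary)
        if namespace.lower() in wanted
    ]


example = "Reverted per [[WP:NPOV]] and [[Wikipedia:Verifiability|V]]; see [[Talk:Foo]]"
print(project_links(example))
```

Doing this once during data collection, as the update describes, avoids re-parsing every summary each time the dashboard data is rebuilt.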
Pablo added a comment to T393472: Attend ICWSM 2025 conference.

weekly update

Jun 6 2025, 9:17 AM · Knowledge-Integrity, Research-outreach, Research

May 30 2025

Pablo added a comment to T392210: [WE1.5.3] Wikipedia Patrolling Measurement.

Progress update on the hypothesis for the week
This week has been relatively light, as I was OoO for two days. That said, I updated the data collection notebook to incorporate revision tags and, for revisions that were reverted, the comment of the reverting revision. Progress with the data was reviewed with @Isaac during our 1:1 meeting. In addition, two other meetings have been scheduled with stakeholders, as the dataset is expected to be used in a joint effort by the Product Analytics and Moderator Tools teams to measure current moderator activity and inform a Key Result target for WE 1.3 FY 25/26.

May 30 2025, 1:00 PM · OKR-Work, Research, Knowledge-Integrity

May 23 2025

Pablo added a comment to T392210: [WE1.5.3] Wikipedia Patrolling Measurement.

Progress update on the hypothesis for the week
Continued development of the data collection notebook. For the October 2024 snapshot, metadata now includes information on the addition and removal of article maintenance templates. For instance, English Wikipedia revision 1248733993 reflects this update with a new column indicating template changes as mbox:more footnotes-add | mbox:no footnotes-remove.

May 23 2025, 1:53 PM · OKR-Work, Research, Knowledge-Integrity
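
The template-changes value quoted above (mbox:more footnotes-add | mbox:no footnotes-remove) can be split back into structured records. This is a sketch that assumes the format is exactly what the single example suggests: entries separated by `|`, each shaped as `type:name-change`. The `TemplateChange` type and `parse_template_changes` function are hypothetical names, not part of the notebook.

```python
from typing import NamedTuple


class TemplateChange(NamedTuple):
    template_type: str  # e.g. "mbox"
    template_name: str  # e.g. "more footnotes"
    change: str         # e.g. "add" or "remove"


def parse_template_changes(value: str) -> list[TemplateChange]:
    """Parse a 'type:name-change | type:name-change' string into records."""
    changes = []
    for entry in value.split("|"):
        entry = entry.strip()
        if not entry:
            continue
        # The type ends at the first colon; the change follows the last hyphen,
        # so template names that themselves contain hyphens stay intact.
        template_type, rest = entry.split(":", 1)
        template_name, change = rest.rsplit("-", 1)
        changes.append(TemplateChange(template_type, template_name, change))
    return changes


for item in parse_template_changes("mbox:more footnotes-add | mbox:no footnotes-remove"):
    print(item)
```

Splitting on the last hyphen rather than the first is a deliberate choice here, since maintenance template names can contain hyphens.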
Pablo closed T378485: Organize the Research track - Wiki Workshop 2025 as Resolved.
May 23 2025, 12:03 PM · Research, Essential-Work, Research-foundational
Pablo closed T378485: Organize the Research track - Wiki Workshop 2025, a subtask of T369818: Organize and wrap up Wiki Workshop 2025 (May 21 and 22), as Resolved.
May 23 2025, 12:03 PM · Research, Research-foundational
Pablo added a comment to T378485: Organize the Research track - Wiki Workshop 2025.

weekly update:

May 23 2025, 12:03 PM · Research, Essential-Work, Research-foundational

May 16 2025

Pablo added a comment to T392210: [WE1.5.3] Wikipedia Patrolling Measurement.

Progress update on the hypothesis for the week
Advanced the notebook to create the dataset of edits in March 2025. The dataset has been expanded to include detailed information on reverting editors and predicted revert risk scores, making it possible to answer questions about reverting activity (see updates on metrics below). Furthermore, the notebook has been re-run to generate an analogous dataset for edits in October 2024, for which metadata on article maintenance templates will be available (see task T384600).

May 16 2025, 10:58 AM · OKR-Work, Research, Knowledge-Integrity
Pablo added a comment to T378485: Organize the Research track - Wiki Workshop 2025.

weekly update:

May 16 2025, 10:05 AM · Research, Essential-Work, Research-foundational

May 13 2025

Pablo created T394065: [FY25-WE1.5.3] HTML wiki content dataset to support Wikipedia Patrolling Measurement.
May 13 2025, 3:53 PM · Research
Pablo added a comment to T392210: [WE1.5.3] Wikipedia Patrolling Measurement.

Progress update on the hypothesis for the week (last week)
Started working on a notebook to create the dataset of edits in March 2025 with metadata, including mediawiki_history fields and patrolling information and status (prevented, delete, reverted, reviewed, edited_over, autopatrolled).

May 13 2025, 3:17 PM · OKR-Work, Research, Knowledge-Integrity

May 9 2025

Pablo added a comment to T378485: Organize the Research track - Wiki Workshop 2025.

weekly update:

May 9 2025, 11:17 AM · Research, Essential-Work, Research-foundational
Pablo added a comment to T393695: Frequent crashes of Hadoop nodemanager services with OOM errrors.

Hi, I confirm that I was running a notebook yesterday, which unfortunately crashed due to memory limitations. Apologies for any inconvenience this may have caused!

May 9 2025, 10:57 AM · Data-Platform-SRE (2025.05.02 - 2025.05.23)

May 6 2025

Pablo updated the task description for T391719: [Q4 FY 24-25 Applied Science] Building the Foundations Research.
May 6 2025, 2:00 PM · Research (FY2024-25-Research-April-June)
Pablo created T393472: Attend ICWSM 2025 conference.
May 6 2025, 1:59 PM · Knowledge-Integrity, Research-outreach, Research

Apr 30 2025

Pablo added a comment to T378485: Organize the Research track - Wiki Workshop 2025.

no weekly update (on hold until the schedules of the research track sessions are finalized)

Apr 30 2025, 12:50 PM · Research, Essential-Work, Research-foundational

Apr 25 2025

Pablo added a comment to T378485: Organize the Research track - Wiki Workshop 2025.

weekly update:

  • Topical sessions have been organized into three slots based on the approximate schedules shared by @leila and the estimated location of the first author of each accepted contribution (note: no perfect distribution is feasible).
  • Tentative candidates for session chairs have also been identified.
Apr 25 2025, 10:44 AM · Research, Essential-Work, Research-foundational

Apr 17 2025

Pablo updated the task description for T392210: [WE1.5.3] Wikipedia Patrolling Measurement.
Apr 17 2025, 4:04 PM · OKR-Work, Research, Knowledge-Integrity
Pablo updated the task description for T391717: [Q4 FY 24-25 Applied Science] Knowledge Integrity Research.
Apr 17 2025, 11:43 AM · Research (FY2024-25-Research-April-June)
Pablo created T392210: [WE1.5.3] Wikipedia Patrolling Measurement.
Apr 17 2025, 11:42 AM · OKR-Work, Research, Knowledge-Integrity
Pablo closed T383222: Review and select submissions for research track as Resolved.
Apr 17 2025, 8:56 AM · Research, Research-outreach, Research-foundational
Pablo closed T383222: Review and select submissions for research track, a subtask of T378485: Organize the Research track - Wiki Workshop 2025, as Resolved.
Apr 17 2025, 8:56 AM · Research, Essential-Work, Research-foundational
Pablo added a comment to T383222: Review and select submissions for research track.

As reported at T378485#10751230, this task is resolved.

Apr 17 2025, 8:56 AM · Research, Research-outreach, Research-foundational
Pablo updated the task description for T383222: Review and select submissions for research track.
Apr 17 2025, 8:54 AM · Research, Research-outreach, Research-foundational
Pablo updated subscribers of T378485: Organize the Research track - Wiki Workshop 2025.

weekly update:

Apr 17 2025, 8:53 AM · Research, Essential-Work, Research-foundational

Apr 11 2025

Pablo added a comment to T378485: Organize the Research track - Wiki Workshop 2025.

weekly update:

Apr 11 2025, 6:01 PM · Research, Essential-Work, Research-foundational

Apr 4 2025

Pablo closed T384600: Crowdsourced content moderation metrics, a subtask of T371865: Who are moderators?, as Resolved.
Apr 4 2025, 3:52 PM · Research, Epic
Pablo closed T384600: Crowdsourced content moderation metrics as Resolved.
Apr 4 2025, 3:52 PM · Research (FY2024-25-Research-April-June), Essential-Work
Pablo added a comment to T384600: Crowdsourced content moderation metrics.

Thanks, @Isaac! I re-submitted a new version of the manuscript addressing your suggestions, along with some minor edits from @cwylo and me, so I will resolve this task.

Apr 4 2025, 3:52 PM · Research (FY2024-25-Research-April-June), Essential-Work
Pablo added a comment to T378485: Organize the Research track - Wiki Workshop 2025.

weekly update:

Apr 4 2025, 3:18 PM · Research, Essential-Work, Research-foundational

Apr 2 2025

Pablo added a comment to T383222: Review and select submissions for research track.

@leila, please consider reassigning this task to me

Apr 2 2025, 2:55 PM · Research, Research-outreach, Research-foundational

Mar 28 2025

Pablo updated the task description for T383610: [Q3 FY 24-25 Applied Science] Moderation Research.
Mar 28 2025, 2:16 PM · Research (FY2024-25-Research-January-March)
Pablo added a comment to T383610: [Q3 FY 24-25 Applied Science] Moderation Research.

Weekly update:

Mar 28 2025, 2:16 PM · Research (FY2024-25-Research-January-March)
Pablo closed T382618: Experiment: Translate research findings for policy impact as Resolved.
Mar 28 2025, 12:36 PM · Research (FY2024-25-Research-January-March)
Pablo added a comment to T382618: Experiment: Translate research findings for policy impact.

Weekly Update

Mar 28 2025, 12:36 PM · Research (FY2024-25-Research-January-March)

Mar 27 2025

Pablo added a comment to T384600: Crowdsourced content moderation metrics.

@Isaac please review for sign-off

Mar 27 2025, 5:27 PM · Research (FY2024-25-Research-April-June), Essential-Work
Pablo added a comment to T384600: Crowdsourced content moderation metrics.

In order to assist @cwylo in categorizing templates, a notebook was created to link templates added or removed in a revision to policy invocations in the comment of such revision. A sample of the dataset is shown below.

Mar 27 2025, 5:23 PM · Research (FY2024-25-Research-April-June), Essential-Work

Mar 24 2025

Pablo added a comment to T382618: Experiment: Translate research findings for policy impact.

@Scann: Your organization of these two events was incredibly helpful, and I appreciate you sharing this information as well!

Mar 24 2025, 8:55 AM · Research (FY2024-25-Research-January-March)

Mar 21 2025

Pablo added a comment to T383610: [Q3 FY 24-25 Applied Science] Moderation Research.

Weekly update:

Mar 21 2025, 7:47 PM · Research (FY2024-25-Research-January-March)
Pablo added a comment to T382618: Experiment: Translate research findings for policy impact.

Weekly update

Mar 21 2025, 7:47 PM · Research (FY2024-25-Research-January-March)

Mar 19 2025

Pablo added a comment to T388890: [Request] List of articles without references AND highly visible.

A year ago, I adapted the notebook we used for the ICWSM paper for this task: https://gitlab.wikimedia.org/paragon/miscellanea/-/blob/main/notebooks/recent-revisions.ipynb
I have added notes to provide context and highlighted hardcoded values that need to be modified or (ideally) parameterized.

Mar 19 2025, 9:18 AM · Research

Mar 14 2025

Pablo added a comment to T383610: [Q3 FY 24-25 Applied Science] Moderation Research.

Weekly update:

Mar 14 2025, 6:28 PM · Research (FY2024-25-Research-January-March)
Pablo updated subscribers of T384600: Crowdsourced content moderation metrics.

With this notebook, I have created CSV files of template stats for each wiki at: https://gitlab.wikimedia.org/repos/research/who-are-moderators/-/tree/main/data/templates. The files include the following fields:

  • wiki_db: Database name of the wiki (in this notebook: arzwiki, dewiki, enwiki, eswiki, frwiki, itwiki, jawiki, nlwiki, plwiki, ruwiki, svwiki, zhwiki).
  • snapshot: Timestamp of the dataset (in this notebook: 2024-10).
  • template_name: Name of the template.
  • template_type: Type of template (mbox, inline).
  • template_change: Type of change (add, remove).
  • template_count: Number of times the template was added/removed.
  • revision_count: Number of revisions in which the template was added/removed.
  • page_count: Number of pages where the template was added/removed.
  • page_namespace_0_count: Number of pages in namespace 0 where the template was added/removed.
  • page_namespace_2_count: Number of pages in namespace 2 where the template was added/removed.
  • page_namespace_102_count: Number of pages in namespace 102 where the template was added/removed.
  • page_namespace_118_count: Number of pages in namespace 118 where the template was added/removed.
  • page_namespace_other_count: Number of pages in other namespaces where the template was added/removed.
  • editor_count: Number of editors who added/removed the template.
  • editor_bot_count: Number of bot users who added/removed the template.
  • editor_bot_perc: Percentage of bot users who added/removed the template.
  • editor_sysop_count: Number of sysop users who added/removed the template.
  • editor_sysop_perc: Percentage of sysop users who added/removed the template.
  • editor_editor_count: Number of editors belonging to the editor user group who added/removed the template.
  • editor_editor_perc: Percentage of editors belonging to the editor user group who added/removed the template.
  • editor_patroller_count: Number of patroller users who added/removed the template.
  • editor_patroller_perc: Percentage of patroller users who added/removed the template.
  • editor_with_rights_count: Number of editors belonging to any user group who added/removed the template.
  • editor_with_rights_perc: Percentage of editors belonging to any user group who added/removed the template.
  • editor_without_rights_count: Number of editors not belonging to any user group who added/removed the template.
  • editor_without_rights_perc: Percentage of editors not belonging to any user group who added/removed the template.
  • editor_1_9_count: Number of editors with an edit count between 1 and 9 who added/removed the template.
  • editor_1_9_perc: Percentage of editors with an edit count between 1 and 9 who added/removed the template.
  • editor_10_99_count: Number of editors with an edit count between 10 and 99 who added/removed the template.
  • editor_10_99_perc: Percentage of editors with an edit count between 10 and 99 who added/removed the template.
  • editor_100_999_count: Number of editors with an edit count between 100 and 999 who added/removed the template.
  • editor_100_999_perc: Percentage of editors with an edit count between 100 and 999 who added/removed the template.
  • editor_1000_9999_count: Number of editors with an edit count between 1,000 and 9,999 who added/removed the template.
  • editor_1000_9999_perc: Percentage of editors with an edit count between 1,000 and 9,999 who added/removed the template.
  • editor_10000_inf_count: Number of editors with an edit count of 10,000 or more who added/removed the template.
  • editor_10000_inf_perc: Percentage of editors with an edit count of 10,000 or more who added/removed the template.
  • editor_age_mean: Mean number of years since registration for editors who added/removed the template.
  • editor_age_median: Median number of years since registration for editors who added/removed the template.
Mar 14 2025, 6:26 PM · Research (FY2024-25-Research-April-June), Essential-Work
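
As an illustration of how the fields listed above might be consumed, the sketch below ranks templates by `template_count` for a given `template_change`. It uses only column names from the list; the sample rows, their counts, and the `top_templates` helper are hypothetical and do not come from the published files.

```python
import csv
import io


def top_templates(csv_file, change, n=3):
    """Return the n templates with the highest total template_count for a change type."""
    totals = {}
    for row in csv.DictReader(csv_file):
        if row["template_change"] != change:
            continue
        name = row["template_name"]
        totals[name] = totals.get(name, 0) + int(row["template_count"])
    # Sort by descending count, then by name so ties are deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical rows using a subset of the documented columns.
SAMPLE = """wiki_db,snapshot,template_name,template_type,template_change,template_count
enwiki,2024-10,more footnotes,mbox,add,12
enwiki,2024-10,no footnotes,mbox,remove,7
enwiki,2024-10,citation needed,inline,add,40
"""

print(top_templates(io.StringIO(SAMPLE), "add"))
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle, for example `open(path, newline="", encoding="utf-8")`.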
Pablo added a comment to T382618: Experiment: Translate research findings for policy impact.

Weekly update

Mar 14 2025, 6:26 PM · Research (FY2024-25-Research-January-March)

Mar 11 2025

Pablo updated the task description for T383610: [Q3 FY 24-25 Applied Science] Moderation Research.
Mar 11 2025, 9:49 AM · Research (FY2024-25-Research-January-March)

Mar 7 2025

Pablo updated the task description for T383610: [Q3 FY 24-25 Applied Science] Moderation Research.
Mar 7 2025, 2:16 PM · Research (FY2024-25-Research-January-March)
Pablo renamed T384600: Crowdsourced content moderation metrics from Crowdsourced Content Moderation: data and metrics to Crowdsourced content moderation metrics.
Mar 7 2025, 2:15 PM · Research (FY2024-25-Research-April-June), Essential-Work
Pablo renamed T384600: Crowdsourced content moderation metrics from Distributed moderation: metrics to Crowdsourced Content Moderation: data and metrics.
Mar 7 2025, 2:15 PM · Research (FY2024-25-Research-April-June), Essential-Work
Pablo added a comment to T383610: [Q3 FY 24-25 Applied Science] Moderation Research.

Weekly update:

Mar 7 2025, 2:08 PM · Research (FY2024-25-Research-January-March)
Pablo added a comment to T382618: Experiment: Translate research findings for policy impact.

Weekly update

Mar 7 2025, 11:39 AM · Research (FY2024-25-Research-January-March)
Pablo added a comment to T384600: Crowdsourced content moderation metrics.

The GitLab repository originally created for T377324 has been updated with:

  • Code to generate the dataset of moderation actions in October 2024 on arzwiki, dewiki, enwiki, eswiki, frwiki, itwiki, jawiki, nlwiki, plwiki, ruwiki, svwiki, zhwiki (the notebook expands on @Isaac's original approach).
  • Resulting dataset. It is a sample of one month only, but an expanded version could be utilized for understanding the use of maintenance templates to develop models to support editors (ping @MGerlach).
  • Code with the following metrics:
    • Number of templates added/removed
    • Most common templates added/removed
Mar 7 2025, 11:27 AM · Research (FY2024-25-Research-April-June), Essential-Work
Pablo closed T384616: Stats for election-related articles and edits (EU / India) as Resolved.
Mar 7 2025, 8:58 AM · Research
Pablo added a comment to T384616: Stats for election-related articles and edits (EU / India).

For outreach, these findings have been shared with colleagues and posted on Meta. I will resolve the ticket, as no further work is expected.

Mar 7 2025, 8:58 AM · Research

Feb 28 2025

Pablo added a comment to T384616: Stats for election-related articles and edits (EU / India).

Thanks for your questions!

  • ~20% of low-quality edits on English Wikipedia. This is definitely something that would be worth inspecting as there are many factors that could be influencing this finding. For example, in the US election analysis, I reviewed some low-quality edits that were not reverted and found that some were followed by edits that partially or completely removed their content. Are you imagining something that could be used for model re-training?
  • Total # of reverted edits. Yes, I actually modified the original formulation of the hypothesis because the rationale provided was: "As people are incentivised to promote or obstruct a candidate, more newcomers are making politically biased edits (or edits that violate NPOV), and thus more newcomer edits are reverted". To ensure that peaks in reverting activity are not merely a reflection of overall editing volume but instead highlight the revertability of edits (likely due to increased malicious behavior), I decided to use the revert rate rather than the absolute count of reverted edits (see figures below for the latter). However, your question indicates that further clarification is needed and suggests considering showing both metrics.
Feb 28 2025, 4:00 PM · Research
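
The distinction drawn above between the revert rate and the absolute count of reverted edits can be made concrete with a small sketch. The period names and counts are hypothetical, chosen only to show how the two metrics can disagree; they are not figures from the analysis.

```python
def revert_rate(reverted: int, total: int) -> float:
    """Share of edits that were reverted; 0.0 when there were no edits."""
    return reverted / total if total else 0.0


# Hypothetical counts per period: (total edits, reverted edits).
periods = {
    "quiet week": (100, 10),
    "busy week": (400, 40),
    "tense week": (120, 30),
}

for name, (total, reverted) in periods.items():
    print(f"{name}: reverted={reverted}, rate={revert_rate(reverted, total):.2f}")
```

In this toy data the busy week has the most reverted edits but the same rate as the quiet week, while the tense week has fewer reverted edits than the busy week but the highest rate. That is why showing both metrics side by side can help readers separate editing volume from revertability.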
Pablo updated subscribers of T383610: [Q3 FY 24-25 Applied Science] Moderation Research.

Weekly update:

Feb 28 2025, 10:30 AM · Research (FY2024-25-Research-January-March)
Pablo added a comment to T384616: Stats for election-related articles and edits (EU / India).
  1. Take home messages
Feb 28 2025, 9:50 AM · Research
Pablo updated subscribers of T382618: Experiment: Translate research findings for policy impact.

Weekly update

Feb 28 2025, 9:46 AM · Research (FY2024-25-Research-January-March)

Feb 21 2025

Pablo added a comment to T382618: Experiment: Translate research findings for policy impact.

Weekly update

Feb 21 2025, 2:11 PM · Research (FY2024-25-Research-January-March)
Pablo added a comment to T383610: [Q3 FY 24-25 Applied Science] Moderation Research.

Weekly update:

  • Moderation public report T382614: Completed.
  • Distributed moderation (T383365/T384600): No updates this week.
  • Peacock support: @diego provided hands-on support in evaluating the existing peacock detection model (T386645#10561213). From now on, he will take on an advisory role.
  • Client-hint consultation: @Pablo reviewed the WE 4.2.10 Project Planning to provide feedback on (1) the identification of sockpuppets and potential ban evasion, and (2) the process to create a hash representation.
  • Elections analysis: @Pablo built the dataset with revert risk predictions of revisions about the 2024 Indian elections (enwiki: 148,650; hiwiki: 9,911; mrwiki: 3,084) and the 2024 EU elections (eswiki: 10,590; frwiki: 32,979; itwiki: 20,288; rowiki: 1,700). Notebooks have also been prepared to test hypotheses H2 and H3, which will be discussed next week with the technical research partner of the DEM-Debate project, responsible for the analysis of Wikipedia data on the 2024 EU elections.
Feb 21 2025, 1:57 PM · Research (FY2024-25-Research-January-March)
Pablo updated the task description for T383610: [Q3 FY 24-25 Applied Science] Moderation Research.
Feb 21 2025, 9:19 AM · Research (FY2024-25-Research-January-March)

Feb 14 2025

Pablo updated subscribers of T382618: Experiment: Translate research findings for policy impact.

Weekly update

Feb 14 2025, 7:10 PM · Research (FY2024-25-Research-January-March)
Pablo added a comment to T383610: [Q3 FY 24-25 Applied Science] Moderation Research.

Weekly update:

  • Moderation public report T382614: @diego created a public Meta page based on the internal report.
  • Distributed moderation (qualitative work) T383365: To align efforts between qualitative and quantitative work, a check-in meeting between @cwylo @Isaac and @Pablo has been scheduled for February 24 (next week is the quiet week for the WMF research team).
  • Distributed moderation (quantitative work) T384600: @Pablo focused on creating a notebook with a refined parsing strategy based on three steps for matching moderation signals between revisions: (1) exact HTML matching, (2) exact wikitext matching, and (3) fuzzy HTML matching. The latter is based on the Levenshtein ratio, so over 100 edge cases across wikis have been manually inspected. As a result, it is proposed for the third step that two moderation signals are the same if (a) the Levenshtein ratio is over 0.9, and (b) the wikitext template names match. @Isaac and @Pablo are currently examining the most extreme cases and getting expert feedback on them.
  • Peacock support: @diego is currently advising the ML team to create an evaluation dataset at paragraph/sentence level.
  • Client-hint consultation: No update (next week, @Pablo will review the project plan).
  • Elections analysis: No update (next (quiet) week, @Pablo will focus largely on this project).
Feb 14 2025, 5:04 PM · Research (FY2024-25-Research-January-March)
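
The three-step matching strategy described in the update above can be sketched as follows. This is not the notebook's code: the `Signal` container and function names are hypothetical, and the Levenshtein ratio is assumed here to be 1 minus the edit distance divided by the length of the longer string, which is one common normalization (libraries differ, and the update does not say which was used). The 0.9 threshold and the template-name condition are taken from the update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Hypothetical container for one moderation signal found in a revision."""
    template_name: str  # wikitext template name
    wikitext: str       # raw wikitext of the template call
    html: str           # rendered HTML of the signal


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def same_signal(old: Signal, new: Signal, threshold: float = 0.9) -> bool:
    """Apply the three steps in order, stopping at the first that matches."""
    if old.html == new.html:          # (1) exact HTML matching
        return True
    if old.wikitext == new.wikitext:  # (2) exact wikitext matching
        return True
    # (3) fuzzy HTML matching: ratio over the threshold AND same template name
    return (levenshtein_ratio(old.html, new.html) > threshold
            and old.template_name == new.template_name)
```

Checking the cheap exact comparisons first means the quadratic-time distance computation only runs for pairs that the first two steps could not resolve.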

Feb 7 2025

Pablo added a comment to T383610: [Q3 FY 24-25 Applied Science] Moderation Research.

Weekly update:

Feb 7 2025, 5:32 PM · Research (FY2024-25-Research-January-March)
Pablo added a comment to T382618: Experiment: Translate research findings for policy impact.

Weekly update

Feb 7 2025, 5:28 PM · Research (FY2024-25-Research-January-March)

Jan 31 2025

Pablo added a comment to T382618: Experiment: Translate research findings for policy impact.

Weekly update

  • Learnings from this week’s calls
    • There is strong interest in having research findings from surveys presented at the Youth Conference 2025.
    • Many affiliates mainly collaborate with universities and GLAM institutions, but fewer engage with other institutions for public policy. It was suggested to contact WMF colleagues working in this area to identify affiliates who are active in it.
    • We may need to map key information sources for affiliates (e.g., Diff, the Wikimedia Foundation Bulletin, etc.). In line with this, we have been invited to give an online talk at WikiHerramientas to share our work with Wikimedians from LATAM.
    • AI, content moderation, disinformation, and child protection are key areas of interest for European affiliates involved in public policy.
    • The opportunities identified with the DEM-Debate project have been confirmed.
  • A call with a user group has been scheduled for next week.
Jan 31 2025, 6:17 PM · Research (FY2024-25-Research-January-March)
Pablo added a comment to T384616: Stats for election-related articles and edits (EU / India).

After multiple checks with Wikidata queries, I found even better results by simply leveraging the article topic prediction model to identify which wikilinks in https://en.wikipedia.org/wiki/List_of_members_of_the_European_Parliament_(2019%E2%80%932024) and https://en.wikipedia.org/wiki/List_of_members_of_the_17th_Lok_Sabha correspond to biographies.

Jan 31 2025, 6:00 PM · Research
Pablo updated subscribers of T383610: [Q3 FY 24-25 Applied Science] Moderation Research.

Weekly update:

  • I focused on getting familiar with the dataset of templates or infoboxes added/deleted on revisions from October 2024. @Isaac and I have scheduled a co-working session to discuss improvements and potential directions.
  • @Isaac met with Editing and ML Platform to discuss the scoping of the Peacock-detection Edit Check. They are open to adjusting but would ideally like an edit check that can operate at the sentence or paragraph level for any newcomer edit that adds new content and can detail which words are causing the issue. This is sensible, but it implies a different data distribution (new edits vs. existing articles) and granularity (sentences/paragraphs vs. articles) from the current model, and it also means that we need a more robust evaluation of the SHAP-based approach prototyped by @Aitolkyn. @Isaac suggested that valuable next steps for Editing/ML to bring more clarity would be hosting the current model, hand-labeling some edits where the check would be triggered, and generating some ground truth for which words are problematic when the edit should fail the peacock detection. This would provide a better understanding of whether the current model is still a good fit or needs to be re-worked with new data or a new approach.
  • With the feedback of the Disinformation team and a Romanian wikimedian, I completed the final lists of English articles related to the 2024 elections in India and the European Union. I also identified the existing versions in other Wikipedia language editions of interest.
Jan 31 2025, 5:56 PM · Research (FY2024-25-Research-January-March)

Jan 29 2025

Pablo added a comment to T276857: Surface Reference Reliability signal within VE.

@ppelberg, since I saw that you added the language-agnostic reference risk model card, please find the datasets with the risk scores for each domain in each wiki at https://analytics.wikimedia.org/published/wmf-ml-models/reference-quality/reference-risk

Jan 29 2025, 9:02 AM · Editing-team, Community-Wishlist-Survey-2023, EditCheck, VisualEditor

Jan 27 2025

Pablo added a comment to T384616: Stats for election-related articles and edits (EU / India).

@Strainu, thank you for your feedback! On Friday, I performed a quick precision check and plan to conduct a recall check this week, and your observation about entries missing the start time property is really helpful. The issue with replacements is certainly a limitation of this approach, but conducting a more comprehensive analysis to identify all edge cases would unfortunately be too costly.

Jan 27 2025, 4:02 PM · Research

Jan 24 2025

Pablo added a comment to T383610: [Q3 FY 24-25 Applied Science] Moderation Research.

Weekly update:

Jan 24 2025, 4:57 PM · Research (FY2024-25-Research-January-March)
Pablo added a comment to T382618: Experiment: Translate research findings for policy impact.

Weekly update:

Jan 24 2025, 4:55 PM · Research (FY2024-25-Research-January-March)
Pablo updated subscribers of T384616: Stats for election-related articles and edits (EU / India).

@NForrester @Abhas I compiled these lists of English Wikipedia articles related to:

Jan 24 2025, 4:54 PM · Research

Jan 23 2025

Pablo updated the task description for T383610: [Q3 FY 24-25 Applied Science] Moderation Research.
Jan 23 2025, 3:53 PM · Research (FY2024-25-Research-January-March)
Pablo created T384616: Stats for election-related articles and edits (EU / India).
Jan 23 2025, 3:45 PM · Research
Pablo updated the task description for T383610: [Q3 FY 24-25 Applied Science] Moderation Research.
Jan 23 2025, 2:10 PM · Research (FY2024-25-Research-January-March)
Pablo created T384600: Crowdsourced content moderation metrics.
Jan 23 2025, 2:10 PM · Research (FY2024-25-Research-April-June), Essential-Work
Pablo updated the task description for T383610: [Q3 FY 24-25 Applied Science] Moderation Research.
Jan 23 2025, 12:10 PM · Research (FY2024-25-Research-January-March)
Pablo closed T375691: Essential work - Monitoring and Stats for election-related articles and edits as Resolved.
Jan 23 2025, 11:09 AM · Research, Movement-Insights
Pablo added a comment to T375691: Essential work - Monitoring and Stats for election-related articles and edits.

Resolving this ticket, as all the results are included in the report.

Jan 23 2025, 11:08 AM · Research, Movement-Insights

Jan 17 2025

Pablo updated subscribers of T383610: [Q3 FY 24-25 Applied Science] Moderation Research.

Weekly update:

Jan 17 2025, 5:15 PM · Research (FY2024-25-Research-January-March)