User Details
- User Since
- Feb 2 2021, 1:32 PM (253 w, 3 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Pablo (WMF) [ Global Accounts ]
Thu, Nov 27
@Gehel: thank you for your understanding and support!
Mon, Nov 24
Thanks again, @OKarakaya-WMF!
Thank you, @OKarakaya-WMF!
Nov 6 2025
Thank you, @Gehel! Is it possible to copy the data from home/HDFS leftovers of Aitolkyn to my home folder(s)?
Oct 29 2025
Oct 24 2025
Jul 30 2025
That works. Thank you, @thcipriani!
Jul 22 2025
Thank you both!
Jul 16 2025
Jul 11 2025
weekly update
Jun 30 2025
Progress update on the hypothesis for the week
The report has been updated to incorporate feedback from various stakeholders. The hypothesis said:
If we develop heuristics-based data pipelines for measuring patrolling activity on Wikipedia, we can prototype a model to detect moderation gaps at scale.
To test this, data pipelines were developed to generate the dataset, which has been made accessible via a Superset dashboard, as specified here. Analysis with the dashboard has revealed moderation gaps, which have been shared with stakeholders and documented in the revised report. The dataset is expected to be used at T396493 to provide the retention rate of patrollers who use FlaggedRevs and of reverting editors, as a comparable moderator retention rate metric to inform targets. Furthermore, additional opportunities to leverage this data have been identified at T398071. As a consequence, the hypothesis is supported.
Jun 27 2025
weekly update
- The work on Crowdsourced Content Moderation in Wikipedia: A Preliminary Look at Article Maintenance Templates across Language Editions was presented at the workshop.
- A report summarizing interactions and insights from ICWSM will be prepared next week.
Jun 20 2025
- Moderator Motivations: Synced with Daisy about upcoming motivations-related projects and cross-team collaboration. Sharing motivations work at Research meeting next Tuesday.
- Metrics on Patrolling Work: This week’s work focused on two main areas. First, significant effort was dedicated to reviewing and refining both the dataset and the dashboard. This included adding a new table with metrics suggested by the Moderator Tools team. Second, a preliminary version of the report was completed, documenting the dataset, the dashboard, and key findings. The report has been shared with the stakeholders for feedback.
Progress update on the hypothesis for the week
This week’s work focused on two main areas. First, significant effort was dedicated to reviewing and refining both the dataset and the dashboard. This included adding a new table with metrics suggested by the Moderator Tools team. Second, a preliminary version of the report was completed, documenting the dataset, the dashboard, and key findings. The report has been shared with the stakeholders for feedback.
weekly update
Jun 13 2025
Progress update on the hypothesis for the week
This week's efforts primarily focused on interactions with the Moderator Tools and Product Analytics teams. During a joint meeting, I presented the current status of the dataset and dashboard, and gathered feedback (notes). Following the session, both teams engaged further to request additional details, particularly in relation to defining baselines for the projected increase in moderation actions for FY 25/26 WE 1.3 KR (T396493). In parallel, while awaiting input from stakeholders regarding specific product interventions to be tracked using this data, I began drafting a report for this project to document the dataset and a summary of findings.
weekly update
Jun 6 2025
Progress update on the hypothesis for the week
This week’s work focused on building a Superset dashboard to deliver the dataset, which also included modifications to the existing notebooks, e.g., moving the parsing of edit summaries (to extract links to the project namespace) into the data collection phase. The dashboard will be reviewed next week in meetings with colleagues from the Product Analytics and Moderator Tools teams.
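As a rough illustration of the edit-summary parsing mentioned above, the following sketch extracts links to the project namespace from a summary string. It is not the notebook's code: the function name, the regular expression, and the list of namespace prefixes (`Wikipedia`, `WP`, `Project`) are assumptions, and real prefixes and aliases differ per wiki.

```python
import re

# Assumed prefixes for the project namespace; actual names and aliases vary by wiki.
PROJECT_PREFIXES = ("Wikipedia", "WP", "Project")

# Matches [[Prefix:Title]] or [[Prefix:Title|label]] and captures the title part,
# stopping at a closing bracket, a pipe, or a section anchor.
_LINK_RE = re.compile(
    r"\[\[\s*(?:%s)\s*:\s*([^\]\|#]+)" % "|".join(PROJECT_PREFIXES),
    re.IGNORECASE,
)


def extract_project_links(edit_summary):
    """Return the project-namespace page titles linked in an edit summary."""
    return [title.strip() for title in _LINK_RE.findall(edit_summary or "")]


summary = "Reverted per [[WP:NPOV]] and [[Wikipedia:Verifiability|V]]; see [[Talk:Foo]]"
print(extract_project_links(summary))
```

Running the extraction once during data collection, rather than in each analysis notebook, means the dashboard can read the links from a precomputed column.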
weekly update
- The COMPASS workshop program is finally online at https://sites.google.com/view/workshop-compass25/program.
- Each submission will be allocated 25 minutes total (15–20 minutes for presentation, 5–10 minutes for Q&A).
- I will start working on the slides next week.
May 30 2025
Progress update on the hypothesis for the week
This week has been relatively light, as I have been OoO for two days. That said, I updated the data collection notebook to incorporate revision tags and, for reverted revisions, the comment of the reverting revision. Progress with the data has been reviewed with @Isaac during our 1:1 meeting. In addition, two other meetings have been scheduled with stakeholders, as the dataset is expected to be used in a joint effort by the Product Analytics and Moderator Tools teams to measure current moderator activity and inform a Key Result target for WE 1.3 FY 25/26.
May 23 2025
Progress update on the hypothesis for the week
Continued development on the data collection notebook. For the October 2024 snapshot, metadata now includes information on the addition and removal of article maintenance templates. For instance, English Wikipedia revision 1248733993 reflects this update with a new column indicating template changes as mbox:more footnotes-add | mbox:no footnotes-remove.
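A minimal sketch of how a value in that column could be split back into its parts. The format is inferred only from the single example above (entries separated by `|`, each shaped as `type:name-change`), so the separators and the function name are assumptions rather than a documented schema.

```python
def parse_template_changes(value):
    """Split a template-changes cell into (template_type, template_name, change) tuples."""
    changes = []
    for item in value.split("|"):
        item = item.strip()
        if not item:
            continue
        # The type is everything before the first colon.
        template_type, _, rest = item.partition(":")
        # The change is everything after the last hyphen, so hyphens
        # inside a template name are preserved.
        template_name, _, change = rest.rpartition("-")
        changes.append((template_type, template_name, change))
    return changes


print(parse_template_changes("mbox:more footnotes-add | mbox:no footnotes-remove"))
```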
weekly update:
May 16 2025
Progress update on the hypothesis for the week
Advanced the notebook to create the dataset of edits in March 2025. The dataset has been expanded to include detailed information on reverting editors and predicted revert risk scores, which makes it possible to answer questions about reverting activity (see updates on metrics below). Furthermore, the notebook has been re-run to generate an analogous dataset for edits in October 2024, for which metadata on article maintenance templates will be available (see task T384600).
weekly update:
May 13 2025
Progress update on the hypothesis for the week (last week)
Started working on a notebook to create the dataset of edits in March 2025 with metadata, including mediawiki_history fields and patrolling information and status (prevented, delete, reverted, reviewed, edited_over, autopatrolled).
May 9 2025
weekly update:
Hi, I confirm that I was running a notebook yesterday, which unfortunately crashed due to memory limitations. Apologies for any inconvenience this may have caused!
May 6 2025
Apr 30 2025
no weekly update (on hold until the schedules of the research track sessions are finalized)
Apr 25 2025
weekly update:
- Topical sessions have been organized into three slots based on the approximate schedules shared by @leila and the estimated location of the first author of each accepted contribution (note: no perfect distribution is feasible).
- Tentative candidates for session chairs have also been identified.
Apr 17 2025
As reported at T378485#10751230, this task is resolved.
weekly update:
Apr 11 2025
weekly update:
Apr 4 2025
weekly update:
Apr 2 2025
@leila, please consider reassigning this task to me
Mar 28 2025
Weekly update:
Weekly Update
Mar 27 2025
@Isaac please review for sign-off
Mar 24 2025
@Scann: Your organization of these two events was incredibly helpful, and I appreciate you sharing this information as well!
Mar 21 2025
Weekly update:
Weekly update
Mar 19 2025
A year ago, I adapted the notebook we used for the ICWSM paper for this task: https://gitlab.wikimedia.org/paragon/miscellanea/-/blob/main/notebooks/recent-revisions.ipynb
I have added notes to provide context and highlighted hardcoded values that need to be modified or (ideally) parameterized.
Mar 14 2025
Weekly update:
With this notebook, I have created CSV files of template stats for each wiki at: https://gitlab.wikimedia.org/repos/research/who-are-moderators/-/tree/main/data/templates. The files include the following fields:
- wiki_db: Database name of the wiki (in this notebook: arzwiki, dewiki, enwiki, eswiki, frwiki, itwiki, jawiki, nlwiki, plwiki, ruwiki, svwiki, zhwiki).
- snapshot: Timestamp of the dataset (in this notebook: 2024-10).
- template_name: Name of the template.
- template_type: Type of template (mbox, inline).
- template_change: Type of change (add, remove).
- template_count: Number of times the template was added/removed.
- revision_count: Number of revisions in which the template was added/removed.
- page_count: Number of pages where the template was added/removed.
- page_namespace_0_count: Number of pages in namespace 0 where the template was added/removed.
- page_namespace_2_count: Number of pages in namespace 2 where the template was added/removed.
- page_namespace_102_count: Number of pages in namespace 102 where the template was added/removed.
- page_namespace_118_count: Number of pages in namespace 118 where the template was added/removed.
- page_namespace_other_count: Number of pages in other namespaces where the template was added/removed.
- editor_count: Number of editors who added/removed the template.
- editor_bot_count: Number of bot users who added/removed the template.
- editor_bot_perc: Percentage of bot users who added/removed the template.
- editor_sysop_count: Number of sysop users who added/removed the template.
- editor_sysop_perc: Percentage of sysop users who added/removed the template.
- editor_editor_count: Number of editors belonging to the editor user group who added/removed the template.
- editor_editor_perc: Percentage of editors belonging to the editor user group who added/removed the template.
- editor_patroller_count: Number of patroller users who added/removed the template.
- editor_patroller_perc: Percentage of patroller users who added/removed the template.
- editor_with_rights_count: Number of editors belonging to any user group who added/removed the template.
- editor_with_rights_perc: Percentage of editors belonging to any user group who added/removed the template.
- editor_without_rights_count: Number of editors not belonging to any user group who added/removed the template.
- editor_without_rights_perc: Percentage of editors not belonging to any user group who added/removed the template.
- editor_1_9_count: Number of editors with an edit count between 1 and 9 who added/removed the template.
- editor_1_9_perc: Percentage of editors with an edit count between 1 and 9 who added/removed the template.
- editor_10_99_count: Number of editors with an edit count between 10 and 99 who added/removed the template.
- editor_10_99_perc: Percentage of editors with an edit count between 10 and 99 who added/removed the template.
- editor_100_999_count: Number of editors with an edit count between 100 and 999 who added/removed the template.
- editor_100_999_perc: Percentage of editors with an edit count between 100 and 999 who added/removed the template.
- editor_1000_9999_count: Number of editors with an edit count between 1,000 and 9,999 who added/removed the template.
- editor_1000_9999_perc: Percentage of editors with an edit count between 1,000 and 9,999 who added/removed the template.
- editor_10000_inf_count: Number of editors with an edit count of 10,000 or more who added/removed the template.
- editor_10000_inf_perc: Percentage of editors with an edit count of 10,000 or more who added/removed the template.
- editor_age_mean: Mean number of years since registration for editors who added/removed the template.
- editor_age_median: Median number of years since registration for editors who added/removed the template.
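To make the edit-count fields concrete, here is a small sketch of how the `editor_*_count` and `editor_*_perc` columns could be derived from a list of per-editor edit counts. The bucket boundaries follow the field names; the function name, the assumption that the last bucket includes exactly 10,000, and the choice of denominator (all editors passed in, expressed on a 0 to 100 scale) are assumptions, not details confirmed by the notebook.

```python
# Bucket boundaries mirror the field names; None marks the open-ended last bucket.
BUCKETS = [
    ("1_9", 1, 9),
    ("10_99", 10, 99),
    ("100_999", 100, 999),
    ("1000_9999", 1000, 9999),
    ("10000_inf", 10000, None),
]


def edit_count_bucket_stats(edit_counts):
    """Return editor_<bucket>_count and editor_<bucket>_perc for a list of edit counts."""
    total = len(edit_counts)
    stats = {}
    for label, low, high in BUCKETS:
        n = sum(1 for c in edit_counts if c >= low and (high is None or c <= high))
        stats[f"editor_{label}_count"] = n
        # Percentage over all editors passed in (assumed denominator).
        stats[f"editor_{label}_perc"] = 100.0 * n / total if total else 0.0
    return stats


stats = edit_count_bucket_stats([3, 50, 50, 12000])
print(stats["editor_10_99_count"], stats["editor_10_99_perc"])
```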
Weekly update
Mar 11 2025
Mar 7 2025
Weekly update:
Weekly update
The GitLab repository originally created for T377324 has been updated with:
- Code to generate the dataset of moderation actions in October 2024 on arzwiki, dewiki, enwiki, eswiki, frwiki, itwiki, jawiki, nlwiki, plwiki, ruwiki, svwiki, zhwiki (the notebook expands on @Isaac's original approach).
- Resulting dataset. It is a sample of one month only, but an expanded version could be utilized for understanding the use of maintenance templates to develop models to support editors (ping @MGerlach).
- Code with the following metrics:
  - Number of templates added/removed
  - Most common templates added/removed
For outreach, these findings have been shared with colleagues and posted on Meta. I will resolve the ticket, as no further work is expected.
Feb 28 2025
Thanks for your questions!
- ~20% of low-quality edits on English Wikipedia. This is definitely something that would be worth inspecting as there are many factors that could be influencing this finding. For example, in the US election analysis, I reviewed some low-quality edits that were not reverted and found that some were followed by edits that partially or completely removed their content. Are you imagining something that could be used for model re-training?
- Total # of reverted edits. Yes, I actually modified the original formulation of the hypothesis because the rationale provided was: "As people are incentivised to promote or obstruct a candidate, more newcomers are making politically biased edits (or edits that violate NPOV), and thus more newcomer edits are reverted". To ensure that peaks in reverting activity are not merely a reflection of overall editing volume but instead highlight the revertability of edits (likely due to increased malicious behavior), I decided to use the revert rate rather than the absolute count of reverted edits (see figures below for the latter). However, your question indicates that further clarification is needed and suggests considering showing both metrics.
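The difference between the two metrics can be sketched as follows. The input shape (pairs of day and a reverted flag) and the function name are hypothetical and chosen only for illustration; the actual analysis may aggregate differently.

```python
def revert_stats(edits):
    """Aggregate (day, was_reverted) pairs into per-day counts and revert rates."""
    per_day = {}
    for day, was_reverted in edits:
        total, reverted = per_day.get(day, (0, 0))
        per_day[day] = (total + 1, reverted + int(was_reverted))
    return {
        day: {
            "edits": total,
            "reverted": reverted,
            "revert_rate": reverted / total,
        }
        for day, (total, reverted) in per_day.items()
    }


# Day "d2" has more reverted edits than "d1" in absolute terms,
# but a lower revert rate because overall editing volume is higher.
edits = [("d1", True)] + [("d1", False)] * 3
edits += [("d2", True)] * 2 + [("d2", False)] * 18
for day, row in revert_stats(edits).items():
    print(day, row)
```

Reporting both columns side by side makes it visible when a peak in reverted edits is explained by volume alone.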
Weekly update:
- Take home messages
Weekly update
Feb 21 2025
Weekly update
- I compiled all my notes and the current progress of this project into a structured working document.
- Two new interventions were implemented:
  - I contacted the President of the board of a national chapter that is very active in supporting admins to let them know about the report on Wikipedia Administrator Recruitment, Retention, and Attrition and the upcoming research showcase on this project.
  - I contacted the Director of a national chapter interested in Wikipedia article quality to let them know about our modeling approaches and datasets.
- A call has been scheduled for next week with the technical partner of the DEM-Debate project, coordinated by Wikimedia Europe, which has a clear overlap with T384616.
Weekly update:
- Moderation public report T382614: Completed.
- Distributed moderation (T383365/T384600): No updates this week.
- Peacock support: @diego provided hands-on support in evaluating the existing peacock detection model (T386645#10561213). From now on, he will take on an advisory role.
- Client-hint consultation: @Pablo reviewed the WE 4.2.10 Project Planning to provide feedback on (1) the identification of sockpuppets and potential ban evasion, and (2) the process to create hash representations.
- Elections analysis: @Pablo built the dataset with revert risk predictions of revisions about the 2024 Indian elections (enwiki: 148,650; hiwiki: 9,911; mrwiki: 3,084) and the 2024 EU elections (eswiki: 10,590; frwiki: 32,979; itwiki: 20,288; rowiki: 1,700). Notebooks have also been prepared to test hypotheses H2 and H3, which will be discussed next week with the technical research partner of the DEM-Debate project, responsible for the analysis of Wikipedia data on the 2024 EU elections.
Feb 14 2025
Weekly update
Weekly update:
- Moderation public report T382614: @diego created a public Meta page based on the internal report.
- Distributed moderation (qualitative work) T383365: To align efforts between qualitative and quantitative work, a check-in meeting between @cwylo @Isaac and @Pablo has been scheduled for February 24 (next week is the quiet week for the WMF research team).
- Distributed moderation (quantitative work) T384600: @Pablo focused on creating a notebook with a refined parsing strategy based on three steps for matching moderation signals between revisions: (1) exact HTML matching, (2) exact wikitext matching, and (3) fuzzy HTML matching. The latter is based on the Levenshtein ratio, so over 100 edge cases across wikis have been manually inspected. As a result, it is proposed for the third step that two moderation signals are considered the same if (a) the Levenshtein ratio is over 0.9, and (b) the wikitext template names match. @Isaac and @Pablo are currently examining the most extreme cases and getting expert feedback on them.
- Peacock support: @diego is currently advising the ML team to create an evaluation dataset at paragraph/sentence level.
- Client-hint consultation: No update (next week, @Pablo will review the project plan).
- Elections analysis: No update (next (quiet) week, @Pablo will focus largely on this project).
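The three-step matching described in the quantitative-work item above can be sketched roughly as below. This is an illustration, not the notebook's implementation: the dictionary keys (`html`, `wikitext`, `template_name`) and function names are hypothetical, and the ratio is computed as one minus the edit distance divided by the longer string's length, which is one common normalization and may differ from the one used in the notebook.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a, b):
    """Similarity in [0, 1]; assumed normalization by the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_signals(old, new, threshold=0.9):
    """Return the name of the step that matched two signals, or None."""
    if old["html"] == new["html"]:
        return "exact_html"
    if old["wikitext"] == new["wikitext"]:
        return "exact_wikitext"
    # Step 3: fuzzy HTML match, guarded by identical template names.
    if (levenshtein_ratio(old["html"], new["html"]) > threshold
            and old["template_name"] == new["template_name"]):
        return "fuzzy_html"
    return None


old = {"html": "<b>Citation needed since 2020</b>",
       "wikitext": "{{cn|date=2020}}", "template_name": "cn"}
new = {"html": "<b>Citation needed since 2021</b>",
       "wikitext": "{{cn|date=2021}}", "template_name": "cn"}
print(match_signals(old, new))
```

Ordering the checks from cheapest and strictest to most permissive means the quadratic-time distance is only computed for pairs that fail both exact comparisons.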
Feb 7 2025
Weekly update:
Weekly update
Jan 31 2025
Weekly update
- Learnings from this week’s calls
- There is strong interest in having research findings from surveys presented at the Youth Conference 2025.
- Many affiliates mainly collaborate with universities and GLAM institutions, but fewer engage with other institutions for public policy. It was suggested to contact WMF colleagues working in this area to identify affiliates who are active in it.
- We may need to map key information sources for affiliates (e.g., Diff, the Wikimedia Foundation Bulletin, etc.). In line with this, we have been invited to give an online talk at WikiHerramientas to share our work with Wikimedians from LATAM.
- AI, content moderation, disinformation, and child protection are key areas of interest for European affiliates involved in public policy.
- The opportunities identified with the DEM-Debate project have been confirmed.
- A call with a user group has been scheduled for next week.
After multiple checks with Wikidata queries, I found even better results by simply leveraging the article topic prediction model to identify which wikilinks in https://en.wikipedia.org/wiki/List_of_members_of_the_European_Parliament_(2019%E2%80%932024) and https://en.wikipedia.org/wiki/List_of_members_of_the_17th_Lok_Sabha correspond to biographies.
Weekly update:
- I focused on getting familiar with the dataset of templates or infoboxes added/deleted on revisions from October 2024. @Isaac and I have scheduled a co-working session to discuss improvements and potential directions.
- @Isaac met with Editing and ML Platform to discuss the scoping of the Peacock-detection Edit Check. They are open to adjusting but would ideally like an edit check that can operate at the sentence or paragraph level for any newcomer edit that adds new content, and that can detail which words are causing the issue. This is sensible, but it involves a different data distribution (new edits vs. existing articles) and granularity (sentences/paragraphs vs. articles) from the current model, and it also means that we need a more robust evaluation of the SHAP-based approach prototyped by @Aitolkyn. @Isaac suggested that valuable next steps for Editing/ML to bring more clarity would be hosting the current model, hand-labeling some edits where the check would be triggered, and generating some ground truth for which words are problematic when the edit should fail the peacock detection. This would provide a better understanding of whether the current model is still a good fit or needs to be reworked with new data or a new approach.
- With the feedback of the Disinformation team and a Romanian wikimedian, I completed the final lists of English articles related to the 2024 elections in India and the European Union. I also identified the existing versions in other Wikipedia language editions of interest.
Jan 29 2025
@ppelberg as I saw that you have added the language-agnostic reference risk model card, please find the datasets with the risk scores for each domain in each wiki at https://analytics.wikimedia.org/published/wmf-ml-models/reference-quality/reference-risk
Jan 27 2025
@Strainu, thank you for your feedback! On Friday, I performed a quick precision check and plan to conduct a recall check this week, and your observation about entries missing the start time property is really helpful. The issue with replacements is certainly a limitation of this approach, but conducting a more comprehensive analysis to identify all edge cases would unfortunately be too costly.
Jan 24 2025
Weekly update:
Weekly update
@NForrester @Abhas I compiled these lists of English Wikipedia articles related to:
- 2024 India election: first tab of this spreadsheet (most come from https://en.wikipedia.org/wiki/Template:2024_Indian_general_election).
- 2024 EU election: first tab of this spreadsheet (most come from https://en.wikipedia.org/wiki/Category:2024_European_Parliament_election and a Wikidata query to retrieve all MEPs in 2019-2024).
Jan 23 2025
Resolving this ticket, as all the results are included in the report.
Jan 17 2025
Weekly update: