User Details
- User Since
- Aug 8 2017, 10:56 AM (444 w, 4 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Diego (WMF) [ Global Accounts ]
Fri, Feb 13
The model has been released and is currently running on LiftWing.
The paper was accepted at TheWebConf'26. We are resolving some small formatting issues with the publication. I'm going to link the paper when it is on a public URL and then close this task.
@Gehel I have reviewed the data and saved what we need to keep. You can delete the leftovers whenever necessary. Thanks.
Progress:
- This project was started this week
- We are researching the potential to reuse intermediate outputs from the Language-agnostic Link-based Article Topic Model. By representing article topics as vectors, we can perform similarity measurements while building on a robust, proven pipeline, ultimately saving a significant amount of computational resources.
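A minimal sketch of the similarity idea, assuming we already have per-article topic probability vectors from the link-based topic model (the vectors and dimensionality below are illustrative):

```
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two topic-probability vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical topic vectors for two articles, one probability per topic label.
article_a = np.array([0.70, 0.10, 0.15, 0.05])
article_b = np.array([0.65, 0.05, 0.20, 0.10])

print(cosine_similarity(article_a, article_b))  # close to 1.0 -> similar topic profiles
```

Reusing the intermediate vectors means the expensive part (building the topic representation) is computed once by the existing pipeline, and similarity becomes a cheap vector operation on top.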
Tue, Feb 10
- The Research Team did some exploration on aligning Wikipedia and Wikidata content in 2020.
- As part of that work we experimented with taking Wikipedia sentences and transforming them into Wikidata claims. If I understand correctly, the task described here is the opposite.
- One of the main challenges there was performing NER correctly and establishing the relationships. With current technologies (probably SLMs) this might be easier.
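To illustrate just the NER step (not the 2020 pipeline itself), a minimal sketch with spaCy; the model name and example sentence are assumptions:

```
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Douglas Adams was born in Cambridge in 1952.")

# The entity spans are the anchors that would need to be linked to
# Wikidata items before a (subject, property, value) claim can be built.
for ent in doc.ents:
    print(ent.text, ent.label_)
# e.g. "Douglas Adams" PERSON, "Cambridge" GPE, "1952" DATE
```

Entity linking and relation extraction remain the harder parts; as noted above, current models likely make them easier than they were in 2020.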
Wed, Feb 4
To the best of my knowledge, all the relevant data/code from Muniza's work has a backup.
Oct 8 2025
The work is done. Check the links in the task description.
Oct 1 2025
Sep 4 2025
Thanks @MGerlach, I agree that it is the main source of pageviews readable from Spark. I'm not aware of anything aggregated into larger buckets.
Aug 29 2025
Weekly Report
Aug 23 2025
Weekly report
Aug 16 2025
Retrain recommendation:
- Main conclusion: Models should be retrained at least once a year. After one year, models lose 1% precision (details in T399726#11090332)
Aug 15 2025
Weekly report
- Recommendation 1
Aug 10 2025
There is buggy behavior:
Aug 8 2025
Weekly Update
Aug 4 2025
Weekly Update
Jul 25 2025
Weekly Update
- Tone Check edit check
- Discussing how to incorporate user feedback
- Designing the structured task
- Improving the Model card
Jul 24 2025
We have developed a method to analyze languages without enough evaluation data. A detailed explanation can be found in this Jupyter Notebook.
In summary:
Jul 16 2025
This solution keeps failing. The workaround I found was to run these experiments in chunks of two years (2013 to 2015, 2015 to 2017, ...) and then join the results. This is not optimal because it requires manually creating each chunk, but at least it solves the problem.
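A sketch of that chunking, with the two-year windows from the comment above; run_experiment is a placeholder for the real (otherwise failing) job:

```
import pandas as pd

def two_year_chunks(start_year: int, end_year: int):
    """Yield (start, end) year pairs: (2013, 2015), (2015, 2017), ..."""
    for y in range(start_year, end_year, 2):
        yield y, min(y + 2, end_year)

def run_experiment(start: int, end: int) -> pd.DataFrame:
    # Placeholder for the real job; here it just records the window it covered.
    return pd.DataFrame({"chunk_start": [start], "chunk_end": [end]})

results = [run_experiment(s, e) for s, e in two_year_chunks(2013, 2023)]
combined = pd.concat(results, ignore_index=True)  # join the per-chunk results
print(combined)
```

Generating the windows programmatically at least removes the manual step of defining each chunk, even if each run still has to be launched separately.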
Jul 9 2025
Is the 12-year period intentional?
Jul 8 2025
The large experiment failed. I'll need engineering help to understand and fix this error. I'm going to coordinate with @Miriam and @fkaelin to decide how to proceed.
Jul 2 2025
- @MunizaA spent the last week debugging the system, and it is now ready to use.
- The system was deployed and can be found in the Research Airflow instance.
- I'm currently running a large experiment (12 different training datasets) to study the effect of data "freshness" on the Revert Risk models.
Jun 27 2025
Weekly update:
Jun 26 2025
I would like to ask for information to be added to the Publications (2025) and Knowledge Integrity (Publications) pages.
Jun 23 2025
Weekly Update:
Jun 13 2025
Jun 6 2025
Weekly update:
May 30 2025
Weekly Update:
May 23 2025
Weekly update:
May 16 2025
Weekly Update:
May 14 2025
May 13 2025
May 9 2025
Weekly update:
May 5 2025
Weekly Update:
Apr 27 2025
Weekly update:
Apr 21 2025
Given that we are considering a system we can reuse later, I think we should define these as parameters (a sketch follows the list):
- Label balance: Balanced, Real (random), or a desired balance (e.g. 0.80 False)
- Max data: undefined or fixed
- Date: start and end date.
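A minimal sketch of what such a parameter object could look like; the names and defaults are illustrative, not an agreed interface:

```
from dataclasses import dataclass
from datetime import date

# Hypothetical configuration for the reusable sampling system.
@dataclass
class SamplingConfig:
    # "balanced", "real" (random), or a desired fraction of the
    # negative class, e.g. 0.80 False.
    label_balance: str | float = "balanced"
    # Maximum number of rows, or None for no cap.
    max_rows: int | None = None
    # Date window for the sampled data.
    start_date: date = date(2024, 1, 1)
    end_date: date = date(2025, 1, 1)

config = SamplingConfig(label_balance=0.80, max_rows=1_000_000)
```

Keeping these as explicit parameters makes each dataset build reproducible and easy to compare across experiments.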
Apr 18 2025
Weekly update:
Apr 1 2025
Another Q about revertrisk. Are visibility settings relevant to possible revert risk? E.g. if the comment (or content!) of a revision is not visible, does its revert risk potentially change? Even if it isn't input to the current revertrisk model, could revision visibility settings be a potential signal for a new or different model?
I'm not sure if I'm understanding the question. With the same inputs the results will always be the same. Visibility is not a feature of this model (there is no "is_visible" column in the feature set). Now, if lack of visibility blocks feature extraction, then we have a problem. In LiftWing the features are extracted through the MediaWiki API, so if this is blocked, the model won't be able to run. But if we feed the model with data coming from other sources (i.e. stream data), and some visibility configuration is later changed on the revision, that, by model design, shouldn't change the revert risk score.
If so, then that would mean that for any given revision, the score might not be totally deterministic based on the revision content alone?
In theory yes; in practice there could be some changes depending on when you call the API.
Revert Risk models use user-related features. In the model design we assumed those were the features of the user at the time the evaluated revision was made. In practice, when you call the LiftWing API, the user features are collected from current information, meaning that if the user's status has changed (e.g. new user groups, or more edits made), the final score can change.
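For context, this is roughly how the model is queried on LiftWing through the public API gateway; the revision ID below is made up and the exact response fields may differ:

```
import requests

# Language-agnostic Revert Risk endpoint on LiftWing (via the API gateway).
URL = ("https://api.wikimedia.org/service/lw/inference/v1/"
       "models/revertrisk-language-agnostic:predict")

# rev_id is illustrative. Because user features are fetched at call time
# via the MediaWiki API, calling this twice far apart in time can yield
# slightly different scores for the same revision.
response = requests.post(URL, json={"rev_id": 12345, "lang": "en"})
response.raise_for_status()
print(response.json())
```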
Mar 24 2025
Mar 11 2025
Mar 4 2025
I'm confused; I think in T374440 they are working just with dumps, nothing like EventStreams.
Feb 19 2025
Notice that the template above is rendered in articles as "peacock prose".
Feb 18 2025
Looking for the template peacock inline (and its redirects) in the mediawiki_wikitext_history table, in enwiki I found ~108K article matches:
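A sketch of the kind of query this involves, assuming a Spark session on the analytics cluster and the usual dump-derived columns of wmf.mediawiki_wikitext_history; the snapshot value and the regex are illustrative:

```
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

matches = (
    spark.table("wmf.mediawiki_wikitext_history")
    .where(F.col("snapshot") == "2025-01")   # illustrative snapshot
    .where(F.col("wiki_db") == "enwiki")
    # Match {{peacock inline ...}}; redirects would need extra patterns.
    .where(F.col("revision_text").rlike(r"(?i)\{\{\s*peacock[ _]inline"))
    .select("page_id", "page_title", "revision_id")
)
print(matches.count())
```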
Feb 11 2025
[Update] I've created a Meta page and updated it based on the internal report.
I'm going to refine and finalize this page within the next 10 days.
Jan 20 2025
Jan 8 2025
- The definition includes "creation, revision, and enforcement"; on the quantitative side of this work it seems like enforcement was the focus. I don't know if this is something you explored at all, but do you think there's a good way for us to track creation and revision of values, rules, and norms too? My initial reaction is to think about cataloguing policy and guideline pages and looking at substantive edits to those pages, but perhaps there's a better method.
I think it would be possible to design methods to get signals for these numbers; following your suggestion would be one approach. However, it is difficult to assess the relevance/impact of those edits. Probably a mix of metrics plus some (permanent) qualitative analysis would be required.
- In the moderation activities dashboard you include upload as a log type indicative of moderation activity; I wondered if you could explain the thought process on this one, because it's not immediately obvious to me that file uploads would be moderation.
@Pablo, can you please explain this?
As a preliminary result, we found that moderation-related edits range from less than 1% in some editions, such as German (0.09%) and Polish (0.53%), to nearly 10% in others, like Russian (9.6%).
- Does this include bot edits? Just curious, as I notice that a substantial % of the Russian Wikipedia moderation activity comes from bots.
Those numbers were computed on human edits only.
From the report:
eswiki % Moderation (considering revert-related): 35.71%
This seems huge! Especially compared to the values for other wikis. Do we have any insight on what this is driven by?
We noticed this, but didn't have time during this work to analyze specific cases. Given that we considered just one month of data, October 2024, the results might be affected by specific (exogenous or endogenous) events or edit wars. It is important to highlight that our goal here was to understand which actions were measurable, and to show how stable or sparse those numbers were. To get actionable insights about a specific project, it would be necessary to apply these methods to a larger dataset.
Jan 7 2025
Jan 2 2025
Dec 20 2024
Report is finished, so I'm closing this task.
Briefly describe what was accomplished over the course of the hypothesis work
- For the first time, we established a formal definition of “moderators”: We define Moderators as the human actors responsible for social, technical and governance work needed to sustain an online community, including the creation, revision and enforcement of community values, rules, and norms.
- To measure moderator activity, we drew on our prior qualitative knowledge of patrolling and admin work and conducted an extensive review of research literature and internal reports, resulting in a comprehensive list of 81 traceable moderation actions. We classified these actions based on their relevance to moderation, measurability, availability, and other dimensions.
- We assessed the feasibility of measuring each moderation activity and decided to focus on 12 key actions for this hypothesis, measuring them across 13 different Wikipedia language editions: dewiki, arzwiki, plwiki, nlwiki, itwiki, frwiki, eswiki, svwiki, zhwiki, enwiki, jawiki, and ruwiki.
- To measure the 12 key actions, we leveraged and expanded our previous work on edit classification (a.k.a. edit types) to distinguish between moderator and non-moderator edits.
- Additionally, we needed to create ad-hoc datasets of HTML article versions (T380871) to capture complex moderation activities that are difficult or impossible to detect with existing data. To achieve this within a short timeframe, we leveraged previous work by research engineers.
- Based on the work described above, we developed an initial approach to measure moderation activities, focusing on the 12 key actions and 13 language editions previously mentioned:
- As a preliminary result, we found that moderation-related edits range from less than 1% in some editions, such as German (0.09%) and Polish (0.53%), to nearly 10% in others, like Russian (9.6%).
- We also developed a prototype dashboard to track logged moderation activities, demonstrating its potential for monitoring moderation efforts within our infrastructure.
- These results are a proof of concept, demonstrating the potential of measuring and tracking moderation activities. However, they should not be considered final, due to the limited number of actions tracked and the reliance on ad-hoc data, which is not available in our infrastructure.
- More details can be found in the final report.
Dec 17 2024
Hi @Easikingarmager, I've already coordinated with Caroline. Also, I'm going to present in the next group meeting.
Dec 14 2024
Great job @Aitolkyn! Can you please share a link to the code you used to generate these visualizations?
Dec 6 2024
Thanks for the report, I'm not sure how to interpret this:
Nov 29 2024
Progress update on the hypothesis for the week
Nov 26 2024
The main challenge here is to find data to train and test this model. Currently, the data we have is at the article level. I see two possible ways to work around this problem:
We will be able to provide insights in this regard when we finalize T377159; I'll keep this task updated with those results.
Nov 22 2024
Progress update on the hypothesis for the week