Generate dump of scored-revisions from 2018-2020 for Wikis except English Wikipedia
Open, Medium, Public

Description

I am Leijie Wang, an undergraduate at Tsinghua University. I am currently collaborating with Haiyi Zhu and Steven Wu, two professors at Carnegie Mellon University, to understand how ORES flagging systems in Recent Changes might influence the decision-making of patrollers and how that effect differs across wikis.

Because ORES models and filter thresholds change continuously, I need historical ORES scores for revisions, preferably from 2019-2020. My research builds on the project conducted by Nathan Teblunthuis (University of Washington) and Aaron Halfaker, who mentioned that a table of this information is maintained.
A similar task exists for English Wikipedia (https://phabricator.wikimedia.org/T277609).

Therefore, I would like to ask whether it is possible to generate a dump of that table for different wikis covering 2019 to 2020.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Tagging @JAllemandou to find out how much work this implies and whether it is feasible :)

Would it be possible to get all ORES scores ever as a one-off job and then publish the dump online somewhere?

fdans moved this task from Incoming to Datasets on the Analytics board.
fdans added a project: Analytics-Kanban.
Milimetric triaged this task as Medium priority. May 10 2021, 4:25 PM

Hi - I am trying to make this happen.
The data for the wikidata project is very large (many edits, plus the itemquality model in addition to the other ones). Do you need it, or can I leave this project out of the export? The export would then cover all models for all edits of all projects except enwiki and wikidatawiki.
Thanks

If you need help please ping me!