Task of the Spambot Detection System to Support Stewards for FY2021-22 Q3
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Pablo | T288338 [EPIC] Spambot Detection System to Support Stewards
Resolved | | Pablo | T300391 Build the spambot detection model
Event Timeline
Weekly updates:
- Schedule call with stewards to review first findings of a spambot detection model prototype
Weekly updates:
- Conversation with a steward to share first findings of a spambot detection model prototype. It was suggested to reframe the focus from link spamming to edit quality.
- Results will be shared with other stewards and T&S colleagues for open discussion on next steps.
Weekly updates:
- Conversation with T&S staff to share first findings of a spambot detection model prototype. It was suggested to explore specific sophisticated forms of spamming.
- Conversation with Research Lab group to share first findings of a spambot detection model prototype. Several resources were provided.
- Recorded a video to be shared with stewards so that they can provide feedback during imminent leave.
- Schedule call with the Global Head of Trust and Safety to share findings and discuss next steps.
- Added README.md for continuity purposes during imminent leave.
Weekly updates:
- Conversation with Global Head of Trust and Safety to share findings. Two possible (non-exclusive) next steps are:
  - Collect and analyze a dataset of first edits of editors to quantify if "newbie external link" behaviour is also present in good-faith editors.
  - Focus on sophisticated forms of spamming.
Weekly updates:
- Conversation with Moderator Tools lead to share results and discuss next steps
- Launch script to retrieve first edits of editors registered since 2020 to then analyze their edit type (the purpose is to quantify how often first edits contain an external link)
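The measurement described above (how often an editor's first edit adds an external link) could be sketched roughly as follows. This is only an illustrative sketch, not the script used in this task: the `Edit` record, the function names, and the URL regular expression are all assumptions, and real wikitext would need more careful link parsing.

```python
import re
from typing import Dict, Iterable, NamedTuple, Set

# Hypothetical, simplified pattern for external URLs in wikitext.
URL_PATTERN = re.compile(r'https?://[^\s\]|<>"]+', re.IGNORECASE)


class Edit(NamedTuple):
    """Hypothetical record for one revision (not a real MediaWiki schema)."""
    user: str
    timestamp: str  # ISO 8601, so string order matches chronological order
    old_text: str   # page text before the edit
    new_text: str   # page text after the edit


def added_external_links(old_text: str, new_text: str) -> Set[str]:
    """Return URLs present after the edit that were absent before it."""
    return set(URL_PATTERN.findall(new_text)) - set(URL_PATTERN.findall(old_text))


def first_edits(edits: Iterable[Edit]) -> Dict[str, Edit]:
    """Keep only the earliest edit of each user, in a single pass."""
    firsts: Dict[str, Edit] = {}
    for edit in edits:
        current = firsts.get(edit.user)
        if current is None or edit.timestamp < current.timestamp:
            firsts[edit.user] = edit
    return firsts


def share_with_external_link(edits: Iterable[Edit]) -> float:
    """Fraction of users whose first edit added at least one external link."""
    firsts = first_edits(edits)
    if not firsts:
        return 0.0
    flagged = sum(
        1 for e in firsts.values() if added_external_links(e.old_text, e.new_text)
    )
    return flagged / len(firsts)
```

Because `first_edits` keeps a single record per user while streaming over the input, it never needs all revisions in memory at once, only one edit per editor.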
Weekly updates:
- Adapted existing notebook to create the dataset of first edits of editors registered since 2020.
- Review of literature on editing behaviors among new Wikipedians.
Weekly updates:
- The notebook that collects the first edits of editors failed on large wikis because of a memory issue (already reported in Research Weekly), so the remaining data from specific wikis will be collected on a stats machine.
- Draft written with the design details and results of the different machine learning models (to be adapted to MediaWiki and then used to update Meta).
Weekly updates:
- Completed dataset of first edits of editors registered since 2020 (~10% of editors inserted an external link in their first edit).
- Working on feature selection and clustering
Weekly updates:
- Assessment of the machine learning model with a namespace-balanced dataset to mitigate biases (a large majority of deleted spambot revisions were found to be in namespace 2)
- Update on Meta of all the machine learning results
- Call with the stewards to discuss ongoing and future work. They concluded that the best approach is to include this work in ORES, so I introduced them to @diego's proposal on ML-Based Models for Knowledge Integrity.
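
One possible way to build a namespace-balanced dataset like the one mentioned above is to downsample within each namespace until spam and non-spam revisions are equally represented, so that the namespace alone cannot act as a proxy for the label. The sketch below is only an assumption about how such balancing could be done, not the procedure actually used here; the function name and the `namespace` and `is_spam` keys are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List


def namespace_balanced_sample(rows: Iterable[Dict], seed: int = 0) -> List[Dict]:
    """Downsample so each namespace has as many spam as non-spam rows.

    `rows` is assumed to be an iterable of dicts with a 'namespace' key
    (e.g. 2 for the User namespace) and a boolean 'is_spam' key.
    Namespaces that lack one of the two classes contribute no rows.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    groups = defaultdict(lambda: {True: [], False: []})
    for row in rows:
        groups[row["namespace"]][bool(row["is_spam"])].append(row)

    sample: List[Dict] = []
    for namespace in sorted(groups):
        spam, ham = groups[namespace][True], groups[namespace][False]
        n = min(len(spam), len(ham))  # size of the smaller class
        sample.extend(rng.sample(spam, n))
        sample.extend(rng.sample(ham, n))
    return sample
```

Downsampling discards data from the larger class in each namespace; reweighting examples would be an alternative if the dataset is small.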