Wikipedia leverages large-scale online collaboration by editors to provide an open source online encyclopaedia. The collaborative content creation process has played a key role in the creation of the encyclopaedia, but has also led to some variance in the quality of articles on the platform, causing issues such as sockpuppeting, vandalism, factual inaccuracies and biased article perspectives. To alert moderators, other editors and readers of such issues, the platform allows editors to add a variety of markup templates to articles.
Hi @ChristineDeKock !
From our previous meeting:
TODOs:
- Create a placeholder for the project on meta.
- Check the WikiProject Reliability related templates and select a few of them to work with.
Additional readings:
- Keegan, Brian, Darren Gergle, and Noshir Contractor. "Do editors or articles drive collaboration? Multilevel statistical network analysis of Wikipedia coauthorship." Proceedings of the ACM 2012 conference on computer supported cooperative work. 2012.
- Sepehri Rad, Hoda, et al. "Leveraging editor collaboration patterns in Wikipedia." Proceedings of the 23rd ACM conference on Hypertext and social media. 2012.
Hello! Adding some notes from our meeting on Friday.
- Chose templates to work with based on how often they occur in the BLP category. These are Autobiography, Fanpov, Advert, Peacock, and Weasel.
- Identified BLP noticeboard as an alternative option for finding problematic BLP articles (~6000 cases, will need scraping but doable).
- Went through different methods to access the WP database.
- Established that features from Keegan et al. (2012) and Sepehri Rad et al. (2012) seem useful for representing collaboration patterns.
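To make the template selection above concrete, here is a rough sketch of how the five chosen templates could be detected and counted in revision wikitext. The function names and the regular expression are illustrative assumptions, not the queries actually used, and template redirects or aliases are not handled.

```python
import re
from collections import Counter

# The five templates chosen in the meeting notes.
TEMPLATES = ("Autobiography", "Fanpov", "Advert", "Peacock", "Weasel")

# Match "{{Name}}" or "{{Name|...}}", case-insensitively.
# The lookahead stops "Advert" from matching "{{Advertising}}".
_PATTERN = re.compile(
    r"\{\{\s*(" + "|".join(TEMPLATES) + r")\s*(?=\||\}\})",
    re.IGNORECASE,
)


def find_templates(wikitext):
    """Return the set of selected templates present in one revision (hypothetical helper)."""
    canonical = {t.lower(): t for t in TEMPLATES}
    return {canonical[m.group(1).lower()] for m in _PATTERN.finditer(wikitext)}


def count_templates(revisions):
    """Count in how many revision texts each selected template appears."""
    counts = Counter()
    for text in revisions:
        counts.update(find_templates(text))
    return counts
```

Counting each template at most once per revision keeps a heavily tagged revision from dominating the frequencies.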
TODOs:
- Diego to provide Christine with revisions containing the identified templates.
- Exploratory data analysis on the template data.
- Exploration of the WP database.
- Implement some features from the above-mentioned papers.
Some notes from our meeting earlier:
Done:
- Obtained access to servers and verified connection.
- Determined additional data needs (see pipeline diagram here) for feature extraction. We mainly require Talk page activity and complete user histories.
- Implemented language features on edit summaries, will be applied to Talk pages when available.
- Looked at Talk page vs Article activity before template addition; may be a useful feature.
- Reviewed Wiki Workshop papers.
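As an illustration of the Talk page vs Article activity idea noted above, a minimal sketch of one possible feature follows. The 30-day window, the function name and the input format (a list of `(timestamp, namespace)` pairs) are assumptions made for the example; namespace 0 is the article namespace and namespace 1 is Talk.

```python
from datetime import datetime, timedelta


def activity_before_tag(edits, tag_time, window_days=30):
    """Count article vs Talk edits in the window before a template was added.

    edits: iterable of (timestamp, namespace) pairs (assumed input format).
    window_days: hypothetical window length, not a value from the notes.
    """
    start = tag_time - timedelta(days=window_days)
    article = sum(1 for ts, ns in edits if ns == 0 and start <= ts < tag_time)
    talk = sum(1 for ts, ns in edits if ns == 1 and start <= ts < tag_time)
    total = article + talk
    return {
        "article_edits": article,
        "talk_edits": talk,
        # Share of pre-tag activity that happened on the Talk page.
        "talk_share": talk / total if total else 0.0,
    }
```

Edits made at or after the tagging time are excluded so that the feature only uses information available before the template was added.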
To do:
- Flag issue with JupyterLab login
- PySpark / Analytics Cluster Training
- Extract required data from MediaWiki history & MediaWiki text
- Implement remaining features (Look at WitPy for Talk page reconstruction?)
- Establish control group to compare to.
Hello,
Notes from yesterday's meeting.
- I have implemented data extraction queries and a number of notebooks for feature engineering. Features are based on a number of entities / relationships: editor, article, Talk page, editor-article, editor-editor.
- We have decided to truncate user collaboration history (for editor-editor features) to 1 year to make the size of the graph more manageable.
- One user added a large number of tags in 2015 (~500), spanning multiple tag types. We have decided to remove this user's tags.
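For reference, the two decisions above could look roughly like the sketch below. It assumes revisions arrive as `(editor, page_id, timestamp)` tuples and tags as `(editor, page_id, template)` tuples. The function names and the `max_tags` threshold are hypothetical. In particular, the threshold filter is just one way to drop a prolific tagger and is not necessarily how the user was identified.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations


def coedit_weights(revisions, tag_time, window_days=365):
    """Editor-editor co-edit counts, truncated to the year before tag_time.

    Two editors are linked once for every page both edited inside the window.
    """
    start = tag_time - timedelta(days=window_days)
    editors_by_page = {}
    for editor, page_id, ts in revisions:
        if start <= ts < tag_time:
            editors_by_page.setdefault(page_id, set()).add(editor)
    weights = Counter()
    for editors in editors_by_page.values():
        # Sort so each undirected pair has a single canonical key.
        for a, b in combinations(sorted(editors), 2):
            weights[(a, b)] += 1
    return weights


def drop_prolific_taggers(tags, max_tags=100):
    """Remove all tags added by editors with more than max_tags tags (assumed cutoff)."""
    per_editor = Counter(editor for editor, _, _ in tags)
    return [t for t in tags if per_editor[t[0]] <= max_tags]
```

Truncating before building the pairs keeps the graph small, since older revisions never enter the per-page editor sets.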
To do:
- Connect various parts of code into pipeline.
- Export features (if necessary, for a subset of samples and features).
- Sample negatives and calculate features.
@ChristineDeKock: Removing task assignee as this open task has been assigned for more than two years - See the email sent to task assignee on February 22nd, 2023.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator. Thanks!