This task is about investigating whether we are able to programmatically sort talk page edits into the categories listed below.
Background
The scenarios below describe what kind of information could be "unlocked" by being able to programmatically categorize talk page edits.
Scenario A: Evaluating tool adoption
One proxy we've used (T249386) to understand the extent to which people value the Reply tool is the proportion of talk page edits that people make with the Reply tool.[i]
Trouble is, many talk page edits cannot be made with the Reply tool. This means the denominator used in the metric above is likely to be larger than it ought to be.
By knowing which talk page edits are comments, we would be able to confidently know things like, "Of all the comments people posted in this period, people used the Reply tool to publish this percentage of them."
This information would be helpful in deciding whether the tool is ready to be deployed more broadly.
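To make the denominator problem concrete, here is a minimal sketch of the metric computed two ways: over all talk page edits, and over comments only. The `TalkEdit` record, its `category` labels, and the `used_reply_tool` flag are hypothetical names chosen for illustration; they assume the categorization this task investigates already exists.

```python
from dataclasses import dataclass


@dataclass
class TalkEdit:
    category: str          # hypothetical label: "comment", "new_section", or "other"
    used_reply_tool: bool  # hypothetical flag: was the edit made with the Reply tool?


def reply_tool_share(edits, comments_only):
    """Return the fraction of edits in the chosen pool made with the Reply tool."""
    if comments_only:
        pool = [e for e in edits if e.category == "comment"]
    else:
        pool = list(edits)
    if not pool:
        return 0.0
    return sum(e.used_reply_tool for e in pool) / len(pool)


edits = [
    TalkEdit("comment", True),
    TalkEdit("comment", False),
    TalkEdit("new_section", False),
    TalkEdit("other", False),
]

print(reply_tool_share(edits, comments_only=False))  # 0.25
print(reply_tool_share(edits, comments_only=True))   # 0.5
```

In this made-up sample, counting every talk page edit in the denominator halves the apparent adoption rate compared with counting comments only.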
Scenario B: Understanding response rates
One aim of the Talk pages project is to increase the likelihood that people using Wikipedia talk pages receive the responses they are seeking so that they can, among other things, learn how to make the edit they want to make and find sources for content they think ought to be included in the encyclopedia.
Trouble is, we do not currently have a way to programmatically identify whether people receive responses to the questions they ask and the calls for input they post on talk pages.
Knowing the following would help us better understand whether people are succeeding in using talk pages to communicate with others and, ultimately, to accomplish a task that helps them improve the encyclopedia:
- Was an edit to a talk page a comment, the beginning of a new conversation or an edit to something that had been posted previously?
- If an edit to a talk page contained a comment (call it "Comment B"), which existing comment was "Comment B" a response to?
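One possible way to approach the second question is to lean on the wikitext convention of indenting replies with leading `:` (or `*`) characters. The sketch below assumes the comments of a single section have already been isolated, one per line and in page order, and treats the nearest earlier comment with a smaller indent as the parent. Both `indent_level` and `find_parents` are hypothetical helpers, and the heuristic will misfire wherever people do not follow the indentation convention.

```python
def indent_level(line):
    """Count leading ':' or '*' characters, which wikitext uses to indent replies."""
    return len(line) - len(line.lstrip(":*"))


def find_parents(indent_levels):
    """For each comment, return the index of its assumed parent, or None if top-level."""
    parents = []
    for i, level in enumerate(indent_levels):
        parent = None
        # Walk backwards to the nearest earlier comment with a smaller indent.
        for j in range(i - 1, -1, -1):
            if indent_levels[j] < level:
                parent = j
                break
        parents.append(parent)
    return parents


lines = ["Question?", ":Answer.", "::Follow-up.", ":Another answer.", "New remark."]
levels = [indent_level(line) for line in lines]
print(find_parents(levels))  # [None, 0, 1, 0, None]
```

Here "Comment B" at index 3 (":Another answer.") is attributed to the comment at index 0, not to the follow-up directly above it.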
Scenario C: Enabling researchers to develop a more nuanced understanding of talk page activity
- "The size of these pages makes them difficult subjects for content analysis (Schneider, Passant and Breslin 2011). Research into talk pages has also struggled with the form of the data. Although used as discussion forums, Wikipedia talk pages do not follow the conventions of other web forums, lacking dedicated formatting or explicit threading structures that could demarcate the beginning or end of comments, or delineate different users (Laniado et al 2011; Ferschke 2014)." | source
Scenario D: Enabling us to know whether people are "talking more" as a result of the new tools/enhancements introduced as part of the Talk pages project
Categories
- Comments
- New sections
- All other edits
Open questions
- How can the software know what a talk page edit "is"? Is it a comment? Multiple comments? The start of a new section/conversation? An edit to something that had been written/posted previously?
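As a starting point for this question, here is a rough heuristic sketch that sorts an edit into the three categories above by looking at the lines its diff added and removed. It is not a proposed implementation: `classify_edit` is a hypothetical function, the input is assumed to be a diff already split into added and removed wikitext lines, and the timestamp pattern assumes English-style signatures such as "12:34, 5 May 2020 (UTC)". It does not attempt to detect multiple comments within one edit.

```python
import re

# A wikitext section heading, e.g. "== Sources for the lead =="
HEADING_RE = re.compile(r"^==+\s*.+?\s*==+\s*$")
# Assumption: English-style signature timestamp, e.g. "12:34, 5 May 2020 (UTC)"
TIMESTAMP_RE = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")


def classify_edit(added_lines, removed_lines):
    """Return "new_section", "comment", or "other" for one talk page edit."""
    if removed_lines:
        # Existing text was changed or deleted, so treat it as an edit
        # to something posted previously.
        return "other"
    if any(HEADING_RE.match(line) for line in added_lines):
        return "new_section"
    if any(TIMESTAMP_RE.search(line) for line in added_lines):
        return "comment"
    return "other"


print(classify_edit(
    ["== Sources ==", "Can anyone help? [[User:A|A]] 12:34, 5 May 2020 (UTC)"], []
))  # new_section
print(classify_edit([":I can. [[User:B|B]] 13:00, 5 May 2020 (UTC)"], []))  # comment
print(classify_edit([":I can, probably."], [":I can."]))  # other
```

Unsigned comments and edits that both revise old text and add a new comment would be misclassified as "other" by this sketch, which is one reason the question remains open.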
Done
- All "Open questions" are answered