
[SPIKE] How might we programmatically categorize talk page edits?
Closed, Resolved (Public)


This task is about investigating whether we are able to programmatically sort talk page edits into the categories listed below.


The scenarios below describe what kind of information could be "unlocked" by being able to programmatically categorize talk page edits.

Scenario A: Evaluating tool adoption
One proxy we've used (T249386) to understand the extent to which people value the Reply tool is the proportion of talk page edits people make with the Reply tool.[i]

Trouble is, there are many talk page edits people make that cannot be made with the Reply tool. This means the denominator used in the metric above is likely larger than it ought to be.

By knowing which talk page edits are comments, we would be able to confidently know things like, "Of all the comments people posted in this period, people used the Reply tool to publish this percentage of them."

This information would be helpful in deciding whether the tool is ready to be deployed more broadly.
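
To make the denominator issue in Scenario A concrete, here is a minimal Python sketch. The edit records, the category labels and the `reply_tool_share` helper are all hypothetical and only illustrate how restricting the denominator to comments changes the metric; they do not reflect how the data is actually stored or queried.

```python
# Hypothetical edit records: (category, made_with_reply_tool).
# The category labels are assumptions for illustration only.
edits = [
    ("comment", True),
    ("comment", False),
    ("new_section", False),
    ("other", False),
    ("comment", True),
]


def reply_tool_share(edits, categories=None):
    """Share of edits made with the Reply tool.

    If `categories` is given, only edits in those categories
    count toward the denominator.
    """
    pool = [used for cat, used in edits if categories is None or cat in categories]
    if not pool:
        return 0.0
    return sum(pool) / len(pool)


# Denominator: every talk page edit (the current proxy).
naive_share = reply_tool_share(edits)

# Denominator: only edits categorized as comments.
comment_share = reply_tool_share(edits, {"comment"})
```

With this toy data the Reply tool accounts for 2 of 5 talk page edits overall, but 2 of 3 comments, which is the kind of gap the categorization would expose.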

Scenario B: Understanding response rates
One aim of the Talk pages project is to increase the likelihood that people using Wikipedia talk pages receive the responses they are seeking. Those responses help them, among other things, learn how to make the edit they want to make and find sources for content they think ought to be included in the encyclopedia.

Trouble is, we do not currently have a way to programmatically identify whether people receive responses to the questions they ask and the calls for input they post on talk pages.

Being able to know the following would enable us to better understand whether people are having success using talk pages to communicate with others, and ultimately, accomplish a task to help them improve the encyclopedia:

  • Was an edit to a talk page a comment, the beginning of a new conversation or an edit to something that had been posted previously?
  • If an edit to a talk page contained a comment (call it "Comment B"), to which existing comment was "Comment B" a response?
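
If both questions above could be answered, a response rate could be computed along these lines. This is a rough Python sketch under stated assumptions: the `Comment` record, its `parent_id` field and the `response_rate` helper are hypothetical, and "response" is narrowed here to a direct reply from a different author to the comment that starts a conversation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Comment:
    """Hypothetical record for one categorized talk page comment."""
    id: str
    author: str
    parent_id: Optional[str]  # None when the comment starts a conversation


def response_rate(comments):
    """Fraction of conversation starters that got a direct reply from someone else."""
    starters = [c for c in comments if c.parent_id is None]
    if not starters:
        return 0.0
    by_id = {c.id: c for c in comments}
    answered = set()
    for c in comments:
        if c.parent_id is None:
            continue
        parent = by_id.get(c.parent_id)
        # Count only direct replies to a starter, written by a different author.
        if parent is not None and parent.parent_id is None and parent.author != c.author:
            answered.add(parent.id)
    return len(answered) / len(starters)


comments = [
    Comment("a", "Alice", None),   # Alice starts a conversation
    Comment("b", "Bob", "a"),      # Bob responds to Alice
    Comment("c", "Carol", None),   # Carol starts a conversation
    Comment("d", "Carol", "c"),    # Carol follows up on her own post
]
rate = response_rate(comments)
```

In this toy data one of the two conversations received a response from another person, so the rate is one half.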

Scenario C: enabling researchers to develop a more nuanced understanding of talk page activity

  • "The size of these pages makes them difficult subjects for content analysis (Schneider, Passant and Breslin 2011). Research into talk pages has also struggled with the form of the data. Although used as discussion forums, Wikipedia talk pages do not follow the conventions of other web forums, lacking dedicated formatting or explicit threading structures that could demarcate the beginning or end of comments, or delineate different users (Laniado et al 2011; Ferschke 2014)." | source

Scenario D: enabling us to know whether people are "talking more" as a result of the new tools/enhancements introduced as part of the Talk pages project


  • Comments
  • New sections
  • All other edits

Open questions

  • How can the software know what a talk page edit "is"? Is it a comment? Multiple comments? The start of a new section/conversation? An edit to something that had been written/posted previously?


  • All "Open questions" are answered


Event Timeline

Adding Scenario C to the task description and filling in details for Scenario B.

As part of our investigation into notifications, we believe we will be able to identify when comments have been posted using any tool (see

This could be easily extended to tag commits as they happen, as we will have already computed the comment-tree-diff for sending notifications.

We could also investigate tagging or otherwise identifying added comments in historical diffs using this method, but there may be other performance considerations we need to address first if we are mass-processing old edits.
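
As a rough illustration of how a diff over parsed comments could yield the categories in this task, here is a simplified Python sketch. It is not the notification code: the parsed representation (a mapping from section heading to comment IDs, flattened rather than a full comment tree), the `categorize_edit` helper and the category labels are all assumptions made for this example.

```python
def categorize_edit(before, after):
    """Categorize an edit by comparing parsed comments before and after it.

    `before` and `after` are hypothetical parsed forms of the page:
    dicts mapping a section heading to the list of comment IDs in it.
    """
    # Headings that exist only after the edit are treated as new sections.
    new_sections = [heading for heading in after if heading not in before]

    # Comment IDs that appear in pre-existing sections but were not on the page before.
    old_ids = {cid for ids in before.values() for cid in ids}
    new_comments = [
        cid
        for heading, ids in after.items()
        if heading in before
        for cid in ids
        if cid not in old_ids
    ]

    categories = set()
    if new_sections:
        categories.add("new_section")
    if new_comments:
        categories.add("comment")
    # Anything that adds neither a section nor a comment falls into "other".
    return categories or {"other"}


before = {"Sources": ["c1"]}
added_comment = categorize_edit(before, {"Sources": ["c1", "c2"]})
added_section = categorize_edit(before, {"Sources": ["c1"], "New topic": ["c3"]})
no_new_comment = categorize_edit(before, {"Sources": ["c1"]})
```

A single edit can land in more than one category here (for example, a reply plus a new section), which relates to the "Multiple comments?" open question in the task description.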

As @Esanders noted in T257644#6360816, we think we found a way to tag talk page edits for the purposes of sending notifications.

The remaining open questions are as follows:

  • How can the software tag/categorize talk page edits as comments and new sections in real-time (read: as they happen)?
  • How can the software tag/categorize historic talk page edits as comments and new sections?

These new questions will be investigated in these tasks:
T262107: Create a hidden revision tag for talk page comments
T262108: [SPIKE] How can the software tag/categorize historical talk page edits?

ppelberg claimed this task.