Analyze what impacts response rate
Open, Medium, Public

Description

This task is about understanding which actions affect the likelihood that people get a response to the comments they post or the conversations they start.

Constraints

⚠️ The below has NOT yet been reviewed by Product Analytics.
What is currently knowable

  • We can uniquely identify comments posted in discussions
  • We know what comment a reply is a response to at the time said reply is being written, provided said reply is being written using DiscussionTools.
  • We know when an edit posted to a talk page contains a mention of another user courtesy of DiscussionParser.php [1]

What is NOT currently knowable

  • We do not know what comment a reply written using DiscussionTools is a response to after said reply has been published.
  • We do not know what comment a reply written using the full page editor is a response to while said reply is being written and after said reply has been published.

Info. that would enable us to know if and when someone's comments have been responded to

  • We would need a way to relate edits/comments to each other. One way of relating said edits/comments would be to automatically assign an ID attribute to each talk page comment, as T230659 proposes.

Open questions

  • Can you think of more "assumptive" ways of approximating response rates? A few examples that come to mind:
    • When an edit [2] is made to a talk page, how often is that same page edited by a different user, within X hours/days?
    • When an edit [2] is made to a talk page, how often is that same section edited by a different user, within X hours/days?
    • When an edit [2] that contains a link to another user's user page is posted to a talk page, after how long does the "mentioned" user edit the talk page on which they were mentioned?
    • When an editor starts a new section on a page where __NEWSECTIONLINK__ is present, after how long, on average, are edits made to said section by users who are not the person who started it?
  • Do any of the constraints change now that we've moved the comment parser code to the server (T252252) and can theoretically identify comments made using full page editing after the fact? cc @Esanders
    • I ask this wondering whether it will unlock this constraint, "We do not know what comment a reply written using DiscussionTools is a response to after said reply has been published."

  1. https://www.mediawiki.org/wiki/Manual:Echo/en
  2. Ideally, we could be more specific and say, "When a comment..."
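
The first two "assumptive" approximations above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Edit` record, its field names, and the `proxy_response_rate` function are hypothetical and do not correspond to any existing schema or tool. It treats "a different user edited the same page (or section) within the window" as a stand-in for "got a response".

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Sequence


class Edit(NamedTuple):
    """Hypothetical talk page edit record; field names are assumptions."""
    page: str
    section: Optional[str]
    user: str
    timestamp: datetime


def proxy_response_rate(
    edits: Sequence[Edit], window: timedelta, by_section: bool = False
) -> float:
    """Share of edits followed, within `window`, by an edit from a
    different user on the same page (or same section if `by_section`)."""
    ordered = sorted(edits, key=lambda e: e.timestamp)
    responded = 0
    for i, edit in enumerate(ordered):
        for later in ordered[i + 1:]:
            # Edits are time-ordered, so stop once we leave the window.
            if later.timestamp - edit.timestamp > window:
                break
            same_place = later.page == edit.page and (
                not by_section or later.section == edit.section
            )
            if same_place and later.user != edit.user:
                responded += 1
                break
    return responded / len(ordered) if ordered else 0.0
```

Note that this counts any later edit by another user as a "response", so it will overcount on busy pages and cannot tell a reply from an unrelated comment.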

Event Timeline

LGoto triaged this task as Medium priority.
LGoto moved this task from Triage to Current Quarter on the Product-Analytics board.
LGoto moved this task from Current Quarter to Needs Investigation on the Product-Analytics board.

@Mayakp.wiki and I discussed this analysis idea in detail yesterday, and unfortunately I don't think it's actually feasible, because there isn't a good way to determine which comments actually get replies.

Let's say comment B is a reply to comment A, and both were authored using Discussion Tools. We would have data for the edits that led to those comments, keyed by the editing session ID and linked to the revision ID of the published comment. But we wouldn't have any way to tell that the two editing session IDs (or revision IDs) were connected!

While the user is writing comment B, Discussion Tools knows that it's a reply to comment A since it parsed the talk page and identified A as a distinct comment. But none of the rest of our software has any knowledge of comments, so neither A nor B has a durable identifier which we could use to record their relationship.

One possibility is that, when each comment is published, we take a hash of its content and its parent's content, and log those hashes along with the editing session ID and revision ID. That way, we could later look up comment A's hash and find that comment B has logged the same hash for its parent.

However, this approach would still have some limitations:

  1. It wouldn't work for replies not posted with DiscussionTools. This is particularly significant because many experienced users, who are likely responders, will not adopt DiscussionTools.
  2. It wouldn't deal with comments that are edited before the reply happens; in that case, the reply will give the hash of the edited comment as its parent, which will differ from the hash of the initial comment. If the edit is made with DiscussionTools, we can work around this by logging the pre-edit and post-edit hashes. But this adds significant additional work and still doesn't work for cases where the edit isn't made using DiscussionTools.
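
To make the hash idea and its second limitation concrete, here is a minimal sketch. The log fields, the whitespace normalization, the choice of SHA-256, and the function names are all assumptions for illustration; none of this reflects an existing DiscussionTools schema.

```python
import hashlib
from typing import Dict, List, Optional


def comment_hash(text: str) -> str:
    """Hash of a comment's content (whitespace-normalized; an assumption)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def log_entry(
    session_id: str, rev_id: int, text: str, parent_text: Optional[str] = None
) -> Dict[str, object]:
    """Hypothetical record logged when a comment is published."""
    return {
        "session_id": session_id,
        "rev_id": rev_id,
        "comment_hash": comment_hash(text),
        "parent_hash": (
            comment_hash(parent_text) if parent_text is not None else None
        ),
    }


def find_replies(
    log: List[Dict[str, object]], comment: Dict[str, object]
) -> List[Dict[str, object]]:
    """Entries whose logged parent hash matches this comment's hash."""
    return [e for e in log if e["parent_hash"] == comment["comment_hash"]]


# Comment A, then reply B written while A is unchanged: the hashes match.
a = log_entry("session-a", 100, "Should we merge these?")
b = log_entry("session-b", 101, "Yes.", parent_text="Should we merge these?")

# Reply C is written after A was edited: C logs the edited text's hash,
# so looking up A's original hash no longer finds it.
c = log_entry("session-c", 102, "No.", parent_text="Should we merge these pages?")
```

Here `find_replies([a, b, c], a)` returns only `b`; the link from `c` to `a` is lost unless the edit to A also logged its pre-edit and post-edit hashes.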

Overall, while I can totally understand the value of this analysis, I just don't think we can do it without adding real comment identifiers (like T230659 proposes).

@Mayakp.wiki + @nshahquinn-wmf, thank you for thinking this through. I've written what I understand to be the constraints below. Are you able to give them a read and tell me if anything needs to be adjusted? I want to make sure I'm correctly understanding what limits our ability to know if and when someone's comments have been responded to by another user.

Also below: a couple of follow-up questions based on how I currently understand these "constraints."

Constraints

What is currently knowable

  • We can uniquely identify comments posted in discussions
  • We know what comment a reply is a response to at the time said reply is being written, provided said reply is being written using DiscussionTools.
  • We know when an edit posted to a talk page contains a mention of another user courtesy of DiscussionParser.php [1]

What is NOT currently knowable

  • We do not know what comment a reply written using DiscussionTools is a response to after said reply has been published.
  • We do not know what comment a reply written using the full page editor is a response to while said reply is being written and after said reply has been published.

Info. that would enable us to know if and when someone's comments have been responded to

  • We would need a way to relate edits/comments to each other. One way of relating said edits/comments would be to automatically assign an ID attribute to each talk page comment, as T230659 proposes.

Follow-up questions

  • Can you think of more "assumptive" ways of approximating response rates? A few examples that come to mind:
    • When an edit [2] is made to a talk page, how often is that same page edited by a different user, within X hours/days?
    • When an edit [2] is made to a talk page, how often is that same section edited by a different user, within X hours/days?
    • When an edit [2] that contains a link to another user's user page is posted to a talk page, after how long does the "mentioned" user edit the talk page on which they were mentioned?
    • When an editor starts a new section on a page where __NEWSECTIONLINK__ is present, after how long, on average, are edits made to said section by users who are not the person who started it?
  • Do any of the constraints change now that we've moved the comment parser code to the server (T252252) and can theoretically identify comments made using full page editing after the fact? cc @Esanders
    • I ask this wondering whether it will unlock this constraint, "We do not know what comment a reply written using DiscussionTools is a response to after said reply has been published."

  1. https://www.mediawiki.org/wiki/Manual:Echo/en
  2. Ideally, we could be more specific and say, "When a comment..."
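
As one possible reading of the mention-based question above, the latency between a mention and the mentioned user's next edit to the same talk page could be computed along these lines. The `TalkEdit` record and its `mentions` field are hypothetical; how mentions would actually be extracted from edits is not shown and is assumed to happen upstream.

```python
from datetime import datetime, timedelta
from typing import FrozenSet, List, NamedTuple, Sequence


class TalkEdit(NamedTuple):
    """Hypothetical talk page edit; `mentions` holds linked user names."""
    page: str
    user: str
    timestamp: datetime
    mentions: FrozenSet[str] = frozenset()


def mention_latencies(edits: Sequence[TalkEdit]) -> List[timedelta]:
    """For each mention, the time until the mentioned user next edits
    the same page. Mentions with no later edit are skipped."""
    ordered = sorted(edits, key=lambda e: e.timestamp)
    latencies = []
    for i, edit in enumerate(ordered):
        for mentioned in edit.mentions:
            if mentioned == edit.user:
                continue  # ignore self-mentions
            for later in ordered[i + 1:]:
                if later.page == edit.page and later.user == mentioned:
                    latencies.append(later.timestamp - edit.timestamp)
                    break
    return latencies
```

Because mentions that never receive a follow-up edit are dropped, an average over these latencies would understate how long people wait; the share of unanswered mentions would need to be reported alongside it.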

Posting an update on this task: During our weekly 1:1s, Peter and Megan are actively brainstorming on this problem statement:

Can you think of more "assumptive" ways of approximating response rates?