Implement a way to relate the components of a conversation
Open, Needs Triage, Public

Assigned To

None

Authored By

	ppelberg
	Jun 3 2021, 1:58 AM

Description

This task involves relating replies to comments and comments to topics in such a way that this information/data can be used, at scale, to evaluate the impact topic subscriptions (T273920) are having on the rates and speeds with which people receive responses to the topics they start and the comments they post on talk pages.

Requirements

Behavior

Talk page edits are categorized and related such that we can answer the questions below. Note: these questions are borrowed from T280895.
- "How much time, on average, elapses between when someone posts on a talk page (e.g. starts a new topic, comments in an existing one) and another person responds to them?"
- "What percentage of comments and headings receive a response from another person within __hours/days of being posted?"
- "On average, how many comments do topics started on Junior Contributor talk pages receive?"
- "On average, how long does it take for someone to get a response to a topic they posted and/or conversation they started?"
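
To make the first two questions concrete, here is a minimal sketch of how they could be computed once each post is related to the post it responds to. Everything in it is an assumption for illustration: the `Post` record, its field names, and the sample data are hypothetical and do not reflect the existing instrumentation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one talk page post (a topic or a comment)."""
    post_id: str
    parent_id: Optional[str]  # None for the post that starts a topic
    author: str
    posted_at: datetime


def first_response_delays(posts: list[Post]) -> dict[str, Optional[timedelta]]:
    """Map each post to the delay until its first reply from another person."""
    by_id = {p.post_id: p for p in posts}
    delays: dict[str, Optional[timedelta]] = {p.post_id: None for p in posts}
    for reply in posts:
        parent = by_id.get(reply.parent_id) if reply.parent_id else None
        if parent is None or reply.author == parent.author:
            continue  # not a reply, or someone replying to themselves
        delay = reply.posted_at - parent.posted_at
        current = delays[parent.post_id]
        if current is None or delay < current:
            delays[parent.post_id] = delay
    return delays


def mean_delay(delays: dict[str, Optional[timedelta]]) -> Optional[timedelta]:
    """Average delay over the posts that received a response."""
    answered = [d for d in delays.values() if d is not None]
    if not answered:
        return None
    return sum(answered, timedelta()) / len(answered)


def share_answered_within(
    delays: dict[str, Optional[timedelta]], window: timedelta
) -> float:
    """Fraction of all posts answered by another person within the window."""
    if not delays:
        return 0.0
    hits = sum(1 for d in delays.values() if d is not None and d <= window)
    return hits / len(delays)


# Hypothetical sample data: one topic and three comments.
day = datetime(2021, 6, 3)
posts = [
    Post("t1", None, "Alice", day.replace(hour=12)),
    Post("c3", "t1", "Alice", day.replace(hour=13)),  # self-reply, ignored
    Post("c1", "t1", "Bob", day.replace(hour=14)),
    Post("c2", "c1", "Alice", day.replace(hour=15)),
]
```

Note that the percentage is taken over all posts, including those that never received a response; whether that is the right denominator is a definitional choice for Product Analytics rather than something this sketch settles.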

Meta

The logic we are implementing as part of this task should act on/be applied to all talk page edits, regardless of the editing interface someone used to publish said edits.
- Said another way: whether someone posted a comment using the Reply Tool or full-page wikitext editing should *not* affect how said edit is categorized. We met a similar requirement as part of implementing Topic Subscriptions (T263820).

Open questions

What – if any – other requirements will need to be met for @MNeisler/Product Analytics to aggregate/query the data we are already tracking?
- See T280100#7174055 for more information about the data we are already tracking.

To answer the questions identified above, the instrumentation will need to include:

- A way to distinguish the components of the conversation we decide to track (topic, comment, response)
- A unique topic identifier to relate comments to topics
- A unique comment identifier to relate responses to comments
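
One way to picture these three requirements is the sketch below. It is illustrative only: the `Component` enum, the `ConversationItem` record, and every field name are hypothetical, and the definitions of "comment" and "response" are still being discussed in this task.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Component(Enum):
    """The conversation components named in the requirement above."""
    TOPIC = "topic"
    COMMENT = "comment"
    RESPONSE = "response"


@dataclass(frozen=True)
class ConversationItem:
    """Hypothetical logged record for one talk page edit."""
    component: Component
    topic_id: str                     # relates comments to topics
    comment_id: Optional[str] = None  # None for the topic itself
    parent_comment_id: Optional[str] = None  # relates responses to comments


def posts_per_topic(items: list[ConversationItem]) -> dict[str, int]:
    """Count comments and responses under each topic."""
    counts: dict[str, int] = {}
    for item in items:
        counts.setdefault(item.topic_id, 0)
        if item.component is not Component.TOPIC:
            counts[item.topic_id] += 1
    return counts


def responses_to(
    items: list[ConversationItem], comment_id: str
) -> list[ConversationItem]:
    """Return the responses whose parent is the given comment."""
    return [i for i in items if i.parent_comment_id == comment_id]


# Hypothetical sample: one topic, one comment, one response to that comment.
items = [
    ConversationItem(Component.TOPIC, topic_id="t1"),
    ConversationItem(Component.COMMENT, topic_id="t1", comment_id="c1"),
    ConversationItem(
        Component.RESPONSE, topic_id="t1", comment_id="c2", parent_comment_id="c1"
    ),
]
```

With the two identifiers in place, both relations in the list above become simple lookups instead of requiring the page's wikitext to be re-parsed.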

Done

The instrumentation needed to fulfill the requirements above is implemented
Any additional tickets are filed (e.g. a ticket for QA)

Related Objects

Status	Assigned	Task
Open	None	T233447 [OKR] Increase response rates
Duplicate	MNeisler	T280895 Evaluate impact of notification interventions
Open	None	T284200 Implement a way to relate the components of a conversation

Event Timeline

ppelberg created this task. Jun 3 2021, 1:58 AM

ppelberg mentioned this in T277349: [SPIKE] Determine what – if any – new instrumentation is needed for notifications.

ppelberg removed MNeisler as the assignee of this task. Jun 3 2021, 5:05 PM

ppelberg moved this task from Incoming to Upcoming on the Editing-team (Kanban Board) board.

ppelberg added subscribers: DLynch, MNeisler.

The definition of comment is incorrect, I believe. All initial responses to a topic are indented once. Only the actual topic-starting text is not indented.

To illustrate this, here's some annotated source I copied from Help_talk:Talk_pages.

Technically our data model treats the text of the topic as just the first comment. You can see this with dtdebug=1:

ppelberg mentioned this in T262107: Create a hidden revision tag for talk page comments. Jun 12 2021, 12:38 AM

In T284200#7152688, @DLynch wrote:

The definition of comment is incorrect, I believe. All initial responses to a topic are indented once. Only the actual topic-starting text is not indented.

@DLynch before traveling further down the path of revising the definitions I proposed in the task description, can you share how you think about these two questions?

What about how the software is currently written would constrain our ability to answer these questions?
- A) "For all comments and new topics, how much time, on average, elapses between when someone posts on a talk page and another person responds?"
- B) "For all comments and topics posted after a certain date, what percentage of said "comments" and "topics" receive a response from another person within __hours/days?"
What – if any – changes could we make to help us more accurately/completely answer "A)" and "B)" above?

What about how the software is currently written would constrain our ability to answer these questions?

For both of these, we're not currently storing data in a way that would be helpful. Technically all this is present in the echo_event table, but relating the initial topic-event to the reply-event would be challenging (there's a comment-id and parent-id stored, but it's in the JSON-blob part of the echo_event row). It could also be extracted by analyzing the wikitext on the page at any given moment.
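
To illustrate why relating these events is awkward today, here is a rough sketch of the join, assuming the rows have already been read out of the table. The row shape, the `extra_json` field, and the exact key names inside the blob are assumptions for illustration, not the actual echo_event layout.

```python
import json


def pair_replies_with_parents(rows: list[dict]) -> list[tuple[int, int]]:
    """Return (parent_event_id, reply_event_id) pairs.

    Every row's blob has to be decoded before any relation can be made,
    because the identifiers are not available as queryable columns.
    """
    decoded = [(row, json.loads(row["extra_json"])) for row in rows]
    by_comment_id = {
        extra["comment-id"]: row
        for row, extra in decoded
        if "comment-id" in extra
    }
    pairs = []
    for row, extra in decoded:
        parent = by_comment_id.get(extra.get("parent-id"))
        if parent is not None:
            pairs.append((parent["event_id"], row["event_id"]))
    return pairs


# Hypothetical rows: the second event replies to the first.
rows = [
    {"event_id": 1, "extra_json": '{"comment-id": "c-a", "parent-id": "h-topic"}'},
    {"event_id": 2, "extra_json": '{"comment-id": "c-b", "parent-id": "c-a"}'},
]
```

In this sample the first event's parent ("h-topic") has no row of its own, so it is left unpaired; that kind of gap is part of what makes this approach challenging.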

What – if any – changes could we make to help us more accurately/completely answer "A)" and "B)" above?

We should probably implement some logging to a schema. We can do logging at the same time as the echo_event rows are created, just sending the data in a more convenient fashion for us. We could directly store a time-since-parent-comment value, if you don't mind us trusting what the wikitext says about the parent's timestamp. (If not, @MNeisler could presumably look up the logged row for the parent -- assuming that we're doing 100% logging and not sampling.)
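
As a sketch of what such a logged row might carry, assuming the parent's timestamp is taken from the wikitext as described above: the function name, the field names, and the use of seconds as the unit are all hypothetical, since no schema has been designed yet.

```python
from datetime import datetime, timezone
from typing import Optional


def build_log_event(
    comment_id: str,
    parent_id: Optional[str],
    posted_at: datetime,
    parent_posted_at: Optional[datetime],
) -> dict:
    """Build a flat event that stores the time since the parent directly."""
    seconds_since_parent = None
    if parent_posted_at is not None:
        # Trusts the parent timestamp as given (e.g. parsed from the wikitext).
        seconds_since_parent = int((posted_at - parent_posted_at).total_seconds())
    return {
        "comment_id": comment_id,
        "parent_id": parent_id,
        "posted_at": posted_at.isoformat(),
        "seconds_since_parent": seconds_since_parent,
    }


# Hypothetical usage: a reply posted 90 minutes after its parent.
event = build_log_event(
    comment_id="c-b",
    parent_id="c-a",
    posted_at=datetime(2021, 6, 3, 12, 0, tzinfo=timezone.utc),
    parent_posted_at=datetime(2021, 6, 3, 10, 30, tzinfo=timezone.utc),
)
```

If the wikitext timestamp is not trusted, the `seconds_since_parent` field could be dropped and the value derived later by looking up the parent's logged row by `parent_id`, which only works if every comment is logged rather than sampled.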

ppelberg moved this task from Upcoming to Incoming on the Editing-team (Kanban Board) board. Jun 26 2021, 1:16 AM

ppelberg mentioned this in T280100: [SPIKE] Document talk page comment data structure.

ppelberg updated the task description. Jun 26 2021, 1:19 AM

ppelberg updated the task description.

ppelberg moved this task from Incoming to Ready for Sign Off on the Editing-team (Kanban Board) board. Jun 30 2021, 5:30 PM

ppelberg moved this task from Ready for Sign Off to Incoming on the Editing-team (Kanban Board) board.

On the definitions:

What you called "direct response", I would call "reply"
What you called "comment", I would call "top-level comment"

That's how I described the structure in the recent doc page: https://www.mediawiki.org/wiki/Extension:DiscussionTools/How_it_works#Data_structures

ppelberg updated the task description. Jun 30 2021, 11:30 PM

Ok, I want to separate out what I see as the two threads we have open in this ticket:

1. Definitions: converging on semantic definitions for each of the unique elements/components of a talk page conversation that make up its structure
2. Implementation: deciding on the approach for storing said "talk page structure"

Note: to avoid confusion and simplify this task, I've revised the task description to describe the behavior being asked for more generically; I've also removed the DEFINITIONS section.

1. Definitions

In T284200#7188117, @matmarex wrote:

On the definitions:

What you called "direct response", I would call "reply"

What you called "comment", I would call "top-level comment"

That's how I described the structure in the recent doc page: https://www.mediawiki.org/wiki/Extension:DiscussionTools/How_it_works#Data_structures

This is a big help, @matmarex.

Can you comment on what – if anything – is inaccurate and/or incomplete about the two statements below?

- NO changes need to be made to the current Data structure to satisfy the task description's "Requirements" section.
- In order to expose the number of comments within a conversation, we will likely need to extend the current Data structure to include a third "item", topics. However, the work to implement this change does NOT need to happen as part of this task. Instead, it can happen as part of T269950.

2. Implementation

In T284200#7160082, @DLynch wrote:

What about how the software is currently written would constrain our ability to answer these questions?

For both of these, we're not currently storing data in a way that would be helpful. Technically all this is present in the echo_event table, but relating the initial topic-event to the reply-event would be challenging (there's a comment-id and parent-id stored, but it's in the JSON-blob part of the echo_event row). It could also be extracted by analyzing the wikitext on the page at any given moment.

Understood.

What – if any – changes could we make to help us more accurately/completely answer "A)" and "B)" above?

We should probably implement some logging to a schema. We can do logging at the same time as the echo_event rows are created, just sending the data in a more convenient fashion for us. We could directly store a time-since-parent-comment value, if you don't mind us trusting what the wikitext says about the parent's timestamp. (If not, @MNeisler could presumably look up the logged row for the parent -- assuming that we're doing 100% logging and not sampling.)

@DLynch + @MNeisler: regarding actually "...implement[ing] some logging to a schema...", does the below seem like the right order of operations to y'all?

1. Converge on shared definitions for the components of the conversation we want to track.
2. Agree on/acknowledge any "lossiness" we may need to accept between how we'll have semantically defined the components of the conversation and what we'll be able to codify (read: represent in the software).
3. Design the schema that will store the talk page structure we'll have specified in the preceding steps.
4. Implement the schema.
5. QA the schema/event logging.

ppelberg updated the task description. Jul 1 2021, 12:46 AM

matmarex moved this task from Backlog to Triaged on the DiscussionTools board. Jul 1 2021, 2:58 PM

MNeisler updated the task description. Jul 13 2021, 12:34 PM

ppelberg added a parent task: T286076: Implement topic subscription instrumentation. Jul 21 2021, 8:33 PM

ppelberg removed a parent task: T280895: Evaluate impact of notification interventions.

matmarex mentioned this in T290803: Duplicate notifications if a topic is archived by a user who posted in it. Sep 20 2021, 3:17 PM

ppelberg updated the task description. Oct 21 2021, 3:21 PM

ppelberg merged a task: T274834: Make it so comments are related to one another in talk page data structure.

ppelberg edited parent tasks, added: T280895: Evaluate impact of notification interventions; removed: T286076: Implement topic subscription instrumentation. Nov 24 2021, 9:36 PM

ppelberg mentioned this in T296801: Introduce a data structure for tracking talk page comments. Dec 1 2021, 1:50 AM

ppelberg edited projects, added Editing-team; removed Editing-team (Kanban Board). Dec 11 2021, 1:59 AM

ppelberg moved this task from Untriaged to This Fiscal Year on the Editing-team board.

	F34495951: image.png
	Jun 11 2021, 10:07 PM

	F34495961: image.png
	Jun 11 2021, 10:07 PM

	F34495958: image.png
	Jun 11 2021, 10:07 PM
