[SPIKE] Document talk page comment data structure
Closed, Resolved (Public)

Assigned To

Authored By

	• ppelberg
	Apr 14 2021, 1:33 AM

Description

As part of T264885, we started storing information about comments posted to talk pages.

This task is about documenting the following:

The information/metadata the software is now logging/storing about every comment posted to a talk page, regardless of the interface used to post said comments.
Where and how this "information"/"metadata" is being logged/stored

...so that we can determine what, if any, additional work is needed before we can use this newly logged data to answer questions like those listed in the "Use cases" section of T262107's task description.

Open questions

1. What information/metadata is the software now logging/storing about every comment posted to a talk page, regardless of the interface used to post said comments?
- See: T280100#7174055.
2. Where/how is this information/metadata being logged/stored?
- See: T280100#7174055.

Done

All "Open questions" are answered

Related Objects

Status	Assigned	Task
Open	None	T233447 [OKR] Increase response rates
Duplicate	MNeisler	T280895 Evaluate impact of notification interventions
Resolved	• ppelberg	T262107 Create a hidden revision tag for talk page comments
Resolved	matmarex	T280100 [SPIKE] Document talk page comment data structure

Event Timeline

• ppelberg created this task. Apr 14 2021, 1:33 AM

• ppelberg moved this task from Incoming to Ready to Be Worked On on the Editing-team (Kanban Board) board.

• ppelberg mentioned this in T262107: Create a hidden revision tag for talk page comments.

This is in relation to our expectation that we will soon generate an echo_event row for every comment posted.

• ppelberg updated the task description. Apr 16 2021, 3:58 PM

• ppelberg mentioned this in T280857: [SPIKE] Determine what – if any – changes are needed to access comment data. Apr 21 2021, 8:56 PM

• ppelberg moved this task from Backlog to Triaged on the DiscussionTools board. Apr 28 2021, 5:41 PM

• ppelberg mentioned this in T277349: [SPIKE] Determine what – if any – new instrumentation is needed for notifications. Jun 3 2021, 1:59 AM

matmarex claimed this task. Jun 23 2021, 2:08 PM

I wrote a documentation page that answers these questions (and others): https://www.mediawiki.org/wiki/Extension:DiscussionTools/How_it_works

Relevant fragment about data in Echo events:

Each event has the following properties:

(built-in in Echo) Page title

(built-in in Echo) Agent (user who caused the event, by leaving the comment)

(built-in in Echo) Section title

(built-in in Echo) Page revision

Subscription item name. (…)

New comment's ID and name. (…)

New comment's content, a snippet of which is shown in the notifications

List of users who were mentioned in the comment

This data is stored in one of Echo's database tables; however, only the title and agent can be queried directly. Everything else is in a serialized blob.

We generate an event for every new talk page comment, regardless of whether anyone is subscribed to the thread it's in.
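To illustrate why the blob limits querying, here is a minimal sketch of the shape described above. It is not Echo's actual schema or API: the StoredEvent class, the field names, and the use of JSON as the serialization format are all assumptions made for illustration only.

```python
import json
from dataclasses import dataclass


@dataclass
class StoredEvent:
    """Hypothetical stand-in for one stored event row (not Echo's real schema)."""
    page_title: str  # directly queryable, per the description above
    agent: str       # directly queryable, per the description above
    extra_blob: str  # everything else, serialized (JSON is an assumption here)


def store_event(page_title, agent, **extra):
    """Pack all remaining properties into a single serialized string."""
    return StoredEvent(page_title, agent, json.dumps(extra))


def find_by_agent(rows, agent):
    """Cheap: compares a plain column value, no deserialization needed."""
    return [row for row in rows if row.agent == agent]


def find_by_mention(rows, user):
    """Expensive: every blob must be deserialized before it can be inspected."""
    matches = []
    for row in rows:
        extra = json.loads(row.extra_blob)
        if user in extra.get("mentioned_users", []):
            matches.append(row)
    return matches
```

In a real database the first lookup could use an index on the column, while the second would need to read and decode every row, which is why questions about anything other than title and agent are impractical with this storage.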

Relevant fragment about data in our own data structures:

DiscussionTools recognizes two kinds of items: headings and comments. (…)

Each item has the following properties:

ID and name, which are used to identify the item in different contexts

Range, referencing the HTML DOM nodes where it was detected. The range may begin or end in the middle of an element, and may span multiple elements in different parent nodes.

Indentation level (always 0 for headings, 1 for top-level comments, 2+ for replies)

References to parent item and reply items

Comments additionally have:

Signature ranges, as above, referencing the HTML DOM nodes of signatures

Author name

Date and time

Headings additionally have: (…)

This data structure is ephemeral and not stored anywhere. When it's needed, it is constructed from scratch from the page HTML. (The information is encoded back into the HTML in the formatter though, as described below.)
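The item structure described above can be sketched roughly as follows. This is an illustration only, not the DiscussionTools implementation: the class and method names are assumptions, the DOM ranges and signature ranges are left out because they only make sense against a parsed page, and the elided heading-specific properties are not filled in.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class ThreadItem:
    """Properties shared by headings and comments (DOM range omitted)."""
    id: str
    name: str
    level: int = 0  # 0 for headings, 1 for top-level comments, 2+ for replies
    parent: Optional[ThreadItem] = None
    replies: list[ThreadItem] = field(default_factory=list)

    def add_reply(self, item: ThreadItem) -> ThreadItem:
        """Attach item as a reply, one indentation level deeper than self."""
        item.parent = self
        item.level = self.level + 1
        self.replies.append(item)
        return item


@dataclass(eq=False)
class HeadingItem(ThreadItem):
    """Heading-specific properties are elided in the source, so none are modeled."""


@dataclass(eq=False)
class CommentItem(ThreadItem):
    author: str = ""
    timestamp: str = ""  # date and time; the string format is an assumption


def count_comments(item: ThreadItem) -> int:
    """Count all comments beneath an item by walking the reply tree."""
    total = 0
    for reply in item.replies:
        if isinstance(reply, CommentItem):
            total += 1
        total += count_comments(reply)
    return total
```

Because the real structure is rebuilt from the page HTML each time, a tree like this only exists in memory for the duration of one request; answering aggregate questions would require persisting something equivalent.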

Given the above (Echo data not being queryable, and our own data not being stored at all), it probably can't be used to answer questions like in T262107, or at least not any interesting ones. But we have the data, and we could work on storing it in a different form to allow that.

In T280100#7174055, @matmarex wrote:

I wrote a documentation page that answers these questions (and others): https://www.mediawiki.org/wiki/Extension:DiscussionTools/How_it_works

Excellent, Bartosz.

Given the above (Echo data not being queryable, and our own data not being stored at all), it probably can't be used to answer questions like in T262107, or at least not any interesting ones. But we have the data, and we could work on storing it in a different form to allow that.

Understood.

For the time being, we'll consider T284200 the ticket where we'll spec the work required to store this data in a way that allows us to query/aggregate it.

• ppelberg reopened this task as Open. Jun 26 2021, 1:18 AM

• ppelberg updated the task description.

• ppelberg mentioned this in T284200: Implement a way to relate the components of a conversation.

• ppelberg closed this task as Resolved. Jun 26 2021, 1:22 AM