
[SPIKE] Determine what – if any – changes are needed to access comment data
Closed, Resolved, Public

Description

As part of T264885, we started storing information about comments posted to talk pages. In T280100, we documented how the data about these "comments" are being stored.

This task is about determining what changes, if any, ought to be made to how data about talk page comments are stored so that we can answer analysis questions like the ones listed below.

Analysis questions

This section contains the *kinds* of questions we will want to be able to answer with this new comment data. Some of the questions below come from T274215 and T262107.

  • "On average, how many comments are posted to sections on article talk pages, grouped by project and experience level?"
  • "On average, how much time elapses between when "User A" creates a new section on a talk page (article, user, or otherwise) and "User B" posts a comment in that same section?"
  • "On average, how many comments do Junior and Senior Contributors post to talk pages within a defined timeframe?"
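As an illustration only, here is a minimal Python sketch of how the first question might be computed once comment events are available. Every field name in it (`wiki`, `experience`, `namespace`, `topic_id`, `action`) is a hypothetical placeholder for a flattened export, not a confirmed schema field, and the sample events are invented. Namespace 1 is MediaWiki's article Talk namespace. Comments are bucketed by the commenter's own experience level, which is one of several reasonable readings of "grouped by experience level".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flattened events; field names and values are placeholders.
EVENTS = [
    {"wiki": "enwiki", "experience": "junior", "namespace": 1, "topic_id": "h-1", "action": "publish"},
    {"wiki": "enwiki", "experience": "junior", "namespace": 1, "topic_id": "h-1", "action": "publish"},
    {"wiki": "enwiki", "experience": "junior", "namespace": 1, "topic_id": "h-2", "action": "publish"},
    {"wiki": "enwiki", "experience": "senior", "namespace": 1, "topic_id": "h-1", "action": "publish"},
    # Namespace 3 is User talk, so this comment is excluded from the article-talk metric.
    {"wiki": "frwiki", "experience": "junior", "namespace": 3, "topic_id": "h-9", "action": "publish"},
]


def mean_comments_per_section(events):
    """Average number of published comments per section on article talk
    pages, keyed by (project, experience level)."""
    # (wiki, experience) -> topic_id -> number of comments in that section
    counts = defaultdict(lambda: defaultdict(int))
    for e in events:
        if e["action"] == "publish" and e["namespace"] == 1:
            counts[(e["wiki"], e["experience"])][e["topic_id"]] += 1
    return {group: mean(per_topic.values()) for group, per_topic in counts.items()}


print(mean_comments_per_section(EVENTS))
# → {('enwiki', 'junior'): 1.5, ('enwiki', 'senior'): 1}
```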

Instrumentation needs

TBD

Done

  • The "Instrumentation needs" section above contains either:
    • A) Specifications for how what was implemented in T264885 [and documented in T280100] needs to be adjusted in order to answer the questions listed under "Analysis questions" above, OR
    • B) Confirmation that no adjustments to what was implemented in T264885 [and documented in T280100] are needed in order to answer the questions listed under "Analysis questions" above

Event Timeline

ppelberg moved this task from Backlog to Triaged on the DiscussionTools board.
ppelberg moved this task from Backlog to Analytics on the Editing-team (Tracking) board.

Next Steps

  • @MNeisler to confirm whether the new talk_page_edit schema will enable us to answer the question below or whether additional instrumentation is needed.

"On average, how much time elapses between when "User A" creates a new section on a talk page (article, user, or otherwise) and "User B" posts a comment in that same section?"


@ppelberg - I've confirmed that the new talk_page_edit schema will enable us to answer this question.

For reference, I'm documenting the fields that would be used to answer this question below:

  • Distinct logged-in users ("User A" vs. "User B"): performer.user_id
  • Creates a new section:
    • action = publish: new section posted
    • comment_id: unique identifier of the comment that the user just posted
    • comment_parent_id: if this is a top-level comment, this will be the identifier of the heading (topic_id)
    • meta.dt: timestamp of the event
  • Posts a comment in that same section:
    • action = publish: comment published
    • comment_id: unique identifier of the comment that the user just posted
    • topic_id and comment_parent_id: used to identify whether the posted comment is in the same section (i.e., the new topic) that "User A" created
    • meta.dt: timestamp of the event
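The fields listed above could be combined roughly as in the minimal Python sketch below, which computes the average time to first reply. It assumes events have already been exported and flattened into dictionaries with the hypothetical keys `user_id` (for performer.user_id), `action`, `topic_id` and `dt` (for meta.dt), and it treats the earliest `publish` event for each `topic_id` as the one that created the section. Later comments by the section's creator are skipped so that only a different user ("User B") counts as a reply. The sample events are invented.

```python
from datetime import datetime
from statistics import mean
from typing import Optional

# Hypothetical flattened talk_page_edit events (sample data only).
SAMPLE_EVENTS = [
    {"user_id": 1, "action": "publish", "topic_id": "h-A", "dt": "2021-06-01T10:00:00Z"},  # User A creates section
    {"user_id": 1, "action": "publish", "topic_id": "h-A", "dt": "2021-06-01T10:05:00Z"},  # creator again: ignored
    {"user_id": 2, "action": "publish", "topic_id": "h-A", "dt": "2021-06-01T11:00:00Z"},  # first reply after 3600 s
    {"user_id": 3, "action": "publish", "topic_id": "h-B", "dt": "2021-06-02T09:00:00Z"},
    {"user_id": 1, "action": "publish", "topic_id": "h-B", "dt": "2021-06-02T09:30:00Z"},  # first reply after 1800 s
    {"user_id": 4, "action": "publish", "topic_id": "h-C", "dt": "2021-06-03T09:00:00Z"},  # no reply yet
]


def parse_dt(value: str) -> datetime:
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z",
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def mean_first_reply_seconds(events: list[dict]) -> Optional[float]:
    """Average seconds between a section's creation and the first comment
    posted in it by a different user; None if no section has a reply."""
    published = sorted(
        (e for e in events if e["action"] == "publish"),
        key=lambda e: parse_dt(e["dt"]),
    )
    created = {}      # topic_id -> (creator user_id, creation time)
    first_reply = {}  # topic_id -> seconds until first reply by another user
    for e in published:
        topic, when = e["topic_id"], parse_dt(e["dt"])
        if topic not in created:
            created[topic] = (e["user_id"], when)
        elif topic not in first_reply and e["user_id"] != created[topic][0]:
            first_reply[topic] = (when - created[topic][1]).total_seconds()
    return mean(first_reply.values()) if first_reply else None


print(mean_first_reply_seconds(SAMPLE_EVENTS))  # → 2700.0
```

Sections with no reply from another user are left out of the average rather than counted as zero; a real analysis would also need to decide how to treat them (for example, by reporting the share of unanswered sections separately).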

Further documentation of the talk_page_edit instrumentation is available in the schema repository and the instrumentation spec.


Excellent. Your confirmation above is the information we needed to remove this task as a blocker of T284848.

Resolving this, as topic subscriptions have been instrumented and the analysis this task refers to has been completed.