Differentiate between events emitted from the Reply Tool and the New Discussion Tool
Closed, Resolved (Public)

Description

This task is about implementing a way for Editing Engineering and Product Analytics to differentiate between events that are emitted from the Reply Tool and events that are emitted from the New Discussion Tool.

Background

Currently, EditAttemptStep contains one integration value for DiscussionTools features. This value is called discussiontools.

To date, we've been able to assume all integration-discussiontools events are emitted from the Reply Tool.

Trouble is, once the New Discussion Tool is deployed, it will no longer be clear which editor integration – the Reply Tool or the New Discussion Tool – is responsible for the integration-discussiontools events we will be seeing.

Implementation details

Use the EditAttemptStep schema's existing init_type field to differentiate between events emitted from the Reply Tool and the New Discussion Tool.

Once implemented, events from the Reply Tool and New Discussion Tool should be logged as follows:

  • Reply Tool events: event.action = 'init', event.integration = 'discussiontools', event.init_type = 'page'
  • New Discussion Tool events: event.action = 'init', event.integration = 'discussiontools', event.init_type = 'section'

The above is the outcome of what was discussed in T265099#6561327, T265099#6571916 and T265099#6604344.
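
As a rough illustration of the logging convention listed above, the following Python sketch classifies a single init event by its init_type. The field names (action, integration, init_type) and their values come from this task; the function name and the returned labels ("reply", "new_discussion") are hypothetical and chosen only for this example.

```python
from typing import Optional


def classify_init_event(event: dict) -> Optional[str]:
    """Return which DiscussionTools feature emitted an init event.

    Returns None for events that are not discussiontools init events,
    or whose init_type is neither 'page' nor 'section'.
    The labels 'reply' and 'new_discussion' are illustrative only.
    """
    if event.get("action") != "init":
        return None
    if event.get("integration") != "discussiontools":
        return None

    init_type = event.get("init_type")
    if init_type == "page":
        return "reply"  # Reply Tool
    if init_type == "section":
        return "new_discussion"  # New Discussion Tool
    return None
```

Note that this only labels the init event itself; attributing later events (for example saveSuccess) to a tool requires looking up the init event of the same session.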

Done

  • The changes detailed in the "Implementation details" section are implemented so that we (Editing and Product Analytics) are able to differentiate between events emitted from the Reply Tool and events emitted from the New Discussion Tool.

Event Timeline

One option would be to use the existing init_type field, which is currently being sent as page but could be sent as section for this case. Advantage there is that it's just one event to change, and doesn't require any schema change. Disadvantage is that querying "event X in a session whose init was of type Y" is a more complicated query. It's consistent with how we currently log this sort of difference in VE/wikieditor though.

The other option is to make a new integration value, discussiontools-newtopic or whatever. Advantage: easier queries. Disadvantage: needs a schema change, becomes a (slightly) more complex query for things which you want to ask about that're common between comment replies and new topics.
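
To make the "event X in a session whose init was of type Y" query from the first option more concrete, here is a minimal Python sketch over a list of event dictionaries. It assumes each event carries a session identifier under a key named editing_session_id; that key name, the function name, and the sample events are assumptions made for illustration and are not taken from this task.

```python
def events_in_sessions_with_init_type(events, init_type,
                                      session_key="editing_session_id"):
    """Return discussiontools events belonging to sessions whose init
    event had the given init_type ('page' or 'section').

    session_key is an assumed field name used to tie events to a session.
    """
    # Pass 1: find the sessions whose init event has the wanted init_type.
    matching_sessions = {
        e.get(session_key)
        for e in events
        if e.get("integration") == "discussiontools"
        and e.get("action") == "init"
        and e.get("init_type") == init_type
        and e.get(session_key) is not None
    }
    # Pass 2: keep every discussiontools event from those sessions.
    return [
        e for e in events
        if e.get("integration") == "discussiontools"
        and e.get(session_key) in matching_sessions
    ]


# Illustrative events only; not real logged data.
sample_events = [
    {"editing_session_id": "a", "integration": "discussiontools",
     "action": "init", "init_type": "page"},
    {"editing_session_id": "a", "integration": "discussiontools",
     "action": "saveSuccess"},
    {"editing_session_id": "b", "integration": "discussiontools",
     "action": "init", "init_type": "section"},
    {"editing_session_id": "b", "integration": "discussiontools",
     "action": "saveSuccess"},
]

new_topic_events = events_in_sessions_with_init_type(sample_events, "section")
```

The two passes are what make this option more involved than filtering on a dedicated integration value, which would need only a single filter.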

JTannerWMF added a subscriber: JTannerWMF.

@MNeisler , David needs you to weigh in on an approach

Thanks @DLynch for outlining the options and @JTannerWMF for the ping.

I'm leaning towards the option of using the existing init_type field. While it might be a slightly more complicated query, I think being consistent with how these events are currently stored in the schema will avoid confusion.

Can you confirm if the following is correct to make sure I understand how the two types of events will be logged?

  • Reply Events: All events in sessions where event.action = 'init', event.integration = 'discussiontools', event.init_type = 'page'
  • Discussion Tool Events: All events in sessions where event.action = 'init', event.integration = 'discussiontools', event.init_type = 'section'

If by "Discussion Tool Events" you mean "New discussion section events", then yes.

Yes, sorry I meant "New discussion section events". Thanks for confirming. Based on that, I think we should go ahead and use the existing init_type field for differentiating between the two event types.

Change 655548 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/DiscussionTools@master] Give new-section a specific init_type to distinguish it

https://gerrit.wikimedia.org/r/655548

Change 655548 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@master] Give new-section a specific init_type to distinguish it

https://gerrit.wikimedia.org/r/655548

@MNeisler, I'm assigning this over to you for data QA, assuming that this is necessary for this task.

If you do not think data QA is necessary, please assign the task back over to me to close out.

LGoto triaged this task as Medium priority. (Jan 26 2021, 6:06 PM)
MNeisler moved this task from Doing to Needs Sign-off on the Product-Analytics (Kanban) board.

I've reviewed the eventlogging data and confirmed we can differentiate between new discussion tool and reply tool events in EditAttemptStep.

Below are some data regarding new discussion tool events logged since the deployment of the change to the init_type field on Jan 12th. These are events that were recorded as event.action = 'init', event.integration = 'discussiontools', event.init_type = 'section'.

  • The first new discussion tool event was recorded on 21 January 2021.
  • Since then, a total of 24 new discussion tool init events have been recorded in EditAttemptStep (7 on cswiki, 1 on cswikinews, and 16 on enwiki).
  • A total of 6 of these new discussion events have been saved (met saveSuccess).

Please see QA notebook for further details regarding the queries and data.

@ppelberg - Reassigning to you for sign-off. Let me know if you have any questions.

I’ve created two new discussions with the tool on Wikidata so far (first, second). Is it normal that they don’t appear in the statistics?

@Tacsipacsi Our logging is "sampled", meaning we only log a fraction of the actions people take. It's generally sampled at 1/16 of sessions, so it's very plausible that you've just not been logged there. As you can see, the drawback to this arrangement is that in low-traffic situations we might not get a good representative sample, so it's certainly possible that we might want to increase our sample rate on less busy wikis.
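
For readers unfamiliar with session-level sampling, the sketch below shows one common way a 1-in-16 session sample can be drawn deterministically, so that either all or none of a session's events are logged. This is only an illustration of the general idea: the hashing scheme, the function name, and the parameter name are assumptions, and it is not a description of how the actual logging code decides.

```python
import hashlib


def session_is_sampled(session_id: str, rate_denominator: int = 16) -> bool:
    """Decide whether a session falls in a 1-in-rate_denominator sample.

    The decision depends only on the session id, so every event in a
    session gets the same answer. Hypothetical scheme for illustration.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, rate_denominator).
    bucket = int.from_bytes(digest[:8], "big") % rate_denominator
    return bucket == 0
```

With a scheme like this, a user who starts only a couple of sessions will usually not be sampled at all, which matches the situation described above. Setting rate_denominator to 1 corresponds to logging every session.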

Thank you, @MNeisler – no questions from me at this time related to this QA. Although, see below for a question in response to the point @DLynch raised in T265099#6784140.

...As you can see, the drawback to this arrangement is that in low-traffic situations we might not get a good representative sample, so it's certainly possible that we might want to increase our sample rate on less busy wikis.

  • @MNeisler: do you think it is necessary to do what David is describing above? I ask this assuming that people start new topics on talk pages significantly less frequently than they comment/reply in existing ones.

@MNeisler and I just talked about this. Megan agrees it would be worthwhile to do what @DLynch is describing: to increase the sampling rate from 1 out of 16 events to 100% of events.

Thinking: we implemented 1/5th sampling in T250086 for the Reply Tool and we assume the New Discussion Tool will be used less frequently than the Reply Tool, thus the assumed need for a higher sampling rate.

Notes

  • We should revisit this sampling rate before the New Discussion Tool is made available by default at larger Wikipedias.

Next steps

  • @ppelberg to file a ticket for increasing sampling rate in the New Discussion Tool to 100%

Just as a note, we don't currently have a convenient setting for oversampling the logging for just one part of DiscussionTool. It's all-features or nothing, at the moment.

Understood, David. I've filed a new ticket (linked below) where we can work out the specifics about how the oversampling should be adjusted.

See: T273946.