
CX event instrumentation: basic topic selection
Closed, ResolvedPublic

Description

The following user actions are to be instrumented to understand how users interact with basic topic selection:

| screen | related to | user action | event_type | event_subtype | event_source(s) | event_context (str) |
| --- | --- | --- | --- | --- | --- | --- |
| dashboard home | custom suggestions | user selects a quick alternative to the selected filter: "For you" or "Popular" | dashboard_suggestion_filters_quick_select | - | yes, corresponding source from list below | |
| dashboard home | custom suggestions | users selects to view "---More" filters for translation suggestions | dashboard_suggestion_filters_view_more | - | - | - |
| Adjust suggestions | custom suggestions (single_selection) | user selects a suggestions filter | suggestion_filters_select | suggestion_filters_single_select | yes | name of the topic area or the collection selected (as a string) |
| Adjust suggestions | custom suggestions (single_selection) | user confirms the selected topic (i.e. user clicks "Done" button) | suggestion_filters_confirm | suggestion_filters_single_select_confirm | - | in case of single-select, this will just be a string |

The following adjustments have to be made to existing events:

| screen | related to | user action | event_type | event_subtype | event_source(s) | event_context (str) | contextual fields | notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| dashboard home -> translation start | general workflow | users selects an article from the suggestions and proceeds to translate | dashboard_translation_start | - | yes, corresponding source to be captured (from the list below) | in case of single-select, this will just be a string | translation_type | examples: "Women in science 2024", "South America" |
| dashboard home | general workflow | user opens the dashboard by directly accessing a URL with pre-selected filters | dashboard_open | - | suggestion_filter_direct_preselect | Filters active at the time of opening the dashboard, to be recorded as semi-colon separated strings | | examples: "related_edits; nearby_topics; art; india", "all_lists; fashion; food and drink", "related_edits; Women in science week" |

Exhaustive list of high-level suggestion filter groups (to be captured as event source):

  • suggestion_filter_previous_edits
  • suggestion_filter_topic_area
  • suggestion_filter_collections (T378958)
  • suggestion_filter_vital_articles (T374597)
  • suggestion_filter_search_result_seed (T369595)
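
As a rough illustration of the `dashboard_open` adjustment described above, the sketch below joins the filters that are active when the dashboard opens into a single semicolon-separated `event_context` string. The helper name `buildDashboardOpenEvent` and the shape of the returned object are assumptions made for this example, not the ContentTranslation implementation; the `"; "` separator simply follows the examples in the notes column.

```javascript
// Hypothetical helper for illustration; not part of ContentTranslation.
function buildDashboardOpenEvent(activeFilters) {
  // Trim each label and drop empty entries so the joined string stays clean.
  const labels = activeFilters
    .map((label) => String(label).trim())
    .filter((label) => label.length > 0);

  return {
    event_type: 'dashboard_open',
    // Source named in the task description for URLs with pre-selected filters.
    event_source: 'suggestion_filter_direct_preselect',
    // Assumption: "; " as separator, mirroring the examples in the table.
    event_context: labels.join('; ')
  };
}

const openEvent = buildDashboardOpenEvent(['related_edits', 'nearby_topics', 'art', 'india']);
console.log(openEvent.event_context);
```

The trimming step is only a defensive choice for the sketch; the task itself does not say how labels should be normalized.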

Spreadsheet for reference

Event Timeline

KCVelaga_WMF moved this task from Incoming to Engineering on the LPL Analytics board.

Just to clarify the following part from ticket description, the second table:

> The following adjustments have to be made to existing events

I would like to understand exactly what adjustments are needed. Is it what's stated in the event_context column for each row?

dashboard_translation_start -> For community lists or the topic areas, the list name or the topic area should be recorded as string in the event_context
dashboard_open -> Filters active at the time of opening the dashboard, to be recorded as semi-colon separated strings

Thanks

@eamedina

> dashboard_translation_start -> For community lists or the topic areas, the list name or the topic area should be recorded as string in the event_context
> dashboard_open -> Filters active at the time of opening the dashboard, to be recorded as semi-colon separated strings

Yes, that's right. And also, the new event sources should be added for the corresponding events.

@eamedina

A couple of changes:

  • Renamed the sources to use the suggestion_ prefix instead of suggest_, to be consistent with existing event sources.
  • Removed the sources related to lists, as those features are not present yet and the product terminology is still being finalized.

Change #1082504 had a related patch set uploaded (by Eamedina; author: Eamedina):

[mediawiki/extensions/ContentTranslation@master] [WIP] CX event instrumentation: basic topic selection

https://gerrit.wikimedia.org/r/1082504

@KCVelaga_WMF sorry for jumping in so late, but after coming to review the related patch, I found the new schema "fragment" to be very confusing, even misleading.

To start with I find the namings to not represent the actual events properly, as all of them are missing the "filter" keyword, which is the core of these events. For example, dashboard_suggestions_selection makes me think about an event related to selecting a suggestion, not a filter. Additionally, all the other event types use a verb (e.g. dashboard_translation_start, dashboard_open), so I also think that name is inconsistent.

Moreover, the event_context field is more suitable to the approach that Metrics Platform instruments use. With the current schema, we have defined a field for every single "context" field (e.g. translation_provider), so I believe that we can define specific fields for these events too. I would suggest to create a new translation_suggestion_filters field and use it for all the events related to this task. Given that all filters are uniquely described by a filter-id and a filter-type, this field ideally should be an array of objects like this:

translation_suggestion_filters: [{ suggestion_filter_type: 'automatic', suggestion_filter_id: 'popular' }, { suggestion_filter_type: 'topics', suggestion_filter_id: 'engineering' }],

but I'm not sure if array of objects are supported by the infrastructure. If not, we can explore other ways like: translation_suggestion_filters: [ "automatic;popular", "topics;engineering"], etc. I don't consider this implementation detail to be very important.
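
To make the two representations concrete, here is a minimal sketch that converts the array-of-objects form into the `"type;id"` string form and back. The function names `serializeFilter` and `parseFilter` are hypothetical and exist only for this example; nothing here reflects what the schema or the extension actually does.

```javascript
// Hypothetical helpers illustrating the "type;id" string form proposed above.
function serializeFilter(filter) {
  return `${filter.suggestion_filter_type};${filter.suggestion_filter_id}`;
}

function parseFilter(serialized) {
  // Split on the first ";" only, so an id containing ";" is kept whole.
  const separatorIndex = serialized.indexOf(';');
  if (separatorIndex === -1) {
    throw new Error(`Expected "type;id" but got: ${serialized}`);
  }
  return {
    suggestion_filter_type: serialized.slice(0, separatorIndex),
    suggestion_filter_id: serialized.slice(separatorIndex + 1)
  };
}

const filters = [
  { suggestion_filter_type: 'automatic', suggestion_filter_id: 'popular' },
  { suggestion_filter_type: 'topics', suggestion_filter_id: 'engineering' }
];

const serializedFilters = filters.map(serializeFilter);
console.log(serializedFilters);
console.log(serializedFilters.map(parseFilter));
```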

Given that the additions to the schema have not yet been merged and deployed (1.6.0 is still the current version of the schema), my suggested fragment would be:

| screen | related to | user action | event_type | event_subtype | event_source(s) | translation_suggestion_filters (str) |
| --- | --- | --- | --- | --- | --- | --- |
| dashboard home | custom suggestions | user selects a quick alternative to the selected filter: "For you" or "Popular" | dashboard_suggestion_filters_select | - (no need for event subtype as there is no other case where this event is logged) | (no need for event source, we can get the selected filter from the translation_suggestion_filters field) | automatic;popular, automatic;previous-edits, topics;africa, etc. Note: topic filters can also be present (and thus be selected) inside the quick filters menu, when they have been previously selected from the "More filters" dialog |
| dashboard home | custom suggestions | users selects to view "---More" filters for translation suggestions | dashboard_more_suggestion_filters | - | - | - |
| Adjust suggestions | custom suggestions (single_selection/multiple) | user selects (tentatively) any filter from the "More filters" dialog | suggestion_filters_select | - (no need for event_subtype, this is always a single selection as the event is logged after the user has clicked on one single filter option) | - | e.g. topics;south-america |
| Adjust suggestions | custom suggestions (single_selection) | user confirms the selected topic (i.e. user clicks "Done" button) | suggestion_filters_confirm | - (no need for event_subtype, we can determine if this is a single or multiple filter selection by the length of the translation_suggestion_filters field) | - | e.g. automatic;popular+topics;africa |

Finally, I'm also thinking that since we have a suggestion_filters_select event that is logged when users click on a filter (to temporarily select it), we may also need a suggestion_filters_deselect to log the opposite action, when user clicks on an already selected filter to deselect it.

Please let me know what your thoughts are about the above proposals.

@ngkountas thanks for the suggestions!

> To start with I find the namings to not represent the actual events properly, as all of them are missing the "filter" keyword, which is the core of these events. For example, dashboard_suggestions_selection makes me think about an event related to selecting a suggestion, not a filter. Additionally, all the other event types use a verb (e.g. dashboard_translation_start, dashboard_open), so I also think that name is inconsistent.

That makes sense. Thanks for pointing out that all other event types use a verb; that is something to remember and document in the schema README for the future. I will make this change along with the others as needed.

> Moreover, the event_context field is more suitable to the approach that Metrics Platform instruments use. With the current schema, we have defined a field for every single "context" field (e.g. translation_provider), so I believe that we can define specific fields for these events too.

That's a good point. However, I am leaning towards introducing the context field, to capture label(s) of the filter(s) selected. I will explain more about this below.

> translation_suggestion_filters: [{ suggestion_filter_type: 'automatic', suggestion_filter_id: 'popular' }, { suggestion_filter_type: 'topics', suggestion_filter_id: 'engineering' }], but I'm not sure if array of objects are supported by the infrastructure. If not, we can explore other ways like: translation_suggestion_filters: [ "automatic;popular", "topics;engineering"], etc. I don't consider this implementation detail to be very important.

I don't suggest using either of these approaches. Arrays are not well supported by Event Platform. The second approach makes analysis quite hard, because the value will end up as a string in the Hive table, and filtering on it is not straightforward. For example, it is much simpler and faster to filter events where users selected an article from collections if that is present in event_source, rather than having to split the filter string in a sub-query using REGEX, create a temporary view/column, and then use that to filter. As the event data is fairly large, this will likely cause issues with reporting on Superset, as the query timeout is currently set to 180 secs. Of course, we can set up a pipeline to process the data, but that adds an additional layer of complexity.
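
The same trade-off can be sketched outside SQL. The example below uses a handful of made-up in-memory rows (not real log data) and two hypothetical helper functions to contrast counting on a dedicated `event_source` field with counting on a packed filter string that first has to be split. It is only an analogy for the Hive/Superset case, under the assumption that packed values use `+` between filters and `;` between type and id, as in the proposed table.

```javascript
// Made-up sample rows for illustration only; these are not real events.
const separateFieldEvents = [
  { event_source: 'suggestion_filter_collections', event_context: 'Women in science 2024' },
  { event_source: 'suggestion_filter_topic_area', event_context: 'engineering' },
  { event_source: 'suggestion_filter_collections', event_context: 'South America' }
];

const packedEvents = [
  { translation_suggestion_filters: 'automatic;popular+topics;africa' },
  { translation_suggestion_filters: 'topics;engineering' },
  { translation_suggestion_filters: 'automatic;previous-edits' }
];

// With a dedicated field, counting is a plain equality check per row.
function countBySource(events, source) {
  return events.filter((event) => event.event_source === source).length;
}

// With a packed string, every row must be split and inspected first.
function countByPackedType(events, filterType) {
  return events.filter((event) =>
    event.translation_suggestion_filters
      .split('+')
      .some((pair) => pair.split(';')[0] === filterType)
  ).length;
}

console.log(countBySource(separateFieldEvents, 'suggestion_filter_collections'));
console.log(countByPackedType(packedEvents, 'topics'));
```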

I think we can use event_source for high level sources of various groups of suggestions, as that is a limited list (and this is an enum field). We have topic areas, related edits, vital articles etc. For measurements related to the hypothesis, and for future analysis work such as mapping the user funnel, there will be a more frequent need to aggregate by these high level sources. It is less important for us to capture whether a source is automatic or manually generated, as that can easily be differentiated while doing the analysis. If you think event_source might not be the best column, we can have another context column, say translation_suggestions_group, which can be used to capture the high level grouping such as popular topics, vital articles, topics, collections, and then another context field to capture the labels where applicable, such as engineering, Africa, WikiProject Science etc. But from an analysis standpoint, it is essential to capture these in two separate fields. We want to avoid data stuffing apart from simple strings.

I am slightly partial towards using event_context, instead of introducing a separate field specific to suggestions, to capture the label of the filters wherever applicable. The purpose is similar to what we have in Metrics Platform, which is to add additional context to the event subtype or source. Yes, in the schema currently, we have defined each context field separately, but I think having a generic context field that can be shared across multiple event types is better. However, I don't have a strong opinion about using a generic vs. a specific context field.

To summarize, it brings us to two approaches:

For high level grouping such as related_edits, topic_areas, collections, vital articles etc.
A. have a new context field: translation_suggestion_group (enum)
B. use event_source (enum)

For further selected specific topics or lists like engineering, Africa, WikiProject Science etc.
A. have a new specific context field: translation_suggestion_filter_labels (str)
B. have a generic context field: event_context (str)
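
For reference, option B in both cases could look roughly like the sketch below: the high-level group goes into `event_source`, checked against a fixed list, and the specific label goes into `event_context` as a plain string. The list is copied from the task description and may not match the deployed schema; `buildFilterSelectEvent` is a hypothetical helper name used only for this illustration.

```javascript
// High-level groups copied from the task description; the deployed schema may differ.
const SUGGESTION_FILTER_SOURCES = Object.freeze([
  'suggestion_filter_previous_edits',
  'suggestion_filter_topic_area',
  'suggestion_filter_collections',
  'suggestion_filter_vital_articles',
  'suggestion_filter_search_result_seed'
]);

// Hypothetical helper: group in event_source (enum), label in event_context (string).
function buildFilterSelectEvent(source, label) {
  if (!SUGGESTION_FILTER_SOURCES.includes(source)) {
    throw new RangeError(`Unknown event_source: ${source}`);
  }
  return {
    event_type: 'suggestion_filters_select',
    event_source: source,
    event_context: String(label)
  };
}

const selectEvent = buildFilterSelectEvent('suggestion_filter_topic_area', 'engineering');
console.log(selectEvent);
```

Keeping the two values in separate fields means an aggregation by group never needs to look inside `event_context`.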

> event_type: dashboard_suggestion_filters_select
> event_subtype: no need for event subtype as there is no other case where this event is logged

Sounds good to me.

> event_type: suggestion_filters_select
> event_subtype: no need for event_subtype, this is always a single selection as the event is logged after the user has clicked on one single filter option

Sounds good to me.

> event_type: suggestion_filters_confirm
> event_subtype: no need for event_subtype, we can determine if this is a single or multiple filter selection by the length of the translation_suggestion_filters field

In this case, it is better to have an event subtype to differentiate between single select and multi select. Not having that will lead to a problem similar to the one I explained above: having to split the string, count the parts, and create a temporary column in SQL to differentiate between single vs. multi, instead of simply using the event_subtype for aggregations.
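
A minimal sketch of what this means on the logging side: the subtype is decided once, when the event is built, so the analysis never has to count filters inside a string. `suggestion_filters_single_select_confirm` is the subtype from the task description; the multi-select name and the helper `buildFilterConfirmEvent` are hypothetical and used only for this example.

```javascript
// Subtype from the task description.
const SINGLE_SELECT_CONFIRM = 'suggestion_filters_single_select_confirm';
// Hypothetical name; the thread does not specify a multi-select subtype.
const MULTI_SELECT_CONFIRM = 'suggestion_filters_multi_select_confirm';

// Hypothetical helper: picks the subtype from the number of confirmed filters.
function buildFilterConfirmEvent(selectedFilterCount) {
  if (!Number.isInteger(selectedFilterCount) || selectedFilterCount < 1) {
    throw new RangeError('selectedFilterCount must be a positive integer');
  }
  return {
    event_type: 'suggestion_filters_confirm',
    event_subtype: selectedFilterCount === 1 ? SINGLE_SELECT_CONFIRM : MULTI_SELECT_CONFIRM
  };
}

console.log(buildFilterConfirmEvent(1).event_subtype);
console.log(buildFilterConfirmEvent(3).event_subtype);
```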

> Finally, I'm also thinking that since we have a suggestion_filters_select event that is logged when users click on a filter (to temporarily select it), we may also need a suggestion_filters_deselect to log the opposite action, when user clicks on an already selected filter to deselect it.

Good point. This will be more essential in multi-select mode.

> Given that the additions to the schema have not yet been merged and deployed (1.6.0 is still the current version of the schema)

The changes were merged, and it is now 1.7.0. As we didn't start using it, it should be fine to revert the change.

As this task is for the instrumentation work, but not the schema itself, I have summarized the changes at T373785#10329768.

Thanks KC and Nik, I have updated the description on this phabricator ticket to reflect the new changes, feel free to review and update as needed. I believe the instrumentation related to selecting multiple filters will have to wait for T369268, but leaving as is for now, we can move that part of instrumentation to a separate ticket later.

Expanding on my last comment above, for example, the event type suggestion_filters_deselect seems like another candidate to include after T369268 because 'de-selecting' a filter is not really possible with the current code (unless I'm missing something). Does it make sense to extract everything regarding instrumentation that depends on multiple filters selection into another separate ticket? Happy to create it myself if confirmed.

Yes, the real use case for suggestion_filters_deselect will only be in the multi-select mode.

Created T380538 for multi-selection instrumentation

eamedina updated the task description. (Show Details)

Change #1082504 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] CX event instrumentation: basic topic selection

https://gerrit.wikimedia.org/r/1082504

Change #1094472 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] CX3 Build 0.2.0+20241122

https://gerrit.wikimedia.org/r/1094472

Change #1094472 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] CX3 Build 0.2.0+20241125

https://gerrit.wikimedia.org/r/1094472

SBisson subscribed.

This has been deployed with the train this week.

@KCVelaga_WMF you can now verify that the events are correct.

QA done! Everything looks fine.

dashboard_suggestion_filters_quick_select & dashboard_suggestion_filters_view_more

suggestion_filters_select & suggestion_filters_confirm

event sources dashboard open events

direct pre-select filter

From the logs:

| event_type | event_count |
| --- | --- |
| dashboard_suggestion_filters_view_more | 13 |
| dashboard_suggestion_filters_quick_select | 10 |
| suggestion_filters_select | 8 |
| suggestion_filters_confirm | 4 |

Event sources for dashboard_suggestion_filters_quick_select

| event_source | event_count |
| --- | --- |
| suggestion_filter_previous_edits | 5 |
| suggestion_filter_popular_articles | 5 |

Event sources for suggestion_filters_select

| event_source | event_count |
| --- | --- |
| suggestion_filter_collections | 3 |
| suggestion_filter_topic_area | 3 |
| suggestion_filter_popular_articles | 1 |
| suggestion_filter_previous_edits | 1 |

Event context for suggestion_filters_select

| event_context | event_count |
| --- | --- |
| tv-and-film | 1 |
| biography | 1 |
| asia | 1 |

Event context for suggestion_filters_confirm

| event_context | event_count |
| --- | --- |
| architecture | 1 |
| previous-edits | 1 |
| women | 1 |
| collections | 1 |

Event sources for dashboard_translation_start

| event_type | event_count |
| --- | --- |
| previous-edits | 35 |
| popular | 3 |
| computers-and-internet | 1 |
| food-and-drink | 1 |

I realized that I missed including an event for user closing the suggestion filters menu (instead of confirm). Is it possible to do it as part of this? If not, I can create a separate ticket. However, this is a low priority as it can be inferred. The event would be

event_type - suggestion_filters_close

> I realized that I missed including an event for user closing the suggestion filters menu (instead of confirm). Is it possible to do it as part of this? If not, I can create a separate ticket. However, this is a low priority as it can be inferred. The event would be
>
> event_type - suggestion_filters_close

Good catch @KCVelaga_WMF, I think it makes sense to include it as part of this ticket. However I believe that event type was not included in the last schema update, could you please create a new MR to include it? I can instrument the code afterwards.

I'm also okay with creating a new ticket if it makes sense to move this one along.

> Good catch @KCVelaga_WMF, I think it makes sense to include it as part of this ticket. However I believe that event type was not included in the last schema update, could you please create a new MR to include it? I can instrument the code afterwards.

I actually already created another ticket for that, T381270, and an MR to go with it.