Newcomer tasks: topic matching instrumentation
Open, Needs Triage (Public)

Description

The newcomer tasks feature is already instrumented, per the work done in T230068: Newcomer tasks: instrumentation. In adding topic matching to the workflow, we will need to extend the existing instrumentation. Since this is a modification to an existing instrumentation effort and not a whole new effort, we'll try to use this one task to track all the steps.

  • Changes to EventLogging schema
  • Code for instrumentation
  • Post-deployment QA by data analyst
  • Publish changes to measurement plan

The specifications are recorded in the existing measurement plan for newcomer tasks at these headers:

  • Measurements: the numbers we want to be able to produce
  • Rules: specific interactions to be recorded. These are also pasted below for convenience.

The measurements and rules were written with the use case of ORES models in mind -- not with the interim "morelike" approach. These should be pretty compatible in terms of instrumentation, but if they're not, we should discuss on this task.

Additional rules for version 1.1 (topic matching)

  • Intro overlay: version 1.1 will evolve the topic overlay from purely informative to including options to set topics of interest.
    • We want to record the topics the user has chosen once they advance beyond this overlay. We should also record which topics the user chose from above the “show more” link, and which they chose from below the “show more” link. Individual clicks on topics do not need to be recorded, and no record of topics chosen needs to be made if the user cancels the overlay.
    • We should record as a separate event that the user clicked to “show more” in the overlay.
  • Topic filters:
    • topic open: the user clicks to open the filter selection to change topics. We record the topic selection upon open.
    • topic done: the user clicks “done” on the filter selection to change topics. We record the topic selection and the number of matching suggestions. We should also record which topics the user chose from above the “show more” link, and which they chose from below the “show more” link.
    • topic close: the user clicks “cancel” on the filter selection to change topics. We record the topic selection and the number of matching suggestions.
    • We should record as a separate event that the user clicked to “show more” in the overlay.
  • Task impression and click: in addition to the data being recorded for these events for version 1.0, we should add the following elements.
    • topic assignment: the topic associated with that task. This should be determined as the highest-scoring topic that the user has selected a filter for. For example, if an article scores 0.96 for Physics and 0.83 for Chemistry, and the user has selected “Chemistry” in their topic filters, then we should record Chemistry. If the user has selected “Physics” and “Chemistry”, we should record Physics.
    • match score: any sort of match confidence or score available with the topic.
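The topic assignment rule above can be sketched as follows. This is only an illustration of the rule as written, not the extension's code: the function name `assign_topic` and the shape of the inputs (a topic-to-score mapping and a set of selected topics) are assumptions made for the example.

```python
from typing import Mapping, Optional, Set, Tuple


def assign_topic(
    scores: Mapping[str, float], selected: Set[str]
) -> Optional[Tuple[str, float]]:
    """Return the highest-scoring topic among the user's selected filters.

    `scores` maps topic name to the article's score for that topic
    (hypothetical input shape); `selected` is the set of topics the user
    has filtered on. Returns None when no selected topic has a score.
    """
    # Keep only the topics the user has actually selected.
    candidates = {t: s for t, s in scores.items() if t in selected}
    if not candidates:
        return None
    # Pick the selected topic with the highest score.
    topic = max(candidates, key=candidates.get)
    return topic, candidates[topic]


# The example from the rule: Physics 0.96, Chemistry 0.83.
article_scores = {"Physics": 0.96, "Chemistry": 0.83}
print(assign_topic(article_scores, {"Chemistry"}))
print(assign_topic(article_scores, {"Physics", "Chemistry"}))
```

Returning the score together with the topic keeps the "match score" element next to the topic it belongs to.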

Event Timeline

Tgr added a comment.Jan 7 2020, 1:06 AM

Note that we won't have the topics associated with a task with the morelike backend. We'll have a match score, but it will be a single score that combines likeness to all the topics the user has selected (and possibly the task types as well; not sure how the hastemplate: keyword is implemented).

@Tgr -- okay, then I guess we'll need to leave parts of the schema blank for records during the morelike era.

Tgr added a comment.EditedJan 11 2020, 1:52 AM

So we need new se-cta-more-click and se-topicfilter-more-click events for clicking the more button on the topic list (in the initiation dialog and the topic filter, respectively), plus se-topicfilter-open, se-topicfilter-done, and se-topicfilter-cancel events to match the similar task filter events.

We record the task type set for every suggested-edits-related event, so it would make sense to always record the topic set as well, even though the rule specification doesn't explicitly say that. (The result count is also always recorded, so we don't need to do anything about that.) There is not much point in always reporting the above / below the fold flag though, so that should probably be in the action data for se-topicfilter-more-click and se-topicfilter-done, maybe as the set of above-the-fold topics (a bit more verbose than just storing the position of the fold, but a bit easier to handle when doing analytics).

The "Task impression and click" item is left for the future (the current backend can't return topics, so we don't have any handling for them in either the backend or the JS code).

no record of topics chosen needs to be made if the user cancels the overlay

What should be done if the user clicks continue (proceeds to the difficulty step) and cancels there? Should we delay logging until the user saves the whole dialog, so we can avoid recording topics which are not ultimately saved?

@Tgr -- good question. I think we can do whatever is easier. There is nothing wrong with recording the topics if the user passes the first overlay but doesn't pass the second. We just don't intend to look at them, so I thought it's a place we might save some work and some space. But if it's easiest to record topics just from passing the first overlay, that's fine.

Tgr added a comment.Jan 11 2020, 2:10 AM

I guess we have to delay saving the user preferences anyway, so it's no extra effort to do the same with the logging.
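A minimal sketch of what delaying the logging until the whole dialog is saved could look like. Everything here is hypothetical: the `DeferredLogger` class, the `send` callback standing in for the real logging transport, and the `topics-selected` action name are illustrations, not names from the extension or the schema.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, Dict[str, str]]


class DeferredLogger:
    """Hold events until the dialog is saved; drop them on cancel."""

    def __init__(self, send: Callable[[str, Dict[str, str]], None]) -> None:
        self._send = send  # hypothetical transport for the real logger
        self._pending: List[Event] = []

    def record(self, action: str, data: Dict[str, str]) -> None:
        # Nothing is sent yet; the event just waits for save() or cancel().
        self._pending.append((action, data))

    def save(self) -> None:
        # The user saved the whole dialog: flush everything in order.
        for action, data in self._pending:
            self._send(action, data)
        self._pending.clear()

    def cancel(self) -> None:
        # The user cancelled at any step: discard the buffered events.
        self._pending.clear()


sent: List[Event] = []
logger = DeferredLogger(lambda action, data: sent.append((action, data)))

# The user picks topics, continues to the difficulty step, then cancels there.
logger.record("topics-selected", {"topics": "history,sports"})
logger.cancel()
sent_after_cancel = list(sent)

# A second attempt that is saved at the end.
logger.record("topics-selected", {"topics": "history"})
logger.save()
print(sent)
```

With this shape, topics chosen in the first overlay are never recorded if the user cancels at the second step, which matches the intent that only ultimately saved topics need logging.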

Tgr claimed this task.Jan 12 2020, 8:07 PM
Tgr added a comment.EditedJan 13 2020, 2:51 AM

Schema changes: diff

Change 563972 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: Topic matching instrumentation

https://gerrit.wikimedia.org/r/563972

Change 563972 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: Topic matching instrumentation

https://gerrit.wikimedia.org/r/563972

Back to in progress - after the change in T242560: Newcomer tasks: task suggestions fail because of search queries exceeding length limits, we can actually log per-task topics / match scores.

Tgr added a comment.Jan 14 2020, 12:22 AM

Back to in progress - after the change in T242560: Newcomer tasks: task suggestions fail because of search queries exceeding length limits, we can actually log per-task topics / match scores.

That is the "Task impression and click" top-level bullet point item in the task description. Everything else should be working.

@Tgr -- I did a little testing today, and I think everything looks good so far. My only question: I see that we record all the topics selected and, separately, the list of those that are above the fold. There is no separate list for those that are below the fold. Is that deliberate, because we'll be able to deduce it from the two lists? If so, that sounds fine. Let us know when the final instrumentation piece is in place!

Yeah, I figured it's easy to get that via negative conditions, so we might as well spare the storage space / bandwidth (while sparing even more by only storing the cut position within the topic array, or something like that, would be annoying when doing analytics).
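For analysis, the below-the-fold topics can be recovered from the two recorded lists with a simple difference. The sketch below assumes both lists are already available as Python lists of topic names; the function and variable names are made up for the example.

```python
from typing import List


def below_fold_topics(selected: List[str], above_fold: List[str]) -> List[str]:
    """Return the selected topics that are not in the above-the-fold list.

    The order of `selected` is preserved.
    """
    above = set(above_fold)
    return [topic for topic in selected if topic not in above]


# Hypothetical values for one event.
selected_topics = ["history", "sports", "physics"]
above_fold_selected = ["history", "sports"]
print(below_fold_topics(selected_topics, above_fold_selected))
```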

The last instrumentation patch is now up for review.

Tgr added a comment.Jan 14 2020, 11:43 PM

Schema diff for the last patch.

Note that scores provided via the remote task suggester are completely fake. There doesn't seem to be any way to get scores via the core search API (I'll file a task about maybe changing that eventually), so I figured it's better for testing if we log some score even if it is made up. Production will have real scores.

kostajh updated the task description.Jan 15 2020, 11:15 AM

Checked in cswiki wmf.15 after the hidden preference deployment - the newly added EventLogging events are present (checked in the Console).

@Tgr -- I tested out EventLogging yesterday in production by clicking around, and then looked to see the corresponding events in the data lake today. I see the following two issues. The first one should be addressed before releasing to users, if possible.

There is no indication in the action_data of what topic is related to the article being shown or clicked. In other words, I don't see any data related to this specification:

  • Task impression and click: in addition to the data being recorded for these events for version 1.0, we should add the following elements.
    • topic assignment: the topic associated with that task. This should be determined as the highest-scoring topic that the user has selected a filter for. For example, if an article scores 0.96 for Physics and 0.83 for Chemistry, and the user has selected “Chemistry” in their topic filters, then we should record Chemistry. If the user has selected “Physics” and “Chemistry”, we should record Physics.
    • match score: any sort of match confidence or score available with the topic.

Here is an example of the action_data from a se-task-impression event:

taskTypes=copyedit,links;topics=history,sports;taskCount=200;taskType=copyedit;maintenanceTemplates=Kdo?;hasImage=true;ordinalPosition=4;pageviews=3127;pageTitle=Ostravar Aréna;pageId=175411;revisionId=18023234
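A string like the one above can be split into fields for inspection. The sketch below assumes the format is `key=value` pairs separated by semicolons, with commas separating list items, and that no value contains a semicolon; which keys hold lists (`LIST_KEYS`) is also an assumption based on this one example.

```python
from typing import Dict, List, Union

# Assumption: these keys hold comma-separated lists; all others are scalars.
LIST_KEYS = {"taskTypes", "topics"}


def parse_action_data(raw: str) -> Dict[str, Union[str, List[str]]]:
    """Parse a 'key=value;key=value' action_data string into a dict."""
    parsed: Dict[str, Union[str, List[str]]] = {}
    for pair in raw.split(";"):
        if not pair:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = pair.partition("=")
        parsed[key] = value.split(",") if key in LIST_KEYS else value
    return parsed


raw = (
    "taskTypes=copyedit,links;topics=history,sports;taskCount=200;"
    "taskType=copyedit;maintenanceTemplates=Kdo?;hasImage=true;"
    "ordinalPosition=4;pageviews=3127;pageTitle=Ostravar Aréna;"
    "pageId=175411;revisionId=18023234"
)
data = parse_action_data(raw)

# Only the user's topic filter set is present; there is no per-task topic
# or match score field in this event.
print([key for key in data if "topic" in key.lower()])
```

Parsed this way, the event shows the user's selected topics but nothing that ties the displayed article to a single topic or score.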

There should not be a close event logged when the user activates the suggested edits module on mobile

Right now, when the user clicks the button on the second overlay to initiate the suggested edits module, there is a close event logged for the start module. That event shouldn't happen because the user didn't really close the start module, although it did disappear in favor of the suggested edits module.

Tgr added a comment.Jan 17 2020, 9:40 PM

The patch for logging topic / match score is not in production yet; it was merged on Wednesday. I'll look into the other one.

@Tgr -- okay, I just talked to @Catrope about wanting to SWAT this on Tuesday, because we don't want there to be a few days of the topic era that don't have this element as part of their events.

Change 565697 had a related patch set uploaded (by Catrope; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.35.0-wmf.15] Handle and log task topics and scores

https://gerrit.wikimedia.org/r/565697

@Tgr -- okay, I just talked to @Catrope about wanting to SWAT this on Tuesday, because we don't want there to be a few days of the topic era that don't have this element as part of their events.

Scheduled for the 4pm PST SWAT window on Tuesday (the same window as the main topic matching deployment)

@Tgr - was it intentional for hover-in/hover-out events associated with SE dialogs to have additional action_data?

 select event_module, event_action, event_action_data from HomepageModule_19528052 where event_action like 'hover%' group by event_module;
+-----------------+--------------+----------------------------------------+
| event_module    | event_action | event_action_data                      |
+-----------------+--------------+----------------------------------------+
| impact          | hover-out    |                                        |
| start-account   | hover-in     |                                        |
| start-email     | hover-in     |                                        |
| suggested-edits | hover-in     | taskTypes=copyedit,links;taskCount=200 |
+-----------------+--------------+----------------------------------------+
Tgr added a comment.Jan 17 2020, 11:53 PM

Yeah, that data is needed for a lot of actions so it was easier to add it to everything.

Change 565714 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Do not log a close event on suggested edits initiation

https://gerrit.wikimedia.org/r/565714

Change 565714 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Do not log a close event on suggested edits initiation

https://gerrit.wikimedia.org/r/565714

Change 565697 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@wmf/1.35.0-wmf.15] Handle and log task topics and scores

https://gerrit.wikimedia.org/r/565697

Mentioned in SAL (#wikimedia-operations) [2020-01-22T00:26:53Z] <catrope@deploy1001> Synchronized php-1.35.0-wmf.15/extensions/GrowthExperiments/: SWAT for T242811, T242052 (duration: 01m 05s)

Tgr added a comment.Feb 10 2020, 7:27 PM

Note that we currently log the score the search engine assigns to the result. That won't be the same as the ORES score (we are using the classic_noboostlinks scoring profile, which combines relevance and recency), so if we want to log exactly that, we'll probably need significant extra work (enough for a dedicated task), since the search API can't return it out of the box. (Either we need to fetch the CirrusSearch document manually, or maybe we can hook into the search logic in some way and make it return that field as well.)

Tgr added a comment.Feb 11 2020, 12:48 AM

we'll probably need significant extra work (enough for a dedicated task)

Which exists already, I just forgot about it: T243478: Newcomer tasks: fetch ElasticSearch data for search results

@Etonkovidova -- can you remind us why this moved from QA to In Progress?