
Newcomer tasks: estimate impact of edit tags not being applied to all edits
Closed, Resolved (Public)

Description

As mentioned in the description of the parent task, we're interested in understanding the impact of these missing tags. We see a couple of approaches to understanding this:

  1. Estimate what percentage of edits have historically occurred under the circumstances that trigger the bug (edits to the first card). This can then be combined with counts of tagged edits from the Data Lake to estimate the actual number of edits.
  2. Use HomepageModule and EditAttemptStep to identify edits that fit these circumstances, but did not have the edit tag applied. From previous investigations, we know that this is challenging partly because HomepageModule is logged client-side and therefore gets blocked by ad blockers.

As mentioned in the parent task, this rolled out on 2020-09-15.

Investigating further, it appears the bug is triggered when the user clicks on the first task shown upon loading the Newcomer Homepage. In that case the URL of the page is incorrect, leading to the "newcomer task" tag not being applied.

I've found that if the user is already on the Homepage but takes an action that reloads the task list, then the URL is correct. There are three ways of making that happen: going through the module initialization process, changing the topics, or changing the difficulty of the tasks shown.

Event Timeline

nettrom_WMF triaged this task as High priority.

The logic for EditAttemptStep oversampling and change tags is a bit different: the former is applied to unbroken single-tab navigation sessions which start with clicking on a task card (on the homepage or the post-edit dialog) and only include pageviews to the task's article and its talk page. The latter is applied to any edit to the task article by the user within seven days of clicking on the card.

We could probably reconstruct the edits by taking the task impression or task click data (the latter is logically more accurate but click logging is probably more likely to get lost), which conveniently includes the user, the page and the task card position (the bug affects cards with position 0), and filtering for edits with the right user / page / time range. There is no way to recover edits by users with ad blockers, but IIRC that's a single-digit percentage.
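To make the reconstruction idea concrete, here is a rough sketch of the matching step. It assumes the click (or impression) events and the edits have already been pulled into memory. The record types, field names, and the function name are hypothetical, not the actual HomepageModule or revision schemas. The seven-day window and the position-0 filter come from the comments above. The sketch does not handle the cases where the task list was reloaded before the click, which would need to be excluded separately.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the thread: the tag applies to edits within seven days of the click,
# and the bug affects cards with position 0.
TAG_WINDOW = timedelta(days=7)
AFFECTED_POSITION = 0


@dataclass(frozen=True)
class TaskClick:  # hypothetical shape of a task click/impression event
    user_id: int
    page_id: int
    position: int
    clicked_at: datetime


@dataclass(frozen=True)
class Edit:  # hypothetical shape of an edit record
    rev_id: int
    user_id: int
    page_id: int
    saved_at: datetime
    has_newcomer_task_tag: bool


def find_untagged_task_edits(clicks, edits):
    """Return untagged edits that match a position-0 click by user, page, and time."""
    # Index the click times of affected cards by (user, page).
    affected = {}
    for click in clicks:
        if click.position == AFFECTED_POSITION:
            key = (click.user_id, click.page_id)
            affected.setdefault(key, []).append(click.clicked_at)

    missing = []
    for edit in edits:
        if edit.has_newcomer_task_tag:
            continue  # already tagged, nothing to recover
        click_times = affected.get((edit.user_id, edit.page_id), [])
        if any(t <= edit.saved_at <= t + TAG_WINDOW for t in click_times):
            missing.append(edit)
    return missing
```

Edits by users whose click events were never logged (for example because of ad blockers) have no matching click here, so they stay unrecovered, as noted above.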

Since I caused this, let me know if I can help in cleaning it up.

@Tgr: Thanks for chiming in here and volunteering to help out! Your suggested approach for reconstructing the edits is what I also came up with. I identified three conditions for exclusion: initializing the Newcomer Task module, changing topics, and changing difficulties. In all three cases the module is loaded/refreshed and the link is correct, so the tag would be applied.

Since I need to reference this in T264831, the notebook of the impact analysis is now on GitHub. Here's a quick summary:

Using data from mid-June through mid-September (excluding known test accounts) as the "pre-bug era" and looking at weekly patterns so they line up with Marshall's reporting, I find the median proportion of edits matching the conditions to be 26.3%. I choose the interquartile range as the lower and upper bounds, with the 25th percentile at 21.2% and the 75th percentile at 28.1%.
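For readers who want to see the shape of that weekly summary, a minimal sketch follows. It assumes one pair of counts per week (edits matching the bug conditions, all tagged edits). The function name and the input layout are hypothetical, and the choice of the "inclusive" quantile method is an assumption: the notebook may compute percentiles differently, which would shift the bounds slightly.

```python
from statistics import median, quantiles


def summarize_weekly_shares(weekly_counts):
    """Return (25th percentile, median, 75th percentile) of weekly shares.

    weekly_counts: iterable of (matching_edits, all_tagged_edits), one per week.
    Weeks with no tagged edits are skipped to avoid dividing by zero.
    """
    shares = [matching / total for matching, total in weekly_counts if total > 0]
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(shares, n=4, method="inclusive")
    return q1, median(shares), q3


# Made-up counts for illustration only, not the real pre-bug data.
example_weeks = [(20, 100), (22, 100), (26, 100), (28, 100), (30, 100)]
low, mid, high = summarize_weekly_shares(example_weeks)
print(f"25th: {low:.1%}, median: {mid:.1%}, 75th: {high:.1%}")
```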

The bug was in effect from September 15 to October 28. During that time I found that 7,380 tagged edits were made (again excluding known test accounts). Using the lower and upper bounds, I estimate that between 1,989 and 2,888 additional edits would have been tagged if the bug had not been there, for a total of between 9,369 and 10,268 edits.
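The arithmetic behind this kind of estimate can be sketched as follows. The sketch assumes the proportion p is the share of all task edits that match the bug conditions, so the tagged edits make up the remaining 1 - p and the missing count is tagged * p / (1 - p). That reading is an assumption (the notebook is the authoritative source) and the function name is hypothetical. With the rounded percentages quoted above it lands within a few edits of the reported figures, and the small gap is consistent with the notebook using unrounded percentiles.

```python
def estimate_missing_edits(tagged_edits, affected_share):
    """Estimate untagged edits, assuming affected_share is the share of ALL
    task edits that went untagged, so tagged_edits covers the other 1 - share."""
    if not 0 <= affected_share < 1:
        raise ValueError("affected_share must be in [0, 1)")
    return tagged_edits * affected_share / (1 - affected_share)


TAGGED_EDITS = 7380          # tagged edits during the bug period (from the thread)
LOWER_SHARE = 0.212          # 25th percentile, rounded as reported
UPPER_SHARE = 0.281          # 75th percentile, rounded as reported

for share in (LOWER_SHARE, UPPER_SHARE):
    missing = estimate_missing_edits(TAGGED_EDITS, share)
    total = TAGGED_EDITS + missing
    print(f"share {share:.1%}: about {missing:.0f} missing, {total:.0f} total")
```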

@nettrom_WMF -- thanks for figuring this out and giving a concrete estimate. Given that we are not missing the majority of edits, I do not think we actually need to reconstruct the missing edits; I think it's sufficient for us to note it on the reports (which I have done).