Update per-domain uniques fresh-sessions computation
Closed, Resolved · Public · 3 Story Points

Description

Currently, for per-domain uniques, the fresh-sessions computation (named "offset") counts fingerprinted sessions that made:

  • 1 request with no cookies set (nocookie IS NOT NULL)
  • 0 requests with some cookies set (nocookie IS NULL).

This way of computing the offset undercounts fresh sessions.
Counting only devices that made 1 request with no cookies set (nocookie IS NOT NULL) is correct. However, additionally requiring that those devices made 0 other requests excludes devices whose "fresh" session includes more than 1 hit, which amounts to about 10% of the offset.
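To illustrate the difference, here is a minimal Python sketch of the two counting rules. It is not the refinery job itself: the function name, the `(fingerprint, nocookie)` tuple layout, and the reading of the corrected rule as "drop the 0-other-requests condition" are assumptions made for illustration only.

```python
from collections import defaultdict


def fresh_session_offsets(requests):
    """Return (old_offset, new_offset) for an iterable of requests.

    Each request is a hypothetical (fingerprint, nocookie) pair, where
    nocookie is None when the request carried cookies (nocookie IS NULL)
    and any other value when it did not (nocookie IS NOT NULL).
    """
    no_cookie_hits = defaultdict(int)  # requests without cookies, per fingerprint
    cookie_hits = defaultdict(int)     # requests with cookies, per fingerprint

    for fingerprint, nocookie in requests:
        if nocookie is not None:
            no_cookie_hits[fingerprint] += 1
        else:
            cookie_hits[fingerprint] += 1

    # Old rule: exactly 1 request without cookies AND 0 requests with cookies.
    old_offset = sum(
        1
        for fp, n in no_cookie_hits.items()
        if n == 1 and cookie_hits.get(fp, 0) == 0
    )
    # Assumed corrected rule: exactly 1 request without cookies,
    # regardless of how many further requests the session made.
    new_offset = sum(1 for n in no_cookie_hits.values() if n == 1)
    return old_offset, new_offset


sample = [
    ("a", 1),                            # single-hit fresh session
    ("b", 1), ("b", None), ("b", None),  # fresh session with follow-up hits
    ("c", None),                         # returning device, cookies already set
]
old_offset, new_offset = fresh_session_offsets(sample)
```

In this made-up sample, fingerprint "b" is missed by the old rule and picked up by the assumed corrected one, which is the kind of multi-hit fresh session the description refers to.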

Move to production:

  • Add a row in documentation about the change in this page
Restricted Application added a subscriber: Aklapper. Jun 5 2017, 8:26 AM
Nuria updated the task description. Jun 5 2017, 2:46 PM
Nuria set the point value for this task to 3. Jun 5 2017, 3:26 PM

Change 356823 had a related patch set uploaded (by Nuria; owner: Joal):
[analytics/refinery@master] Correct per-domain unique devices jobs

https://gerrit.wikimedia.org/r/356823

JAllemandou updated the task description. Jun 9 2017, 1:11 PM
JAllemandou updated the task description. Jun 12 2017, 3:57 PM
JAllemandou moved this task from Ready to Deploy to Done on the Analytics-Kanban board.
Milimetric triaged this task as Normal priority. Jun 22 2017, 3:09 PM
Nuria closed this task as Resolved. Jun 27 2017, 6:30 PM
Tbayer added a subscriber: Tbayer. May 29 2018, 8:57 AM

@JAllemandou Did the "about 10% of the offset" estimate in the task description refer to the daily metric?
For the monthly unique devices, the impact may have been much larger (looking at the total uniques_estimate - haven't examined the offset part separately yet):

(This is for enwiki, adding mobile and desktop, i.e. the number we have been tracking as a core metric in the monthly board metrics report until recently. There are other weird fluctuations here too, also when looking at mobile and desktop separately, which led to the conclusion that year-over-year comparisons for per-domain uniques should not be relied upon too much, certainly not if they span the point in time where this fix was implemented in June 2017.)

Nuria added a subscriber: Nuria. Edited Thu, Jul 5, 11:13 PM

The correction that this bug is about affects only the "offset" part of the unique devices calculation, not the under_estimate (see the definition of the fix: it only affects fresh sessions).

It seems that your graph above is plotting "underestimate+offset". To see the effect of this bug, the best way would be to graph the offset alone, before and after the correction, for per-domain uniques, daily or monthly.

The effect of the bugfix on the unique devices metric will depend on the percentage of the unique devices total that is derived from the offset.

The offset correction represents a higher percentage of uniques the longer the timespan is, so it is bigger for "monthly devices" than it is for "daily" ones.
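As a rough illustration of why the same offset correction moves the total more when the offset share is larger, here is a small Python sketch. It assumes only that the estimate is the sum "underestimate+offset" as mentioned above; the function name and all numbers are hypothetical and are not measured values.

```python
def relative_change_in_estimate(underestimate, old_offset, new_offset):
    """Relative change of (underestimate + offset) when only the offset changes."""
    old_total = underestimate + old_offset
    new_total = underestimate + new_offset
    return (new_total - old_total) / old_total


# Hypothetical case where the offset is a small share of the total.
small_share = relative_change_in_estimate(
    underestimate=900, old_offset=100, new_offset=110
)

# Hypothetical case where the offset is a large share of the total.
large_share = relative_change_in_estimate(
    underestimate=500, old_offset=500, new_offset=550
)
```

With these made-up inputs, the same 10% increase in the offset changes the total by about 1% in the first case and about 5% in the second.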

See:
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Unique_Devices/Last_access_solution#How_big_of_a_percentage_does_the_offset_represent_from_the_total?