
Unique devices, retrofit with bot detection code
Closed, Resolved, Public

Description

Retrofit the unique devices computation with the bot detection code. The offset part of the metric can filter bots using the actor_signature UDF and the classification stored in actor_label.
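A minimal Python sketch of bot-filtered offset counting, assuming each request has already been tagged with an actor signature (the role of the actor_signature UDF) and a label (as in actor_label). The Request shape, its field names, and the label value "user" are hypothetical, and the offset is simplified to "actors with exactly one request per domain and country"; this is not the refinery implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    # Hypothetical record shape; field names are illustrative only.
    actor_signature: str  # fingerprint, standing in for the actor_signature UDF output
    actor_label: str      # classification, assumed to be "user" for non-bot actors
    domain: str
    country: str


def offset_by_domain_country(requests, drop_bots=True):
    """Count actors that made exactly one request, per (domain, country)."""
    if drop_bots:
        # Keep only requests whose actor is classified as a user (assumed label).
        requests = [r for r in requests if r.actor_label == "user"]
    # Number of requests made by each actor within each (domain, country).
    calls = Counter((r.domain, r.country, r.actor_signature) for r in requests)
    offset = Counter()
    for (domain, country, _actor), n in calls.items():
        if n == 1:  # single-call actors make up the (simplified) offset
            offset[(domain, country)] += 1
    return dict(offset)


reqs = [
    Request("a1", "user", "en.wikipedia", "FR"),
    Request("a2", "user", "en.wikipedia", "FR"),
    Request("a2", "user", "en.wikipedia", "FR"),
    Request("bot", "automated", "en.wikipedia", "FR"),
    Request("bot", "automated", "en.wikipedia", "FR"),
]
print(offset_by_domain_country(reqs))
print(offset_by_domain_country(reqs, drop_bots=False))
```

In this toy input the bot makes two calls, so dropping bots leaves the offset unchanged; only a single-call bot would move it.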

Event Timeline

Milimetric triaged this task as Medium priority (May 7 2020, 4:14 PM).
Milimetric moved this task from Incoming to Datasets on the Analytics board.
Milimetric added a subscriber: Milimetric.

Good to let the pageview detection bake for a bit before doing this.

Findings for a day of per-domain uniques, considering domain+country:

  • Removing bot traffic has no effect on the offset: the offset counts actors that made a single call, while bots are identified by their recurring calls
  • On uniques-global (offset + last-visit):
    • 99.5% of domain+country pairs vary by less than 1% when bots are removed
    • 0.09% of rows (69 out of 78,456) disappear; all of them were flagged as bots, and only one of them had more than one actor (24 actors)
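The comparison above can be reproduced with a small helper, sketched here in Python under the assumption that both runs (with and without bots) are available as mappings from (domain, country) to a positive unique-device count. The helper name and the input shape are hypothetical.

```python
def compare_runs(with_bots, without_bots, threshold=0.01):
    """Compare per-(domain, country) uniques computed with and without bots.

    Returns (share of keys whose count changes by less than `threshold`
    relative to the with-bots count, share of keys that disappear).
    Assumes every count in `with_bots` is positive.
    """
    total = len(with_bots)
    stable = 0
    disappeared = 0
    for key, before in with_bots.items():
        after = without_bots.get(key, 0)
        if after == 0:
            # Row vanishes entirely once bots are removed.
            disappeared += 1
        elif abs(before - after) / before < threshold:
            stable += 1
    return stable / total, disappeared / total


with_bots = {("a", "X"): 1000, ("b", "Y"): 200, ("c", "Z"): 24}
without_bots = {("a", "X"): 995, ("b", "Y"): 150}
print(compare_runs(with_bots, without_bots))
```

Applied to the numbers reported above, 69 disappearing rows out of 78,456 gives 69 / 78456 ≈ 0.00088, i.e. about 0.09%.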

Given the relatively small impact of removing bots and the relatively high computational cost, I question whether we should do it at all :)

Since March, our traffic in terms of devices has been much higher due to COVID.

(Attached screenshot: Screen Shot 2020-06-15 at 9.39.55 AM.png)

> Findings for a day of per-domain uniques, considering domain+country:

Are these findings for en.wikipedia?

@Nuria: we relabel traffic from user to bot in the pageview table only, not in webrequest. Uniques are then computed from webrequest data, since several PII fields are needed for fingerprinting and for computing the offset.
We could split the sources of computation, using webrequest for the offset and pageview for the underestimate (we'd need to push last-visit info to pageview, which is not complicated), but so far uniques have not changed at all.
The results above come from my recomputing one day of uniques with bots removed.
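The split suggested above (offset from webrequest, underestimate from pageview once last-visit info is pushed there) amounts to summing two per-key aggregates, since uniques-global is offset + last-visit. A hypothetical Python sketch, assuming each source has already been reduced to a mapping from (domain, country) to a count; the function and argument names are illustrative only.

```python
from collections import Counter


def combine_estimates(offset_from_webrequest, underestimate_from_pageview):
    """Unique-devices estimate per (domain, country): offset + underestimate.

    Keys present in only one source contribute that source's count alone.
    """
    total = Counter(underestimate_from_pageview)
    total.update(offset_from_webrequest)  # Counter.update adds counts per key
    return dict(total)


print(combine_estimates({("en", "FR"): 10, ("de", "DE"): 3}, {("en", "FR"): 100}))
```

Keeping the two aggregates separate until the final sum lets each come from whichever table carries the fields it needs.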

> we change from user to bots on pageview table only, not webrequest.

Ah right, we need to "join" to reduce the data we are combing through, yes.

Change 606233 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Update unique-devices jobs to use pageview_actor

https://gerrit.wikimedia.org/r/606233

Change 606233 merged by Joal:
[analytics/refinery@master] Update unique-devices jobs to use pageview_actor_hourly

https://gerrit.wikimedia.org/r/606233

Let's write docs before we close the ticket (cc @JAllemandou).