
Unique devices, retrofit with bot detection code
Closed, Resolved · Public

Description

Retrofit unique devices with the bot detection code: the offset part of the metric can filter out bots using the UDF actor_signature and its classification in actor_label.
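A minimal sketch of the filter described above, not the actual refinery job. It assumes each request carries an actor signature and an actor label as plain fields, that the offset counts actors seen exactly once, and that bots carry the label "automated"; that label value, the function name, and the sample data are all hypothetical.

```python
from collections import Counter

# Hypothetical label value: the real actor_label classes are not listed in this task.
BOT_LABEL = "automated"


def offset_count(requests, exclude_bots=True):
    """Count actors that made exactly one request.

    `requests` is an iterable of (actor_signature, actor_label) pairs,
    one pair per request. When `exclude_bots` is true, actors labelled
    as bots are left out of the count.
    """
    requests = list(requests)
    # Number of requests seen per actor signature.
    calls = Counter(signature for signature, _ in requests)
    # Label attached to each actor signature (assumed constant per actor).
    labels = {signature: label for signature, label in requests}
    return sum(
        1
        for signature, n in calls.items()
        if n == 1 and not (exclude_bots and labels[signature] == BOT_LABEL)
    )


# Made-up sample: two single-call users, one repeat user, one repeat bot.
sample = [
    ("a1", "user"),
    ("a2", "user"), ("a2", "user"),
    ("b1", "automated"), ("b1", "automated"), ("b1", "automated"),
    ("a3", "user"),
]
print(offset_count(sample))
```

In this sample the bot makes several calls, so it never enters the single-call count whether or not the filter is on.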

Event Timeline

Nuria created this task. Apr 20 2020, 7:27 PM
Restricted Application added a subscriber: Aklapper. Apr 20 2020, 7:27 PM
Nuria updated the task description. Apr 21 2020, 3:26 AM
Milimetric triaged this task as Medium priority. May 7 2020, 4:14 PM
Milimetric moved this task from Incoming to Datasets on the Analytics board.
Milimetric added a subscriber: Milimetric.

Good to let the pageview detection bake for a bit before doing this.

Nuria assigned this task to JAllemandou. Jun 2 2020, 4:14 PM
Nuria added a project: Analytics-Kanban.
Nuria moved this task from Datasets to Data Quality on the Analytics board. Jun 8 2020, 7:42 PM

Findings for a day of per-domain uniques, considering domain+country:

  • Removing bot traffic has no effect on the offset, since the offset counts actors that made a single call, while bots are identified by recurring calls
  • On uniques-global (offset+last-visit)
    • 99.5% of domain+country rows show less than 1% variability when bots are removed
    • 0.09% of rows (69 of 78456) disappear (all of them were flagged as bots; only 1 of those rows had more than 1 actor, precisely 24)

Now, given the relatively small impact of removing bots, and the relatively big computational cost, I question whether we should do it or not :)
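The comparison behind these findings can be sketched as follows. This is an illustration under assumptions, not the query that was actually run: the function name, the dictionary layout, and the sample domains and counts are hypothetical, and whether the 99.5% share was computed over all rows or only over surviving rows is not stated in this task (the sketch uses all rows).

```python
def compare_uniques(with_bots, without_bots):
    """Compare per-(domain, country) uniques before and after bot removal.

    Both arguments map (domain, country) -> unique-device estimate, and
    estimates in `with_bots` are assumed to be positive.
    Returns (share of rows changing by less than 1%, share of rows that disappear).
    """
    total = len(with_bots)
    stable = 0
    gone = 0
    for key, before in with_bots.items():
        after = without_bots.get(key)
        if after is None:
            # The row existed only because of bot traffic.
            gone += 1
        elif abs(before - after) / before < 0.01:
            stable += 1
    return stable / total, gone / total


# Made-up sample data, for illustration only.
with_bots = {
    ("en.wikipedia.org", "US"): 1000,
    ("fr.wikipedia.org", "FR"): 500,
    ("de.wikipedia.org", "DE"): 200,
    ("xx.wikipedia.org", "ZZ"): 24,
}
without_bots = {
    ("en.wikipedia.org", "US"): 998,
    ("fr.wikipedia.org", "FR"): 500,
    ("de.wikipedia.org", "DE"): 150,
}
stable_share, gone_share = compare_uniques(with_bots, without_bots)

# Row counts quoted in the findings: 69 disappearing rows out of 78456.
disappeared_pct = 69 / 78456 * 100
print(stable_share, gone_share, round(disappeared_pct, 2))
```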

Nuria added a comment. Jun 15 2020, 4:42 PM

Our traffic since March, in terms of devices, is a lot higher due to COVID.

Nuria added a comment. Jun 15 2020, 4:45 PM

Findings for a day of per-domain uniques, considering domain+country:

Are these findings for en.wikipedia?

@Nuria: we relabel traffic from user to bot on the pageview table only, not on webrequest. Uniques are computed from webrequest data, because various PII fields are needed for fingerprinting and for computing the offset.
We could split the sources of computation, using webrequest for offsets and pageview for the underestimate (we'd need to push last-visit info to pageview, which is not complicated), but so far uniques have not changed at all.
The results above come from me recomputing one day of uniques with bots removed.

Nuria added a comment. Jun 15 2020, 5:49 PM

we change from user to bots on pageview table only, not webrequest.

Ah right, we need to "join" to reduce the data we are combing through, yes.

Change 606233 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Update unique-devices jobs to use pageview_actor

https://gerrit.wikimedia.org/r/606233

Change 606233 merged by Joal:
[analytics/refinery@master] Update unique-devices jobs to use pageview_actor_hourly

https://gerrit.wikimedia.org/r/606233

JAllemandou set Final Story Points to 3. Jun 30 2020, 9:51 AM
Nuria added a comment. Jul 6 2020, 5:49 PM

Let's write docs before we close the ticket (cc @JAllemandou)

Nuria closed this task as Resolved. Jul 23 2020, 4:36 AM