
Bot from an Azure cloud cluster is causing a false pageview spike (can we identify as bot?)
Open, Low · Public · 8 Story Points

Description

This task is to deal with T136084 in the Hadoop pageview pipeline.

Right now it looks like we can either add a list of IPs whose requests are automatically classified with an agent_type of "bot" or "spider", or we can dig further and see if there's another way to uniquely identify these requests.
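To make the first option concrete, here is a minimal Python sketch of IP-list classification. It is illustrative only and is not the pipeline's actual code: the function name `classify_agent_type`, the `BOT_NETWORKS` list, and the default value "user" are assumptions, and the ranges shown are reserved documentation ranges, not the real Azure addresses.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges (RFC 5737),
# standing in for whatever ranges the offending cluster actually uses.
BOT_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def classify_agent_type(client_ip, current_agent_type="user"):
    """Return "spider" if client_ip falls in a listed network.

    Otherwise the existing classification is returned unchanged, so this
    check can only add bot tags and never removes one.
    """
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Unparseable IP: leave the existing classification alone.
        return current_agent_type
    if any(addr in net for net in BOT_NETWORKS):
        return "spider"
    return current_agent_type
```

A static list like this is simple but has to be maintained by hand, and it stops working as soon as the bot moves to addresses outside the listed ranges.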

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper. · Jun 9 2016, 6:04 PM
Nuria added a subscriber: Nuria. · Edited · Jul 21 2016, 4:44 PM

If we want to implement tagging as "possible bot" (or "bot") based on request ratio, we first need to research whether we would end up tagging a lot of mobile traffic because of NAT. The same UA and IP are common among mobile users who come from the same mobile provider.

Research spike: investigate the prevalence of IP+UA request patterns.
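As a rough illustration of what such a research spike could measure, the Python sketch below counts requests per (IP, UA) pair and returns the pairs above a threshold. The record fields `ip` and `user_agent`, the function name, and the threshold are all assumptions made for the example; the real analysis would presumably run against the pageview data in Hadoop rather than an in-memory list.

```python
from collections import Counter


def flag_high_volume_pairs(requests, threshold):
    """Count requests per (ip, user_agent) pair and return the pairs
    whose count is strictly greater than threshold.

    `requests` is an iterable of dicts with hypothetical keys
    "ip" and "user_agent".
    """
    counts = Counter((r["ip"], r["user_agent"]) for r in requests)
    return {pair: n for pair, n in counts.items() if n > threshold}


# Small made-up sample: one pair sends three requests, another sends one.
sample = [
    {"ip": "203.0.113.7", "user_agent": "ExampleBot/1.0"},
    {"ip": "203.0.113.7", "user_agent": "ExampleBot/1.0"},
    {"ip": "203.0.113.7", "user_agent": "ExampleBot/1.0"},
    {"ip": "192.0.2.10", "user_agent": "Mozilla/5.0"},
]
flagged = flag_high_volume_pairs(sample, threshold=2)
```

Looking at the distribution of these counts for known mobile carrier traffic would show how often a NAT-ed IP+UA pair exceeds any candidate threshold, which is the false-positive risk raised above.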

Nuria renamed this task from Bot from an Azure cloud cluster is causing a false pageview spike to Bot from an Azure cloud cluster is causing a false pageview spike (can we identify as bot?). · Jul 21 2016, 4:45 PM
Nuria set the point value for this task to 8.
Milimetric triaged this task as Low priority. · Jul 28 2016, 5:34 PM
Milimetric moved this task from Incoming to Backlog (Later) on the Analytics board.
Nuria moved this task from Wikistats Production to Dashiki on the Analytics board. · Jan 6 2017, 4:48 PM
Nuria moved this task from Dashiki to Backlog (Later) on the Analytics board. · Jun 15 2017, 4:40 PM
Nuria moved this task from Wikistats Production to Bots on the Analytics board. · Feb 5 2018, 5:45 PM
Nuria removed Milimetric as the assignee of this task. · Sep 3 2019, 2:26 PM