eventlogging user agent data should be parsed so spiders can be easily identified {flea}
Closed, Duplicate (Public)

Description

This is a request from the Reading team.

We could pre-parse the UA data using the ua-parser Python library.

The original request was to have an is_spider column, similar to our Hive parsing.
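
A minimal sketch of what such an is_spider flag could look like, assuming a simple regex heuristic over the raw user agent string. The pattern and the function name `is_spider` are illustrative assumptions, not the rules used by the Hive parsing, and the standard `re` module stands in for the ua-parser library so the example has no dependencies.

```python
import re

# Hypothetical pattern covering a few common bot markers.
# This is an assumption for illustration, NOT the production rule set.
SPIDER_PATTERN = re.compile(r"bot|crawler|spider|https?://", re.IGNORECASE)


def is_spider(user_agent):
    """Return True if the raw user agent string looks like a spider."""
    if not user_agent:
        # Missing or empty user agent: nothing to match, so it is not
        # flagged in this sketch (a real pipeline may decide otherwise).
        return False
    return SPIDER_PATTERN.search(user_agent) is not None
```

A heuristic like this will misclassify some clients (any user agent that happens to contain "bot", for example), which is one reason to prefer a maintained rule set over an ad hoc pattern.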

Event Timeline

Nuria created this task. Dec 15 2015, 6:13 PM
Nuria raised the priority of this task to Needs Triage.
Nuria updated the task description.
Nuria added a project: Analytics-Backlog.
Nuria added a subscriber: Nuria.
Restricted Application added subscribers: StudiesWorld, Aklapper. Dec 15 2015, 6:13 PM
Milimetric renamed this task from 'is_spider' column in eventlogging user agent data to 'is_spider' column in eventlogging user agent data {flea}. Dec 17 2015, 6:20 PM
Milimetric triaged this task as Normal priority.
Milimetric set Security to None.
Milimetric moved this task from Incoming to Backlog on the Analytics-Backlog board.

The easiest way to add user-agent refinement to EventLogging would be to run the refinery code, through Hive or Spark, on the EventLogging data that is logged into Hadoop.

Nuria added a comment. Jan 14 2016, 6:29 PM

Adding this column to the capsule requires work on the EL MySQL database side, which is having a lot of issues right now (a new column would need to be added to every single table), so this is not likely to get done in the near term.

Just a thought: it would be much better to reuse the Hadoop logic for this, as we already have the code to parse bots ready to go.

Nuria renamed this task from 'is_spider' column in eventlogging user agent data {flea} to eventlogging user agent data should be parsed so spiders can be easily identified {flea}. Mar 7 2016, 5:19 PM
Nuria updated the task description. Sep 16 2016, 5:40 AM
Nuria updated the task description.

Change 311127 had a related patch set uploaded (by Joal):
Add singleton capacity to UAParser

https://gerrit.wikimedia.org/r/311127

Change 311127 merged by jenkins-bot:
Add singleton capacity to UAParser

https://gerrit.wikimedia.org/r/311127

Nuria added a comment. Jan 6 2017, 4:45 PM

Update: we will be replacing the data held by the user agent column with the parsed version. Resolving this ticket as a duplicate.
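
As a rough illustration of that replacement, the sketch below swaps the raw user agent string in an event for a serialized, parsed structure. Everything named here is an assumption for illustration: the `userAgent` field name, the `browser_family` and `is_bot` keys, and the helpers `parse_user_agent` and `refine_event` are hypothetical, and the tiny regex parser merely stands in for a real one such as ua-parser.

```python
import json
import re


def parse_user_agent(raw):
    """Hypothetical, deliberately tiny parser; a stand-in for a real UA parser."""
    raw = raw or ""
    # The leftmost match wins, so "Chrome/..." is found before the
    # trailing "Safari/..." token that Chrome user agents also carry.
    match = re.search(r"(Firefox|Chrome|Safari)/", raw)
    return {
        "browser_family": match.group(1) if match else "Other",
        # Assumed bot heuristic, not the production rule set.
        "is_bot": bool(re.search(r"bot|crawler|spider", raw, re.IGNORECASE)),
    }


def refine_event(event):
    """Return a copy of the event whose user agent field holds the parsed data."""
    refined = dict(event)  # shallow copy; the input event is left untouched
    parsed = parse_user_agent(event.get("userAgent"))
    refined["userAgent"] = json.dumps(parsed, sort_keys=True)
    return refined
```

Storing the parsed structure in the existing column, rather than adding a new one, would avoid the per-table schema change discussed earlier in the thread.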