
EventLogging requests we get from non-wiki* hostnames or apps should be filtered at refine time
Closed, Resolved · Public · 5 Estimated Story Points

Description

Keeping this brief on purpose (WP:BEANS), but basically we should write a query that tells us:

  • of all webrequests to our EventLogging endpoints
  • how many are from hostnames that look like IP addresses
  • how many are from hostnames that match those on the sitematrix
  • how many "others" are there

That last one is the interesting one: if it's unexpectedly high, we can dig deeper to see whether any of those events validate. We can also dig deeper into the IP-looking ones to see whether the User-Agent is one of our apps.
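The bucketing described above can be sketched in a few lines. This is a minimal, hypothetical illustration rather than the query that was actually run. It assumes the hostnames are already extracted from the webrequests, hard-codes a tiny stand-in for the sitematrix list (`SITEMATRIX_HOSTS`), and assumes app traffic can be spotted by a `WikipediaApp/` User-Agent prefix. All of these names are illustrative, not taken from the task.

```python
import ipaddress
from collections import Counter
from typing import Iterable

# Hypothetical stand-in for the hostnames listed on the sitematrix;
# a real query would load the full list instead of hard-coding it.
SITEMATRIX_HOSTS = {
    "en.wikipedia.org",
    "de.wikipedia.org",
    "commons.wikimedia.org",
    "www.wikidata.org",
}

# Assumed User-Agent prefix for the mobile apps; verify against real traffic.
APP_UA_PREFIXES = ("WikipediaApp/",)


def looks_like_ip(host: str) -> bool:
    """True if the host (minus any :port) parses as an IPv4 or IPv6 address."""
    candidate = host.strip().lower()
    if candidate.startswith("[") and "]" in candidate:
        # Bracketed IPv6 with optional port, e.g. "[::1]:8080".
        candidate = candidate[1:candidate.index("]")]
    elif candidate.count(":") == 1:
        # "host:port"; unbracketed IPv6 has several colons and is left as-is.
        candidate = candidate.split(":", 1)[0]
    try:
        ipaddress.ip_address(candidate)
        return True
    except ValueError:
        return False


def classify_host(host: str) -> str:
    """Bucket a hostname as 'ip', 'sitematrix' or 'other'."""
    if looks_like_ip(host):
        return "ip"
    if host.strip().lower().split(":", 1)[0] in SITEMATRIX_HOSTS:
        return "sitematrix"
    return "other"


def count_by_category(hosts: Iterable[str]) -> Counter:
    """Count how many hostnames fall into each bucket."""
    return Counter(classify_host(h) for h in hosts)


def is_app_user_agent(user_agent: str) -> bool:
    """Follow-up check for the IP-looking bucket: does the UA look like one of our apps?"""
    return user_agent.startswith(APP_UA_PREFIXES)
```

The "other" count is the number to watch. Those rows, plus any IP-looking rows whose User-Agent is not an app, are the candidates for closer inspection.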

Once this is quantified, we should remove this data from EventLogging, probably at refine time (with a filter function?).
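A refine-time filter of the kind suggested above could be a simple allow-list predicate applied to each event. The following is a hypothetical sketch, not the transform function that was merged. It assumes each event is a dict carrying its hostname in a `webHost` field and that wiki hostnames can be recognised by the illustrative regex `WIKI_HOST_RE`. The real set of wiki* domains is larger than the one listed here.

```python
import re
from typing import Any, Dict, Iterable, Iterator, Optional

# Hypothetical allow-list: any subdomain of a few wiki project domains.
# The list is illustrative; the real transform may use a different pattern.
WIKI_HOST_RE = re.compile(
    r"([a-z0-9-]+\.)*"
    r"(wikipedia|wikimedia|wikidata|wiktionary|wikibooks|wikinews|"
    r"wikiquote|wikisource|wikiversity|wikivoyage|mediawiki)\.org"
)


def is_wiki_host(host: Optional[str]) -> bool:
    """True if the host (minus any :port and trailing dot) is an allowed wiki domain."""
    if not host:
        return False
    normalized = host.strip().lower().split(":", 1)[0].rstrip(".")
    # fullmatch rejects look-alikes such as "evilwikipedia.org" or
    # "wikipedia.org.example.com".
    return WIKI_HOST_RE.fullmatch(normalized) is not None


def filter_non_wiki_hosts(
    events: Iterable[Dict[str, Any]], host_field: str = "webHost"
) -> Iterator[Dict[str, Any]]:
    """Yield only the events whose host field looks like a wiki hostname."""
    for event in events:
        if is_wiki_host(event.get(host_field)):
            yield event
```

Events with a missing or empty host are dropped along with IP-looking and unknown hostnames. Whether that is the desired behaviour is a policy choice the real filter has to make explicitly.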

Putting this on Kanban to get it done by Q4.

Event Timeline

fdans triaged this task as Medium priority (Mar 29 2018, 5:00 PM).
fdans moved this task from Incoming to Data Quality on the Analytics board.

Much of this data may be coming from bots as well; see T210006.

mforns raised the priority of this task from Medium to Needs Triage (Mar 25 2019, 5:32 PM).
mforns triaged this task as Medium priority.
Nuria renamed this task from "Spike: Quantify how many EventLogging requests we get from non-wiki* hostnames or apps" to "EventLogging requests we get from non-wiki* hostnames or apps should be filtered at refine time" (Apr 15 2019, 11:54 PM).
Nuria reassigned this task from Milimetric to mforns.
Nuria added a project: Analytics-Kanban.
Nuria updated the task description.
Nuria removed a subscriber: Tbayer.

Change 511934 had a related patch set uploaded (by Mforns; owner: Mforns):
[analytics/refinery/source@master] Add refine transform function to filter our non-wiki hostnames

https://gerrit.wikimedia.org/r/511934

Change 511934 merged by Nuria:
[analytics/refinery/source@master] Add refine transform function to filter our non-wiki hostnames

https://gerrit.wikimedia.org/r/511934

Nuria set the point value for this task to 5.