
Refine EventLogging pipeline should not refine data for domains that are not Wikimedia's
Open, High, Public, 5 Story Points

Description

The Refine EventLogging pipeline should not refine data from domains that are not Wikimedia's. Third-party wikis such as www.wikipedia-with-spam.org fairly often run a clone of our code; as a result, they end up running our instrumentation code and sending us their EventLogging events.

Those events should probably be dropped, ideally before they get refined. This is somewhat related to https://phabricator.wikimedia.org/T219162 and https://github.com/wikimedia/analytics-refinery/commit/58a03f623cd6124fd4de70cb8d7e739a90b58214.
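A minimal sketch of the kind of pre-refinement filter described above, under stated assumptions: each event is assumed to carry the host it was sent from in a field named `webHost`, and `WIKIMEDIA_DOMAINS`, `is_wikimedia_host` and `drop_foreign_events` are hypothetical names with an illustrative (not exhaustive) allowlist. This is not the refinery's actual implementation.

```python
from typing import Any, Iterable, Iterator, Mapping, Optional

# Hypothetical, illustrative allowlist of Wikimedia registrable domains.
# The real list used by the pipeline (if any) may differ.
WIKIMEDIA_DOMAINS = (
    "wikipedia.org", "wikimedia.org", "wiktionary.org", "wikibooks.org",
    "wikinews.org", "wikiquote.org", "wikisource.org", "wikiversity.org",
    "wikivoyage.org", "wikidata.org", "mediawiki.org",
)


def is_wikimedia_host(host: Optional[str]) -> bool:
    """Return True if host equals an allowed domain or is a subdomain of one."""
    if not host:
        return False
    # Normalize: trim, lowercase, drop an optional port and a trailing dot.
    host = host.strip().lower().split(":", 1)[0].rstrip(".")
    # Require an exact match or a "."-separated suffix, so that look-alike
    # hosts such as "www.wikipedia-with-spam.org" or "evilwikipedia.org" fail.
    return any(host == d or host.endswith("." + d) for d in WIKIMEDIA_DOMAINS)


def drop_foreign_events(events: Iterable[Mapping[str, Any]]) -> Iterator[Mapping[str, Any]]:
    """Yield only events whose (assumed) webHost field is a Wikimedia host."""
    for event in events:
        if is_wikimedia_host(event.get("webHost")):
            yield event
```

Matching on a dot-prefixed suffix rather than a substring is the key design choice: a plain `"wikipedia" in host` check would let the spam clone from the description through, while the suffix check only accepts hosts that are actually under an allowed domain. Events with no host at all are dropped as well.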

Event Timeline

Nuria created this task. Apr 1 2019, 6:57 PM
Restricted Application added a subscriber: Aklapper. Apr 1 2019, 6:57 PM

ping @phuedx and @Jdlrobson so they are aware this ticket exists

fdans moved this task from Incoming to Data Quality on the Analytics board. Apr 4 2019, 5:18 PM
fdans triaged this task as High priority.
phuedx awarded a token. Apr 4 2019, 5:18 PM
Nuria assigned this task to mforns. Tue, May 14, 7:50 PM
Nuria added a project: Analytics-Kanban.
Nuria set the point value for this task to 5.