
Write Hive code doing pageview data anonymisation with two tables {hawk}
Closed, Resolved · Public · 13 Story Points

Description

Use two tables so that validation does not need the webrequest table as its source (too slow); instead, use pageview_hourly_unsanitized to populate pageview_hourly.
Use a rolling window of the previous day or week on pageview_hourly_unsanitized to generate the filters (thresholds TBD).
Test the validity of the sanitisation (nuria has existing code).
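
As a rough illustration of the two-table idea above, here is a minimal Python sketch (not the Hive code from the related patch). It assumes one possible reading of the task: a filter is built from a rolling window of unsanitized rows, and rows in the hour being sanitized whose key falls below a view threshold get one field blanked before re-aggregation. The field names, the MIN_VIEWS value, the REDACTED placeholder and both function names are hypothetical; the task itself leaves the thresholds TBD.

```python
from collections import Counter

# Hypothetical values: the task leaves the actual thresholds TBD.
REDACTED = "--"
MIN_VIEWS = 3


def build_filter(window_rows, key_fields, min_views=MIN_VIEWS):
    """Return the key tuples that reach min_views over the rolling window."""
    totals = Counter()
    for row in window_rows:
        totals[tuple(row[f] for f in key_fields)] += row["view_count"]
    return {key for key, total in totals.items() if total >= min_views}


def sanitize(rows, allowed, key_fields, redact_field):
    """Blank redact_field on rows whose key is not allowed, then re-aggregate."""
    merged = Counter()
    for row in rows:
        clean = dict(row)
        if tuple(row[f] for f in key_fields) not in allowed:
            clean[redact_field] = REDACTED
        views = clean.pop("view_count")
        # Rows that became identical after redaction are summed together.
        merged[tuple(sorted(clean.items()))] += views
    return [dict(key, view_count=views) for key, views in merged.items()]


# Stand-in for rows read from pageview_hourly_unsanitized (made-up data).
window = [
    {"page_title": "A", "country": "FR", "view_count": 5},
    {"page_title": "B", "country": "FR", "view_count": 1},
    {"page_title": "B", "country": "IS", "view_count": 1},
]
keys = ("page_title", "country")
allowed = build_filter(window, keys)
# Stand-in for the rows that would be written to pageview_hourly.
sanitized = sanitize(window, allowed, keys, "country")
```

In this sketch the rolling window and the hour being sanitized are the same rows for brevity; in practice the window would cover the previous day or week, as the description suggests.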

Details

Related Gerrit Patches:
analytics/refinery/source : master · [WIP] Sanitize pageview_hourly table

Event Timeline

JAllemandou raised the priority of this task from to High.
JAllemandou updated the task description.
JAllemandou added a project: Analytics-Backlog.
JAllemandou added a subscriber: JAllemandou.
Restricted Application added a subscriber: Aklapper. Nov 17 2015, 10:27 AM
mforns claimed this task. Nov 25 2015, 12:31 PM
mforns edited projects, added Analytics-Kanban; removed Analytics-Backlog.
mforns set Security to None.
mforns moved this task from Next Up to In Progress on the Analytics-Kanban board.
Nuria added a subscriber: Nuria. Dec 7 2015, 4:45 PM

After meeting with the team: we are going to have our anonymization strategy peer-reviewed by research before we roll out the implementation.

Change 260408 had a related patch set uploaded (by Mforns):
[WIP] Sanitize pageview_hourly table

https://gerrit.wikimedia.org/r/260408

Milimetric reassigned this task from mforns to JAllemandou. Jan 12 2016, 5:07 PM
Milimetric renamed this task from Write Hive code doing pageview data anonymisation with two tables [13 pts] {hawk} to Write Hive code doing pageview data anonymisation with two tables {hawk}. Feb 22 2016, 9:05 PM
Milimetric set the point value for this task to 13.

Change 260408 abandoned by Mforns:
[WIP] Sanitize pageview_hourly table

Reason:
This change is obsolete, it was the base for https://gerrit.wikimedia.org/r/#/c/271033/
The actual development is being done in the latter.

https://gerrit.wikimedia.org/r/260408

JAllemandou moved this task from Paused to Done on the Analytics-Kanban board. Apr 11 2016, 4:11 PM

A non-productionized but working version of the code is in the related patch.

Nuria closed this task as Resolved. Apr 15 2016, 4:52 PM