
Write hive code doing pageview data anonymisation with two tables {hawk}
Closed, Resolved · Public · 13 Story Points

Description

Use two tables: validating against the webrequest table is too slow, so populate pageview_hourly from pageview_hourly_unsanitized instead.
Use a rolling window over the previous day or week of pageview_hourly_unsanitized to generate the filters (thresholds TBD).
Test the validity of the sanitisation (Nuria has existing code).
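
The plan above can be sketched in a language-neutral way. This is a minimal illustration in Python, not the Hive code the task asks for and not the team's chosen strategy. It assumes the "filters" are sets of dimension-value combinations seen often enough in the rolling window, and that rarer combinations are blanked when rows are copied from the unsanitized table to the sanitized one. The field names (`city`, `user_agent`, `view_count`), the placeholder `-`, and the threshold value are all hypothetical; the task leaves the real thresholds TBD.

```python
from collections import Counter

# Hypothetical values: the task leaves the real thresholds TBD.
MIN_VIEWS_IN_WINDOW = 3
KEY_FIELDS = ("city", "user_agent")  # assumed sensitive dimensions


def build_filter(window_rows, key_fields=KEY_FIELDS, min_views=MIN_VIEWS_IN_WINDOW):
    """Return the key combinations whose total views in the window reach min_views."""
    counts = Counter()
    for row in window_rows:
        counts[tuple(row[f] for f in key_fields)] += row["view_count"]
    return {key for key, total in counts.items() if total >= min_views}


def sanitize(rows, allowed, key_fields=KEY_FIELDS, placeholder="-"):
    """Copy rows, blanking the key fields of any combination not in allowed."""
    out = []
    for row in rows:
        clean = dict(row)  # never mutate the unsanitized source rows
        if tuple(row[f] for f in key_fields) not in allowed:
            for f in key_fields:
                clean[f] = placeholder
        out.append(clean)
    return out


# Rolling window (e.g. previous day) read from the unsanitized table.
window = [
    {"city": "Paris", "user_agent": "Firefox", "view_count": 5},
    {"city": "Paris", "user_agent": "Firefox", "view_count": 2},
    {"city": "Vik", "user_agent": "Lynx", "view_count": 1},
]
# Rows for the hour being sanitized, also from the unsanitized table.
hour = [
    {"city": "Paris", "user_agent": "Firefox", "view_count": 4},
    {"city": "Vik", "user_agent": "Lynx", "view_count": 1},
]

allowed = build_filter(window)
sanitized = sanitize(hour, allowed)
```

Building the filter from the unsanitized hourly table rather than from webrequest reflects the two-table idea in the description: the window is aggregated once, and each hour is then checked against that small set.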

Event Timeline

JAllemandou raised the priority of this task from to High.
JAllemandou updated the task description.
JAllemandou added a project: Analytics-Backlog.
JAllemandou added a subscriber: JAllemandou.
Restricted Application added a subscriber: Aklapper. Nov 17 2015, 10:27 AM
mforns claimed this task. Nov 25 2015, 12:31 PM
mforns edited projects, added Analytics-Kanban; removed Analytics-Backlog.
mforns set Security to None.
mforns moved this task from Next Up to In Progress on the Analytics-Kanban board.
Nuria added a subscriber: Nuria. Dec 7 2015, 4:45 PM

After meeting with the team: we are going to have our anonymization strategy peer-reviewed by research before we roll out the implementation.

Change 260408 had a related patch set uploaded (by Mforns):
[WIP] Sanitize pageview_hourly table

https://gerrit.wikimedia.org/r/260408

Milimetric reassigned this task from mforns to JAllemandou. Jan 12 2016, 5:07 PM
Milimetric renamed this task from Write hive code doing pageview data anonymisation with two tables [13 pts] {hawk} to Write hive code doing pageview data anonymisation with two tables {hawk}. Feb 22 2016, 9:05 PM
Milimetric set the point value for this task to 13.

Change 260408 abandoned by Mforns:
[WIP] Sanitize pageview_hourly table

Reason:
This change is obsolete; it was the base for https://gerrit.wikimedia.org/r/#/c/271033/
The actual development is being done in the latter.

https://gerrit.wikimedia.org/r/260408

JAllemandou moved this task from Paused to Done on the Analytics-Kanban board. Apr 11 2016, 4:11 PM

A non-productionized but working version of the code is in the related patch.

Nuria closed this task as Resolved. Apr 15 2016, 4:52 PM