Deploy the new refinery code.
Create the new pageview_hourly_unsanitized table.
Stop the current pageview_hourly aggregation job.
Start the new pageview_hourly aggregation job (which includes sanitization).
Don't forget to automate deleting data from pageview_hourly_unsanitized. It is fine to start this right away, since backfilling will use the already existing pageview_hourly table.
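The automated purge in the last step could look roughly like the sketch below. It is an illustration under stated assumptions, not the refinery's actual job. The task does not give a retention period, so `RETENTION_DAYS = 90` is a hypothetical value. The sketch also assumes that pageview_hourly_unsanitized is partitioned by integer `year`, `month`, `day` and `hour` keys. The helper names (`expired_hours`, `drop_statement`, `purge_statements`) are hypothetical as well. The code only generates Hive `ALTER TABLE ... DROP IF EXISTS PARTITION` statements for hours older than the cutoff. Running them, for example from a scheduled job, is left out.

```python
from datetime import datetime, timedelta

# Hypothetical retention window: the task does not state how long unsanitized data is kept.
RETENTION_DAYS = 90
TABLE = "pageview_hourly_unsanitized"


def expired_hours(existing_hours, now, retention_days=RETENTION_DAYS):
    """Return the hourly partition timestamps strictly older than the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(h for h in existing_hours if h < cutoff)


def drop_statement(table, hour):
    """Build a Hive DROP PARTITION statement.

    Assumes (hypothetically) integer year/month/day/hour partition keys.
    """
    return (
        f"ALTER TABLE {table} DROP IF EXISTS "
        f"PARTITION (year={hour.year}, month={hour.month}, day={hour.day}, hour={hour.hour});"
    )


def purge_statements(existing_hours, now, table=TABLE, retention_days=RETENTION_DAYS):
    """List one DROP statement per expired hourly partition, oldest first."""
    return [drop_statement(table, h) for h in expired_hours(existing_hours, now, retention_days)]


if __name__ == "__main__":
    now = datetime(2015, 12, 1, 0)
    hours = [datetime(2015, 8, 1, 5), datetime(2015, 11, 30, 23)]
    # Only the August partition falls before the 90-day cutoff.
    for stmt in purge_statements(hours, now):
        print(stmt)
```

`DROP IF EXISTS` keeps each statement idempotent, so a rerun after a partial failure is harmless. The list of existing partitions would have to come from the metastore, for example via `SHOW PARTITIONS`. That lookup is not shown here.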
Description
Status | Assigned | Task
---|---|---
Open | None | T114675 Sanitize pageview_hourly
Duplicate | None | T118841 Deploy pageview sanitization and start ongoing process {hawk}
Declined | None | T118839 Productionize Pageview_sanitization hive code with Oozie job and refinery inclusion {hawk}
Resolved | JAllemandou | T118838 Write hive code doing pageview data anonimisation with two tables {hawk}