
Blank the event.page_title column in the editorjourney table in the Data Lake
Closed, Declined · Public

Description

We have identified data quality issues with the event.page_title column in the EditorJourney schema's table in the Data Lake. A patch to stop logging data for that column is being prepared in T213974. We would like to have the existing data in the column blanked (replaced with an empty string).

Event Timeline

Restricted Application added a subscriber: Aklapper. · Feb 28 2019, 9:45 PM
Nuria added a subscriber: Nuria. · Feb 28 2019, 10:20 PM

@nettrom_WMF, data in Hive cannot be modified in place: it has to be deleted and rewritten (that is right, there are no alters on data in Hadoop; you can change a column's type but not its data), so we do not do this type of modification. Our recommendation here would be to document the issues on the talk page.
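To illustrate the "deleted and rewritten" constraint described above, here is a minimal sketch in plain Python. It is not Hive or Hadoop code, and it is not how the Data Lake tables are stored; it only assumes a hypothetical JSON-lines file whose records carry a nested `event` object, and the function name `blank_field_by_rewrite` is made up for this example. The point it shows is that blanking one field means writing a full new copy of the data and swapping it in.

```python
import json
import os
import tempfile


def blank_field_by_rewrite(path, struct_name="event", field_name="page_title"):
    """Blank one nested field by rewriting the whole file, then swapping it in.

    Hypothetical helper: `path` is assumed to be a JSON-lines file. Returns the
    number of records whose field was blanked.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new copy next to the original so the final swap stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    blanked = 0
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out, \
                open(path, encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue  # skip empty lines
                record = json.loads(line)
                struct = record.get(struct_name)
                if isinstance(struct, dict) and field_name in struct:
                    struct[field_name] = ""  # replace the value with an empty string
                    blanked += 1
                out.write(json.dumps(record) + "\n")
        # Replace the old file with the rewritten copy in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, leave the original untouched and drop the partial copy.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return blanked
```

Every record is read and written again even though only one field changes, which is why this kind of modification is costly on large, partitioned datasets.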

Ok, @MMiller_WMF confirmed that no entry is needed in the whitelist.

nettrom_WMF closed this task as Declined. · Feb 28 2019, 11:33 PM

We're fine with letting this data get purged as it otherwise would, so I'm closing this.