Page MenuHomePhabricator

Rename column on old hive data for a few tables
Closed, DeclinedPublic

Description

I'm in the process of unifying a column name between different EventLogging schemas, to make for less confusing joins. However, when querying data before the rename, the old column name is the one that has data, on more recent rows, it's the new column. This is less than ideal.

Is there a simple way to rename the column in hive rows that were recorded prior to the column name change?

The tables that will need this are NavigationTiming (very large) and QuickSurveysResponses/QuickSurveyInitiation (small)

Event Timeline

Gilles created this task.Sep 20 2018, 9:14 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 20 2018, 9:14 AM
Gilles added a parent task: Restricted Task.Sep 20 2018, 9:15 AM
Nuria added a subscriber: Nuria.Sep 21 2018, 5:23 PM

However, when querying data before the rename, the old column name is the one that has data, on more recent rows, it's the new column. This is less than ideal.

Right, the renaming of the schema is seen by the persistence storage as "removal" of 1 column and "addition" of 1 column. Our recommendation is to keep changes in schemas backwards compatible so they can be persisted easily, while json schema gives you a lot of freedom you are limited by what you can persist on hive. A removal of a field is not backwards compatible.

Is there a simple way to rename the column in hive rows that were recorded prior to the column name change?

While there is an alter table syntax is not trivial to do this as some partitions are receiving live data, will consult with team.

We talked this over and are hesitant to do this kind of work, because we want to encourage backwards-compatible changes going forward. Is it too much of a pain for you to build a view/something similar on top of this that you can query instead? Like this: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView

Milimetric moved this task from Incoming to Radar on the Analytics board.Sep 24 2018, 3:55 PM
Gilles closed this task as Declined.Sep 26 2018, 11:35 AM

I'll give views a shot, thanks. I wasn't aware those were available.

Aklapper edited projects, added Analytics-Radar; removed Analytics.Jun 10 2020, 6:44 AM