
Refine should DROP IF EXISTS before ADD PARTITION
Closed, Resolved · Public · 3 Estimated Story Points

Description

In T244771 (Refining is failing to refine centranoticeimpression events), we had to ALTER a Hive table and then re-refine to backfill a large number of hourly datasets. After the backfill Refine run completed, the Hive table's schema was correct, but the previously added Hive partitions still had the old schema. They had to be manually dropped and re-added to get them to pick up the new and proper table schema.

Refine already overwrites data directories when it finds that they exist; it should also forcibly re-create the Hive partition when it does this, in case the table schema has changed and needs to be propagated to the backfilled partition.
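A rough sketch of the proposed statement ordering, for illustration only. This is not Refine's actual code: the helper names `format_partition_spec` and `recreate_partition_statements`, the table name, and the location path are all hypothetical. It also assumes the table is an external Hive table, so that dropping a partition removes only metastore metadata and leaves the data files in place.

```python
def format_partition_spec(partition):
    """Render {"year": 2020, "dc": "eqiad"} as year=2020,dc='eqiad'."""
    parts = []
    for key, value in partition.items():
        if isinstance(value, str):
            # Quote string partition values; escape embedded single quotes.
            escaped = value.replace("'", "\\'")
            parts.append(f"{key}='{escaped}'")
        else:
            parts.append(f"{key}={value}")
    return ",".join(parts)


def recreate_partition_statements(table, partition, location):
    """Return the Hive DDL statements, in order, that force a partition
    to be re-created so it picks up the current table schema.

    Assumes an external table: DROP PARTITION on a managed table would
    also delete the partition's data.
    """
    spec = format_partition_spec(partition)
    return [
        # Drop first; IF EXISTS makes this a no-op for brand new partitions.
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec})",
        # Re-add so the partition metadata is rebuilt from the table schema.
        f"ALTER TABLE {table} ADD PARTITION ({spec}) LOCATION '{location}'",
    ]


if __name__ == "__main__":
    # Hypothetical table name and path, for illustration only.
    statements = recreate_partition_statements(
        "event.example_table",
        {"year": 2020, "month": 2, "day": 26, "hour": 15},
        "/path/to/example_table/year=2020/month=2/day=26/hour=15",
    )
    for statement in statements:
        print(statement)
```

Because the drop uses IF EXISTS, the same two statements can be issued whether the partition is new or is being backfilled, so no separate existence check is needed.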

Event Timeline

Ottomata created this task. · Feb 26 2020, 3:17 PM
Restricted Application added a subscriber: Aklapper. · Feb 26 2020, 3:17 PM
Nuria added a comment. · Feb 26 2020, 3:57 PM

They had to be manually dropped and re-added to get them to pick up the new and proper table schema.

I am missing some concepts here, because I did not know that partitions of a table could have a different schema than the table itself.

Milimetric triaged this task as Medium priority. · Mar 2 2020, 4:51 PM
Milimetric moved this task from Incoming to Ops Week on the Analytics board.
Nuria closed this task as Resolved. · Jul 6 2020, 10:16 PM
Nuria set the point value for this task to 3.