Create HivePartitioner in Camus
Closed, DeclinedPublic

Description

Camus' default partitioner creates time-bucketed directories like YEAR/MONTH/DAY/HOUR. It would be handy to have a partitioner that creates this layout the way Hive expects its partitions to be laid out, i.e. year=YEAR/month=MONTH/day=DAY/hour=HOUR, etc.

I'd like to write something that does this, and get it upstreamed into Camus.
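
To make the two layouts concrete, here is a minimal sketch in Python (not Camus code, and not the Camus partitioner API) that formats one timestamp both ways. The function names are hypothetical. The zero-padding in the default layout and the unpadded values in the Hive layout are assumptions taken from the example paths quoted in the comments on this task.

```python
from datetime import datetime


def default_partition_path(ts: datetime) -> str:
    """Time-bucketed layout, zero-padded like the raw example 2015/03/04/12."""
    return ts.strftime("%Y/%m/%d/%H")


def hive_partition_path(ts: datetime) -> str:
    """Hive-style key=value layout, unpadded like year=2015/month=3/day=4/hour=12."""
    return f"year={ts.year}/month={ts.month}/day={ts.day}/hour={ts.hour}"


if __name__ == "__main__":
    ts = datetime(2015, 3, 4, 12)
    print(default_partition_path(ts))  # 2015/03/04/12
    print(hive_partition_path(ts))     # year=2015/month=3/day=4/hour=12
```

A real partitioner would also have to decide which timestamp to bucket on and in which time zone; the sketch ignores both.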

Event Timeline

Ottomata claimed this task.
Ottomata raised the priority of this task to Low.
Ottomata updated the task description.
Ottomata added a project: Analytics-Clusters.
Ottomata added subscribers: Ottomata, JAllemandou.

Not saying that such a partitioner is a bad idea, but just to call this out:

  • Bash's tab completion is a bit wonky around = *). This is already driving me insane around refined tables.
  • Hive's key=value is awkwardly verbose. **)

*)

When on stat1002 I enter

ls -l /mnt/hdfs/wmf/data/wmf/webrequest/webrequest_source=bits/year=2015/month=3/day=3/hour=*

and hit Enter, the unescaped = obviously works. But when typing

ls -l /mnt/hdfs/wmf/data/wmf/webrequest/webrequest_source=bits/year=2015/month=3/day=3/hour=

and pressing tab, the command line turns into

ls -l /mnt/hdfs/wmf/data/wmf/webrequest/webrequest_source=bits/year=2015/month=3/day=3/hour=/mnt/hdfs/wmf/data/wmf/webrequest/webrequest_source\=bits/year\=2015/month\=3/day\=3/hour\=0/

which is terribly wrong.

Sure, one can unnecessarily escape each = or stop using tab.
But that's somewhat tedious.

**)
The refined table's

.../year=2015/month=3/day=4/hour=12/...

is way harder for me to read/grasp than raw's

.../2015/03/04/12/...

YMMV.

Agreed, but if we use Hive's default layout, then we can run MSCK REPAIR TABLE to add partitions automatically. This ticket is more of an open-source side project for me. It would make Camus more usable by newbies. I'm not sure I really want to change our own raw webrequest table.
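
The difference can be sketched as follows. With the default layout, each new directory has to be registered with an explicit ALTER TABLE ... ADD PARTITION statement that names its location. With Hive's key=value layout, a single MSCK REPAIR TABLE can discover the directories by itself. The helper names, the table name `webrequest_raw`, and the base directory `/example/raw/webrequest` are hypothetical placeholders, not our actual setup.

```python
def add_partition_statement(table: str, base_dir: str,
                            year: int, month: int, day: int, hour: int) -> str:
    """Build the explicit HiveQL needed for one YEAR/MONTH/DAY/HOUR directory."""
    location = f"{base_dir}/{year:04d}/{month:02d}/{day:02d}/{hour:02d}"
    spec = f"year={year}, month={month}, day={day}, hour={hour}"
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({spec}) LOCATION '{location}';")


def repair_statement(table: str) -> str:
    """Build the single HiveQL statement that suffices for key=value directories."""
    return f"MSCK REPAIR TABLE {table};"


if __name__ == "__main__":
    # Hypothetical table name and base directory, for illustration only.
    print(add_partition_statement("webrequest_raw", "/example/raw/webrequest",
                                  2015, 3, 4, 12))
    print(repair_statement("webrequest_raw"))
```

The first statement has to be generated once per hour directory, which is the bookkeeping that the Hive layout would avoid.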

Not going to do this, hope to move away from Camus one day.