
Reduce partition granularity of hive tables
Closed, Resolved (Public)

Description

In the bd808 database, the tables action_action_hourly, action_param_hourly, and action_ua_hourly should be transformed to reduce their partition levels, which puts less pressure on the Hive metastore and on HDFS.

Both action_action_hourly and action_param_hourly should be made partition-less. They contain less than 2G of data, which should be stored as Parquet with the date information added as fields.
action_ua_hourly table should be made monthly partitioned, and pruned for data after to 2019-07 (excluded). This should be done using dynamic partitioning in hive (https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions)
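
A minimal HiveQL sketch of the two transformations described above. The target table names (action_action, action_ua_monthly), the data columns (user_agent, wiki, view_count), and the partition column names (year, month, day, hour) are all assumptions made for illustration; the real schema is not shown in this task. The cutoff filter is left as a comment rather than guessed.

```sql
-- Partition-less copy: SELECT * on a partitioned table also returns the
-- partition columns, so they become ordinary fields in the new table.
-- "action_action" is a hypothetical target name.
CREATE TABLE bd808.action_action
STORED AS PARQUET
AS SELECT * FROM bd808.action_action_hourly;

-- Monthly repartitioning with dynamic partitions.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Hypothetical schema: replace the columns with the real ones.
CREATE TABLE bd808.action_ua_monthly (
  user_agent STRING,
  wiki       STRING,
  view_count BIGINT,
  day        INT,
  hour       INT
)
PARTITIONED BY (year INT, month INT)
STORED AS PARQUET;

-- The dynamic partition columns (year, month) must come last in the SELECT,
-- in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE bd808.action_ua_monthly PARTITION (year, month)
SELECT user_agent, wiki, view_count, day, hour, year, month
FROM bd808.action_ua_hourly;
-- Add a WHERE clause on (year, month) here to apply the 2019-07 cutoff.
```

With nonstrict mode, Hive creates one partition per distinct (year, month) pair found in the source data, so no partition has to be listed by hand.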

Event Timeline

I'm just going to drop all these tables. The data loader job stopped over 6 months ago, and nobody actually seems to care about the Action API as an active area of development or improvement since Brad was fired.

hive (bd808)> show tables;
OK
tab_name
Time taken: 0.041 seconds

Something to note: Hive separates table metadata from storage. When using external tables in Hive, dropping the tables only deletes the metadata, not the data itself:

hdfs dfs -du -s -h /user/hive/warehouse/bd808.db/*

1.8 G  /user/hive/warehouse/bd808.db/action_action_hourly
4.3 G  /user/hive/warehouse/bd808.db/action_param_hourly

I assume the action_ua_hourly table was not external, as its data is not present in the folder.
@bd808 I'm dropping the data to match the table drop.
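
For reference, whether a table is external or managed can be checked before dropping it, rather than inferred afterwards from what remains on HDFS. A small sketch, using one of the table names from this task:

```sql
-- In the output, the "Table Type:" row reads MANAGED_TABLE or EXTERNAL_TABLE,
-- and the "Location:" row gives the HDFS directory backing the table.
DESCRIBE FORMATTED bd808.action_ua_hourly;
```

For a MANAGED_TABLE, DROP TABLE removes the data directory as well; for an EXTERNAL_TABLE, the files stay in place and have to be removed separately.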

Ack, and thank you.