It would be cool to have an easy way to slice a Hive table, which is partitioned by year, month, day and hour, by using DateTime values.
Right now, the only way (that I know of) to do that is to have a complex condition with lots of ANDs and ORs, that depends on the number of existing partitionKeys:
def getTimePartitionPruningCondition(
since: DateTime,
until: DateTime,
partitionKeys: Seq[String]
): String = {
if (partitionKeys.contains("hour")) { s"""(
year > ${since.year.get} AND year < ${until.year.get} OR
year = ${since.year.get} AND (
month > ${since.monthOfYear.get} OR
month = ${since.monthOfYear.get} AND (
day > ${since.dayOfMonth.get} OR
day = ${since.dayOfMonth.get} AND hour >= ${since.hourOfDay.get}
)
) OR
year = ${until.year.get} AND (
month < ${until.monthOfYear.get} OR
month = ${until.monthOfYear.get} AND (
day < ${until.dayOfMonth.get} OR
day = ${until.dayOfMonth.get} AND hour <= ${until.hourOfDay.get}
)
)
)""" }
else if (partitionKeys.contains("day")) { s"""(
year > ${since.year.get} AND year < ${until.year.get} OR
year = ${since.year.get} AND (
month > ${since.monthOfYear.get} OR
month = ${since.monthOfYear.get} AND day >= ${since.dayOfMonth.get}) OR
year = ${until.year.get} AND (
month < ${until.monthOfYear.get} OR
month = ${until.monthOfYear.get} AND day <= ${until.dayOfMonth.get})
)""" }
else if (partitionKeys.contains("month")) { s"""(
year > ${since.year.get} AND year < ${until.year.get} OR
year = ${since.year.get} AND month >= ${since.monthOfYear.get} OR
year = ${until.year.get} AND month <= ${until.monthOfYear.get}
)""" }
else { s"year >= ${since.year.get} AND year <= ${until.year.get}" }
}May also be able to make a helper for Hive to create a string you inject into your WHERE clause: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution