
HivePartition (refinery::Hive.py) does not allow partition values to have dots (.)
Closed, Resolved (Public)

Description

The deletion timer for the data_quality_hourly table is not deleting partitions properly,
because the regular expression in Hive.py L376:

partition_regex = re.compile(r'(\w+)=["\']?([\w\-]+)["\']?')

does not correctly parse partition values that contain dots (.), such as source_table=event.navigationtiming.
The deletion script parsed that partition as source_table=event,
so the drop statement targeted a partition that does not exist.
The failure was silent, because the partition deletion statement uses DROP IF EXISTS PARTITION.
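
The truncation can be reproduced with a short sketch. It uses only the regular expression quoted above; the parse_partition_spec helper is a hypothetical name used for illustration, not the actual HivePartition code.

```python
import re

# Regular expression quoted in the task description (Hive.py L376).
partition_regex = re.compile(r'(\w+)=["\']?([\w\-]+)["\']?')


def parse_partition_spec(spec):
    """Hypothetical helper: return the (key, value) pairs found in a spec string."""
    return partition_regex.findall(spec)


# The value is cut at the first dot, because '.' is not in the class [\w\-].
print(parse_partition_spec('source_table=event.navigationtiming'))
# [('source_table', 'event')]
```

The parsed value "event" names a partition that does not exist, which is why the IF EXISTS clause turns the drop into a silent no-op.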

We should add the dot (.) to the value part of the regular expression,
unless this becomes a security problem?
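
A minimal sketch of the proposed change, assuming the dot is simply added to the character class that matches the value. The exact expression in the merged patch is not quoted in this task and may differ; the name partition_regex_with_dot is hypothetical.

```python
import re

# Assumed fix: '.' added to the value character class (not taken from the patch).
partition_regex_with_dot = re.compile(r'(\w+)=["\']?([\w\-.]+)["\']?')

# The full dotted value is now captured.
match = partition_regex_with_dot.search('source_table=event.navigationtiming')
print(match.groups())
# ('source_table', 'event.navigationtiming')

# The captured value still stops at characters outside [\w\-.],
# such as quotes, semicolons and whitespace.
risky = partition_regex_with_dot.search("source_table=event.navigationtiming'; DROP")
print(risky.groups())
# ('source_table', 'event.navigationtiming')
```

Since the value class still excludes quotes, semicolons and whitespace, the captured value cannot contain them. Whether that fully settles the security question is not something this sketch establishes.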

Event Timeline

Change 542441 had a related patch set uploaded (by Mforns; owner: Mforns):
[analytics/refinery@master] Allow HivePartitions to have dots (.) in their values

https://gerrit.wikimedia.org/r/542441

JAllemandou moved this task from Incoming to Operational Excellence on the Analytics board.

Change 542441 merged by Joal:
[analytics/refinery@master] Allow HivePartitions to have dots (.) in their values

https://gerrit.wikimedia.org/r/542441