Page MenuHomePhabricator

Hive User can specify webrequest date range in query more easily
Closed, DeclinedPublic

Description

When writing a query in Hive, it is difficult to specify a date rage because the hive tables are partitioned by individual date partition fields.

Idea: add a new partition field like date='2014-12-01-00' to each webrequest partition, to allow users to use conditionals like:

where date BETWEEN '2014-12-25-00' AND '2015-01-01-00'.

(Declining, Dan wrote a JS bookmarklet to help with this)

Related Objects

Event Timeline

kevinator raised the priority of this task from to Needs Triage.
kevinator updated the task description. (Show Details)
kevinator changed Security from none to None.
kevinator subscribed.
Ottomata renamed this task from Hive User calls UDF to specify a date range in a query to Hive User can specify webrequest date range in query more easily.Dec 17 2014, 4:42 PM
Ottomata updated the task description. (Show Details)

A little background:

  • a UDF was deemed the wrong solution to the problem because Hive would have had to send all the partitions to the UDF to get determine if that partition was in the date range or not.
  • we also discussed writing a script that could turn a date range into text you could cut and paste into a where clause.
  • the best solution found is in the description above.

@Milimetric wrote this JS you can bookmark

javascript:function where(e,t){t||(t=new Date),e=new Date(e);for(var a=[];t>=e;){var r=[];r.push("YEAR="+e.getUTCFullYear()),r.push("MONTH="+(e.getUTCMonth()+1)),r.push("DAY="+e.getUTCDate()),a.push(r.join(" AND ")),e.setDate(e.getDate()+1)}prompt("Copy this and add to your where clause","("+a.join(") OR (")+")")}where(prompt("Start Date, in YYYY-MM-DD format"));

Code here, you can easily do the same thing in python or whatever you like. Extra credit: optimize the clause so it doesn't repeat naively

function where(from, to) {
    if (!to) {
        to = new Date();
    }
    from = new Date(from);

    var days = [];
    while (from <= to) {
        var parts = [];

        parts.push('YEAR='+from.getUTCFullYear());
        parts.push('MONTH='+(from.getUTCMonth() + 1));
        parts.push('DAY='+from.getUTCDate());

        days.push(parts.join(' AND '));

        from.setDate(from.getDate()+1);
    }
    prompt('Copy this and add to your where clause', '(' + days.join(') OR (') + ')');
}
where(prompt('Start Date, in YYYY-MM-DD format'));
Ottomata updated the task description. (Show Details)