Standardize logic, names, and null handling across UDFs in refinery-source {hawk}
Closed, Resolved | Public | 8 Estimated Story Points

Description

The names of our UDFs are sometimes Get<Something>UDF and sometimes just <Something>UDF. We should settle on one naming convention.

The UDFs sometimes test for null values being passed to them, and sometimes not. It seems there's some confusion about how to actually do this properly.
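As a rough illustration of one consistent approach, the sketch below returns null whenever the input is null, so a NULL column value propagates instead of raising a NullPointerException. It is plain Java with no Hive dependency so that it runs on its own; the class name NormalizeProjectUDF and its behavior are hypothetical and not taken from refinery-source. Whether returning null is right for a given UDF still depends on the column it reads.

```java
// Hypothetical stand-in for a Hive UDF; not a class from refinery-source.
// It has no Hive dependency so the null-handling pattern can run standalone.
class NormalizeProjectUDF {

    /** Returns the trimmed, lower-cased project name, or null for null input. */
    String evaluate(String project) {
        if (project == null) {
            // Propagate the missing value instead of throwing NullPointerException.
            return null;
        }
        // Locale.ROOT keeps lower-casing independent of the JVM's default locale.
        return project.trim().toLowerCase(java.util.Locale.ROOT);
    }
}
```

Putting the null check first in every evaluate method would make the behavior easy to audit across all UDFs.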

The UDFs sometimes use the Generic UDF Helper class to check arguments on initialize, and sometimes they duplicate that logic. We should clean this up.
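To make the duplication concern concrete, here is a minimal sketch of a shared argument check that each UDF calls from its initialize step. UdfArgumentChecker, checkArgsSize, and ExampleUDF are hypothetical names for illustration; this is not the API of the existing helper class, and it throws a plain IllegalArgumentException rather than a Hive exception type so that it runs without Hive on the classpath.

```java
// Hypothetical helper: one place for the checks each UDF would otherwise repeat.
final class UdfArgumentChecker {
    private UdfArgumentChecker() {}

    /** Throws if the number of arguments falls outside the range [min, max]. */
    static void checkArgsSize(String udfName, int actual, int min, int max) {
        if (actual < min || actual > max) {
            throw new IllegalArgumentException(
                udfName + " expects between " + min + " and " + max
                + " arguments, got " + actual);
        }
    }
}

// Hypothetical UDF that delegates to the helper instead of duplicating the check.
class ExampleUDF {
    void initialize(Object[] arguments) {
        UdfArgumentChecker.checkArgsSize("ExampleUDF", arguments.length, 1, 2);
    }
}
```

With the check centralized, a change to the error message or the validation rules only has to happen once.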

Event Timeline

Milimetric raised the priority of this task to Medium.
Milimetric updated the task description.
Milimetric added a project: Analytics-Backlog.
Milimetric subscribed.

Null treatment might differ per UDF, as nulls in Hive depend on the columns they operate on.

Changes need to be backwards compatible, so if we rename classes we need to keep a stub with the old name.
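One possible shape for such a stub, sketched in plain Java: the logic lives in the class with the new name, and the old name remains as an empty, deprecated subclass so anything that still references the old class name keeps resolving. Both class names are placeholders borrowed from the task description, and the direction of the rename shown here is arbitrary; this is an assumption about how a stub could look, not the approach the patch actually took.

```java
// Placeholder class carrying the standardized name; the logic lives here.
class GetSomethingUDF {
    String evaluate(String input) {
        return input == null ? null : input.trim();
    }
}

/**
 * Old name kept as an empty subclass so existing references to it keep working.
 * It inherits all behavior and can be removed once callers have migrated.
 */
@Deprecated
class SomethingUDF extends GetSomethingUDF {
}
```

Because the stub adds no code of its own, the two names cannot drift apart in behavior.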

We have about 20 UDFs:
https://github.com/wikimedia/analytics-refinery-source/tree/master/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive

Nuria set the point value for this task to 8. Oct 13 2016, 3:55 PM
Nuria edited projects, added Analytics-Kanban; removed Analytics.

Change 327237 had a related patch set uploaded (by Fdans):
Standardized UDF naming

https://gerrit.wikimedia.org/r/327237

I'm quite happy with the current refactor, pending successful smoke testing within Hive (which I'm doing now):
https://gerrit.wikimedia.org/r/#/c/327237/

Change 327237 merged by Nuria:
Standardized UDF naming

https://gerrit.wikimedia.org/r/327237