
Rename oozie edit_hourly job
Closed, Declined, Public

Description

In every other job, the _hourly suffix means the data is processed hourly. Here the data is available per snapshot, even though it is aggregated hourly. I suggest we rename the job, for instance to edits_history_aggregated_hourly (as we do for unique_devices). I'm super happy to discuss other names as well.

Event Timeline

This name has now been propagated to all product teams, and the dataset is widely used there, so I am afraid it is too late for this change.

Could we change the Oozie job and the HDFS path but keep the Druid name?

In my opinion, that would cause more confusion, not less. cc @mforns for thoughts

I thought the 'hourly' in pageview_hourly meant aggregated hourly, not updated hourly.
In general, I would name a dataset after what it contains, rather than how it is processed or when it is updated.
That said, edit_hourly is partitioned by snapshot, not by hour, so it is structurally different from pageview_hourly.
We could reflect that in the name. Maybe edit_history_hourly, which is a bit shorter than edits_history_aggregated_hourly?
Question: should it be "edits" or "edit"? I left out the 's' because other Hive datasets seem to lean towards the singular.

I thought the 'hourly' in pageview_hourly meant aggregated hourly

Me too!