Input: the refined logs in Hive, in which every row determined to be a page view is tagged as such
Output: similar to the output of pagecounts-all-sites (although that job reads a different input)
Code for the pagecounts-all-sites Oozie jobs:
https://github.com/wikimedia/analytics-refinery/tree/master/oozie/pagecounts-all-sites
https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pagecounts-all-sites/load/insert_hourly_pagecounts.hql
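
The input/output relationship described above can be sketched as a small aggregation: keep only the rows tagged as page views, then count them per project and page. This is only an illustration in plain Python, not the actual Hive query. The field names `is_pageview`, `project` and `page_title` and the function `count_pageviews` are assumptions made for the example and may not match the real schema.

```python
from collections import Counter


def count_pageviews(rows):
    """Count rows tagged as page views, grouped by (project, page_title).

    `rows` is an iterable of dicts. The keys used here ("is_pageview",
    "project", "page_title") are hypothetical stand-ins for the columns
    of the refined logs.
    """
    counts = Counter()
    for row in rows:
        # Rows that were not tagged as page views are skipped.
        if row.get("is_pageview"):
            counts[(row["project"], row["page_title"])] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        {"project": "en", "page_title": "Main_Page", "is_pageview": True},
        {"project": "en", "page_title": "Main_Page", "is_pageview": True},
        {"project": "de", "page_title": "Hauptseite", "is_pageview": True},
        {"project": "en", "page_title": "Main_Page", "is_pageview": False},
    ]
    for (project, title), n in sorted(count_pageviews(sample).items()):
        print(project, title, n)
```

In Hive the same idea would be a filter on the page view tag followed by a GROUP BY; the linked insert_hourly_pagecounts.hql shows how the existing job does its aggregation on its own input.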