https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Oozie#Automated_job_restart would need to be updated with all the steps required to avoid missing jobs between the stop and the start.
Description
Event Timeline
Instead of, or in addition to, the Cassandra loading jobs, I suggest putting an SLA alarm on the pageview computation.