Our alarms in this regard are currently daily, but we think hourly alarms are worth exploring.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | mforns | T249759 Add hourly resolution to data quality outage/censorship alarms
Resolved | | mforns | T251814 Tune up thresholds of data quality hourly alarms
Event Timeline
Change 587844 had a related patch set uploaded (by Mforns; owner: Mforns):
[analytics/refinery@master] Add traffic entropy data quality stats in hourly resolution
Change 588027 had a related patch set uploaded (by Mforns; owner: Mforns):
[analytics/refinery/source@master] Make RSVDAnomalyDetection ignore too short timeseries
Change 588027 merged by Milimetric:
[analytics/refinery/source@master] Make RSVDAnomalyDetection ignore too short timeseries
Change 587844 merged by Milimetric:
[analytics/refinery@master] Add traffic entropy data quality stats in hourly resolution
Change 591577 had a related patch set uploaded (by Milimetric; owner: Milimetric):
[analytics/refinery@master] Use the right jar version
Change 591577 merged by Milimetric:
[analytics/refinery@master] Use the right jar version
Follow-up on docs: the README for the bundle is great, but its example oozie command uses some outdated syntax. I copied it blindly and messed it up. It should be more like this:
sudo -u analytics kerberos-run-command analytics oozie job \
  --oozie $OOZIE_URL \
  -Duser=$USER \
  -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/$(date +"%Y")* | tail -n 1 | awk '{print $NF}') \
  -Dqueue='production' \
  -Dgranularity='hourly' \
  -Dstart_time='2020-04-21T22:00Z' \
  -config /srv/deployment/analytics/refinery/oozie/data_quality_stats/bundle.properties \
  -run
(note the kerberized command and the more correct refinery_directory definition. That old style with the 2019* is like a plague, I tried to erase it everywhere and it keeps coming back)
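For anyone puzzled by the `refinery_directory` definition above, here is a rough sketch of what the nested command substitution does: it lists this year's refinery deployments in HDFS, keeps the last line, and takes the last whitespace-separated field (the path). The helper name `latest_refinery_dir` and the sample listing are made up for illustration, so the snippet runs without a Hadoop cluster. It also assumes the listing comes back sorted by name, which the original one-liner relies on as well.

```shell
# Hypothetical helper: pick the path from the last line of
# `hdfs dfs -ls -d`-style output read on stdin.
latest_refinery_dir() {
  tail -n 1 | awk '{print $NF}'
}

# Simulated listing; the paths and metadata are invented for illustration.
sample_listing='drwxr-xr-x   - analytics hadoop          0 2020-04-15 18:02 /wmf/refinery/2020-04-15T17.58.31+00.00--scap_sync_2020-04-15_0001
drwxr-xr-x   - analytics hadoop          0 2020-04-21 20:11 /wmf/refinery/2020-04-21T20.07.12+00.00--scap_sync_2020-04-21_0001'

# On a real host the input would instead come from:
#   hdfs dfs -ls -d /wmf/refinery/$(date +"%Y")*
refinery_path=$(printf '%s\n' "$sample_listing" | latest_refinery_dir)
refinery_directory="hdfs://analytics-hadoop${refinery_path}"
echo "$refinery_directory"
```

The point of globbing on the current year rather than a hardcoded `2019*` is that the command keeps resolving to the most recent deployment without anyone having to edit it.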
Change 591952 had a related patch set uploaded (by Nuria; owner: Nuria):
[analytics/refinery@master] Database name for data quality table should be 'analytics'
Change 591956 had a related patch set uploaded (by Nuria; owner: Nuria):
[analytics/refinery@master] Correcting examples in README for data quality jobs
Change 592008 had a related patch set uploaded (by Nuria; owner: Nuria):
[analytics/refinery@master] Bumping up jar version for spark-job-jar
Change 592008 merged by Mforns:
[analytics/refinery@master] Bumping up jar version for spark-job-jar
Change 591952 merged by Mforns:
[analytics/refinery@master] Database name for data quality table should be 'analytics'
Change 591956 merged by Joal:
[analytics/refinery@master] Correcting examples in README for data quality jobs