Copy-pasting from T384962:
Here's an issue I currently see: the data_quality_ops.data_quality_alerts table doesn't have a column for metadata such as tags, the way the metrics table does. This doesn't affect the actual alerting, but it would affect any future analyses and dashboarding someone might want to do on the verification checks. For instance, if we want to alert on T388439, there currently isn't a way to differentiate records in the table that check monthly reconciles from those that check daily ones. Even now, there's an open question of whether the source_table column in the alerts table should refer to data_quality_ops.data_quality_metrics or to the underlying table that the metrics were computed against.
To support T388439 and future use cases, before enabling alerting I'm going to work on some patches that allow inserting tags into the alerts table using deequ's ResultKey class, so it (kinda) aligns with the way metrics work.
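To make the tagging idea concrete, here is a rough Python sketch. It is not the refinery or deequ code: `AlertKey` is a hypothetical stand-in that only mirrors the shape of deequ's `ResultKey` (a dataset date plus a string-to-string tags map), and `AlertRow`, `filter_by_tags`, the `granularity` tag, and the check names are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AlertKey:
    """Hypothetical stand-in shaped like deequ's ResultKey(dataSetDate, tags)."""
    data_set_date: int
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class AlertRow:
    """Hypothetical row of the alerts table once it carries tags."""
    key: AlertKey
    source_table: str
    check: str
    status: str


def filter_by_tags(rows: List[AlertRow], **wanted: str) -> List[AlertRow]:
    """Keep only the rows whose tags contain every requested key/value pair."""
    return [
        row for row in rows
        if all(row.key.tags.get(k) == v for k, v in wanted.items())
    ]


# Illustrative data only: tag names and check names are assumptions.
rows = [
    AlertRow(AlertKey(1735689600000, {"granularity": "daily"}),
             "data_quality_ops.data_quality_metrics", "reconcile_check", "Success"),
    AlertRow(AlertKey(1735689600000, {"granularity": "monthly"}),
             "data_quality_ops.data_quality_metrics", "reconcile_check", "Error"),
]

monthly = filter_by_tags(rows, granularity="monthly")
```

With tags on each row, the monthly and daily reconcile checks can be separated by a simple filter instead of being indistinguishable.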
Also, it's a bit weird to call it the alerts table when it doesn't store alerts but the verification checks that, if failed, will trigger an alert. But that's some bikeshedding for some future time, maybe.
This turns out to have a few more steps than I expected.
[ ] Modify `refinery-source` to support new columns in a backwards-compatible way
[ ] Modify `refinery` with the new schema
[ ] Modify airflow jobs that use deequ alerts to use new jars
- Hopefully we don't have to modify the actual jobs themselves, but if we do, it would probably require going back to refinery-source
[ ] Modify `refinery-deequ-python`
- [ ] Support new columns, change refinery version
- [ ] Change name of package (it is still called `refinery-python`)
[ ] Deploy `refinery-source`
[ ] Deploy `refinery`
The next 3 bullets would ideally be done within an hour so the hourly DAGs don't break
[ ] Alter table with new schema
[ ] Deploy Airflow DAGs that use deequ alerts with new refinery source version
- [ ] webrequest
- [ ] mediawiki-history
[ ] Deploy `refinery-deequ-python` with new refinery source version
[ ] Modify `compute_metrics` script to use the prod `refinery-deequ-python` instead of pointing to Gabriele's gitlab repo
- [ ] remember to rename imports
- At this point we can actually do T384962
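
One possible reading of "backwards-compatible" in the first step, sketched in Python under assumptions: the writer aligns each row to whatever columns the target table currently has, so new code can run before the table is altered, and old rows still fit afterwards. `align_row` and every column name except `source_table` are hypothetical, and the actual `refinery-source` change may work differently.

```python
from typing import Any, Dict, List


def align_row(row: Dict[str, Any], table_columns: List[str]) -> Dict[str, Any]:
    """Project a row onto the table's current columns.

    Columns the table does not have yet are dropped; columns the row
    does not provide are filled with None.
    """
    return {col: row.get(col) for col in table_columns}


# Hypothetical schemas: only source_table is a column named in this task.
old_columns = ["source_table", "status"]
new_columns = ["source_table", "status", "tags"]

new_style_row = {
    "source_table": "data_quality_ops.data_quality_metrics",
    "status": "Success",
    "tags": {"granularity": "daily"},
}
old_style_row = {
    "source_table": "data_quality_ops.data_quality_metrics",
    "status": "Success",
}

# New writer, table not yet altered: the tags column is dropped.
before_alter = align_row(new_style_row, old_columns)

# Old-style row, table already altered: tags is filled with None.
after_alter = align_row(old_style_row, new_columns)
```

If the writer behaves like this, the order of the alter-table and deploy steps matters less, although doing them within the same hour still avoids a window where tags are silently dropped.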