
Traffic anomaly detection triggers alerts because of a MaxMind Country rename
Closed, Resolved · Public

Description

General issue:

The MaxMind database has recently updated the names of some countries, for instance, Netherlands -> The Netherlands.

On December 14th, Turkey was renamed to Türkiye in our MaxMind database.
The traffic anomaly detection job, which was configured to track Turkey (among other countries),
has since found no traffic counts for Turkey and assumed they were 0.
As a result, it has raised false positive anomaly alerts, interpreting the missing counts as a sudden traffic drop for that country.
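
To illustrate the failure mode, here is a minimal Python sketch (not the actual job, which is a query in analytics/refinery). The row layout, the day labels, the traffic numbers and the drop detector are all hypothetical. Looking up a fixed country name turns the renamed days into zeros, while the per-code totals stay flat.

```python
from collections import defaultdict

# Hypothetical daily rows: (day, country_code, maxmind_country_name, views).
# The MaxMind name changes mid-series; the country code does not.
ROWS = [
    ("Dec-12", "TR", "Turkey", 1000),
    ("Dec-13", "TR", "Turkey", 1020),
    ("Dec-14", "TR", "Türkiye", 990),
    ("Dec-15", "TR", "Türkiye", 1010),
]


def daily_counts_by_name(rows, name):
    """Sum views per day for one country *name*; days without rows become 0."""
    totals = defaultdict(int)
    days = sorted({row[0] for row in rows})
    for day, _code, country_name, views in rows:
        if country_name == name:
            totals[day] += views
    # defaultdict returns 0 for days where the name never appeared.
    return [(day, totals[day]) for day in days]


def flag_drops(series, ratio=0.5):
    """Hypothetical detector: flag a day whose value is below ratio * previous day."""
    flagged = []
    for (_, prev), (day, cur) in zip(series, series[1:]):
        if cur < ratio * prev:
            flagged.append(day)
    return flagged


series = daily_counts_by_name(ROWS, "Turkey")
print(series)              # [('Dec-12', 1000), ('Dec-13', 1020), ('Dec-14', 0), ('Dec-15', 0)]
print(flag_drops(series))  # ['Dec-14']
```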

We should fix this so that the false positive alerts stop, the monitored metric history is restored, and the job can continue to monitor the country properly.
This should be a quick fix; the underlying problem, which affects many pipelines (all pipelines that use MaxMind country names), is tackled in T353959.
A possible approach, in two steps:

  1. Change the query that generates the traffic anomaly metrics to group by MaxMind country code instead of country name. Then join the resulting metrics to the canonical_data.countries table to retrieve our canonical country name for each code. Test, review, merge and deploy.
  2. Correct the data: copy all production anomaly detection metric data to a temporary location, normalizing the country name Türkiye to Turkey, then replace the production dataset with the temporary one.
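
The two steps can be sketched in Python as follows. This is only an illustration; the real change is a SQL query in analytics/refinery. The dict standing in for canonical_data.countries, its contents, the fallback to the raw code for unmapped codes, and the RENAMES table are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for canonical_data.countries: country code -> canonical name.
# The real table's columns and values are not shown in this task.
CANONICAL_COUNTRIES = {"TR": "Turkey", "NL": "Netherlands"}


def metrics_by_code(rows, canonical):
    """Step 1 sketch: group by MaxMind country code, then attach the canonical name."""
    totals = defaultdict(int)
    for day, code, _maxmind_name, views in rows:
        totals[(day, code)] += views  # the MaxMind name is deliberately ignored
    return sorted(
        # Hypothetical fallback: keep the raw code if it is missing from the table.
        (day, canonical.get(code, code), views)
        for (day, code), views in totals.items()
    )


# Hypothetical mapping of new MaxMind names back to the names already in history.
RENAMES = {"Türkiye": "Turkey"}


def normalize_history(metric_rows, renames):
    """Step 2 sketch: build a temporary copy with country names normalized."""
    return [(day, renames.get(name, name), value) for day, name, value in metric_rows]


rows = [
    ("Dec-13", "TR", "Turkey", 1020),
    ("Dec-14", "TR", "Türkiye", 990),
    ("Dec-14", "NL", "The Netherlands", 500),
]
print(metrics_by_code(rows, CANONICAL_COUNTRIES))

production = [("Dec-13", "Turkey", 1020), ("Dec-14", "Türkiye", 990)]
temporary = normalize_history(production, RENAMES)
production = temporary  # replace the production dataset with the temporary one
print(production)
```

Grouping by code makes the output independent of future MaxMind renames, so step 2 only needs to run once, over the history written before the query change.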

Acceptance Criteria

  • All traffic anomaly detection metrics use a single, consistent country name for each country across all time.
  • The Airflow job for traffic anomaly detection outputs metrics with consistent country names.
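
One way to verify these criteria after the backfill, sketched in Python under the assumption that metric rows can be read as (day, country_name, value) tuples (a hypothetical layout): every monitored country should have a row on every day, so an empty result means no series was split across names.

```python
def missing_days(metric_rows, monitored_countries):
    """Return {country: [days with no row]} for each monitored country.

    Assumes metric_rows are (day, country_name, value) tuples. An empty dict
    means every monitored country is reported under one name on every day.
    """
    days = sorted({day for day, _name, _value in metric_rows})
    present = {(day, name) for day, name, _value in metric_rows}
    gaps = {}
    for country in monitored_countries:
        absent = [day for day in days if (day, country) not in present]
        if absent:
            gaps[country] = absent
    return gaps


good = [("Dec-13", "Turkey", 1020), ("Dec-14", "Turkey", 990)]
bad = [("Dec-13", "Turkey", 1020), ("Dec-14", "Türkiye", 990)]
print(missing_days(good, ["Turkey"]))  # {}
print(missing_days(bad, ["Turkey"]))   # {'Turkey': ['Dec-14']}
```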

Required

  • Modify traffic anomaly detection query
  • Test it in Airflow's development instance
  • Review, merge and deploy to the Airflow analytics instance
  • Re-run the corrupted dates in Airflow

Event Timeline

Change 985333 had a related patch set uploaded (by Mforns; author: Mforns):

[analytics/refinery@master] Make traffic anomaly detection query robust vs. MaxMind updates

https://gerrit.wikimedia.org/r/985333

Change 985333 merged by Mforns:

[analytics/refinery@master] Make traffic anomaly detection query robust vs. MaxMind updates

https://gerrit.wikimedia.org/r/985333

A quick mention of another task where some of this work took place: T353296. Relevant here, the Gerrit change https://gerrit.wikimedia.org/r/c/analytics/refinery/+/982899 included updates to the following pipelines/datasets:

  • pageview hourly: the script change means new data is correct, but old data is split by country name, so there is a historical inconsistency that we need to at least document. Ideally we would backfill the data so it is all consistent, but the dataset is rather large and backfilling may take some manual work.
  • referer daily: same as pageview hourly
  • unique devices per domain daily: this cannot be backfilled, but going forward it will be correct
  • unique devices per domain monthly: same as other unique devices jobs
  • unique devices per project family daily: same as other unique devices jobs
  • unique devices per project family monthly: same as other unique devices jobs
  • virtualpageview hourly: probably fine to not backfill, unlike pageview hourly the historical data here is not as widely used