
Android Metrics Platform Migration Data Validation - Production
Open, Needs Triage (Public)

Description

With fixes in Android prod version 2.7.50479-r-2024-03-21, I've set up comparison queries for migrated streams against existing MEP data.

The Superset Dashboard shows daily event counts and DAUs for each schema.

Will investigate the most obvious variances shown and track results here:

  • android_product_metrics_article_link_preview_interaction v android_article_link_preview_interaction
  • android_product_metrics_article_toolbar_interaction v android_article_toolbar_interaction

Other relevant links:
Grafana Comparisons

Logstash

Event Timeline

Clare and I have been meeting and communicating on Slack to check in on findings. Investigation is ongoing to resolve disparities in unique user and event counts between MEP (higher counts) and MP (lower counts).

Created a Superset Dashboard to compare unique user and event counts by dataset. The counts are updated when new Prod app versions are released.

Most recent investigation ticket:
https://phabricator.wikimedia.org/T363610

Note related tickets from data validation task:

[Java] Investigate why user_agent_map is empty. [ADDED ADDITIONAL DATA REQUEST TO agent field]
https://phabricator.wikimedia.org/T357371

[Java] EventGate validation error for performer language groups. [TO BE IMPLEMENTED]
https://phabricator.wikimedia.org/T361265

With lingering issues fixed by @Dbrant in our latest release, I was able to validate and compare unique user and event counts for all 4 migrated MP datasets against the MEP datasets, and we are seeing a successful migration for all. Reporting the variance found here, with a note that it is in favor of MP (MP counts are higher) and is expected, as the new MP code fixes a known data loss issue in MEP. These variances should be considered mostly for posterity and are not indicative of any problems with the data. Also, thanks to @cjming for all your work on this migration.

I also validated for each schema/dataset that no unique users counted in MEP were missing from MP.
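
For anyone repeating this check, the idea can be sketched as a set difference. This is only an illustration, not the query actually used: the helper name `users_missing_from_mp` and the sample IDs are hypothetical, and it assumes the distinct user identifiers for the same date range have already been exported from each dataset.

```python
def users_missing_from_mp(mep_users, mp_users):
    """Return identifiers seen in MEP but absent from MP.

    Hypothetical helper: both arguments are iterables of distinct
    user identifiers for the same schema and date range.
    """
    return set(mep_users) - set(mp_users)


# Made-up sample identifiers, not real data.
mep_sample = {"u1", "u2", "u3"}
mp_sample = {"u1", "u2", "u3", "u4"}

missing = users_missing_from_mp(mep_sample, mp_sample)

# An empty result means every MEP user also appears in MP.
print(sorted(missing))
```

Extra users on the MP side (like `u4` in the sample) do not affect the result, which matches the expectation that MP counts can be higher.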

MP Dataset                                                Avg. Variance MEP vs MP
android_product_metrics_article_link_preview_interaction  -1.1%
android_product_metrics_article_toc_interaction           -2.4%
android_product_metrics_find_in_page_interaction          -2.4%
android_product_metrics_article_toolbar_interaction       -1.8%
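
As a rough guide to reading the variance column, here is a minimal sketch of how an average percentage variance could be computed from paired daily counts. The exact formula behind the figures above is not stated in this task, so the definition used here (MEP count relative to MP count, negative when MP is higher) is an assumption, and the function names and sample numbers are hypothetical.

```python
def pct_variance(mep_count, mp_count):
    """Percent difference of MEP relative to MP (assumed definition).

    Negative values mean the MP count is higher than the MEP count.
    """
    return (mep_count - mp_count) * 100.0 / mp_count


def avg_variance(daily_pairs):
    """Average the per-day percent variance over (mep, mp) count pairs."""
    values = [pct_variance(mep, mp) for mep, mp in daily_pairs]
    return sum(values) / len(values)


# Made-up daily counts as (MEP, MP) pairs, not real data.
sample_days = [(990, 1000), (970, 1000)]
sample_avg = avg_variance(sample_days)
print(f"{sample_avg:.1f}%")
```

Averaging per-day percentages weights every day equally. Dividing total MEP events by total MP events over the period would weight busy days more heavily and can give a slightly different figure.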

Data (see Tabs labelled NEW)