Data Engineering Bug Report or Data Problem Form.
Please fill out the following
For a data-related problem:
- Is this a data quality issue? Yes
- What datasets and/or dashboards are affected? wmf.mediawiki_user_history
- What are the observed vs expected results? Please include information such as location of data, any initial assessments, SQL statements, screenshots.
The columns caused_by_user_text and caused_by_anonymous_user are in a different order in the Hive metastore schema than in the Parquet files written by Spark (verified by inspecting the actual Parquet files).
As a result, select * from wmf.mediawiki_user_history where snapshot='2022-08' limit 10; fails in both Presto and Hive, because the schema in the Hive metastore does not match the actual Parquet files.
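A minimal sketch of how such a column-order mismatch could be detected, assuming the two ordered column lists have already been collected (for example from `DESCRIBE wmf.mediawiki_user_history` in Hive and from the `columns` attribute of a Spark DataFrame read from the Parquet path). The function name `column_order_mismatches` and the shortened column lists are hypothetical and for illustration only; they do not reflect the real table's full schema or which side has which order.

```python
from itertools import zip_longest


def column_order_mismatches(metastore_cols, parquet_cols):
    """Return (position, metastore_name, parquet_name) for every index
    at which the two ordered column lists disagree.

    A missing column on either side is reported as None.
    """
    mismatches = []
    pairs = zip_longest(metastore_cols, parquet_cols, fillvalue=None)
    for position, (meta_name, parquet_name) in enumerate(pairs):
        if meta_name != parquet_name:
            mismatches.append((position, meta_name, parquet_name))
    return mismatches


# Hypothetical, shortened column lists (assumption: illustrative only).
hive_cols = ["user_id", "caused_by_user_text", "caused_by_anonymous_user"]
parquet_cols = ["user_id", "caused_by_anonymous_user", "caused_by_user_text"]

for position, meta_name, parquet_name in column_order_mismatches(hive_cols, parquet_cols):
    print(f"position {position}: metastore={meta_name} parquet={parquet_name}")
```

An empty result would indicate that the two schemas list their columns in the same order; any reported position is a candidate for the kind of mismatch described above.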
For the DE Team to fill out
Which systems does this affect?
- Hive
- Druid
- Superset
- Turnilo
- WikiDumps
- Wikistats
- Airflow
- HDFS
- Goblin
- Sqoop
- Dashiki
- DataHub
- Spark
- Jupyter
- Modern Event Platform
- Event Logging
- Other - Presto
Impact Assessment:
Does this problem qualify as an incident?
- Yes
- No
Does this violate an SLO?
- Yes
- No
Value Calculator | Rank |
---|---|
Will this improve the efficiency of a team's workflow? | 1 |
Does this have an effect on our Core Metrics? | 1 |
Does this align with our strategic goals? | 2 |
Is this a blocker for another team? | 2 |