Objective: Analyze the quality of the data collected with the new Metrics Platform-based instrument (mediawiki_web_ui_actions) by comparing it to the data collected with the existing non-Metrics Platform-based instrument (desktop_web_ui_actions), and determine whether the migration to Metrics Platform can proceed or whether there are issues that need to be resolved first.
Prerequisites for data QA
- Mapping of old instrumentation to new instrumentation: LINK TO DOCUMENT
- This is primarily driven by the engineer performing the migration
- Analyst reviews the mapping for completeness and correctness, and contributes/assists as needed
- Specific QA needs have been identified and are agreed upon (SEE BELOW)
- Analyst decides, in collaboration with engineers/PM, whether to QA the whole instrument or only its key parts, with the rest assumed to be okay.
- Document the parts and the relevant queries:
- (1) overall counts by action and sub-action (if applicable)
- (2) if relevant, counts by specific identifiers (e.g. by session)
- NOTE: these queries will almost always be limited by time in some way
- New instrument has been deployed and activated (Link to Phab task)
- Engineer has verified that events are flowing in and that the instrument is not producing schema validation errors.
- Verify that events are flowing in:
- EventGate Grafana dashboard
- Kafka by Topic Grafana dashboard
- EventStreams
- New instrument doesn't have any schema validation errors
- Analyst and PM have agreed on the priority of the QA work relative to other needs/requests (e.g. the PM may need to wait longer for some analysis so that the analyst can do the QA work)
- Documentation of old and new table names and date of deployment for analyst's reference:
Instrument | Table name | Stream deployed (if applicable) |
---|---|---|
DesktopWebUIActionsTracking | DesktopWebUIActionsTracking | 2023-12-05 |
MediaWikiWebUIActions | mediawiki_web_ui_actions | 2024-01-08 |
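Since the QA queries are limited by time, one way to pick the window is to start it only once both instruments are live. The following is a minimal sketch, not part of the agreed process: the helper `comparison_window`, the 7-day default, and the choice to start the day after the later deployment (so every day in the window is a full day for both instruments) are assumptions for illustration. The dates come from the table above.

```python
from datetime import date, timedelta

# Deployment dates taken from the table above, keyed by table name.
DEPLOYED = {
    "DesktopWebUIActionsTracking": date(2023, 12, 5),
    "mediawiki_web_ui_actions": date(2024, 1, 8),
}


def comparison_window(deployed, days=7):
    """Return (start, end), both inclusive, for a QA comparison window.

    Assumption: the window starts the day after the later deployment, so
    both instruments have a full day of data for every day in the window.
    """
    start = max(deployed.values()) + timedelta(days=1)
    end = start + timedelta(days=days - 1)
    return start, end


start, end = comparison_window(DEPLOYED)
print(start.isoformat(), end.isoformat())
```

The resulting dates can then be used as the time filter in each of the documented queries, so that the old and new instrument are always compared over the same days.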
Data QA checklist
If more than one instrument is being migrated, these steps need to be completed for each one.
- Count the daily number of schema validation errors for
- Old instrument
- New instrument
- Compare counts of events by action and sub-action (as defined in the mapping from prerequisites)
- If relevant, compare counts by specific identifier (as defined in the prerequisites)
- Upload QA notebooks to GitLab, making sure to follow data publication guidelines
- Document any issues or notable observations on this ticket
- Resolve this ticket
NOTE: If any issues were identified that require fixing the new instrument, data QA of the fixed instrument will need to be filed as a new Phab task. Some of the checked prerequisites will carry over.
See Metrics Platform Instrument Migration Data QA Process Description for more details.
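The count comparison in the checklist could be sketched roughly as follows, once per-action counts have been queried for both instruments over the same time window. This is an illustration under stated assumptions: the function `compare_counts`, the 5% tolerance, the action names, and the counts are all hypothetical, and the real mapping is the one documented in the prerequisites.

```python
def compare_counts(old_counts, new_counts, mapping, tolerance=0.05):
    """Compare per-action event counts between the old and new instrument.

    mapping: old action name -> new action name (hypothetical names here).
    Returns rows of (old_action, old_n, new_n, rel_diff, within_tolerance).
    """
    rows = []
    for old_action, new_action in sorted(mapping.items()):
        old_n = old_counts.get(old_action, 0)
        new_n = new_counts.get(new_action, 0)
        # Relative difference, using the old instrument as the baseline.
        if old_n == 0:
            rel_diff = 0.0 if new_n == 0 else float("inf")
        else:
            rel_diff = (new_n - old_n) / old_n
        rows.append((old_action, old_n, new_n, rel_diff, abs(rel_diff) <= tolerance))
    return rows


# Made-up counts and action names, for illustration only.
old_counts = {"click.search": 1000, "init": 5000}
new_counts = {"click:search": 990, "init": 4000}
mapping = {"click.search": "click:search", "init": "init"}

for row in compare_counts(old_counts, new_counts, mapping):
    print(row)
```

Rows that fall outside the tolerance are candidates for the issues documented on this ticket. What tolerance is acceptable is a judgment call for the analyst and engineers, not something this sketch decides.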