
QA Wikilambda instrumentation port to new core interactions metrics platform version
Open, In Progress, Medium, Public

Description

Objective: Analyze the quality of the data collected with the updated Metrics Platform-based Wikilambda instrument by comparing it to the data collected with the existing "monoschema" Wikilambda instrument, and verify whether the migration to Metrics Platform can proceed or whether there are issues that need to be resolved.

Prerequisites for data QA

  • Mapping of old instrumentation to new instrumentation
  • Specific QA needs have been identified and agreed upon
      • Analyst decides, in collaboration with engineers/PM, whether to QA the whole instrument or whether there are key parts that should be QAed and the rest can be assumed to be okay.
      • Document those parts and the relevant queries (see the sketch after the table below):
          • (1) overall counts by action and sub-action (if applicable)
          • (2) counts by specific identifiers (e.g. by activity and browser session)
          • NOTE: these queries will almost always be limited by time in some way
  • New instrument has been deployed and activated (<link to phab task>)
      • Engineer has verified that events are flowing in and that the instrument is not producing schema validation errors.
      • Verify that events are flowing in via:
          • EventGate Grafana dashboard
          • Kafka by Topic Grafana dashboard
          • EventStreams
  • New instrument doesn't have any schema validation errors
  • Prioritization agreement between analyst & PM of the QA work in the context of other needs/requests (e.g. PM may need to wait longer for some analysis so that the analyst can do the QA work)
  • Documentation of old and new table names and date of deployment for analyst's reference:
Instrument | Table name                                       | Stream deployed (if applicable) | Instrumentation Task
Old        | event.wikifunctions_ui                           | 2023-07-26                      | T297511
New        | event.mediawiki_product_metrics_wikifunctions_ui | 2024-04-11                      | T350497
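
For reference, here is a minimal sketch of the two query shapes described under "Document those parts and the relevant queries" above, assuming the wmfdata Python library as the SQL runner (any Hive/Spark SQL client works the same way). The field paths (`action`, `performer.session_id`) and the date window are assumptions to be adapted to the agreed mapping:

```python
# A minimal sketch of the two prerequisite query shapes, assuming the
# wmfdata Python library as the SQL runner. Field paths such as `action`
# and `performer.session_id` are assumptions; adapt them to the mapping.
from wmfdata import spark

START, END = "2024-04-11", "2024-04-25"  # these queries are always time-limited

# (1) Overall counts by action (and sub-action, if applicable).
counts_by_action = spark.run(f"""
    SELECT action, COUNT(*) AS events
    FROM event.mediawiki_product_metrics_wikifunctions_ui
    WHERE dt >= '{START}' AND dt < '{END}'
    GROUP BY action
    ORDER BY events DESC
""")

# (2) Counts by specific identifiers, e.g. distinct browser sessions per action.
counts_by_session = spark.run(f"""
    SELECT action,
           COUNT(DISTINCT performer.session_id) AS browser_sessions
    FROM event.mediawiki_product_metrics_wikifunctions_ui
    WHERE dt >= '{START}' AND dt < '{END}'
    GROUP BY action
""")
```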

Data QA checklist

If more than one instrument is being migrated, these steps need to be completed for each one.

  • Count the daily number of schema validation errors (a query sketch follows this list) for:
    • Old instrument
    • New instrument
  • Compare counts of events by action and sub-action (as defined in the mapping from the prerequisites)
    • If relevant, compare counts by specific identifier (as defined in the prerequisites)
  • Upload QA notebooks to GitLab, making sure to follow data publication guidelines
  • Document any issues (or notable observations found) on this ticket
  • Resolve this ticket
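
A hedged sketch of the first two checklist steps follows. The validation-error table name and the stream names below are placeholders, and field paths on the old monoschema table may differ (e.g. the action may sit under an `event` struct); substitute the actual names for this deployment:

```python
# Sketch of the first two checklist steps: daily validation errors for
# both instruments, then a side-by-side count comparison. Table and
# stream names marked below are placeholders/assumptions.
from wmfdata import spark

# Daily schema validation error counts for both instruments.
validation_errors = spark.run("""
    SELECT SUBSTR(meta.dt, 1, 10) AS day, meta.stream, COUNT(*) AS errors
    FROM event.eventgate_validation_errors  -- placeholder table name
    WHERE meta.stream IN (
        'wikifunctions_ui',                           -- old (assumed)
        'mediawiki.product_metrics.wikifunctions_ui'  -- new (assumed)
    )
    GROUP BY SUBSTR(meta.dt, 1, 10), meta.stream
    ORDER BY day
""")

# Side-by-side event counts by action over the window both streams cover.
count_comparison = spark.run("""
    SELECT COALESCE(o.action, n.action) AS action,
           o.events AS old_events,
           n.events AS new_events
    FROM (SELECT action, COUNT(*) AS events
          FROM event.wikifunctions_ui
          WHERE dt >= '2024-04-11' GROUP BY action) o
    FULL OUTER JOIN
         (SELECT action, COUNT(*) AS events
          FROM event.mediawiki_product_metrics_wikifunctions_ui
          WHERE dt >= '2024-04-11' GROUP BY action) n
      ON o.action = n.action
""")
```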

NOTE: If any issues were identified that require fixing the new instrument, data QA of the fixed instrument will need to be filed as a new Phab task. Some of the checked prerequisites will carry over.

Event Timeline

MNeisler renamed this task from "Q Wikilambda instrumentation port to new core interactions metrics platform version" to "QA Wikilambda instrumentation port to new core interactions metrics platform version". Mar 1 2024, 3:50 PM
MNeisler triaged this task as Medium priority.
MNeisler added a project: Product-Analytics.
Jdforrester-WMF changed the task status from Open to In Progress. Apr 8 2024, 5:56 PM
Jdforrester-WMF assigned this task to MNeisler.
Jdforrester-WMF moved this task from Backlog to In Progress on the Abstract Wikipedia team board.

QA is currently pending deployment of a patch to add the stream to $wgEventLoggingStreamNames. The patch is expected to be deployed tomorrow, after which data should start to be available in event.mediawiki_product_metrics_wikifunctions_ui for QA.

MNeisler updated the task description.

@MNeisler any updates on this task? Need any help to advance?

@VirginiaPoundstone
I've completed some initial QA checks and confirmed that there are no significant issues or discrepancies with the new instrumentation (all new and previous fields are logging as expected and there are no large discrepancies in total event counts between the two instruments).

I still need to do some further QA checks to confirm counts by specific actions and session identifiers and will post QA findings here once complete.

I'm out the remainder of this week for some post-Hackathon travel and recovery but am planning to prioritize completing this once I'm back next week.

I've completed QA of the new event.mediawiki_product_metrics_wikifunctions_ui table and confirmed that all events are logging as expected, with minimal variance in counts. I've noted a few questions below, primarily for clarification. See the summary of checks below and the full QA report for details.

Follow-up questions:

  • Do any of the differences in overall event counts noted below seem unexpected? They seem okay to me, but I wanted to confirm.
    • The new stream collects about 40% fewer distinct events than the old stream. This appears to be primarily due to the removal of load-time events (name = wf.ui.newView.mounted) in the new stream. These were intentionally removed from the new stream because they were determined to be more performance-related than analytics-related.
    • If we remove these load-time events from the old stream, the new stream captures 1.5% more events than the old stream. This variance is small and seems expected given the change to log all interactions as separate events. The new stream captures the same number of distinct browser sessions and distinct users.
  • Can you clarify the current definition of performer.active_browsing_session_token, which was added to the new instrument? It is populated in the new stream, but I was unable to find documentation on its definition and am trying to make sense of the findings noted below (see the sketch after this list):
    • During QA, I found that a single browser session (performer.session_id) had up to 35 distinct logged active browsing sessions; however, the majority of browser sessions (94%) had only one active browser session token logged.
    • There are a small number of instances where more than one performer.session_id is logged for a single performer.active_browsing_session_token. This seems unexpected but accounts for fewer than 1% of all active browsing sessions.
    • Active browsing sessions can have anywhere from 1 to 620 events logged. While most of these events are view events, many active sessions include multiple edit and publish events. Note: in the future, it might be beneficial for Abstract Wikipedia to use the provided funnel contextual attributes to more closely correlate related events. This is not a blocker or urgent, as I can evaluate all key metrics using the new table as instrumented.
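
For reference, a sketch of the distribution checks behind the session-token observations above, under the same runner and window assumptions as the earlier sketches:

```python
# Sketch of the session-token sanity checks: the distribution of active
# browsing session tokens per browser session, and the inverse.
from wmfdata import spark

# How many distinct tokens does each browser session log?
tokens_per_session = spark.run("""
    SELECT tokens, COUNT(*) AS browser_sessions
    FROM (SELECT performer.session_id,
                 COUNT(DISTINCT performer.active_browsing_session_token) AS tokens
          FROM event.mediawiki_product_metrics_wikifunctions_ui
          WHERE dt >= '2024-04-11'
          GROUP BY performer.session_id) per_session
    GROUP BY tokens
    ORDER BY tokens
""")

# The inverse check: tokens spanning more than one browser session
# (the unexpected <1% of cases noted above).
sessions_per_token = spark.run("""
    SELECT performer.active_browsing_session_token,
           COUNT(DISTINCT performer.session_id) AS browser_sessions
    FROM event.mediawiki_product_metrics_wikifunctions_ui
    WHERE dt >= '2024-04-11'
    GROUP BY performer.active_browsing_session_token
    HAVING COUNT(DISTINCT performer.session_id) > 1
""")
```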

QA Summary

  • All contextual and custom attributes are logged as expected, based on the agreed-upon approach documented in the mapping doc
    • Custom data properties removed: isnewzobject, edit, viewname, loadtime, resulthaserror, isdirty
    • Custom data properties retained: selectedfunctionzid, zlang, zobjectid, zobjecttype, haserrors, implementationtype
  • Data is logged for the 7 expected values of action and the numbers appear as expected. ✅ Note: We have identified an issue where the new action = change event is not capturing "edit a function" events in the instrumentation. Abstract Wikipedia is investigating a possible fix.
  • Minimal variance in overall counts (excluding the name = wf.ui.newView.mounted event, which was removed from the new stream) ✅
    • The new stream captures 1.5% more events than the old stream.
    • The new stream captures the same number of browser sessions and distinct users.
    • The new stream captures one fewer page ID and one fewer revision ID than the old stream, and 5 fewer pageview tokens.
  • No variance in session and user counts by user types ✅
    • Both streams capture the same number of distinct users and sessions by registration status, bot status, and edit count bucket.
  • Same event workflows logged for a single session or page view token. ✅
    • The new stream captured all expected events. Confirmed for view, edit, create, cancel and call events.
  • Same counts by action type ✅
    • No significant differences in the number of events logged for function calls, views, edit attempts, publish or cancel events.
  • Can evaluate all key user workflows using revised and added fields ✅
    • Confirmed that all key user workflows (e.g. creating a new function, evaluating a function, editing a test, etc.) could be successfully queried using the new stream fields.
    • The new stream removed the previous isnewzobject field and instead created two separate interaction events, create and edit, to track whether a creation or edit attempt was initiated. I confirmed that the subsequent publish or cancel events can be accurately identified by reviewing the associated zobjectid and active_browsing_session_token. Also note that zobjectid = Z0 for all newly published functions, tests, or implementations, which allows for easy analysis of overall counts of newly created objects published vs. edits to existing objects published (see the sketch below).
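
As an illustration of that last point, a sketch of the new-vs-existing publish breakdown. The `action = 'publish'` value and the `custom_data.zobjectid` path are assumptions, since how custom data is surfaced depends on the final schema:

```python
# Sketch of the analysis the zobjectid = 'Z0' convention enables:
# splitting publish events into newly created objects vs. edits to
# existing objects. Field paths and values marked below are assumptions.
from wmfdata import spark

publish_breakdown = spark.run("""
    SELECT CASE WHEN custom_data.zobjectid = 'Z0'  -- assumed path
                THEN 'new object published'
                ELSE 'edit to existing object published'
           END AS publish_type,
           COUNT(*) AS events
    FROM event.mediawiki_product_metrics_wikifunctions_ui
    WHERE dt >= '2024-04-11'
      AND action = 'publish'  -- assumed action value for publish events
    GROUP BY 1
""")
```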

Code repo

cc @VirginiaPoundstone @DMartin-WMF @phuedx @Sfaci