FY25-26 SDS2.1.2 Data Reliability - Debugging event loss
Product Requirements
STATUS: DONE
| Reviewer | Date approved |
|---|---|
| Karen Hernandez | Aug 18 2025 |
| Julie van der Hoop | Aug 7 2025 |
| Sam Smith | |
Objective/Hypothesis
If we can implement better debugging for event logging, then product teams will know that their experiment is collecting event data as expected, increasing experiment owners’ confidence.
How does this objective/hypothesis relate to organizational goals?
This hypothesis supports the 2025-2026 fiscal year annual plan for the Product and Technology department to deliver on SDS2 Objective KR2.1, which states:
SDS Objective:
Product managers can quickly, easily, and confidently evaluate the impacts of product features on Wikipedia.
SDS2.1: By the end of Q2, experiment and evaluate 3 interventions that help contributors improve the state of vital content on their Wikipedias.
Key Result:
Key Result 1: Experiment owners have confidence that the data collected using Experimentation Platform is accurate.
Why do this?
This work matters because we need to know that:
- the data we are collecting is the right data
- we have instrumented properly
- all of the events are coming through
- we can debug efficiently
Timeline
By the end of Q1 FY25-26, we would like to have the code updated and dashboard(s) created to monitor possible bugs, event loss, and data integrity.
Risks
| Risk | Description | Status | Notes |
|---|---|---|---|
| Time, prioritization. | Timing could be tricky - we need to do the GrowthBook integration and onboard product teams to the Experimentation Platform this fall. With competing priorities, this work could be superseded by the bigger projects the Experiment Platform team is planning. | Mitigating | |
| If we arrive at a conclusion that the platform is difficult to trust | i.e. we find that the event loss rate is a significant percentage (TBD - ~25%? More? Less?) - would this be perceived poorly by product teams? | Emerging | |
Who is involved
| Overview - DACI | | | |
|---|---|---|---|
| Driver | Approver | Contributors | Informed |
| Clare Ming | Julie van der Hoop | Steering committee: Sam Smith, Adam Baso, Santiago Faci. Additional contributors: other Experiment Platform team members. | Partner product teams we are embedding with while we are onboarding them onto the Experiment Platform. |
| Details about roles and activities | | | |
|---|---|---|---|
| Team/Role | Type | Individuals | Sample Activities |
| Experiment Platform | Development Team | Sam Smith (Tech Lead), Adam Baso, Santiago Faci, Clare Ming | Research and implementation |
| WMF Product Management | Product Manager | Julie van der Hoop | Product Requirements and planning; prioritization, stakeholder management, etc. |
Requirements
Hypothesis Requirements
- The end product/result for the Experiment Platform team should be more confident in the
- The Experiment Platform team will use these metrics to establish internal and external confidence in the platform. We will also use them to assess any changes that we make to the platform in the future.
- Product teams will use them to assess whether they should use the platform at all and to establish a baseline for any data collection activities moving forward.
Success Criteria
- We see expected activity in the monitoring of the PHP and JavaScript SDKs:
  - Dashboards report on event loss for the JS and PHP SDKs
- Experiment owners have confidence that the data collected using Experimentation Platform is accurate.
Target Outcomes
Provide additional debugging and logging capabilities that raise our confidence and that of our adopting product teams.
Ideal Outcomes:
- Dashboards, documentation, more/fewer bugs to fix.
- Average score of 4 (on a 1-5 scale) from the post-experiment survey question: How confident do you feel that your configuration was correct and complete?
What is out of scope?
We are currently tackling some technical debt in the xLab UI that will help our users include contextual attributes (and exclude others) for their experimentation needs. This work should not be included in the scope of this hypothesis even though we will need to apply additional logging/debugging once this work is done.
Background & existing research or documentation
- https://wikitech.wikimedia.org/wiki/Experimentation_Lab
- Experiment flows - Experimentation Lab: System Design Sprint
- https://phabricator.wikimedia.org/T384704
- https://wikimedia.slack.com/archives/C01R06P8D1B/p1738753046872759
Open questions
- How do we assess what is an acceptable level of event loss?
Product Roadmap
Potential roadmap for this work (rough guess).
- Mid-August 2025: high level planning and known work outlined in tickets
- End of August 2025: code implemented/deployed
- Early to mid-September 2025: dashboards ready and available for monitoring
Milestones
- Both SDKs have counters for assessing the total number of expected events
- Dashboard(s) created to track numbers of expected events and actual events
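
As a rough illustration of how the expected and actual event counts from these milestones could be combined into a single event loss figure for a dashboard, here is a minimal Python sketch. It is not the team's implementation: the `EventCounts` structure, the `loss_rate` function, and the sample numbers are all hypothetical, and it assumes that "expected" means the total reported by an SDK counter and "actual" means the number of events that arrived in the data store.

```python
from dataclasses import dataclass


@dataclass
class EventCounts:
    """Hypothetical per-SDK totals; field names are illustrative only."""
    sdk: str
    expected: int  # assumed: events the SDK counter says were submitted
    actual: int    # assumed: events that arrived in the data store


def loss_rate(counts: EventCounts) -> float:
    """Return the fraction of expected events that never arrived (0.0 to 1.0)."""
    if counts.expected == 0:
        return 0.0  # nothing was expected, so nothing can be lost
    # Clamp at zero in case duplicates make actual exceed expected.
    lost = max(counts.expected - counts.actual, 0)
    return lost / counts.expected


if __name__ == "__main__":
    # Made-up sample numbers, not real measurements.
    for counts in (EventCounts("js", 1000, 940), EventCounts("php", 500, 500)):
        print(f"{counts.sdk}: {loss_rate(counts):.1%} event loss")
```

Reporting the rate per SDK, rather than one combined number, would keep JS and PHP loss visible separately, which seems to match the success criteria above. What counts as an acceptable rate remains the open question noted earlier.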