[EPIC] Improve data collection and experiment ergonomics
Open, Medium, Public

Description

Background

For years the Growth team had an embedded data scientist, Morten Warncke-Wang, who led the spirit of experimentation in the team and in its main product focus, the GrowthExperiments (GE) extension.

That involved building a custom bucketing system to run A/B tests on newcomers, as well as building experiment-focused instrumentation and long-lived instruments for general product health observability. Over the years, many GrowthExperiments-specific analytics schemas, custom streams, and more or less generic instrumentation capabilities were created. All of that great work has helped Growth establish its features as standard newcomer experiences, ensuring edit quality and often increasing user activation.

However, the ever-evolving nature of software has left us in a situation where a lot of data collection is going on in GE that we either don’t have the resources to analyze or don’t consider worth analyzing. In an audit performed by data analysts and data engineers in early 2025 (SDS 2.4.18: Growth’s instrumentation portfolio audit), 9 out of 15 data collection streams were identified as high risk according to the Foundation’s collection guidelines. In addition, the adoption of Test Kitchen, the new de facto standard experimentation platform, has shifted the team’s analytics approach away from largely manual analysis towards a more balanced mixture of automated and manual analysis.

To maintain our team’s productivity and adapt to the general analytics shift in the Foundation, we would benefit from paying down some technical debt in the GE extension. In the last three experiments the team has conducted using Test Kitchen (Community updates, Leveling up notifications, and Revise tone task), writing the instrumentation consumed a large amount of resources and felt like starting from scratch every time. This is an area where many improvements can be made. But this is not just a long-term development and maintenance issue; it also affects end users whose (private!) data is being collected for no good reason.

With this background in mind there are at least three areas of work:

Related Objects

Status    Assigned
Open      None
Open      Sgs
Resolved  KStoller-WMF
Declined  Sgs
Resolved  Sgs
Resolved  phuedx
Resolved  Sgs
Resolved  Cyndymediawiksim
Open      Michael
Resolved  Sgs
Resolved  AAlhazwani-WMF
Resolved  mpopov
Resolved  Sgs
Resolved  Sgs
Resolved  cjming
Resolved  Rehan_khan_78
Open      None
Open      None
Open      Cyndymediawiksim
Open      None
Open      Sgs
Resolved  Sgs