FY25-26 SDS2.1.5 User Experience - Attribute Selection
Closed, Resolved (Public)

Description

Product Requirements

STATUS: Not started

| Reviewer | Date approved |
|---|---|
| Julie van der Hoop | Aug 7 2025 |
| Karen Hernandez | |

Objective/Hypothesis

If we tell users that their instrument, when created in xLab, contains a set of attributes that changes its risk category, we will deter instrumentation users from over-collecting data and increase clarity around which combinations of attributes require privacy review.

How does this objective/hypothesis relate to organizational goals?

This hypothesis supports the 2025-2026 fiscal year annual plan for the Product and Technology department by delivering on Objective SDS2, KR2.1, which states:

SDS Objective:

Product managers can quickly, easily, and confidently evaluate the impacts of product feature changes in Wikipedia

Key Result:

By the end of Q2, experiment and evaluate 3 interventions that help contributors improve the state of vital content on their Wikipedias.

Why do this?

When registering an instrument, users choose which contextual attributes they want to collect. Once the instrument is running, these attributes are added to every submitted event and then stored. There is currently no automated validation or guidance to help users understand which contextual attributes (or combinations thereof) pose privacy risks, or to determine the appropriate risk level for the selected values. The PM and the PA must manually review the related documentation (Data Collection Guidelines) and determine the risk level of the data collected by their instrument. In addition, the Regulation section in xLab is not fully up to date with those guidelines, and a new UI/UX for that section has already been designed.

This hypothesis aims to implement the updated Regulation section and to give users automated guidance, validation and advice. The system would identify when specific attributes or combinations of attributes make the instrument a low/medium/high risk data collection activity and when it requires privacy review. The platform would give users more clarity about privacy risk and could reduce over-collection of data that is not really needed. Eliminating a manual step also removes a pain point that can cause confusion.

Timeline

By the 2nd quarter of FY25-26, we would like to improve the experimentation platform so that it automatically helps stakeholders understand the privacy concerns related to the attributes they are going to collect.

Risks

| Risk | Description | Status | Notes |
|---|---|---|---|
| Pending work/technical debt regarding instrumentation | There is some pending (and ongoing) technical debt that we are already addressing so that instruments work as expected for this hypothesis | Mitigating | |

Who is involved

Overview - DACI
| Driver | Approver | Contributors | Informed |
|---|---|---|---|
| Santiago Faci | Julie van der Hoop | Steering committee: 2.1. Additional contributors: Sarai Sanchez, Clare Ming, Sam Smith | Partner product teams (Reader Experiences + Moderator Tools) |
Details about roles and activities
| Team/Role | Type | Individuals | Sample Activities |
|---|---|---|---|
| WMF Experiment Platform Team | Development team | Clare Ming, Sam Smith, Mikhail Popov, Santiago Faci | Research & Implementation |
| Product Design | UI/UX Design | Sarai Sanchez | UI and UX design |
| WMF Research & Decision Science Team | Consultation | Jennifer Wang, Megan Neisler | Review related work |
| Legal, Security, Trust and Safety Team | Consultation | Eric Mill | Review related work |

Requirements

Hypothesis Requirements

  • The Regulation section of the instrument form is improved to provide more details about the selected risk level according to the data collection guidelines (https://phabricator.wikimedia.org/T380592)
  • New validation rules/advice are added to xLab to disallow collection, or force privacy review, based on the attributes the instrument aims to collect
  • xLab checks whether the risk level selected in the Regulation section matches the contextual attributes configured for the given instrument, and provides guidance when the user needs to make changes or take further action
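
The check described in the last requirement could be sketched roughly as follows. This is a minimal illustration in Python, not xLab's implementation: the attribute names, the risk level assigned to each one, the combination rule, the choice to treat unlisted attributes as low risk, and the function names `derive_risk` and `check_declared_risk` are all assumptions made for the example. The real mapping would come from the data collection guidelines.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical per-attribute risk levels (placeholder names and values).
ATTRIBUTE_RISK = {
    "attr_low": Risk.LOW,
    "attr_medium": Risk.MEDIUM,
    "attr_high": Risk.HIGH,
}

# Hypothetical combinations that are riskier together than on their own.
COMBINATION_RISK = {
    frozenset({"attr_low", "attr_medium"}): Risk.HIGH,
}


def derive_risk(attributes):
    """Return the highest risk implied by the selected attributes."""
    selected = set(attributes)
    level = Risk.LOW
    for name in selected:
        # Assumption: attributes missing from the table count as LOW.
        level = max(level, ATTRIBUTE_RISK.get(name, Risk.LOW))
    for combo, combo_level in COMBINATION_RISK.items():
        if combo <= selected:  # every attribute of the combination is selected
            level = max(level, combo_level)
    return level


def check_declared_risk(declared, attributes):
    """Return a guidance message if the declared level is too low, else None."""
    derived = derive_risk(attributes)
    if declared < derived:
        return (
            f"Declared risk level {declared.name} is lower than the level "
            f"{derived.name} implied by the selected attributes."
        )
    return None
```

In this sketch a declared level that is higher than the derived one passes silently; whether xLab should also flag that case is left open.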

Success Criteria

In order to understand whether the hypothesis is successful or not, we have defined the following success criteria:

  • When users register their instrument in xLab, the platform disallows or discourages collection of specific combinations of attributes, based on the Edge Uniques Full Privacy Review and the data collection guidelines
  • xLab knows more about the data collected by an instrument and can run a more comprehensive validation of that data and its collection risks. That way, xLab can check whether the risk level specified by the user is consistent with the attributes the instrument is going to collect
  • Over-collection of personal or sensitive data has been reduced
  • The platform documentation has been updated to reflect all changes made for this work

Target Outcomes

The goals are to:

  • Update the Regulation section of the instrument registration form according to the current Data Collection Guidelines
  • Increase awareness of privacy by disallowing or discouraging collection of combinations of attributes that put privacy at risk
  • Help users configure regulation for their instruments
    • When registering an instrument, xLab will advise the user about the implications of collecting the chosen contextual attributes
    • When registering an instrument, xLab will check whether the regulation part is configured correctly for the contextual attributes that are going to be collected
  • Ultimately, deliver a more intuitive setup that reduces questions to the Experiment Platform team and speeds up instrument creation
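
One way to picture the "disallow or discourage" behaviour in these outcomes is a small rule table evaluated against the selected attributes. The sketch below is illustrative only: the `Rule` structure, the `evaluate` function, the placeholder attribute names and the messages are hypothetical, and the actual rules would have to be derived from the data collection guidelines and the privacy review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    attributes: frozenset  # the rule fires when all of these are selected
    action: str            # "disallow" blocks submission, "discourage" warns
    message: str


# Hypothetical rules with placeholder attribute names.
RULES = [
    Rule(frozenset({"attr_x", "attr_y"}), "disallow",
         "attr_x and attr_y cannot be collected together."),
    Rule(frozenset({"attr_z"}), "discourage",
         "Collect attr_z only if the analysis really needs it."),
]


def evaluate(selected_attributes):
    """Split matching rules into blocking errors and non-blocking warnings."""
    chosen = set(selected_attributes)
    matched = [rule for rule in RULES if rule.attributes <= chosen]
    errors = [r.message for r in matched if r.action == "disallow"]
    warnings = [r.message for r in matched if r.action == "discourage"]
    return {"can_submit": not errors, "errors": errors, "warnings": warnings}
```

Keeping warnings separate from errors lets the form show advice without blocking registration, while a "disallow" match prevents submission until the selection changes.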

Ideal Outcomes:

  • Allow users to toggle user agent collection by an instrument

What is out of scope?

We consider the following out of scope for validating this hypothesis:

  • Adding support for xLab to analyze which additional fields from a custom schema will be collected

Background & existing research or documentation

Open questions

  • How do the instrument’s contextual attributes override those set in the common web stream?
  • What is the scope of the technical debt that needs to be addressed as pre-work? Should this hypothesis start in Q2?
    • The scope is already defined by the tasks mentioned above. That work will be addressed in Q1, and we will be able to start working on the hypothesis in Q2

Dependencies

As specified above, we may need consultation or review from the Legal, Security and/or Data Science teams to make sure that the work done here complies with the privacy policy and data collection guidelines.

Event Timeline

phuedx triaged this task as High priority. Aug 21 2025, 3:33 PM
phuedx moved this task from Incoming to Backlog on the Test Kitchen board.