FY25-26 SDS2.1.5 User Experience - Attribute Selection
Product Requirements
STATUS: Not started
| Reviewer | Date approved |
|---|---|
| Julie van der Hoop | Aug 7 2025 |
| Karen Hernandez | |
Objective/Hypothesis
If we tell users when an instrument created in xLab contains a set of attributes that changes its risk category, we will deter instrumentation users from over-collecting data and increase clarity about which combinations of attributes require privacy review.
How does this objective/hypothesis relate to organizational goals?
This hypothesis supports the Product and Technology department's annual plan for fiscal year 2025-2026, specifically Objective SDS2, KR2.1, which states:
SDS Objective:
Product managers can quickly, easily, and confidently evaluate the impacts of product feature changes in Wikipedia
Key Result:
By the end of Q2, experiment and evaluate 3 interventions that help contributors improve the state of vital content on their Wikipedias.
Why do this?
When registering an instrument, users choose which contextual attributes they want to collect. Once the instrument is running, these attributes are added to every submitted event and then stored. There is currently no automated validation or guidance to help users understand which contextual attributes (or combinations of them) can pose privacy risks, or to determine the appropriate risk level for the selected attributes. The PM and the PA must manually review the related documentation (Data Collection Guidelines) and determine the risk level of the data collected by their instrument. In addition, the Regulation section in xLab has not been fully updated to reflect those guidelines, and a new UI/UX for that section has already been designed.
This hypothesis aims to implement the updated Regulation section and to give users automated guidance, validation, and advice. The system would identify when specific attributes, or combinations of attributes, make the instrument a low-, medium-, or high-risk data collection activity, and when privacy review is required. This would give users more clarity about privacy risk and could reduce the over-collection of data that is not really needed. Eliminating a manual step also removes a pain point that can cause confusion.
Timeline
By the second quarter of FY25-26, we would like to improve the experimentation platform so that it automatically helps stakeholders understand the privacy concerns related to the attributes they plan to collect.
Risks
| Risk | Description | Status | Notes |
|---|---|---|---|
| Pending work/technical debt regarding instrumentation | Some pending (and ongoing) technical debt is already being addressed so that instruments work as expected for this hypothesis work | Mitigating | |
Who is involved
Overview - DACI
| Driver | Approver | Contributors | Informed |
|---|---|---|---|
| Santiago Faci | Julie van der Hoop | Steering committee: 2.1; additional contributors: Sarai Sanchez, Clare Ming, Sam Smith | Partner product teams (Reader Experiences + Moderator Tools) |
Details about roles and activities
| Team/Role | Type | Individuals | Sample Activities |
|---|---|---|---|
| WMF Experiment Platform Team | Development team | Clare Ming, Sam Smith, Mikhail Popov, Santiago Faci | Research & Implementation |
| Product Design | UI/UX Design | Sarai Sanchez | UI and UX design |
| WMF Research & Decision Science Team | Consultation | Jennifer Wang, Megan Neisler | Review related work |
| Legal, Security, Trust and Safety Team | Consultation | Eric Mill | Review related work |
Requirements
Hypothesis Requirements
- The Regulation section of the instrument form has been improved to provide more details about the selected risk level according to the Data Collection Guidelines (https://phabricator.wikimedia.org/T380592)
- New validation rules and advice have been added to xLab to disallow collection, or to require privacy review, based on the attributes the instrument aims to collect
- xLab checks whether the risk level selected in the Regulation section matches the contextual attributes configured for the instrument, and provides guidance when the user needs to make changes or take further action
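The consistency check in the last requirement could be sketched as follows. This is only an illustration under stated assumptions, not xLab's implementation: the attribute names, the per-attribute risk assignments, and the helpers `inferred_risk` and `check_declared_risk` are all hypothetical, and the real mapping would have to come from the Data Collection Guidelines.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical per-attribute risk levels. These are placeholders for
# illustration only and are NOT taken from the Data Collection Guidelines.
ATTRIBUTE_RISK = {
    "page_namespace": Risk.LOW,
    "agent_client_platform": Risk.LOW,
    "performer_is_logged_in": Risk.LOW,
    "page_title": Risk.MEDIUM,
    "performer_name": Risk.HIGH,
}


def inferred_risk(attributes: Iterable[str]) -> Risk:
    """Return the highest risk level among the selected attributes."""
    # Unknown attributes are treated as HIGH so they are never under-rated.
    return max(
        (ATTRIBUTE_RISK.get(name, Risk.HIGH) for name in attributes),
        default=Risk.LOW,
    )


def check_declared_risk(declared: Risk, attributes: Iterable[str]) -> Optional[str]:
    """Return guidance text if the declared level is too low, else None."""
    inferred = inferred_risk(attributes)
    if declared < inferred:
        return (
            f"The selected attributes imply {inferred.name} risk but "
            f"{declared.name} was declared: raise the risk level or remove attributes."
        )
    # A declared level above the inferred one is accepted: custom schema
    # fields, which this sketch does not analyze, may justify it.
    return None
```

With these placeholder values, `check_declared_risk(Risk.LOW, ["page_title"])` returns a guidance message, while declaring `Risk.HIGH` for the same attributes returns `None`.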
Success Criteria
To determine whether the hypothesis is successful, we have defined the following success criteria:
- When users register their instrument in xLab, the platform disallows or discourages collection of specific combinations of attributes, based on the Edge Uniques Full Privacy Review and the Data Collection Guidelines
- xLab knows more about the data collected by an instrument and can run a more comprehensive validation of that data and its collection risks. This lets xLab check whether the risk level specified by the user is consistent with the attributes the instrument is going to collect
- Over-collection of personal or sensitive data has been reduced
- The platform documentation has been updated according to all changes made for this work
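One possible way to express "disallow or discourage" for attribute combinations is a small rule table evaluated at registration time. The sketch below is an assumption about how such rules could be modeled: the `CombinationRule` type, the two sample rules, and the attribute names are hypothetical and do not reflect the outcome of the Edge Uniques Full Privacy Review.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class CombinationRule:
    attributes: FrozenSet[str]  # the rule fires when ALL of these are selected
    action: str                 # "disallow" blocks registration, "discourage" only warns
    message: str


# Hypothetical rules for illustration; real rules would be derived from the
# privacy review and the Data Collection Guidelines.
RULES = [
    CombinationRule(
        frozenset({"performer_name", "page_title"}),
        "disallow",
        "This combination cannot be collected by an instrument.",
    ),
    CombinationRule(
        frozenset({"agent_client_platform", "page_title"}),
        "discourage",
        "Consider whether both attributes are really needed.",
    ),
]


def evaluate(selected: Iterable[str]) -> List[CombinationRule]:
    """Return every rule whose attribute set is fully contained in the selection."""
    chosen = set(selected)
    return [rule for rule in RULES if rule.attributes <= chosen]


def can_register(selected: Iterable[str]) -> bool:
    """Registration is allowed unless a matching rule has the 'disallow' action."""
    return all(rule.action != "disallow" for rule in evaluate(selected))
```

Keeping the rules as data rather than code would make it easier to update them when the guidelines change, and the `message` field gives the form something to show next to the attribute selector.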
Target Outcomes
The goals are to:
- Update the Regulation section of the instrument registration form according to the current Data Collection Guidelines
- Increase awareness of privacy by disallowing or discouraging collection of combinations of attributes that put privacy at risk
- Help users configure regulation for their instruments
- Advise users, when they register an instrument, about the implications of collecting the chosen contextual attributes
- Check, when an instrument is registered, whether its regulation settings are consistent with the contextual attributes that are going to be collected
- Ultimately, provide a more intuitive setup that reduces the number of questions to the Experiment Platform team and speeds up instrument creation
Ideal Outcomes:
- Allow users to toggle user agent collection by an instrument
What is out of scope?
The following is out of scope for validating this hypothesis:
- Adding support for xLab to be able to analyze which additional fields from a custom schema are going to be collected
Background & existing research or documentation
- Edge Uniques Privacy Review
- Data Collection Guidelines
- Wikimedia Foundation Privacy Policy
- Slack thread on related technical debt
Open questions
- How do the instrument's contextual attributes override those set in the common web stream?
- What is the scope of the technical debt that needs to be addressed as pre-work? Should this hypothesis start in Q2?
  - The scope is already defined by the tasks mentioned above. That work will be addressed in Q1, so we will be able to start working on the hypothesis itself in Q2
Dependencies
As specified above, we may need consultation or review from the Legal, Security, and/or Data Science teams to make sure that the work done here complies with the privacy policy and the Data Collection Guidelines.