Implement a way to measure the amount of contextual attributes used per stream/instrument
Closed, Declined (Public)

Description

As part of T401384: FY25-26 SDS2.1.5 User Experience - Attribute Selection, one success criterion is reducing over-collection of personal or sensitive data. To measure that, we need to know the average number of those attributes collected per stream/instrument before and after we implement the work for that hypothesis.

We might choose one of the following ways:

  • Write a script that counts the contextual attributes in existing streams and another that counts the contextual attributes configured in the xLab database, then report the median over time
  • Record the number of clicks on the link to the Data Collection Guidelines
  • Include a question in the post-experiment survey
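
For the first option, the counting and median step could look roughly like the sketch below. It is only an illustration: it assumes each snapshot has already been loaded into a dict mapping a stream/instrument name to its list of contextual attribute names, and the function names, stream names, and attribute names are all hypothetical. Reading the real stream configuration or the xLab database is not shown.

```python
from statistics import median


def attribute_counts(stream_configs):
    """Map each stream/instrument name to its number of distinct contextual attributes."""
    return {name: len(set(attrs)) for name, attrs in stream_configs.items()}


def median_attribute_count(stream_configs):
    """Median number of contextual attributes per stream, or None if there are no streams."""
    counts = attribute_counts(stream_configs)
    return median(counts.values()) if counts else None


# Hypothetical snapshots taken before and after the hypothesis work.
before = {
    "stream_a": ["page_id", "page_title", "performer_is_logged_in"],
    "stream_b": ["page_id"],
    "stream_c": ["page_id", "page_title", "performer_id", "performer_name", "mediawiki_skin"],
}
after = {
    "stream_a": ["page_id", "performer_is_logged_in"],
    "stream_b": ["page_id"],
    "stream_c": ["page_id", "mediawiki_skin"],
}

print("median before:", median_attribute_count(before))
print("median after:", median_attribute_count(after))
```

Running the same function on dated snapshots would give the "median over time" series mentioned in the first option.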

Acceptance criteria

  • We have decided which is the best way to measure over-collecting
  • The decided solution has been implemented

Event Timeline

@phuedx pointed out that we already have a way to count streams and instruments and the contextual attributes each one has: https://gitlab.wikimedia.org/repos/data-engineering/custom-data-monitor

@Sfaci could you use this as a baseline or see how it might help us?

Milimetric moved this task from Incoming to READY TO GROOM on the Test Kitchen board.
Milimetric moved this task from READY TO GROOM to Backlog on the Test Kitchen board.
Milimetric subscribed.

Upon reflection, the team feels we need to find a different way to measure how successful Experimentation Lab is/was at enabling us to collect less data. It's not clear over-collection was a problem or how we might measure to what extent it was.