
Write a script to capture custom data properties counts in secondary schemas
Closed, Resolved | Public | 3 Estimated Story Points


Implementation task coming out of spike T354955


Write a script for capturing the number and description of custom data properties in instrumentation schemas.

Use Case

To establish a Metrics Platform baseline for the number of custom data properties in WMF instruments, given the following success criterion:

decrease the number of custom fields across instruments by X%

User Story

As a product manager, I want to know the change over time of the number of custom data properties used in WMF schemas for instrumentation purposes in order to gauge the efficacy of Metrics Platform adoption.


We know how many custom data properties are being used at any one time and over time in WMF instruments.

Acceptance Criteria


  • Documentation

Technical Notes

In T354955#9512064, an example script shows how to convert YAML into an object that can be iterated over to include or exclude schemas with specific fragment references and to count the number of properties in each schema.
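For illustration only, here is a minimal Python sketch of that approach. It is not the script from T354955#9512064. It assumes each schema has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`), that fragment references appear as `$ref` entries under a top-level `allOf`, and that the properties to count are declared under `properties`. The function names and the prefix-based include/exclude filter are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable


def load_schema(path: str) -> dict[str, Any]:
    """Parse one YAML schema file into a dict (assumes PyYAML is installed)."""
    import yaml  # third-party; imported lazily so the helpers below work without it

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def fragment_refs(schema: dict[str, Any]) -> list[str]:
    """Collect the $ref strings listed under the schema's top-level allOf."""
    return [
        entry["$ref"]
        for entry in (schema.get("allOf") or [])
        if isinstance(entry, dict) and "$ref" in entry
    ]


def count_properties(schema: dict[str, Any]) -> int:
    """Count properties the schema declares itself, not those inherited via $ref."""
    count = len(schema.get("properties") or {})
    for entry in schema.get("allOf") or []:
        if isinstance(entry, dict):
            count += len(entry.get("properties") or {})
    return count


def summarize(
    schemas: dict[str, dict[str, Any]],
    include_prefixes: Iterable[str] = (),
    exclude_prefixes: Iterable[str] = (),
) -> dict[str, int]:
    """Map schema name to property count, filtered by fragment reference prefix.

    A schema is kept when include_prefixes is empty or one of its refs starts
    with an include prefix, and dropped when any ref starts with an exclude prefix.
    """
    include = tuple(include_prefixes)
    exclude = tuple(exclude_prefixes)
    counts: dict[str, int] = {}
    for name, schema in schemas.items():
        refs = fragment_refs(schema)
        if include and not any(ref.startswith(include) for ref in refs):
            continue
        if any(ref.startswith(exclude) for ref in refs):
            continue
        counts[name] = count_properties(schema)
    return counts
```

Which fragment prefixes to include or exclude, and whether nested properties should also be counted, are left open here; the sketch counts only one level of `properties`.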

Depending on how we want to make the result available over time, we may want to build a tool/interface to display it. This can be spun off into its own task unless we go the lo-fi route of outputting data to a spreadsheet, in which case details of that solution should be included here as part of the 2nd AC.
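One possible shape for the lo-fi route, sketched under assumptions: each run appends a dated row per schema to a CSV file that can be imported into a spreadsheet. The column names, the file layout, and the `append_snapshot` helper are hypothetical and not part of any existing script.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Hypothetical column layout for the tracking sheet.
FIELDNAMES = ["snapshot_date", "schema", "custom_property_count"]


def append_snapshot(csv_path: str, snapshot_date: str, counts: dict[str, int]) -> None:
    """Append one row per schema for this run, writing the header on first use."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        # Sort by schema name so successive snapshots are easy to compare.
        for schema, count in sorted(counts.items()):
            writer.writerow(
                {
                    "snapshot_date": snapshot_date,
                    "schema": schema,
                    "custom_property_count": count,
                }
            )
```

Running this at a fixed interval keeps every snapshot in one file, so the change over time can be charted from the spreadsheet without a separate tool.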

Event Timeline

cjming set the point value for this task to 3.
  • A script is written to provide the following data points:


  • For MVP, we maintain a spreadsheet that tracks this data over time (monthly? quarterly?)…


… starting from July 2023.

On it!

fantastic - not sure how we want to document this to keep an eye on data over time - is it enough to just have the repo + spreadsheet and run the script at some interval?

moving this to sign off for the time being and we can spin off additional tickets if needed to persist data or publish results somewhere

@WDoranWMF: Do we have a place in our process for recurring tasks at n-sprintly intervals?

Otherwise, I suppose we just have to add it to the list of things to discuss whilst kicking off a sprint.