Until MPIC or the experimentation platform becomes the canonical source of instrument documentation, create a public, on-wiki process for documenting instruments. This will be a lightweight, flexible process intended to be an interim solution.
Use cases for instrument documentation
- As someone interested in the Metrics Platform, learn what instruments are currently active
- As someone interested in WMF data collection, learn about what data is being collected by Metrics Platform instruments
- As someone doing data analysis using Metrics Platform instrument data, find information about the data collected by an instrument
Documentation elements
- Instrument name
- Stream name
- Location of instrument code
- Contact
- Schema
- Links to docs (Instrument data specification, measurement plan, etc.)
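As an illustration only, the elements above could be captured as one small record per instrument and rendered as a row in an on-wiki table. This is a minimal sketch: the `InstrumentDoc` class, its field names, and the example values are all hypothetical, and it assumes the instrument list is kept as a standard wikitext table, which the process itself may or may not do.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InstrumentDoc:
    """One instrument's documentation elements (hypothetical field names)."""
    name: str
    stream: str
    code_location: str
    contact: str
    schema: str
    doc_links: List[str] = field(default_factory=list)

    def to_wikitext_row(self) -> str:
        """Render the record as one wikitext table row ("|-" starts a row)."""
        links = ", ".join(self.doc_links) if self.doc_links else "(none)"
        cells = [self.name, self.stream, self.code_location,
                 self.contact, self.schema, links]
        return "|-\n| " + " || ".join(cells)


# Placeholder values only; none of these describe a real instrument.
example = InstrumentDoc(
    name="Example instrument",
    stream="example.stream_a",
    code_location="(link to instrument code)",
    contact="(team or person)",
    schema="(schema title)",
    doc_links=["(instrument data specification)", "(measurement plan)"],
)
row = example.to_wikitext_row()
print(row)
```

Keeping every element as its own field makes it easy to spot instruments whose entry is missing a contact or a schema.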
Outcomes
- Documentation process:
- Instrument list:
- Next steps:
- Replace this process with MPIC and/or the experimentation platform once those tools are available
Considerations
- Automation: Using the information sources that are available right now, there's no easy way to automatically generate a list of active Metrics Platform instruments:
- The best available approach is to search for metrics_platform_client in ext-EventStreamConfig.php. However, that search currently returns a few false positives because some cleanup tasks are still pending.
- There are also the event stores in Hive within Datahub, but these don't all represent Metrics Platform instruments. In addition, the individual table pages only contain the information from the schemas; they don't automatically populate information about interaction data values or which contextual attributes are actually available. That information still lives only in the instrumentation spec, the instrument code, and the stream config.
- New tools: Metrics Platform is currently working on MPIC and the experimentation platform. Due to the potential for these tools to provide an accurate, automated list of active instruments, I don't recommend investing in other methods of automating an instrument list in the short term.
- Public access: For transparency with the wider community, it's important to provide some publicly accessible information about instruments. I recommend that we ask instrument creators to publish a summary of their instrument on wiki (see the process), including a summary of their measurement plan and the data collected by the instrument. (See this example from Wikilambda.) This will meet our transparency goal without asking instrument creators to move complex docs onto the wiki and without asking interested community members to read through long, complex Google Docs.
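The manual search described under Automation could be approximated with a short script. The sketch below is not a definitive tool: the sample text, the stream names, and the nesting (a `producers` key wrapping `metrics_platform_client`) are assumptions made for illustration, and the real layout of ext-EventStreamConfig.php may differ. It lists streams that contain a `metrics_platform_client` block and flags those that also carry an `events` property inside it. The output is a list of candidates for manual review, not a list of active instruments, because of the false positives noted above.

```python
import re

# Hypothetical excerpt: names, values, and nesting are illustrative only.
SAMPLE_CONFIG = """
'example.stream_a' => [
    'schema_title' => 'analytics/example/schema_a',
    'producers' => [
        'metrics_platform_client' => [
            'provide_values' => [ 'page_id' ],
        ],
    ],
],
'example.stream_b' => [
    'schema_title' => 'analytics/example/schema_b',
    'producers' => [
        'metrics_platform_client' => [
            'events' => [ 'example.' ],
        ],
    ],
],
'example.stream_c' => [
    'schema_title' => 'analytics/example/schema_c',
],
"""

# Matches lines of the form: 'key' => <rest of line>
KEY_LINE = re.compile(r"^'([^']+)'\s*=>\s*(.*)$")


def find_candidate_streams(config_text):
    """Map each stream that has a metrics_platform_client block to a flag
    saying whether that block also contains an 'events' property."""
    stack = []   # keys of the multi-line array blocks currently open
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line in ("]", "],", "];"):
            if stack:
                stack.pop()          # a multi-line block closes
            continue
        match = KEY_LINE.match(line)
        if not match:
            continue
        key, rest = match.group(1), match.group(2)
        if key == "metrics_platform_client" and len(stack) >= 2:
            # Assumed nesting: stream => producers => metrics_platform_client
            found[stack[-2]] = False
        elif (key == "events" and len(stack) >= 3
                and stack[-1] == "metrics_platform_client"):
            found[stack[-3]] = True
        if rest == "[":
            stack.append(key)        # this key opens a multi-line block
    return found


candidates = find_candidate_streams(SAMPLE_CONFIG)
for stream, has_events in sorted(candidates.items()):
    note = " (has 'events' property, check manually)" if has_events else ""
    print(stream + note)
```

A line-based scan like this only works while the file keeps one key per line. Any stream it reports still needs to be checked against the instrument code before it is added to the instrument list.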
Answered questions
- Does one stream correspond to one instrument?
- It's possible for an instrument to be composed of more than one stream. In practice, however, one instrument has always corresponded to one stream.
- There are a few streams in the config that I can't find anywhere in code: wikifunctions.ui and mediawiki.reference_previews
- These don't represent active instruments and should be cleaned up.
- What's the relationship between wikifunctions.ui and mediawiki.product_metrics.wikifunctions_ui?
- Based on T355438, it looks like wikifunctions.ui uses the old MP monoschema. mediawiki.product_metrics.wikifunctions_ui supersedes wikifunctions.ui.
- Some streams (like mediawiki.reference_previews) have an events property inside metrics_platform_client. What is this? Is this documented anywhere?
- This is a legacy property from a deprecated MP method.