Create an interim process for documenting Metrics Platform instruments
Closed, Resolved (Public)

Description

Until MPIC or the experimentation platform becomes the canonical source of instrument documentation, create a public, on-wiki process for documenting instruments. This will be a lightweight, flexible process intended to be an interim solution.

Use cases for instrument documentation

  • As someone interested in the Metrics Platform, learn what instruments are currently active
  • As someone interested in WMF data collection, learn about what data is being collected by Metrics Platform instruments
  • As someone doing data analysis using Metrics Platform instrument data, find information about the data collected by an instrument

Documentation elements

  • Instrument name
  • Stream name
  • Location of instrument code
  • Contact
  • Schema
  • Links to docs (Instrument data specification, measurement plan, etc.)
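
As a rough illustration, the elements above could be captured as one record per instrument. This is a minimal sketch only: the `InstrumentDoc` class, its field names, and the example values are hypothetical and are not an existing Metrics Platform format.

```python
from dataclasses import dataclass, field


@dataclass
class InstrumentDoc:
    """One entry in an instrument list (hypothetical structure)."""

    instrument_name: str
    stream_name: str
    code_location: str
    contact: str
    schema: str
    doc_links: list[str] = field(default_factory=list)

    def missing_elements(self) -> list[str]:
        """Return the names of documentation elements that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Placeholder values for illustration; this is not a real instrument.
entry = InstrumentDoc(
    instrument_name="Example instrument",
    stream_name="mediawiki.product_metrics.example",
    code_location="",
    contact="Example team",
    schema="example/schema",
)
print(entry.missing_elements())
```

A check like `missing_elements()` could help spot incomplete entries while the list is still maintained by hand.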

Outcomes

Considerations

  • Automation: Using the information sources that are available right now, there's no easy way to automatically generate a list of active Metrics Platform instruments:
    • The best approach is to search for metrics_platform_client in ext-EventStreamConfig.php. However, that search currently returns a few false positives due to lingering cleanup tasks.
    • There are the event stores in Hive within Datahub, but these don't all represent Metrics Platform instruments. In addition, the individual table pages only contain the information from the schemas; they don't automatically populate information about interaction data values or which contextual attributes are actually available. That information still lives only in the instrumentation spec, the instrument code, and the stream config.
  • New tools: Metrics Platform is currently working on MPIC and the experimentation platform. Due to the potential for these tools to provide an accurate, automated list of active instruments, I don't recommend investing in other methods of automating an instrument list in the short term.
  • Public access: For transparency with the wider community, it's important to provide some publicly accessible information about instruments. I recommend that we ask instrument creators to publish a summary of their instrument on wiki (see the process), including a summary of their measurement plan and the data collected by the instrument. (See this example from WikiLambda.) This will meet our transparency goal without asking instrument creators to move complex docs onto the wiki and without asking interested community members to read through long, complex Google Docs.
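
To make the automation consideration above concrete, here is a minimal sketch of the "search for metrics_platform_client" approach. It assumes the stream settings have already been loaded into a Python dict keyed by stream name. The sample data, the function names, and the nesting shown are assumptions for illustration; the real settings live in ext-EventStreamConfig.php.

```python
from typing import Any


def find_key(config: Any, key: str) -> Any:
    """Return the first value stored under `key` anywhere in a nested dict, else None."""
    if isinstance(config, dict):
        if key in config:
            return config[key]
        for value in config.values():
            found = find_key(value, key)
            if found is not None:
                return found
    return None


def candidate_instruments(streams: dict[str, dict]) -> list[str]:
    """List the stream names whose settings mention metrics_platform_client."""
    return sorted(
        name
        for name, cfg in streams.items()
        if find_key(cfg, "metrics_platform_client") is not None
    )


# Hypothetical sample data, not copied from the real config file.
sample_streams = {
    "mediawiki.product_metrics.wikifunctions_ui": {
        "producers": {"metrics_platform_client": {"provide_values": ["page_id"]}},
    },
    "wikifunctions.ui": {
        "producers": {"metrics_platform_client": {"events": ["example.event"]}},
    },
    "some.other.stream": {"schema_title": "example/schema"},
}
print(candidate_instruments(sample_streams))
```

As with the manual search, the result is only a list of candidates: streams that are configured but no longer used (such as wikifunctions.ui in this sample) still appear and have to be filtered out by hand.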

Answered questions

  • Does one stream correspond to one instrument?
    • It's possible that an instrument could be composed of one or more streams. However, in reality, it's always been the case that one instrument corresponds to one stream.
  • There are a few streams in the config that I can't find anywhere in code: wikifunctions.ui and mediawiki.reference_previews
    • These don't represent active instruments and should be cleaned up.
  • What's the relationship between wikifunctions.ui and mediawiki.product_metrics.wikifunctions_ui?
    • Based on T355438, it looks like wikifunctions.ui uses the old MP monoschema. mediawiki.product_metrics.wikifunctions_ui supersedes wikifunctions.ui.
  • Some streams (like mediawiki.reference_previews) have an events property inside metrics_platform_client. What is this? Is this documented anywhere?
    • This is a legacy property from a deprecated MP method.
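
Since the events property marks a stream that was configured for a deprecated method, it could serve as a hint when looking for cleanup candidates. The sketch below assumes that metrics_platform_client sits under a `producers` key in each stream's settings; that nesting, the function name, and the sample values are assumptions for illustration.

```python
def has_legacy_events(stream_config: dict) -> bool:
    """True if the metrics_platform_client settings carry an `events` property."""
    producers = stream_config.get("producers", {})
    client = producers.get("metrics_platform_client", {})
    return "events" in client


# Hypothetical settings, not copied from the real config file.
legacy_stream = {
    "producers": {"metrics_platform_client": {"events": ["example.event"]}},
}
current_stream = {
    "producers": {"metrics_platform_client": {"provide_values": ["page_id"]}},
}
print(has_legacy_events(legacy_stream), has_legacy_events(current_stream))
```

A stream flagged this way is not necessarily inactive, so the result is a prompt for a manual check rather than a definitive answer.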

Event Timeline

I've created an on-wiki list of instruments as a starting point (https://wikitech.wikimedia.org/wiki/Metrics_Platform/Instrument_list) and linked to it from the main instrument guide. This will give us something to iterate on going forward. I'm going to move this task back to the backlog and pick it up again when I'm back from vacation in a few weeks. In the meantime, edits to the instrument list and feedback on this task are very welcome!

Next steps:

  • Determine whether this is the correct list of active instruments.
  • Find out who needs instrument documentation and for what purposes (likely product analysts and product managers). Talk to these groups and to instrument maintainers to find out whether it makes sense for instrument docs to be centralized or decentralized.
  • Create a plan for surfacing information about instruments publicly when docs have restricted access.
  • Determine whether we can consolidate https://wikitech.wikimedia.org/wiki/Metrics_Platform/Deployed_Streams with the instrument list.

Does one stream correspond to one instrument?

Technically? It's possible that an instrument could be composed of one or more streams. However, in reality, it's always been the case that one instrument corresponds to one stream.

There are a few streams in the config that I can't find anywhere in code: wikifunctions.ui and mediawiki.reference_previews

They should be cleaned up. At the moment, this is an ad-hoc task that isn't the responsibility of any one team.

What's the relationship between wikifunctions.ui and mediawiki.product_metrics.wikifunctions_ui?
Based on T355438, it looks like wikifunctions.ui uses the old MP monoschema.

mediawiki.product_metrics.wikifunctions_ui supersedes wikifunctions.ui. See T350497: Update the WikiLambda instrumentation to use core interaction events and T358873: QA Wikilambda instrumentation port to new core interactions metrics platform version.

Some streams (like mediawiki.reference_previews) have an events property inside metrics_platform_client. What is this? Is this documented anywhere?

I think that the documentation for this field has been removed. It's a deprecated property that supported the dispatch() method, which was deprecated as part of https://wikitech.wikimedia.org/wiki/Metrics_Platform/Decision_Records/Deprioritize_Custom_Data.

apaskulin added a subscriber: VirginiaPoundstone.

Thanks for these responses, @phuedx! This is really helpful. I've removed wikifunctions.ui and mediawiki.reference_previews from the instrument list, and added the new Growth instrument since I noticed it in the config file.

Technically? It's possible that an instrument could be composed of one or more streams. However, in reality, it's always been the case that one instrument corresponds to one stream.

This makes sense. Since it's usually one stream to one instrument, I've gone ahead and redirected https://wikitech.wikimedia.org/wiki/Metrics_Platform/Deployed_Streams to https://wikitech.wikimedia.org/wiki/Metrics_Platform/Instrument_list. It seems like these two lists would end up being basically the same, but I think it's conceptually simpler to list by instrument since "instrument" is a core concept of MP.


@VirginiaPoundstone I've updated the task description with the outcomes and the things I took into consideration. I'd love to get your thoughts! I'm moving this task into signoff, but feedback from anyone is still welcome.

This will eventually be automated as part of the Metrics Platform instrumentation & experimentation catalog. But for now, this is great.
Moving this to done!