
Update guide to creating an instrument with Metrics Platform
Closed, Resolved, Public

Description

As part of T329506, do an initial update of the guide to creating an instrument with Metrics Platform so that it is up to date and follows a step-by-step process.

To do:

Documentation journey as of August 15

[Diagram: MP docs flow]

Supporting documents and next steps

  • Measurement plan template
    • To do: Create a template based on existing good examples
  • Measurement plan examples
  • Instrumentation spec template
    • To do: Update the template to match the terms and flow from the guide
  • Instrumentation spec examples
  • MP base schemas
    • To do: Look into a visualization tool for schema.wikimedia.org
  • Lists of contextual attributes (one for PHP and one for JS)
  • Create a Custom Schema
    • To do: Update example, add details about choosing from an existing custom schema, add validate step
  • Setup Mediawiki for Metrics Platform
  • Implementations
    • To do: De-duplicate content with the main guide
  • Getting Started
    • To do: Combine with Setup guide
  • WikimediaEvents OWNERS.md
  • Creating a Stream Configuration
    • To do: De-duplicate with Setup Mediawiki for Metrics Platform and other places with config examples
  • Event Platform/Instrumentation How To
  • Supporting documents outside the scope of the Metrics Platform collection:
    • Backport windows
    • Data collection guidelines

Main guide next steps

  • Following the launch of the standardized clickrate instrument, change the instrument example to a scroll event.
  • Create a public, interim process for documenting instruments

July 17 content review

Metrics Platform/How to/Create An Instrument

Metrics Platform/How to/Create First Metrics Platform Instrument

Originally based on Getting started with Metrics Platform (Google Doc)

Event Timeline

Update: https://wikitech.wikimedia.org/wiki/Metrics_Platform/How_to/Create_First_Metrics_Platform_Instrument is now the canonical guide to creating an instrument. It follows a step-by-step process that starts with defining the experiment in a measurement plan. It attempts to include advice given to instrument creators in places such as https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/1054846

Next steps are to review with subject matter experts and iterate on the details and flow.

I'd like to request a review from Data Products on these aspects of https://wikitech.wikimedia.org/wiki/Metrics_Platform/How_to/Create_First_Metrics_Platform_Instrument:

  1. High-level flow: Do the steps in the top-level headings make sense and represent the process correctly?
  2. New content review for "1. Create a measurement plan": Is this the correct list of things that should be included in a measurement plan?
  3. New content review for "2. Create an instrumentation specification": How can we improve the description of the instrumentation specification?

I'd like to schedule a synchronous meeting to talk through expanding the "Mapping strategy" section to include:

  • An explanation of events and interaction data
  • A step-by-step process for choosing a core schema and setting up a spec for how to use it
  • A reference to the data contract
apaskulin added a subscriber: Sfaci.

@Sfaci, can you take a look at items 1-3 in the previous comment? Thanks!

@apaskulin Here is my review:

  1. About the main flow:
  2. I'm sorry, but I think somebody closer to Data Science/Analysis could review this part better than me. I don't know anything about a measurement plan and all the things that should be included there. I think you pinged @mpopov here to review as well, which is great. Anyway, I have heard some questions about which is a right duration for an instrument and I'm not sure about it (is a year too long for an instrument?). Should we include some guidance about it here? There are some details at https://wikitech.wikimedia.org/wiki/Metrics_Platform/How_to/Create_First_Metrics_Platform_Instrument#10._Decommission_the_instrument but we only say that the important part is having an end date and not creating long-lived instruments. Another question that raised there was if it's possible to extend that end date once the instrument has started.
  3. I think the new content is great! I like the idea of adding practical examples to explain how to use core schemas and interaction data. Just wondering if you are already working on adding here some details about which fields are included by default to an event through the core schemas. We are talking about it and I think it would be a good place to explain how an event looks like by default when using core schemas and how they could be extended using contextual attributes and interaction data (action* fields).

Thanks, @Sfaci! This is great feedback.

I miss there something about the instrument deployment or/and where it should be implemented.

This is a great point. I've added a brief note to the "Code the instrument" section about adding the instrument code to WikimediaEvents. Do we also want to include the option of coding the instrument in the product's codebase? It seems like the MinT instruments have also gone this route. Are there certain circumstances in which it would be a good idea to code the instrument in the product's codebase?

Should be the "Document the instrument section" be after the "Code the instrument" one?

+1 fixed!

Another thing I miss right now is some explanation about how to choose the right stream name.

I agree. What do you think about including the stream name as part of the instrumentation spec? Since it needs to be passed to MP in the instrument code, I think it would help to have it decided on before the coding step. Do you think it would be worth creating a ticket to get some input on stream-name best practices from Data Products?

Should we move https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#Viewing_and_querying_events to the Metrics Platform documentation?

Good question. I'll be looking into this as part of T370363: Document when to use Event Platform vs Metrics Platform for instrumentation.

I think somebody closer to Data Science/Analysis could review this part better than me

No worries! I'll follow up with Mikhail synchronously.

which is a right duration for an instrument

Hm, this is a good question. I'm not sure, but I would guess that this would be a policy-related decision. In the guide, we link to https://foundation.wikimedia.org/wiki/Legal:Data_Collection_Guidelines which links to https://foundation.wikimedia.org/wiki/Legal:Data_retention_guidelines

Another question that raised there was if it's possible to extend that end date once the instrument has started.

Do I understand correctly that (until MPIC is ready), starting and stopping an instrument is just a matter of editing the stream config? So the answer to this question would be yes, as long as the docs are updated and everything complies with our data policies, the instrument could be extended by delaying the removal of the stream config?

Just wondering if you are already working on adding here some details about which fields are included by default to an event through the core schemas. We are talking about it and I think it would be a good place to explain how an event looks like by default when using core schemas and how they could be extended using contextual attributes and interaction data (action* fields).

Definitely. I think this is a great idea! I've started an outline of that in https://wikitech.wikimedia.org/wiki/Metrics_Platform/How_to/Create_First_Metrics_Platform_Instrument#Specifying_event_data Edits are welcome! We're still missing details about the default fields and an example of a minimal event, which you've asked about in Slack.
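To illustrate the idea of an event assembled from core-schema fields, contextual attributes, and interaction data, here is a minimal TypeScript sketch. It does not call the Metrics Platform client: `buildEvent` is a hypothetical helper, and the stream name, schema ID, and contextual-attribute names are placeholders. Only `$schema`, `dt`, and `meta.stream` are standard Event Platform fields; which fields the base schemas include by default is still the open question in this thread. The `action*` names are assumed from the interaction data fields mentioned above and should be checked against the schema.

```typescript
// Interaction data supplied by the instrument for each event.
// Field names are assumed from the "action*" fields discussed above.
interface InteractionData {
  action: string;
  action_subtype?: string;
  action_source?: string;
  action_context?: string;
}

// Contextual attributes filled in by the client; the names used below are placeholders.
type ContextualAttributes = Record<string, unknown>;

type EventObject = Record<string, unknown>;

// Hypothetical helper: merges the three layers into one flat event object.
// Base fields are written last, so neither of the other layers can overwrite them.
function buildEvent(
  stream: string,
  schemaId: string,
  interaction: InteractionData,
  context: ContextualAttributes,
  now: Date = new Date(),
): EventObject {
  return {
    ...context,
    ...interaction,
    $schema: schemaId,
    dt: now.toISOString(),
    meta: { stream },
  };
}

const exampleEvent = buildEvent(
  "mediawiki.product_metrics.example_feature", // hypothetical stream name
  "/analytics/product_metrics/web/base/1.0.0", // placeholder schema ID
  { action: "click", action_source: "toolbar" },
  { page: { namespace_id: 0 }, mediawiki: { skin: "vector-2022" } }, // placeholder attributes
  new Date("2024-08-15T00:00:00Z"),
);
console.log(JSON.stringify(exampleEvent, null, 2));
```

Writing the base fields last is just one way to keep instrument-supplied values from clobbering them; the real client may handle such conflicts differently.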

This is a great point. I've added a brief note to the "Code the instrument" section about adding the instrument code to WikimediaEvents. Do we also want to include the option of coding the instrument in the product's codebase? It seems like the MinT instruments have also gone this route. Are there certain circumstances in which it would be a good idea to code the instrument in the product's codebase?

You are right! It seems there is not only one approach about this. I'll explore it and I'll bring you an answer.

I agree. What do you think about including the stream name as part of the instrumentation spec? Since it needs to be passed to MP in the instrument code, I think it would help to have it decided on before the coding step. Do you think it would be worth creating a ticket to get some input on stream-name best practices from Data Products?

I agree with you! Based on my short experience with this, it seems teams start to think about the stream name really early so it makes sense to include it as a part of the instrumentation spec. And yes, it seems it would be worth creating that ticket. The Growth team created its own ticket to discuss it (T370907: Metrics Platform Integration: Agree on a stream name convention), so I guess we could save future teams time on that decision. In fact, I would say some best practices have already been discussed there. We could use that ticket as the starting point.
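If the stream name becomes part of the instrumentation spec, it can be checked before any instrument code is written. The sketch below assumes a hypothetical convention (a `mediawiki.product_metrics.` prefix followed by a lowercase snake_case name). It is not the convention agreed in T370907; substitute whatever that discussion settles on.

```typescript
// Hypothetical naming convention, for illustration only.
const STREAM_PREFIX = "mediawiki.product_metrics.";
// Lowercase snake_case: starts with a letter, words separated by single underscores.
const NAME_PATTERN = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/;

// Returns a list of problems; an empty list means the name fits the assumed convention.
function checkStreamName(stream: string): string[] {
  if (!stream.startsWith(STREAM_PREFIX)) {
    return [`should start with "${STREAM_PREFIX}"`];
  }
  const suffix = stream.slice(STREAM_PREFIX.length);
  if (!NAME_PATTERN.test(suffix)) {
    return ["part after the prefix should be lowercase snake_case"];
  }
  return [];
}

console.log(checkStreamName("mediawiki.product_metrics.growth_newcomer_homepage"));
console.log(checkStreamName("mediawiki.product_metrics.GrowthHomepage"));
```

A check like this could run while reviewing the spec, so naming questions are settled before the name is hard-coded in the instrument.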

Hm, this is a good question. I'm not sure, but I would guess that this would be a policy-related decision. In the guide, we link to https://foundation.wikimedia.org/wiki/Legal:Data_Collection_Guidelines which links to https://foundation.wikimedia.org/wiki/Legal:Data_retention_guidelines

About this, correct me if I'm wrong, but I thought that retention time and instrument duration could be different things. I mean, you may only be allowed to retain the data for 90 days (or some other period), but your instrument could be running for a year. You could comply with the guidelines even running a long-lived instrument, right? But my understanding is that we want to avoid eternal instrumentation.
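A worked example of the difference between the collection window and the retention period. All dates and the 90-day retention period are illustrative, not taken from the data retention guidelines, and `oldestRetained` and `lastDeletion` are hypothetical helpers. With a one-year instrument and 90-day retention, at most about 90 days of data exist at any one time, yet collection continues for the full year.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// UTC dates are used, so plain millisecond arithmetic is exact (no DST shifts).
function addDays(date: Date, days: number): Date {
  return new Date(date.getTime() + days * DAY_MS);
}

// Oldest collection date whose data may still be held on `today`:
// the later of the instrument start and `today - retentionDays`.
function oldestRetained(start: Date, today: Date, retentionDays: number): Date {
  const cutoff = addDays(today, -retentionDays);
  return cutoff.getTime() > start.getTime() ? cutoff : start;
}

// Date by which data collected on the instrument's last day must be deleted.
function lastDeletion(end: Date, retentionDays: number): Date {
  return addDays(end, retentionDays);
}

const instrumentStart = new Date("2024-09-01T00:00:00Z"); // illustrative
const instrumentEnd = new Date("2025-08-31T00:00:00Z"); // one year later, illustrative
console.log(oldestRetained(instrumentStart, new Date("2025-03-01T00:00:00Z"), 90).toISOString());
console.log(lastDeletion(instrumentEnd, 90).toISOString());
```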

Do I understand correctly that (until MPIC is ready), starting and stopping an instrument is just a matter of editing the stream config? So the answer to this question would be yes, as long as the docs are updated and everything complies with our data policies, the instrument could be extended by delaying the removal of the stream config?

You are right, that's the way to enable/disable or start/stop an instrument at this time. But even with MPIC ready and running, we could do something similar by editing the instrument and changing its status. Technically, it will always be doable. What I was wondering is whether we should allow that or not. I'll try to confirm that somehow.
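A simplified model of the behaviour described above: an instrument only submits events while its stream is configured, so stopping it means removing the stream config, and extending it means removing the config later than planned. This is a sketch, not MediaWiki code. Real stream configs are defined in PHP (for example through `$wgEventStreams` in the EventStreamConfig extension); the config shape below is an assumption, and the field values are placeholders.

```typescript
// Simplified, assumed shape of a Metrics Platform stream config entry.
interface StreamConfig {
  schema_title: string;
  producers?: { metrics_platform_client?: { provide_values?: string[] } };
  sample?: { unit: string; rate: number };
}

// Hypothetical registry of configured streams, keyed by stream name.
type StreamRegistry = Map<string, StreamConfig>;

// The instrument only submits events while its stream is configured.
function shouldSubmit(registry: StreamRegistry, stream: string): boolean {
  return registry.has(stream);
}

// Starting an instrument: add its stream config.
function startInstrument(registry: StreamRegistry, stream: string, config: StreamConfig): void {
  registry.set(stream, config);
}

// Stopping it: remove the stream config. Extending the instrument simply
// means calling this later than originally planned.
function stopInstrument(registry: StreamRegistry, stream: string): void {
  registry.delete(stream);
}

const registry: StreamRegistry = new Map();
const stream = "mediawiki.product_metrics.example_feature"; // hypothetical stream name
startInstrument(registry, stream, {
  schema_title: "analytics/product_metrics/web/base",
  producers: { metrics_platform_client: { provide_values: ["page_id"] } }, // attribute name assumed
  sample: { unit: "session", rate: 1 },
});
console.log(shouldSubmit(registry, stream));
stopInstrument(registry, stream);
console.log(shouldSubmit(registry, stream));
```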

I've started an outline of that in https://wikitech.wikimedia.org/wiki/Metrics_Platform/How_to/Create_First_Metrics_Platform_Instrument#Specifying_event_data

Great! I'll take a look at that and I'll ping you when I have all the details about default event fields to include it there.

You are right! It seems there is not only one approach about this. I'll explore it and I'll bring you an answer.

Thanks! (For this and the other things you've offered to follow up on!) In the meantime, I've added both options to the guide with a to-do to add advice.

it seems teams start to think about the stream name really early so it makes sense to include it as a part of the instrumentation spec

Awesome! I've started a draft section based on the conversation in that task and linked to it from the instrumentation spec section. Edits welcome!

You could comply with the guidelines even running a long-lived instrument, right? But my understanding is that we want to avoid eternal instrumentation.

Great point. I see that retention and collection can be independent. This also came up recently with Growth and in my conversations with the team. I've added a to-do to the decommissioning section to add more clarity on this once a consensus is reached.

A big thanks to everyone who helped provide feedback and answer questions for this! I've done extensive updates to Metrics Platform/How to/Create First Metrics Platform Instrument that:

  • simplify the process to streamline the experience for instruments using the base schemas, and
  • make explicit some of the implicit processes already described in the guide.

I've added to-dos where further clarification needs to be added, and I'll be making more improvements as we go. But at this point, I believe a team could plan and launch a simple instrument on their own using this guide. This is a great first milestone as we work to further improve the docs going forward.

I've used this process as an opportunity to do a bottom-up content audit by identifying key docs that support the main guide and next steps for improving them. The diagram in the task description includes the documentation journey with the key supporting docs for each step. Next, I'll be opening tasks for each of the next steps I identified here.

I'm going to move this task directly into done and do a formal review and signoff step as a separate task once more of the to-dos in the guide are resolved.