[SPIKE] Establish a plan for internal Data Products team training on the data contract
Open, Medium, Public, 8 Estimated Story Points

Description

To support T329506: User Facing Metrics Platform Documentation and to spread knowledge about how and when to use the data contract, we need to conduct some onboarding for the Data Products team as well as select product engineers and product analysts.

This is an opportunity for us to clean up our documentation, find gaps in the docs that currently exist, and clean up phab tasks and other artifacts containing the history of how we made decisions in moving from monoschema to data contract.

Tasks

  • Offboard / handover Alex from documentation work
  • Review documentation to ensure alignment with terms and definitions and reduce confusion and complexity caused by inconsistent term usage
  • Pull together all the pieces required to onboard developers to the data contract into a single knowledge hub (a doc or a wiki page)
  • Think through ways to train people on how to use it and on best practices.
    • Do's and don'ts list
    • Workshops
    • Videos

Acceptance Criteria

  • We know who, what, when, and where training will happen

Required

  • Approved onboarding plan

Event Timeline

VirginiaPoundstone lowered the priority of this task from High to Medium. Aug 19 2024, 3:38 PM
VirginiaPoundstone lowered the priority of this task from Medium to Low. Aug 27 2024, 12:48 PM
VirginiaPoundstone raised the priority of this task from Low to Medium. Aug 30 2024, 3:26 PM

@phuedx What are the best steps to advance this SPIKE?

We know who, what, when, and where training will happen

Who: Data Products members
What: … ?
When: That depends on what
Where: On-wiki and synchronous meeting(s)

I'm uncertain what the "what" is here. Do we have any other onboarding docs that we can replicate? Could we replicate the structures from other projects or teams?

While we're thinking about that, there are a couple of tightly-scoped things that we could do to move things forward here:

  1. Collect all the tasks associated with developing the monoschema
  2. Collect all the tasks associated with moving from monoschema to the data contract
  3. Conduct a brief (30-50 minute) interview with the folks who did the move
  4. Collect all the tasks associated with developing an instrument with the data contract and create case studies from them
VirginiaPoundstone renamed this task from [SPIKE] Establish a plan for internal team training on the data contract to [SPIKE] Establish a plan for internal Data Products team training on the data contract. Sep 3 2024, 3:14 PM

@apaskulin have you started documentation of the data contract yet?

I've been thinking about this as three separate docs:

  1. Data contract docs for instrument creators using the base schemas
  2. Data contract docs for instrument creators using custom schemas
  3. Data contract docs for maintainers of the Metrics Platform
    • Some of what maintainers need to know may be covered in the docs for instrument creators using custom schemas, but there will probably be additional information and context that's only relevant to maintainers.
    • Status: Out of scope for T329506 but I can help with where this should live relative to the other MP docs

Yeah... I guess it does fall more towards maintainers...

but... it would be really great to have a single table where people can go and see all the fields, script names, description, null Y/N, object types and objects in one place. This is often referred to as a "data dictionary". Having a JSON (or something) example to see it in practice would be an added bonus. An entity diagram would be the cherry on top to help people understand how all the parts fit together.

Having the data dictionary would be a really useful point of entry and reference.

This should be fairly easy to generate from the schemas themselves as every field has inline documentation…

Totally agree here. We should avoid manually duplicating the information in the schema so that the schema remains the single source of truth. However, we should be able to either use a tool to visualize the schema in a more readable format or use the schema to generate a wikitable. I'm tracking this in T372680.
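
As a rough illustration of generating a wikitable from a schema, here is a minimal Python sketch. It assumes the schema is available as a JSON Schema style dictionary with `properties`, `required`, `type` and `description` keys. The schema fragment, the field names in it, and the helper names `flatten_schema` and `to_wikitable` are all hypothetical and only there to show the shape of the idea; this is not the real Metrics Platform schema or an existing tool.

```python
from typing import Any

# Hypothetical schema fragment, for illustration only.
# It is NOT the real Metrics Platform schema.
EXAMPLE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["action"],
    "properties": {
        "action": {
            "type": "string",
            "description": "Name of the recorded action.",
        },
        "agent": {
            "type": "object",
            "description": "Client that produced the event.",
            "properties": {
                "client_platform": {
                    "type": "string",
                    "description": "Platform the client runs on.",
                },
            },
        },
    },
}


def flatten_schema(schema: dict[str, Any], prefix: str = "") -> list[dict[str, str]]:
    """Walk a schema object and return one data dictionary row per field."""
    rows: list[dict[str, str]] = []
    required = set(schema.get("required", []))
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        rows.append({
            "field": path,
            "type": spec.get("type", ""),
            # Assumption: "required" is used as a stand-in for the null Y/N column.
            "required": "Y" if name in required else "N",
            "description": spec.get("description", ""),
        })
        # Recurse into nested objects so sub-fields get dotted paths.
        if spec.get("type") == "object":
            rows.extend(flatten_schema(spec, prefix=f"{path}."))
    return rows


def to_wikitable(rows: list[dict[str, str]]) -> str:
    """Render rows as MediaWiki table markup (no escaping of pipe characters)."""
    lines = ['{| class="wikitable"', "! Field !! Type !! Required !! Description"]
    for row in rows:
        lines.append("|-")
        lines.append(
            f"| {row['field']} || {row['type']} || {row['required']} || {row['description']}"
        )
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_wikitable(flatten_schema(EXAMPLE_SCHEMA)))
```

Because the rows are derived from the inline field documentation, the schema stays the single source of truth and the table can be regenerated whenever the schema changes.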

VirginiaPoundstone added a subscriber: WDoranWMF.

@apaskulin we were just chatting about this as a team and @WDoranWMF suggested that DataHub might be the right place for this documentation. He is going to take a look and figure out the answer here.

  1. Some of this overlaps with T376841: Render human-readable schemas on schema.wikimedia.org
  2. @VirginiaPoundstone and @JEbe-WMF to meet to refine the scope of this task given the above. We'll estimate afterwards
  3. Moved to In Progress to reflect the need to have a meeting
Milimetric set the point value for this task to 8. Oct 29 2024, 2:16 PM