Introduce new schema for WMDE banner metrics
Open, Needs Triage, Public, 1 Estimated Story Points

Assigned To
Authored By
kai.nissen
May 17 2023, 10:39 AM
Tags
Referenced Files
None

Description

In order to use a new event schema, we need to publish it to the schema repository. Schemas are supposed to provide backwards compatibility to prevent errors caused by a mismatched schema. We still receive events from older banners, so we should introduce a new schema for the upcoming campaign.

Documentation
Event Platform/Schemas - Wikitech

Acceptance Criteria

  • A new schema is created and committed to the schema repository.

Event Timeline

There are a very large number of changes, so older changes are hidden.

@VirginiaPoundstone It seems that there has been some activity on this ticket recently. Can you please give me an update on what was discussed and what the outcome was?

@mforns, @Ottomata, @phuedx, @VirginiaPoundstone This ticket is stalled. Is there anything we can do to move it forward?

Ottomata renamed this task from Introduce new logging schema to Introduce new schema for WMDE banner metrics. Dec 2 2024, 6:02 PM

@kai.nissen I pinged the Metrics Platform team to get an update.

Unless they advise otherwise, you could move forward and create your event schema and stream as needed.

@kai.nissen so sorry for having missed all the pings for so long.

About 18 months ago, we decided to change the existing metrics platform "monoschema", to get rid of the custom_data field.
I believe that having such a field is not a good data engineering practice, because of the reasons detailed in this document.
We designed the new set of fragments and schemas to offer an alternative to the custom_data field.

I understand that the new fragments and schemas do not fulfill your requirements.
In this case, my advice would be to:

  1. Create a fragment for your project. In it, you will be able to add whichever extra fields you need in different experiments. Whenever you create a new experiment, you can add new fields there if necessary, without the need to create another schema. That said, please do not use a custom_data field, but rather create explicit fields with types and descriptions for each concept you want to store.
  2. Create an "empty" schema for your project that just references the fragment that you created in step 1, and also references the corresponding base fragments. This way you get all the standard structure from the base fragments, and the customization from your own fragment.
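A minimal sketch of the two steps above, written in Python only to illustrate the composition idea. Every name here is hypothetical: the fragment and schema `$id`s, the fields `banner_name` and `slide_index`, and the `materialize` helper are made up for illustration and are not the real Metrics Platform fragments or the schema repository's tooling. It also assumes fragments are pulled in via `allOf` with `$ref`; check the Wikitech documentation for the actual conventions.

```python
import json

# Hypothetical stand-in for a shared base fragment (NOT the real base fragment).
BASE_FRAGMENT = {
    "$id": "/fragment/example/base/1.0.0",
    "type": "object",
    "properties": {
        "$schema": {"type": "string", "description": "URI of the schema the event conforms to"},
        "dt": {"type": "string", "description": "Event timestamp in ISO 8601 format"},
    },
    "required": ["$schema", "dt"],
}

# Step 1: a project fragment with explicit, typed, described fields
# instead of a free-form custom_data blob. Field names are hypothetical.
PROJECT_FRAGMENT = {
    "$id": "/fragment/example/banner/1.0.0",
    "type": "object",
    "properties": {
        "banner_name": {"type": "string", "description": "Name of the banner that sent the event"},
        "slide_index": {"type": "integer", "description": "Index of the slide shown when the event fired"},
    },
    "required": ["banner_name"],
}

# Step 2: an "empty" schema that only references the base and project fragments.
PROJECT_SCHEMA = {
    "$id": "/example/banner_interaction/1.0.0",
    "title": "example/banner_interaction",
    "type": "object",
    "allOf": [
        {"$ref": "/fragment/example/base/1.0.0#"},
        {"$ref": "/fragment/example/banner/1.0.0#"},
    ],
}

FRAGMENTS = {f["$id"]: f for f in (BASE_FRAGMENT, PROJECT_FRAGMENT)}


def materialize(schema, fragments):
    """Inline the referenced fragments' properties and required fields.

    A deliberately simplified illustration of what resolving allOf/$ref
    amounts to; it handles only flat top-level properties.
    """
    merged = {k: v for k, v in schema.items() if k != "allOf"}
    properties = {}
    required = []
    for ref in schema.get("allOf", []):
        fragment = fragments[ref["$ref"].rstrip("#")]
        properties.update(fragment.get("properties", {}))
        for name in fragment.get("required", []):
            if name not in required:
                required.append(name)
    # Fields declared directly on the schema (none here) win over fragment fields.
    properties.update(schema.get("properties", {}))
    merged["properties"] = properties
    merged["required"] = required
    return merged


resolved = materialize(PROJECT_SCHEMA, FRAGMENTS)
print(json.dumps(resolved, indent=2))
```

In this sketch, adding a field for a new experiment means adding one entry to the project fragment; the schema itself stays a thin wrapper around the two references.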

That said, I'm not sure this is the preferred way? Maybe @phuedx has different ideas?

> That said, I'm not sure this is the preferred way? Maybe @phuedx has different ideas?

(1) is my preference.

Hehe, 1 and 2 were intended to be a sequence rather than two options. Sorry, I think my question at the end was misleading.
But since you are OK with (1), (2) is just the application of that in the form of a schema (not a fragment).
And to clarify, the reason to define the fields in a fragment rather than directly in a schema is so that the fragment can be reused for both the app and web base schemas.

@kai.nissen we also have some documentation about how to make a custom schema fragment which might also help: https://wikitech.wikimedia.org/wiki/Metrics_Platform/Custom_schemas

gabriel-wmde changed the point value for this task from 5 to 1. May 27 2025, 10:01 AM

@mforns Could you take a look at Abban's change/merge requests on Gerrit and Gitlab, or suggest someone I could assign as a reviewer?

@kai.nissen Hi!
I've reviewed the changes.
I think at this point it would be good for someone from the Experimentation Platform team to review this change, since I'm not fully aware of the latest changes in that system.
I added @Sfaci as reviewer 🙏

Hello! I'm our new PM on the experiment platform team, and I'm just back from vacation. I'll take a look at this and the conversation on the patch this week. Any context you'd like to provide is helpful!

@kai.nissen I'd love to understand more about the context behind this new schema and what you're looking to achieve in the bigger picture, so that we can best support or advise. Would you be available for a quick 15-30 minute chat in the next few days? If you're not the right person to discuss this with, I'd appreciate being connected with whoever is.

Thank you!

Hello @kai.nissen @AbbanWMDE bumping a request to connect! jvanderhoop@wikimedia.org :)

kai.nissen changed the point value for this task from 1 to 5.

@mforns Can you give this PR another look over? I previously changed it to fit into the xLab platform, but the fundraising team decided to not use xLab for now so I reverted it again. I'd love to finally get this off my plate.

AbbanWMDE changed the point value for this task from 5 to 1.

Hi! I added a few nits to your schema MR. :)

@AbbanWMDE OH NO! I never submitted my review comments. They were stuck in Pending on Gitlab. I'm so sorry!

I just submitted them.