
Introduce new logging schema
Open, Needs Triage | Public | 5 Estimated Story Points

Description

To use a new event schema, we need to publish it to the schema repository. Schemas are supposed to be backwards compatible, to prevent errors caused by a mismatched schema. We still receive events from older banners, so rather than changing an existing schema we should introduce a new one for the upcoming campaign.

Documentation
Event Platform/Schemas - Wikitech

Acceptance Criteria

  • A new schema is created and committed to the schema repository.

Event Timeline

kai.nissen set the point value for this task to 8. May 17 2023, 11:14 AM

We created a preliminary draft of the schema at https://github.com/wmde/fundraising-banners/pull/372
The next steps are:

  • Check with PM for the title of the new schema. Suggested names: /wikimedia/analytics/wmde/bannerevent, /wikimedia/analytics/wmde/fundraisingbannerevent or /wikimedia/analytics/wmdebannerevent
  • Check with PM for additional fields needed (can be included with a $ref, see other schemas in the repo). The schema page makes it look like /fragment/analytics/common/2.0.0# would be required/a good choice, but then we may also need PM-mandated values for stream and domain. At least stream should be defined.
  • Check out https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/ and add the schema in the correct path (determined by the title)
  • Generate version 1.0.0 with the jsonschema-tools
  • Create a pull request with gerrit - ask PM who will review it

To check: if customData is ok to be specified as an object with arbitrary string values.
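
As a rough illustration of what "an object with arbitrary string values" means: in JSON Schema this is usually expressed as `type: object` with `additionalProperties: {type: string}`. The fragment below is an assumption for illustration and is not copied from the draft in the pull request. The helper `is_string_map` is hypothetical and only mirrors that constraint in plain Python.

```python
# Hypothetical JSON Schema fragment for a free-form string map.
# Not taken from the draft schema; written as a Python dict for illustration.
CUSTOM_DATA_FRAGMENT = {
    "type": "object",
    "additionalProperties": {"type": "string"},
}


def is_string_map(value):
    """Return True if value is a dict whose keys and values are all strings."""
    if not isinstance(value, dict):
        return False
    return all(
        isinstance(key, str) and isinstance(item, str)
        for key, item in value.items()
    )


# Numbers would have to be serialised as strings to pass this check.
print(is_string_map({"slides_seen": "3"}))  # True
print(is_string_map({"slides_seen": 3}))    # False
```

Under this constraint, numeric measurements would need to be converted to strings before the event is sent.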

  • Check with PM for the title of the new schema. Suggested names: /wikimedia/analytics/wmde/bannerevent, /wikimedia/analytics/wmde/fundraisingbannerevent or /wikimedia/analytics/wmdebannerevent

I can't find anything on a naming convention in the documentation. @Ottomata Can you point us to the respective doc page if there is one? It may be a good idea to put it in a folder with a general name, so it might also be used by others. The schema itself should also have a general name, so we can reuse it for community engagement campaigns. I'd suggest analytics/banner/wmdebanners. Or analytics/banner/wmdefundraising, if we want to introduce a different schema for community engagement banners.

  • Check with PM for additional fields needed (can be included with a $ref, see other schemas in the repo). The schema page makes it look like /fragment/analytics/common/2.0.0# would be required/a good choice, but then we may also need PM-mandated values for stream and domain. At least stream should be defined.

Is this for the custom_data field (last table row in the description of the parent task)? If so, and we decide to use it, we can name the stream similarly to the schema (e.g. banner.wmde_fundraising). I doubt that anything will ever consume the stream, though.
Not sure about the domain. If possible, we shouldn't bind this to a domain. We could use a general wikipedia.org, but would that block us from ever triggering events from banners on commons.wikimedia.org?

@Ottomata Can we ask you for review after creating a patch on Gerrit?

respective doc page

Not sure about the domain

FWIW, the semantics of meta.domain are not well defined. Sometimes it is used for domain names, sometimes for the business 'domain' (as in Domain Driven Design). (We should add this fact to https://wikitech.wikimedia.org/wiki/Event_Platform/Flaws ...)

Can we ask you for review after creating a patch on Gerrit?

I'll be on baby duty leave for a while. You can add me but I won't be prompt. Please add folks from the WMF Data-Products team. @mforns can help direct you.

More generally though, there has been a lot of progress on WMF's 'Metrics Platform' over the last year, so the advice for building instrumentation events is probably newer than what you received last year.

Cc @phuedx

kai.nissen changed the point value for this task from 8 to 5.
AbbanWMDE moved this task from Doing to Review on the WMDE-FUN-Sprint-2024-02-27 board.
AbbanWMDE subscribed.

Hi @kai.nissen and @gabriel-wmde!

I might be completely off, but I think Metrics Platform's web base schema would suit your needs here, without having to create a new schema.
I see that your schema suggestion has 4 main fields:

tracking_keyword   string
event_name         string
feature            string
user_choice        string

If I understand correctly, tracking_keyword is a kind of banner identifier; event_name is the name of the event; feature refers to the element that the user interacted with; and user_choice complements the interaction of the user with the feature.

The Metrics Platform base schema offers 5 fields that could match your needs:
(note that these fields are not defined directly in the base schema itself, but in one of the referenced fragments)

action          string
action_subtype  string
action_source   string
action_context  string
element_id      string

As a suggestion, you could use action to store the event_name, action_source to store the tracking_keyword, element_id to store the feature, and action_context to store user_choice.
@phuedx does this make sense, or did I miss something?

In the latest version of Metrics Platform schema tooling, we try to minimize the number of existing schemas, to make the collected data simpler and more standard.
It seems to me that the base schema would satisfy your needs, so I'd recommend you to use it instead of creating a new one.
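
The field mapping suggested above can be sketched as a simple rename. This is a minimal illustration, not Metrics Platform client code: `FIELD_MAP` and `to_base_schema_fields` are hypothetical names, the example values are made up, and the sketch does not add any of the other fields a complete event would need.

```python
# Suggested mapping: WMDE draft field -> base schema field.
FIELD_MAP = {
    "event_name": "action",
    "tracking_keyword": "action_source",
    "feature": "element_id",
    "user_choice": "action_context",
}


def to_base_schema_fields(banner_event):
    """Rename the draft schema's fields to the suggested base schema fields.

    Fields missing from banner_event are skipped; unmapped fields are dropped.
    """
    return {
        target: banner_event[source]
        for source, target in FIELD_MAP.items()
        if source in banner_event
    }


# Example values are invented for illustration.
example = {
    "tracking_keyword": "example-banner-01",
    "event_name": "click",
    "feature": "close-button",
    "user_choice": "maybe-later",
}
print(to_base_schema_fields(example))
```

Keeping the mapping in one table means a target field can be swapped in a single place if a different base schema field turns out to fit better.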

As a suggestion, you could use action to store the event_name, action_source to store the tracking_keyword, element_id to store the feature, and action_context to store user_choice.
@phuedx does this make sense, or did I miss something?

element_friendly_name might be more appropriate for storing the feature, because we're talking about a human-friendly reference to a UI element rather than a machine-readable, direct reference.

To check: if customData is ok to be specified as an object with arbitrary string values.

Do you have a list of the custom data that you're capturing?

Thanks for looking into it, @mforns and @phuedx!

I totally see your point about decluttering and generalising schemas: the purpose of introducing this schema is actually to replace and phase out the three existing WMDE banner-related schemas. We misused some of the fields in those schemas because they didn't fit our needs.

Your suggestion makes sense from a generalising point of view, but the web base schema doesn't seem to provide a field for logging additional data that is tied to a specific type of event. In our schema draft, the field custom_data is supposed to store any event-specific metric that might be needed. Some additional data we collect only for a short period of time (e.g. the number of slides seen by a user until the event was triggered by an action), some we keep collecting continuously (e.g. the viewport measurements and the height of the banner when the code decides not to show it). I consider this a crucial requirement, so I think the web base schema does not fully satisfy our needs.
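
To make the requirement concrete, here is a minimal sketch of how event-specific metrics might travel in custom_data, assuming the field ends up as a map of string values (that point is still open in this task). The function `build_banner_event`, the event names and the metric keys are all hypothetical and not taken from the draft.

```python
def build_banner_event(tracking_keyword, event_name, feature="", user_choice="", **metrics):
    """Build an event dict with the draft's four fields plus custom_data.

    Every extra keyword argument lands in custom_data, converted to a string.
    """
    return {
        "tracking_keyword": tracking_keyword,
        "event_name": event_name,
        "feature": feature,
        "user_choice": user_choice,
        "custom_data": {name: str(value) for name, value in metrics.items()},
    }


# Short-lived metric: slides seen before the action triggered the event.
closed = build_banner_event("example-banner-01", "banner-closed", slides_seen=3)

# Long-running metric: viewport and banner size when the banner is not shown.
not_shown = build_banner_event(
    "example-banner-01",
    "banner-not-shown",
    viewport_width=1280,
    viewport_height=720,
    banner_height=340,
)
print(closed["custom_data"])  # {'slides_seen': '3'}
print(not_shown["custom_data"])
```

The four fixed fields stay the same for every event, while the set of keys in custom_data can change per event type without a schema change.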