
Schema repository structure, naming
Closed, Resolved (Public)

Description

It would be nice to come to agreement about some simple conventions for naming schema, filing schema in subdirectories, and structuring the analytics schema repository in general. These conventions would hopefully allow us to move more quickly, be autonomous, and most importantly, not be talking about names so much (which is really boring for everybody).

I think it would be good to see what we can learn from the primary repository, which has a lot of schema, but still think about how schema are used for analytics, which is a different use-case. We should also take into account the engineering pros and cons associated with certain approaches, as well as factors like usability, self-documentation, discoverability, and complexity over time. I would personally also like to see us make choices that minimize the number of decisions that need to be made in order to produce a schema. These add up, and can be small things, for example "what do I name this", "where should this go", "should this be capitalized", etc.

See also: https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#Event_Data_Modeling_and_Schema_Naming

Event Timeline

Haha "should this be capitalized"...no! :)

I'd like to make a suggestion.

For app specific event schemas, prefix with the app name:

  • analytics/mediawiki/mediasearch_interaction
  • analytics/wikipedia_ios/button_click
  • analytics/wikipedia_android/button_click
  • analytics/wikivoyage_ios/search_request

For anything that might be shared across apps, don't prefix:

  • analytics/session_tick

I think that some folks had suggested prefixing with things like reading/ and contributing/. If y'all really want to do this, please go ahead, but first remember the past and consider the future. 'contributing' used to be called editing. N years from now, MediaWiki may not be the primary app product teams are instrumenting.

I think this is a sensible idea. In other words, /analytics/<schema>, or /analytics/<appname>/<schema>. I also like being very boring and explicit about the application name. This would be very simple and straightforward to maintain.
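
To make the proposed layout concrete, here is a minimal sketch of a helper that builds paths of the form analytics/<schema> or analytics/<appname>/<schema>. The function name schema_path and the lowercase snake_case check are assumptions for illustration (inferred from the example names in this thread), not an existing tool in the schema repository.

```python
import re

# Assumed rule: each path segment is lowercase snake_case,
# e.g. "button_click" or "wikipedia_ios".
SEGMENT = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def schema_path(schema, app=None):
    """Build a schema path under the proposed convention.

    Shared schemas omit the app prefix; app-specific schemas include it.
    """
    parts = ["analytics"] + ([app] if app else []) + [schema]
    for part in parts:
        if not SEGMENT.fullmatch(part):
            raise ValueError(f"invalid segment: {part!r}")
    return "/".join(parts)


# Shared across apps: no prefix.
print(schema_path("session_tick"))
# App-specific: prefixed with the app name.
print(schema_path("button_click", app="wikipedia_ios"))
```

With only two possible shapes, the single decision left to a schema author is whether the schema is app-specific.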

So they're linked here, here are the recommendations from Product Analytics that I think this task is at least partly in response to: https://www.mediawiki.org/wiki/Product_Analytics/Event_Platform_recommendations

I think that some folks had suggested prefixing with things like reading/ and contributing/. If y'all really want to do this, please go ahead, but first remember the past and consider the future. 'contributing' used to be called editing. N years from now, MediaWiki may not be the primary app product teams are instrumenting.

Actually there isn't a Reading team even now, I think? I don't think those were necessarily meant to correspond to team names, though it seems in practice that they do, which isn't ideal. Also, "mobile apps" seems like kind of an outlier as a group of products rather than a specific functional area.

For app specific event schemas, prefix with the app name:
[...]

I think this is a sensible idea. In other words, /analytics/<schema>, or /analytics/<appname>/<schema>. I also like being very boring and explicit about the application name. This would be very simple and straightforward to maintain.

+1 in principle. That said, probably most of the instruments are going to be implemented somewhere in MediaWiki core or extensions. Do all of those live under mediawiki/? Something like mediawiki/<extension_name>/...? Or just <extension_name>/ in the case of extensions?

I think aiming for fewer hierarchies / prefixes, and keeping as much flat as possible, will make naming less of a hurdle. I'd suggest just analytics/mediawiki/whatever_thing_happens for both mediawiki core and mediawiki extensions.

So they're linked here, here are the recommendations from Product Analytics that I think this task is at least partly in response to: https://www.mediawiki.org/wiki/Product_Analytics/Event_Platform_recommendations

I think that some folks had suggested prefixing with things like reading/ and contributing/. If y'all really want to do this, please go ahead, but first remember the past and consider the future. 'contributing' used to be called editing. N years from now, MediaWiki may not be the primary app product teams are instrumenting.

Actually there isn't a Reading team even now, I think? I don't think those were necessarily meant to correspond to team names, though it seems in practice that they do, which isn't ideal. Also, "mobile apps" seems like kind of an outlier as a group of products rather than a specific functional area.

Those aren't team names, they're describing experiences/workflows. We specifically decided not to use team names.

@Ottomata @jlinehan @kzimmerman and I met to discuss schema repository naming, structure, and (briefly) questions around ownership/stewardship.

The organization of the existing schemas (particularly the ones in /analytics/mobile_apps) and the recommendations made by Product Analytics were motivated by the desire for better data discoverability and easier navigation of the repository, as schemas serve as the source-of-truth documentation about what to expect in an events table. This, however, leads to several issues: (1) decisions we make now about how to organize may not be the best ones, and we would be locking in future decisions, (2) when creating a new schema there will always be overhead in trying to decide where to put that schema, and (3) even boundaries between experiences/activities (as proposed) are fuzzy.

We identified that it would actually be better to have a flat (rather than hierarchical) structure but then have some/better tools for browsing the repository and the schemas inside. One idea was to employ a tagging system and build tools which would allow users to find and browse schemas based on tags, instead of hard-coding metadata like that into the file structure itself.
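
As a rough illustration of the tagging idea, the sketch below keeps the schema names flat and looks them up through a tag index. The tag names, the schema_tags mapping, and both function names are hypothetical; the thread does not specify how tags would be stored or queried.

```python
from collections import defaultdict


def build_tag_index(schema_tags):
    """Invert a {schema: tags} mapping into {tag: set of schemas}."""
    index = defaultdict(set)
    for schema, tags in schema_tags.items():
        for tag in tags:
            index[tag].add(schema)
    return index


def find_schemas(index, *tags):
    """Return the schemas that carry every requested tag, sorted by name."""
    if not tags:
        return []
    matches = set.intersection(*(index.get(tag, set()) for tag in tags))
    return sorted(matches)


# Hypothetical tags attached to flat schema names.
schema_tags = {
    "analytics/session_tick": {"shared"},
    "analytics/mediasearch_interaction": {"mediawiki", "search"},
    "analytics/search_request": {"ios", "search"},
}
index = build_tag_index(schema_tags)
print(find_schemas(index, "search"))
print(find_schemas(index, "search", "ios"))
```

Because tags live outside the file path, a schema can be retagged or carry several tags without being moved or renamed.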

Kate will check in with Product Analytics after the holidays.

Ottomata triaged this task as Medium priority. Jun 11 2021, 6:28 PM

Hi, any follow up here? Should we close this ticket?

Aklapper subscribed.

Removing inactive assignee from this open task. (Please update assignees on open tasks after offboarding. Thanks.)