
Figure out whether we want a single, cross-platform schema for search event logging, or whether we want several schemas that share core attributes but then have added extras
Closed, Resolved (Public)

Description

We expose search to users on a lot of different platforms (desktop, web, apps, etc.), so how do we collect information that's comparable between the platforms?

First option: have a single, core schema (e.g. [[Schema:Search]]), which is implemented across all platforms.

Second option: have a set of schemas, one for each platform, which share the same core attributes plus whatever extras each platform wants.

Let's figure out which approach we want to take.

Event Timeline

Deskana raised the priority of this task from to Needs Triage.
Deskana updated the task description.
Deskana subscribed.

The single schema approach is tricky, because it quickly becomes a gigantic schema that is hard to extract data from. It's hard to get one schema that captures all the nuances of how we expose search to our users. On the other hand, it forces us to generate data that's cross-comparable between platforms, because we're implementing the same set of actions on all platforms.

With multiple schemas, each one is going to be smaller, but differences can crop up that make the data incomparable. On the plus side, you can *really* easily add extra metadata that's available on one platform but not on others (e.g. install ID for the apps) without bloating the schema and turning it into a monstrosity.
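To make the second option concrete, here is a minimal sketch of "shared core plus per-platform extras". The field names, schema titles, and the dictionary layout are all hypothetical and are not taken from the real schemas; the only point is that every platform schema is built from the same core, so the core cannot drift.

```python
# Hypothetical core attributes shared by every platform schema.
# None of these names are taken from the real Schema: pages.
CORE_PROPERTIES = {
    "action": {"type": "string", "required": True},
    "sessionToken": {"type": "string", "required": True},
    "numberOfResults": {"type": "integer", "required": False},
}


def make_platform_schema(title, extras):
    """Build a platform schema from the shared core plus platform extras.

    Raises ValueError if an extra tries to redefine a core attribute,
    since that is exactly the kind of drift that breaks comparability.
    """
    overlap = CORE_PROPERTIES.keys() & extras.keys()
    if overlap:
        raise ValueError(f"extras redefine core attributes: {sorted(overlap)}")
    return {"title": title, "properties": {**CORE_PROPERTIES, **extras}}


# Hypothetical platform schemas: the app one adds an install ID,
# the web one adds nothing beyond the core.
app_schema = make_platform_schema(
    "HypotheticalAppSearch",
    {"installId": {"type": "string", "required": True}},
)
web_schema = make_platform_schema("HypotheticalWebSearch", {})
```

The trade-off is that someone has to own the core definition, otherwise each platform ends up maintaining its own copy of it.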

@Ironholds, thoughts?

As someone new to this, I'm having trouble grasping what a "schema" really means in this context. Can you link to any existing examples?

Sure. Here are two examples, one search related and one not:

@Deskana Thanks for those links. And for anyone coming along after me, these are JSON schemas.

Deskana renamed this task from Figure out whether we want a single, cross-platform schema for Search, or whether we want several schemas that share core attributes but then have added extras to Figure out whether we want a single, cross-platform schema for search event logging, or whether we want several schemas that share core attributes but then have added extras. May 20 2015, 4:44 PM
Deskana set Security to None.

I could help with cross-departmental coordination, but it seems like we need to reach some level of internal agreement first.

Multiple schemas sounds fine, @Deskana - I guess what I'm really getting at is ownership. We need to either own the schemas or get assurances from the various teams that:

  1. The definitions are sensible and consistent;
  2. The schemas are going to be maintained;
  3. Bugs we report with the output will be handled reliably and promptly;
  4. Bugs they find in the schemas will be reported to *us* so that we can transparently note them.

@Ironholds: Is there a list of all the relevant schemas (or holes where there should be a schema but isn't)? I could find who currently owns each, or who would be willing to cede ownership to us. So far, I think you may have identified:

"Normal" Search (already us?)
Mobile Web Search
Mobile App Search (hopefully not device-specific)

Are there others?

Those are the three, yep. I think Sam Smith is working on #2, and someone(?) on the apps team on #3.

We had a meeting with most, if not all, of the relevant people. The schemas all share a common ancestor, so they are more alike than different. iOS and Android are going to share the same schema. Here are some details from the meeting notes:

Android

Purpose: To track how effective our UI changes were at improving search quality, and how many people are using search
URL for search schema: https://meta.wikimedia.org/wiki/Schema:MobileWikiAppSearch (Schema version: 10641988)
Sampling rate: 1%
Path to source code: wikipedia/src/main/java/org/wikipedia/analytics/SearchFunnel.java
Status: In production since January 2015

iOS

Purpose: Ditto Android
URL for search schema: https://meta.wikimedia.org/wiki/Schema:MobileWikiAppSearch (Schema version: 10641988)
Sampling rate: 1%
Path to source code: https://github.com/wikimedia/apps-ios-wikipedia/blob/master/Wikipedia/WMFSearchFunnel.m
Phab: T90257

Desktop web

Purpose: Effectiveness of interaction with top-right search "suggest" field
URL for search schema: https://meta.wikimedia.org/wiki/Schema:Search (Schema version: 11670541)
Sampling rate: 0.1%
Phab: T90518
Paths to source code:
https://github.com/wikimedia/mediawiki/blob/master/resources/src/mediawiki/mediawiki.searchSuggest.js
https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/master/resources/loggingSchema/search.js
https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/master/includes/Hooks.php

Mobile web

Purpose: To track how effective our UI changes were at improving search interaction
Sampling rate: Was 100%!!! Now 0.1% (currently turned off)
URL: https://meta.wikimedia.org/wiki/Schema:MobileWebSearch
Path to source code: https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/master/resources/mobile.search/MobileWebSearchLogger.js
Phab: T96326 and T99788
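
Because the platforms above sample at different rates, raw event counts are not directly comparable. A rough sketch of one way to put them on the same footing is to scale each count by the inverse of its sampling rate. The platform keys and the function name are made up for illustration; the rates are the ones from the notes above.

```python
# Sampling rates as recorded in the meeting notes above.
# The dictionary keys are illustrative labels, not real identifiers.
SAMPLING_RATES = {
    "android": 0.01,       # 1%
    "ios": 0.01,           # 1%
    "desktop_web": 0.001,  # 0.1%
    "mobile_web": 0.001,   # 0.1% (logging currently turned off)
}


def estimate_total_events(sampled_count, sampling_rate):
    """Estimate the true number of events from a sampled count.

    This is a plain inverse-probability scale-up; it assumes sampling
    is uniform and says nothing about the error of the estimate.
    """
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    return sampled_count / sampling_rate


def estimate_by_platform(sampled_counts):
    """Scale a {platform: sampled_count} mapping to estimated totals."""
    return {
        platform: estimate_total_events(count, SAMPLING_RATES[platform])
        for platform, count in sampled_counts.items()
    }
```

With this, the same number of logged events on Android and on desktop web would correspond to roughly a tenfold difference in estimated real usage, which is easy to miss when reading the raw tables.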

Deskana claimed this task.

I think we've decided that having multiple schemas is acceptable. I'm going to mark this task as resolved.

Would it make sense to edit the schema pages to have the common fields in the same sequence, at the top? That way it would be easier to confirm the commonality. If rearranging a page is harmless, I could do it.
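
For what it's worth, the commonality could also be checked mechanically before anyone rearranges a page by hand. A small sketch, assuming each schema is available as a dictionary with a `properties` mapping (that layout and the function names are assumptions for illustration, not the real page format):

```python
def common_fields(schemas):
    """Return the set of field names present in every schema."""
    field_sets = [set(schema["properties"]) for schema in schemas]
    return set.intersection(*field_sets) if field_sets else set()


def reorder_common_first(schema, common):
    """Return a copy of the schema with the common fields listed first.

    Common fields are sorted by name so they appear in the same
    sequence in every schema; the remaining fields keep their
    original relative order.
    """
    props = schema["properties"]
    shared = sorted(name for name in props if name in common)
    rest = [name for name in props if name not in common]
    return {**schema, "properties": {name: props[name] for name in shared + rest}}
```

Running `common_fields` over all of the schemas would give the list of fields to move to the top, and `reorder_common_first` shows the resulting order for each one.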