As part of the work on T258183, we need to specify what exactly we will be measuring.
For now it seems like [[ https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas#Creating_a_new_schema | developing a new schema ]] is the best way to move forward. I've followed the guidelines for the new Event Platform and created a basic media-search-specific schema, using the [[ https://docs.google.com/document/d/1djZm--cRwzq52sO8IW2KSyk9sbWD9yz8XssSqRxruQs/edit#heading=h.xsfavbt0qjbe | Media Search Measurement Specifications ]] document as a starting point. The initial goal is to capture high-level user actions rather than to record every click, mouse movement, etc.

**Required event properties**

Our schema describes the event object that is recorded whenever we log something. Currently, every event includes the following properties:

- `session_id`: generated using `mw.user.generateRandomSessionId` when the page JS loads. Multiple searches performed in the same browser tab by the same user in a single sitting can be associated using this field. If the user does a hard refresh of the page or opens a new tab later, they will get a new session ID.
- `skin`: the name of the active skin ("vector", "minerva", etc.).
- `language_code`: Commons only has one content language, but we would probably like to know if the user has specified an interface language. Every event includes the first item in the user's language fallback chain as a code ("de", "es", etc.).
- `action`: Like the existing [[ https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/master/jsonschema/analytics/legacy/searchsatisfaction/current.yaml | SearchSatisfaction ]] schema, our schema assigns an "action" property to every event, representing what the user was actually doing on the page. The action must belong to a pre-defined list, and additional event properties may be included based on the action. The actions we currently support are listed below.
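
To make the shape of the shared properties concrete, here is a minimal sketch of how a base event object could be assembled. `buildBaseEvent` and the `env` object are hypothetical names used only for illustration; in the real instrumentation the session ID would come from `mw.user.generateRandomSessionId` (called once when the page JS loads), and the skin and language fallback chain would come from whatever MediaWiki exposes for them.

```javascript
// Hypothetical helper: assembles the properties shared by every event.
// `env` stands in for values the page would supply at load time.
function buildBaseEvent( env, action ) {
    if ( !Array.isArray( env.languageFallbackChain ) || env.languageFallbackChain.length === 0 ) {
        throw new Error( 'languageFallbackChain must be a non-empty array' );
    }
    return {
        session_id: env.sessionId,
        skin: env.skin,
        // Only the first item in the fallback chain is recorded.
        language_code: env.languageFallbackChain[ 0 ],
        action: action
    };
}

// Example values (assumptions, not real data).
const exampleEnv = {
    sessionId: 'abc123',
    skin: 'vector',
    languageFallbackChain: [ 'de', 'en' ]
};

const baseEvent = buildBaseEvent( exampleEnv, 'search_new' );
console.log( baseEvent );
```

Because the session ID is generated once per page load and passed in, every event logged from the same tab carries the same `session_id`, which is what allows searches from a single sitting to be associated later.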

**Current draft schema action types:**

| Action | Description | Additional properties |
| --- | --- | --- |
| `search_new` | User performs a new search for a given query | query, total result count, media type |
| `search_load_more` | User "continues" an existing search, either within the same tab (by scrolling down the page) or on a new tab | query, total result count, media type |
| `search_clear` | User clicks the "X" button and clears the query along with all results in all tabs | |
| `tab_change` | User changes tabs (which correspond to file types) without changing the search query or clearing existing results | query, total result count, media type |
| `result_click` | User clicks on a result within a given tab. This may or may not trigger a quickview, depending on the media type. | result pageid, media type, position (one-dimensional index value), whether or not quickview will be shown |
| `quickview_hide` | User dismisses quickview using the "X" button or keyboard | |
| `quickview_more_details_click` | User clicks the "more details" button, actually navigating to the result page | pageid of result |
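
As a rough sketch of how the "pre-defined list of actions plus action-specific properties" rule could be checked on the client before an event is logged, the table above can be expressed as a lookup. The property names used here (`query`, `total_result_count`, `media_type`, `result_pageid`, `position`, `quickview_shown`) are assumptions for illustration; the actual field names are whatever the schema ends up defining.

```javascript
// Map of supported actions to the additional properties each one carries.
// Property names are hypothetical; the action names come from the table.
const ADDITIONAL_PROPERTIES = {
    search_new: [ 'query', 'total_result_count', 'media_type' ],
    search_load_more: [ 'query', 'total_result_count', 'media_type' ],
    search_clear: [],
    tab_change: [ 'query', 'total_result_count', 'media_type' ],
    result_click: [ 'result_pageid', 'media_type', 'position', 'quickview_shown' ],
    quickview_hide: [],
    quickview_more_details_click: [ 'result_pageid' ]
};

// Returns a list of human-readable problems; an empty list means the
// event has a known action and all of that action's extra properties.
function findProblems( event ) {
    if ( !Object.prototype.hasOwnProperty.call( ADDITIONAL_PROPERTIES, event.action ) ) {
        return [ 'unknown action: ' + event.action ];
    }
    const problems = [];
    for ( const name of ADDITIONAL_PROPERTIES[ event.action ] ) {
        if ( event[ name ] === undefined ) {
            problems.push( 'missing property: ' + name );
        }
    }
    return problems;
}

const okEvent = { action: 'search_new', query: 'cat', total_result_count: 120, media_type: 'image' };
const incompleteClick = { action: 'result_click', result_pageid: 42, media_type: 'image' };

console.log( findProblems( okEvent ) );
console.log( findProblems( incompleteClick ) );
console.log( findProblems( { action: 'hover' } ) );
```

A check like this only covers presence, not types or allowed values; schema validation on the receiving end would still be the authority.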
**Still to do**
- Actions representing user interactions with the Quickview audio/video player
- Actions representing user interactions with the "concept chips" elements
- Action representing user interactions with the search filter settings (this starts a new search under the hood; do we want to log it as such?)
- A property representing whether the user entered their search term through typing or through selecting an auto-complete suggestion?
- A "check-in" action that is automatically logged every X amount of time (how long?)

Some of these will need to wait until the relevant feature is ready for instrumentation; what should be considered the minimum acceptance criteria for this particular task? We can incrementally add things to the schema in the future by updating it to a new version in a backwards-compatible way.
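
To illustrate what "backwards-compatible" could mean when adding to the schema later, here is a simplified sketch that compares two JSON-schema-like objects. It is only an approximation under stated assumptions: it treats a change as compatible when every existing property keeps its type and no property becomes newly required. The Event Platform's actual compatibility rules may be stricter, and `isBackwardsCompatible` and `playback_duration_ms` are hypothetical names.

```javascript
// Simplified compatibility check between two schema versions.
function isBackwardsCompatible( oldSchema, newSchema ) {
    const oldProps = oldSchema.properties || {};
    const newProps = newSchema.properties || {};
    // Every existing property must survive with the same type.
    for ( const name of Object.keys( oldProps ) ) {
        if ( !newProps[ name ] || newProps[ name ].type !== oldProps[ name ].type ) {
            return false;
        }
    }
    // No property may become newly required.
    const oldRequired = new Set( oldSchema.required || [] );
    return ( newSchema.required || [] ).every( ( name ) => oldRequired.has( name ) );
}

const v1 = {
    properties: { action: { type: 'string' }, session_id: { type: 'string' } },
    required: [ 'action', 'session_id' ]
};

// Adds one optional property: compatible under the rules above.
const v2 = {
    properties: {
        action: { type: 'string' },
        session_id: { type: 'string' },
        playback_duration_ms: { type: 'integer' }
    },
    required: [ 'action', 'session_id' ]
};

// Drops an existing property: not compatible.
const v3 = {
    properties: { action: { type: 'string' } },
    required: [ 'action' ]
};

console.log( isBackwardsCompatible( v1, v2 ) );
console.log( isBackwardsCompatible( v1, v3 ) );
```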