Develop a new schema for MediaSearch analytics or adapt an existing one
Closed, Resolved, Public

Description

As part of the work on T258183, we need to specify what exactly we will be measuring.

For now it seems like developing a new schema is the best way to move forward. I've followed the guidelines for the new Event Platform and created a basic media-search specific schema that used the Media Search Measurement Specifications document as a starting point. For now, the goal is to capture more high-level user actions as opposed to recording every click, mouse movement, etc.

Required event properties

Our schema describes the event object that is recorded whenever we log something. This object has various properties. Currently every event will include the following properties:

session_id: generated using mw.user.generateRandomSessionId when the page JS loads. Multiple searches performed in the same browser tab by the same user in a single sitting can be associated using this field. If the user does a hard-refresh of the page or opens a new tab later, they will get a new session ID.
skin: Every event will include a "skin" property with a value of "vector", "minerva", etc.
language_code: Commons only has one content language, but we would probably like to know if the user has specified an interface language. Every event will include the first item in the user's language fallback chain as a code ("de", "es", etc).
Basic event metadata (datetime and timezone info, the schema to be used, etc) is always included
action: Like the SearchSatisfaction schema, our schema will assign an "action" property to every event representing what the user was actually doing on the page. The action must belong to a pre-defined list, and additional event properties may be included based on the action. The actions we currently support are listed below.
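
As a rough illustration of the required properties listed above, the following sketch assembles the part of the event that every action shares. The helper name `buildBaseEvent` and the `env` object are hypothetical; `env` stands in for the values that would presumably come from mw.user.generateRandomSessionId, the active skin, and the user's language fallback chain, so that the sketch can run outside a browser.

```javascript
// Hypothetical helper: assembles the properties that every event carries.
// `env` is an assumed stand-in for the MediaWiki globals.
function buildBaseEvent(action, env) {
  return {
    $schema: env.schema,
    action: action,
    session_id: env.sessionId,
    skin: env.skin,
    // Only the first entry of the fallback chain is recorded.
    language_code: env.fallbackChain[0]
  };
}

// Example values only; a real page would read these from MediaWiki.
const exampleEnv = {
  schema: '/analytics/media_search/1.0.0',
  sessionId: '04cc4794393ed31788ff',
  skin: 'vector',
  fallbackChain: ['de', 'en']
};

console.log(buildBaseEvent('search_new', exampleEnv));
```

Action-specific properties would then be merged into this base object before the event is logged.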

Current draft schema action types:

Action	Description	Additional properties
`search_new`	User performs a new search for a given query	query, total result count, media type
`search_load_more`	User "continues" an existing search, either within the same tab (by scrolling down the page) or on a new tab	query, total result count, media type
`search_clear`	User clicks the "X" button and clears the query along with all results in all tabs
`tab_change`	User changes tabs (which correspond to file types) without changing the search query or clearing existing results	query, total result count, media type
`result_click`	User clicks on a result within a given tab. This may or may not trigger a quickview, depending on the media type.	result pageid, media type, position (one-dimensional index value), whether or not quickview will be shown
`quickview_hide`	User dismisses quickview using the "X" button or keyboard
`quickview_more_details_click`	User clicks the "more details" button, actually navigating to the result page	pageid of result
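
The table above can be read as a lookup from action to the extra properties an event is expected to carry. A minimal sketch of that idea follows; it is not the schema itself. The property names `query`, `total_result_count`, `media_type`, `page_id` and `position` appear elsewhere in this task, while `has_quickview` is a made-up name for "whether or not quickview will be shown", and `missingProperties` is a hypothetical helper.

```javascript
// Assumed mapping of each action to its additional properties.
// `has_quickview` is a hypothetical property name.
const ACTION_PROPERTIES = {
  search_new: ['query', 'total_result_count', 'media_type'],
  search_load_more: ['query', 'total_result_count', 'media_type'],
  search_clear: [],
  tab_change: ['query', 'total_result_count', 'media_type'],
  result_click: ['page_id', 'media_type', 'position', 'has_quickview'],
  quickview_hide: [],
  quickview_more_details_click: ['page_id']
};

// Returns the names of expected properties that the event lacks.
// Throws if the action is not in the pre-defined list.
function missingProperties(event) {
  if (!Object.prototype.hasOwnProperty.call(ACTION_PROPERTIES, event.action)) {
    throw new Error('Unknown action: ' + event.action);
  }
  return ACTION_PROPERTIES[event.action].filter(function (name) {
    return !(name in event);
  });
}

console.log(missingProperties({ action: 'result_click', page_id: 1 }));
```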

Still to do

Actions representing user interactions with the Quickview audio/video player
Action representing use of the upcoming "copy text" button in the Quickview
Actions representing user interactions with the "concept chips" elements
Action representing user interactions with the search filter settings (this starts a new search under the hood, do we want to log it as such?)
A property representing whether the user entered their search term through typing or through selecting an auto-complete suggestion?
A "check-in" action that is automatically logged every X amount of time (how long?)

Some of these will need to wait until the relevant feature is ready for instrumentation; what should be considered the minimum acceptance criteria for this particular task? We can incrementally add things to the schema in the future by updating it to a new version in a backwards-compatible way.

Details

	Subject	Repo	Branch	Lines +/-
	Adds schema for Special:MediaSearch page	schemas/event/secondary	master	+454 -10
	Basic event schema for Special:MediaSearch analytics	schemas/event/secondary	master	+450 -1


Related Objects

Status	Assigned	Task
Resolved	nettrom_WMF	T258229 Build dashboards for search activity on MediaSearch on Commons
Resolved	egardner	T258183 [L] Instrument MediaSearch results page
Resolved	CBogen	T263875 Develop a new schema for MediaSearch analytics or adapt an existing one

Event Timeline

egardner created this task. Sep 25 2020, 6:34 PM

Don't forget tracking our upcoming Copy buttons (to copy the filename or wikitext), which I assume will nest under "Quickview interactions" 🙂

egardner updated the task description. Sep 25 2020, 8:17 PM

I'd like to know if I'm thinking about these "actions" in too high-level of a way: is it better to log things at a very low level in terms of clicks, mouse movements, etc. to provide as much data as possible – especially when things go wrong? Or is this just going to kill our signal-to-noise ratio in the data?

From my experience building and working with SearchSatisfaction, were I to do it again I would probably try to keep the logged events as close to the things I want to measure as possible. The more complex the backend analysis we had to do, the less confident we were in the final results being correct. There is of course the possibility to extract unexpected information from granular data, but I think the confidence and reduced complexity given by logging events that map as directly as possible to the metrics you want to measure outweighs the additional possibilities.

I suppose the question that must be answered before we can go further with the events is, what do you want to measure? What metrics are the output of processing these events? You can look through prior search reports to get an idea of the metrics we've used in the past.

+1 to @EBernhardson 's comment

I think the core question is what impact you are after. In what way are the new media search features adding value? From the ticket we can see there is some new functionality, but I am not clear on the value provided to readers.
Once the value (or the impact we want to make) is defined, we would want to come up with metrics (which might be proxies) for this added value.

As a general rule (and I think you alluded to this with your signal-to-noise ratio comment earlier), measuring everything and later trying to find insights from a ton of data points is neither easy nor advisable.

We want to measure the following:

when a user clicks to go to QuickView (this will let us know how often they're clicking through to a result, to give a sense of whether they're finding what they're looking for)
when a user clicks through from QuickView to a detailed view of an image, audio, or video file (same as the above)
when a user clicks through from the results page to a file/page without quickview (ie a category or a PDF file)
search session length (how long does it take to find what they're looking for?)
number of searches in a search session (same as above)
total number of search sessions (how often is search being used?)
when a user clicks on a concept chip (how often are concept chips being used?)
when a user copies a filename or wikitext snippet (how people are using the functionality to copy the filename or wikitext can give us a signal as to how often they're reusing the images in other wikimedia projects)
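
To make the session-level measurements above concrete, here is a minimal sketch of how logged events might be rolled up per session. It assumes each event carries `session_id`, `action` and an ISO 8601 `client_dt` timestamp; `summarizeSessions` is a hypothetical name, and treating the span between the first and last event as "session length" is an assumption, not an agreed definition.

```javascript
// Groups events by session_id and derives per-session counts.
// Session length is taken as last event time minus first event time.
function summarizeSessions(events) {
  const sessions = new Map();
  for (const event of events) {
    const time = Date.parse(event.client_dt);
    let stats = sessions.get(event.session_id);
    if (!stats) {
      stats = { searches: 0, resultClicks: 0, first: time, last: time };
      sessions.set(event.session_id, stats);
    }
    if (event.action === 'search_new') stats.searches += 1;
    if (event.action === 'result_click') stats.resultClicks += 1;
    stats.first = Math.min(stats.first, time);
    stats.last = Math.max(stats.last, time);
  }
  const summary = [];
  for (const [sessionId, stats] of sessions) {
    summary.push({
      session_id: sessionId,
      searches: stats.searches,
      result_clicks: stats.resultClicks,
      length_seconds: (stats.last - stats.first) / 1000
    });
  }
  return summary;
}

// Made-up sample data for illustration.
const sampleEvents = [
  { session_id: 'a', action: 'search_new', client_dt: '2020-10-05T20:38:33.000Z' },
  { session_id: 'a', action: 'result_click', client_dt: '2020-10-05T20:38:43.000Z' },
  { session_id: 'b', action: 'search_new', client_dt: '2020-10-05T21:00:00.000Z' }
];

console.log(summarizeSessions(sampleEvents));
```

The total number of search sessions is then simply the length of the returned array.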

Thanks for the feedback, this is very helpful and confirms my suspicion that targeting higher-level user actions will be better (and simpler) for now.

I'm thinking that we want to define event objects that always have a session_id and an action property (in addition to the basic event metadata like $schema and date/time, etc).

The action field will be limited to these higher-level things for the moment (the contents of the table above, more or less), and other properties will depend on the type of action we're dealing with. This is similar to how SearchSatisfaction works but our actions will be somewhat different and I won't worry about lower-level things like clicks on non-result elements, scroll events, etc.

I've started drafting a schema along these lines and will link the patch back to this ticket once it's ready (early next week most likely).

Change 630630 had a related patch set uploaded (by Eric Gardner; owner: Eric Gardner):
[schemas/event/secondary@master] Basic event schema for Special:MediaSearch analytics

https://gerrit.wikimedia.org/r/630630

gerritbot added a project: Patch-For-Review. Sep 28 2020, 3:50 PM

I just published a patch that adds a basic MediaSearch schema along these lines to the Secondary repository. I'll use this schema for local development and testing of analytics instrumentation, but please let me know if I've made any mistakes either in designing the schema itself, or in materializing it with the jsonschema-tools command-line tool.

CBogen moved this task from Incoming to Doing on the Structured-Data-Backlog (Current Work) board. Sep 28 2020, 4:02 PM

LGoto moved this task from Triage to Tracking on the Product-Analytics board. Sep 29 2020, 5:10 PM

One thing currently missing from the draft schema is a notion of "language" – I assume we will want to be able to segment and query data about search sessions by language at some point.

Since this schema is for use on Commons, I assume that we care about interface language as opposed to content language (the latter is just "English" for all pages on commons as far as I understand).

If we want to track the interface language that visitors to Special:MediaSearch are using, should we record the value returned from mw.language.getFallbackLanguageChain()? That would provide an array of language codes like [ "de", "en" ], etc. Would it be sufficient to just store the first code, or should we store all of them?

Is there a better way to store the user's interface language for the purposes of measuring usage?

If we want to track the interface language that visitors to Special:MediaSearch are using, should we record the value returned from mw.language.getFallbackLanguageChain()? That would provide an array of language codes like [ "de", "en" ], etc. Would it be sufficient to just store the first code, or should we store all of them?

Two thoughts:

I think the interface language for all logged-out users will be English (did a quick test and accept-language doesn't seem to have any effect. There could certainly be other mechanisms (geo?) but I'm not familiar). I haven't looked too closely at the breakdowns for media, but in content search anonymous users are the majority. We still might be able to derive some information from the logged-in users, but we must make sure we exclude logged-out users during analysis.

afaik the fallback chains are globally constant, meaning all users with their interface in de will have fallbacks of de, en. The fallbacks are explicitly about what languages we estimate the user might be able to speak, and are used to select appropriate i18n messages when they aren't available in the primary language. Mostly, i don't think there will be much value in logging the full chain vs logging the first element.

Two thoughts:

I think the interface language for all logged-out users will be English (did a quick test and accept-language doesn't seem to have any effect). I haven't looked too closely at the breakdowns for media, but in content search anonymous users are the majority. We still might be able to derive some information from the logged-in users, but we must make sure we exclude logged-out users during analysis.

It looks like logged-out users who specify an interface language with ?uselang= url params will have that set as their interface language. Commons also uses a non-standard language-switching tool (there is a dropdown in the left sidebar instead of the ULS widget at the top that most other wikis have). I'm not sure if there are other ways that users switch languages for Commons pages.

afaik the fallback chains are globally constant, meaning all users with their interface in de will have fallbacks of de, en. The fallbacks are explicitly about what languages we estimate the user might be able to speak, and are used to select appropriate i18n messages when they aren't available in the primary language. Mostly, i don't think there will be much value in logging the full chain vs logging the first element.

This makes sense. In this case maybe the first value of the language fallback chain should be recorded as a property for every action that we log (stored as a string language code).

egardner updated the task description. Sep 30 2020, 4:10 PM

Restricted Application added a subscriber: Masumrezarock100. Sep 30 2020, 4:10 PM

egardner updated the task description. Sep 30 2020, 4:14 PM

egardner moved this task from Doing to Code Review on the Structured-Data-Backlog (Current Work) board. Sep 30 2020, 6:34 PM

This is awesome work so far! I've read through this task, its parent task, and the proposed patch and updated the measurement specification to reflect the set of questions mentioned by @CBogen in T263875#6495409. From what I can tell, the proposed schema allows us to answer our current set of questions.

Digging into things, I came out with a few thoughts and questions. Some of this concerns features and instrumentation further down the road; with those, I'm mainly focused on having a schema that can incorporate them without issues if we find it necessary to instrument them.

The schema is using some concepts that fit into the proposed schema fragments (e.g. session identifiers, sequences, skins). I'm unsure what the state of fragments is, meaning I'm unsure whether it's something the proposed schema should incorporate or not. Maybe @mpopov has recommendations here?
Is there a plan to bring MediaSearch to other wikis in the future, or will it be a Commons-specific media search interface? If it becomes the way to search media on wikis, will search results be a combination of wiki-specific and Commons results? I'm asking because it affects what we think of as a "search result", and whether we need to be prepared to store wiki & page ID or not.
The measurement specification points to all interactions starting with a search being made, and not just by visiting the Special:MediaSearch page. At the moment I don't have any need to change that, but I wanted to flag it in case there's something we've missed.
What's logged if the user visits Special:MediaSearch then clicks on the "Audio" tab? Is that an interaction we need to capture or is it sufficient to capture that as a new search with that media type?
What do we log if the user clicks to change any of the result preferences (e.g. image size, file type)? Do those result in new searches, and do we in that case need to log what their settings were? From the perspective of the measurement specification I think these are just new searches, and it ignores the settings. But, I'm also expecting future questions about whether users modify those settings.
Do we need to distinguish between searches made on the mobile/desktop site? This is mentioned in the measurement specification as an open question, and given how often that kind of split comes up in analysis it's one I'm expecting to see again. I've worked with many schemas that have it as a specific field to make querying easier, but I'm a little unsure how this should work in the Event Platform era. I'm not against extracting this from e.g. a hostname field, but we're working on establishing new practices and having a neater way to do it would be welcome. Maybe @Ottomata has a quick answer to this?

Sorry for having a lot of questions! :)

Thanks @nettrom_WMF – I'll answer what I can below, but some of these questions are things I would like to know more about as well.

The schema is using some concepts that fit into the proposed schema fragments (e.g. session identifiers, sequences, skins). I'm unsure what the state of fragments is, meaning I'm unsure whether it's something the proposed schema should incorporate or not. Maybe @mpopov has recommendations here?

If there is a better/standard way to capture some of these things I'm happy to re-work the schema (but specific guidance would be helpful).

Is there a plan to bring MediaSearch to other wikis in the future, or will it be a Commons-specific media search interface? If it becomes the way to search media on wikis, will search results be a combination of wiki-specific and Commons results? I'm asking because it affects what we think of as a "search result", and whether we need to be prepared to store wiki & page ID or not.

@Ramsey-WMF or @CBogen might know more about this.

What's logged if the user visits Special:MediaSearch then clicks on the "Audio" tab? Is that an interaction we need to capture or is it sufficient to capture that as a new search with that media type?

Right now, if a user lands on the search page without entering a query, and then changes tabs to "Audio", an event will be logged that looks like this: action: tab_change; media_type: audio, query: '', total_result_count: 0. No search will actually begin until they submit a query.

What do we log if the user clicks to change any of the result preferences (e.g. image size, file type)? Do those result in new searches, and do we in that case need to log what their settings were? From the perspective of the measurement specification I think these are just new searches, and it ignores the settings. But, I'm also expecting future questions about whether users modify those settings.

Filter changes do result in new searches (that's all that will get logged currently). I agree that we need to keep track of this somehow. I think there are two ways: either we include information about currently active filters in every search event we log (similar to how media type is always included now), or we log an explicit filter_change event that includes whatever the updated settings are. We could also do both of these things.

Do we need to distinguish between searches made on the mobile/desktop site? This is mentioned in the measurement specification as an open question, and given how often that kind of split comes up in analysis it's one I'm expecting to see again. I've worked with many schemas that have it as a specific field to make querying easier, but I'm a little unsure how this should work in the Event Platform era. I'm not against extracting this from e.g. a hostname field, but we're working on establishing new practices and having a neater way to do it would be welcome. Maybe @Ottomata has a quick answer to this?

Currently every event includes a skin property, but I'm open to other ways of doing this.

Is there a plan to bring MediaSearch to other wikis in the future, or will it be a Commons-specific media search interface? If it becomes the way to search media on wikis, will search results be a combination of wiki-specific and Commons results? I'm asking because it affects what we think of as a "search result", and whether we need to be prepared to store wiki & page ID or not.

We haven't decided for sure to do this, but there's definitely potential that we might - so if that changes how we construct the schema, we should keep that in mind.

The measurement specification points to all interactions starting with a search being made, and not just by visiting the Special:MediaSearch page. At the moment I don't have any need to change that, but I wanted to flag it in case there's something we've missed.

We log pageviews in other ways, right? I can't think of any other reason why we'd want to log just visiting the page other than to count pageviews.

What do we log if the user clicks to change any of the result preferences (e.g. image size, file type)? Do those result in new searches, and do we in that case need to log what their settings were? From the perspective of the measurement specification I think these are just new searches, and it ignores the settings. But, I'm also expecting future questions about whether users modify those settings.

Filter changes do result in new searches (that's all that will get logged currently). I agree that we need to keep track of this somehow. I think there are two ways: either we include information about currently active filters in every search event we log (similar to how media type is always included now), or we log an explicit filter_change event that includes whatever the updated settings are. We could also do both of these things.

I think it would be great to log an explicit filter_change event that makes it simpler to know how often filters are being used without having to examine searches - but that's my layman's understanding of how analysis works. I have no idea if that's as helpful as it seems to me.

Do we need to distinguish between searches made on the mobile/desktop site?

Yes, I'd say it would be very helpful to do so, but from Eric's response it sounds like we already are doing this.

Regarding filter-change events, here's how I'm approaching this for now:

When the user changes the settings of a given filter within a given tab, a filter_change event will be logged. This event will include filter_type and filter_value properties to represent whatever the new filter settings are. Filters can only be changed one at a time currently, and each change logs a separate event. The previous results are discarded and new ones are retrieved; however, since the actual search term has not changed, the subsequent search request is logged as search_load_more rather than search_new.

{"action":"filter_change","media_type":"bitmap","filter_type":"mimeType","filter_value":"tiff","$schema":"/analytics/media_search/1.0.0","session_id":"04cc4794393ed31788ff","language_code":"en","skin":"vector","meta":{"stream":"analytics.media_search","dt":"2020-10-05T20:38:34.657Z"},"client_dt":"2020-10-05T20:38:33.578Z"}

When the user resets a given filter to its initial value (say going from "File type: JPEG" back to "all file types"), the same filter_change event is logged. filter_type works the same as previously but the filter_value property gets a special value of UNSET to indicate that the filter was reset. Otherwise un-setting a filter works the same as setting one.

{"action":"filter_change","media_type":"bitmap","filter_type":"mimeType","filter_value":"UNSET","$schema":"/analytics/media_search/1.0.0","session_id":"04cc4794393ed31788ff","language_code":"en","skin":"vector","meta":{"stream":"analytics.media_search","dt":"2020-10-05T20:42:19.941Z"},"client_dt":"2020-10-05T20:42:18.930Z"}

Thoughts on this? Is there a better way to represent un-setting a filter than just hard-coding a specially-designated string?

In T263875#6510065, @egardner wrote:

If there is a better/standard way to capture some of these things I'm happy to re-work the schema (but specific guidance would be helpful).

As mentioned there's the idea of having schema fragments available to reference to standardize this. The schema guidelines mentions some of these in the frequently used fields section. There are also analytics-related fragments, which as far as I know is work-in-progress. I'll be meeting with @mpopov to learn more about where the latter is at and whether it's something we should be using.

What's logged if the user visits Special:MediaSearch then clicks on the "Audio" tab? Is that an interaction we need to capture or is it sufficient to capture that as a new search with that media type?

Right now, if a user lands on the search page without entering a query, and then changes tabs to "Audio", an event will be logged that looks like this: action: tab_change; media_type: audio, query: '', total_result_count: 0. No search will actually begin until they submit a query.

That's perfectly reasonable to me!

Do we need to distinguish between searches made on the mobile/desktop site? […]

Currently every event includes a skin property, but I'm open to other ways of doing this.

Because the Event Platform is new I don't think there's any standardized way of doing it, so I'm partly trying to establish whether we should keep old standards or make new ones. In the Growth team's EventLogging schemas we used a boolean field for it (is_mobile, see for example Schema:HomepageVisit). This makes aggregations split by desktop/mobile straightforward to do, but possibly at a small cost of duplicating information. It's also easier for anyone querying the data to understand than having to map skins to platforms. Since it sounds like both @CBogen and I expect this to be useful, I'll be selfish and advocate for having is_mobile or something similar.
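
For comparison, this is roughly what analysts would have to do if only the skin were logged: map skins to platforms before aggregating. The sketch assumes that "minerva" is the only mobile skin, which may not hold everywhere; `isMobileSkin` and `countByPlatform` are hypothetical names.

```javascript
// Assumption: Minerva is treated as the only mobile skin.
const MOBILE_SKINS = new Set(['minerva']);

function isMobileSkin(skin) {
  return MOBILE_SKINS.has(skin);
}

// Counts events per platform using the skin-to-platform mapping.
function countByPlatform(events) {
  const counts = { desktop: 0, mobile: 0 };
  for (const event of events) {
    counts[isMobileSkin(event.skin) ? 'mobile' : 'desktop'] += 1;
  }
  return counts;
}

console.log(countByPlatform([
  { skin: 'vector' },
  { skin: 'minerva' },
  { skin: 'vector' }
]));
```

A dedicated is_mobile field would make this mapping step unnecessary at query time.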

In T263875#6512665, @CBogen wrote:

The measurement specification points to all interactions starting with a search being made, and not just by visiting the Special:MediaSearch page. At the moment I don't have any need to change that, but I wanted to flag it in case there's something we've missed.

We log pageviews in other ways, right? I can't think of any other reason why we'd want to log just visiting the page other than to count pageviews.

Yes, pageviews are also logged through the webrequest log. One difference between these is that event logging might be blocked by an ad blocker, or the user might not have JavaScript support, while it still counts as a pageview. That makes calculating proportions difficult, however I don't see that we have any questions around measuring raw visits (they're instead about searches themselves). I've updated the measurement specification to not list "visiting the MediaSearch page" as an event in itself so this doesn't lead to confusion.

In T263875#6519271, @egardner wrote:

Regarding filter-change events, here's how I'm approaching this for now:

When the user changes the settings of a given filter within a given tab, a filter_change event will be logged. This event will include filter_type and filter_value properties to represent whatever the new filter settings are. Filters can only be changed one at a time currently, and each change logs a separate event. The previous results are discarded and new ones are retrieved; however, since the actual search term has not changed, the subsequent search request is logged as search_load_more rather than search_new.

This makes sense to me and should work great. I tested on MediaSearch and noticed that as soon as I change a filter it resets and searches again, so having this reflected in the instrumentation (a filter_change event followed by a search_load_more) works well.

When the user resets a given filter to its initial value (say going from "File type: JPEG" back to "all file types"), the same filter_change event is logged. filter_type works the same as previously but the filter_value property gets a special value of UNSET to indicate that the filter was reset. Otherwise un-setting a filter works the same as setting one.

Thoughts on this? Is there a better way to represent un-setting a filter than just hard-coding a specially-designated string?

Hmm, I could see having a special value like "UNSET" making sense, or it could be an empty string. Let me poll the Product Analytics team to see if there's a convention for this and report back.

Is there a plan to bring MediaSearch to other wikis in the future, or will it be a Commons-specific media search interface? If it becomes the way to search media on wikis, will search results be a combination of wiki-specific and Commons results? I'm asking because it affects what we think of as a "search result", and whether we need to be prepared to store wiki & page ID or not.

The short answer: we don't know yet. It has been discussed, but we don't have enough data to justify moving in that direction for sure. It's a possibility if we see a lot of success on Commons and/or VisualEditor image searches, which are the only places at the moment we can be 100% sure will get MediaSearch.

Is there a better way to represent un-setting a filter than just hard-coding a specially-designated string?

Hmm, I could see having a special value like "UNSET" making sense, or it could be an empty string. Let me poll the Product Analytics team to see if there's a convention for this and report back.

Quick update here: I've updated the schema to include a prior_state property that will be used in filter change events, per the suggestion here: https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#Modeling_state_changes (thanks @Ottomata).

That will yield data like this when the user changes from the default setting in a given filter:

{"action":"filter_change","media_type":"bitmap","filter_type":"mimeType","filter_value":"tiff","prior_state":{"filter_type":"mimeType","filter_value":""},"$schema":"/analytics/media_search/1.0.0","session_id":"94dbf88478655892ace6","language_code":"en","skin":"vector","meta":{"stream":"analytics.media_search","id":"8ad612b6-9256-4b5b-b2c7-d177d5b8befa","dt":"2020-10-14T22:23:30.557Z","request_id":"e378b3c0-0e6b-11eb-9236-1540161f5229"},"client_dt":"2020-10-14T22:23:29.547Z","http":{"client_ip":"127.0.0.1","request_headers":{"user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:81.0) Gecko/20100101 Firefox/81.0"}}}

And like this when the same filter gets re-set to the initial value:

{"action":"filter_change","media_type":"bitmap","filter_type":"mimeType","filter_value":"tiff","prior_state":{"filter_type":"mimeType","filter_value":""},"$schema":"/analytics/media_search/1.0.0","session_id":"94dbf88478655892ace6","language_code":"en","skin":"vector","meta":{"stream":"analytics.media_search","id":"8ad612b6-9256-4b5b-b2c7-d177d5b8befa","dt":"2020-10-14T22:23:30.557Z","request_id":"e378b3c0-0e6b-11eb-9236-1540161f5229"},"client_dt":"2020-10-14T22:23:29.547Z","http":{"client_ip":"127.0.0.1","request_headers":{"user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:81.0) Gecko/20100101 Firefox/81.0"}}}

An empty string for filter_value represents the default value in both places.
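
A small sketch of the prior_state idea, showing only the action-specific part of the event. It assumes that a reset event carries the empty string as filter_value and the previously selected value inside prior_state; `buildFilterChangeEvent` is a hypothetical helper, not the code in the patch.

```javascript
// The empty string represents the default (unset) filter value.
const DEFAULT_FILTER_VALUE = '';

// Hypothetical helper: builds the action-specific part of a
// filter_change event, recording both the new and the prior value.
function buildFilterChangeEvent(mediaType, filterType, newValue, priorValue) {
  return {
    action: 'filter_change',
    media_type: mediaType,
    filter_type: filterType,
    filter_value: newValue,
    prior_state: {
      filter_type: filterType,
      filter_value: priorValue
    }
  };
}

// Changing away from the default setting.
const setEvent = buildFilterChangeEvent('bitmap', 'mimeType', 'tiff', DEFAULT_FILTER_VALUE);
// Resetting to the default setting (assumed shape).
const resetEvent = buildFilterChangeEvent('bitmap', 'mimeType', DEFAULT_FILTER_VALUE, 'tiff');

console.log(JSON.stringify(setEvent));
console.log(JSON.stringify(resetEvent));
```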

• razzi edited projects, added Analytics-Radar; removed Analytics. Oct 15 2020, 3:52 PM

@egardner: Thanks for the updates and work so far. Thanks also for your patience while I work on getting feedback to you on this. I met with @mpopov last week and discussed a lot of things around this schema and should've relayed information to you sooner, sorry!

For session_id, are we reusing the same definition of it and timeout for it as SearchSatisfaction does? The SearchSatisfaction schema mentions it has a 10 minute timeout. If we are, then we should change the name of the field to search_session_id (or mediasearch_session_id) to distinguish it from other session identifiers (e.g. the app or web sessions in the Analytics fragments).

For skin, renaming it to mw_skin enables us to distinguish it from other skins such as the app skin. Also, grab the documentation for this field from SearchSatisfaction, having the reference to the code call is useful.

When it comes to the language code there was some good discussion above of what that means. The Event Platform makes it easy to have more verbose documentation of what fields are (i.e. paragraphs), so it would be great to have that!

A couple of other schemas that have good documentation are session_tick and ios_edit_history_compare.

One thing to note is that until the schema is deployed, making large and significant changes to it is useful. Mikhail mentioned this patch of the session_tick schema as an example of that happening. Once it's deployed and events have started streaming in, we're limited to only making backwards-compatible changes.

A couple of open questions:

Are we looking to join the MediaSearch schema with anything else? As far as I know, the answer is "no", but someone should let me know if I've missed something.
I'm concerned about action-specific values being added as fields. For example, we have query and total_result_count, which I think only applies to action = "search" events, and page_id and position for action = "result_click" events. This leads to sparse data and potentially a lot of fields/columns (the EditAttemptStep schema is the classic example of how not to do this). I think we should consider moving to the complex map value that's described here, meaning we'd have action and action_data (or action_parameters, I don't have strong opinions on its naming) with the latter as the map type. That gives us more flexibility in adding values/parameters as we add instrumentation for features. Thoughts?
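
To illustrate the map-based alternative, the sketch below moves every action-specific field of a flat event into a single action_data map. The list of common fields and the helper name `toActionDataEvent` are assumptions, and values are converted to strings on the assumption that the map would hold a single value type.

```javascript
// Fields assumed to be shared by every event; everything else is
// treated as action-specific.
const COMMON_FIELDS = [
  '$schema', 'meta', 'client_dt', 'action',
  'session_id', 'skin', 'language_code'
];

// Hypothetical helper: converts a flat event into the
// { action, action_data } shape.
function toActionDataEvent(flatEvent) {
  const event = { action_data: {} };
  for (const [key, value] of Object.entries(flatEvent)) {
    if (COMMON_FIELDS.includes(key)) {
      event[key] = value;
    } else {
      // Stringify so that all map values share one type (assumption).
      event.action_data[key] = String(value);
    }
  }
  return event;
}

console.log(toActionDataEvent({
  action: 'result_click',
  session_id: '04cc4794393ed31788ff',
  page_id: 123,
  position: 4
}));
```

With this shape, instrumenting a new feature adds keys to the map rather than columns to the table.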

Also, I think storing previous and current state of the filters is a great way to do it! Perhaps particularly if we switch to a map type for storing additional action parameters/values. The only other alternative I was going to suggest was having a combination of value and is_default fields (similar to how PrefUpdate does it), where is_default is true if the value is set back to whatever the default is, and false otherwise. Looking at it again, I think storing the previous and current state is a better option.
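For comparison, the two options for recording a filter change could look roughly like this. The filter names, default values, "filter_change" action name, and helper functions are all made up for the example, and serialising the states as JSON strings is just one assumed way of fitting them into a string-valued map.

```typescript
// Hypothetical filter state and defaults, for illustration only.
type FilterState = Record<string, string>;

const DEFAULT_FILTERS: FilterState = { sort: "relevance", size: "any" };

// Option 1: store the previous and current state of all filters.
function asStateChange(previous: FilterState, current: FilterState) {
  return {
    action: "filter_change", // assumed action name
    action_data: {
      previous: JSON.stringify(previous),
      current: JSON.stringify(current),
    },
  };
}

// Option 2: store only the new value of one filter plus an is_default flag,
// where is_default is true when the value was set back to the default.
function asValueAndDefault(filter: string, current: FilterState) {
  return {
    action: "filter_change", // assumed action name
    filter,
    value: current[filter],
    is_default: current[filter] === DEFAULT_FILTERS[filter],
  };
}

const before: FilterState = { sort: "relevance", size: "any" };
const after: FilterState = { sort: "recency", size: "any" };

const stateEvent = asStateChange(before, after);
const flagEvent = asValueAndDefault("sort", after);
```

The first form lets an analyst see exactly which transition happened (including what the user moved away from), while the second only says where they ended up and whether that is the default.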

CBogen moved this task from MediaSearch-Beta to MediaSearch-ReleaseCandidate on the SDAW-MediaSearch board. Oct 23 2020, 6:46 PM

CBogen edited projects, added SDAW-MediaSearch (MediaSearch-ReleaseCandidate); removed SDAW-MediaSearch (MediaSearch-Beta).

egardner moved this task from Code Review to Blocked on the Structured-Data-Backlog (Current Work) board. Oct 28 2020, 4:41 PM

PI is taking on designing and providing this schema for @egardner to instrument on top of

• sdkim added a project: Product-Data-Infrastructure. Nov 19 2020, 8:44 PM

• sdkim moved this task from Inbox to Next on the Product-Data-Infrastructure board.

Change 646846 had a related patch set uploaded (by Eric Gardner; owner: Eric Gardner):
[schemas/event/secondary@master] Add basic schema for Special:MediaSearch analytics

https://gerrit.wikimedia.org/r/646846

Change 630630 abandoned by Eric Gardner:
[schemas/event/secondary@master] Basic event schema for Special:MediaSearch analytics

Reason:
Abandoning in favor of this patch: https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/646846

https://gerrit.wikimedia.org/r/630630

Just a quick update on this in case anyone is wondering.

@jlinehan and I met at the beginning of this week to talk through this schema as well as the instrumentation patch that depends on it (which introduces a simple Vue plugin that acts as a wrapper around mw.eventLog).

It sounds like some small updates may be needed here (using v2.0 of the common fragment definition, potentially changing the way that we keep track of things like session IDs and date/time stamps), but my understanding is that most of the work is done. I'm hoping that we can finalize both this patch as well as the instrumentation patch (which should similarly only need some minor adjustments) in order to get everything on the final train that goes out next week.

Change 646846 merged by Mholloway:
[schemas/event/secondary@master] Adds schema for Special:MediaSearch page

https://gerrit.wikimedia.org/r/646846

egardner moved this task from Blocked to Verify on Production on the Structured-Data-Backlog (Current Work) board. Dec 12 2020, 12:35 AM

• jlinehan added a project: Better Use Of Data. Feb 8 2021, 4:38 PM

• jlinehan moved this task from Inbox to Sign-off on the Better Use Of Data board. Feb 8 2021, 6:40 PM

@CBogen can you verify with your team that this is done and can be closed? If so, please change the status to resolved. Thank you!

In T263875#6819431, @kzimmerman wrote:

@CBogen can you verify with your team that this is done and can be closed? If so, please change the status to resolved. Thank you!

@nettrom_WMF has confirmed that the data is coming in, but I can't see it because I don't have access. Morten was going to file a ticket to get me that access - once it comes through and I can confirm, I will close the ticket.

Thanks @CBogen !

Confirmed that I can see the data now. Thanks all!
