Spike. Load search data into Turnilo to test whether exploratory data can do away with some of the dashboards
Closed, Resolved (Public)

Description

Looks like the search data comes from two sources:

wmf_raw_.cirrussearchrequest:

https://github.com/wikimedia/wikimedia-discovery-golden/blob/master/modules/metrics/search/cirrus_aggregates.R

TestSearchSatisfaction eventlogging schema: https://meta.wikimedia.org/wiki/Schema:TestSearchSatisfaction

The schema data is going into MySQL. The team has another schema (different name, same data) that is used to harvest data for A/B tests. That one is persisted only to Hadoop so they can send a bigger volume of events:

https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1&var-schema=SearchSatisfaction

Event Timeline

fdans triaged this task as High priority.
fdans added a project: Analytics-Kanban.
fdans moved this task from Incoming to Smart Tools for Better Data on the Analytics board.

Any news on this? Can we do something to help this move forward?

This will not move forward until Q2 (October). We'll talk about it again at that time.

@EBernhardson will work on this along with the DYM work, probably before Q2 :)

Evaluated this with respect to our "Did You Mean" suggestions, and it seems like a plausible path forward. Since this was only a spike, I will create a separate task to put together the appropriate Oozie workflows to generate the data on an ongoing basis.