
Spike: Load search data into Turnilo to test whether exploratory data analysis can replace some of the dashboards
Closed, Resolved, Public

Description

It looks like search data comes from two sources:

wmf_raw.cirrussearchrequest:

https://github.com/wikimedia/wikimedia-discovery-golden/blob/master/modules/metrics/search/cirrus_aggregates.R

The TestSearchSatisfaction EventLogging schema: https://meta.wikimedia.org/wiki/Schema:TestSearchSatisfaction

Data for this schema goes into MySQL. The team also has another schema (different name, same data) that is used to collect data for A/B tests. It is persisted only to Hadoop, so they can send a larger volume of events:

https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1&var-schema=SearchSatisfaction

Event Timeline

Nuria created this task. Feb 13 2019, 6:58 PM
Restricted Application added a subscriber: Aklapper. Feb 13 2019, 6:58 PM
fdans assigned this task to Nuria. Feb 18 2019, 4:40 PM
fdans triaged this task as High priority.
fdans added a project: Analytics-Kanban.
fdans moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
Nuria moved this task from Next Up to Paused on the Analytics-Kanban board. Apr 2 2019, 4:09 PM
Gehel added a subscriber: Gehel. Jul 10 2019, 3:10 PM

Any news on this? Can we do something to help this move forward?

Gehel added a comment. Jul 10 2019, 3:16 PM

This will not move forward until Q2 (October). We'll talk about it again at that time.

@EBernhardson will work on this along with the DYM work, probably before Q2 :)

I evaluated this against our "did you mean" suggestions, and it seems like a plausible path forward. Since this was only a spike, I will create a separate task to put together the appropriate Oozie workflows to generate the data on an ongoing basis.

debt closed this task as Resolved. Jul 30 2019, 5:42 PM