
[SessionLength] Allow data consumers to interact with data related to Session Length
Open, Medium, Public

Description

Context

To ensure users can query, view, and analyze data related to Session Length, this task is scoped to delivering the data set and Superset dashboard that complete the end-to-end experience. This task assumes that the work of creating and deploying the instrument that tracks users' session data has already been completed.

Acceptance Criteria
  • As a technical data consumer, I want to be able to run SQL queries on a database so that I can analyze data related to Session Length
  • As a data consumer, I want to be able to open Superset and query session-related data so that I can compose dashboards and reports to view and better understand how session length relates to my other products
Dashboards

Sample dashboard with test dataset
Dashboard with production dataset

Event Timeline

Note for triage meeting on Tuesday, Nov 10th: Seve has asked for someone in PA to assist in this task

LGoto triaged this task as Medium priority.
LGoto moved this task from Triage to Needs Investigation on the Product-Analytics board.

@sdkim our work on this is going to be in Q3; I'm assigning it to myself for now because I need to look at workloads going into Q3

In Superset data table session_length:

  • Add new columns project_family and session_length_bucket
  • Add pre-defined metrics for quantiles
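A rough sketch of what the two additions above could compute, assuming session length is available in seconds. The bucket edges, labels, and function names are hypothetical (the task does not specify them), and the quantile here uses the simple nearest-rank method rather than whatever Superset or the underlying SQL engine applies.

```python
import math
from bisect import bisect_right

# Hypothetical bucket edges in seconds; the real boundaries are not given in this task.
BUCKET_EDGES = [60, 300, 900, 1800, 3600]
BUCKET_LABELS = ["0-1 min", "1-5 min", "5-15 min", "15-30 min", "30-60 min", "60+ min"]


def session_length_bucket(seconds: float) -> str:
    """Map a session length in seconds to a bucket label (lower edge inclusive)."""
    return BUCKET_LABELS[bisect_right(BUCKET_EDGES, seconds)]


def quantile(values: list[float], q: float) -> float:
    """Nearest-rank quantile of a non-empty list, with 0 < q <= 1."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    lengths = [12, 45, 130, 610, 2400, 4000]
    print([session_length_bucket(s) for s in lengths])
    print(quantile(lengths, 0.5))
```

In Superset itself these would more likely be a calculated column and saved metrics on the dataset; the Python version is only meant to make the intended logic concrete.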

Link to sample dashboard: https://superset.wikimedia.org/r/451


NOTES:

  • Since the data includes external links like qtm.100ke, gproxx, etc., maybe we could consider adding project_family (database group) during aggregation instead of querying it in Superset.
  • Currently, we are not sampling the data collection (rate = 1/1). If we decide to change the sampling rate, what is a reliable way to calculate the "count of sessions"? Or should we exclude this metric?

Switch the data source for dashboard to session_length_daily table.

Per discussion with Marcel

  • We will calculate the "count of sessions" metric in Superset as an estimate based on the sampling rate. The session_count metric was added with the current sampling rate of 1/100.
  • In the session_length_daily dataset, the external projects are cleaned up, so we will query in Superset to get the project_family column.
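As a minimal sketch of the "count of sessions" estimate described above: if only a fraction of sessions is sampled, the observed count can be scaled up by the inverse of the sampling rate. The function name and inputs are illustrative assumptions, not the actual Superset metric definition.

```python
from fractions import Fraction


def estimate_session_count(sampled_sessions: int, sampling_rate: Fraction) -> int:
    """Scale a sampled session count up to an estimated total.

    sampling_rate is the fraction of sessions collected, e.g. Fraction(1, 100).
    """
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    if sampled_sessions < 0:
        raise ValueError("sampled_sessions must be non-negative")
    # Dividing by the rate is the same as multiplying by its inverse.
    return round(sampled_sessions / sampling_rate)


if __name__ == "__main__":
    # Hypothetical observed count at a 1/100 sampling rate.
    print(estimate_session_count(489, Fraction(1, 100)))
```

The result is only an estimate: its relative error grows as the sampled count gets smaller, so figures for low-traffic projects should be read with more caution.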

The sample rate for the dashboard was updated to 1/10.

Link to dashboard with production dataset (wmf.session_length_daily): https://superset.wikimedia.org/r/495.

Per conversation with Marcel, we will keep the test dataset (mforns.session_length_daily) and the sample dashboard for now, and delete both after we have a couple of weeks of production data.

cchen closed this task as Resolved.

Edited Mar 15 2021, 9:29 PM

Reviewed dashboard with Kate.

NOTES:

Comparing the session count in the dashboard with the unique devices count on Wikistats 2 for March 10th:
Unique devices count for en.wiki on Wikistats is ~73M
Session count for en.wiki on the session dashboard is ~48.9M

This might be related to a delay in session tick 0 from the instrumentation.

cchen added a subscriber: mforns.

Reopening this for the optimized intermediate session length data set mentioned in T277512.

@mforns created a testing dashboard (https://superset.wikimedia.org/r/498) with the optimized intermediate table, which reduces the load time compared with the production dashboard. (Thank you Marcel !!!)

Next steps

  • Modify wmf.session_length_daily datasource in Superset with new fields and calculation after updates.
  • QA dashboard again after backfill.

Session length dashboard with optimized production data: https://superset.wikimedia.org/r/502