Dashboards created as part of T279105:
- Search user engagement https://superset.wikimedia.org/superset/dashboard/265/
- Search user engagement - geolocations https://superset.wikimedia.org/superset/dashboard/303/
- Search user engagement - Emerging Languages https://superset.wikimedia.org/superset/dashboard/290/
- Search Metrics https://superset.wikimedia.org/superset/dashboard/369/
These dashboards frequently time out because some charts query Hive tables imported as physical datasets, while others use Presto to build virtual datasets at chart-render time. There are potential solutions for optimizing them (e.g. SQL templating, enabled in T312134; ingesting the data into Druid; pre-aggregating some of the raw data), but each comes with a cost (time & effort, loss of flexibility).
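As a rough illustration of the pre-aggregation option, a daily rollup might look like the sketch below. This is a hypothetical example only: the table and column names (`event.searchsatisfaction`, `dwell_seconds`, `search_metrics_daily`) are illustrative, not the actual schema.

```sql
-- Hypothetical daily rollup of raw search satisfaction events.
-- Table and column names are illustrative; the real schema may differ.
CREATE TABLE search_metrics_daily AS
SELECT
  DATE(dt)                      AS day,
  wiki,
  COUNT(*)                      AS sessions,
  COUNT_IF(dwell_seconds >= 10) AS dwelled_sessions  -- 10s threshold baked in
FROM event.searchsatisfaction
GROUP BY DATE(dt), wiki
```

Charts would then read from the small rollup table instead of scanning raw events, but the trade-off is visible in the comment: the 10s threshold is baked in, so changing it means re-running the rollup.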
When these dashboards were initially created, we deliberately favored long-term flexibility (e.g. so that dwell time thresholds or the list of emerging-market countries could change without re-ingesting data), at the expense of delegating computation to chart generation time. It would be helpful to document these trade-offs so that we can work with the primary stakeholders (PdM & EM for Search Platform) to decide on next steps for optimizing these dashboards and making them more stable.
Example:
Chart/metric | Current configuration | Cost | Benefit | Alternative(s) |
---|---|---|---|---|
Dwell time | Presto query runs on raw data to generate chart on demand | Query may time out due to the volume of data | If the threshold (10s) changes, only the underlying query needs to change | (1) Pre-compute with a fixed threshold; (2) Use SQL templating & date ranges to restrict the volume of data Presto has to process |
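For alternative (2), a templated virtual dataset could be sketched roughly as below, using Superset's Jinja macros. Again, table and column names are hypothetical; the `dwell_threshold` URL parameter is an assumption for illustration.

```sql
-- Hypothetical templated virtual dataset for the dwell time chart.
SELECT
  DATE(dt) AS day,
  COUNT_IF(dwell_seconds >= {{ url_param('dwell_threshold', '10') }}) * 1.0
    / COUNT(*) AS dwell_rate
FROM event.searchsatisfaction
-- Superset injects the dashboard's selected time range here, so Presto
-- only scans the partitions the user is actually looking at:
WHERE dt >= '{{ from_dttm }}'
  AND dt <  '{{ to_dttm }}'
GROUP BY DATE(dt)
```

This keeps the threshold adjustable (via a URL parameter instead of editing the query) while bounding the data Presto scans, at the cost of requiring SQL templating to be enabled and maintained.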