
Document search dashboard data & chart trade-offs
Status: Closed, Resolved (Public)

Description

Dashboards created as part of T279105:

Those dashboards frequently time out because some charts use Hive tables imported as physical datasets and others use Presto queries to build virtual datasets. There are several potential ways to optimize them (e.g. SQL templating, enabled in T312134; ingesting the data into Druid; pre-aggregating some of the raw data), but each comes at a cost: time and effort, or loss of flexibility.

When those dashboards were first created, the decision was made to keep them flexible in the long term (e.g. if dwell time thresholds change, or if the list of emerging-market countries changes), at the expense of deferring computation to chart generation time. It would help to document these trade-offs so that we can work with the primary stakeholders (PdM and EM for Search Platform) to decide the next steps for optimizing these dashboards and making them more stable.

Example:

| Chart/metric | Current configuration | Cost | Benefit | Alternative(s) |
| --- | --- | --- | --- | --- |
| Dwell time | Presto query runs on raw data to generate the chart on demand | Query may time out due to the volume of data | If the threshold (10s) changes, we can just modify the underlying query | (1) Pre-compute with a set threshold; (2) use SQL templating and date ranges to restrict the volume of data Presto has to process |
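To make the dwell-time trade-off concrete, here is a minimal Python sketch. It is an illustration only, not the actual Presto/Superset implementation. It assumes a hypothetical raw event shape (`Click`, one row per search-result click carrying a dwell time in seconds) and a hypothetical metric (the daily share of clicks whose dwell time is at least the threshold). `dwell_rate_on_demand` mirrors the current configuration combined with alternative (2): the threshold is a parameter, and only events inside the requested date range are counted. `precompute_daily` mirrors alternative (1): counts are aggregated once at a fixed 10s threshold, so the chart reads only one row per day, but changing the threshold means re-running the aggregation over all history.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Click:
    # Hypothetical raw event: one search-result click and its dwell time.
    day: date
    dwell_seconds: float


def _daily_counts(clicks, threshold_s, start=None, end=None):
    """Return {day: (clicks_at_or_above_threshold, total_clicks)}.

    If start/end are given, events outside [start, end] are skipped. This is
    the in-memory analogue of a date predicate limiting what Presto processes.
    """
    totals = defaultdict(int)
    long_dwell = defaultdict(int)
    for c in clicks:
        if start is not None and c.day < start:
            continue
        if end is not None and c.day > end:
            continue
        totals[c.day] += 1
        if c.dwell_seconds >= threshold_s:
            long_dwell[c.day] += 1
    return {d: (long_dwell[d], totals[d]) for d in sorted(totals)}


def dwell_rate_on_demand(clicks, threshold_s, start, end):
    # Current configuration / alternative (2): the threshold is a query
    # parameter and raw events are processed at chart time, limited to dates.
    counts = _daily_counts(clicks, threshold_s, start, end)
    return {d: n / t for d, (n, t) in counts.items()}


def precompute_daily(clicks, threshold_s=10.0):
    # Alternative (1): a scheduled job aggregates all history once with a
    # fixed threshold; a new threshold means re-running this job.
    return _daily_counts(clicks, threshold_s)


def dwell_rate_from_precomputed(daily, start, end):
    # The chart only reads one small aggregated row per day.
    return {d: n / t for d, (n, t) in daily.items() if start <= d <= end}


if __name__ == "__main__":
    clicks = [
        Click(date(2023, 1, 1), 4.0),
        Click(date(2023, 1, 1), 12.0),
        Click(date(2023, 1, 2), 30.0),
        Click(date(2023, 1, 3), 9.5),
    ]
    start, end = date(2023, 1, 1), date(2023, 1, 3)
    print(dwell_rate_on_demand(clicks, 10.0, start, end))
    print(dwell_rate_from_precomputed(precompute_daily(clicks), start, end))
```

At a 10s threshold both paths produce the same numbers. The difference lies in where the pass over raw events happens (at chart time or in a scheduled job) and whether the threshold can change without recomputing history.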

Event Timeline

mpopov triaged this task as Medium priority. Jan 30 2023, 4:01 PM
mpopov updated the task description.

| Dashboards and metrics | Current configuration | Cost | Benefit | Alternative(s) |
| --- | --- | --- | --- | --- |
| Dwell-time-related metrics on the search engagement dashboards | Presto query runs on pre-aggregated data to generate the chart on demand; the query uses 10s as the dwell-time threshold | Query may time out due to the volume of data | If the threshold (10s) changes, we can just modify the underlying query | (1) Pre-compute with a set threshold; (2) use SQL templating and date ranges to restrict the volume of data Presto has to process |
| Geolocation and emerging-market search engagement dashboards | Presto query runs on raw data to generate the chart on demand; the query joins to a canonical table of countries and markets | Query may time out due to the volume of data | If the list of emerging markets changes, we can just modify the underlying canonical table | (1) Remove the join and show all countries on the dashboard; (2) use SQL templating and date ranges to restrict the volume of data Presto has to process |
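The following is a similar Python sketch for the geolocation and emerging-market charts. It rests on explicit assumptions: events are hypothetical `(day, country_code)` pairs, the canonical table is modelled as a plain dict from country code to market label, and the codes and labels are placeholders, not the real emerging-markets list. `searches_by_market` performs the join at query time, so editing the mapping changes the chart immediately without touching the raw data. `searches_by_country` is alternative (1): it drops the join and reports every country. Both keep only events in a date range, as in alternative (2).

```python
from collections import Counter
from datetime import date

# Hypothetical canonical table: country code -> market label.
# Codes and labels are placeholders, not the real emerging-markets list.
COUNTRY_TO_MARKET_V1 = {"AA": "emerging", "BB": "emerging"}
# The same table after a hypothetical edit that drops "BB" from the list.
COUNTRY_TO_MARKET_V2 = {"AA": "emerging"}


def _in_range(events, start, end):
    # Alternative (2): keep only events inside [start, end]. In the real
    # query, a date predicate limits how much data Presto has to process.
    return (country for day, country in events if start <= day <= end)


def searches_by_market(events, country_to_market, start, end):
    # Current configuration: join events to the canonical table at query
    # time; countries missing from the table fall into "other".
    markets = (country_to_market.get(c, "other") for c in _in_range(events, start, end))
    return dict(Counter(markets))


def searches_by_country(events, start, end):
    # Alternative (1): no join; every country is reported separately.
    return dict(Counter(_in_range(events, start, end)))


if __name__ == "__main__":
    events = [
        (date(2023, 1, 1), "AA"),
        (date(2023, 1, 1), "CC"),
        (date(2023, 1, 2), "BB"),
        (date(2023, 1, 5), "AA"),  # filtered out by the date range below
    ]
    start, end = date(2023, 1, 1), date(2023, 1, 3)
    print(searches_by_market(events, COUNTRY_TO_MARKET_V1, start, end))
    print(searches_by_market(events, COUNTRY_TO_MARKET_V2, start, end))
    print(searches_by_country(events, start, end))
```

Editing the mapping (V1 to V2) changes the market breakdown without reprocessing any events. That flexibility is what the current configuration buys, at the cost of doing the join on every chart render.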