Dashboards created as part of T279105:
- Search user engagement https://superset.wikimedia.org/superset/dashboard/265/
- Search user engagement - geolocations https://superset.wikimedia.org/superset/dashboard/303/
- Search user engagement - Emerging Languages https://superset.wikimedia.org/superset/dashboard/290/
- Search Metrics https://superset.wikimedia.org/superset/dashboard/369/
These dashboards frequently time out because some charts query Hive tables imported as physical datasets, while others use Presto to build virtual datasets at chart-render time. There are potential solutions for optimizing them (e.g. SQL templating, enabled in T312134; ingesting the data into Druid; pre-aggregating some of the raw data), but each comes with a cost (time & effort, loss of flexibility).
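As a rough illustration of the pre-aggregation option, a daily rollup might look like the sketch below. This is a hypothetical example only: the table and column names (`event.searchsatisfaction`, `dwell_seconds`, `search_metrics_daily`) are illustrative, not the actual schema.

```sql
-- Hypothetical daily rollup of raw search satisfaction events.
-- Table and column names are illustrative; the real schema may differ.
CREATE TABLE search_metrics_daily AS
SELECT
  DATE(dt)                      AS day,
  wiki,
  COUNT(*)                      AS sessions,
  COUNT_IF(dwell_seconds >= 10) AS dwelled_sessions  -- 10s threshold baked in
FROM event.searchsatisfaction
GROUP BY DATE(dt), wiki
```

Charts would then read from the small rollup table instead of scanning raw events, but the trade-off is visible in the comment: the 10s threshold is baked in, so changing it means re-running the rollup.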
When these dashboards were initially created, we deliberately favored long-term flexibility (e.g. so that dwell time thresholds or the list of emerging-market countries could change without re-ingesting data), at the expense of delegating computation to chart generation time. It would be helpful to document these trade-offs so that we can work with the primary stakeholders (PdM & EM for Search Platform) to decide on next steps for optimizing these dashboards and making them more stable.
Example:
Chart/metric | Current configuration | Cost | Benefit | Alternative(s) |
---|---|---|---|---|
Dwell time | Presto query runs on raw data to generate chart on demand | Query may time out due to the volume of data | If the threshold (10s) changes, only the underlying query needs to change | (1) Pre-compute with a fixed threshold; (2) Use SQL templating & date ranges to restrict the volume of data Presto has to process |
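For alternative (2), a templated virtual dataset could be sketched roughly as below, using Superset's Jinja macros. Again, table and column names are hypothetical; the `dwell_threshold` URL parameter is an assumption for illustration.

```sql
-- Hypothetical templated virtual dataset for the dwell time chart.
SELECT
  DATE(dt) AS day,
  COUNT_IF(dwell_seconds >= {{ url_param('dwell_threshold', '10') }}) * 1.0
    / COUNT(*) AS dwell_rate
FROM event.searchsatisfaction
-- Superset injects the dashboard's selected time range here, so Presto
-- only scans the partitions the user is actually looking at:
WHERE dt >= '{{ from_dttm }}'
  AND dt <  '{{ to_dttm }}'
GROUP BY DATE(dt)
```

This keeps the threshold adjustable (via a URL parameter instead of editing the query) while bounding the data Presto scans, at the cost of requiring SQL templating to be enabled and maintained.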