
Triage Superset Dashboard Timeouts
Closed, Resolved (Public)

Description

This task involves running the dashboards listed in the spreadsheet provided by product analytics in Superset and using the Presto query table to find out which SQL statements time out, i.e. exceed the timeout configured in Superset.

https://docs.google.com/spreadsheets/d/1YqYDEDNXpBPMjG_WUoNB8iHF9rO4yu1kJMHddIatn58/edit?usp=sharing
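
For illustration only, the triage step described above could be sketched in Python roughly as follows. The `QueryRecord` fields, the sample rows, and the 60-second timeout are all hypothetical; the actual Superset timeout and the columns of the Presto query table are not stated in this task.

```python
from dataclasses import dataclass

# Assumed value: the real Superset timeout is not given in this task.
ASSUMED_TIMEOUT_SECONDS = 60.0


@dataclass
class QueryRecord:
    """Hypothetical shape of one row exported from the Presto query table."""
    query_id: str
    sql: str
    elapsed_seconds: float


def find_timed_out(records, timeout_seconds=ASSUMED_TIMEOUT_SECONDS):
    """Return the records that ran longer than the timeout, slowest first."""
    slow = [r for r in records if r.elapsed_seconds > timeout_seconds]
    return sorted(slow, key=lambda r: r.elapsed_seconds, reverse=True)


# Made-up sample rows, only to show how the helper is called.
sample = [
    QueryRecord("q1", "SELECT ...", 12.5),
    QueryRecord("q2", "SELECT ...", 75.0),
    QueryRecord("q3", "SELECT ...", 130.2),
]

timed_out_ids = [r.query_id for r in find_timed_out(sample)]
```

Sorting slowest first is just a convenience so the worst offenders show up at the top of the list.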

Event Timeline

odimitrijevic triaged this task as High priority.
odimitrijevic moved this task from Incoming (new tickets) to Visualize on the Data-Engineering board.

The only dashboard that consistently times out is the IP Masking Dashboard reported by @Iflorez. The other two dashboards (editors metrics and edit topics) load when the Presto cluster has no other queries running. Ideally every dashboard would load within the timeout even when the cluster is under load, but those two dashboards themselves are fine.

Even the individual charts on the IP Masking Dashboard load fine by themselves.

My recommendations:

  • Put fewer charts on a single dashboard. IP Masking has 25; I'd recommend no more than a page full of charts, or ~6. Dashboards can link to each other, and pages on wikitech or elsewhere can serve as collections of charts.
  • Add default filters to charts, if possible; many of the charts load less data if the user uses the accompanying filter component.
  • Pre-aggregate data so that annual charts with month granularity load much faster than crunching a year of hourly (or similarly fine-grained) data on each load
  • Create reports instead of dashboards if the query takes more than a few seconds and doesn't need to be updated frequently
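
To make the pre-aggregation recommendation concrete, here is a minimal Python sketch of rolling hourly rows up into monthly totals once, so that a chart reads a handful of monthly rows instead of a year of hourly ones. The `aggregate_monthly` helper, the `(timestamp, count)` row shape, and the sample values are hypothetical; in practice this would more likely be a scheduled job writing to a summary table.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_monthly(hourly_rows):
    """Sum (timestamp, count) rows into one total per 'YYYY-MM' month."""
    totals = defaultdict(int)
    for timestamp, count in hourly_rows:
        totals[timestamp.strftime("%Y-%m")] += count
    # Sort by month key so the result is in chronological order.
    return dict(sorted(totals.items()))


# Made-up hourly rows, only to show the shape of the input.
hourly = [
    (datetime(2022, 1, 1, 0), 5),
    (datetime(2022, 1, 1, 1), 7),
    (datetime(2022, 2, 3, 12), 4),
]

monthly = aggregate_monthly(hourly)
```

A chart built on the pre-aggregated result then only has to scan one row per month.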

I'm going to go ahead and call this done; any new timeouts can be filed as new tickets.