Triage Superset Dashboard Timeouts
Closed, Resolved · Public
Assigned To

Authored By

	odimitrijevic
	Nov 1 2021, 2:54 PM

Description

Using the spreadsheet provided by product analytics, run the listed dashboards in Superset and use the Presto query table to determine which SQL statements time out, that is, exceed the timeout configured in Superset.

https://docs.google.com/spreadsheets/d/1YqYDEDNXpBPMjG_WUoNB8iHF9rO4yu1kJMHddIatn58/edit?usp=sharing
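The check described above can be sketched in Python once the relevant rows have been exported from the Presto query table. This is only an illustration: the `QueryRecord` shape, the `timed_out` helper, and the 60 second timeout are all assumptions, since this task does not state the actual Superset timeout or the table's columns.

```python
from dataclasses import dataclass

# Assumed value for illustration; the real Superset timeout is not given in this task.
ASSUMED_TIMEOUT_SECONDS = 60.0


@dataclass
class QueryRecord:
    """Hypothetical shape of one row exported from the Presto query table."""
    query_id: str
    sql: str
    elapsed_seconds: float


def timed_out(records, timeout_seconds=ASSUMED_TIMEOUT_SECONDS):
    """Return the records whose elapsed time exceeds the timeout."""
    return [r for r in records if r.elapsed_seconds > timeout_seconds]


# Made-up sample rows, not real query data.
sample = [
    QueryRecord("q1", "SELECT 1", 4.3),
    QueryRecord("q2", "SELECT 2", 75.0),
]
slow = timed_out(sample)
print([r.query_id for r in slow])
```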

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		odimitrijevic	T294259 Presto/Superset User Experience Improvement
		Resolved		• razzi	T294768 Triage Superset Dashboard Timeouts

Event Timeline

odimitrijevic created this task. Nov 1 2021, 2:54 PM

Restricted Application added a subscriber: Aklapper. Nov 1 2021, 2:54 PM

odimitrijevic added a parent task: T294259: Presto/Superset User Experience Improvement. Nov 1 2021, 2:55 PM

odimitrijevic claimed this task. Nov 1 2021, 3:45 PM

odimitrijevic triaged this task as High priority.

odimitrijevic moved this task from Incoming (new tickets) to Visualize on the Data-Engineering board.

• razzi claimed this task. Nov 3 2021, 8:07 PM

• razzi added a project: User-razzi.

• razzi moved this task from Next Up to In Progress on the Data-Engineering-Kanban board. Nov 16 2021, 5:09 PM

• razzi moved this task from Default to In progress on the User-razzi board. Nov 18 2021, 6:18 PM

KaiOS superset report timeouts are reported in https://phabricator.wikimedia.org/T277320.

The only dashboard that times out consistently is the IP Masking Dashboard reported by @Iflorez. The other 2 dashboards (editors metrics and edit topics) load when the Presto cluster has no other queries running. Ideally every dashboard would load within the timeout even when the cluster is under load, but those two dashboards themselves are fine.

Even the individual charts on the IP Masking Dashboard load fine by themselves:

chart	time to load by itself
total	94.18 seconds
Blocks	00:00:04.33
Blocks Excluding Wikidata	00:00:04.22
Blocked Unique IP/User	00:00:04.22
Blocked Unique IP/User Excluding Wikidata	00:00:04.21
Top Blocks By Project	00:00:02.95
Top Blocked Users/IPs By Project	00:00:02.89
Reverted Edits	00:00:04.18
Reverted Edits Excluding Wikidata	00:00:04.31
Deleted Pages	00:00:04.24
Deleted Pages Excluding Wikidata	00:00:04.04
Protected Pages	00:00:04.24
Protected Pages Excluding Wikidata	00:00:04.12
Top Deleted Pages By Project	00:00:02.71
Top Reverted Edits By Project	00:00:02.96
Top Protected Pages By Project	00:00:02.04
Check User Requests	00:00:04.32
Top 5 Check User Requests by Projects	00:00:02.97
Admin Filter	00:00:03.36
Active Admins	00:00:04.24
Active Admins Excluding Wikidata	00:00:04.21
Top Active Admins by Projects	00:00:05.05
Edits/Admin Ratio	00:00:04.25
Edits/Admin Ratio Excluding Wikidata	00:00:04.20
Top Edits/Admin Ratio By Project	00:00:05.92
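
As a rough check on the table above, the per-chart times can be summed to reproduce the 94.18 second total. A minimal Python sketch; `parse_seconds` is a hypothetical helper, and the list simply copies the timings from the table in order.

```python
from decimal import Decimal

# Per-chart load times copied from the table above (hh:mm:ss.ff), in table order.
CHART_TIMES = [
    "00:00:04.33", "00:00:04.22", "00:00:04.22", "00:00:04.21",
    "00:00:02.95", "00:00:02.89", "00:00:04.18", "00:00:04.31",
    "00:00:04.24", "00:00:04.04", "00:00:04.24", "00:00:04.12",
    "00:00:02.71", "00:00:02.96", "00:00:02.04", "00:00:04.32",
    "00:00:02.97", "00:00:03.36", "00:00:04.24", "00:00:04.21",
    "00:00:05.05", "00:00:04.25", "00:00:04.20", "00:00:05.92",
]


def parse_seconds(stamp: str) -> Decimal:
    """Convert an hh:mm:ss.ff string to seconds (hypothetical helper)."""
    hours, minutes, seconds = stamp.split(":")
    return Decimal(hours) * 3600 + Decimal(minutes) * 60 + Decimal(seconds)


# Decimal avoids floating-point drift when adding two-decimal values.
total = sum((parse_seconds(t) for t in CHART_TIMES), Decimal(0))
slowest = max(parse_seconds(t) for t in CHART_TIMES)
print(total)    # 94.18
print(slowest)  # 5.92
```

No single chart takes more than about 6 seconds, but if the charts were loaded one after another the dashboard would need roughly 94 seconds in total.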

My recommendations:

Put fewer charts on a single dashboard. IP Masking has 25; I'd recommend no more than a page full of charts, or ~6 charts. Dashboards can link to each other, and pages on wikitech or elsewhere can serve as collections of charts.
Add default filters to charts, if possible. Many of the charts load less data when the user applies the accompanying filter component.
Pre-aggregate data, so that annual charts with month granularity read a few monthly rows instead of crunching a year of hourly (or similarly fine-grained) data on each load.
Create reports instead of dashboards if the query takes more than a few seconds and doesn't need to be updated frequently.
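
The pre-aggregation recommendation above can be illustrated with a small Python sketch. It assumes the source data is a list of (timestamp, count) hourly rows, which is a made-up shape for illustration; in practice the rollup would more likely run as a scheduled job that writes a monthly summary table for the chart to query.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_monthly(hourly_rows):
    """Roll (timestamp, count) hourly rows up to one total per (year, month)."""
    monthly = defaultdict(int)
    for ts, count in hourly_rows:
        monthly[(ts.year, ts.month)] += count
    return dict(sorted(monthly.items()))


# Made-up sample data, for illustration only.
sample = [
    (datetime(2021, 10, 31, 23), 5),
    (datetime(2021, 11, 1, 0), 3),
    (datetime(2021, 11, 1, 1), 4),
]
print(aggregate_monthly(sample))  # {(2021, 10): 5, (2021, 11): 7}
```

A chart covering a year at month granularity would then read 12 pre-computed rows rather than scanning every hourly row on each load.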