Alluxio for Improved Superset Query Performance
Open, Medium, Public

Assigned To

None

Authored By

	odimitrijevic
	Aug 5 2021, 3:35 PM

Description

As a user of Superset, I want dashboards to render faster and time out less often, so that I can view reports quickly.

The chosen solution is to use Presto's built-in Alluxio SDK cache as a local cache for HDFS files on each Presto worker node.

An earlier version of this plan was attempted in 2021, when we intended to use a distributed Alluxio cache service. It failed because we were unable to connect Alluxio to a Kerberized Hive metastore.

This version of the plan differs from the previous attempt in that Alluxio is only ever used locally on each Presto worker node, via a jar file that ships with Presto itself.
The caches are unaware of each other, and the only client of each cache is the Presto server running on the same machine.
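As a rough sketch of how little configuration this needs, the Python snippet below renders a Hive catalog properties file with the local cache switched on. The cache.* keys follow the Presto Alluxio SDK cache documentation (https://prestodb.io/docs/current/cache/local.html); the metastore URI, cache directory and cache size are hypothetical placeholders, and a real catalog on our cluster would also need its Kerberos settings. Because every Presto node carries its own copy of the catalog file and the cache directory is on local disk, each worker ends up with its own independent cache.

```python
# Sketch: render a Presto Hive catalog file with the Alluxio SDK cache enabled.
# The cache.* keys come from the Presto docs; all values here are hypothetical.

BASE_HIVE_PROPERTIES = {
    "connector.name": "hive-hadoop2",
    "hive.metastore.uri": "thrift://metastore.example.org:9083",  # hypothetical host
}

ALLUXIO_CACHE_PROPERTIES = {
    "cache.enabled": "true",
    "cache.type": "ALLUXIO",
    # Directory on the worker's local disk where cached data is kept (hypothetical path).
    "cache.base-directory": "file:///srv/presto/alluxio-cache",
    # Upper bound on the cache size on each worker (hypothetical value).
    "cache.alluxio.max-cache-size": "200GB",
}


def render_properties(props: dict[str, str]) -> str:
    """Serialise a mapping as a .properties file with one key=value per line."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def hive_catalog(cached: bool) -> str:
    """Return catalog file contents, optionally with the local Alluxio cache enabled."""
    props = dict(BASE_HIVE_PROPERTIES)
    if cached:
        props.update(ALLUXIO_CACHE_PROPERTIES)
    return render_properties(props)


if __name__ == "__main__":
    print(hive_catalog(cached=True), end="")
```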

Related Objects

Status	Assigned	Task
Open	None	T288252 Alluxio for Improved Superset Query Performance
Open	None	T269832 Add a presto query logger
Resolved	razzi	T292087 Setup Presto UI in production
Open	None	T266641 [Data Platform] Test Alluxio as cache layer for Presto
Resolved	BTullis	T342343 Upgrade Presto to version 0.283
Declined	BTullis	T287864 Deploy an-test-coord1002 to facilitate failover testing of analytics coordinator role
Resolved	BTullis	T289664 Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role
		Unknown Object (Task)
Resolved	BTullis	T293938 (Need By: TBD) rack/setup/install an-test-coord1002
Declined	None	T288766 Deploy an-test-presto1002 as a Ganeti VM to test Presto and Alluxio integration
		Unknown Object (Task)
Declined	Jclark-ctr	T290987 Q1:(Need By: TBD) rack/setup/install an-presto10[06-15]

Event Timeline

odimitrijevic created this task. Aug 5 2021, 3:35 PM

Restricted Application added a subscriber: Aklapper. Aug 5 2021, 3:35 PM

odimitrijevic moved this task from Next Up to Q2 Epics on the Data-Engineering-Kanban board. Aug 5 2021, 3:36 PM

BTullis subscribed. Aug 5 2021, 5:13 PM

odimitrijevic updated the task description. Aug 11 2021, 10:42 PM

odimitrijevic added a subtask: T269832: Add a presto query logger. Aug 12 2021, 4:30 PM

odimitrijevic added a subtask: T266641: [Data Platform] Test Alluxio as cache layer for Presto.

odimitrijevic added a subtask: T287864: Deploy an-test-coord1002 to facilitate failover testing of analytics coordinator role. Aug 12 2021, 6:56 PM

BTullis added a subtask: T288766: Deploy an-test-presto1002 as a Ganeti VM to test Presto and Alluxio integration. Aug 12 2021, 7:01 PM

Adding two things:

1. Alluxio is built from three separate leader-follower systems: core (caching), job (data movement) and catalog (Hive tables).
2. The performance test on the test cluster will probably show no visible improvement, given the relatively small size of the test cluster and its data. We should nonetheless be able to trace execution and confirm that the flow is as expected.

odimitrijevic triaged this task as High priority. Aug 20 2021, 6:15 PM

odimitrijevic moved this task from Incoming to Smart Tools for Better Data on the Analytics board.

Ottomata added a subtask: Unknown Object (Task). Sep 16 2021, 5:16 PM

wiki_willy closed subtask Unknown Object (Task) as Declined. Sep 30 2021, 8:01 PM

odimitrijevic moved this task from Q2 Epics to Next Up on the Data-Engineering-Kanban board. Oct 25 2021, 3:29 PM

Ottomata closed subtask T288766: Deploy an-test-presto1002 as a Ganeti VM to test Presto and Alluxio integration as Declined. Oct 25 2021, 3:48 PM

odimitrijevic moved this task from Next Up to Q2 Epics on the Data-Engineering-Kanban board. Oct 25 2021, 4:26 PM

Should we decline this ticket now, mark it as resolved, or re-title it?

I think we should decline it.

BTullis closed subtask T266641: [Data Platform] Test Alluxio as cache layer for Presto as Resolved. Oct 28 2021, 4:38 PM

BTullis closed this task as Resolved. Oct 28 2021, 5:28 PM

BTullis claimed this task.

razzi changed the status of subtask T269832: Add a presto query logger from Open to In Progress. Nov 9 2021, 8:37 PM

EChetty changed the status of subtask T269832: Add a presto query logger from In Progress to Open. Feb 13 2023, 1:40 PM

BTullis closed subtask T287864: Deploy an-test-coord1002 to facilitate failover testing of analytics coordinator role as Declined. May 5 2023, 3:06 PM

BTullis reopened subtask T266641: [Data Platform] Test Alluxio as cache layer for Presto as Open. Jul 20 2023, 10:06 AM

I'm reopening this ticket, as we have made significant progress with the built-in Alluxio SDK cache: https://prestodb.io/docs/current/cache/local.html
Two child tickets, T266641: [Data Platform] Test Alluxio as cache layer for Presto and T342343: Upgrade Presto to version 0.283, are under way, so I think it makes sense to revive this ticket to track any follow-up or ancillary work.

When I spoke to @odimitrijevic about this ticket the other day, we noted that it would be good to establish a baseline against which to measure any performance improvements once we enable caching in Presto.
For example, should we have duplicate catalogs (one with caching, one without) so that we can gauge the difference it makes?
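One way to get that baseline, assuming two otherwise identical Hive catalogs (called hive and hive_cached here purely for illustration), is to run each dashboard query several times against both and compare median latencies. In the sketch below, run_query is a hypothetical callable standing in for whatever client we end up using (for example a cursor from the presto-python-client package); it is not a real API.

```python
import statistics
import time
from typing import Callable

# run_query(catalog, sql) is a hypothetical callable that executes `sql`
# against the named catalog and fetches all rows.
QueryRunner = Callable[[str, str], None]


def median_latency(run_query: QueryRunner, catalog: str, sql: str, repeats: int = 5) -> float:
    """Median wall-clock time in seconds over `repeats` runs.

    The first run against a cached catalog may be cold (it fills the cache);
    taking the median keeps one slow outlier from dominating the result.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(catalog, sql)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(run_query: QueryRunner, sql: str, baseline: str = "hive",
            cached: str = "hive_cached", repeats: int = 5) -> float:
    """Baseline median latency divided by cached median latency (>1 means faster with cache)."""
    return (median_latency(run_query, baseline, sql, repeats)
            / median_latency(run_query, cached, sql, repeats))
```

The median is used rather than the mean so that cold-cache runs and occasional cluster hiccups do not skew the comparison.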

It also occurred to me that some cache metrics are available to monitor via JMX: https://prestodb.io/docs/current/cache/local.html#monitoring
We should make sure that they are collected and shown on a suitable Grafana dashboard.
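For a dashboard, the figure we probably care most about is the cache hit ratio. The sketch below derives it from cumulative byte counters sampled at two points in time. The field names are hypothetical and would need to be mapped to the actual JMX attributes listed in the monitoring documentation. Because each worker has its own independent cache, the cluster-wide figure sums bytes across workers rather than averaging per-worker ratios, so a busy worker counts for more than an idle one.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class CacheCounters:
    """Cumulative byte counters sampled from one worker at one point in time.

    Field names are hypothetical; map them to the real JMX attributes.
    """
    bytes_from_cache: int
    bytes_from_external: int  # bytes that had to be fetched from HDFS


def _deltas(earlier: CacheCounters, later: CacheCounters) -> Optional[Tuple[int, int]]:
    cache = later.bytes_from_cache - earlier.bytes_from_cache
    external = later.bytes_from_external - earlier.bytes_from_external
    if cache < 0 or external < 0:
        return None  # counters reset, e.g. the worker restarted; ignore this window
    return cache, external


def cluster_hit_ratio(windows: Iterable[Tuple[CacheCounters, CacheCounters]]) -> Optional[float]:
    """Byte-weighted hit ratio across workers, one (earlier, later) pair per worker."""
    total_cache = total_external = 0
    for earlier, later in windows:
        deltas = _deltas(earlier, later)
        if deltas is None:
            continue
        total_cache += deltas[0]
        total_external += deltas[1]
    total = total_cache + total_external
    return total_cache / total if total else None  # None when there was no traffic


def hit_ratio(earlier: CacheCounters, later: CacheCounters) -> Optional[float]:
    """Fraction of bytes served from one worker's local cache between two samples."""
    return cluster_hit_ratio([(earlier, later)])
```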

BTullis updated the task description. Oct 13 2023, 1:45 PM

lbowmaker moved this task from Icebox (not considered in current quarter) to Incoming (new tickets) on the Data-Engineering board. Nov 10 2023, 1:20 PM

lbowmaker moved this task from Incoming (new tickets) to Radar (External Teams) on the Data-Engineering board. Nov 10 2023, 2:47 PM

Gehel lowered the priority of this task from High to Medium. Dec 7 2023, 1:50 PM

Gehel moved this task from Incoming to Misc on the Data-Platform-SRE board.

BTullis removed BTullis as the assignee of this task. Mar 15 2024, 6:00 PM

BTullis moved this task from Misc to Epics on the Data-Platform-SRE board. Mar 22 2024, 5:04 PM

BTullis moved this task from Epics to Misc on the Data-Platform-SRE board. Mar 22 2024, 5:10 PM

BTullis removed a project: Epic.

I think the latest Superset deployments have caching enabled, so this might no longer be useful.

Gehel edited projects, added Data-Platform-SRE; removed Data-Platform-SRE (2024.05.27 - 2024.06.16). May 24 2024, 1:23 PM

Actually, this provides a different level of caching (HDFS file data cached on each Presto worker, rather than query results cached in Superset) and could reduce network pressure in some instances (T364893#9800673).
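A toy model may help show why the two levels of caching are complementary. A query-result cache only helps when exactly the same query is repeated, whereas a page cache on the Presto workers also helps different queries that read the same files, such as dashboard charts that filter one table in different ways. The sketch below counts bytes fetched from HDFS under each combination of caches. The page size, file name and queries are all hypothetical, and it models a single cache, whereas in practice a page only hits if the split that needs it runs on the worker that cached it.

```python
from typing import FrozenSet, Iterable, Set, Tuple

PAGE_SIZE = 1 << 20  # hypothetical 1 MiB page size for this toy model

Page = Tuple[str, int]  # (file path, page index)
Query = Tuple[str, FrozenSet[Page]]  # (SQL text, pages the query reads)


def hdfs_bytes_read(queries: Iterable[Query], result_cache: bool, page_cache: bool) -> int:
    """Bytes fetched from HDFS for a sequence of queries under the chosen caches."""
    seen_sql: Set[str] = set()
    cached_pages: Set[Page] = set()
    total = 0
    for sql, pages in queries:
        if result_cache and sql in seen_sql:
            continue  # identical query: answered from the result cache, no file reads
        seen_sql.add(sql)
        for page in pages:
            if page_cache and page in cached_pages:
                continue  # already on the worker's local disk
            cached_pages.add(page)
            total += PAGE_SIZE
    return total


# Three dashboard queries over the same four pages; only the first and third are identical.
table = frozenset(("pageviews/part-00000.orc", i) for i in range(4))
queries = [
    ("SELECT ... WHERE wiki = 'en'", table),
    ("SELECT ... WHERE wiki = 'de'", table),
    ("SELECT ... WHERE wiki = 'en'", table),
]
for result_cache, page_cache in [(False, False), (True, False), (False, True), (True, True)]:
    mib = hdfs_bytes_read(queries, result_cache, page_cache) // PAGE_SIZE
    print(f"result_cache={result_cache!s:5} page_cache={page_cache!s:5} -> {mib} MiB from HDFS")
```

This prints 12, 8, 4 and 4 MiB respectively: the result cache only saves the repeated query, while the page cache also saves the second, different one.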
