
Upgrade to Spark 3.2 to support Spark lineage for Iceberg tables
Open, Needs Triage, Public

Description

When a Spark job runs with DataHub's Spark lineage connector against an Iceberg table, lineage extraction fails with this error:

12:03:12.188 [spark-listener-group-shared] INFO io.openlineage.spark.agent.util.PlanUtils - apply method failed with
java.lang.NoSuchMethodError: org.apache.iceberg.spark.SparkSessionCatalog.icebergCatalog()Lorg/apache/iceberg/catalog/Catalog;
	at io.openlineage.spark3.agent.lifecycle.plan.catalog.IcebergHandler.getIcebergTable(IcebergHandler.java:171)
	at io.openlineage.spark3.agent.lifecycle.plan.catalog.IcebergHandler.getDatasetVersion(IcebergHandler.java:156)
	at io.openlineage.spark3.agent.lifecycle.plan.catalog.CatalogUtils3.getDatasetVersion(CatalogUtils3.java:100)
	at io.openlineage.spark3.agent.utils.DatasetVersionDatasetFacetUtils.extractVersionFromDataSourceV2Relation(DatasetVersionDatasetFacetUtils.java:46)

The icebergCatalog() method appears to have been introduced in Iceberg v1.2 for Spark 3.2. It does not exist in Iceberg v1.2 for Spark 3.1.
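For anyone reading the error above, the text after NoSuchMethodError is in JVM descriptor notation: the fully qualified class, the method name, the argument list in parentheses, and then the return type. Below is a minimal sketch that splits it into those parts so the exact method can be searched for in the Iceberg source. The helper name parse_missing_method is hypothetical and is not part of DataHub, OpenLineage, or Iceberg; it only handles object return types of the "L...;" form.

```python
import re

# Hypothetical helper (not part of DataHub, OpenLineage, or Iceberg).
# Splits "<class>.<method>(<args>)<return descriptor>" as printed in a
# java.lang.NoSuchMethodError message.
_SIGNATURE = re.compile(
    r"^(?P<cls>[\w.$]+)\.(?P<method>[\w$<>]+)\((?P<args>[^)]*)\)(?P<ret>.+)$"
)


def parse_missing_method(message: str) -> dict:
    match = _SIGNATURE.match(message.strip())
    if match is None:
        raise ValueError(f"unrecognized method signature: {message!r}")
    ret = match.group("ret")
    # An object return type is encoded as "L<slash/separated/name>;".
    if ret.startswith("L") and ret.endswith(";"):
        ret = ret[1:-1].replace("/", ".")
    return {
        "class": match.group("cls"),
        "method": match.group("method"),
        "arg_descriptor": match.group("args"),  # empty string means no arguments
        "returns": ret,
    }


# The signature copied from the error in this task.
info = parse_missing_method(
    "org.apache.iceberg.spark.SparkSessionCatalog.icebergCatalog()"
    "Lorg/apache/iceberg/catalog/Catalog;"
)
print(info)
```

Read this way, the lineage agent expects a no-argument icebergCatalog() on SparkSessionCatalog that returns an org.apache.iceberg.catalog.Catalog, which is the method missing from the Iceberg build for Spark 3.1.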

If we want to support Spark lineage for Iceberg tables, the first (and hopefully only) step is to upgrade to Spark 3.2.

Event Timeline

Ottomata renamed this task from Upgrade to Spark 3.2 to support Spark lineage to Upgrade to Spark 3.2 to support Spark lineage for Iceberg tables. Nov 4 2024, 5:42 PM
tchin removed tchin as the assignee of this task. Nov 5 2024, 7:49 AM

We need to clarify how this gets addressed with the Hadoop 3 upgrade.

The plan is to move to Spark 3.5. We need to sync with SREs to decide when we do this (before the Hadoop 3 upgrade, after it, or at the same time).