
Upgrade to latest PrestoDB and enable iceberg support
Closed, Resolved · Public · 3 Estimated Story Points

Description

We are on version 0.246, but we need at least 0.254 for iceberg support.

  • build and release new presto version .deb
  • install new presto on analytics test cluster
  • puppetize iceberg presto catalog
  • test on test cluster
  • repeat ^ on prod cluster

Event Timeline

Open question: Presto latest is 0.273.3 - would we bump further than the minimum needed for Iceberg?

Change 809236 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] analytics_test_cluster presto - remove deprecated hive.parquet.fail-on-corrupted-statistics

https://gerrit.wikimedia.org/r/809236

Change 809236 merged by Ottomata:

[operations/puppet@production] analytics_test_cluster presto - remove deprecated property

https://gerrit.wikimedia.org/r/809236

Change 809241 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] presto - enable iceberg catalog in analytics_test_cluster

https://gerrit.wikimedia.org/r/809241

Change 809241 merged by Ottomata:

[operations/puppet@production] presto - enable iceberg catalog in analytics_test_cluster

https://gerrit.wikimedia.org/r/809241

@XCollazo-WMF, @Milimetric, @JAllemandou analytics test presto is upgraded to 0.273.3 with an analytics_test_iceberg catalog configured.

connector.name=iceberg
hive.metastore.uri=thrift://analytics-test-hive.eqiad.wmnet:9083
iceberg.catalog.type=hive
iceberg.compression-codec=SNAPPY
iceberg.file-format=PARQUET

I haven't tested creating or querying any tables, but I do see the catalog available via presto CLI on an-test-client1001.
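
E.g. a quick smoke test from the CLI (listing only, nothing created yet):

presto> show catalogs;
presto> show schemas from analytics_test_iceberg;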

The Hive connector seems to need quite a lot of Kerberos configuration. As far as I can tell, the Iceberg connector doesn't have these settings. I'm worried that the Presto Iceberg connector might not have Kerberos support?

> I'm worried that the Presto Iceberg connector might not have Kerberos support?

Typically, you can pass these details down with a Hadoop Configuration object. Is that not the case with Presto?

> I'm worried that the Presto Iceberg connector might not have Kerberos support?

Definitely the current problem. Joal created a test iceberg table via spark in the test cluster. The presto connector fails when trying to talk to the hive metastore because of failed kerberos auth.

> you can pass these details down with a Hadoop Configuration object. Is that not the case with Presto?

I'm not sure. The Presto Hive connector has specific configs to deal with this. I'll read some Presto code and try some things to see if that will work.

It does look like Presto is just re-using its Hive and HDFS auth components in its Iceberg connector, so it should work. Investigating...

Change 809676 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] analytics test cluster presto - configure iceberg with kerberos support

https://gerrit.wikimedia.org/r/809676

Change 809676 merged by Ottomata:

[operations/puppet@production] analytics test cluster presto - configure iceberg with kerberos support

https://gerrit.wikimedia.org/r/809676
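
For reference, the wiring presumably mirrors the Hive connector's Kerberos settings, since the Iceberg connector reuses those components. A sketch of what such a catalog config could look like (the exact keys and values are in the merged patch):

connector.name=iceberg
iceberg.catalog.type=hive
hive.metastore.uri=thrift://analytics-test-hive.eqiad.wmnet:9083
# Kerberos settings mirroring the Hive connector (keys assumed to carry over;
# principals, realm, and keytab path below are hypothetical):
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/_HOST@WIKIMEDIA
hive.metastore.client.principal=presto/analytics-test@WIKIMEDIA
hive.metastore.client.keytab=/etc/security/keytabs/presto.keytab
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.presto.principal=presto/analytics-test@WIKIMEDIA
hive.hdfs.presto.keytab=/etc/security/keytabs/presto.keytab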

I confirm it works for me! Let's maybe give it a try on the prod cluster and ask our end-users to check their queries/dashboards?

Do we have a way to do a little bit more Superset testing? I'm worried that some edge case in the latest presto version will break something. Does superset-staging use the test cluster presto?

No, superset staging doesn't use presto-test - there is almost no data or computation power behind that one. I don't have a better solution than deploying and testing, and possibly rolling back if too many problems show up :S
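
For the record, a roll-back should roughly amount to a package downgrade plus a restart on each node; a sketch (package name assumed from our .deb build; exact version string per apt-cache policy):

sudo apt-get install --allow-downgrades presto-server=0.246
sudo systemctl restart presto-server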

EChetty set the point value for this task to 3. (Jun 30 2022, 5:01 PM)

Just sent this email to analytics-announce:

Hello!

We will be upgrading Presto to version 0.273.3 on Wednesday July 6th. This will require a restart of the Presto cluster. Any running queries may be interrupted, but we don't expect any issues aside from that.

If you notice any new problems with Superset dashboards that use Presto after this upgrade, please let us know. Rolling back is relatively easy.

We will announce again here once the upgrade is complete.
You can follow along at https://phabricator.wikimedia.org/T311525

-Andrew Otto & WMF Data Engineering

Change 811735 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/debs/presto@debian] Upstream release 0.273.3

https://gerrit.wikimedia.org/r/811735

Change 811735 merged by Ottomata:

[operations/debs/presto@debian] Upstream release 0.273.3

https://gerrit.wikimedia.org/r/811735

Change 811739 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] analytics_cluster presto - reorg settings and unify configs for presto 0.273.3 upgrade

https://gerrit.wikimedia.org/r/811739

Change 811739 merged by Ottomata:

[operations/puppet@production] presto - reorg settings and unify configs for presto 0.273.3 upgrade

https://gerrit.wikimedia.org/r/811739

Mentioned in SAL (#wikimedia-analytics) [2022-07-06T17:57:34Z] <ottomata> upgrading presto to 0.273.3 in analytics cluster - T311525

Change 811759 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] Enable iceberg hive for presto

https://gerrit.wikimedia.org/r/811759

Change 811759 merged by Ottomata:

[operations/puppet@production] Enable iceberg hive for presto

https://gerrit.wikimedia.org/r/811759

Woo hoo!

presto> show catalogs;
      Catalog
-------------------
 analytics_hive
 analytics_iceberg
 system
(3 rows)

@JAllemandou

Awesome @Ottomata.
Now we need to make it work - I get errors when querying.
When doing super simple aggregations it works:

select count(1) from analytics_iceberg.joal.mediawiki_reading_depth_iceberg;

  _col0   
----------
 11836091 
(1 row)

But when doing anything more, it fails with weird offset errors:

select * from analytics_iceberg.joal.mediawiki_reading_depth_iceberg limit 10;

Query 20220707_121633_00145_mf43s, FAILED, 5 nodes
Splits: 26 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20220707_121633_00145_mf43s failed: all input must be consumed: offset=1

or:

select date(meta.dt), count(1) as c from analytics_iceberg.joal.mediawiki_reading_depth_iceberg where month(meta.dt) = 5 group by date(meta.dt) order by c desc limit 10;

Query 20220707_121844_00149_mf43s, FAILED, 5 nodes
Splits: 186 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20220707_121844_00149_mf43s failed: offset outside destination buffer: offset=3

(this last error, with offset=3, is the one I get almost consistently)

It looks like possibly a compression mismatch problem? The error from the presto server is:

Jul 07 19:18:53 an-coord1001 presto-server[31298]: io.airlift.compress.MalformedInputException: all input must be consumed: offset=1
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at io.airlift.compress.lz4.Lz4RawDecompressor.decompress(Lz4RawDecompressor.java:83)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at io.airlift.compress.lz4.Lz4Decompressor.decompress(Lz4Decompressor.java:36)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at com.facebook.presto.parquet.ParquetCompressionUtils.decompress(ParquetCompressionUtils.java:149)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at com.facebook.presto.parquet.ParquetCompressionUtils.decompressLz4(ParquetCompressionUtils.java:115)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at com.facebook.presto.parquet.ParquetCompressionUtils.decompress(ParquetCompressionUtils.java:62)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at com.facebook.presto.parquet.reader.PageReader.readPage(PageReader.java:84)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at com.facebook.presto.parquet.reader.AbstractColumnReader.readNextPage(AbstractColumnReader.java:258)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at com.facebook.presto.parquet.reader.AbstractColumnReader.readNext(AbstractColumnReader.java:138)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at com.facebook.presto.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:354)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:460)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at com.facebook.presto.parquet.reader.ParquetReader.readBlock(ParquetReader.java:443)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:230)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:         at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:208)

...

Something is weird with the parquet data files for this table. Using parquet-tools inspect on a mediawiki_reading_depth_iceberg data file, I see things like:

############ Column(access_method) ############
name: access_method
path: access_method
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: UNKNOWN (space_saved: -24%)

vs. the same on one of the analytics_test_iceberg.joal.navigationtiming_iceberg data files from the analytics_test_cluster:

############ Column(client_ip) ############
name: client_ip
path: http.client_ip
max_definition_level: 2
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: SNAPPY (space_saved: 19%)

Note the different compression used.
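
For anyone reproducing: the inspection above amounts to pulling a data file out of the table's directory and running parquet-tools on it (HDFS path hypothetical):

hdfs dfs -get /path/to/iceberg/table/data/part-00000.parquet .
parquet-tools inspect part-00000.parquet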

Is it possible the files were written incorrectly somehow?
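
If so, recreating the table with an explicit codec should fix it; a rough Spark SQL sketch (table names illustrative; write.parquet.compression-codec is a standard Iceberg table property):

CREATE TABLE joal.mediawiki_reading_depth_iceberg_snappy
USING iceberg
TBLPROPERTIES ('write.parquet.compression-codec' = 'snappy')
AS SELECT * FROM joal.mediawiki_reading_depth_iceberg;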

Thanks Andrew for the details! Indeed I had messed up my table creation (I had played a bit with compression, without success). I have recreated the table with our usual snappy compression and it now works as expected: blazing fast!
An example of the difference we'll see for queries Superset could run:
With Iceberg:

select date(meta.dt), count(1) as c from analytics_iceberg.joal.mediawiki_reading_depth_iceberg where month(meta.dt) = 5 group by date(meta.dt) order by c desc limit 10;
   _col0    |    c    
------------+---------
 2022-05-16 | 1032149 
 2022-05-30 | 1023711 
 2022-05-23 | 1019053 
 2022-05-29 | 1015749 
 2022-05-09 | 1003177 
 2022-05-24 |  998581 
 2022-05-15 |  994772 
 2022-05-31 |  993864 
 2022-05-17 |  985280 
 2022-05-04 |  984949 
(10 rows)

Query 20220708_075447_00049_mf43s, FINISHED, 5 nodes
Splits: 226 total, 226 done (100.00%)
0:03 [70.6M rows, 478MB] [27.3M rows/s, 185MB/s]

Without Iceberg

select date(from_iso8601_timestamp(meta.dt)), count(1) as c from analytics_hive.event_sanitized.mediawiki_reading_depth where month(from_iso8601_timestamp(meta.dt)) = 5 group by date(from_iso8601_timestamp(meta.dt)) order by c desc limit 10;
   _col0    |    c    
------------+---------
 2022-05-16 | 1032149 
 2022-05-30 | 1023711 
 2022-05-23 | 1019053 
 2022-05-29 | 1015749 
 2022-05-09 | 1003177 
 2022-05-24 |  998581 
 2022-05-15 |  994772 
 2022-05-31 |  993864 
 2022-05-17 |  985280 
 2022-05-04 |  984949 
(10 rows)

Query 20220708_075908_00052_mf43s, FINISHED, 5 nodes
Splits: 10,633 total, 10,633 done (100.00%)
0:12 [83.6M rows, 893MB] [6.76M rows/s, 72.3MB/s]

Nice!

Quick thought about compression: since we are talking about icebergifying event_sanitized, would it make sense to use a more space-efficient compression than snappy on the event_sanitized tables? gzip perhaps?
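
If we go that route, it's a per-table Iceberg write property, so the switch should be something like this in Spark SQL (table name illustrative; it only affects newly written files, existing files would need a rewrite):

ALTER TABLE joal.mediawiki_reading_depth_iceberg
SET TBLPROPERTIES ('write.parquet.compression-codec' = 'gzip');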

Just passing by here to document that Presto 0.273.3 is compiled against, and the release tarball includes, Iceberg 0.13.1.

0.13.1 was released on Feb 14, 2022.
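
For anyone wanting to verify on a host, the bundled plugin jars should carry the version in their file names; something like (install path assumed from the .deb layout):

ls /usr/lib/presto/plugin/iceberg/ | grep iceberg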