We are on version 0.246, but we need at least 0.254 for iceberg support.
- build and release new presto version .deb
- install new presto on analytics test cluster
- puppetize iceberg presto catalog
- test on test cluster
- repeat ^ on prod cluster
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | None | | T311743 [Iceberg] Epic: Icebergify event_sanitized database |
| Open | None | | T333013 [Iceberg Migration] Apache Iceberg Migration |
| Resolved | | xcollazo | T311738 [Iceberg] Debianize and install iceberg support for Spark, Presto, and optionally Hive |
| Resolved | | Ottomata | T311525 Upgrade to latest PrestoDB and enable iceberg support |
FYI: We will not be doing T266640: Decide whether to migrate from Presto to Trino as part of this upgrade.
Open question: the latest Presto is 0.273.3. Would we bump further than the minimal version needed for Iceberg support?
Change 809236 had a related patch set uploaded (by Ottomata; author: Ottomata):
[operations/puppet@production] analytics_test_cluster presto - remove deprecated hive.parquet.fail-on-corrupted-statistics
Change 809236 merged by Ottomata:
[operations/puppet@production] analytics_test_cluster presto - remove deprecated property
Change 809241 had a related patch set uploaded (by Ottomata; author: Ottomata):
[operations/puppet@production] presto - enable iceberg catalog in analytics_test_cluster
Change 809241 merged by Ottomata:
[operations/puppet@production] presto - enable iceberg catalog in analytics_test_cluster
@XCollazo-WMF, @Milimetric, @JAllemandou: analytics test Presto is upgraded to 0.273.3 with an analytics_test_iceberg catalog configured:
```
connector.name=iceberg
hive.metastore.uri=thrift://analytics-test-hive.eqiad.wmnet:9083
iceberg.catalog.type=hive
iceberg.compression-codec=SNAPPY
iceberg.file-format=PARQUET
```
I haven't tested creating or querying any tables, but I do see the catalog available via presto CLI on an-test-client1001.
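For anyone who wants to poke at it, a quick smoke test from the presto CLI could look like the sketch below (the `joal` schema name is just an example, since no Iceberg tables have been created there yet):

```sql
-- Run from the presto CLI on an-test-client1001.
SHOW CATALOGS;
SHOW SCHEMAS FROM analytics_test_iceberg;
-- Example schema name; adjust to wherever you create tables.
SHOW TABLES FROM analytics_test_iceberg.joal;
```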
The Hive connector seems to need a lot more Kerberos configuration. As far as I can tell, the Iceberg connector doesn't have equivalent settings. I'm worried that the Presto Iceberg connector might not have Kerberos support?
> I'm worried that the Presto Iceberg connector might not have kerberos support?
Typically, you can pass these details down with a Hadoop Configuration object. Is that not the case with Presto?
> I'm worried that the Presto Iceberg connector might not have kerberos support?
Def the current problem. Joal created a test iceberg table via spark in the test cluster. The presto connector fails when trying to talk to the hive metastore because of failed kerberos auth.
> you can pass these details down with a Hadoop Configuration object. Is that not the case with Presto?
I'm not sure. The Presto Hive connector has specific configs to deal with this. I'll read some Presto code and try some things to see if that will work.
It does look like Presto is just re-using its Hive and HDFS auth components in its Iceberg connector, so it should work. Investigating...
Change 809676 had a related patch set uploaded (by Ottomata; author: Ottomata):
[operations/puppet@production] analytics test cluster presto - configure iceberg with kerberos support
Change 809676 merged by Ottomata:
[operations/puppet@production] analytics test cluster presto - configure iceberg with kerberos support
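For reference, these are the kinds of Kerberos settings the Hive connector uses; the patch above assumes the Iceberg connector honors the same hive.metastore.* / hive.hdfs.* properties. Principals, realm, and keytab paths below are illustrative placeholders, not our actual values:

```
# Sketch of Kerberos-related catalog properties; all values are placeholders.
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/_HOST@EXAMPLE.REALM
hive.metastore.client.principal=presto/_HOST@EXAMPLE.REALM
hive.metastore.client.keytab=/etc/security/keytabs/presto.keytab
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.presto.principal=presto/_HOST@EXAMPLE.REALM
hive.hdfs.presto.keytab=/etc/security/keytabs/presto.keytab
```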
I confirm it works for me! Let's maybe give it a try on the prod cluster and ask our end-users to check their queries/dashboards?
Do we have a way to do a little more Superset testing? I'm worried that some edge case in the latest Presto version will break something. Does superset-staging use the test cluster Presto?
No, superset staging doesn't use presto-test - there is almost no data or compute power behind that one. I don't have a better solution than deploying, testing, and possibly rolling back if too many problems show up :S
Just sent this email to analytics-announce:
Hello!
We will be upgrading Presto to version 0.273.3 on Wednesday July 6th. This will require a restart of the Presto cluster. Any running queries may be interrupted, but we don't expect any issues aside from that.
If you notice any new problems with Superset dashboards that use Presto after this upgrade, please let us know. Rolling back is relatively easy.
We will announce again here once the upgrade is complete.
You can follow along at https://phabricator.wikimedia.org/T311525

-- Andrew Otto & WMF Data Engineering
Change 811735 had a related patch set uploaded (by Ottomata; author: Ottomata):
[operations/debs/presto@debian] Upstream release 0.273.3
Change 811735 merged by Ottomata:
[operations/debs/presto@debian] Upstream release 0.273.3
Change 811739 had a related patch set uploaded (by Ottomata; author: Ottomata):
[operations/puppet@production] analytics_cluster presto - reorg settings and unify configs for presto 0.273.3 upgrade
Change 811739 merged by Ottomata:
[operations/puppet@production] presto - reorg settings and unify configs for presto 0.273.3 upgrade
Mentioned in SAL (#wikimedia-analytics) [2022-07-06T17:57:34Z] <ottomata> upgrading presto to 0.273.3 in analytics cluster - T311525
Change 811759 had a related patch set uploaded (by Ottomata; author: Ottomata):
[operations/puppet@production] Enable iceberg hive for presto
Change 811759 merged by Ottomata:
[operations/puppet@production] Enable iceberg hive for presto
Woo hoo!
```
presto> show catalogs;
      Catalog
-------------------
 analytics_hive
 analytics_iceberg
 system
(3 rows)
```
Awesome @Ottomata .
Now we need to make it work - I get errors when querying.
When doing super simple aggregations it works:
```
select count(1) from analytics_iceberg.joal.mediawiki_reading_depth_iceberg;
  _col0
----------
 11836091
(1 row)
```
But when doing anything more, it fails with weird offset errors:
```
select * from analytics_iceberg.joal.mediawiki_reading_depth_iceberg limit 10;

Query 20220707_121633_00145_mf43s, FAILED, 5 nodes
Splits: 26 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20220707_121633_00145_mf43s failed: all input must be consumed: offset=1
```
or:
```
select date(meta.dt), count(1) as c from analytics_iceberg.joal.mediawiki_reading_depth_iceberg where month(meta.dt) = 5 group by date(meta.dt) order by c desc limit 10;

Query 20220707_121844_00149_mf43s, FAILED, 5 nodes
Splits: 186 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20220707_121844_00149_mf43s failed: offset outside destination buffer: offset=3
```
(this last error, with offset 3, is the one I get almost consistently)
It looks like possibly a compression mismatch problem. The error from the Presto server is:
```
Jul 07 19:18:53 an-coord1001 presto-server[31298]: io.airlift.compress.MalformedInputException: all input must be consumed: offset=1
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at io.airlift.compress.lz4.Lz4RawDecompressor.decompress(Lz4RawDecompressor.java:83)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at io.airlift.compress.lz4.Lz4Decompressor.decompress(Lz4Decompressor.java:36)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at com.facebook.presto.parquet.ParquetCompressionUtils.decompress(ParquetCompressionUtils.java:149)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at com.facebook.presto.parquet.ParquetCompressionUtils.decompressLz4(ParquetCompressionUtils.java:115)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at com.facebook.presto.parquet.ParquetCompressionUtils.decompress(ParquetCompressionUtils.java:62)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at com.facebook.presto.parquet.reader.PageReader.readPage(PageReader.java:84)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at com.facebook.presto.parquet.reader.AbstractColumnReader.readNextPage(AbstractColumnReader.java:258)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at com.facebook.presto.parquet.reader.AbstractColumnReader.readNext(AbstractColumnReader.java:138)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at com.facebook.presto.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:354)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at com.facebook.presto.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:460)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at com.facebook.presto.parquet.reader.ParquetReader.readBlock(ParquetReader.java:443)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:230)
Jul 07 19:18:53 an-coord1001 presto-server[31298]:     at com.facebook.presto.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:208)
...
```
Something is weird with the parquet data files for this table. Using parquet-tools inspect on a mediawiki_reading_depth_iceberg data file, I see things like:
```
############ Column(access_method) ############
name: access_method
path: access_method
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: UNKNOWN (space_saved: -24%)
```
vs. the same on one of the analytics_test_iceberg.joal.navigationtiming_iceberg data files from the analytics_test_cluster:
```
############ Column(client_ip) ############
name: client_ip
path: http.client_ip
max_definition_level: 2
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: SNAPPY (space_saved: 19%)
```
Note the different compression used.
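For reference, the inspection output above comes from the python parquet-tools `inspect` command run on data files pulled out of HDFS, roughly like this (the path below is an illustrative placeholder, not the actual file):

```
# Pull one of the table's parquet data files down and inspect it.
hdfs dfs -get /path/to/mediawiki_reading_depth_iceberg/data/part-00000.parquet /tmp/
parquet-tools inspect /tmp/part-00000.parquet
```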
Is it possible the files were written incorrectly somehow?
Thanks Andrew for the details! Indeed I had messed up my table creation (I had played a bit with compression, without success). I have recreated the table with our usual snappy compression and it now works as expected: blazing fast!
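For the record, recreating the table with Snappy-compressed parquet via Spark SQL looks roughly like the sketch below (the `iceberg` catalog name and the source table are assumptions for illustration, not the exact commands run):

```sql
-- Recreate the Iceberg table, forcing snappy compression for parquet data files.
CREATE TABLE iceberg.joal.mediawiki_reading_depth_iceberg
USING iceberg
TBLPROPERTIES ('write.parquet.compression-codec' = 'snappy')
AS SELECT * FROM event_sanitized.mediawiki_reading_depth;
```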
An example of the difference we'll get for queries Superset could run:
With Iceberg:
```
select date(meta.dt), count(1) as c from analytics_iceberg.joal.mediawiki_reading_depth_iceberg where month(meta.dt) = 5 group by date(meta.dt) order by c desc limit 10;

   _col0    |    c
------------+---------
 2022-05-16 | 1032149
 2022-05-30 | 1023711
 2022-05-23 | 1019053
 2022-05-29 | 1015749
 2022-05-09 | 1003177
 2022-05-24 |  998581
 2022-05-15 |  994772
 2022-05-31 |  993864
 2022-05-17 |  985280
 2022-05-04 |  984949
(10 rows)

Query 20220708_075447_00049_mf43s, FINISHED, 5 nodes
Splits: 226 total, 226 done (100.00%)
0:03 [70.6M rows, 478MB] [27.3M rows/s, 185MB/s]
```
Without Iceberg:
```
select date(from_iso8601_timestamp(meta.dt)), count(1) as c from analytics_hive.event_sanitized.mediawiki_reading_depth where month(from_iso8601_timestamp(meta.dt)) = 5 group by date(from_iso8601_timestamp(meta.dt)) order by c desc limit 10;

   _col0    |    c
------------+---------
 2022-05-16 | 1032149
 2022-05-30 | 1023711
 2022-05-23 | 1019053
 2022-05-29 | 1015749
 2022-05-09 | 1003177
 2022-05-24 |  998581
 2022-05-15 |  994772
 2022-05-31 |  993864
 2022-05-17 |  985280
 2022-05-04 |  984949
(10 rows)

Query 20220708_075908_00052_mf43s, FINISHED, 5 nodes
Splits: 10,633 total, 10,633 done (100.00%)
0:12 [83.6M rows, 893MB] [6.76M rows/s, 72.3MB/s]
```
Nice!
Quick thought about compression: since we are talking about icebergifying event_sanitized, would it make sense to use a more space-efficient compression than snappy on the event_sanitized tables? Gzip, perhaps?
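If we did go that way, the codec is just an Iceberg table property, so switching an existing table would presumably be a one-liner in Spark SQL (table name illustrative). Note it only affects newly written data files; existing files keep their current codec until they are rewritten:

```sql
-- Switch future parquet writes for this table to gzip.
ALTER TABLE iceberg.event_sanitized.some_table
SET TBLPROPERTIES ('write.parquet.compression-codec' = 'gzip');
```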
Just passing by here to document that Presto 0.273.3 is compiled against, and the release tarball includes, Iceberg 0.13.1.
0.13.1 was released on Feb 14, 2022.
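For anyone who wants to double-check which Iceberg version a given Presto build bundles, one way is to list the jars in the iceberg plugin directory; the install prefix below is an assumption about our .deb layout:

```
# Illustrative: the Iceberg jars shipped in the Presto iceberg plugin carry the version.
ls /usr/lib/presto/plugin/iceberg/ | grep -i 'iceberg-'
```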