How to run Apache Flink with Hive metastore locally to test Apache Iceberg

I would like to fiddle around a bit with Apache Flink and Apache Iceberg and test this on a local machine. I read through the documentation, but I'm still not sure what has to be set up locally to make this run. What I have done so far: I have a docker-compose file that locally starts a Hadoop namenode and datanode and a Hive server which stores its metadata in Postgres.

Additionally, I set up a local Flink project (a Java project with Scala 2.12) in my IDE, and besides the default Flink dependencies I added flink-clients, flink-table-api-java-bridge, flink-table-planner, flink-connector-hive, hive-exec, hadoop-client (version 2.8.3), flink-hadoop-compatibility, and iceberg-flink-runtime-1.14.
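
For reference, the dependency section of my pom.xml looks roughly like this (a sketch; the Flink 1.14.4 and Iceberg 0.13.1 versions are assumptions on my side, the other versions are the ones stated above):

    <!-- Sketch of the dependencies described above. Flink 1.14.4 and
         Iceberg 0.13.1 are assumed versions; hive-exec 3.1.2 and
         hadoop-client 2.8.3 are the ones actually in use. -->
    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>1.14.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge_2.12</artifactId>
            <version>1.14.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.12</artifactId>
            <version>1.14.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-hive_2.12</artifactId>
            <version>1.14.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-hadoop-compatibility_2.12</artifactId>
            <version>1.14.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>3.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.8.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-flink-runtime-1.14</artifactId>
            <version>0.13.1</version>
        </dependency>
    </dependencies>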

I'm then trying to create a simple catalog with a Flink SQL statement like this:

tEnv.executeSql(String.join("\n",
        "CREATE CATALOG iceberg_catalog WITH (",
        "  'type'='iceberg',",
        "  'catalog-type'='hive',",
        "  'uri'='thrift://localhost:9083',",
        "  'warehouse'='hdfs://namenode:8020/warehouse/path')"));

Afterwards, I get the following warnings and stack trace:

12:11:43,869 WARN  org.apache.flink.runtime.util.HadoopUtils                    [] - Could not find Hadoop configuration via any of the supported methods (Flink configuration, environment variables).
12:11:44,203 INFO  org.apache.hadoop.hive.conf.HiveConf                         [] - Found configuration file null
12:11:44,607 WARN  org.apache.hadoop.util.NativeCodeLoader                      [] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12:11:44,816 ERROR org.apache.hadoop.hive.metastore.utils.MetaStoreUtils        [] - Got exception: java.lang.ClassCastException class [Ljava.lang.Object; cannot be cast to class [Ljava.net.URI; ([Ljava.lang.Object; and [Ljava.net.URI; are in module java.base of loader 'bootstrap')
java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to class [Ljava.net.URI; ([Ljava.lang.Object; and [Ljava.net.URI; are in module java.base of loader 'bootstrap')
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.resolveUris(HiveMetaStoreClient.java:262) [hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:182) [hive-exec-3.1.2.jar:3.1.2]

I read through the documentation, but I'm not sure what is necessary to run all of this locally from the IDE (and not inside a dedicated Flink cluster, with the dependencies added via the lib folder etc.).

It would be great if you could give me a hint about what I'm missing here or doing wrong.



Solution 1:[1]

Note that the CATALOG represents the Iceberg tables' directory and is not part of Hive itself. Creating a catalog does not leave anything in the Hive metastore.

But when you use Iceberg Flink SQL such as CREATE DATABASE iceberg_db to create a database in this Hive catalog, you'll see it in the Hive metastore as well.

In the same way, when you create a table using the Hive catalog and look at it with Hive's DESCRIBE FORMATTED, you'll find a table property named table_type with the value ICEBERG.
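
To make this concrete, here is a small sketch continuing from the catalog created in the question (iceberg_db and sample are illustrative names, not from the original post):

// Creating a database through the Iceberg catalog also registers it
// in the Hive metastore.
tEnv.executeSql("CREATE DATABASE IF NOT EXISTS iceberg_catalog.iceberg_db");

// The same holds for tables created through the catalog.
tEnv.executeSql(String.join("\n",
        "CREATE TABLE IF NOT EXISTS iceberg_catalog.iceberg_db.sample (",
        "  id BIGINT,",
        "  data STRING",
        ")"));

// In a Hive session against the same metastore you can then run
//   DESCRIBE FORMATTED iceberg_db.sample;
// and the table properties will include table_type=ICEBERG.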

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: liliwei