How can I show Hive tables using PySpark?

Hello, I created a Spark HDInsight cluster on Azure and I'm trying to read Hive tables with PySpark, but the problem is that it shows me only the default database.

Does anyone have an idea?



Solution 1:[1]

If you have created tables in other databases, try show tables from database_name. Replace database_name with the actual name.
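A minimal sketch of what this looks like from PySpark (the database name my_db is a placeholder for your own):

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("show databases").show()          # list every database Spark can see
spark.sql("show tables from my_db").show()  # list tables in a specific database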

Solution 2:[2]

You are missing the Hive metastore details in your SparkSession. If you haven't added any, Spark will create and use its own default database to run Spark SQL.

If you have already added configuration details for spark.sql.warehouse.dir and spark.hadoop.hive.metastore.uris in the Spark default conf file, then add enableHiveSupport() while creating the SparkSession.

Otherwise, add the configuration details while creating the SparkSession:

.config("spark.sql.warehouse.dir","/user/hive/warehouse")
.config("hive.metastore.uris","thrift://localhost:9083")
.enableHiveSupport()
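Once the session is created this way, you can check that Spark is talking to the Hive metastore rather than its own catalog; your_database below is a placeholder:

spark.catalog.listDatabases()                        # should now list Hive databases, not just "default"
spark.sql("show tables from your_database").show()   # your_database is a placeholder name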

Solution 3:[3]

If you are using HDInsight 4.0, Spark and Hive no longer share metadata.

By default you will not see Hive tables from PySpark; this is a problem I describe in this post: How to save/update a table in Hive so that it is readable from Spark.

Still, here are some things you can try:

  1. If you only want to test on the head node, you can change hive-site.xml: set the property "metastore.catalog.default" to the value hive, then open pyspark from the command line.
  2. If you want to apply the change to all cluster nodes, it needs to be made in Ambari:
    • Log in to Ambari as admin
    • Go to Spark2 > Configs > hive-site-override
    • Again, update the property "metastore.catalog.default" to the value hive
    • Restart everything required from the Ambari panel

These changes make the Hive metastore catalog the default. You will now be able to see Hive databases and tables, but depending on the table structure, you may not see the table data properly.
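As a per-session alternative to editing hive-site.xml, the same property can usually be passed through Spark's Hadoop configuration. This is a sketch, not something confirmed by the answer above, and it assumes your cluster accepts the property via the spark.hadoop. prefix:

from pyspark.sql import SparkSession

# "metastore.catalog.default" is the same property changed in hive-site.xml above;
# the "spark.hadoop." prefix forwards it into the Hive/Hadoop configuration.
spark = (SparkSession.builder
    .config("spark.hadoop.metastore.catalog.default", "hive")
    .enableHiveSupport()
    .getOrCreate())

spark.sql("show databases").show()  # should now include the Hive catalog's databases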

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: 过过招
Solution 2: Yukeshkumar
Solution 3: Renato Aguiar