PySpark: Parquet tables visible in SQL?

I am fairly new to PySpark/Hive and I have a problem:

I have a dataframe and want to write it as a partitioned table to HDFS. So far, I've done that via:

spark.sql('''
          CREATE EXTERNAL TABLE database.df(
            ID             STRING
          )
          PARTITIONED BY (
            DATA_DATE_PART STRING
          )
          STORED AS PARQUET
          LOCATION 'hdfs://path/file'
          ''')
df.createOrReplaceTempView("df")
spark.sql('''
          INSERT INTO database.df PARTITION(DATA_DATE_PART = '{}')
          SELECT ID
          FROM df
          '''.format(date))

But as the dataframe grows, having to define every column by hand becomes tedious, so I thought there must be a better solution:

df.write.mode('overwrite').partitionBy('DATA_DATE_PART').parquet("/path/file")

However, I cannot access a table written this way via spark.sql, nor does it show up in my HUE browser. I can see the files via the HDFS shell (hdfs dfs -ls /path/). So my question is: why is that? I've read that Parquet files can be special when read with SQL, but my first script works fine and the tables are visible everywhere.
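One likely explanation, as a hedged sketch: df.write.parquet() only writes files to HDFS, it never registers the table in the Hive metastore, which is what spark.sql and HUE consult. DataFrameWriter.saveAsTable() does both, and it derives the schema from the DataFrame so you don't have to declare every column. A minimal local example (the table name df_demo and the sample data are made up for illustration; the question's table would be database.df):

    # Sketch: saveAsTable writes partitioned Parquet AND registers the
    # table in the catalog/metastore, making it visible to spark.sql.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("demo").getOrCreate()

    df = spark.createDataFrame(
        [("a1", "2020-01-01"), ("a2", "2020-01-02")],
        ["ID", "DATA_DATE_PART"],
    )

    # Unlike .parquet("/path/file"), this call records the table's
    # schema and partitioning in the catalog as well as writing the files.
    (df.write
       .mode("overwrite")
       .partitionBy("DATA_DATE_PART")
       .format("parquet")
       .saveAsTable("df_demo"))

    # The table is now queryable through SQL:
    result = spark.sql(
        "SELECT ID FROM df_demo WHERE DATA_DATE_PART = '2020-01-01'"
    ).collect()

Note this is only a sketch run against a local session; on a cluster, visibility in HUE additionally requires that Spark is configured to use the shared Hive metastore rather than its own local one.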



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
