Not able to find spark-warehouse directory

I am a beginner in Spark programming. I was practicing the program below:

package practice.spark.examples

import org.apache.log4j.Logger
import org.apache.spark.sql.{SaveMode, SparkSession}

object SparkSQLTableDemo extends Serializable {
  @transient lazy val logger: Logger = Logger.getLogger(getClass.getName)

  def main(args: Array[String]): Unit = {

    val spark = SparkSession.builder()
      .appName("SparkSQLTableDemo")
      .master("local[3]")
      .enableHiveSupport()  // to allow the connectivity to a persistent Hive Metastore.
      .getOrCreate()

    val flightTimeParquetDF = spark.read
      .format("parquet")
      .option("path", "data/flight*.parquet")
      .load()

    spark.sql("CREATE DATABASE IF NOT EXISTS AIRLINE_DB")
    spark.catalog.setCurrentDatabase("AIRLINE_DB")

    flightTimeParquetDF.write
      .mode(SaveMode.Overwrite)
      .partitionBy("ORIGIN", "OP_CARRIER")
      .saveAsTable("flight_data_tbl")

    spark.catalog.listTables("AIRLINE_DB").show()
    //spark.sql("Select * from flight_data_tbl limit 5").show()

    logger.info("Finished.")
    spark.stop()
  }

}

After running this code, two directories should be created, right? 1: metastore_db 2: spark-warehouse

But in the current directory, only metastore_db was created; my spark-warehouse folder was created in the Desktop/ location. Attaching the pics below.

I tried googling how to set the spark-warehouse directory but was very confused. Also, every time I run the code I get the warnings below. I thought they might have something to do with the problem.

22/02/12 21:30:22 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
22/02/12 21:30:22 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
22/02/12 21:30:31 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
22/02/12 21:30:33 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
22/02/12 21:30:33 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
22/02/12 21:30:33 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
22/02/12 21:31:02 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
22/02/12 21:31:02 WARN HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist
22/02/12 21:31:02 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
22/02/12 21:31:02 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist

Can someone please guide me on how to set the warehouse path to my current practice directory, i.e., next to metastore_db as shown in Image 1?

Thank You in Advance :)



Solution 1:[1]

You can use the example below to specify the warehouse location (spark.sql.warehouse.dir), which controls where spark-warehouse is created.

import java.io.File

import org.apache.spark.sql.{Row, SaveMode, SparkSession}

case class Record(key: Int, value: String)

// warehouseLocation points to the default location for managed databases and tables
val warehouseLocation = new File("spark-warehouse").getAbsolutePath

val spark = SparkSession
  .builder()
  .appName("Spark Hive Example")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

For details, please refer to https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
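To see why the folder ended up on the Desktop in the first place: a relative path like "spark-warehouse" is resolved against the JVM's working directory, i.e. wherever the application was launched from, not against the project directory. A minimal sketch (plain Scala, no Spark needed, with a hypothetical object name) demonstrating that resolution:

```scala
import java.io.File

object WarehousePathDemo {
  def main(args: Array[String]): Unit = {
    // The JVM's working directory: wherever the application was launched from.
    // If you launch the app from Desktop/, this is Desktop/.
    val workingDir = System.getProperty("user.dir")

    // A relative path such as "spark-warehouse" is resolved against that
    // working directory, so the warehouse lands next to wherever you started
    // the JVM, not necessarily next to your project.
    val warehousePath = new File("spark-warehouse").getAbsolutePath

    println(s"working dir:    $workingDir")
    println(s"warehouse path: $warehousePath")
  }
}
```

Passing the resulting absolute path via spark.sql.warehouse.dir, as in the snippet above, pins the warehouse to a fixed location regardless of where the JVM is launched from. Note that spark.sql.warehouse.dir must be set before the first SparkSession is created; it cannot be changed afterwards.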

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source

Solution 1: Warren Zhu