Not able to find spark-warehouse directory
I am a beginner in Spark Programming. I was practicing the below program:
```scala
package practice.spark.examples

import org.apache.log4j.Logger
import org.apache.spark.sql.{SaveMode, SparkSession}

object SparkSQLTableDemo extends Serializable {
  @transient lazy val logger: Logger = Logger.getLogger(getClass.getName)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSQLTableDemo")
      .master("local[3]")
      .enableHiveSupport() // to allow connectivity to a persistent Hive metastore
      .getOrCreate()

    val flightTimeParquetDF = spark.read
      .format("parquet")
      .option("path", "data/flight*.parquet")
      .load()

    spark.sql("CREATE DATABASE IF NOT EXISTS AIRLINE_DB")
    spark.catalog.setCurrentDatabase("AIRLINE_DB")

    flightTimeParquetDF.write
      .mode(SaveMode.Overwrite)
      .partitionBy("ORIGIN", "OP_CARRIER")
      .saveAsTable("flight_data_tbl")

    spark.catalog.listTables("AIRLINE_DB").show()
    //spark.sql("Select * from flight_data_tbl limit 5").show()
    logger.info("Finished.")
    spark.stop()
  }
}
```
After running this code, two directories should be created, right? 1. metastore_db 2. spark-warehouse
But in the current directory, only metastore_db was created; my spark-warehouse folder was created under the Desktop/ location instead.
I tried googling how to set the spark-warehouse directory but was very confused. Also, every time I run the code I get the warnings below, and I thought they might have something to do with the problem.
22/02/12 21:30:22 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
22/02/12 21:30:22 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
22/02/12 21:30:31 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
22/02/12 21:30:33 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
22/02/12 21:30:33 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
22/02/12 21:30:33 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
22/02/12 21:31:02 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
22/02/12 21:31:02 WARN HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist
22/02/12 21:31:02 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
22/02/12 21:31:02 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
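For reference, the warehouse location a session actually resolved can be checked at runtime with the standard `spark.conf` API; a small check (this line would go right after `getOrCreate()` in the program above):

```scala
// Print where this SparkSession will create managed tables; with no explicit
// spark.sql.warehouse.dir, Spark defaults to <working directory>/spark-warehouse.
println(spark.conf.get("spark.sql.warehouse.dir"))
```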
Can someone please guide me on how to set the warehouse path to my current practice directory, i.e., alongside metastore_db?
Thank you in advance :)
Solution 1:[1]
You can use the example below to specify the warehouse location (`spark.sql.warehouse.dir`), which controls where managed databases and tables are stored.
```scala
import java.io.File

import org.apache.spark.sql.{Row, SaveMode, SparkSession}

case class Record(key: Int, value: String)

// warehouseLocation points to the default location for managed databases and tables
val warehouseLocation = new File("spark-warehouse").getAbsolutePath

val spark = SparkSession
  .builder()
  .appName("Spark Hive Example")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()
```
For details, please refer to https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
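Applied to the program in the question, this might look like the sketch below (the object name is illustrative, not from the original post). Two caveats: `spark.sql.warehouse.dir` is a static configuration, so it must be set before the first SparkSession is created, and `new File("spark-warehouse").getAbsolutePath` resolves against the JVM's working directory.

```scala
import java.io.File
import org.apache.spark.sql.SparkSession

object SparkSQLTableDemoWithWarehouse {
  def main(args: Array[String]): Unit = {
    // Resolve spark-warehouse against the working directory, so managed
    // tables land next to metastore_db in the practice project folder.
    val warehouseLocation = new File("spark-warehouse").getAbsolutePath

    val spark = SparkSession.builder()
      .appName("SparkSQLTableDemo")
      .master("local[3]")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()

    // Confirm the location that was picked up.
    println(spark.conf.get("spark.sql.warehouse.dir"))
    spark.stop()
  }
}
```

If spark-warehouse keeps appearing on the Desktop, the run configuration's working directory (for example, in the IDE) most likely points there; passing an absolute path as above makes the location explicit regardless of where the application is launched from.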
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Warren Zhu |
