'Output is not showing, spark scala
Output is showing the schema, but output of sql query is not visible. I dont understand where I am doing wrong.
object ex_1 {
def parseLine(line:String): (String, String, Int, Int) = {
val fields = line.split(" ")
val project_code = fields(0)
val project_title = fields(1)
val page_hits = fields(2).toInt
val page_size = fields(3).toInt
(project_code, project_title, page_hits, page_size)
}
def main(args: Array[String]): Unit = {
Logger.getLogger("org").setLevel(Level.ERROR)
val sc = new SparkContext("local[*]", "Weblogs")
val lines = sc.textFile("F:/Downloads_F/pagecounts.out")
val parsedLines = lines.map(parseLine)
println("hello")
val spark = SparkSession
.builder
.master("local")
.getOrCreate
import spark.implicits._
val RDD1 = parsedLines.toDF("project","page","pagehits","pagesize")
RDD1.printSchema()
RDD1.createOrReplaceTempView("logs")
val min1 = spark.sql("SELECT * FROM logs WHERE pagesize >= 4733")
val results = min1.collect()
results.foreach(println)
println("bye")
spark.stop()
}
}
Solution 1:[1]
As confirmed in the comments, using the show method displays the result of spark.sql(..).
Since spark.sql returns a DataFrame, calling show is the ideal way to display the data. Where you where calling collect, previously, is not advised:
Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.
..
..
val min1 = spark.sql("SELECT * FROM logs WHERE pagesize >= 4733")
// where `false` prevents the output from being truncated.
min1.show(false)
println("bye")
spark.stop()
Even if your DataFrame is empty you will still see a table output including the column names (i.e: the schema); whereas .collect() and println would print nothing in this scenario.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | tjheslin1 |
