Best way to benchmark Spark reading time
What is the best way to benchmark the reading time of Spark?
val rdd = spark.sparkContext.binaryFiles(s"$Path//$partitionColumn=$partitionId/*.avro")
implicit val streamEncoder: Encoder[(String, PortableDataStream)] = Encoders.kryo[(String, PortableDataStream)]
spark.createDataset(rdd)
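A simple baseline, without any library, is wall-clock timing around an action that forces the read. Note that `createDataset` alone is lazy, so an action such as `count()` is needed to trigger the actual I/O. This is a minimal sketch; the `time` helper is a hypothetical name introduced here for illustration:

```scala
// Minimal wall-clock benchmark sketch: forces the read with count()
// and reports elapsed time. Run several iterations to warm up the JVM.
def time[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  val elapsedMs = (System.nanoTime() - start) / 1e6
  println(f"$label took $elapsedMs%.1f ms")
  result
}

val ds = spark.createDataset(rdd) // lazy: no files are read yet
time("avro read")(ds.count())     // count() triggers the actual read
```

Wall-clock timing like this captures driver-side overhead as well as the read itself; for per-task detail, a metrics library (as suggested below) is more precise.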
I use Spark 2.2.
Solution 1
I suggest using this library: https://github.com/LucaCanali/sparkMeasure.
Check the examples available in the README file, such as the linked Databricks notebook.
For instance, you could measure your Avro read using the `runAndMeasure` function:
taskMetrics.runAndMeasure(spark.createDataset(rdd).count())
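As a fuller sketch, assuming sparkMeasure's `TaskMetrics` API and that `rdd` and the Kryo encoder are defined as in the question (check the library's README for the exact API in your version):

```scala
import org.apache.spark.sql.SparkSession
import ch.cern.sparkmeasure.TaskMetrics

val spark = SparkSession.builder().appName("read-benchmark").getOrCreate()

// Instrument the session: TaskMetrics collects per-task metrics for the
// stages executed inside runAndMeasure.
val taskMetrics = TaskMetrics(spark)

// Force a full read with count() so the measurement covers the actual I/O,
// not just lazy plan construction.
taskMetrics.runAndMeasure {
  spark.createDataset(rdd).count()
}

// Print an aggregated report (elapsed time, executor run time, bytes read, ...).
taskMetrics.printReport()
```

Unlike plain wall-clock timing, this breaks the measurement down into Spark's own task metrics, which helps separate scheduling overhead from time actually spent reading.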
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
