Odd number of partitions when reading data into a Spark RDD
I am working with 3 GB of data that I have read into a Spark RDD:
rdd = sc.textFile("data.json")
When I call rdd.getNumPartitions(), the number of partitions is 99! It is really odd.
Even if I use sc.textFile("data.json", 20), there are again 99 partitions! Also, I cannot change the number of partitions with rdd.repartition() or rdd.coalesce(); the count stays at 99.
I am really confused and I do not know why my data is split into 99 partitions for no apparent reason. Please advise.
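
For context, here is a minimal PySpark sketch of the behavior described above (the local SparkContext setup and app name are assumptions for illustration). It rests on two points: textFile's second argument, minPartitions, is only a lower bound, and repartition()/coalesce() return new RDDs rather than mutating the original, so the result must be captured:

from pyspark import SparkContext

# Hypothetical local setup; the question's environment is not specified.
sc = SparkContext("local[*]", "partition-demo")

# The second argument to textFile is minPartitions: a lower bound, not
# an exact count, so the input's split layout can still yield more (e.g. 99).
rdd = sc.textFile("data.json", 20)
print(rdd.getNumPartitions())

# repartition() returns a NEW RDD; RDDs are immutable, so calling it
# without capturing the result leaves the original partitioning untouched.
repartitioned = rdd.repartition(20)
print(repartitioned.getNumPartitions())  # 20

If the original code called rdd.repartition(20) without assigning the returned RDD, that alone would make the count appear stuck at 99.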
Source: Stack Overflow, licensed under CC BY-SA 3.0.