PySpark pandas partition issues
I am using pyspark.pandas to read a partitioned parquet file. However, when I check the plan with `df.spark.explain()`, Spark warns that no partition is defined. `ps.read_parquet` does not seem to have an option to specify the partitioning explicitly, and somehow it is also not inferring the partitioning from the parquet file automatically.
```python
import pyspark.pandas as ps

df = ps.read_parquet(path_to_partitioned_file)
df.spark.explain()
```
```
22/05/19 21:06:27 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
```
Solution 1:[1]
You need to install binutils, as `ld` (the GNU linker) is shipped in the binutils package.

On Kali, run the following in the terminal:

```shell
sudo apt-get install binutils
```
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Fritz Bester |
