ValueError: 'columnC' is not in list
I have a DataFrame from which I select 2 columns:
listDF = df.select('columnA', 'columnB')
But when I try to get the number of records or even show them, I always get an error that 'columnC' is not in list:
count = listDF.count()
# count = listDF.shape[0]  # or any other option to get the count
# listDF.show(1)
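(A note on why select() itself succeeds: Spark transformations are lazy, so select() only builds a plan; the job, and therefore the per-row failure, only happens when an action such as count() or show() runs. A plain-Python sketch of the same effect with a generator, using hypothetical data:)

```python
# A generator expression, like a Spark transformation, does nothing when built;
# the error only surfaces when it is consumed (the analogue of count()/show()).
rows = [{"columnA": 1, "columnB": 2}]                  # no columnC anywhere
selected = ({"columnC": r["columnC"]} for r in rows)   # no error yet: nothing ran
try:
    result = list(selected)                            # "action": rows are read now
except KeyError as e:
    print("failed on consumption:", e)                 # prints: failed on consumption: 'columnC'
```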
Py4JJavaError: An error occurred while calling o1031.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage
616.0 failed 4 times....
....
File "/mnt1/yarn/usercache/hadoop/appcache/.../pyspark.zip/pyspark/sql/types.py", line 1543, in __getitem__
idx = self.__fields__.index(item)
ValueError: 'columnC' is not in list
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
File "/mnt1/yarn/usercache/hadoop/appcache/.../serializers.py", line 379, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/mnt1/yarn/usercache/hadoop/appcache/.../util.py", line 55, in wrapper
return f(*args, **kwargs)
File "/usr/lib/spark/python/pyspark/sql/session.py", line 673, in prepare
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1421, in verify
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1395, in verify_struct
File "/mnt1/yarn/usercache/hadoop/appcache/.../pyspark.zip/pyspark/sql/types.py", line 1548, in __getitem__
raise ValueError(item)
ValueError: columnC
What am I doing wrong?
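For context, the traceback goes through session.py's prepare and types.py's verify_struct, which suggests the rows are being checked against a declared schema that still expects a field named columnC. The __getitem__ behavior shown in the trace can be mimicked with a small stand-in (a hypothetical class, not PySpark's actual code) to see why two ValueErrors appear, the second one without quotes:

```python
class StructTypeSketch:
    """Hypothetical stand-in for the __getitem__ seen in the traceback."""

    def __init__(self, fields):
        self.__fields__ = list(fields)

    def __getitem__(self, item):
        try:
            # list.index() raises ValueError: 'columnC' is not in list
            idx = self.__fields__.index(item)
        except ValueError:
            # re-raised with just the name, hence "ValueError: columnC"
            raise ValueError(item)
        return self.__fields__[idx]

schema = StructTypeSketch(["columnA", "columnB"])
try:
    schema["columnC"]
except ValueError as e:
    print("ValueError:", e)   # prints: ValueError: columnC
```

So the lookup itself is straightforward; the question is why schema verification asks for columnC at all when only columnA and columnB were selected.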
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
