ValueError: 'columnC' is not in list

I have a DataFrame from which I select 2 columns:

listDF = df.select('columnA', 'columnB')

But when I try to get the number of records, or even just show them, I always get an error saying 'columnC' is not in list:

count = listDF.count()
# count = listDF.shape[0]  # shape is pandas-only, not a Spark DataFrame attribute
# any other option to get the count fails the same way

# listDF.show(1)
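If I understand correctly (so treat this as my assumption, not a verified fact): select() is lazy and only builds a query plan, so the failing column lookup doesn't run until an action like count() or show() forces execution. At that point PySpark resolves field names against the schema's ordered list of field names with a plain list.index() call, which is exactly where the "is not in list" wording comes from. A minimal plain-Python sketch of that lookup, no Spark required (the field names just mirror my columns):

```python
# Plain-Python sketch of PySpark's StructType field lookup
# (pyspark/sql/types.py __getitem__): the schema keeps an ordered
# list of field names, and access by name is a list.index() call.
schema_fields = ['columnA', 'columnB']  # what listDF's schema contains

def get_field_index(fields, name):
    # list.index raises ValueError("'<name>' is not in list") when missing
    return fields.index(name)

print(get_field_index(schema_fields, 'columnB'))  # 1

try:
    get_field_index(schema_fields, 'columnC')
except ValueError as err:
    print(err)  # 'columnC' is not in list
```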

Py4JJavaError: An error occurred while calling o1031.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 616.0 failed 4 times....
....

File "/mnt1/yarn/usercache/hadoop/appcache/.../pyspark.zip/pyspark/sql/types.py", line 1543, in __getitem__
    idx = self.__fields__.index(item)
ValueError: 'columnC' is not in list

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
...
File "/mnt1/yarn/usercache/hadoop/appcache/.../serializers.py", line 379, in dump_stream
    vs = list(itertools.islice(iterator, batch))
File "/mnt1/yarn/usercache/hadoop/appcache/.../util.py", line 55, in wrapper
    return f(*args, **kwargs)
File "/usr/lib/spark/python/pyspark/sql/session.py", line 673, in prepare
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1421, in verify
File "/usr/lib/spark/python/pyspark/sql/types.py", line 1395, in verify_struct
File "/mnt1/yarn/usercache/hadoop/appcache/.../pyspark.zip/pyspark/sql/types.py", line 1548, in __getitem__
    raise ValueError(item)
ValueError: columnC
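My reading of the traceback (again an assumption on my part, not verified against the Spark source I'm running): the failure is inside dump_stream → verify → verify_struct, i.e. Spark is verifying Python rows against a schema while executing, and verify_struct looks up each schema field name in the row, re-raising the bare ValueError(item) that shows up as the final "ValueError: columnC". A plain-Python approximation of that step (function name and shapes are illustrative, not the actual Spark internals):

```python
# Hedged approximation of the verify_struct step in the traceback:
# every field named in the schema must be resolvable in the row,
# and a missing field surfaces as ValueError(<field name>).
def verify_struct(schema_field_names, row):
    for name in schema_field_names:
        if name not in row:
            # mirrors `raise ValueError(item)` at types.py line 1548
            raise ValueError(name)

row = {'columnA': 1, 'columnB': 2}          # data actually present
schema = ['columnA', 'columnB', 'columnC']  # schema being verified against

try:
    verify_struct(schema, row)
except ValueError as err:
    print(err)  # columnC
```

If that reading is right, it would mean the schema Spark is checking against still mentions columnC even though my select() only kept columnA and columnB.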

What am I doing wrong?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
