Encountering an Error Converting an RDD to a DataFrame in PySpark

I am trying to convert an RDD into a DataFrame. The conversion itself seems to succeed, but when I try to count the number of elements in the DataFrame I get an error. Showing the first rows with .show() works without problems, but calling .collect() on the DataFrame also fails. This is my code:

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql.functions import col

sc = SparkContext(appName = 'ANALYSIS', master = 'local')

rdd = sc.textFile('file.csv')
# `header` was not defined in the posted code; assumed here to be the first line of the file
header = rdd.first()
rdd = rdd.filter(lambda line: line != header)
# Split from the right into at most 7 fields, so commas in the first column are preserved
rdd = rdd.map(lambda line: line.rsplit(',', 6))

spark = SparkSession.builder \
    .master("local[*]") \
    .appName("ANALYSIS") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

feature = ['to_drop','watched','watching','wantwatch','dropped','rating','votes']

df = spark.createDataFrame(rdd, schema = feature)

rdd.collect()  # works
df.show()      # works
df.count()     # does not work

Can someone tell me what I am doing wrong? Thanks.

The error I encounter during execution is the following:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-15-3c9a60fd698f> in <module>
----> 1 df.count()

/opt/conda/lib/python3.8/site-packages/pyspark/sql/dataframe.py in count(self)
    662         2
    663         """
--> 664         return int(self._jdf.count())
    665 
    666     def collect(self):

/opt/conda/lib/python3.8/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1302 
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306 

/opt/conda/lib/python3.8/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
    109     def deco(*a, **kw):
    110         try:
--> 111             return f(*a, **kw)
    112         except py4j.protocol.Py4JJavaError as e:
    113             converted = convert_exception(e.java_exception)

/opt/conda/lib/python3.8/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)
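
The traceback above is cut off before the underlying Java exception message, so the actual cause cannot be read from it. One possible explanation, offered only as an assumption, is that some lines in file.csv do not split into exactly seven fields. Spark evaluates lazily: .show() only needs the first rows, while .count() and .collect() on the DataFrame process every row, so a malformed row further down the file would only surface then. The sketch below reproduces that pattern in plain Python, without Spark. The sample lines, the parse helper and the EXPECTED_FIELDS constant are invented for illustration and are not taken from the real file.

```python
from itertools import islice

EXPECTED_FIELDS = 7  # length of the `feature` list in the question

# Hypothetical sample lines; the real contents of file.csv are not shown.
lines = [
    "Some Title,10,20,30,40,8.5,100",
    "Title, with a comma,1,2,3,4,7.0,50",
    "",                # blank line, e.g. at the end of the file
    "Broken row,1,2",  # fewer than six commas
]


def parse(line):
    """Split like the question does and reject rows of the wrong width."""
    fields = line.rsplit(",", 6)
    if len(fields) != EXPECTED_FIELDS:
        raise ValueError(
            f"expected {EXPECTED_FIELDS} fields, got {len(fields)}: {line!r}"
        )
    return fields


# Lazy pipeline: nothing is parsed until rows are requested.
rows = (parse(line) for line in lines)

# Analogous to show(): only the first rows are evaluated, so this succeeds.
first_rows = list(islice(rows, 2))

# Analogous to count(): every row is evaluated, so the bad row surfaces here.
error_message = None
try:
    total = sum(1 for _ in (parse(line) for line in lines))
except ValueError as exc:
    error_message = str(exc)

# Diagnostic: list the index and content of rows whose field count is wrong.
bad_rows = [
    (i, line)
    for i, line in enumerate(lines)
    if len(line.rsplit(",", 6)) != EXPECTED_FIELDS
]
print(error_message)
print(bad_rows)
```

On the real data, a comparable check could be run on the RDD before building the DataFrame, for example rdd.filter(lambda r: len(r) != 7).take(5). If that returns any rows, filtering them out or fixing the file would be the next thing to try.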


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow