PySpark: AttributeError: 'DataFrame' object has no attribute 'forEach'

I was trying to read data from HDFS and iterate through each row to run an analysis on column _c1.

import findspark
findspark.init('/location/spark')
import pyspark
from pyspark import SparkContext
sc = SparkContext()
from pyspark.sql import SQLContext
sql = SQLContext(sc)

df = sql.read.csv('hdfs://namenode:9000/data.csv', header=False, inferSchema=True)
df.show()  # works
df.forEach(lambda row: some_analyzer(row['_c1']))  # here is the error

But I am getting the error "AttributeError: 'DataFrame' object has no attribute 'forEach'".

I am new to PySpark, so any help would be appreciated.



Solution 1:[1]

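It should be foreach, all in lower case. Python attribute names are case-sensitive, and the PySpark DataFrame method is named foreach, not forEach.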

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Vaibhav Jadhav