PySpark: AttributeError: 'DataFrame' object has no attribute 'forEach'
I was trying to read data from HDFS and iterate over each row to run an analysis on column _c1.
import findspark
findspark.init('/location/spark')
import pyspark
from pyspark import SparkContext
sc = SparkContext()
from pyspark.sql import SQLContext
sql = SQLContext(sc)
df = sql.read.csv('hdfs://namenode:9000/data.csv', header=False, inferSchema=True)
df.show()  # works
df.forEach(lambda row: some_analyzer(row['_c1']))  # here is the error
But I am getting an "AttributeError: 'DataFrame' object has no attribute 'forEach'" error.
I am new to PySpark and would really appreciate any help.
Solution 1:[1]
It should be foreach, all in lower case. Python attribute names are case-sensitive, so forEach does not exist on DataFrame.
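A minimal local sketch of the intended row-by-row analysis, runnable without a Spark cluster. The rows and the `some_analyzer` function are hypothetical stand-ins (the original question never shows the analyzer); the PySpark equivalent is shown in the comment:

```python
# In PySpark this loop corresponds to:
#     df.foreach(lambda row: some_analyzer(row['_c1']))
# (note the all-lowercase method name).
results = []

def some_analyzer(value):
    # Hypothetical analyzer: here it just records the uppercased value.
    results.append(str(value).upper())

# Stand-in for the DataFrame rows read from HDFS; column names follow
# the default _c0, _c1, ... that Spark assigns when header=False.
rows = [{"_c0": 1, "_c1": "alpha"}, {"_c0": 2, "_c1": "beta"}]
for row in rows:
    some_analyzer(row["_c1"])
```

One caveat worth knowing: in real PySpark, `df.foreach` executes the function on the executors, so side effects like appending to a driver-side list will not be visible on the driver. If you need the analysis results back, use a transformation and collect instead, e.g. `df.rdd.map(lambda row: some_analyzer(row['_c1'])).collect()`.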
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Vaibhav Jadhav |
