pyspark error: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD
When I try to query Elasticsearch from Spark, an error occurs. The code I use is the following:
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark import SparkConf
from pyspark.sql import SparkSession
body2="""{
"query": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"lte": "2022-05-05T09:25:15.000-03:00",
"gte": "2022-05-04T09:25:15.000-03:00"
}
}
},
{
"match": {
"type.keyword": "TABLA1"
}
}
]
}
},
"size":10
}"""
sconf = SparkConf().setAppName("Test")
sc = SparkContext.getOrCreate(sconf);
es_read_conf = {
"es.nodes": "10.48.17.67",
"es.port": "9200",
"es.query": body2,
"es.resource" : "Incice1/TABLA1",
"es.nodes.wan.only": "true",
"es.net.http.auth.user": "bocx",
"es.net.http.auth.pass": "9bhJFa6S5AM52SxS"
}
es_rdd = sc.newAPIHadoopRDD(
inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
keyClass="org.apache.hadoop.io.NullWritable",
valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
conf=es_read_conf)
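For reference, when this call succeeds the connector yields (document id, field dictionary) pairs; a minimal sketch of how one would inspect the first hit, assuming the query matches at least one document:

# Inspect the first (doc_id, fields) pair produced by EsInputFormat;
# assumes the query above matches at least one document.
first_id, first_doc = es_rdd.first()
print(first_id)   # Elasticsearch _id of the document
print(first_doc)  # dict of the document's fields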
The error that I get when I execute the code in pyspark:
...     inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
...     keyClass="org.apache.hadoop.io.NullWritable",
...     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
...     conf=es_read_conf)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/python/pyspark/context.py", line 859, in newAPIHadoopRDD
jrdd = self._jvm.PythonRDD.newAPIHadoopRDD(self._jsc, inputFormatClass, keyClass,
File "/opt/spark/python/lib/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1321, in __call__
File "/opt/spark/python/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/opt/spark/python/lib/py4j-0.10.9.3-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.LinkedMapWritable
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
(...)
The same query returns results in the Elasticsearch Dev Tools console, but when I run the search from pyspark I get this error. Thanks in advance for any help.
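The java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.LinkedMapWritable in the traceback indicates that the elasticsearch-hadoop connector jar is not on Spark's classpath. A minimal sketch of one common way to supply it, assuming the jar has been downloaded locally (the path and version below are assumptions, not from the original post):

from pyspark import SparkConf, SparkContext

sconf = (
    SparkConf()
    .setAppName("Test")
    # Ship the connector jar to the driver and executors so that
    # org.elasticsearch.hadoop.mr.LinkedMapWritable can be resolved.
    # Path and version are assumptions; use the jar matching your cluster.
    .set("spark.jars", "/opt/jars/elasticsearch-hadoop-7.17.3.jar")
)
sc = SparkContext.getOrCreate(sconf)

Equivalently, the jar can be passed on the command line with pyspark --jars /path/to/elasticsearch-hadoop-<version>.jar before running the code above.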
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow