pyspark error: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD

When I tried to run a search against Elasticsearch from Spark, an error occurred. The code I am using is the following:

from pyspark import SparkConf, SparkContext


# Query: documents of type TABLA1 whose @timestamp falls in a one-day window
body2 = """{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "@timestamp": {
                            "lte": "2022-05-05T09:25:15.000-03:00",
                            "gte": "2022-05-04T09:25:15.000-03:00"
                        }
                    }
                },
                {
                    "match": {
                        "type.keyword": "TABLA1"
                    }
                }
            ]
        }
    },
    "size":10
}"""

# Reuse an existing SparkContext or create one with this configuration
sconf = SparkConf().setAppName("Test")
sc = SparkContext.getOrCreate(sconf)

# Connection, credentials, and query settings for elasticsearch-hadoop
es_read_conf = {
    "es.nodes": "10.48.17.67",
    "es.port": "9200",
    "es.query": body2,
    "es.resource" : "Incice1/TABLA1",
    "es.nodes.wan.only": "true",
    "es.net.http.auth.user": "bocx", 
    "es.net.http.auth.pass": "9bhJFa6S5AM52SxS"
}

# Read the matching documents from Elasticsearch as an RDD
es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable", 
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable", 
    conf=es_read_conf)
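
For context, when the read succeeds, newAPIHadoopRDD returns an RDD of (document id, field map) pairs. A minimal sketch of how the result could be inspected once the connector jar is on the classpath:

# Sketch: each element is a (doc_id, fields) pair produced by EsInputFormat
first = es_rdd.take(1)
print(first)
print("matched documents:", es_rdd.count())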

The error that I get when I execute the code in pyspark:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/context.py", line 859, in newAPIHadoopRDD
    jrdd = self._jvm.PythonRDD.newAPIHadoopRDD(self._jsc, inputFormatClass, keyClass,
  File "/opt/spark/python/lib/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/opt/spark/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/opt/spark/python/lib/py4j-0.10.9.3-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.LinkedMapWritable
        at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Class.java:398)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:216)

(...)

The same query returns results in the Dev Tools console of Elasticsearch, but when I run the search from PySpark I get this error. Thanks in advance for your help.
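
Note: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.LinkedMapWritable usually means the elasticsearch-hadoop connector jar is not on the Spark classpath, since that class lives in the connector itself. A minimal sketch of one way to attach it before the SparkContext is created (the 8.2.0 version below is an assumption; match it to your Elasticsearch version):

from pyspark import SparkConf, SparkContext

# Sketch: resolve the connector at startup; coordinates/version are assumptions
sconf = SparkConf().setAppName("Test")
sconf.set("spark.jars.packages", "org.elasticsearch:elasticsearch-hadoop:8.2.0")
# Or point Spark at a locally downloaded copy of the jar:
# sconf.set("spark.jars", "/path/to/elasticsearch-hadoop-8.2.0.jar")
sc = SparkContext.getOrCreate(sconf)

The same effect can be achieved by launching the shell with pyspark --jars /path/to/elasticsearch-hadoop-8.2.0.jar. In an interactive pyspark session a SparkContext already exists, so the configuration must be applied at launch rather than in the script.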



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
