Can't open GeoJSON with Python Sedona GeoJsonReader
I am using Apache Sedona from Python to open a GeoJSON file. I followed this guide step by step; for the sake of clarity, this is what I did:
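from pyspark.sql import SparkSession
from sedona.core.formatMapper import GeoJsonReader
from sedona.register import SedonaRegistrator
from sedona.utils import KryoSerializer, SedonaKryoRegistrator
from sedona.utils.adapter import Adapter

# Local Spark session with Sedona's Kryo serializer and the Sedona jars
spark = SparkSession.\
    builder.\
    master("local[*]").\
    appName("Sedona App").\
    config("spark.serializer", KryoSerializer.getName).\
    config("spark.kryo.registrator", SedonaKryoRegistrator.getName).\
    config("spark.jars.packages", "org.apache.sedona:sedona-python-adapter-3.0_2.12:1.1.0-incubating,org.datasyslab:geotools-wrapper:1.1.0-25.2").\
    getOrCreate()
SedonaRegistrator.registerAll(spark)
sc = spark.sparkContext
amenity_file = 'example/2/amenity.geojson'
geojson_file = GeoJsonReader.readToGeometryRDD(sc, amenity_file)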
The last line prints this warning:
22/03/25 16:52:17 WARN FormatMapper: [Sedona] The GeoJSON file doesn't have feature properties
However, I continued with the following line (just like in the example):
Adapter.toDf(geojson_file, spark).show()
But I got an error:
22/03/25 16:52:26 ERROR FormatMapper: [Sedona] com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"{"; line: 1, column: 1])
at [Source: (String)"{"; line: 1, column: 3]
22/03/25 16:52:26 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 8)
java.lang.RuntimeException: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"{"; line: 1, column: 1])
at [Source: (String)"{"; line: 1, column: 3]
at org.wololo.geojson.GeoJSONFactory.create(GeoJSONFactory.java:31)
at org.wololo.jts2geojson.GeoJSONReader.read(GeoJSONReader.java:20)
at org.wololo.jts2geojson.GeoJSONReader.read(GeoJSONReader.java:16)
at org.apache.sedona.core.formatMapper.FormatMapper.readGeoJSON(FormatMapper.java:206)
at org.apache.sedona.core.formatMapper.FormatMapper.readGeometry(FormatMapper.java:304)
at org.apache.sedona.core.formatMapper.FormatMapper.call(FormatMapper.java:377)
at org.apache.sedona.core.formatMapper.FormatMapper.call(FormatMapper.java:52)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitions$1(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:837)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:837)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:462)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"{"; line: 1, column: 1])
at [Source: (String)"{"; line: 1, column: 3]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:664)
at com.fasterxml.jackson.core.base.ParserBase._handleEOF(ParserBase.java:486)
at com.fasterxml.jackson.core.base.ParserBase._eofAsNextChar(ParserBase.java:498)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:2354)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextFieldName(ReaderBasedJsonParser.java:905)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:249)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:68)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4254)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2711)
at org.wololo.geojson.GeoJSONFactory.create(GeoJSONFactory.java:21)
... 32 more
22/03/25 16:52:26 WARN TaskSetManager: Lost task 0.0 in stage 8.0 (TID 8, works-mbp.lan, executor driver): java.lang.RuntimeException: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"{"; line: 1, column: 1])
at [Source: (String)"{"; line: 1, column: 3]
at org.wololo.geojson.GeoJSONFactory.create(GeoJSONFactory.java:31)
at org.wololo.jts2geojson.GeoJSONReader.read(GeoJSONReader.java:20)
at org.wololo.jts2geojson.GeoJSONReader.read(GeoJSONReader.java:16)
at org.apache.sedona.core.formatMapper.FormatMapper.readGeoJSON(FormatMapper.java:206)
at org.apache.sedona.core.formatMapper.FormatMapper.readGeometry(FormatMapper.java:304)
at org.apache.sedona.core.formatMapper.FormatMapper.call(FormatMapper.java:377)
at org.apache.sedona.core.formatMapper.FormatMapper.call(FormatMapper.java:52)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitions$1(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:837)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:837)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:462)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (String)"{"; line: 1, column: 1])
at [Source: (String)"{"; line: 1, column: 3]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:664)
at com.fasterxml.jackson.core.base.ParserBase._handleEOF(ParserBase.java:486)
at com.fasterxml.jackson.core.base.ParserBase._eofAsNextChar(ParserBase.java:498)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:2354)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextFieldName(ReaderBasedJsonParser.java:905)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:249)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:68)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4254)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2711)
at org.wololo.geojson.GeoJSONFactory.create(GeoJSONFactory.java:21)
... 32 more
22/03/25 16:52:26 ERROR TaskSetManager: Task 0 in stage 8.0 failed 1 times; aborting job
EDIT: The file does contain geometries, but each one sits inside a struct within the "features" array. This is how it looks:
{
  "type": "FeatureCollection",
  "name": "amenity",
  "features": [
    {
      "type": "Feature",
      "feature_type": "amenity",
      "id": "1231312323f",
      "properties": {
        "accessibility": null,
        "address_id": "1231232312",
        "alt_name": null,
        "category": "elevator",
        "correlation_id": null,
        "hours": null,
        "name": null,
        "phone": null,
        "unit_ids": [
          "1232312",
          "123212"
        ],
        "website": null
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          -121.8888997,
          37.3285715
        ]
      }
    }
  ]
}
UPDATE:
I also tried making up a very simple GeoJSON, and it still throws the same error, so I believe this has nothing to do with the GeoJsonReader:
{
  "type": "Point",
  "coordinates": [
    -105.01621,
    39.57422
  ]
}
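The "[Source: (String)"{"; line: 1, column: 3]" part of the trace hints that the parser was handed a lone "{", as if each line of the file were treated as its own document. A rough way to check a file for this, using only Python's standard json module, is sketched below. The function name is hypothetical, and the line-by-line behaviour is an assumption drawn from the trace, not from Sedona's documentation.

```python
import json


def first_unparseable_line(text):
    """Return (line_number, line) for the first non-blank line that is not a
    complete JSON document on its own, or None if every line parses."""
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return number, line
    return None


# The made-up Point from above, once pretty-printed and once on a single line.
pretty = '{\n"type": "Point",\n"coordinates": [\n-105.01621,\n39.57422\n]\n}'
compact = '{"type":"Point","coordinates":[-105.01621,39.57422]}'

print(first_unparseable_line(pretty))   # (1, '{')
print(first_unparseable_line(compact))  # None
```

The pretty-printed version fails on its very first line with the same lone "{" that appears in the error, while the single-line version parses cleanly.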
Solution 1:[1]
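As @Paul H pointed out, the issue was related to the file format. This was surprising, as the file was an IMDF file verified by Apple; however, the GeoJsonReader treats it as corrupt. The trace shows the parser receiving only "{" (line 1, column 3), which suggests the reader parses each line of the file as a separate GeoJSON document, so a pretty-printed, multi-line file fails on its first line. To solve the issue, extract the entries under the "features" key and give those to the reader instead of the whole file.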
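A minimal sketch of that filtering step, assuming the reader wants one complete GeoJSON object per line (an inference from the error trace, not something stated in the answer). The function names and the output path are hypothetical, and only the standard json module is used.

```python
import json


def feature_collection_to_lines(collection):
    """Return one compact, single-line JSON string per feature.

    A dict that is not a FeatureCollection (a bare geometry or a single
    Feature) is simply compacted onto one line.
    """
    if collection.get("type") != "FeatureCollection":
        return [json.dumps(collection, separators=(",", ":"))]
    return [
        json.dumps(feature, separators=(",", ":"))
        for feature in collection["features"]
    ]


def rewrite_as_line_delimited(src_path, dst_path):
    """Read a pretty-printed GeoJSON file and write one feature per line."""
    with open(src_path, encoding="utf-8") as src:
        collection = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in feature_collection_to_lines(collection):
            dst.write(line + "\n")


# Small in-memory check with a trimmed version of the amenity feature.
sample = {
    "type": "FeatureCollection",
    "name": "amenity",
    "features": [
        {
            "type": "Feature",
            "properties": {"category": "elevator"},
            "geometry": {
                "type": "Point",
                "coordinates": [-121.8888997, 37.3285715],
            },
        }
    ],
}
lines = feature_collection_to_lines(sample)
print(len(lines))  # 1
```

After calling rewrite_as_line_delimited('example/2/amenity.geojson', 'example/2/amenity_lines.geojson'), where the second path is made up for illustration, the new file could be passed to GeoJsonReader.readToGeometryRDD in place of the original.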
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | user18140022 |
