An error occurred while calling o137.partitions. : org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ip
I am trying to run this GitHub project on an AWS EMR Spark cluster: https://github.com/pran4ajith/spark-twitter-streaming.git
I have managed to run the first two scripts:
- tweet_stream_producer.py
- sparkml_train_model.py
But when I run the consumer part with the command
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0,io.delta:delta-core_2.12:0.7.0 tweet_stream_consumer.py
I get a file path error:
Py4JJavaError: An error occurred while calling o137.partitions.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ip-10-0-0-61.ec2.internal:8020/home/hadoop/spark-twitter-streaming/TwitterStreaming/src/app/models/metadata
It seems that the problem lies in the mapping between the local file system path and the Hadoop file system path:
# Relevant lines from the consumer script; SRC_DIR is a pathlib.Path defined elsewhere in the project
from pyspark.ml import PipelineModel
model_path = str(SRC_DIR / 'models')
pipeline_model = PipelineModel.load(model_path)
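As far as I understand, the bare path is being resolved against the cluster's default file system (fs.defaultFS), which on EMR is HDFS, while the trained model only exists on the master node's local disk. Below is a minimal sketch of what I mean; the file:// prefix and the hard-coded SRC_DIR are my own guesses, not part of the repository code:

from pathlib import Path
from pyspark.ml import PipelineModel

# Assumed local location of the saved model, taken from the error message above
SRC_DIR = Path('/home/hadoop/spark-twitter-streaming/TwitterStreaming/src/app')

# Without a scheme, Spark resolves the path against fs.defaultFS (hdfs:// on EMR).
# An explicit file:// scheme would point it at the local file system instead,
# which presumably only works if that directory is readable from every node.
model_path = 'file://' + str(SRC_DIR / 'models')
pipeline_model = PipelineModel.load(model_path)

Alternatively, I suppose copying the models directory into HDFS first (e.g. with hdfs dfs -put) and keeping the bare path would avoid the mismatch, but I have not verified either approach.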