Spark throws error when trying to save a CSV file
Community wizards,
I am really frustrated. When it comes to Spark, Hadoop et al., nothing seems to be straightforward.
For the past few hours, I have been trying to find a solution to the following issue:
ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 823)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Ljava/lang/String;JJJI)Ljava/io/FileDescriptor;
Versions:
- OS: Windows 10
- Spark version: 2.4.6
- Scala version: 2.11.12
- Hadoop version: 2.7.1
- Java version: 1.8.0_202 (64-bit)
Variables:
- SPARK_HOME: C:\Spark
- HADOOP_HOME: C:\Hadoop\hadoop-2.7.1
- SCALA_HOME: C:\Program Files (x86)\scala
- JRE_HOME: C:\Program Files\Java\jre1.8.0_202
- JAVA_HOME: C:\Program Files\Java\jdk1.8.0_202
Paths:
- %SPARK_HOME%\bin
- %HADOOP_HOME%\bin
- %SCALA_HOME%\bin
- %JRE_HOME%\bin
- %JAVA_HOME%\bin
The command that throws the error is:
df.coalesce(1).write.format("csv").save("result")
The folder (result) seems to be created, but it's empty.
I have literally no idea how to solve this issue.
Any help would be warmly welcomed.
Solution 1:[1]
I believe your HADOOP_HOME=C:\Hadoop\hadoop-2.7.1 points at the Hadoop binaries/libraries; on Windows you also need a tool called WINUTILS.EXE.
Download the winutils build that matches your Hadoop version from GitHub and set HADOOP_HOME to the root directory of that winutils installation: https://github.com/steveloughran/winutils
Source:
From Hadoop's Confluence: Hadoop requires native libraries on Windows to work properly - that includes access to the file:// filesystem, where Hadoop uses some Windows APIs to implement POSIX-like file access permissions.
https://cwiki.apache.org/confluence/display/HADOOP2/WindowsProblems
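To make that concrete, here is a minimal sketch of a self-contained job; the path C:\Hadoop\hadoop-2.7.1 is an assumption and should be whichever directory has winutils.exe in its bin subfolder. Hadoop reads the hadoop.home.dir system property before falling back to the HADOOP_HOME environment variable, so setting it in code also works when you cannot change environment variables:
```scala
import org.apache.spark.sql.SparkSession

object CsvWriteTest {
  def main(args: Array[String]): Unit = {
    // Assumed path: the directory whose bin subfolder contains winutils.exe.
    // Hadoop checks the "hadoop.home.dir" system property before the
    // HADOOP_HOME environment variable, so setting it here works even if
    // you cannot edit the system environment.
    System.setProperty("hadoop.home.dir", "C:\\Hadoop\\hadoop-2.7.1")

    val spark = SparkSession.builder()
      .appName("csv-write-test")
      .master("local[*]")
      .getOrCreate()

    // Reproduce the failing write from the question with a small DataFrame.
    val df = spark.range(10).toDF("id")
    df.coalesce(1).write.format("csv").save("result")

    spark.stop()
  }
}
```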
Solution 2:[2]
I was facing the same issue. The solution that worked like magic for me: download the bin folder for the Hadoop version you are using. Once downloaded, replace your old bin folder with the new one, so that winutils.exe ends up at %HADOOP_HOME%\bin\winutils.exe.
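As a quick sanity check (my addition, not part of the original answer), you can verify from the spark-shell that the replaced bin folder actually contains the two native files this error depends on:
```scala
import java.nio.file.{Files, Paths}

// Confirm that HADOOP_HOME\bin now contains the Windows native files.
// The UnsatisfiedLinkError above typically means hadoop.dll is missing,
// because the official Apache tarballs do not ship Windows binaries.
val hadoopHome = sys.env.getOrElse("HADOOP_HOME", sys.error("HADOOP_HOME is not set"))
for (name <- Seq("winutils.exe", "hadoop.dll")) {
  val path = Paths.get(hadoopHome, "bin", name)
  println(s"$path exists: ${Files.exists(path)}")
}
```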
Solution 3:[3]
It seems that you don't have Hadoop binaries for Windows installed in HADOOP_HOME directory.
Or it could be that their dependencies (such as Visual C++ Runtime) are missing.
You might also need to load the shared libraries directly, depending on how you start your Spark application:
System.load(System.getenv("HADOOP_HOME") + "/bin/hadoop.dll");
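Note that System.load requires an absolute path and must run before the first Hadoop class triggers NativeIO. Under the hood Hadoop calls System.loadLibrary("hadoop"), which searches java.library.path (on Windows this defaults to PATH), so if hadoop.dll is present in %HADOOP_HOME%\bin and that folder is on PATH, as in the setup above, the explicit load is usually unnecessary.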
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Kanika Jain |
| Solution 3 | andreoss |
