Spark throws error when trying to save a CSV file
Community wizards,
I am really frustrated. When it comes to Spark, Hadoop et al., nothing seems to be straightforward.
For the past few hours, I have been trying to find a solution to the following issue:
ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 823)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Ljava/lang/String;JJJI)Ljava/io/FileDescriptor;
Versions:
- OS: Windows 10
- Spark version: 2.4.6
- Scala version: 2.11.12
- Hadoop version: 2.7.1
- Java version: 1.8.0_202 (64-bit)
Variables:
- SPARK_HOME: C:\Spark
- HADOOP_HOME: C:\Hadoop\hadoop-2.7.1
- SCALA_HOME: C:\Program Files (x86)\scala
- JRE_HOME: C:\Program Files\Java\jre1.8.0_202
- JAVA_HOME: C:\Program Files\Java\jdk1.8.0_202
Paths:
- %SPARK_HOME%\bin
- %HADOOP_HOME%\bin
- %SCALA_HOME%\bin
- %JRE_HOME%\bin
- %JAVA_HOME%\bin
The command that throws the error is:
df.coalesce(1).write.format("csv").save("result")
The folder (result) seems to be created, but it's empty.
I have literally no idea how to solve this issue.
Any help would be warmly welcomed.
Solution 1:[1]
I believe your HADOOP_HOME=C:\Hadoop\hadoop-2.7.1 points at the Hadoop binaries/libraries; on Windows you also need a tool called WINUTILS.EXE.
Download the winutils build that matches your Hadoop version from GitHub and set HADOOP_HOME to the root directory of that winutils installation: https://github.com/steveloughran/winutils
Source:
From Hadoop's Confluence: Hadoop requires native libraries on Windows to work properly - that includes access to the file:// filesystem, where Hadoop uses some Windows APIs to implement POSIX-like file access permissions.
https://cwiki.apache.org/confluence/display/HADOOP2/WindowsProblems
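To make that concrete, here is a minimal sketch of a self-contained job; the path C:\Hadoop\hadoop-2.7.1 is an assumption and should be whichever directory has winutils.exe in its bin subfolder. Hadoop reads the hadoop.home.dir system property before falling back to the HADOOP_HOME environment variable, so setting it in code also works when you cannot change environment variables:
```scala
import org.apache.spark.sql.SparkSession

object CsvWriteTest {
  def main(args: Array[String]): Unit = {
    // Assumed path: the directory whose bin subfolder contains winutils.exe.
    // Hadoop checks the "hadoop.home.dir" system property before the
    // HADOOP_HOME environment variable, so setting it here works even if
    // you cannot edit the system environment.
    System.setProperty("hadoop.home.dir", "C:\\Hadoop\\hadoop-2.7.1")

    val spark = SparkSession.builder()
      .appName("csv-write-test")
      .master("local[*]")
      .getOrCreate()

    // Reproduce the failing write from the question with a small DataFrame.
    val df = spark.range(10).toDF("id")
    df.coalesce(1).write.format("csv").save("result")

    spark.stop()
  }
}
```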
Solution 2:[2]
I was facing the same issue. The solution that worked like magic for me: download the bin folder for the Hadoop version you are using. Once downloaded, replace your old bin folder with the new one, so that winutils.exe ends up at %HADOOP_HOME%\bin\winutils.exe.
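As a quick sanity check (my addition, not part of the original answer), you can verify from the spark-shell that the replaced bin folder actually contains the two native files this error depends on:
```scala
import java.nio.file.{Files, Paths}

// Confirm that HADOOP_HOME\bin now contains the Windows native files.
// The UnsatisfiedLinkError above typically means hadoop.dll is missing,
// because the official Apache tarballs do not ship Windows binaries.
val hadoopHome = sys.env.getOrElse("HADOOP_HOME", sys.error("HADOOP_HOME is not set"))
for (name <- Seq("winutils.exe", "hadoop.dll")) {
  val path = Paths.get(hadoopHome, "bin", name)
  println(s"$path exists: ${Files.exists(path)}")
}
```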
Solution 3:[3]
It seems that you don't have Hadoop binaries for Windows installed in HADOOP_HOME directory.
Or it could be that their dependencies (such as Visual C++ Runtime) are missing.
You might also need to load the shared libraries directly, depending on how you start your Spark application:
System.load(System.getenv("HADOOP_HOME") + "/bin/hadoop.dll");
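Note that System.load requires an absolute path and must run before the first Hadoop class triggers NativeIO. Under the hood Hadoop calls System.loadLibrary("hadoop"), which searches java.library.path (on Windows this defaults to PATH), so if hadoop.dll is present in %HADOOP_HOME%\bin and that folder is on PATH, as in the setup above, the explicit load is usually unnecessary.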
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Kanika Jain |
| Solution 3 | andreoss |
