How to connect to remote HDFS

I am trying to connect to an HDFS instance running on a remote machine.

I am running Eclipse on a Windows machine, and HDFS is running on a Unix box. Here is what I have tried:

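    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.DFSClient;

    public class HdfsConnect {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // remoteHostName:portNumber stands for the NameNode address.
            conf.set("fs.defaultFS", "hdfs://remoteHostName:portNumber");
            DFSClient client = null;
            System.out.println("try");
            try {
                System.out.println("trying");
                client = new DFSClient(conf);
                System.out.println(client);
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                // Close the client only if it was actually created.
                if (client != null) {
                    try {
                        client.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }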

But this gives me the following exception:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.ipc.RPC.getProxy(Ljava/lang/Class;JLjava/net/InetSocketAddress;Lorg/apache/hadoop/security/UserGroupInformation;Lorg/apache/hadoop/conf/Configuration;Ljavax/net/SocketFactory;ILorg/apache/hadoop/io/retry/RetryPolicy;Z)Lorg/apache/hadoop/ipc/VersionedProtocol;
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:135)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:280)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:235)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:226)

By the way, I got the portNumber from the hdfs-site.xml on the remote machine.

Is this approach correct?

Also, would it be easier to do this in Python?

EDIT

Note that I do have the Hadoop binaries unzipped on my Windows machine, and I have set the HADOOP_HOME environment variable accordingly. Could this be causing a problem?



Solution 1:[1]

For your specific problem, see the related question "Hadoop 2.6.0 Browsing filesystem Java".
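A NoSuchMethodError like the one in the question generally means that a class was compiled against a different version of a library than the one loaded at runtime. With Hadoop this often comes from jars of different versions ending up on the same classpath, though that is only an assumption about this particular case. As a rough diagnostic sketch (the class name WhichJar and its helper methods are made up for illustration), the following uses only the JDK to print where each class was loaded from:

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar or directory a loaded class came from.
    static String locationOfClass(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap or unknown)" : src.getLocation().toString();
    }

    // Looks a class up by name without initializing it.
    static String locationOfName(String className) {
        try {
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            return locationOfClass(cls);
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // The two classes named in the stack trace from the question.
        String[] names = {
            "org.apache.hadoop.ipc.RPC",
            "org.apache.hadoop.hdfs.DFSClient"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOfName(name));
        }
    }
}
```

If the two classes resolve to jars from different Hadoop releases, aligning the jar versions on the Eclipse build path would be the first thing to try.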

Beyond that, you might want to consider using REST for remote interactions. Apache Knox can provide you with access to the remote cluster and shield your code from having to know cluster internals such as the host:port or whether Kerberos is enabled. These things can change out from under your remote clients.
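To make the REST suggestion a little more concrete, here is a minimal sketch that builds a WebHDFS LISTSTATUS URL routed through a Knox gateway. The host knox.example.com, port 8443, the topology name "default", the "gateway" path prefix, and the class name KnoxWebHdfsUrl are all assumptions for illustration; the actual values depend on how the gateway is deployed.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class KnoxWebHdfsUrl {
    // Builds https://<host>:<port>/gateway/<topology>/webhdfs/v1<path>?op=LISTSTATUS
    // The "/gateway" prefix is the usual Knox default and is assumed here.
    static URI listStatusUri(String host, int port, String topology, String hdfsPath) {
        String path = "/gateway/" + topology + "/webhdfs/v1" + hdfsPath;
        try {
            // The multi-argument constructor percent-encodes illegal path characters.
            return new URI("https", null, host, port, path, "op=LISTSTATUS", null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URI components", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical gateway address and HDFS directory.
        URI uri = listStatusUri("knox.example.com", 8443, "default", "/user/alice");
        System.out.println(uri);
    }
}
```

An HTTP GET on such a URL, sent with whatever credentials the gateway expects, should return the directory listing as JSON, so the client never needs the NameNode host or port.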

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community