Category "hadoop"

Spark-submit not working when application jar is in hdfs

I'm trying to run a spark application using bin/spark-submit. When I reference my application jar inside my local filesystem, it works. However, when I copied m

Spark read from S3 working, but I am unable to write using the same session [duplicate]

I am using a pyspark test script to read and write files to S3. Here is how I initialize the spark-session: import findspark from pyspark.sql

what is difference between partition and replica of a topic in kafka cluster

What is difference between partition and replica of a topic in kafka cluster. I mean both store the copies of messages in a topic. Then what is the real diffre

Hive queries fail when the hive.execution.engine is set to MR, they work when set to Tez?

I am using HDP 2.1 sandbox for my work. The version of hive as listed by the jar file is: hive-exec-0.13.0.2.1.1.0-385.jar. I have created a directory in HDFS

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

trying to run MR program version(2.7) in windows 7 64 bit in eclipse while running the above exception occurring . I verified that using 64 bit 1.8 java versi

uploading files to hadoop hdfs?

Hello everyone i m new in using hadoop it is my college work so i am doing some research i have installed hadoop-2.7.3 and i m unable to find tha path where sho

Need to load data from Hadoop to Druid after applying transformations. If I use Spark, can we load data from Spark RDD or dataframe to Druid directly?

I have data present in hive tables. I want to apply bunch of transformations before loading that data into druid. So there are ways but I'm not sure about those

Spark Streaming "ERROR JobScheduler: error in job generator"

I build a spark Streaming application to keep receiving messages from Kafka and then write them into a table HBase. This app runs pretty good for first 25 mins

java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

I install Hadoop-0.20.2 in windows using cygwin. If i run $ bin/hadoop version Hadoop 0.20.2 Subversion https://svn.apache.org/repos/asf/hadoop/common/branch

Use named_struct function in Hive with all the columns of a table

In Hive, you can use a function named_struct in order to create a list of key value pairs; the keys are usually the column names and the values are the values i

why password less ssh not working?

I connected 3 data nodes(in all these data nodes pass-wordless is working fine) in my cluster which are working fine but when i try to connect another data node