I'm trying to run a Spark application using bin/spark-submit. When I reference my application jar on my local filesystem, it works. However, when I copied m
I am using a PySpark test script to read and write files to S3. Here is how I initialize the Spark session: import findspark from pyspark.sql
What is the difference between a partition and a replica of a topic in a Kafka cluster? I mean, both store copies of the messages in a topic. Then what is the real diffre
I am using the HDP 2.1 sandbox for my work. The version of Hive, as listed by the jar file, is: hive-exec-0.13.0.2.1.1.0-385.jar. I have created a directory in HDFS
I am trying to run an MR program (version 2.7) on 64-bit Windows 7 in Eclipse. While running it, the above exception occurs. I verified that using 64 bit 1.8 java versi
Hello everyone, I'm new to Hadoop. It is for my college work, so I am doing some research. I have installed hadoop-2.7.3 and I'm unable to find the path where sho
I have data in Hive tables. I want to apply a bunch of transformations before loading that data into Druid. There are ways to do this, but I'm not sure about those
I built a Spark Streaming application that keeps receiving messages from Kafka and then writes them into an HBase table. This app runs pretty well for the first 25 mins
I installed Hadoop-0.20.2 on Windows using Cygwin. If I run $ bin/hadoop version Hadoop 0.20.2 Subversion https://svn.apache.org/repos/asf/hadoop/common/branch
In Hive, you can use the named_struct function to create a list of key-value pairs; the keys are usually the column names and the values are the values i
I connected 3 data nodes to my cluster (passwordless login is working fine on all of them), and they are working fine, but when I try to connect another data node