Category "hive"

Error while trying to create external table in hive

I am trying to create an external table using hive with hadoop but somehow it failed. These are the error I get when I try to run my queries. 02:23:29.516 [Hive

What's the data format of Athena's .csv.metadata files?

What's the data format of the .csv.metadata files written by Amazon Athena? Alongside the output file of every query there is a metadata file. It looks like it

Hive - double precision

I have been working on hive and found something peculiar. Basically, while using double as a datatype for your column we need not have any precision specified (

encountered : identifier expected cross, having,inner left, limit,order,right,where,commacausedby:exception syntax error

my query looks like but I am getting error select a.account_number,b.reference_acc from hdd.master_record format1 a join hdd.monetary b on a.load_date = b.load_

'hiveserver2 not listening on port 10000 and 10001'

When I run: hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console It shows Starting HiveServer2

What is the difference between -hivevar and -hiveconf?

From hive -h : --hiveconf <property=value> Use value for given property --hivevar <key=value> Variable subsitution to apply to hive

How to find the min of multiple values in HIVE?

Hive has min(col) to find the minimum value of a column. But how about finding the minimum of multiple values (NOT one column), for example select min(2,1,3,4

Error :org.apache.thrift.transport.TTransportException java.net.SocketException: Broken pipe (Write failed) (State=08S01,code=0)

Our one of the Gateway machines getting a continuous error on Hive. While we are trying to execute any(select, Insert and drop) command in a beeline, frequent

How to store dynamically generated JSON object in Big Query Table?

I have a use case to store dynamic JSON objects in a column in Big Query. The schema of the object is dynamically generated by the source and not known beforeha

Hive Error : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

I have got twitter data using flume on HDFS. Have 3 node cluster and MySQL Metastore for hive. When i execute below query select user_name.screen_name, user_n

error : currently subquery expressions are only allowed as where clause predicates in hive

I have written a hive query language as below. It is giving me error as written in title. the query is : SELECT clnt_nbr, CASE WHEN clnt_nbr i

SparkSQL error: collect_set() cannot have map type data

For SparkSQL on hive, when I used named_struct in the query, it returns results: SELECT id, collect_set(emp_info) as employee_info FROM ( SELECT t.id,

Save Spark dataframe as dynamic partitioned table in Hive

I have a sample application working to read from csv files into a dataframe. The dataframe can be stored to a Hive table in parquet format using the method df.

output/echo a meesage in hql/ hive query language

I need to create a hive.hql as follows. HIVE.hql: select * from tabel1; select * from table2; My question is: can i echo any message to my console like " re

reading and writing from hive tables with spark after aggregation

We have a hive warehouse, and wanted to use spark for various tasks (mainly classification). At times write the results back as a hive table. For example, we wr

Cannot connect to hive using beeline, user root cannot impersonate anonymous

I'm trying to connect to hive using beeline !connect jdbc:hive2://localhost:10000 and I'm being asked for a username and password Connecting to jdbc:hive2://l

Pandas dataframe in pyspark to hive

How to send a pandas dataframe to a hive table? I know if I have a spark dataframe, I can register it to a temporary table using df.registerTempTable("table_

Pandas dataframe in pyspark to hive

How to send a pandas dataframe to a hive table? I know if I have a spark dataframe, I can register it to a temporary table using df.registerTempTable("table_

Access Hive Data from Java

I need to acces the data in Hive, from Java.According to the documentation for Hive JDBC Driver,the current JDBC driver can only be used to query data from def

How to make MSCK REPAIR TABLE execute automatically in AWS Athena

I have a Spark batch job which is executed hourly. Each run generates and stores new data in S3 with the directory naming pattern DATA/YEAR=?/MONTH=?/DATE=?/dat