Category "amazon-athena"

Double quotes on all keys and all values in Athena query result

Preface: I defined an Athena table in AWS, using S3 as the source (defined it manually, without a Glue crawler). The files contain data from EventBridge, and each

AWS Athena MSCK REPAIR TABLE "table_name" Error adding new partitions

When trying to refresh the partitions in an AWS Athena/Glue table, I am getting this error: line 1:1: mismatched input 'MSCK'. Expecting: 'ALTER', 'ANALYZE', 'CAL

row_number is not unique for duplicate records

I am trying to find the latest update of a particular row from a bunch of rows per uuid. For that we use row_number() over a partition as shown below, "row_numb

How to transform data in Amazon Athena

I have some data in an S3 location in JSON format. It has 4 columns: val, time__stamp, name and type. I would like to create an external Athena table from this dat

Amazon Athena get data from the past one hour

I have some data rows in an AWS Athena table and I am trying to get the data from the last 1 hour. I am using awswrangler, I will post my snippet below. Basically,

AWS Athena: partition table for S3 bucket with non-standard file structure

I'm very new to Athena and I'm having a bit of a hard time understanding how partitioning works and if it can work for me. I have files in S3 in the follow

AthenaQueryError: Athena query failed: "NOT_SUPPORTED: Unsupported Hive type

I recently ran into the following error "AthenaQueryError: Athena query failed: "NOT_SUPPORTED: Unsupported Hive type", and for this I followed this stack overf

How to retrieve data from different AWS regions for my glue job?

I have Glue DBs (db1 and db2) and tables (tbl1 and tbl2) available in different AWS regions (eu-west-1 and us-east-1), respectively. My Glue job in eu-west-1 needs

How to get the partition column names for a table?

I have a table that is partitioned on one or more columns. I can do ... SHOW PARTITIONS table_db.table_1 which gives a list of all partitions like this, year=2

Are there any other ways to specify output file size or number of output files using Athena besides "Bucketing"?

I understand that I can set the number or size of files using the "Bucketing" method (refer to this guide: https://aws.amazon.com/premiumsupport/knowledge-center/se

Athena count/Sum column divided by count/Sum column returns zero

I am trying to find a percentage based on the id column. Issue: I am trying to use count(column)/select count(column) from table, which gives 'Zero' as output. Tab

AWS Athena/Presto SQL: Having trouble getting null values

I am running a query in AWS Athena where I want to get some total values; however, I am having issues getting a column where the values are null. This column somet

Why do I get ThrottlingException - Rate Exceeded status:400 when making AWS Athena API call from API server?

We have an S3 data lake in AWS (with Lake Formation, Glue, etc.). The end goal is to query the S3 data sources using SQL in Athena. When making the query in the A

How to read a json.snappy file from Athena

I have an input file in an S3 bucket with .json.snappy compression, and I am trying to read it through an Athena table. I tried using different SerDes: 'org.apache.hive.hcatal

Amazon Athena external lambda function (udf) - create view

I am trying to create an external function in Athena using an AWS Lambda function. I am able to do so and query successfully using the Athena query editor. Code is bel

Need to select values which do not contain '-' or '[0-9]' or '.', for example not like '-123.423', using Athena [closed]

I need to find values in numeric_column (string) that don't contain '-' or '[0-9]' or '.'. I am a bit of a novice in Athena... so honestly don't

How to create External Table without specifying columns in Redshift?

I have a folder containing files in Parquet format. I used a crawler to create a table defined in the Glue Data Catalog, which came to 2500+ columns. I want to create

AWS Athena table from Python output with dates - dates get wrongly converted

I have a pandas DataFrame containing a date column ("2022-02-02"). I write this table to parquet using pyarrow. df[col] = df[col].astype(str) df.to_parquet(loc)

Delta Table / Athena And Spark

I have a Delta table, which can be read from Athena. When I try to get the data through a query from Spark, I get the following error: Caused by: org.apache.sp

Copy and Merge files to another S3 bucket

I have a source bucket where small 5KB JSON files will be inserted every second. I want to use AWS Athena to query the files by using an AWS Glue Datasource and