Category "parquet"

includeOpForFullLoad property on Parquet files - AWS DMS

I have tried includeOpForFullLoad property for csv files while reading from oracle and writing in to s3. I wanted to use this property for parquet files. I can

Presto fails to import PARQUET files from S3

I have a presto table that imports PARQUET files based on partitions from s3 as follows: create table hive.data.datadump ( tUnixEpoch varchar, tDateTi

sparkR sql() returns string

We have parquet data saved on a server and I am trying to use SparkR sql() function in the following ways df <- sql("SELECT * FROM parquet.`<path to parq

Efficient way to read specific columns from parquet file in spark

What is the most efficient way to read only a subset of columns in spark from a parquet file that has many columns? Is using spark.read.format("parquet").load(

Py4JJavaError when trying to write pyspark DataFrame to parquet

I wanted to convert a large .csv vile into .parquet format using pyspark. I am using python 3. I tried changing the codec used for compression, as suggested in

How/Where can I write time series data? As Parquet format to Hadoop, or HBase, Cassandra?

I have real-time time series sensor data. My primary goal is to keep the raw data. I should do this so that the cost of storage is minimal. My scenario like th

Pyspark throwing error while trying to read parquet

I am a newbie in pyspark, While trying to read parquet file through pyspark I get the below error. I have tried various things like reinstallation of jre and jd

Convert csv to parquet file using python

I am trying to convert a .csv file to a .parquet file. The csv file (Temp.csv) has the following format 1,Jon,Doe,Denver I am using the following python

Best way to store many pandas dataframes in a single file

I have 20,000 ~1000-row dataframes, each of which has a name, in a 170GB pickle file at the moment. I'd like to write these to a file these so I can load them i

How do I write this Python code to use 2+ fewer nested if statements?

I have the following code which I use to loop through row groups in a parquet metadata file to find the maximum values for columns i,j,k across the whole file.

How to read partitioned parquet files from S3 using pyarrow in python

I looking for ways to read data from multiple partitioned directories from s3 using python. data_folder/serial_number=1/cur_date=20-12-2012/abcdsd0324324.snapp

Python: Obtain number of rows for ParquetDataset?

How do I obtain the number of rows of a ParquetDataset that is structured in the form of a folder containing multiple parquet files. I tried from pyarrow.parq

Category "parquet"

Other Categories