How to read Parquet data partitioned on a column from AWS S3 using Python
I saved the table below to AWS S3 with PySpark, partitioned by the column "channel_name", using this code:
df.write.option("header", True) \
    .partitionBy("channel_name") \
    .mode("append") \
    .parquet("s3://path")
| start_timestamp | channel_name | value |
|---|---|---|
| 2020-11-02 08:51:50 | velocity | 1 |
| 2020-11-02 09:14:29 | Temp | 0 |
| 2020-11-02 09:18:32 | velocity | 0 |
| 2020-11-02 09:32:42 | velocity | 4 |
| 2020-11-03 13:06:03 | Temp | 2 |
| 2020-11-03 13:10:01 | Temp | 1 |
| 2020-11-03 13:54:38 | Temp | 5 |
| 2020-11-03 14:46:25 | velocity | 5 |
| 2020-11-03 14:57:31 | Kilometer | 6 |
| 2020-11-03 15:07:07 | Kilometer | 7 |
Now I want to read the same data, which is partitioned on the column "channel_name", using Python, but it is not working: the result excludes the partition column "channel_name". Below is the code I tried with AWS Wrangler (awswrangler).
import awswrangler as wr
df = wr.s3.read_parquet(path="s3://shreyasbigdata/Prod_test_item_id=V214944/")
The result looks like this, but I also want the "channel_name" column.
| start_timestamp | value |
|---|---|
| 2020-11-02 08:51:50 | 1 |
| 2020-11-02 09:14:29 | 0 |
| 2020-11-02 09:18:32 | 0 |
| 2020-11-02 09:32:42 | 4 |
| 2020-11-03 13:06:03 | 2 |
| 2020-11-03 13:10:01 | 1 |
| 2020-11-03 13:54:38 | 5 |
| 2020-11-03 14:46:25 | 5 |
| 2020-11-03 14:57:31 | 6 |
| 2020-11-03 15:07:07 | 7 |
I tried different libraries, but none of them worked. It would be great if you could help me read all the columns, including the partitioned one.
Solution 1:[1]
I found the answer, thank you. The following code reads all the columns, including the partitioned one:
import s3fs
import pyarrow.parquet as pq
fs = s3fs.S3FileSystem()
bucket = 'bucket_name'
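path = 'path_of_folder'  # if it is a directory, omit the trailing /
bucket_uri = f's3://{bucket}/{path}'
# Open the folder as a dataset so the partition directories are discovered
dataset = pq.ParquetDataset(bucket_uri, filesystem=fs)
# Read every partition into one Arrow table, then convert it to pandas
table = dataset.read()
df = table.to_pandas()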
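Some background on why the column went missing in the first place: Spark's partitionBy writes a Hive-style layout, where each partition value becomes a directory such as channel_name=velocity/ and the column itself is left out of the data files. A dataset-aware reader recovers the column from those directory names. The sketch below mimics that step with the standard library only. The function name `partition_values`, the bucket and folder names, and the file names are hypothetical and only illustrate the idea; this is not how pyarrow implements it internally.

```python
from typing import Dict


def partition_values(key: str, root: str) -> Dict[str, str]:
    """Collect Hive-style `name=value` directory segments found below `root`.

    `key` is the full path of one data file and `root` is the dataset folder.
    Both are illustrative inputs, not part of any library API.
    """
    if not key.startswith(root):
        raise ValueError(f"{key!r} is not located under {root!r}")

    # Keep only the part of the path below the dataset root.
    relative = key[len(root):].strip("/")

    values: Dict[str, str] = {}
    # The last segment is the data file itself, so it is skipped.
    for segment in relative.split("/")[:-1]:
        name, sep, value = segment.partition("=")
        if sep and name:
            values[name] = value
    return values


if __name__ == "__main__":
    # Hypothetical layout for the table in the question.
    root = "s3://bucket_name/path_of_folder"
    keys = [
        f"{root}/channel_name=velocity/part-00000.snappy.parquet",
        f"{root}/channel_name=Temp/part-00000.snappy.parquet",
        f"{root}/channel_name=Kilometer/part-00000.snappy.parquet",
    ]
    for key in keys:
        print(key.rsplit("/", 1)[-1], partition_values(key, root))
```

This also suggests why the awswrangler call in the question dropped the column: it read the files without treating the folder as a partitioned dataset. awswrangler documents a `dataset=True` argument on `wr.s3.read_parquet` for that purpose, so `wr.s3.read_parquet(path=..., dataset=True)` may be worth trying as an alternative to the pyarrow code above.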
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | SSS |
