Need to merge multiple Hive partitions into one partition in Spark
I have around 50 partitions in a Hive table, and I need to merge each set of partitions into one partition. I tried the RENAME PARTITION command, but I get an error message.
I need help merging multiple Hive partitions into one partition in Spark. These are the statements I ran, followed by the error:
ALTER TABLE db.table PARTITION (appname='SCORING',indicator='segment_id:1|process_date:20220417') RENAME TO PARTITION (appname='SCORING',indicator='process_date:20220417')
ALTER TABLE db.table PARTITION (appname='SCORING',indicator='segment_id:3|process_date:20220417') RENAME TO PARTITION (appname='SCORING',indicator='process_date:20220417')
ALTER TABLE db.table PARTITION (appname='SCORING',indicator='segment_id:4|process_date:20220417') RENAME TO PARTITION (appname='SCORING',indicator='process_date:20220417')
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename partition. Partition already exists:db.table
Solution 1:[1]
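You can do this with a SQL statement that uses DISTRIBUTE BY. RENAME TO PARTITION cannot merge data into a partition that already exists, which is what the error reports, so the rows have to be rewritten into the target partition instead.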
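The answer does not show the statement itself, so here is a minimal sketch of one way to do it, assuming a copy-then-drop approach: insert the rows from the source partitions into the target partition, then drop the sources. The helper name `build_merge_statements`, the `data_columns` list (the table's non-partition columns, which the question does not give), and the optional `distribute_by` expression are all hypothetical. The helper only builds the SQL strings; it does not run them.

```python
def build_merge_statements(table, appname, source_indicators,
                           target_indicator, data_columns, distribute_by=None):
    """Build SQL strings that copy several source partitions into one
    target partition and then drop the sources.

    Assumptions (not from the original post): `data_columns` lists the
    table's non-partition columns, and no value contains a single quote.
    """
    sources = ", ".join("'{}'".format(v) for v in source_indicators)
    # Static partition spec: the SELECT returns only the non-partition columns.
    insert = (
        "INSERT INTO TABLE {t} PARTITION (appname='{a}', indicator='{i}') "
        "SELECT {cols} FROM {t} "
        "WHERE appname='{a}' AND indicator IN ({src})"
    ).format(t=table, a=appname, i=target_indicator,
             cols=", ".join(data_columns), src=sources)
    if distribute_by:
        # Optional: controls how rows are shuffled before the write.
        insert += " DISTRIBUTE BY {}".format(distribute_by)
    # One DROP per source partition, to run only after the insert is verified.
    drops = [
        "ALTER TABLE {t} DROP IF EXISTS PARTITION "
        "(appname='{a}', indicator='{s}')".format(t=table, a=appname, s=s)
        for s in source_indicators
    ]
    return [insert] + drops
```

Each returned string could then be passed to `spark.sql(...)` on a Hive-enabled session. It is probably safer to run the INSERT first, check the row count in the target partition, and only then run the DROP statements.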
In the Spark DataFrame API there are more tools for changing partitions.
You can use partitionBy on the DataFrame writer to choose the partition columns of the output.
Or you could write a SELECT to grab the partitioned data, then use coalesce or repartition to bring it into a single Spark partition before writing it out.
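As a rough sketch of the DataFrame route, assuming `spark` is a Hive-enabled SparkSession supplied by the caller and that the table is partitioned by `appname` and `indicator` as in the question: read the source partitions, rewrite the `indicator` value, collapse to one Spark partition, and append into the table. The function name `merge_into_one_partition` is hypothetical, and the source partitions still have to be dropped afterwards.

```python
def merge_into_one_partition(spark, table, appname,
                             source_indicators, target_indicator):
    """Append the rows of several source partitions into one target partition.

    Assumption: `spark` is a SparkSession with Hive support, passed in by
    the caller; values are assumed not to contain single quotes.
    """
    df = spark.table(table)
    quoted = ", ".join("'{}'".format(v) for v in source_indicators)
    # insertInto matches columns by position, so keep the table's column
    # order and replace only the indicator value with the target's value.
    projection = [
        "'{}' AS indicator".format(target_indicator) if c == "indicator" else c
        for c in df.columns
    ]
    merged = (
        df.where("appname = '{}' AND indicator IN ({})".format(appname, quoted))
        .selectExpr(*projection)
        .coalesce(1)  # one Spark partition, so a single output file
    )
    merged.write.mode("append").insertInto(table)
    return projection
```

Because both partition columns are written dynamically here, a Hive table may need `hive.exec.dynamic.partition.mode` set to `nonstrict`. `coalesce(1)` avoids a shuffle but can reduce upstream parallelism; `repartition(1)` is the alternative if that becomes a problem.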
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Matt Andruff |