'hive: how long does compaction run?
Hive version: 3.1.0.3.1.4.0-315 spark version: 2.3.2.3.1.4.0-315
Basically, i am trying to read transactional table data from spark. As per this page [https://stackoverflow.com/questions/50254590/how-to-read-orc-transaction-hive-table-in-spark][1], found that transactional table has to be compacted. Hence, i want to try this approach.
I am new to this and was trying compaction on delta files but it always shows "initiated" and never complete. This is happening for both Major and Minor compaction. Any help will be highly appreciated.
- I want to know whether is this good approach.
- Also, how to monitor the compaction job process other than show compactions? i can only see the line "Compaction enqueued with id 1" from the hiveserver_stdout.log.
- Generally, how long does this compaction takes to complete?
- is there any way to stop the compactions?
TIA.
[Edited]
SHOW COMPACTIONS;
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
| compactionid | dbname | tabname | partname | type | state | workerid | starttime | duration | hadoopjobid |
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
| CompactionId | Database | Table | Partition | Type | State | Worker | Start Time | Duration(ms) | HadoopJobId |
| 1 | tmp | shop_na2 | dt=2014-00-00 | MAJOR | initiated | --- | --- | --- | --- |
| 2 | tmp | na2_check | dt=2014-00-00 | MINOR | initiated | --- | --- | --- | --- |
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
3 rows selected (0.408 seconds)
The same compactions result has been showing for past 36 hours, though retention period has been set as 86400 sec.
Solution 1:[1]
It is advised to perform this operation when the load on the cluster is less, maybe initiate over a weekend when there are less jobs running, it is a resource intensive operation and amount of time depends on the data but a moderate quantity of deltas would span multiple hours. You can use the query SHOW COMPACTIONS; to get an update on the status of compaction including the following details
Database name
Table name
Partition name
Major or minor compaction
Compaction state:
Initiated - waiting in queue
Working - currently compacting
Ready for cleaning - compaction completed and old files scheduled for removal
Thread ID
Start time of compaction
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Anand Satheesh |
