Find the create date of files listed in script-generated output (hadoop/hdfs)
I have a command whose output looks like this:
hdfs fsck /data -files -blocks -locations -openforwrite
/data/prod/encrypt/fin1/dt=202203/FlumeData.1647955937413.tmp 378 bytes, replicated: replication=3, 1 block(s): OK OPENFORWRITE
- BP-16266705-1.1.1.1-1481807115:blk_1141513_67850462 len=378 Live_repl=3 [DatanodeInfoWithStorage[1.1.1.1:9866,DS-b6ff1d67-d842-49cb-8d3a-cd185bc03,DISK], DatanodeInfoWithStorage[1.1.1.10:9866,DS-f69ec6eb-5884-4ab1-a583-5640336dd41d,DISK], DatanodeInfoWithStorage[1.1.1.1:9866,DS-16f08976-40ab-b369-dbdc0a268611,DISK]]
Out of thousands of lines of standard output, I'm only interested in the files tagged OPENFORWRITE, so I run the command below.
hdfs fsck /data -files -blocks -locations -openforwrite |grep -i openforwrite
/data/prod/encrypt/fin1/dt=202203/FlumeData.1647955937413.tmp 20075 bytes, replicated: replication=3, 1 block(s), OPENFORWRITE: OK
/data/prod/encrypt/fin1/dt=202203/FlumeData.1647955937413.tmp 20075 bytes, replicated: replication=3, 1 block(s), OPENFORWRITE: OK
/data/prod/encrypt/fin1/dt=202203/FlumeData.1647955937413.tmp 20075 bytes, replicated: replication=3, 1 block(s), OPENFORWRITE: OK
Now, from the above output, I need to find and delete the files that are 7 days old. I thought it would be simple, so I ran the command below, but it cannot get the date or the date difference.
hdfs dfs -ls |
grep OPENFORWRITE |
cut -d " " -f1 |
awk '
BEGIN{ MIN=10080; LAST=60*MIN; "date +%s" | getline NOW }
{
cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN;
DIFF=NOW-WHEN; if(DIFF < LAST){ print $3 }
}
'
date: invalid date '/data/prod/encrypt/fin1/dt=202203/FlumeData.1647955937413.tmp'
How can I resolve this? Thanks, MK
Solution 1:[1]
I suggest reading the matching lines into an array with the readarray command, then checking the file named on each line with the find command, filtering with -mtime.
find counts a file's age in whole 24-hour periods and ignores the fraction, so note the difference:
-mtime 7 file modified 7 days ago (at least 7 but less than 8 full days).
-mtime +7 file modified more than 7 days ago (at least 8 full days).
-mtime -7 file modified less than 7 days ago.
Tests can also be combined to express a time range. For example, to match a file that is between 5 and 6 days old:
-mtime -6 -and -mtime +4
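The -mtime rules above can be checked on throwaway local files. This is a minimal sketch, assuming GNU find and GNU touch (for `-printf` and `touch -d`); the file names `age3`, `age5`, `age7` and `age10` are made up for the demo. Ages are given in hours so each file sits in the middle of its day bucket.

```shell
#!/usr/bin/env bash
# Demo of find's -mtime tests on temporary local files (GNU find/touch assumed).
demo_dir=$(mktemp -d)

# Create files aged 3.5, 5.5, 7.5 and 10.5 days.
touch -d '84 hours ago'  "$demo_dir/age3"
touch -d '132 hours ago' "$demo_dir/age5"
touch -d '180 hours ago' "$demo_dir/age7"
touch -d '252 hours ago' "$demo_dir/age10"

# 7: age truncates to exactly 7 whole days.
exactly7=$(find "$demo_dir" -type f -mtime 7 -printf '%f\n' | sort)
# +7: age truncates to more than 7 whole days.
older=$(find "$demo_dir" -type f -mtime +7 -printf '%f\n' | sort)
# -7: age truncates to fewer than 7 whole days.
newer=$(find "$demo_dir" -type f -mtime -7 -printf '%f\n' | sort)
# Combined range: more than 4 and fewer than 6 whole days, i.e. 5.
range=$(find "$demo_dir" -type f -mtime -6 -and -mtime +4 -printf '%f\n' | sort)

echo "exactly 7: $exactly7"
echo "older:     $older"
echo "newer:     $newer"
echo "range:     $range"

rm -rf "$demo_dir"
```

Only `age7` matches `-mtime 7`, only `age10` matches `-mtime +7`, and only `age5` falls in the combined range.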
Suggested script:
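# Collect the fsck lines tagged OPENFORWRITE, one array element per line.
readarray -t linesArr < <(hdfs fsck /data -files -blocks -locations -openforwrite | grep -i openforwrite)
for line in "${linesArr[@]}"; do
    # The path is the first whitespace-separated field of the line.
    currFile=$(awk '{print $1}' <<< "$line")
    # find reads the local filesystem, so this assumes the path is reachable locally
    # (for example through an HDFS mount). Use -mtime +7 for files older than 7 days.
    if [[ -n $(find "$currFile" -mtime 7) ]]; then
        echo "$line"
    fi
done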
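Since find only sees the local filesystem, a variant that asks HDFS for the modification time may be needed when /data exists only in HDFS. The following is a sketch, not a tested production script: it assumes `hdfs dfs -stat '%Y'` prints the modification time in milliseconds since the epoch, that paths contain no spaces, and the function names (`openforwrite_paths`, `older_than_days`, `list_stale_openforwrite`) are hypothetical. HDFS records modification and access times, so the modification time stands in for the "create date" here.

```shell
#!/usr/bin/env bash
# Sketch: list OPENFORWRITE files whose HDFS modification time is more than N days old.

# Read fsck output on stdin and print the path (first field) of each
# line that is tagged OPENFORWRITE and starts with an absolute path.
openforwrite_paths() {
    awk '/OPENFORWRITE/ && $1 ~ /^\// { print $1 }'
}

# Succeed if timestamp $1 (epoch milliseconds) is more than $2 days
# before $3 (epoch seconds, defaults to the current time).
older_than_days() {
    local mtime_ms=$1 days=$2 now_s=${3:-$(date +%s)}
    (( now_s - mtime_ms / 1000 > days * 86400 ))
}

# Print every OPENFORWRITE file under $1 that is more than $2 days old.
# Needs the hdfs CLI on PATH; nothing is deleted here.
list_stale_openforwrite() {
    local root=$1 days=$2 path mtime_ms
    hdfs fsck "$root" -files -openforwrite | openforwrite_paths |
    while IFS= read -r path; do
        # Skip files that disappear between the fsck and the stat call.
        mtime_ms=$(hdfs dfs -stat '%Y' "$path") || continue
        if older_than_days "$mtime_ms" "$days"; then
            printf '%s\n' "$path"
        fi
    done
}
```

A possible use is `list_stale_openforwrite /data 7 > stale.txt`; after reviewing the list, the paths could be passed to `hdfs dfs -rm`.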
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
