Find the creation date of files listed in script-generated output (Hadoop/HDFS)

I have command output like the one below:

hdfs fsck /data -files -blocks -locations -openforwrite

/data/prod/encrypt/fin1/dt=202203/FlumeData.1647955937413.tmp 378 bytes, replicated: replication=3, 1 block(s):  OK OPENFORWRITE
 - BP-16266705-1.1.1.1-1481807115:blk_1141513_67850462 len=378 Live_repl=3  [DatanodeInfoWithStorage[1.1.1.1:9866,DS-b6ff1d67-d842-49cb-8d3a-cd185bc03,DISK], DatanodeInfoWithStorage[1.1.1.10:9866,DS-f69ec6eb-5884-4ab1-a583-5640336dd41d,DISK], DatanodeInfoWithStorage[1.1.1.1:9866,DS-16f08976-40ab-b369-dbdc0a268611,DISK]]

Out of thousands of lines of standard output, I'm only interested in the files with the OPENFORWRITE tag, so I run the command below.

hdfs fsck /data -files -blocks -locations -openforwrite |grep -i openforwrite

/data/prod/encrypt/fin1/dt=202203/FlumeData.1647955937413.tmp 20075 bytes, replicated: replication=3, 1 block(s), OPENFORWRITE:  OK
/data/prod/encrypt/fin1/dt=202203/FlumeData.1647955937413.tmp 20075 bytes, replicated: replication=3, 1 block(s), OPENFORWRITE:  OK
/data/prod/encrypt/fin1/dt=202203/FlumeData.1647955937413.tmp 20075 bytes, replicated: replication=3, 1 block(s), OPENFORWRITE:  OK

From the above output I need to find and delete the files that are 7 days old. I thought it would be simple, so I ran the command below, but it cannot get the date or the date difference.

hdfs dfs -ls |
grep OPENFORWRITE |
cut  -d " " -f1 |
awk '
    BEGIN{ MIN=10080; LAST=60*MIN; "date +%s" | getline NOW }
    {
        cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN;
        DIFF=NOW-WHEN; if(DIFF < LAST){ print $3 } 
   }
'

date: invalid date '/data/prod/encrypt/fin1/dt=202203/FlumeData.1647955937413.tmp'

How can I resolve this? Thanks, MK



Solution 1:[1]

I suggest reading the list of lines into an array with the readarray command.

Then check the file named on each line with the find command, filtering with -mtime 7.

Note the difference: -mtime 7 matches files modified exactly 7 days ago (find counts whole 24-hour periods and ignores the fractional part).

-mtime +7 matches files modified more than 7 days ago.

-mtime -7 matches files modified less than 7 days ago.

Filters can be combined to express a time range. For example, to match files that are between 5 and 6 days old:

-mtime -6 -and -mtime +4
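The -mtime variants above can be checked on a scratch directory before pointing them at real data. The following is a minimal sketch, assuming GNU touch (for the relative -d dates) and a find that accepts -and; the file names and the match helper are made up for the illustration.

```shell
#!/bin/sh
# Scratch directory with four files of known age. Hours are used so that
# every age sits safely in the middle of a 24-hour period.
demo_dir=$(mktemp -d)
touch -d '84 hours ago'  "$demo_dir/3.5d"
touch -d '132 hours ago' "$demo_dir/5.5d"
touch -d '180 hours ago' "$demo_dir/7.5d"
touch -d '252 hours ago' "$demo_dir/10.5d"

# Hypothetical helper: print the base names that match the given tests,
# sorted and joined on one line.
match() {
    find "$demo_dir" -type f "$@" -exec basename {} \; | sort | paste -sd ' ' -
}

exact=$(match -mtime 7)                  # age in whole days is exactly 7
older=$(match -mtime +7)                 # age in whole days is more than 7
newer=$(match -mtime -7)                 # age in whole days is less than 7
range=$(match -mtime -6 -and -mtime +4)  # age in whole days is 5

echo "-mtime 7            : $exact"   # 7.5d
echo "-mtime +7           : $older"   # 10.5d
echo "-mtime -7           : $newer"   # 3.5d 5.5d
echo "-mtime -6 and +4    : $range"   # 5.5d

rm -rf "$demo_dir"
```

Because the fractional part of the age is dropped, the 7.5-day-old file counts as 7 days old, and the range test only picks up the file whose age rounds down to 5.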

Suggesting:

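# Collect the fsck lines tagged OPENFORWRITE, one array element per line.
readarray -t linesArr < <(hdfs fsck /data -files -blocks -locations -openforwrite | grep -i openforwrite)
for line in "${linesArr[@]}"; do
   # The file path is the first whitespace-separated field of the line.
   currFile=$(echo "$line" | awk '{print $1}')
   # find reads the local filesystem, so the path must be visible there.
   # Use -mtime +7 instead to match files older than 7 days.
   if [[ $(find "$currFile" -mtime 7) ]]; then
      echo "$line"
   fi
done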
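If the HDFS paths are not mounted on the local filesystem, find will not see them. A possible alternative is to ask HDFS for each file's timestamp with hdfs dfs -stat '%Y', which prints the modification time in milliseconds since the epoch. The sketch below rests on several assumptions: the function names are hypothetical, paths contain no spaces, and the modification time is treated as a stand-in for the file's age, since -stat reports a modification time rather than a creation time.

```shell
#!/bin/sh
# Succeeds when MTIME_MS (milliseconds since the epoch) lies more than DAYS
# days before NOW_S (seconds since the epoch, default: the current time).
is_older_than_days() (
    mtime_ms=$1
    days=$2
    now_s=${3:-$(date +%s)}
    age_s=$(( now_s - mtime_ms / 1000 ))
    [ "$age_s" -gt $(( days * 86400 )) ]
)

# Reads fsck output on stdin and prints the path of every OPENFORWRITE file
# whose HDFS modification time is more than DAYS days in the past.
list_stale_openforwrite() (
    days=$1
    while IFS= read -r line; do
        # Keep only file lines (they start with "/") carrying the tag.
        case $line in
            /*OPENFORWRITE*) ;;
            *) continue ;;
        esac
        path=${line%% *}   # first field; assumes no spaces in the path
        # stdin is redirected so the command cannot consume the fsck lines.
        mtime_ms=$(hdfs dfs -stat '%Y' "$path" < /dev/null) || continue
        if is_older_than_days "$mtime_ms" "$days"; then
            printf '%s\n' "$path"
        fi
    done
)
```

It could be used as hdfs fsck /data -files -openforwrite | list_stale_openforwrite 7. The printed paths can be reviewed first and only then handed to hdfs dfs -rm, since deleting files that a writer still holds open may not be what is wanted.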

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1