fsimage contains deleted paths

I fetch the fsimage from HDFS and convert it to CSV using the following commands:

hdfs dfsadmin -fetchImage /home/data/scripts/tmp/fsimage.raw
hdfs oiv -p Delimited -delimiter , -t temporaryDir -i /home/data/scripts/tmp/fsimage.raw -o /home/data/scripts/tmp/fsimage.csv

The fsimage is parsed as expected. However, the output still contains some paths that have already been removed:

  • This is a normal path:

    /user/history/done_intermediate/admin/job_1639970873836_92653.summary,3,2022-01-19 17:05,2022-01-19 17:05,134217728,1,489,0,0,-rwxrwx---,admin,supergroup

  • This is an abnormal path; it was removed some days ago:

    /home/hive/warehouse/nhn_fi_ii_test.db/tableA/pt_d=2019-08-13,0,2021-12-14 11:20,1970-01-01 08:00,0,0,0,-1,-1,drwxr-xr-x,hdfs,supergroup

I have 2 questions:

  1. Why do removed files still remain in the fsimage?
  2. How can I distinguish them? I tried filtering on the 4th column, the last access time: if it starts with '1970', I treat the path as deleted. Is this correct?
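
For reference, here is a minimal Python sketch of the filter described in question 2. It assumes the default column order of the Delimited processor (Path, Replication, ModificationTime, AccessTime, PreferredBlockSize, BlocksCount, FileSize, NSQUOTA, DSQUOTA, Permission, UserName, GroupName) and that no path contains a comma. The function name and field names are made up for illustration. It only flags rows; whether a 1970 access time really means "deleted" is exactly what is being asked. Because the flagged sample row is a directory, the sketch also records that, in case the 1970 value turns out to be tied to directories rather than to deletion.

```python
import csv

# Assumed column order of `hdfs oiv -p Delimited` output (field names are illustrative).
COLUMNS = [
    "path", "replication", "modification_time", "access_time",
    "preferred_block_size", "blocks_count", "file_size",
    "ns_quota", "ds_quota", "permission", "user", "group",
]

# The two rows quoted in the question.
SAMPLE_ROWS = [
    "/user/history/done_intermediate/admin/job_1639970873836_92653.summary,"
    "3,2022-01-19 17:05,2022-01-19 17:05,134217728,1,489,0,0,-rwxrwx---,admin,supergroup",
    "/home/hive/warehouse/nhn_fi_ii_test.db/tableA/pt_d=2019-08-13,"
    "0,2021-12-14 11:20,1970-01-01 08:00,0,0,0,-1,-1,drwxr-xr-x,hdfs,supergroup",
]


def flag_epoch_access_time(lines):
    """Yield (path, is_dir, epoch_atime) for every well-formed fsimage row."""
    for fields in csv.reader(lines):
        # Skip a header row (if the Hadoop version writes one) and any row
        # whose field count is off, e.g. because the path contains a comma.
        if len(fields) != len(COLUMNS) or fields[0] == "Path":
            continue
        row = dict(zip(COLUMNS, fields))
        is_dir = row["permission"].startswith("d")
        # Heuristic from the question: access time printed as a 1970 date.
        epoch_atime = row["access_time"].startswith("1970")
        yield row["path"], is_dir, epoch_atime


if __name__ == "__main__":
    for path, is_dir, epoch_atime in flag_epoch_access_time(SAMPLE_ROWS):
        kind = "dir" if is_dir else "file"
        mark = "atime=1970" if epoch_atime else "atime set"
        print(f"{kind}\t{mark}\t{path}")
```

On the two sample rows, only the second one is flagged, and it is also the only directory. To run it against the real output, pass an open file handle for /home/data/scripts/tmp/fsimage.csv instead of SAMPLE_ROWS.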

Any help is appreciated.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
