How does the NameNode query block information from the DataNodes?

I can query information about how my files (stored in HDFS) are stored in blocks via hdfs fsck / -files -blocks.
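If you want that fsck report in a structured form, a small parser is enough. This is a minimal sketch that assumes file lines look like `/path 200000000 bytes, ... 2 block(s):` and block lines look like `0. BP-...:blk_<id>_<genstamp> len=<bytes> ...`. The exact wording varies between Hadoop versions (newer releases add `replicated: replication=N,`), so treat both regexes as assumptions, and note that paths containing spaces are not handled.

```python
import re
from typing import Dict, List, Tuple

# Assumed file header line, e.g. "/data/a.csv 200000000 bytes, 2 block(s):  OK".
FILE_RE = re.compile(r"^(/\S*) (\d+) bytes,.* (\d+) block\(s\):")
# Assumed block line, e.g. "0. BP-...:blk_1073741825_1001 len=134217728 ...".
BLOCK_RE = re.compile(r"^\d+\. (\S+):blk_(\d+)_(\d+) len=(\d+)")


def parse_fsck(output: str) -> Dict[str, List[Tuple[int, int, int]]]:
    """Map each file path to a list of (block_id, generation_stamp, length)."""
    files: Dict[str, List[Tuple[int, int, int]]] = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        m = FILE_RE.match(line)
        if m:
            # A new file section starts; following block lines belong to it.
            current = m.group(1)
            files[current] = []
            continue
        m = BLOCK_RE.match(line)
        if m and current is not None:
            _pool_id, block_id, genstamp, length = m.groups()
            files[current].append((int(block_id), int(genstamp), int(length)))
    return files
```

Feed it the text captured from `hdfs fsck / -files -blocks`; summary lines such as `Status: HEALTHY` match neither pattern and are skipped.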

As I understand it, the NameNode does not store the location of each file's blocks; instead, it loads that information from the DataNodes. Reference - Why datanode sends the block location information to namenode?

So if the NameNode queries this information from the DataNodes, it should know where the DataNodes' metadata is located. That would mean there could, technically, be a pathway from:

 NameNode FSImage -> DataNodes metadata -> Info about how data is stored in blocks
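One way to picture how HDFS resolves a file to block locations is as a join of two maps: the file-to-block-ID mapping that the NameNode persists (FSImage plus edit log), and the block-ID-to-DataNode mapping that it rebuilds in memory from the block reports DataNodes send. The sketch below is a toy model under that assumption; `namespace`, `block_locations`, `process_block_report`, `locate`, the paths, and the DataNode names are all hypothetical, not HDFS classes.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Persisted by the NameNode in the FSImage/edit log: path -> ordered block IDs.
# (Hypothetical paths and IDs, for illustration only.)
namespace: Dict[str, List[int]] = {
    "/data/a.csv": [1001, 1002],
    "/data/b.csv": [1003],
}

# Rebuilt in NameNode memory from block reports; never written to the FSImage.
block_locations: Dict[int, Set[str]] = defaultdict(set)


def process_block_report(datanode: str, block_ids: List[int]) -> None:
    """A DataNode reports only the block IDs it holds, not file names."""
    for block_id in block_ids:
        block_locations[block_id].add(datanode)


def locate(path: str) -> List[List[str]]:
    """Join the two maps: file -> block IDs -> DataNodes holding each block."""
    return [sorted(block_locations.get(b, set())) for b in namespace[path]]


# Simulated block reports from three hypothetical DataNodes at startup.
process_block_report("dn1", [1001, 1003])
process_block_report("dn2", [1001, 1002])
process_block_report("dn3", [1002, 1003])

print(locate("/data/a.csv"))
```

In this model the DataNodes never know which file a block belongs to; only the NameNode can complete the join.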

There are files with names that look something like bl*.meta, but I think they only contain checksum information for the blocks and therefore may not be relevant here. Reference - What metadata is stored on a datanode in HDFS?
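To see why a .meta file cannot answer the file-to-block question, it helps to look at what it holds. As far as I understand HDFS's BlockMetadataHeader, the layout is a 2-byte version, a 1-byte checksum type, a 4-byte bytes-per-checksum value, then one 4-byte checksum per chunk of block data, with no file name or path anywhere. The sketch below writes and verifies such a file under that assumed layout. HDFS defaults to CRC32C, which Python's standard library lacks, so this uses plain CRC32 (checksum type 1); `write_meta` and `verify_meta` are hypothetical helpers.

```python
import struct
import zlib

# Assumed .meta header (big-endian): version (short), checksum type (byte),
# bytes per checksum (int). Type 0 = NULL, 1 = CRC32, 2 = CRC32C.
HEADER = struct.Struct(">hbi")


def write_meta(data: bytes, bytes_per_checksum: int = 512) -> bytes:
    """Build a synthetic CRC32 meta file for `data` (illustration only)."""
    parts = [HEADER.pack(1, 1, bytes_per_checksum)]
    for i in range(0, len(data), bytes_per_checksum):
        chunk = data[i:i + bytes_per_checksum]
        parts.append(struct.pack(">I", zlib.crc32(chunk) & 0xFFFFFFFF))
    return b"".join(parts)


def verify_meta(data: bytes, meta: bytes) -> bool:
    """Check every chunk of `data` against the checksums stored in `meta`."""
    version, ctype, bpc = HEADER.unpack_from(meta, 0)
    if version != 1 or ctype != 1:
        raise ValueError("this sketch only handles version 1 with CRC32")
    sums = meta[HEADER.size:]
    offsets = range(0, len(data), bpc)
    if len(sums) != 4 * len(offsets):
        return False  # chunk count and checksum count disagree
    for n, i in enumerate(offsets):
        (stored,) = struct.unpack_from(">I", sums, n * 4)
        if stored != zlib.crc32(data[i:i + bpc]) & 0xFFFFFFFF:
            return False
    return True
```

Everything in the file is derived from the block's own bytes, which is consistent with the .meta files being irrelevant to the file-to-block mapping.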

Where do the DataNodes store the file -> block mapping?

Since a file's data can be split across multiple DataNodes, how do I get this information from the NameNode's FSImage/edit logs?



Solution 1:[1]

You can get the block information directly from the bl*.meta files. This information is sent to the NameNode when a DataNode comes online. The information is in the meta files, but it is really the location of the block files on the disks that the DataNode uses to rebuild this information when it starts (as implied by the manual disk-balancing procedure in Hadoop 2.6).
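The startup behaviour described above can be sketched as a directory scan: walk each data directory, treat every `blk_<id>` file as a block, pair it with its `blk_<id>_<genstamp>.meta` file, and take the block length from the file size. This is a simplified model, assuming that naming scheme (real DataNodes nest these files under paths like `current/BP-.../current/finalized/subdirN/...`); `scan_volume` is a hypothetical helper, not the DataNode's actual code.

```python
import os
import re
from typing import Dict, List, Tuple

# Assumed naming inside a DataNode data directory:
#   blk_<blockId>                  block data
#   blk_<blockId>_<genStamp>.meta  checksums for that block
DATA_RE = re.compile(r"^blk_(\d+)$")
META_RE = re.compile(r"^blk_(\d+)_(\d+)\.meta$")


def scan_volume(root: str) -> List[Tuple[int, int, int]]:
    """Walk a data directory and return (block_id, gen_stamp, length) tuples,
    roughly the information a DataNode reports to the NameNode on startup."""
    data_len: Dict[int, int] = {}
    gen_stamp: Dict[int, int] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            m = DATA_RE.match(name)
            if m:
                path = os.path.join(dirpath, name)
                data_len[int(m.group(1))] = os.path.getsize(path)
                continue
            m = META_RE.match(name)
            if m:
                gen_stamp[int(m.group(1))] = int(m.group(2))
    # Only report blocks that have both a data file and a meta file.
    return sorted(
        (b, gen_stamp[b], n) for b, n in data_len.items() if b in gen_stamp
    )
```

Note that the result contains block IDs only: because a block is found by where its file sits on disk, moving a block file together with its meta file to another disk (as in manual balancing) is picked up on the next scan, but no file names are ever recovered this way.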

You may find it helpful to study some of the GitHub projects built to balance DataNode disks, since balancing requires knowledge of this metadata.

You may also want to look at some of the GitHub tools available for analyzing the FSImage, which may help you find the information you are looking for.
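Hadoop itself ships an Offline Image Viewer that can dump an FSImage to XML, for example `hdfs oiv -p XML -i <fsimage file> -o fsimage.xml`. The sketch below turns that dump into a path-to-blocks map. It assumes `<inode>` elements carrying `<id>`, `<type>`, `<name>` and `<blocks><block><id>/<numBytes>`, plus an `<INodeDirectorySection>` of `<directory><parent>/<child>` entries; check these element names against your own dump, since they are not guaranteed here. `file_blocks` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def file_blocks(xml_text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Map full HDFS paths to [(block_id, num_bytes), ...] from oiv XML output."""
    root = ET.fromstring(xml_text)
    names: Dict[str, str] = {}
    blocks: Dict[str, List[Tuple[int, int]]] = {}
    for inode in root.iter("inode"):
        inode_id = inode.findtext("id")
        names[inode_id] = inode.findtext("name") or ""  # root inode has no name
        if inode.findtext("type") == "FILE":
            blocks[inode_id] = [
                (int(b.findtext("id")), int(b.findtext("numBytes")))
                for b in inode.iter("block")
            ]
    # The directory section links each child inode ID to its parent's ID.
    parent: Dict[str, str] = {}
    for d in root.iter("directory"):
        pid = d.findtext("parent")
        for child in d.findall("child"):
            parent[child.text] = pid

    def path_of(inode_id: str) -> str:
        # Climb parent links up to the root, collecting names along the way.
        parts = []
        while inode_id in parent:
            parts.append(names[inode_id])
            inode_id = parent[inode_id]
        return "/" + "/".join(reversed(parts))

    return {path_of(i): bl for i, bl in blocks.items()}
```

This recovers file -> block IDs only. The FSImage has no block locations, so finding which DataNodes hold those blocks still requires the running NameNode, e.g. via `hdfs fsck <path> -files -blocks -locations`.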

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Matt Andruff