'Hive retrieving meta data too slow

Situation : I am aiming to retrieve location info for a list of hive external tables. For one table the easiest way is just using show create table table_name but I have quite a number of table, so I am finding alternative to achieve this. I managed to find there is the sys db in hive.

It seems the db is storing meta info of all tables, and I found the table sds that is storing these location info.

However when I query this sds table with the simplest where query select * from sds where sd_id = a_sd_id searching for info of only one table. It takes more than 50 seconds to return the result.

On the other hand, what is weird is that if I try to retrieve the same info using show create table the_table_name command, all table info include the location info is returned in 0.05 second .

So now my questions is when I trigger show create table, where did hive retrieve these info? Is it the same source when I query from the sys.sds table? If the two are the same source then the huge time gap between the ways cannot be explained.

Could anyone help cast some light on why the situation turns out like this and how can I retrieve the location info as I expected, i.e. retrieving from mysql metastore which can return as fast as the show create table command? I suppose the show create table should be accessing the mysql. But if the sys db is a mapping of the mysql db, why the query on these tables return 100 times slower than the show create?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source