HDFS has the NameNode and Hive has the Metastore; both systems include a metadata service.

For big data systems, it is important to have a metadata service that indexes the physical data in order to improve query performance.

The system has no metadata:

The system has metadata included:

As we can see, with a metadata service (catalog) integrated, query efficiency can be much improved, since the needed data has already been copied across the cluster to local nodes for grouping, sorting, and joining.
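To make the indexing idea concrete, here is a minimal sketch of how a catalog can cut down the amount of physical data a query has to read. It is a toy model, not how the HDFS NameNode or the Hive Metastore is actually implemented. The file paths, the `FileEntry` record, and the function names are all hypothetical, and the catalog is assumed to store only a minimum and maximum key per file.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileEntry:
    """One catalog record: where a file lives and which key range it holds."""
    path: str
    min_key: int
    max_key: int

# Hypothetical physical storage: file path -> list of (key, value) rows.
STORAGE = {
    "/data/part-0": [(1, "a"), (5, "b"), (9, "c")],
    "/data/part-1": [(10, "d"), (15, "e"), (19, "f")],
    "/data/part-2": [(20, "g"), (25, "h"), (29, "i")],
}

def build_catalog(storage):
    """Collect per-file key ranges once, so later queries can consult them."""
    return [
        FileEntry(path, min(k for k, _ in rows), max(k for k, _ in rows))
        for path, rows in storage.items()
    ]

def scan_without_metadata(storage, lo, hi):
    """No catalog: every file must be opened and filtered."""
    files_read, result = 0, []
    for rows in storage.values():
        files_read += 1
        result.extend(r for r in rows if lo <= r[0] <= hi)
    return result, files_read

def scan_with_metadata(storage, catalog, lo, hi):
    """With a catalog: skip files whose key range cannot match the query."""
    files_read, result = 0, []
    for entry in catalog:
        if entry.max_key < lo or entry.min_key > hi:
            continue  # pruned using metadata only, file is never opened
        files_read += 1
        result.extend(r for r in storage[entry.path] if lo <= r[0] <= hi)
    return result, files_read

catalog = build_catalog(STORAGE)
rows_plain, read_plain = scan_without_metadata(STORAGE, 12, 18)
rows_meta, read_meta = scan_with_metadata(STORAGE, catalog, 12, 18)
print(read_plain, read_meta)
```

Both scans return the same rows, but the catalog-backed scan opens only the files whose key range overlaps the query. Real systems keep richer metadata (block locations, partitions, schemas), yet the principle of consulting metadata before touching physical data is the same.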

Generated on 09/29/22