How to load data into Hive and skip headers

When loading data into Hive, we often have to deal with header lines, such as the header row of a CSV file.

In Spark we can specify option("header", true) so that the first line is treated as a header instead of data. In Hive, we have to set a table property when creating the table instead.

Create a table like this:

 hive> create external table states3 (states string) location '/tmp/test/states3' 
     > tblproperties("skip.header.line.count"="2");

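Here "skip.header.line.count"="2" tells Hive to ignore the first two lines of the data file, because the sample file below has two header lines (the column name and a dashed separator). The command also automatically creates the directory given in location. Now we put the file into this location.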

 $ cat states3.txt 
 STATE_NAME
 ----------
 california
 ohio
 north dakota
 new york
 colorado
 new jersey
   
 $ hdfs dfs -put states3.txt /tmp/test/states3
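Before or after uploading, it can help to preview locally which rows Hive should return. The snippet below is a minimal sketch, assuming a local copy of the data file and a plain text table like states3; `preview_hive_rows` is a hypothetical helper name, and it only mimics the header skipping with `tail`. It does not query Hive or HDFS.

```shell
# Hypothetical helper (not part of Hive): print the rows Hive should return
# for one local text file when the table has "skip.header.line.count"=N.
preview_hive_rows() {
  local file="$1"        # local copy of the data file, e.g. states3.txt
  local skip="${2:-1}"   # number of header lines, same value as skip.header.line.count
  # tail -n +K prints from line K onward, so start right after the header lines
  tail -n +"$((skip + 1))" "$file"
}
```

For example, `preview_hive_rows states3.txt 2` should print the six state names, without the STATE_NAME line and the dashed separator.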

Run the query below, and we can see that the header lines were skipped:

 hive> select * from states3;
 OK
 california
 ohio
 north dakota
 new york
 colorado
 new jersey
 Time taken: 0.437 seconds, Fetched: 6 row(s)
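As far as we know, Hive applies skip.header.line.count to every file in the table location rather than once per table, so each file added with hdfs dfs -put must carry the same number of header lines. A rough way to predict the "Fetched: N row(s)" figure is sketched below, assuming a local directory that mirrors the table location and plain text files; `predict_hive_row_count` is a hypothetical helper, not part of Hive.

```shell
# Hypothetical helper (not part of Hive): predict the row count for a table
# directory, assuming every file in it starts with the same number of
# header lines and Hive skips those lines in each file.
predict_hive_row_count() {
  local dir="$1"         # local mirror of the table location
  local skip="${2:-1}"   # value of skip.header.line.count
  local total=0 f n
  for f in "$dir"/*; do
    [ -f "$f" ] || continue   # ignore subdirectories and an unmatched glob
    # awk's NR also counts a final line that has no trailing newline
    n=$(tail -n +"$((skip + 1))" "$f" | awk 'END { print NR }')
    total=$((total + n))
  done
  echo "$total"
}
```

With only states3.txt in the directory and a count of 2, this should report 6, matching the query output above.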

Generated on 09/29/22