When loading data into Hive, we often have to deal with header lines, such as the header row of a CSV file.

In Spark we can set option("header", true) so that the first line is treated as a header instead of a data row. In Hive, however, we have to specify the option as a table property when creating the table.

Create a table like this:

hive> create external table states3 (states string) location '/tmp/test/states3'
> tblproperties("skip.header.line.count"="2");

This command automatically creates the directory given in "location" if it does not already exist. Next, we put the file into that location.
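If you want to follow along, the sample file can be created locally with printf. This is a minimal sketch: the contents are copied from the listing that follows, and the file name matches the one used in the hdfs dfs -put command.

```shell
#!/bin/sh
# Write the sample data: two header lines followed by six states,
# one value per line. Quoting keeps multi-word names on a single line.
printf '%s\n' \
  'STATE_NAME' \
  '----------' \
  'california' \
  'ohio' \
  'north dakota' \
  'new york' \
  'colorado' \
  'new jersey' > states3.txt
```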

$ cat states3.txt
STATE_NAME
----------
california
ohio
north dakota
new york
colorado
new jersey

$ hdfs dfs -put states3.txt /tmp/test/states3

Run the query below, and we can see that the two header lines were skipped.

hive> select * from states3;
OK
california
ohio
north dakota
new york
colorado
new jersey
Time taken: 0.437 seconds, Fetched: 6 row(s)
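Before uploading a file, it can be handy to preview locally which rows Hive should return for a given skip.header.line.count. The helper below is a hypothetical sketch, not part of Hive: it only mimics the property for a plain text file by printing everything after the first N lines.

```shell
#!/bin/sh
# Hypothetical helper (not a Hive command): print what a text table with
# skip.header.line.count=N should return for a local file.
preview_skip_header() {
  n="$1"
  file="$2"
  # tail -n +K starts printing at line K, so skipping N lines means K = N + 1.
  tail -n "+$((n + 1))" "$file"
}

# Usage, assuming states3.txt from this example is in the current directory:
# preview_skip_header 2 states3.txt
```

For the states3.txt file above, this prints the same six states that the Hive query returned.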

Generated on 09/29/22