LOAD DATA
Description
LOAD DATA
statement loads the data into a table from the user specified directory or file. If a directory is specified then all the files from the directory are loaded. If a file is specified then only the single file is loaded. Additionally the LOAD DATA
statement takes an optional partition specification. When a partition is specified, the data files (when input source is a directory) or the single file (when input source is a file) are loaded into the partition of the target table.
Syntax
LOAD DATA [ LOCAL ] INPATH path [ OVERWRITE ] INTO TABLE table_name
[ PARTITION ( partition_col_name = partition_col_val [ , ... ] ) ]
Parameters
path
- Path of the file system. It can be either an absolute or a relative path.
table_name
- The name of an existing table.
PARTITION ( partition_col_name = partition_col_val [ , ... ] )
- Specifies one or more partition column and value pairs.
LOCAL
- If specified, it causes the
INPATH
to be resolved against the local file system, instead of the default file system, which is typically a distributed storage.
OVERWRITE
- By default, new data is appended to the table. If
OVERWRITE
is used, the table is instead overwritten with new data.
Examples
-- Example without partition specification.
-- Assuming the students table has already been created and populated.
SELECT * FROM students;
+ -------------- + ------------------------------ + -------------- +
| name | address | student_id |
+ -------------- + ------------------------------ + -------------- +
| Amy Smith | 123 Park Ave, San Jose | 111111 |
+ -------------- + ------------------------------ + -------------- +
CREATE TABLE test_load (name VARCHAR(64), address VARCHAR(64), student_id INT);
-- Assuming the students table is in '/user/hive/warehouse/'
LOAD DATA LOCAL INPATH '/user/hive/warehouse/students' OVERWRITE INTO TABLE test_load;
SELECT * FROM test_load;
+ -------------- + ------------------------------ + -------------- +
| name | address | student_id |
+ -------------- + ------------------------------ + -------------- +
| Amy Smith | 123 Park Ave, San Jose | 111111 |
+ -------------- + ------------------------------ + -------------- +
-- Example with partition specification.
CREATE TABLE test_partition (c1 INT, c2 INT, c3 INT) USING HIVE PARTITIONED BY (c2, c3);
INSERT INTO test_partition PARTITION (c2 = 2, c3 = 3) VALUES (1);
INSERT INTO test_partition PARTITION (c2 = 5, c3 = 6) VALUES (4);
INSERT INTO test_partition PARTITION (c2 = 8, c3 = 9) VALUES (7);
SELECT * FROM test_partition;
+ ------- + ------- + ----- +
| c1 | c2 | c3 |
+ ------- + --------------- +
| 1 | 2 | 3 |
+ ------- + ------- + ----- +
| 4 | 5 | 6 |
+ ------- + ------- + ----- +
| 7 | 8 | 9 |
+ ------- + ------- + ----- +
CREATE TABLE test_load_partition (c1 INT, c2 INT, c3 INT) USING HIVE PARTITIONED BY (c2, c3);
-- Assuming the test_partition table is in '/user/hive/warehouse/'
LOAD DATA LOCAL INPATH '/user/hive/warehouse/test_partition/c2=2/c3=3'
OVERWRITE INTO TABLE test_load_partition PARTITION (c2=2, c3=3);
SELECT * FROM test_load_partition;
+ ------- + ------- + ----- +
| c1 | c2 | c3 |
+ ------- + --------------- +
| 1 | 2 | 3 |
+ ------- + ------- + ----- +