How to Import Data From Oracle database to HDFS using Sqoop.
Rather than repeat the import command along with connection related input required, each time, you can pass an options file as an argument to sqoop. Create a text file, as follows, and save it someplace, locally on the node you are running the sqoop client on.
Note: This blog does not cover it, but you can create and use a password file as well, that you can pass as argument --password-file <<filename>>.
2.1. Sample Options file:
2.2.2 Data file contents:
$ hadoop fs -cat sqoop-mysql/departments/part-m-00000 | more
d009,Customer Service
d005,Development
d002,Finance
d003,Human Resources
d001,Marketing
d004,Production
d006,Quality Management
d008,Research
d007,Sales
$ sqoop --options-file SqoopImportOptions.txt \
--table employees \
--where "emp_no > 499948" \
--as-textfile \
-m 1 \
--target-dir /user/vchennar/sqoop-mysql/employeeGtTest
$ sqoop --options-file SqoopImportOptions.txt \
--query 'select EMP_NO,FIRST_NAME,LAST_NAME from employees where $CONDITIONS' \
-m 1 \
--target-dir /user/vchennar/sqoop-mysql/employeeFrfrmQrySmpl2
Command:
$ sqoop --options-file SqoopImportAllTablesOptions.txt \
--direct \
--warehouse-dir sqoop-mysql/EmployeeDatabase
Options file content:
$ more SqoopImportAllTablesOptions.txt
______________________________________________
#
#Options file for sqoop import
#
import-all-tables
--connect
jdbc:mysql://vchennar-mySqlServer-node/employees
--username
myUID
--password
myPWD
#
#All other commands should be specified in the command line
The eval tool allows users to quickly run simple SQL queries against a database; results are printed to the console. This allows users to preview their import queries to ensure they import the data they expect.
Install Oracle JDBC Driver
You can download the JDBC Driver from the Oracle website, for example http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html. You must accept the license agreement before you can download the driver. Download theojdbc6.jar file and copy it to sqoop lib directory:
$ sudo cp ojdbc6.jar sqoop/lib
$ sudo cp ojdbc6.jar sqoop/lib
$ chmod -R 755 sqoop/lib
Import Data from Oracle to HDFS
Use below command to import the data
$ sqoop import -connect jdbc:oracle:thin:apps/apps@10.20.66.6:1521:vis1211 -username apps -P -table ap_invoices_all -columns "INVOICE_ID,VENDOR_ID" -target-dir /apinvoices -m 1
Required items for above command:
IPv4 Address – 10.20.66.6
Database Name – apps
Table Name – ap_invoices_allinvoices_all
Username – apps
Password – apps
Output Directory – Could be any. I have used apinvoices
Below commands may help you to identify if there is any issue
$ ping 10.20.66.6
$ nc -z 10.20.66.6 1521
Refer Sqoop user guide https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html for more details
Verify the Data on HDFS
Open the Browser and go to the URL: http://localhost:50070/dfshealth.jsp
Click on Browse the filesystem
Click on apinvoices directory
Click on part-m-00000 file to see the data.
Sqoop list commands
Run the commands on the Unix prompt, on the node where you have sqoop installed.
1. List databases
Lists databases in your mysql database.
$ sqoop list-databases --connect jdbc:mysql://<<mysql-server>>/employees --username vchennar --password myPassword
.
.
.
INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
employees
test
2. List tables
Lists tables in your mysql database.
$ sqoop list-tables --connect jdbc:mysql://<<mysql-server>>/employees --username vchennar --password myPassword
.
.
.
INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
departments
dept_emp
dept_manager
employees
employees_exp_stg
employees_export
salaries
titles
Importing data in MySql into HDFS
Replace "vchennar-mySqlServer-node" with the host name of the node running mySQL server, replace login credentials and target directory.
1. Importing a table into HDFS - basic import
$ sqoop import \
--connect jdbc:mysql://vchennar-mySqlServer-node/employees \
--username myUID \
--password myPWD \
--table employees \
-m 1 \
--target-dir /user/vchennar/sqoop-mysql/employees
--connect jdbc:mysql://vchennar-mySqlServer-node/employees \
--username myUID \
--password myPWD \
--table employees \
-m 1 \
--target-dir /user/vchennar/sqoop-mysql/employees
.
.
.
.9139 KB/sec)
INFO mapreduce.ImportJobBase: Retrieved 300024 records
2. Executing imports with an options file for static information
Note: This blog does not cover it, but you can create and use a password file as well, that you can pass as argument --password-file <<filename>>.
2.1. Sample Options file:
___________________________________________________________________________
$ vi SqoopImportOptions.txt
#
#Options file for sqoop import
#
import
--connect
jdbc:mysql://vchennar-mySqlServer-node/employees
--username
myUID
--password
myPwd
#
#All other commands should be specified in the command line
___________________________________________________________________________
2.2. Command to execute import, using an options file:
2.2. Command to execute import, using an options file:
Note: Provide the proper path for the options file
2.2.1. The command
$ sqoop --options-file SqoopImportOptions.txt \
--table departments \
-m 1 \
--target-dir /user/vchennar/sqoop-mysql/departments
--table departments \
-m 1 \
--target-dir /user/vchennar/sqoop-mysql/departments
.
.
.
INFO mapreduce.ImportJobBase: Transferred 153 bytes in 26.2453 seconds (5.8296 bytes/sec)
INFO mapreduce.ImportJobBase: Retrieved 9 records.
-m argument is to specify number of mappers. The department table has a handful of records, so I am setting it to 1.
2.2.2 Data file contents:
$ hadoop fs -cat sqoop-mysql/departments/part-m-00000 | more
d009,Customer Service
d005,Development
d002,Finance
d003,Human Resources
d001,Marketing
d004,Production
d006,Quality Management
d008,Research
d007,Sales
Import all columns, filter rows using where clause
--table employees \
--where "emp_no > 499948" \
--as-textfile \
-m 1 \
--target-dir /user/vchennar/sqoop-mysql/employeeGtTest
Import with a free form query without where clause
--query 'select EMP_NO,FIRST_NAME,LAST_NAME from employees where $CONDITIONS' \
-m 1 \
--target-dir /user/vchennar/sqoop-mysql/employeeFrfrmQrySmpl2
(Case of the column needs to match that used to create table, or else the import fails)
Import with a free form query with where clause
$ sqoop --options-file SqoopImportOptions.txt \
--query 'select EMP_NO,FIRST_NAME,LAST_NAME from employees where EMP_NO < 20000 AND $CONDITIONS' \
-m 1 \
--target-dir /user/vchennar/sqoop-mysql/employeeFrfrmQry1
$ sqoop import --connect <connect> --username <username> --password <password> --query "select * from table WHERE rownum<10 OR \$CONDITIONS" --target-dir=/location -m 1 --verbose
Split by
$ sqoop --options-file SqoopImportOptions.txt \
--query 'select EMP_NO,FIRST_NAME,LAST_NAME from employees where $CONDITIONS' \
--query 'select EMP_NO,FIRST_NAME,LAST_NAME from employees where $CONDITIONS' \
--split-by EMP_NO \
--direct \
--target-dir /user/vchennar/sqoop-mysql/SplitByExampleImportImport all tables
Command:
$ sqoop --options-file SqoopImportAllTablesOptions.txt \
--direct \
--warehouse-dir sqoop-mysql/EmployeeDatabase
Options file content:
$ more SqoopImportAllTablesOptions.txt
______________________________________________
#
#Options file for sqoop import
#
import-all-tables
--connect
jdbc:mysql://vchennar-mySqlServer-node/employees
--username
myUID
--password
myPWD
#
#All other commands should be specified in the command line
______________________________________________
Direct and quick queries or inserts and updates with Sqoop eval
1. Query:
$ sqoop eval --connect jdbc:mysql://vchennar-mySqlServer-node/employees \
--username myUID \
--password myPWD \
--query "select * from employees limit 2"
---------------------------------------------------------------------------------
| emp_no | birth_date | first_name | last_name | gender | hire_date |
---------------------------------------------------------------------------------
| 10001 | 1953-09-02 | Georgi | Facello | M | 1986-06-26 |
| 10002 | 1964-06-02 | Bezalel | Simmel | F | 1985-11-21 |
2. Insert:
sqoop eval --connect jdbc:mysql://vchennar-mySqlServer-node/employees \
--username myUID \
--password myPWD \
-e "insert into employees_export values(550000,'1977-08-08','Mouse','Mickey','M','1999-04-12')"
This comment has been removed by the author.
ReplyDelete