Wednesday, December 21, 2016

Sqoop performance tuning

1. Import/export only the data you need. Filter rows with a WHERE clause (--where, or --query for free-form SQL) and restrict columns with --columns wherever possible.
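
As a rough sketch of tip 1, the command below filters both rows and columns at the source. The JDBC URL, user, table, column names and target directory are hypothetical placeholders; --where and --columns are standard Sqoop import arguments. The SQOOP variable is a convenience added here so the snippet only prints the command by default.

```shell
# Dry run by default: prints the command instead of running it.
# Set SQOOP=sqoop in the environment to execute it for real.
SQOOP=${SQOOP:-"echo sqoop"}

# Hypothetical connection string; replace with your own.
JDBC_URL="jdbc:mysql://dbhost/sales"

filtered_import() {
  # Only three columns and only rows from 2016 onwards leave the database.
  $SQOOP import \
    --connect "$JDBC_URL" \
    --username etl_user -P \
    --table orders \
    --columns "order_id,customer_id,amount" \
    --where "order_date >= '2016-01-01'" \
    --target-dir /data/orders_2016
}

filtered_import
```

Filtering at the source means less data crosses the network and fewer bytes are written to HDFS.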

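2. Use compression (--compress, or -z) to reduce the size of the data written to HDFS. The default codec is gzip; --compression-codec selects a different one.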

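3. Use incremental imports so that only new or changed rows are transferred. Use append mode for tables that only receive new rows with an increasing key, and lastmodified mode for tables whose rows are updated and carry a timestamp column:
     --incremental append --check-column <column name> --last-value <value>
     OR
     --incremental lastmodified --check-column <column name> --last-value <value>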
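
A minimal sketch of an append-mode incremental import follows. The table, check column and starting value are assumptions made for illustration; the SQOOP variable is a convenience added here so the snippet prints the command unless told otherwise.

```shell
# Dry run by default: prints the command instead of running it.
# Set SQOOP=sqoop in the environment to execute it for real.
SQOOP=${SQOOP:-"echo sqoop"}

# Hypothetical: highest order_id that a previous run already imported.
LAST_VALUE=${LAST_VALUE:-1000}

incremental_import() {
  # Only rows with order_id greater than LAST_VALUE are fetched.
  $SQOOP import \
    --connect "jdbc:mysql://dbhost/sales" \
    --username etl_user -P \
    --table orders \
    --target-dir /data/orders \
    --incremental append \
    --check-column order_id \
    --last-value "$LAST_VALUE"
}

incremental_import
```

At the end of an incremental import Sqoop prints the value to pass as --last-value on the next run, so that value is what a scheduler would need to carry forward.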

4. Use --split-by to balance the load across map tasks. Pick a column whose values are evenly distributed so that each mapper processes roughly the same number of records.

5. Tune the number of concurrent map tasks with -m <num-mappers> (long form: --num-mappers <num-mappers>).
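
Tips 4 and 5 are usually applied together. The sketch below assumes a hypothetical orders table with an evenly distributed numeric key order_id, and picks 8 mappers purely as an example (Sqoop defaults to 4). The SQOOP variable is a convenience added here so the snippet prints the command by default.

```shell
# Dry run by default: prints the command instead of running it.
# Set SQOOP=sqoop in the environment to execute it for real.
SQOOP=${SQOOP:-"echo sqoop"}

# Hypothetical degree of parallelism; tune against what the database can serve.
MAPPERS=${MAPPERS:-8}

parallel_import() {
  # Each mapper imports one contiguous range of order_id values.
  $SQOOP import \
    --connect "jdbc:mysql://dbhost/sales" \
    --username etl_user -P \
    --table orders \
    --split-by order_id \
    --num-mappers "$MAPPERS" \
    --target-dir /data/orders
}

parallel_import
```

More mappers means more concurrent connections and queries against the source database, so raising the number helps only as long as the database keeps up.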

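6. Use direct mode (--direct) to speed up data transfer. It uses the database's native tools instead of JDBC and is only available for databases with a direct connector, such as MySQL and PostgreSQL.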

7. Use batch mode to export the data
With sqoop export, the --batch argument makes Sqoop use batch mode for the underlying statement execution, which improves export performance.
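
A minimal sketch of a batched export is shown below. The target table and export directory are hypothetical; the SQOOP variable is a convenience added here so the snippet prints the command by default.

```shell
# Dry run by default: prints the command instead of running it.
# Set SQOOP=sqoop in the environment to execute it for real.
SQOOP=${SQOOP:-"echo sqoop"}

batch_export() {
  # --batch asks Sqoop to execute the underlying statements in batch mode.
  $SQOOP export \
    --connect "jdbc:mysql://dbhost/sales" \
    --username etl_user -P \
    --table order_summary \
    --export-dir /data/order_summary \
    --batch
}

batch_export
```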

8. Custom Boundary Queries

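By default Sqoop finds the split boundaries by querying the minimum and maximum of the --split-by column. When that query is expensive, --boundary-query supplies a cheaper one. Note that a free-form --query must contain the $CONDITIONS token in its WHERE clause.

sqoop import \
  --connect <JDBC URL> \
  --username <USER_NAME> \
  --password <PASSWORD> \
  --query <QUERY> \
  --split-by <ID> \
  --target-dir <TARGET_DIR_URI> \
  --boundary-query "select min(<ID>), max(<ID>) from <TABLE>"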

References: 
https://community.hortonworks.com/articles/70258/sqoop-performance-tuning.html
https://dzone.com/articles/apache-sqoop-performance-tuning

