1. Import/export only the data you need. Use a WHERE clause (--where on import) wherever possible.
2. Use compression (--compress) to reduce data size.
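3. Use incremental imports so that only new or changed rows are transferred.
For tables that only receive new rows with an increasing key:
--incremental append --check-column <column name> --last-value <value>
OR, for tables whose rows are updated and carry a last-modified timestamp column:
--incremental lastmodified --check-column <column name> --last-value <value>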
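4. Use --split-by <column name> to balance the load across map tasks so that each one processes a roughly equal number of records. Pick a column whose values are evenly distributed.
5. Tune the number of concurrent map tasks with -m <num-mappers> (long form: --num-mappers <num-mappers>).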
6. Use direct mode (--direct) to speed up data transfer where the database connector supports it.
7. Use batch mode to export the data
With sqoop export, the --batch argument uses batch mode for the underlying statement execution, which can improve performance.
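8. Custom Boundary Queries
By default Sqoop finds the split range by querying the minimum and maximum of the split column. Use --boundary-query to supply a cheaper query for that step. Note that a free-form --query must contain the $CONDITIONS token in its WHERE clause.
sqoop import --connect <JDBC URL> --username <USER_NAME> --password <PASSWORD> --query <QUERY> --split-by <ID> --target-dir <TARGET_DIR_URI> \
--boundary-query "select min(<ID>), max(<ID>) from <TABLE>"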
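As a rough illustration, tips 1 to 5 can be combined into a single import. The sketch below only assembles and prints the command; it runs Sqoop when RUN_SQOOP=1 is set and sqoop is on the PATH. The JDBC URL, user, password file, table, column, filter, and directory are all hypothetical placeholders, and the mapper count of 4 is an arbitrary starting point rather than a recommendation.

```shell
#!/bin/sh
# Sketch only: every value below is a hypothetical placeholder.
JDBC_URL="jdbc:mysql://dbhost.example.com/sales"  # hypothetical database
DB_USER="etl_user"                                # hypothetical user
PASSWORD_FILE="/user/etl/.db_password"            # hypothetical password file path
TABLE="orders"                                    # hypothetical table
KEY_COLUMN="order_id"                             # assumed numeric, evenly distributed key
LAST_VALUE="100000"                               # highest key already imported
TARGET_DIR="/data/raw/orders"                     # hypothetical HDFS directory

# Collect the arguments as positional parameters so quoting is preserved.
# --where limits the rows (tip 1), --compress shrinks the output (tip 2),
# --incremental append fetches only rows above LAST_VALUE (tip 3),
# --split-by and --num-mappers control how work is divided (tips 4 and 5).
set -- import \
  --connect "$JDBC_URL" \
  --username "$DB_USER" \
  --password-file "$PASSWORD_FILE" \
  --table "$TABLE" \
  --where "status = 'SHIPPED'" \
  --compress \
  --incremental append \
  --check-column "$KEY_COLUMN" \
  --last-value "$LAST_VALUE" \
  --split-by "$KEY_COLUMN" \
  --num-mappers 4 \
  --target-dir "$TARGET_DIR"

# Flattened copy for review (shell quoting is lost in this printed form).
SQOOP_CMD="sqoop $*"
echo "$SQOOP_CMD"

# Run the import only when explicitly requested and sqoop is installed.
if [ "${RUN_SQOOP:-0}" = "1" ] && command -v sqoop >/dev/null 2>&1; then
  sqoop "$@"
fi
```

Printing the command first makes it easy to review the flags before anything touches the database. Raise or lower --num-mappers gradually while watching the load on the source database.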
References:
https://community.hortonworks.com/articles/70258/sqoop-performance-tuning.html
https://dzone.com/articles/apache-sqoop-performance-tuning