Oozie Sqoop Action Extension
IMPORTANT: The Sqoop action requires Apache Hadoop 0.23.
The sqoop action runs a Sqoop job.
The workflow job will wait until the Sqoop job completes before continuing to the next action.
To run the Sqoop job, you have to configure the sqoop action with the job-tracker and name-node elements, the Sqoop command (or arg elements), and any necessary configuration.
A sqoop action can be configured to create or delete HDFS directories before starting the Sqoop job.
Sqoop configuration can be specified with a file, using the job-xml element, and inline, using the configuration elements.
Oozie EL expressions can be used in the inline configuration. Property values specified in the configuration element override values specified in the job-xml file.
Note that Hadoop mapred.job.tracker and fs.default.name properties must not be present in the inline configuration.
As with Hadoop map-reduce jobs, it is possible to add files and archives in order to make them available to the Sqoop job. Refer to the "Adding Files and Archives for the Job" section of the Workflow Functional Specification for more information about this feature.
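As a rough illustration, a sqoop action that makes one file and one archive available to the job might look like the sketch below. The action name, both paths, and the `#` symlink alias are hypothetical placeholders chosen for this sketch; the file and archive elements are placed after the command (or arg) elements.

```xml
<action name="sqoop-with-files">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>foo:8021</job-tracker>
        <name-node>bar:8020</name-node>
        <command>import --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command>
        <!-- Hypothetical path: a single file shipped with the job;
             the part after '#' is the symlink name in the task directory -->
        <file>/user/tucu/conf/import-options.txt#import-options.txt</file>
        <!-- Hypothetical path: an archive shipped with the job -->
        <archive>/user/tucu/archives/lookup-data.tar.gz</archive>
    </sqoop>
    <ok to="myotherjob"/>
    <error to="errorcleanup"/>
</action>
```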
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="[NODE-NAME]">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>[JOB-TRACKER]</job-tracker>
            <name-node>[NAME-NODE]</name-node>
            <prepare>
                <delete path="[PATH]"/>
                ...
                <mkdir path="[PATH]"/>
                ...
            </prepare>
            <configuration>
                <property>
                    <name>[PROPERTY-NAME]</name>
                    <value>[PROPERTY-VALUE]</value>
                </property>
                ...
            </configuration>
            <command>[SQOOP-COMMAND]</command>
            <arg>[SQOOP-ARGUMENT]</arg>
            ...
            <file>[FILE-PATH]</file>
            ...
            <archive>[FILE-PATH]</archive>
            ...
        </sqoop>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>
The prepare element, if present, indicates a list of paths to delete or create before starting the job. Specified paths must start with hdfs://HOST:PORT.
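A minimal sketch of a prepare block follows. The host, port, and directory names are hypothetical; the delete entries come before the mkdir entries, as in the syntax of the action. Deleting the target directory first is typically useful because a rerun of an import would otherwise find output left by the previous run.

```xml
<prepare>
    <!-- Hypothetical path: remove output left over from a previous run -->
    <delete path="hdfs://bar:8020/user/tucu/foo"/>
    <!-- Hypothetical path: create a working directory before the job starts -->
    <mkdir path="hdfs://bar:8020/user/tucu/staging"/>
</prepare>
```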
The job-xml element, if present, specifies a file containing configuration for the Sqoop job. As of schema 0.3, multiple job-xml elements are allowed in order to specify multiple job.xml files.
The configuration element, if present, contains configuration properties that are passed to the Sqoop job.
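The interplay of job-xml and inline configuration could be sketched as follows. The file name `sqoop-job-defaults.xml` and the workflow parameter `queueName` are assumptions made for this example only; `mapred.compress.map.output` is set inline, so its value would take precedence over any value for the same property in the job-xml file.

```xml
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>foo:8021</job-tracker>
    <name-node>bar:8020</name-node>
    <!-- Hypothetical file holding shared default properties -->
    <job-xml>sqoop-job-defaults.xml</job-xml>
    <configuration>
        <!-- Inline value overrides the same property in the job-xml file -->
        <property>
            <name>mapred.compress.map.output</name>
            <value>true</value>
        </property>
        <!-- EL expression, resolved from a hypothetical workflow parameter -->
        <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
        </property>
    </configuration>
    <command>import --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command>
</sqoop>
```

Neither mapred.job.tracker nor fs.default.name appears in the inline configuration; those values come from the job-tracker and name-node elements.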
Sqoop command
The Sqoop command can be specified either using the command element or multiple arg elements.
When using the command element, Oozie will split the command on every space into multiple arguments.
When using the arg elements, Oozie will pass each argument value as an argument to Sqoop.
The arg variant should be used when there are spaces within a single argument.
Consult the Sqoop documentation for a complete list of valid Sqoop commands.
All the above elements can be parameterized (templatized) using EL expressions.
Examples:
Using the command element:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="myfirsthivejob">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <prepare>
                <delete path="${jobOutput}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <command>import --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command>
        </sqoop>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>
    </action>
    ...
</workflow-app>
The same Sqoop action using arg elements:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="myfirsthivejob">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <prepare>
                <delete path="${jobOutput}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:hsqldb:file:db.hsqldb</arg>
            <arg>--table</arg>
            <arg>TT</arg>
            <arg>--target-dir</arg>
            <arg>hdfs://localhost:8020/user/tucu/foo</arg>
            <arg>-m</arg>
            <arg>1</arg>
        </sqoop>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>
    </action>
    ...
</workflow-app>
NOTE: The arg elements syntax, while more verbose, allows spaces within a single argument, which is useful when using free-form queries.
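To make the free-form query case concrete, here is a hedged sketch that reuses the connection string and target directory from the examples above. The `--query` option and the `$CONDITIONS` token are taken from the Sqoop documentation rather than from this page, and the query text itself is only an illustration.

```xml
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>foo:8021</job-tracker>
    <name-node>bar:8020</name-node>
    <arg>import</arg>
    <arg>--connect</arg>
    <arg>jdbc:hsqldb:file:db.hsqldb</arg>
    <arg>--query</arg>
    <!-- The whole query is passed to Sqoop as one argument, spaces included -->
    <arg>SELECT * FROM TT WHERE $CONDITIONS</arg>
    <arg>--target-dir</arg>
    <arg>hdfs://localhost:8020/user/tucu/foo</arg>
    <arg>-m</arg>
    <arg>1</arg>
</sqoop>
```

Written with the command element instead, the same query would be split at every space, so Sqoop would receive SELECT, *, FROM, and the remaining words as separate arguments.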
https://oozie.apache.org/docs/4.0.0/DG_SqoopActionExtension.html