While writing UDFs in Java, we can create and use the following three types of functions:
- Filter Functions − The filter functions are used as conditions in filter statements. These functions accept a Pig value as input and return a Boolean value.
- Eval Functions − The Eval functions are used in FOREACH-GENERATE statements. These functions accept a Pig value as input and return a Pig result.
- Algebraic Functions − The Algebraic functions act on inner bags in a FOREACH-GENERATE statement. These functions are used to perform full MapReduce operations on an inner bag.
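To make the algebraic idea more concrete, here is a rough plain-Java sketch of how a count over an inner bag might be split into stages. It is not Pig code: the class name and the initial, intermediate, finalStage, and count methods are hypothetical stand-ins, and plain lists stand in for chunks of a bag. In Pig itself, a UDF signals this capability by implementing the Algebraic interface, whose getInitial(), getIntermed(), and getFinal() methods name the classes that handle each stage.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only: these methods are hypothetical stand-ins for
// the three stages of an algebraic function. They are not Pig's API.
class AlgebraicCountSketch {

    // Initial stage: each input record contributes a partial count of 1.
    static long initial(Object record) {
        return 1L;
    }

    // Intermediate stage: merges partial counts into one partial count.
    static long intermediate(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Final stage: merges the remaining partial counts into the result.
    static long finalStage(List<Long> partials) {
        return intermediate(partials);
    }

    // Counts records that arrive as separate chunks of one inner bag.
    static long count(List<? extends List<?>> chunks) {
        List<Long> perChunk = new ArrayList<>();
        for (List<?> chunk : chunks) {
            List<Long> ones = new ArrayList<>();
            for (Object record : chunk) {
                ones.add(initial(record));
            }
            perChunk.add(intermediate(ones));
        }
        return finalStage(perChunk);
    }

    public static void main(String[] args) {
        List<List<String>> chunks = new ArrayList<>();
        chunks.add(List.of("a", "b", "c"));
        chunks.add(List.of("d", "e"));
        System.out.println(count(chunks)); // prints 5
    }
}
```

Because partial counts can be merged in any grouping, the same answer comes out whether the bag is processed in one piece or in several, which is the property that lets such a function be spread across the map and reduce sides of a job.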
Eval/Filter Functions
- Each UDF must extend the EvalFunc or FilterFunc class
- Provide an implementation of the exec() function
- Create a jar file containing the UDF class
- Register the jar file in the Pig script
- Define an alias and use it
Extend EvalFunc Class and implement exec() function
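import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class SimpleUDF extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        .........
    }
}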
OR
Extend FilterFunc Class and implement exec() function
import java.io.IOException;
import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;

public class SimpleUDF extends FilterFunc {
    @Override
    public Boolean exec(Tuple input) throws IOException {
        ....
    }
}
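The exec() bodies are elided above, so here is a minimal sketch of what such logic might look like. To keep it runnable without Pig on the classpath, a List<Object> stands in for Pig's Tuple, and the class and method names (ExecBodySketch, upperExec, nonEmptyExec) are hypothetical. In an actual UDF the argument would be an org.apache.pig.data.Tuple and the first field would be read with input.get(0).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only: a List<Object> stands in for Pig's Tuple.
// The class and method names here are hypothetical.
class ExecBodySketch {

    // Models an eval-style exec(): returns the first field upper-cased,
    // or null when there is no usable first field.
    static String upperExec(List<Object> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().toUpperCase(Locale.ROOT);
    }

    // Models a filter-style exec(): true only when the first field is
    // present and not an empty string.
    static Boolean nonEmptyExec(List<Object> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return false;
        }
        return !input.get(0).toString().isEmpty();
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList("robin", 22);
        System.out.println(upperExec(row));     // prints ROBIN
        System.out.println(nonEmptyExec(row));  // prints true
        System.out.println(upperExec(Collections.<Object>emptyList())); // prints null
    }
}
```

Checking for a missing or null field before using it is a defensive choice in this sketch: the eval-style method returns null and the filter-style method returns false rather than throwing on incomplete rows.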
Register jar file
REGISTER '/$PIG_HOME/simpleUDF.jar';
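Define an alias for the SimpleUDF class as shown below. The class name is case-sensitive and must match the Java class (use the fully qualified name if the class is in a package).
DEFINE simpleUDF SimpleUDF();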
Use it in Pig
grunt> Upper_case = FOREACH emp_data GENERATE simpleUDF(name);