Tuesday, February 28, 2017

How to write a UDF in Pig

When writing UDFs in Java, we can create and use the following three types of functions:
  • Filter Functions: used as conditions in FILTER statements. These functions accept a Pig value as input and return a Boolean value.
  • Eval Functions: used in FOREACH-GENERATE statements. These functions accept a Pig value as input and return a Pig result.
  • Algebraic Functions: act on inner bags in a FOREACH-GENERATE statement. These functions are used to perform full MapReduce operations on an inner bag.

Eval/Filter Functions

  • Each UDF must extend the EvalFunc or FilterFunc class
  • Provide an implementation of the exec() function
  • Create a jar file containing the UDF class
  • Register the jar file in the Pig script
  • Define an alias for the UDF and use it

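Extend the EvalFunc class and implement the exec() function

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// The type parameter (String here) is the type that exec() returns to Pig.
public class SimpleUDF extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        .........
    }
}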
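As a more concrete sketch of the EvalFunc case, here is a hypothetical upper-casing UDF, which is what the Upper_case usage line later in this post suggests. To keep it runnable without the Pig jar on the classpath, it declares small stand-ins for org.apache.pig.EvalFunc and org.apache.pig.data.Tuple; the names UpperCaseSketch, UpperCase and tupleOf are made up for illustration. In a real UDF you would delete the stand-ins, import the Pig classes and make the UDF a public top-level class.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class UpperCaseSketch {

    // Stand-in for org.apache.pig.data.Tuple, reduced to the two calls used here.
    interface Tuple {
        int size();
        Object get(int fieldNum) throws IOException;
    }

    // Stand-in for org.apache.pig.EvalFunc<T>: subclasses implement exec().
    static abstract class EvalFunc<T> {
        public abstract T exec(Tuple input) throws IOException;
    }

    // Hypothetical UDF: upper-cases the first field of the input tuple.
    static class UpperCase extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            // Return null for missing input instead of failing the whole job.
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().toUpperCase(Locale.ROOT);
        }
    }

    // Test helper (an assumption of this sketch): builds a stand-in tuple.
    static Tuple tupleOf(Object... fields) {
        List<Object> values = Arrays.asList(fields);
        return new Tuple() {
            public int size() { return values.size(); }
            public Object get(int fieldNum) { return values.get(fieldNum); }
        };
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new UpperCase().exec(tupleOf("alice"))); // prints ALICE
    }
}
```

Returning null for an empty or null field is a design choice in this sketch: it lets Pig carry a null through the result rather than aborting on one bad record.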

OR

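Extend the FilterFunc class and implement the exec() function

import java.io.IOException;
import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;

// exec() returns a Boolean: true keeps the record, false drops it.
public class SimpleUDF extends FilterFunc {

    @Override
    public Boolean exec(Tuple input) throws IOException {
        ....
    }
}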
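For the FilterFunc case, a minimal sketch might look like the following: a hypothetical filter that keeps only records whose first field is a non-blank string. As before, FilterFunc and Tuple here are local stand-ins for org.apache.pig.FilterFunc and org.apache.pig.data.Tuple so that the example runs on its own, and the names NonEmptyFilterSketch, IsNonEmpty and tupleOf are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class NonEmptyFilterSketch {

    // Stand-in for org.apache.pig.data.Tuple, reduced to the two calls used here.
    interface Tuple {
        int size();
        Object get(int fieldNum) throws IOException;
    }

    // Stand-in for org.apache.pig.FilterFunc, whose exec() returns a Boolean.
    static abstract class FilterFunc {
        public abstract Boolean exec(Tuple input) throws IOException;
    }

    // Hypothetical filter: true when the first field is a non-blank value.
    static class IsNonEmpty extends FilterFunc {
        @Override
        public Boolean exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0) {
                return false;
            }
            Object field = input.get(0);
            return field != null && !field.toString().trim().isEmpty();
        }
    }

    // Test helper (an assumption of this sketch): builds a stand-in tuple.
    static Tuple tupleOf(Object... fields) {
        List<Object> values = Arrays.asList(fields);
        return new Tuple() {
            public int size() { return values.size(); }
            public Object get(int fieldNum) { return values.get(fieldNum); }
        };
    }

    public static void main(String[] args) throws IOException {
        IsNonEmpty filter = new IsNonEmpty();
        System.out.println(filter.exec(tupleOf("alice"))); // prints true
        System.out.println(filter.exec(tupleOf("   ")));   // prints false
    }
}
```

Once packaged and registered, a filter like this would be used as the condition of a FILTER statement, for example FILTER emp_data BY IsNonEmpty(name), assuming a relation emp_data with a name field.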

Register the jar file

REGISTER '/$PIG_HOME/simpleUDF.jar';

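Define an alias for the UDF as shown below. The class name is case-sensitive and must match the Java class (fully qualified if the class is in a package).
DEFINE simpleUDF SimpleUDF();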

Use it in Pig

grunt> Upper_case = FOREACH emp_data GENERATE simpleUDF(name);



