Friday, March 3, 2017

Extract data (nested columns) from JSON without specifying a schema using Pig

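How do you extract the required data from JSON without specifying a schema in Pig? One way is to load each record as a nested map with Elephant Bird's JsonLoader and its '-nestedLoad' option, then reach into the map with Pig's # (map lookup) operator.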

Sample JSON data:

-----
{"autopolicy": {"policy_holder_name": "someone", "policy_num": "20141012", "is_active": true, "vehicle": {"brand": {"model": "Lexus", "year": 2012}, "vin": "RANDOM123", "price": 23450.50}}}
-----

Pig Latin script to extract the data:
-----

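-- Register the Elephant Bird jars and the JSON parser they depend on
REGISTER '/apps/opt/hadoop/pig/lib/elephantbird/json-simple-1.1.1.jar';
REGISTER '/apps/opt/hadoop/pig/lib/elephantbird/elephant-bird-pig-4.3.jar';
REGISTER '/apps/opt/hadoop/pig/lib/elephantbird/elephant-bird-hadoop-compat-4.3.jar';

-- Load each JSON record as a single map field; '-nestedLoad' keeps nested objects as nested maps
data = LOAD '/data/json/test1.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

-- Walk the nested maps with the # operator: autopolicy -> vehicle -> brand -> model
vehiclemodel = FOREACH data GENERATE $0#'autopolicy'#'vehicle'#'brand'#'model' AS model;
DUMP vehiclemodel;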

-----
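
The same # lookup chain reaches any other field in the sample record. The script below is a minimal sketch that pulls several fields at different nesting levels in one FOREACH. The jar and data paths are copied from the script above; the relation name `policy` and the output aliases (`holder`, `policy_num`, `vin`, `year`, `price`) are names chosen here for illustration.

```pig
-- Same jars and input file as the script above
REGISTER '/apps/opt/hadoop/pig/lib/elephantbird/json-simple-1.1.1.jar';
REGISTER '/apps/opt/hadoop/pig/lib/elephantbird/elephant-bird-pig-4.3.jar';
REGISTER '/apps/opt/hadoop/pig/lib/elephantbird/elephant-bird-hadoop-compat-4.3.jar';

data = LOAD '/data/json/test1.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

-- Each # step descends one level into the nested map.
-- The aliases after AS are illustrative names, not part of the JSON.
policy = FOREACH data GENERATE
    $0#'autopolicy'#'policy_holder_name'     AS holder,
    $0#'autopolicy'#'policy_num'             AS policy_num,
    $0#'autopolicy'#'vehicle'#'vin'          AS vin,
    $0#'autopolicy'#'vehicle'#'brand'#'year' AS year,
    $0#'autopolicy'#'vehicle'#'price'        AS price;

DUMP policy;
```

Because no schema is declared, Pig has no type information for these values. If a later step needs numeric comparisons or arithmetic on fields such as year or price, an explicit cast will probably be needed; check how your Pig and Elephant Bird versions represent scalar JSON values before relying on one.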
