Friday, November 1, 2019

Dedup logic in Spark SQL

Dedup logic in Spark SQL or Hive:

select
    *
from (select
    *
   ,(row_number() over (partition by user_id order by mts_trckng_rowkey)) as alias_1
    from DB_NAME.TABLE_NAME
    where dt = '20191025'
) alias_2
WHERE alias_2.alias_1 = 1;

Dedup logic in Scala:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._

// Dedup logic: keep only the latest record (by Updateddate) for each Id and DataSourceKey combination
val Data_DF_Final_Dedup = Data_DF_Final
  .withColumn("ROWNUM", row_number().over(
    Window.partitionBy(col("Id"), col("DataSourceKey")).orderBy(col("Updateddate").desc)))
  .filter("ROWNUM = 1")
  .drop("ROWNUM")

display(Data_DF_Final_Dedup)

Dedup logic in ADB (Azure Databricks) Spark SQL:

%sql 

select * from
(
select  eventid, 
row_number() OVER (PARTITION BY Id ORDER BY Updateddate DESC) alias1
from parquet.`abfss://xy@abc.dfs.core.windows.net/dataproducts/test/v1/sen/full`
where id = 115894
) alias2
where alias2.alias1=1

Wednesday, October 2, 2019

Regular Expression Matching

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.
'.' Matches any single character.
'*' Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).
Note:
  • s may be empty and contains only lowercase letters a-z.
  • p may be empty and contains only lowercase letters a-z and the characters . and *.
Example 1:
Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".
Example 2:
Input:
s = "aa"
p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".
Example 3:
Input:
s = "ab"
p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".
Example 4:
Input:
s = "aab"
p = "c*a*b"
Output: true
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore, it matches "aab".
Example 5:
Input:
s = "mississippi"
p = "mis*is*p*."
Output: false

Sol:

class Solution {
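    public boolean isMatch(String text, String pattern) {
        // An empty pattern matches only an empty text.
        if (pattern.isEmpty()) return text.isEmpty();

        // Does the first character of the text match the first pattern character?
        boolean first_match = (!text.isEmpty() &&
                               (pattern.charAt(0) == text.charAt(0) || pattern.charAt(0) == '.'));

        if (pattern.length() >= 2 && pattern.charAt(1) == '*'){
            // "x*" case: either skip "x*" entirely (zero occurrences),
            // or consume one matching character and keep the same pattern.
            return (isMatch(text, pattern.substring(2)) ||
                    (first_match && isMatch(text.substring(1), pattern)));
        } else {
            // No '*': the first characters must match, then recurse on the rest.
            return first_match && isMatch(text.substring(1), pattern.substring(1));
        }
    }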
}
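
The recursion above can revisit the same (text suffix, pattern suffix) pair many times, and each substring call copies the string. As a minimal sketch, assuming the same matching rules, the same recursion can be memoized on index pairs. The names MemoizedRegexMatcher and match are illustrative and are not part of the original solution.

```java
class MemoizedRegexMatcher {
    // memo[i][j] caches whether text[i..] matches pattern[j..]; null means "not computed yet".
    private Boolean[][] memo;
    private String text;
    private String pattern;

    boolean isMatch(String text, String pattern) {
        this.text = text;
        this.pattern = pattern;
        this.memo = new Boolean[text.length() + 1][pattern.length() + 1];
        return match(0, 0);
    }

    private boolean match(int i, int j) {
        if (memo[i][j] != null) return memo[i][j];

        boolean result;
        if (j == pattern.length()) {
            // Pattern exhausted: match only if the text is exhausted too.
            result = (i == text.length());
        } else {
            boolean firstMatch = i < text.length()
                    && (pattern.charAt(j) == text.charAt(i) || pattern.charAt(j) == '.');

            if (j + 1 < pattern.length() && pattern.charAt(j + 1) == '*') {
                // Skip "x*" (zero occurrences) or consume one character and stay on "x*".
                result = match(i, j + 2) || (firstMatch && match(i + 1, j));
            } else {
                result = firstMatch && match(i + 1, j + 1);
            }
        }
        memo[i][j] = result;
        return result;
    }

    public static void main(String[] args) {
        MemoizedRegexMatcher m = new MemoizedRegexMatcher();
        System.out.println(m.isMatch("aa", "a"));                    // false
        System.out.println(m.isMatch("aa", "a*"));                   // true
        System.out.println(m.isMatch("ab", ".*"));                   // true
        System.out.println(m.isMatch("aab", "c*a*b"));               // true
        System.out.println(m.isMatch("mississippi", "mis*is*p*."));  // false
    }
}
```

Each index pair is computed at most once, so the work is bounded by the product of the two string lengths. The main method simply replays the five examples from the problem statement.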

Thursday, September 26, 2019

Celebrity Problem




In a party of N people, there may be a celebrity: one person who is known to everyone but who does not know anyone else in the party. We can only ask questions like “does A know B?”. Find the celebrity, if one is present, in the minimum number of questions.
We can describe the problem input as an array of numbers/characters representing the persons in the party. We also have a hypothetical function HaveAcquaintance(A, B) (called knows(a, b) in the code below) which returns true if A knows B, and false otherwise. How can we solve the problem?
Sol:
(Using two Pointers)
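The idea is to use two pointers, one from the start and one from the end. Assume the start person is A and the end person is B. If A knows B, then A cannot be the celebrity (a celebrity knows no one), so move the start pointer forward. Otherwise, B cannot be the celebrity (everyone knows the celebrity), so move the end pointer back. When the two pointers meet, the remaining person is the only celebrity candidate. Go through every person once more and check that the candidate knows no one and is known by everyone; if that check fails, there is no celebrity in the party.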

    // Person with id 2 is the celebrity
    static int MATRIX[][] = { { 0, 0, 1, 0 },
                              { 0, 0, 1, 0 },
                              { 0, 0, 0, 0 },
                              { 0, 0, 1, 0 } };

    // Returns true if a knows
    // b, false otherwise
    static boolean knows(int a, int b)
    {
        return MATRIX[a][b] == 1;
    }


    // Returns the id of the celebrity,
    // or -1 if there is no celebrity
    static int findCelebrity(int n)
    {
        // Initialize two pointers 
        // as two corners
        int a = 0;
        int b = n - 1;
          
        // Keep moving while 
        // the two pointers
        // don't become same.
        while (a < b) 
        {
            if (knows(a, b))
                a++;
            else
                b--;
        }
  
        // Check if a is actually 
        // a celebrity or not
        for (int i = 0; i < n; i++) 
        {
            // If 'a' knows any person or
            // any person doesn't know 'a',
            // return -1
            if (i != a && (knows(a, i) || 
                           !knows(i, a)))
                return -1;
        }
        return a;
    }
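
To see how many questions the two-pointer approach asks, here is a minimal self-contained sketch that wraps the same logic in a class and counts the calls to knows. The class name CelebrityDemo, the questions counter, and passing the matrix as a parameter are assumptions made for illustration; they are not part of the original code.

```java
class CelebrityDemo {
    static int[][] matrix;   // matrix[a][b] == 1 means a knows b
    static int questions;    // number of "does a know b?" questions asked

    static boolean knows(int a, int b) {
        questions++;
        return matrix[a][b] == 1;
    }

    // Returns the id of the celebrity, or -1 if there is none.
    static int findCelebrity(int[][] m) {
        matrix = m;
        questions = 0;
        int n = m.length;

        // Elimination: each question rules out exactly one person.
        int a = 0;
        int b = n - 1;
        while (a < b) {
            if (knows(a, b))
                a++;   // a knows someone, so a is not the celebrity
            else
                b--;   // a does not know b, so b is not the celebrity
        }

        // Verification: the candidate must know no one and be known by everyone.
        for (int i = 0; i < n; i++) {
            if (i != a && (knows(a, i) || !knows(i, a)))
                return -1;
        }
        return a;
    }

    public static void main(String[] args) {
        int[][] party = { { 0, 0, 1, 0 },
                          { 0, 0, 1, 0 },
                          { 0, 0, 0, 0 },
                          { 0, 0, 1, 0 } };
        System.out.println(findCelebrity(party)); // prints 2
        System.out.println(questions);            // prints 9
    }
}
```

The elimination loop asks N - 1 questions and the verification pass asks at most 2(N - 1), so this approach never needs more than 3(N - 1) questions. For the 4-person matrix above that bound is 9, which is what the counter reports.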