Friday, April 28, 2017

Micro-services

1. What is a micro-service?

A microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies. (http://www.martinfowler.com/articles/microservices.html)

2. Can you explain what are the advantages of micro-services over a monolithic project?

  • Change cycles are tied together in a monolithic application; a change made to a small part of the application requires the entire monolithc to be rebuilt and deployed.
  • Scaling of a monolithic application requires scaling of the entire application rather than parts of it that require greater resource.
  • Different tools/frameworks/languages can be employed by different microservices, allowing developers to choose the best tools for a specific task.

Wednesday, April 19, 2017

Over and Window in Hive

The OVER clause
  • OVER with standard aggregates:
    • COUNT
    • SUM
    • MIN
    • MAX
    • AVG
  • OVER with a PARTITION BY statement with one or more partitioning columns of any primitive datatype.
  • OVER with PARTITION BY and ORDER BY with one or more partitioning and/or ordering columns of any datatype.
    • OVER with a window specification. Windows can be defined separately in a WINDOW clause. Window specifications support the following formats:
      (ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
      (ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
      (ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING
      When ORDER BY is specified with missing WINDOW clause, the WINDOW specification defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
      When both ORDER BY and WINDOW clauses are missing, the WINDOW specification defaults to ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

PARTITION BY with one partitioning column, no ORDER BY or window specification

SELECT a, COUNT(b) OVER (PARTITION BY c)
FROM T;

PARTITION BY with two partitioning columns, no ORDER BY or window specification

SELECT a, COUNT(b) OVER (PARTITION BY c, d)
FROM T;

PARTITION BY with one partitioning column, one ORDER BY column, and no window specification

SELECT a, SUM(b) OVER (PARTITION BY ORDER BY d)
FROM T;

PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification

SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f)
FROM T;

PARTITION BY with partitioning, ORDER BY, and window specification

SELECT a, SUM(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T;
SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM T;
SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
FROM T;
SELECT a, AVG(b) OVER (PARTITION BY ORDER BY ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM T;

There can be multiple OVER clauses in a single query. A single OVER clause only applies to the immediately preceding function call. In this example, the first OVER clause applies to COUNT(b) and the second OVER clause applies to SUM(b):
SELECT 
 a,
 COUNT(b) OVER (PARTITION BY c),
 SUM(b) OVER (PARTITION BY c)
FROM T;

Aliases can be used as well, with or without the keyword AS:
SELECT 
 a,
 COUNT(b) OVER (PARTITION BY c) AS b_count,
 SUM(b) OVER (PARTITION BY c) b_sum
FROM T;

WINDOW clause

SELECT a, SUM(b) OVER w
FROM T
WINDOW w AS (PARTITION BY ORDER BY ROWS UNBOUNDED PRECEDING);

Distinct counting for each partition

SELECT a, COUNT(distinct a) OVER (PARTITION BY b)
FROM T;

More Example are available here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics