Sunday, February 26, 2017

Spark Architecture



Driver
  • Entry point for Spark Shell
  • The place where SparkContext is created
  • Translates RDD operations into an execution graph (DAG)
  • Splits graph into stages
  • Schedules tasks and controls their execution
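The driver's role can be sketched as a small standalone program (the app name, master URL, and file paths below are illustrative, not from the post; running it requires a Spark installation):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverSketch {
  def main(args: Array[String]): Unit = {
    // Creating the SparkContext makes this JVM the driver: it will
    // build the execution graph, split it into stages, and schedule
    // tasks onto executors.
    val conf = new SparkConf().setAppName("driver-sketch").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val counts = sc.textFile("input.txt")   // RDD definition only (lazy)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                   // shuffle boundary

    counts.saveAsTextFile("counts")         // action: driver spawns a job
    sc.stop()
  }
}
```

In the Spark Shell this setup is done for you: the shell process is the driver, and `sc` is the pre-created SparkContext.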
Cluster manager
  • An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN)
Worker node
  • Any node that can run application code in the cluster
Executor
  • A process launched for an application on a worker node that runs tasks and keeps data in memory or on disk across them. Each application has its own executors.
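How many executors an application gets, and how large they are, is typically set at submission time. A sketch of such a submission (the numbers, class name, and jar are example values, not from the post):

```shell
# The cluster manager (YARN here) grants the requested resources, then
# Spark launches one executor JVM per granted container on worker nodes.
spark-submit \
  --master yarn \
  --num-executors 4 \      # four executor processes for this application
  --executor-cores 2 \     # each executor runs up to 2 tasks concurrently
  --executor-memory 4g \   # heap per executor, for tasks and cached data
  --class com.example.MyApp \
  my-app.jar
```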
Task
  • A unit of work that will be sent to one executor
Job
  • A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save, collect); you'll see this term used in the driver's logs.
Stage
  • Each job gets divided into smaller sets of tasks called stages that depend on each other (similar to the map and reduce stages in MapReduce); you'll see this term used in the driver's logs.
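A minimal example of how one action becomes one job with two stages (assumed to be typed into the Spark Shell, so `sc` already exists; the log file path is hypothetical):

```scala
val lines = sc.textFile("access.log")              // lazy: no job yet

val hits = lines
  .map(line => (line.split(" ")(0), 1))            // narrow: stays in stage 1
  .reduceByKey(_ + _)                              // shuffle => stage boundary

// The action below spawns a single job. The driver splits it into
// stage 1 (read + map) and stage 2 (aggregation after the shuffle),
// because reduceByKey must repartition data across executors --
// much like the map and reduce phases of MapReduce.
val top = hits.collect()
```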
