Driver
- Entry point for the Spark shell
- The place where the SparkContext is created
- Translates RDDs into an execution graph
- Splits the graph into stages
- Schedules tasks and controls their execution
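
To make the driver's role concrete, here is a minimal sketch of a driver program; the app name, master URL, and file paths are placeholders, not from the original glossary. (In spark-shell, the SparkContext is already created for you as sc.)

    import org.apache.spark.{SparkConf, SparkContext}

    object DriverSketch {
      def main(args: Array[String]): Unit = {
        // The driver process creates the SparkContext.
        val conf = new SparkConf().setAppName("driver-sketch").setMaster("local[2]")
        val sc   = new SparkContext(conf)

        // Transformations only build the RDD lineage; the driver turns it
        // into a graph of stages and tasks once an action runs.
        val counts = sc.textFile("input.txt")
          .flatMap(_.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.saveAsTextFile("output") // action: the driver schedules tasks
        sc.stop()
      }
    }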
Cluster manager
- An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN)
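
The cluster manager is selected through the application's master URL; a sketch with hypothetical host names and ports:

    import org.apache.spark.SparkConf

    // The master URL picks the cluster manager; hosts and ports are placeholders.
    val standalone = new SparkConf().setMaster("spark://master-host:7077")
    val mesos      = new SparkConf().setMaster("mesos://master-host:5050")
    val yarn       = new SparkConf().setMaster("yarn")      // resolved via the Hadoop/YARN config
    val local      = new SparkConf().setMaster("local[4]")  // no cluster manager: 4 threads in-process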
Worker node
- Any node that can run application code in the cluster
Executor
- A process launched for an application on a worker node that runs tasks and keeps data in memory or disk storage across them. Each application has its own executors.
Task
- A unit of work that will be sent to one executor
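
Executor count and sizing are configured per application. A sketch; the values are illustrative only, and spark.executor.instances applies when running on a manager such as YARN:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("executor-sizing-sketch")
      .set("spark.executor.instances", "4") // executors launched for this application
      .set("spark.executor.cores", "2")     // tasks one executor runs concurrently
      .set("spark.executor.memory", "4g")   // heap used for task data and cached partitions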
Job
- A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save, collect); you'll see this term used in the driver's logs.
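
For example (assuming a live SparkContext sc, as in spark-shell):

    // Transformations are lazy; only actions spawn jobs.
    val squares = sc.parallelize(1 to 100).map(x => x * x) // no job yet

    val total   = squares.reduce(_ + _) // action: spawns one job
    val asArray = squares.collect()     // action: spawns a second job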
Stage
- Each job gets divided into smaller sets of tasks called stages that depend on each other (similar to the map and reduce stages in MapReduce); you'll see this term used in the driver's logs.
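
A sketch of how a shuffle splits a job into stages (again assuming a live sc; the data is illustrative):

    // reduceByKey requires a shuffle, so the job below runs as two stages:
    // one for flatMap/map up to the shuffle, one for the reduce after it.
    val words = sc.parallelize(Seq("a b", "b c", "a c"))
      .flatMap(_.split(" "))
      .map(w => (w, 1))

    val counts = words.reduceByKey(_ + _)
    counts.collect() // the action's job is split at the shuffle boundary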