Apache Storm main concepts
Topology
Stream
Spout
Bolt
Stream groupings
Reliability
Tasks
Workers
Topology :
A topology is a graph of spouts and bolts that are connected with stream groupings. A Storm topology is analogous to a MapReduce job. One key difference is that a MapReduce job eventually finishes, whereas a topology runs forever (or until you kill it, of course).
Stream :
A stream is an unbounded sequence of tuples that is processed and created in parallel in a distributed fashion. The stream is the core abstraction in Storm. Streams are defined with a schema that names the fields in the stream's tuples.
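To make the idea of a schema that names tuple fields concrete, here is a minimal sketch in plain Java. The classes `Schema` and `NamedTuple` are hypothetical stand-ins invented for this illustration; they are not Storm's own classes, and the field names `word` and `count` are assumptions.

```java
import java.util.Arrays;
import java.util.List;

class StreamSchemaSketch {
    public static void main(String[] args) {
        // Declare the field names once, then build tuples against them.
        Schema schema = new Schema("word", "count");
        NamedTuple tuple = new NamedTuple(schema, "storm", 3);
        System.out.println(tuple.get("word") + " -> " + tuple.get("count")); // prints "storm -> 3"
    }
}

// Hypothetical stand-in for a stream schema: an ordered list of field names.
final class Schema {
    private final List<String> fields;

    Schema(String... names) {
        this.fields = Arrays.asList(names);
    }

    int indexOf(String name) {
        int index = fields.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return index;
    }

    int size() {
        return fields.size();
    }
}

// Hypothetical stand-in for a tuple: values positioned according to a schema.
final class NamedTuple {
    private final Schema schema;
    private final List<Object> values;

    NamedTuple(Schema schema, Object... values) {
        if (values.length != schema.size()) {
            throw new IllegalArgumentException("value count does not match schema");
        }
        this.schema = schema;
        this.values = Arrays.asList(values);
    }

    // Look a value up by field name rather than by position.
    Object get(String field) {
        return values.get(schema.indexOf(field));
    }
}
```

The point of the sketch is only that consumers of a stream can address values by name, because every tuple in the stream follows the same declared field list.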
Spout :
A spout is a source of streams in a topology. Generally spouts will read tuples from an external source (e.g. Kafka) and emit them into the topology.
The main method on spouts is nextTuple. nextTuple either emits a new tuple into the topology or simply returns if there are no new tuples to emit.
IRichSpout: this is the interface that spouts must implement.
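The nextTuple contract described above (emit one tuple if one is available, otherwise just return) can be sketched in plain Java. `ToySpout` and `QueueSpout` are hypothetical names for this illustration and deliberately much simpler than Storm's IRichSpout; the in-memory queue stands in for an external source such as Kafka.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class SpoutSketch {
    public static void main(String[] args) {
        Queue<String> source = new ArrayDeque<>(Arrays.asList("first", "second"));
        List<String> emitted = new ArrayList<>();
        QueueSpout spout = new QueueSpout(source);

        // The framework would call nextTuple in a loop; the third call finds nothing.
        for (int i = 0; i < 3; i++) {
            spout.nextTuple(emitted);
        }
        System.out.println(emitted); // prints "[first, second]"
    }
}

// Hypothetical, simplified interface; NOT Storm's IRichSpout.
interface ToySpout {
    void nextTuple(List<String> emitted);
}

// Reads from an in-memory queue that stands in for an external source.
final class QueueSpout implements ToySpout {
    private final Queue<String> source;

    QueueSpout(Queue<String> source) {
        this.source = source;
    }

    @Override
    public void nextTuple(List<String> emitted) {
        String next = source.poll(); // null when the source is empty
        if (next == null) {
            return; // nothing new to emit, so simply return
        }
        emitted.add(next); // "emit" the tuple into the topology
    }
}
```

Note that the method returns without blocking when the source is empty, which mirrors the "simply returns" behavior described above.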
Bolt :
All processing in topologies is done in bolts. Bolts can do anything from filtering, functions, aggregations, and joins to talking to databases, and more. A single bolt can do a simple stream transformation; complex stream transformations often require multiple steps and thus multiple bolts.
The main method in a bolt is execute, which takes a new tuple as input. Bolts emit new tuples using the OutputCollector object. It's perfectly fine to launch new threads in bolts that do processing asynchronously. The referenced documentation describes OutputCollector as thread-safe and callable at any time; later Storm documentation instead states that it is not thread-safe and that all emits, acks, and fails must happen on the same thread, so check the documentation for the version you run.
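As a rough illustration of how execute and a collector fit together, and of why a multi-step transformation is split across several bolts, here is a plain-Java sketch. `ToyBolt`, `ToyCollector`, `NonBlankFilterBolt`, and `UpperCaseBolt` are hypothetical names for this example only; they are not Storm's bolt or OutputCollector types, and tuples are reduced to plain strings. Everything runs on a single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BoltChainSketch {
    public static void main(String[] args) {
        List<String> results = new ArrayList<>();
        for (String word : new String[] {"storm", "  ", "bolt"}) {
            process(word, results);
        }
        System.out.println(results); // prints "[STORM, BOLT]"
    }

    // Wires filter -> uppercase -> results for one input value.
    static void process(String input, List<String> results) {
        ToyBolt filter = new NonBlankFilterBolt();
        ToyBolt upper = new UpperCaseBolt();
        // Whatever the filter emits becomes the input of the next bolt.
        filter.execute(input, kept -> upper.execute(kept, results::add));
    }
}

// Hypothetical stand-in for an output collector: receives emitted values.
interface ToyCollector {
    void emit(String value);
}

// Hypothetical, simplified bolt interface; NOT Storm's own bolt interface.
interface ToyBolt {
    void execute(String input, ToyCollector collector);
}

// Step 1: a filtering bolt that drops blank input and passes the rest through.
final class NonBlankFilterBolt implements ToyBolt {
    @Override
    public void execute(String input, ToyCollector collector) {
        if (!input.trim().isEmpty()) {
            collector.emit(input);
        }
    }
}

// Step 2: a function bolt that emits the upper-cased input.
final class UpperCaseBolt implements ToyBolt {
    @Override
    public void execute(String input, ToyCollector collector) {
        collector.emit(input.toUpperCase(Locale.ROOT));
    }
}
```

Each bolt does one small job and hands its output to the collector, so the filter and the transformation can be reasoned about, and in a real topology scaled, separately.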
Ref : http://storm.apache.org/releases/2.0.0-SNAPSHOT/Concepts.html