Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.
Architecture of Apache Storm
Storm Components
Topology :
In simple terms, a topology is a network of spouts and bolts, as shown in the figure above. It is analogous to a MapReduce job in Hadoop, except that a MapReduce job eventually finishes, whereas a topology keeps running until it is killed. A topology is a graph of computation in which spouts act as the sources of data streams and bolts perform the actual processing.
Spout :
A spout is the entry point of a Storm topology: it is the source of streams in the topology. A spout connects to the actual data source, such as a message queue like Kafka, reads data from it continuously, converts that data into a stream of tuples, and emits those tuples to bolts for processing.
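To make the spout's job concrete, here is a minimal sketch in plain Python. It is not the Storm API: the class `QueueSpout`, the method `next_tuples`, and the `"word:count"` message format are all hypothetical names chosen for illustration, and an in-memory `deque` stands in for an external source such as Kafka.

```python
from collections import deque
from typing import Iterator, Tuple


class QueueSpout:
    """Toy spout (hypothetical, not Storm's API): drains a queue and emits tuples."""

    def __init__(self, queue: deque) -> None:
        # The deque stands in for an external data source such as a Kafka topic.
        self.queue = queue

    def next_tuples(self) -> Iterator[Tuple[str, int]]:
        # Convert each raw "word:count" message into a (word, count) tuple.
        while self.queue:
            raw = self.queue.popleft()
            word, count = raw.split(":")
            yield (word, int(count))


source = deque(["storm:1", "kafka:2", "storm:3"])
emitted = list(QueueSpout(source).next_tuples())
print(emitted)
```

A real spout would keep polling an unbounded source; this sketch stops only because the sample queue is finite.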
Bolt :
A bolt contains the actual processing logic. It operates only on streams: it receives streams from one or more spouts or from other bolts, and it can emit new streams for further processing by downstream bolts or export/save data to persistent storage. Bolts can do anything: run functions, filter tuples, perform streaming aggregations and joins, talk to databases, and more.
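The way spouts and bolts chain together into a topology can be sketched in plain Python with generators. This is only an illustration of the data flow, not Storm code: the functions `sentence_spout`, `split_bolt`, `filter_bolt`, and `count_bolt`, the sample sentences, and the `min_len` parameter are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def sentence_spout() -> Iterator[str]:
    """Toy spout: emits sentences from a fixed list (stand-in for a live source)."""
    yield from [
        "storm is a stream processor",
        "a bolt is a task",
        "storm runs a topology",
    ]


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt that emits a new stream: one tuple per word of each sentence."""
    for sentence in sentences:
        yield from sentence.split()


def filter_bolt(words: Iterable[str], min_len: int = 5) -> Iterator[str]:
    """Bolt that filters tuples, passing on only words with at least min_len letters."""
    return (word for word in words if len(word) >= min_len)


def count_bolt(words: Iterable[str]) -> Dict[str, int]:
    """Terminal bolt: aggregates word counts (a real bolt might save them to a database)."""
    return dict(Counter(words))


# Wire the "topology": spout -> split -> filter -> count.
counts = count_bolt(filter_bolt(split_bolt(sentence_spout())))
print(counts)
```

Each function consumes one stream and produces another, which mirrors how bolts are linked in the topology graph. In Storm itself the streams are unbounded and the spouts and bolts run in parallel across a cluster, which this single-process sketch does not attempt to model.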