What is Apache Storm?
- A real-time big data processing system
- Stream based
- Fault tolerant and distributed
- Non-persistent
- Written in Clojure and some Java
- Master / slave plus ZooKeeper
- Big data analysis
 
Apache Storm vs Hadoop
Hadoop
- Batch / file based
- Distributed and fault tolerant
- Master / slave plus ZooKeeper
- Persistent, uses HDFS
- Big data analysis
 
Hadoop and Storm are complementary technologies and can be used together in a single system: Storm processes real-time streams of data, while Hadoop processes batched data stored on HDFS.
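
The batch versus stream distinction can be illustrated with a small sketch in plain Python. It uses neither Hadoop nor Storm; the function names `batch_word_count` and `streaming_word_count` are hypothetical and chosen only for this example. The batch version needs the complete data set before it returns anything, while the streaming version emits an updated result after every incoming line.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(lines: List[str]) -> Counter:
    """Batch style: the whole data set is stored and available up front."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts  # one result, produced only after all input is read


def streaming_word_count(lines: Iterable[str]) -> Iterator[Counter]:
    """Stream style: update running counts as each line arrives."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
        yield Counter(counts)  # copy, so earlier snapshots stay unchanged


# Illustrative input only; a real stream would be unbounded.
stored_file = ["storm hadoop", "storm"]

final = batch_word_count(stored_file)
snapshots = list(streaming_word_count(iter(stored_file)))
```

Here `final` and the last entry of `snapshots` hold the same counts. The difference is that the streaming version already had a usable intermediate answer after the first line.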

Apache Storm terms
Tuple – an ordered list of elements
Stream – an unbounded sequence of tuples
Spout – a source of streams
Bolt – a function or filter that processes streams
Topology – an ETL-like processing graph built from spouts, streams, and bolts
Nimbus – the master node, which distributes work across the cluster
Supervisor – runs on each worker node and controls its worker processes
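
How these terms fit together can be sketched as a toy model in plain Python. This is not the Storm API: it runs in a single process, the input is finite, and every name (`sentence_spout`, `split_bolt`, `filter_bolt`, `count_bolt`, `build_topology`) is hypothetical. Generators stand in for streams, and Nimbus and the Supervisors are not modelled at all.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout: the source of a stream. Emits one-field tuples."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt used as a function: one sentence tuple in, many word tuples out."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def filter_bolt(stream: Iterable[Tuple[str]],
                stop_words: Set[str]) -> Iterator[Tuple[str]]:
    """Bolt used as a filter: drops tuples whose word is a stop word."""
    for (word,) in stream:
        if word not in stop_words:
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    """Bolt that keeps a running count and emits (word, count) tuples."""
    counts: Dict[str, int] = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])


def build_topology(sentences: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Topology: the wiring of a spout through a chain of bolts."""
    stream = sentence_spout(sentences)
    stream = split_bolt(stream)
    stream = filter_bolt(stream, stop_words={"the", "a"})
    return count_bolt(stream)


emitted = list(build_topology(["the storm", "a Storm bolt"]))
# emitted == [("storm", 1), ("storm", 2), ("bolt", 1)]
```

Because each stage is a generator, a tuple flows through the whole chain as soon as the spout emits it, which loosely mirrors how a topology processes tuples one at a time rather than in batches.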

Other abstractions on top of Storm
- Storm Trident
- Storm DRPC











