Apache Storm – Part 1

What is Apache Storm?

  • A real-time big data processing system
  • Stream-based
  • Fault-tolerant and distributed
  • Non-persistent
  • Written in Clojure and some Java
  • Master / slave architecture plus ZooKeeper
  • Big Data Analysis


Apache Storm vs Hadoop
Hadoop

  • Batch / file-based
  • Distributed and fault-tolerant
  • Master / slave architecture plus ZooKeeper
  • Persistent, uses HDFS
  • Big Data Analysis

Hadoop and Storm are complementary technologies that can be used together in a single system: Storm processes real-time streams of data, while Hadoop processes batches of data stored on HDFS.
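To make the batch versus stream distinction concrete, here is a minimal Python sketch that counts words in two ways. It is not Storm or Hadoop code, and both function names are hypothetical. The batch version needs the whole finite input before it can produce one final result (the Hadoop model); the streaming version handles each record as it arrives and emits an updated result immediately, so it also works on an unbounded input (the Storm model).

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(lines: list[str]) -> Counter:
    """Hypothetical batch job: the complete input exists up front,
    and a single result is returned once everything has been read."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def streaming_word_count(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Hypothetical streaming job: the input may never end, so every
    word updates the running state and an up-to-date count is emitted
    immediately instead of waiting for the input to finish."""
    counts: Counter = Counter()
    for line in lines:
        for word in line.split():
            counts[word] += 1
            yield word, counts[word]
```

For the input `["storm hadoop", "storm"]`, `batch_word_count` returns one result, `{'storm': 2, 'hadoop': 1}`, while `streaming_word_count` yields `('storm', 1)`, `('hadoop', 1)`, `('storm', 2)` as each word arrives.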

Apache Storm terms
Tuple – an ordered list of elements
Stream – an unbounded feed of tuples
Spout – a source of streams
Bolt – functions / filters that process streams
Topology – an ETL-like graph built from spouts, streams, and bolts
Nimbus – master node
Supervisor – controls worker processes

[Figure: Storm topology]
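The terms above fit together as a pipeline: a spout emits tuples, each bolt consumes tuples and emits new ones, and the topology wires them into a graph. The following is a minimal, single-process Python sketch of that idea for a word count. The class names `SentenceSpout`, `SplitBolt`, `CountBolt` and `LinearTopology` are hypothetical and are not part of Storm's API, and a real topology runs distributed across worker processes rather than in one loop.

```python
from collections import Counter
from typing import Iterable, Iterator


class SentenceSpout:
    """Hypothetical spout: the source of the stream. It replays a fixed
    list here; a real spout would read from a queue or socket, so its
    stream would be unbounded."""

    def __init__(self, sentences: Iterable[str]):
        self.sentences = sentences

    def stream(self) -> Iterator[tuple]:
        # Each tuple is an ordered list of values; here just (sentence,).
        for sentence in self.sentences:
            yield (sentence,)


class SplitBolt:
    """Hypothetical bolt: turns one sentence tuple into one tuple per word."""

    def process(self, tup: tuple) -> Iterator[tuple]:
        for word in tup[0].split():
            yield (word,)


class CountBolt:
    """Hypothetical stateful bolt: keeps a running count per word."""

    def __init__(self):
        self.counts: Counter = Counter()

    def process(self, tup: tuple) -> Iterator[tuple]:
        word = tup[0]
        self.counts[word] += 1
        yield (word, self.counts[word])


class LinearTopology:
    """Hypothetical topology: one spout followed by a chain of bolts.
    Every tuple is pushed through the whole chain as soon as it is emitted."""

    def __init__(self, spout, bolts):
        self.spout = spout
        self.bolts = bolts

    def run(self) -> Iterator[tuple]:
        for tup in self.spout.stream():
            yield from self._push(tup, 0)

    def _push(self, tup: tuple, index: int) -> Iterator[tuple]:
        # Past the last bolt: the tuple leaves the topology.
        if index == len(self.bolts):
            yield tup
            return
        # A bolt may emit zero, one or many tuples for each input tuple.
        for out in self.bolts[index].process(tup):
            yield from self._push(out, index + 1)


topology = LinearTopology(SentenceSpout(["the cow", "the moon"]), [SplitBolt(), CountBolt()])
print(list(topology.run()))  # [('the', 1), ('cow', 1), ('the', 2), ('moon', 1)]
```

In Storm's own Java API the same wiring is declared with `TopologyBuilder` (`setSpout`, `setBolt`) plus a stream grouping; a fields grouping on the word field would route every occurrence of a word to the same counting task, which this single-process sketch gets for free.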

Other abstractions on top of Storm
Storm Trident, Storm DRPC
[Figure: Storm / Spark architecture]

Physical View
[Figure: Storm physical view]
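Physically, Nimbus decides which worker processes on which supervisor nodes run which parts of a topology, with ZooKeeper used for coordination. As a rough sketch of only that assignment step, the Python below spreads a topology's executors round-robin over every free worker slot. This is a simplification under stated assumptions, not Storm's actual scheduler: the function name `assign_executors`, the executor labels and the supervisor names are all hypothetical, and slots are numbered 0, 1, ... per supervisor.

```python
from itertools import cycle


def assign_executors(
    executors: list[str], supervisors: dict[str, int]
) -> dict[tuple[str, int], list[str]]:
    """Hypothetical even scheduler.

    supervisors maps a supervisor name to its number of free worker slots.
    Returns a mapping (supervisor, slot) -> executors placed in that slot.
    """
    # Flatten all slots of all supervisors into one list.
    slots = [(host, slot) for host, count in supervisors.items() for slot in range(count)]
    if not slots:
        raise ValueError("no free worker slots")
    assignment: dict[tuple[str, int], list[str]] = {slot: [] for slot in slots}
    # Hand executors to slots in turn, wrapping around when all slots are used.
    for executor, slot in zip(executors, cycle(slots)):
        assignment[slot].append(executor)
    return assignment
```

For example, `assign_executors(["spout-0", "split-0", "split-1", "count-0"], {"sup-a": 2, "sup-b": 1})` places `spout-0` and `count-0` in slot 0 of `sup-a`, `split-0` in slot 1 of `sup-a`, and `split-1` in slot 0 of `sup-b`. Round-robin keeps the load even, but it ignores data locality and node capacity, which a production scheduler has to weigh.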
