Real-Time Processing of Big Data

New technologies such as social media, email, blogs, GIS, RFID and smart devices have opened up new opportunities for building information-based services that aggregate and operate on huge quantities of data.

Common approaches and frameworks for handling Big Data (like Apache Hadoop) operate by running data processing jobs in batch. This batch-based approach does not work well when you need to process data and show results in real time. Storm is an open-source framework from Twitter that targets this gap, allowing for real-time processing of big data streams. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
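To illustrate the difference between batch processing and processing an unbounded stream, here is a minimal single-process sketch in plain Python. It does not use Storm's API: `sentence_spout`, `split_bolt`, `count_bolt` and `batch_word_count` are hypothetical names that loosely mimic Storm's spouts (stream sources) and bolts (processing steps). The streaming pipeline emits an updated word count as soon as each word arrives, while the batch function produces nothing until all input has been read.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Iterable, Iterator, Tuple


def sentence_spout(source: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for a spout: emits one tuple (a sentence) at a time.
    The source may be unbounded, e.g. an endless iterator."""
    for sentence in source:
        yield sentence


def split_bolt(stream: Iterator[str]) -> Iterator[str]:
    """Hypothetical stand-in for a bolt that splits sentences into words."""
    for sentence in stream:
        yield from sentence.lower().split()


def count_bolt(stream: Iterator[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical stand-in for a stateful bolt: keeps a running count and
    emits the updated total for each word as soon as the word arrives."""
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def batch_word_count(source: Iterable[str]) -> Counter:
    """Batch-style equivalent: no result exists until all input has been read,
    so it would never return on an unbounded source."""
    return Counter(word for sentence in source for word in sentence.lower().split())


if __name__ == "__main__":
    sentences = ["storm processes streams", "hadoop processes batches", "storm is fast"]

    # Streaming: a running count is printed after every word.
    for word, count in count_bolt(split_bolt(sentence_spout(sentences))):
        print(word, count)

    # Batch: a single result, available only at the end.
    print(batch_word_count(sentences))

    # Unbounded input: take the first 10 updates from an endless stream.
    endless = count_bolt(split_bolt(sentence_spout(cycle(sentences))))
    print(list(islice(endless, 10)))
```

Because the generators are lazy, the streaming pipeline keeps producing running counts even when the input never ends, which the batch version cannot do. A real Storm topology also distributes these steps across a cluster and handles failures; this sketch makes no attempt at either.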

Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable and fault-tolerant, guarantees that your data will be processed, and is easy to set up and operate.

This presentation gives an overview of Storm's scalable architecture, shows examples of real-life applications, and demonstrates how simple it is to get started.
