This presentation shows developers how to model data in Cassandra, how to integrate Cassandra with Hadoop, and how to build big data platforms that support both batch and real-time processing while keeping response times low enough for web applications.
This session walks through several product development efforts in which Cassandra proved to be the best choice. It focuses on the primary use case: a tracking solution that collects raw time-series data in Cassandra and uses Hadoop to aggregate it in near real time into various new datasets, ranging from advert-centric statistics to user-centric behavioural analysis. The talk covers the final technical design chosen after three years of development iterations and touches on the technologies involved (Scribe, Thrift, Kafka, Hadoop, Pig, Mahout), the hurdles faced along the way, the integration improvements made between Cassandra and Hadoop, and the throughput and performance of today’s systems.
Video producer: http://www.javazone.no