Donation

If you found the contents in this blog useful, then please make a donation to keep this blog running. You can make donations via Skrill with email address shazin.sadakath@gmail.com

Tuesday, November 1, 2016

Real Time Streaming Data Processing with Arduino + Raspberry Pi + Apache Kafka + Apache Spark + MQTT

World is constantly going towards a fully automated environment where Smart Industries to Self Driving Cars coming into the scene everyday with the hype of Internet of Things (IoT). Almost all of these automation generate Gigabytes if not Terabytes of sensor data which needs to be processed to make sense of what is happening in the automation.

So for me the most important thing of this would be the ability to develop Scale-able backend systems which could process these data and make them into information which can be used to make strategic decisions whether it be in a Smart Industry to predict which part needs to be replaced and for Self Driving Cars when to hit the Garage for a Service.

With this in mind I researched on how to build these types of backend and found out that Apache Spark is a go to open source framework tailor made for this. It has the capability to process data in Batch, Real Time and to make sense of those data with its Machine Learning and Graphing capabilities.
This made me intrigued to learn more about it and this is my attempt on trying to implement an end to end system from pushing sensor data to real time processing of it.

For this I am using a hypothetical Smart Home with a Thermostat which sends data to a backend  server which then processes and understands what's going on. For Demo purpose I will be sending Temperature and Humidity readings from My home hall to the Backend Server via the following flow and calculate in real time the minimum, maximum and average readings of Temperature and Humidity in every 30 second window. 

System Architecture 


Arduino with Ethernet Shield and DHT22 Sensor



Demonstration

In the demonstration I show how the DHT22 Sensor Readings are coming into Apache Kafka and then finally being processed in Apache Spark. To differentiate the Temperature and Humidity I placed a Tub of Ice Cream near the DHT22 Sensor.




References
  1. https://github.com/iotresearcher/mqtt-kafka-bridge
  2. https://github.com/iotresearcher/spark-streaming
  3. https://www.infoq.com/articles/apache-spark-introduction
  4. https://kafka.apache.org/quickstart