Tuesday, November 1, 2016

Real Time Streaming Data Processing with Arduino + Raspberry Pi + Apache Kafka + Apache Spark + MQTT

The world is steadily moving towards a fully automated environment, where everything from Smart Industries to Self-Driving Cars is coming onto the scene every day under the banner of the Internet of Things (IoT). Almost all of this automation generates gigabytes, if not terabytes, of sensor data which needs to be processed to make sense of what is actually happening.

For me the most important part of this is the ability to develop scalable backend systems which can process this data and turn it into information that can drive strategic decisions, whether that is a Smart Industry predicting which part needs to be replaced, or a Self-Driving Car knowing when to head to the garage for a service.

With this in mind I researched how to build these kinds of backends and found that Apache Spark is a go-to open source framework tailor-made for the job. It can process data both in batch and in real time, and can make sense of that data with its machine learning and graph processing capabilities.
This intrigued me to learn more about it, and this post is my attempt at implementing an end-to-end system, from pushing sensor data all the way to processing it in real time.

For this I am using a hypothetical Smart Home with a Thermostat which sends data to a backend server that processes it and figures out what is going on. For demo purposes I will be sending Temperature and Humidity readings from my home's hall to the backend server via the following flow, and calculating in real time the minimum, maximum and average Temperature and Humidity readings in every 30-second window.

System Architecture

The readings travel from the Arduino over MQTT to a broker running on the Raspberry Pi, where a bridge application forwards them onto an Apache Kafka topic; an Apache Spark Streaming job then consumes that topic and computes the windowed statistics.

Arduino with Ethernet Shield and DHT22 Sensor

/*
  MQTT Arduino Publisher Sketch
  Reads Temperature and Humidity from a DHT22 sensor and publishes
  the readings as JSON to an MQTT topic.
*/
#include <SPI.h>
#include <Ethernet.h>
#include <PubSubClient.h>
#include <dht.h>

#define DHT22_PIN 8
#define TOPIC "LK/Shazin/Home/Hall/ThermoStat"

// Update these with values suitable for your network.
byte mac[]    = { 0xDE, 0xED, 0xBA, 0xFE, 0xFE, 0xED };
byte server[] = { 192, 168, 1, 4 };   // MQTT broker (Raspberry Pi)
byte ip[]     = { 192, 168, 1, 10 };  // Static IP for the Arduino

dht DHT;

void callback(char* topic, byte* payload, unsigned int length) {
  // Nothing is subscribed to; this sketch only publishes
}

EthernetClient ethClient;
PubSubClient client(server, 1883, callback, ethClient);

void setup()
{
  Serial.begin(9600);
  Ethernet.begin(mac, ip);
  Serial.println("Ethernet Registered");
  if (client.connect("arduinoClient")) {
    // Announce the topic with an empty initial message
    client.publish(TOPIC, "");
    Serial.println("Topic Registered");
  }
}

void loop()
{
  int chk = DHT.read22(DHT22_PIN);
  switch (chk)
  {
    case DHTLIB_OK:
      Serial.print("OK,\t");
      break;
    case DHTLIB_ERROR_CHECKSUM:
      Serial.print("Checksum error,\t");
      break;
    case DHTLIB_ERROR_TIMEOUT:
      Serial.print("Time out error,\t");
      break;
    case DHTLIB_ERROR_CONNECT:
      Serial.print("Connect error,\t");
      break;
    case DHTLIB_ERROR_ACK_L:
      Serial.print("Ack Low error,\t");
      break;
    case DHTLIB_ERROR_ACK_H:
      Serial.print("Ack High error,\t");
      break;
    default:
      Serial.print("Unknown error,\t");
      break;
  }
  Serial.print(DHT.humidity, 1);
  Serial.print(",\t");
  Serial.println(DHT.temperature, 1);

  int temp = (int) DHT.temperature;
  int humi = (int) DHT.humidity;

  // Build a JSON payload, e.g. {"temperature":29,"humidity":65}
  String str = "{\"temperature\":" + String(temp) + ",\"humidity\":" + String(humi) + "}";
  char c[str.length() + 1];
  // Convert the String to a char array for publishing
  str.toCharArray(c, str.length() + 1);

  // Publish to the MQTT topic
  client.publish(TOPIC, c);
  Serial.print("sent ");
  Serial.println(c);

  client.loop();
  delay(100);
}
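
The sketch above only gets the readings as far as the MQTT broker; a bridge is needed to forward them into Apache Kafka (my bridge implementation is in reference 1 below). As a rough sketch of the idea, assuming the paho-mqtt and kafka-python libraries, a Mosquitto broker on localhost and a Kafka topic named thermostat (the addresses and topic name here are placeholders, not my actual configuration), a minimal Python bridge could look like this:

# Minimal MQTT -> Kafka bridge sketch
import paho.mqtt.client as mqtt
from kafka import KafkaProducer

MQTT_BROKER = "localhost"                       # Mosquitto on the Raspberry Pi
MQTT_TOPIC = "LK/Shazin/Home/Hall/ThermoStat"   # Topic the Arduino publishes to
KAFKA_BROKER = "localhost:9092"                 # Placeholder Kafka broker address
KAFKA_TOPIC = "thermostat"                      # Placeholder Kafka topic name

producer = KafkaProducer(bootstrap_servers=KAFKA_BROKER)

def on_connect(client, userdata, flags, rc):
    # Subscribe once the MQTT connection is established
    client.subscribe(MQTT_TOPIC)

def on_message(client, userdata, msg):
    # Forward the raw JSON payload straight into Kafka
    producer.send(KAFKA_TOPIC, msg.payload)

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(MQTT_BROKER, 1883, 60)
client.loop_forever()

Because the bridge forwards the payload untouched, the JSON produced by the Arduino arrives in Kafka exactly as it was published, ready for Spark to parse.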


Demonstration

In the demonstration I show how the DHT22 sensor readings flow into Apache Kafka and are finally processed in Apache Spark. To make the Temperature and Humidity readings visibly change, I placed a tub of ice cream near the DHT22 sensor.
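
For completeness, here is a rough sketch of what the Spark side could look like (my actual streaming job is in reference 2 below). This sketch assumes PySpark with the Spark Streaming Kafka integration, the same placeholder topic and broker address as the bridge above, and 30-second tumbling windows:

# Spark Streaming sketch: min/max/average Temperature and Humidity
# over 30-second windows (topic name and broker address are placeholders)
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="ThermostatStats")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc, ["thermostat"], {"metadata.broker.list": "localhost:9092"})

# Each Kafka record is a (key, value) pair; the value is the JSON payload
readings = stream.map(lambda kv: json.loads(kv[1]))

def print_stats(label):
    def stats(rdd):
        values = rdd.collect()
        if values:
            print("%s: min=%d max=%d avg=%.1f" % (
                label, min(values), max(values),
                sum(values) / float(len(values))))
    return stats

# Tumbling 30-second windows over each measurement
readings.map(lambda r: r["temperature"]).window(30, 30).foreachRDD(print_stats("Temperature"))
readings.map(lambda r: r["humidity"]).window(30, 30).foreachRDD(print_stats("Humidity"))

ssc.start()
ssc.awaitTermination()

Submitted with spark-submit together with the Spark Streaming Kafka integration package, this prints one line of statistics per measurement at the end of every 30-second window.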

References
  1. https://github.com/iotresearcher/mqtt-kafka-bridge
  2. https://github.com/iotresearcher/spark-streaming
  3. https://www.infoq.com/articles/apache-spark-introduction
  4. https://kafka.apache.org/quickstart