Believe it or not, we are getting to the end of this small series about a potential real-time data processing pipeline. In this final part I will show how Grafana can retrieve our pipeline data from PostgreSQL and visualize it as a graph. But before we dive into it, let’s have a quick recap of the previous topics.
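To give a feel for that final step, here is the kind of query a Grafana graph panel backed by a PostgreSQL data source could use. The table and column names (sensor_data, ts, value) are placeholders for illustration, while $__timeFilter is Grafana's built-in macro that expands to the dashboard's currently selected time range:

```sql
SELECT
  ts AS "time",  -- timestamp column, mapped to the panel's x-axis
  value          -- the metric to plot
FROM sensor_data
WHERE $__timeFilter(ts)
ORDER BY 1
```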
As mentioned in the post on ActiveMQ, Spark and Bahir, Spark does not provide a JDBC sink for Structured Streaming out of the box. Therefore, I will have to use the foreach sink and implement an extension of org.apache.spark.sql.ForeachWriter, which takes each individual data row and writes it to PostgreSQL.
Even though I want to use PostgreSQL, I am actually not tied to it: since the writer is built on plain JDBC, any JDBC-compatible database could serve as the sink.
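A minimal sketch of such a writer could look like the following. Note that the connection handling is deliberately simple, and the table and column names (sensor_data, ts, value) are placeholders I chose for illustration, not names taken from the earlier posts:

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical sink that writes each streaming row to PostgreSQL via plain JDBC
class JDBCSink(url: String, user: String, pass: String) extends ForeachWriter[Row] {

  var connection: Connection = _
  var statement: PreparedStatement = _

  // called once per partition and epoch; open the connection here
  override def open(partitionId: Long, epochId: Long): Boolean = {
    connection = DriverManager.getConnection(url, user, pass)
    statement = connection.prepareStatement(
      "INSERT INTO sensor_data (ts, value) VALUES (?, ?)")
    true
  }

  // called for every row of the micro-batch
  override def process(row: Row): Unit = {
    statement.setTimestamp(1, row.getTimestamp(0))
    statement.setDouble(2, row.getDouble(1))
    statement.executeUpdate()
  }

  // called when the partition is done (or failed); release resources
  override def close(errorOrNull: Throwable): Unit = {
    if (statement != null) statement.close()
    if (connection != null) connection.close()
  }
}
```

The writer would then be attached to the streaming query via the foreach sink, e.g. `mqttDf.writeStream.foreach(new JDBCSink(url, user, pass)).start()`.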
As a quick reminder, here is the Scala code that I have used so far to retrieve the data from ActiveMQ and write it to a memory sink.
```scala
import org.apache.spark.sql.SparkSession

// create a named session (the app name is a placeholder)
val spark = SparkSession.builder.appName("OscPipeline").getOrCreate()

// read data from the OscStream topic via the Bahir MQTT source
val mqttDf = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", "OscStream")
  .load("tcp://localhost:1883") // ActiveMQ MQTT transport; adjust to your setup
```
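For completeness, writing this DataFrame to the memory sink mentioned above might look like this; the query name `osc` is my own choice for the example:

```scala
// write the stream to an in-memory table named "osc" for ad-hoc inspection
val query = mqttDf.writeStream
  .queryName("osc")
  .outputMode("append")
  .format("memory")
  .start()

// the collected rows can then be inspected with plain SQL
spark.sql("SELECT * FROM osc").show()
```

The memory sink is meant for debugging on small data, which is exactly why the next step replaces it with a real PostgreSQL sink.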
Welcome back to the second part of my series, showcasing a real-time data processing pipeline! In part 1, I explored visual real-time sensor data simulation as the entry point into our pipeline. Now it’s time to find out how we can get the generated data into Apache ActiveMQ by transferring it via the OSC protocol.
Apache ActiveMQ™ is the most popular open source, multi-protocol, Java-based messaging server. It supports a variety of cross-language clients and protocols, which makes it an excellent choice for our pipeline.