Real-time data processing pipeline – Part 4 – Data transformation

So far, we know how to get our streaming data from ActiveMQ into Spark using Bahir. Building on that, it is now time to implement the data transformation required to produce the desired output format.


As a quick reminder, here is the Scala code that I have used so far to retrieve the data from ActiveMQ and write it to a memory sink.


import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// create a named session (the application name here is illustrative)
val spark = SparkSession.builder().appName("RealTimePipeline").getOrCreate()

// read data from the OscStream topic via Bahir's MQTT source
// (broker URL is an assumption; adjust it to your ActiveMQ MQTT endpoint)
val mqttDf = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", "OscStream")
  .load("tcp://localhost:1883")

// write the stream to an in-memory table (table name illustrative)
val query = mqttDf.writeStream
  .format("memory")
  .queryName("osc_data")
  .start()
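Before wiring the transformation into the streaming query, it can help to prototype the parsing logic in plain Scala. The payload layout assumed below (`"<osc-address> <numeric-value>"`, e.g. `/sensor/1 0.42`) and the `Reading`/`parsePayload` names are purely illustrative, not necessarily the format this series produces; the same function could later be registered as a Spark UDF:

```scala
import scala.util.Try

// Hypothetical payload layout: "<osc-address> <numeric-value>".
// This format is an assumption for illustration only.
case class Reading(address: String, value: Double)

def parsePayload(payload: String): Option[Reading] =
  payload.trim.split("\\s+") match {
    case Array(addr, num) =>
      // Try guards against malformed numeric parts
      Try(num.toDouble).toOption.map(Reading(addr, _))
    case _ => None // drop anything that does not match the expected shape
  }
```

Keeping the parsing logic as an ordinary function makes it unit-testable without a running Spark session.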

ActiveMQ, Spark & Bahir – Real-time data processing pipeline – Part 3

Having explained how to visually simulate sensor data and how to get it into ActiveMQ during the first two parts, it is now time to explore an initial setup that allows Apache Spark to read a data stream from ActiveMQ using Bahir.

Things to be aware of

Before we start, there are a couple of things you should be aware of, in case you want to follow along and try this out yourself.

Spark & Bahir version matching

Apache Bahir Spark Extensions 2.3.3

As stated above, I would like to subscribe to an ActiveMQ topic with Apache Bahir to transfer the streaming data into Spark.
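Because Bahir releases are built against specific Spark versions, the dependency has to match the Spark installation. Assuming sbt is used, the structured-streaming MQTT connector for this release could be declared roughly like this (verify the artifact coordinates against your Spark version in the Bahir documentation):

```scala
libraryDependencies += "org.apache.bahir" %% "spark-sql-streaming-mqtt" % "2.3.3"
```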


Real-time data processing pipeline – Part 2 – OSC to ActiveMQ

Welcome back to the second part of my series, showcasing a real-time data processing pipeline!
In part 1, I explored visual real-time sensor data simulation, as the entry point into our pipeline.
Now it’s time to find out how we can get the generated data into Apache ActiveMQ by transferring it via the OSC protocol.
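For readers curious what actually travels over the wire: an OSC 1.0 message is a 4-byte-aligned binary blob — a null-terminated, padded address string, a type tag string, then big-endian arguments. The sketch below encodes a single-float message by hand; it is only meant to illustrate the format, not the tooling used in this series:

```scala
import java.nio.{ByteBuffer, ByteOrder}

// OSC-string: UTF-8 bytes, null-terminated, padded to a multiple of 4 bytes
def oscString(s: String): Array[Byte] = {
  val raw = s.getBytes("UTF-8") :+ 0.toByte
  raw.padTo((raw.length + 3) / 4 * 4, 0.toByte)
}

// Encode an OSC message carrying a single float32 argument
def oscMessage(address: String, value: Float): Array[Byte] = {
  val addr = oscString(address)
  val tags = oscString(",f") // type tag string: one float argument
  ByteBuffer
    .allocate(addr.length + tags.length + 4)
    .order(ByteOrder.BIG_ENDIAN) // OSC numeric data is big-endian
    .put(addr)
    .put(tags)
    .putFloat(value)
    .array()
}
```

The padding rule is why OSC payloads always have a length divisible by four.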

Apache ActiveMQ™ is the most popular open-source, multi-protocol, Java-based messaging server. It supports a variety of cross-language clients and protocols, which makes it an excellent choice for our pipeline.

Get ActiveMQ up and running

I won’t …

Real-time data processing pipeline – Part 1 – Visual time series data generation

This is the first part of my series to showcase a potential pipeline for real-time data processing. An overview of the different components that I am going to use can be found here.
So let’s get started and find out how real-time sensor data can be simulated, as each pipeline needs to start somewhere.


There may be times when you need to generate continuous numeric data to test your real-time stream processing pipeline. One common approach is to generate this data in code; however, that can come with some drawbacks.

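To make the code-based approach concrete: a few lines of Scala are enough to produce a continuous, sensor-like signal. The function below (names and parameters are purely illustrative) samples a sine wave with optional Gaussian noise:

```scala
import scala.util.Random

// n samples of a sine wave with optional additive Gaussian noise,
// emulating a continuously varying sensor reading
def sensorSamples(n: Int, freqHz: Double, sampleRateHz: Double,
                  noise: Double = 0.0, seed: Long = 42L): Seq[Double] = {
  val rng = new Random(seed)
  (0 until n).map { i =>
    val t = i / sampleRateHz // time of the i-th sample in seconds
    math.sin(2 * math.Pi * freqHz * t) + rng.nextGaussian() * noise
  }
}
```

A fixed seed keeps the generated series reproducible across runs.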