.NET for Apache Spark 0.4.0 was released recently, so it is now time to test whether it can be used for MQTT streaming as well.
If you followed my series about a real-time data processing pipeline, you probably remember that I used Apache Bahir to retrieve streaming data from Apache ActiveMQ via the MQTT protocol. The data itself was generated by IanniX and forwarded to ActiveMQ using my osc2activemq Docker image.
Preparing .NET for Apache Spark for MQTT Streaming
For the most part, you can follow this quick intro tutorial, which walks …
.NET for Apache Spark 0.4.0 has been released.
If you want to test it out, you might find my Docker image useful. Details are available at https://hub.docker.com/r/3rdman/dotnet-spark
There’s a new image available.
Quick reference
The image is based on Ubuntu 18.04, Apache Spark 2.4.3 with Hadoop 2.7, .NET Core 2.1.801 and .NET for Apache Spark 0.4.0. It is intended for testing .NET for Apache Spark without having to install the required bits manually. By default, the container will start up one …
So far, we know how to get our streaming data from ActiveMQ into Spark by using Bahir. On this basis, it is now time to implement the data transformation required to get to the desired output format.
Extract
As a quick reminder, here is the Scala code that I have used so far to retrieve the data from ActiveMQ and write it to a memory sink.
// create a named session
val spark = SparkSession
// read data from the OscStream topic
val mqttDf = …
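The snippet above is cut off in this excerpt. For orientation, here is a minimal sketch of how such a read-and-sink step commonly looks with Bahir's Structured Streaming MQTT source. It is not the author's actual code: the application name `OscStreamSketch`, the broker URL `tcp://localhost:1883` (ActiveMQ's default MQTT port) and the memory table name `osc_data` are assumptions. Only the topic name `OscStream` comes from the post. It expects Spark and the Bahir `spark-sql-streaming-mqtt` package on the classpath and a reachable broker.

```scala
// Sketch only: read an MQTT topic via Apache Bahir and write it to a memory sink.
// Assumed (not from the post): app name, broker URL, memory table name.
import org.apache.spark.sql.SparkSession

// create (or reuse) a named session
val spark = SparkSession
  .builder()
  .appName("OscStreamSketch")   // hypothetical application name
  .getOrCreate()

// hypothetical broker address; 1883 is ActiveMQ's default MQTT port
val brokerUrl = "tcp://localhost:1883"

// read data from the OscStream topic using Bahir's MQTT source provider
val mqttDf = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", "OscStream")
  .load(brokerUrl)

// write the raw stream to an in-memory table that can be queried with SQL
val query = mqttDf.writeStream
  .format("memory")
  .queryName("osc_data")        // hypothetical table name
  .outputMode("append")
  .start()
```

Once a few messages have arrived, `spark.sql("SELECT * FROM osc_data").show()` displays the collected rows, and `query.stop()` ends the stream. The memory sink keeps everything on the driver, so it suits quick experiments rather than long-running jobs.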
Having explained how to visually simulate sensor data and how to get it into ActiveMQ during the first two parts, it is now time to explore an initial setup that allows Apache Spark to read a data stream from ActiveMQ using Bahir.
Things to be aware of
Before we start, there are a couple of things you should be aware of, in case you want to follow along and try this out yourself.
Spark & Bahir version matching
As stated above, I would like to subscribe to an ActiveMQ topic with Apache Bahir to transfer …
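To illustrate what version matching means in practice, here is a hedged `build.sbt` fragment. The version numbers are assumptions: Spark 2.4.3 is taken from the Docker image described above, and Bahir 2.4.0 stands in for "the Bahir release built for your Spark line". Check the Bahir release notes for the pairing that fits your setup. The `%%` operator appends the Scala binary version (here 2.11), which must also match the one your Spark distribution was built with.

```scala
// build.sbt sketch: keep Spark, Bahir and Scala versions aligned.
// Version numbers below are assumptions, not taken from the original post.
scalaVersion := "2.11.12"

val sparkVersion = "2.4.3"
val bahirVersion = "2.4.0" // assumption: Bahir release matching Spark 2.4.x

libraryDependencies ++= Seq(
  // provided: the Spark runtime already ships these classes
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  // Bahir's Structured Streaming MQTT source
  "org.apache.bahir" %% "spark-sql-streaming-mqtt" % bahirVersion
)
```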
With the multitude of existing projects and solutions related to real-time data processing out there, it can be very easy to get lost in all the available options.
That is why I have started this blog series. I want to showcase an example pipeline that covers real-time data processing from the beginning (data generation) to the end (data presentation).
Below is an overview of the pipeline that I am going to use.
[Figure: pipeline overview, with data generation on the left and data presentation on the right]
Here are the links to the related articles.