Build .NET for Apache Spark with VS Code in a browser

My last article explained how you can use .NET for Apache Spark together with Entity Framework to stream data to SQL Server. There is one caveat, though: you have to build Microsoft.Spark.Worker yourself.
This time I’ll show you how to build .NET for Apache Spark with VS Code in a browser, including building and running the C# examples.

Setting up your own development environment to build and test .NET for Apache Spark can be tricky and time-consuming. However, as a regular reader, you are probably aware that I like to use Docker …

Spark to PostgreSQL – Real-time data processing pipeline – Part 5

Previously, I demonstrated how streaming data can be read and transformed in Apache Spark. This time I use Spark to persist that data in PostgreSQL.

Quick recap – Spark and JDBC

As mentioned in the post on ActiveMQ, Spark and Bahir, Spark does not provide a JDBC sink for streaming data out of the box. Therefore, I have to use the foreach sink and extend org.apache.spark.sql.ForeachWriter. The writer receives each individual data row and writes it to PostgreSQL.
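
To make the idea more concrete, here is a minimal sketch of such a writer. It is not the implementation from the full post: the class name PostgresSink, the table sensor_data, its two columns and the column order are assumptions made purely for illustration, and the PostgreSQL JDBC driver is assumed to be on the classpath.

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical sketch: class name, table and columns are illustrative assumptions.
class PostgresSink(url: String, user: String, password: String)
    extends ForeachWriter[Row] {

  // Created in open() on the executor, because JDBC connections cannot be serialized.
  @transient private var connection: Connection = _
  @transient private var statement: PreparedStatement = _

  // Called once per partition before any rows are processed.
  override def open(partitionId: Long, version: Long): Boolean = {
    connection = DriverManager.getConnection(url, user, password)
    statement = connection.prepareStatement(PostgresSink.insertSql)
    true // returning true tells Spark to process the rows of this partition
  }

  // Called for every row; assumes column 0 is a timestamp and column 1 a double.
  override def process(row: Row): Unit = {
    statement.setTimestamp(1, row.getTimestamp(0))
    statement.setDouble(2, row.getDouble(1))
    statement.executeUpdate()
  }

  // Called when the partition is done or has failed; releases the JDBC resources.
  override def close(errorOrNull: Throwable): Unit = {
    if (statement != null) statement.close()
    if (connection != null) connection.close()
  }
}

object PostgresSink {
  // Assumed target table with two columns.
  val insertSql: String = "INSERT INTO sensor_data (ts, value) VALUES (?, ?)"
}
```

A streaming DataFrame could then be wired to it with something like df.writeStream.foreach(new PostgresSink(url, user, password)).start(). Writing row by row keeps the sketch short; batching the inserts would likely be the better choice for higher data rates.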

Preparing PostgreSQL

Even though I want to use PostgreSQL, I am actually


Real-time data processing pipeline – Part 4 – data transformation

So far, we know how to get our streaming data from ActiveMQ into Spark by using Bahir. On this basis, it is now time to implement the data transformation required to get to the desired output format.


As a quick reminder, here is the Scala code that I have used so far to retrieve the data from ActiveMQ and write it to a memory sink.


import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// create a named session
val spark = SparkSession

// read data from the OscStream topic
val mqttDf = 

ActiveMQ, Spark & Bahir – Real-time data processing pipeline – Part 3

Having explained in the first two parts how to visually simulate sensor data and how to get it into ActiveMQ, it is now time to explore an initial setup that allows Apache Spark to read a data stream from ActiveMQ using Bahir.

Things to be aware of

Before we start, there are a couple of things you should be aware of, in case you want to follow along and try this out yourself.

Spark & Bahir version matching

Apache Bahir Spark Extensions 2.3.3

As stated above, I would like to subscribe to an ActiveMQ topic with Apache Bahir to transfer
