.NET for Apache Spark – Stream to SQL Server

In this article I am going to describe how to use .NET for Apache Spark with Entity Framework Core to stream data to a Microsoft SQL Server. If you have tried this before, you have probably stumbled upon the following exception: Microsoft.Data.SqlClient is not supported on this platform.

So let’s find out how that can be fixed.

Preparation

If you want to stream to an SQL Server, you obviously need to have access to an SQL Server instance first.

With Docker, it is very easy to fire up a suitable container. I’ve just named it sqlserver, as … more
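To illustrate, such a container could be started roughly like this. This is a sketch assuming the official mcr.microsoft.com/mssql/server image; the tag, the SA password and the port mapping are my own placeholders, not taken from the post:

docker run -d --name sqlserver -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=yourStrong(!)Password' -p 1433:1433 mcr.microsoft.com/mssql/server:2019-latest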

.NET for Apache Spark ForeachWriter & PostgreSQL

.NET for Apache Spark IForeachWriter implementation

Introduction

A couple of months ago I described how to transfer data from Apache Spark to PostgreSQL by creating a Spark ForeachWriter in Scala.

This time I will show how this can be done in C# by creating a ForeachWriter for .NET for Apache Spark.

To create a custom ForeachWriter, one needs to provide an implementation of the IForeachWriter interface, which is supported from version 0.9.0 onward. In this article, however, I am going to use version 0.10.0.

Documentation of the C# interface is provided within the related source code:

https://github.com/dotnet/spark/blob/master/src/csharp/Microsoft.Spark/Sql/ForeachWriter.cs
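To give an idea of the shape such an implementation takes, here is a minimal, illustrative IForeachWriter that inserts rows into PostgreSQL via Npgsql. Only the interface members Open, Process and Close come from Microsoft.Spark; the connection string, the measurements table and its ts/value columns are placeholders I made up for this sketch:

using System;
using Microsoft.Spark.Sql;
using Npgsql;

// Minimal sketch: writes each streamed row into a PostgreSQL table.
// Marked [Serializable] because Spark ships the writer to the executors.
[Serializable]
public class PostgreSqlForeachWriter : IForeachWriter
{
    // Placeholder connection string and table/column names.
    private const string ConnectionString =
        "Host=localhost;Username=postgres;Password=secret;Database=pipeline";

    [NonSerialized]
    private NpgsqlConnection _connection;

    // Called once per partition and epoch; open resources here.
    public bool Open(long partitionId, long epochId)
    {
        _connection = new NpgsqlConnection(ConnectionString);
        _connection.Open();
        return true; // true means: go ahead and call Process() for the rows
    }

    // Called for every row of the partition.
    public void Process(Row value)
    {
        using (var cmd = new NpgsqlCommand(
            "INSERT INTO measurements (ts, value) VALUES (@ts, @value)",
            _connection))
        {
            cmd.Parameters.AddWithValue("ts", value.GetAs<string>("timestamp"));
            cmd.Parameters.AddWithValue("value", value.GetAs<double>("value"));
            cmd.ExecuteNonQuery();
        }
    }

    // Called when the partition is done, also in the error case.
    public void Close(Exception errorOrNull)
    {
        _connection?.Dispose();
    }
}

Wired up to a streaming DataFrame df, this would be used roughly as df.WriteStream().Foreach(new PostgreSqlForeachWriter()).Start().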

The example project I am … more

PostgreSQL & Grafana – Real-time data processing pipeline – Part 6

Believe it or not, we are getting to the end of this small series about a potential real-time data processing pipeline.
In this final part I will show how Grafana can retrieve our pipeline data from PostgreSQL and visualize it as a graph. But before we dive into it, let’s have a quick recap of the previous topics.

So the bit that is still missing is the visualization of the data.

Starting the Docker image

To keep things simple, I … more
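As an aside, the official Grafana image can be started along these lines; this is my own sketch, and the container name and host port are assumptions:

docker run -d --name grafana -p 3000:3000 grafana/grafana

The web UI is then available at http://localhost:3000.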

Spark to PostgreSQL – Real-time data processing pipeline – Part 5

Previously I demonstrated how streaming data can be read and transformed in Apache Spark. This time I will use Spark to persist that data in PostgreSQL.

Quick recap – Spark and JDBC

As mentioned in the post about ActiveMQ, Spark and Bahir, Spark does not provide a JDBC sink out of the box. Therefore, I will have to use the foreach sink and implement an extension of org.apache.spark.sql.ForeachWriter. It will take each individual data row and write it to PostgreSQL.

Preparing PostgreSQL

TimescaleDB logo

Even though I want to use PostgreSQL, I am actually … more

Using Apache Zeppelin with PySpark and PostgreSQL – Part 3

Having already learned in part 1 and part 2 how to use Zeppelin directly with PostgreSQL, this final part is dedicated to querying the test database with PySpark.

Before we can get started, our database first needs some test data. Since this mini-series is not about querying and preparing data as such, I simply use the tool PgBench to generate data quickly. You can read up on the details at https://www.postgresql.org/docs/11/pgbench.html.

The following command creates 100000 records in the table pgbench_accounts:

pgbench -d -U testadmin -i test

Now we can … more
