.NET for Apache Spark ForeachWriter & PostgreSQL

.NET for Apache Spark IForeachWriter implementation

Introduction

A couple of months ago, I described how to transfer data from Apache Spark to PostgreSQL by creating a Spark ForeachWriter in Scala.

This time I will show how this can be done in C#, by creating a ForeachWriter for .NET for Apache Spark.

To create a custom ForeachWriter, one needs to provide an implementation of the IForeachWriter interface, which is supported from version 0.9.0 onward. This article uses version 0.10.0, however.

The C# interface is documented in the related source code:

https://github.com/dotnet/spark/blob/master/src/csharp/Microsoft.Spark/Sql/ForeachWriter.cs

The example project I am … more

.NET for Apache Spark – VSCode with Docker on Linux and df.Collect()

.NET for Apache Spark docker image

Overview

My last article explained how a .NET for Apache Spark project can be debugged in Visual Studio 2019 under Windows. I also mentioned a limitation at the end of that article.
In this article I will extend the project a bit and demonstrate the aforementioned limitation using version 0.8.0 of my docker image for .NET for Apache Spark.
Furthermore, I will show a possible workaround that can be used if you are running Docker and Visual Studio Code under Linux (Ubuntu 18.04).

The extended application

In order to demonstrate the issue, I have … more

.NET for Apache Spark – UDF, VS2019, Docker for Windows and a Christmas Puzzle

.NET for Apache Spark Container

Overview

The latest version of my docker image for .NET for Apache Spark tries to support direct debugging from Visual Studio 2019 and Visual Studio Code.
This is the first article of a small series that will show how this can be done in different environments (Windows and Linux) and what limitations might exist.

Test application & data

I have put together a very simple C# application, named “HelloUdf”, for demonstration purposes. It is supposed to read a JSON file (coordinates.json) that contains one coordinate string per line.
Besides reading the file, the application’s task is … more

Debug .NET for Apache Spark with Visual Studio and docker

Greatly simplify debugging your .NET for Apache Spark project by using docker

Do you want to test and debug your .NET for Apache Spark application with Visual Studio, but don’t want to set up Apache Spark yourself?
Then read along and find out how my docker image might be able to help.

Before we dig into the details, however, I specifically want to thank Devin Martin for sharing his idea about such a docker image with me!

Background

As you might be aware, you can debug your .NET for Apache Spark application directly in Visual Studio by starting the related DotnetRunner in Debug mode.

Obviously that means … more

.NET for Apache Spark 0.5.0 docker image

Version 0.5.0 of .NET for Apache Spark has been released. This means that it is time to update my previously released docker image and also show how to perform a quick test using the included C# example project.
A more detailed description of the image itself is available at https://hub.docker.com/r/3rdman/dotnet-spark

.NET for Apache Spark docker image

If you are looking for a way to debug your .NET for Apache Spark project, then you might be interested in this post as well.

Starting and accessing the container

You can fire up a container based on this .NET … more
