If you are reading this, you are probably aware of the .NET for Apache Spark Docker images that I’ve made available so far. Just recently, I added a development image that allows you to easily build .NET for Apache Spark with VS Code in a browser. Today I want to introduce you to the latest member of the family:
The .NET for Apache Spark interactive notebook Docker image.
In case you are not aware of what Jupyter Notebooks are, here’s a quick summary quote from the Jupyter project site.
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
With the arrival of .NET Interactive, it is now possible to run C# code in a Jupyter Notebook as well, which makes it a great playground for .NET for Apache Spark.
So, if you would rather start playing around right away instead of setting up a complete environment with Java, Apache Spark, .NET for Apache Spark, .NET Interactive and all the related dependencies yourself, my new Docker image is here to help.
To start a new container based on the dotnet-spark interactive image, just run the following command.
docker run --name dotnet-spark-interactive -d -p 8888:8888 3rdman/dotnet-spark:interactive-latest
Next, examine the logs of the container to get the URL, including the authentication token, that is required to connect to Jupyter.
docker logs -f dotnet-spark-interactive
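If you would rather not scan the log output by hand, something like the following might help. It is only a sketch: the helper name extract_jupyter_url is made up for this example, and the pattern assumes that the log contains a line with a URL ending in token=<value>, which is what Jupyter usually prints on startup. Adjust the pattern if your output looks different.

```shell
# Hypothetical helper (not part of the image): print the first URL on
# standard input that carries a Jupyter authentication token.
# Assumption: the token consists of letters and digits only.
extract_jupyter_url() {
  grep -Eo 'https?://[^[:space:]]+token=[[:alnum:]]+' | head -n 1
}

# Usage, assuming the container name from the docker run command above.
# The -f flag is omitted so the command returns instead of following the log,
# and 2>&1 is added because Jupyter typically writes its log to stderr:
#   docker logs dotnet-spark-interactive 2>&1 | extract_jupyter_url
```

The host name shown in the URL may differ depending on how the server is configured inside the container, so you might have to replace it with localhost before opening the link in your browser.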
Before you use .NET for Apache Spark in any notebook, please start the backend in debug mode first.
There’s a helper script named start-spark-debug.sh that can do this for you. Its usage is demonstrated in the 01-start-spark-debug.ipynb notebook, which resides in the examples directory.
It will continuously run the backend process and display additional information while you are using .NET for Apache Spark in other notebooks. Therefore, please only close this notebook after you have closed any other .NET for Apache Spark notebooks.
Once you have executed start-spark-debug.sh, you’ll notice that a microsoft.spark JAR file appears in the same directory. This is because the backend is started in the current directory, which allows the Spark session to use the same directory for import/export files by default.
I’ll demonstrate this via a more detailed example in another post.
.NET for Apache Spark interactive
With the backend running, you are now ready to create and run your own C# Interactive Notebook and play around with .NET for Apache Spark.
Just don’t forget to install the Microsoft.Spark NuGet package and create your Spark session first.
You might want to have a look at the 02-basic-example.ipynb notebook, also contained in the examples directory, to get started.
In case you get an exception while executing a cell, switch back to the start-spark-debug notebook to inspect any additional debugging information that might be available.
If you want, you can run this image online as well. Just use the link below.
Please note that the link uses the free Binder service and will most likely take some time to fetch the image, launch the server and load the examples directory.
I hope you’ll find the image and this short introduction useful, and I am looking forward to welcoming you again next time.