My last article explained how to use .NET for Apache Spark together with Entity Framework to stream data to SQL Server. There is one caveat, though: you have to build Microsoft.Spark.Worker yourself.
This time I’ll show you how to build .NET for Apache Spark yourself with VS Code running in a browser, including building and running the C# examples.
Setting up your own development environment to build and test .NET for Apache Spark can be tricky and time-consuming. As a regular reader, you probably know that I like to use Docker to simplify things, and this time is no different.
The dotnet-spark dev image and code-server
To get started, I use the dotnet-spark development image to start a container:
docker run --name dotnet-spark-dev -d -p 127.0.0.1:8888:8080 3rdman/dotnet-spark:dev-latest
The dev image comes with code-server installed. It listens on port 8080 inside the container, which the command above maps to port 8888 on my host’s loopback address. Therefore, if I point my browser to http://localhost:8888, I can start a VS Code session and open the dotnet.spark folder, which contains a clone of the .NET for Apache Spark GitHub repository.
Building the Core Components
The full process of building from source is described in the build instructions on the .NET for Apache Spark GitHub page.
Since the container already includes a clone of the repository, I just pull the most recent changes and then start by building the Scala extensions layer.
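In the VS Code terminal, this step boils down to roughly the following. It is a sketch based on the upstream build instructions, assuming the clone lives in ~/dotnet.spark; REPO_DIR is my own variable name, so adjust it if the folder sits elsewhere in your container.

```shell
# Assumption: the repository clone is in ~/dotnet.spark; override REPO_DIR otherwise.
REPO_DIR="${REPO_DIR:-$HOME/dotnet.spark}"

# Pull the latest upstream changes into the existing clone.
git -C "$REPO_DIR" pull

# Build the Scala extensions layer with Maven (run in a subshell so the
# current directory stays unchanged). The JARs end up in the target/
# folders of the per-Spark-version modules under src/scala.
(cd "$REPO_DIR/src/scala" && mvn clean package)
```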
After that, it’s time to build Microsoft.Spark.Worker.
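A minimal sketch of the worker build, again assuming the clone in ~/dotnet.spark. The netcoreapp3.1 target framework and linux-x64 runtime identifier are assumptions taken from the upstream instructions of that time; check the project file of your checkout for the frameworks it actually targets.

```shell
REPO_DIR="${REPO_DIR:-$HOME/dotnet.spark}"

# Assumed target framework and runtime identifier; adjust to your checkout.
TFM="netcoreapp3.1"
RID="linux-x64"

# Publish the worker. Without -c, dotnet uses the Debug configuration, so the
# output is expected under artifacts/bin/Microsoft.Spark.Worker/Debug/<TFM>/<RID>/publish.
(cd "$REPO_DIR/src/csharp/Microsoft.Spark.Worker" && dotnet publish -f "$TFM" -r "$RID")
```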
Building and running the C# examples
Finally, I build the C# examples.
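The examples are published the same way as the worker. This sketch uses the same assumed repository location, target framework and runtime identifier as before.

```shell
REPO_DIR="${REPO_DIR:-$HOME/dotnet.spark}"

# Assumed target framework and runtime identifier; adjust to your checkout.
TFM="netcoreapp3.1"
RID="linux-x64"

# Publish the examples project (Debug configuration by default).
(cd "$REPO_DIR/src/csharp/Microsoft.Spark.CSharp.Examples" && dotnet publish -f "$TFM" -r "$RID")
```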
Before I can run one of the examples, I need to set the DOTNET_WORKER_DIR environment variable, which tells .NET for Apache Spark where to find Microsoft.Spark.Worker.
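Assuming the Debug, netcoreapp3.1 and linux-x64 publish output from the worker build and the clone in ~/dotnet.spark, setting the variable could look like this:

```shell
REPO_DIR="${REPO_DIR:-$HOME/dotnet.spark}"

# Point DOTNET_WORKER_DIR at the published worker (assumed Debug/netcoreapp3.1/linux-x64 layout).
export DOTNET_WORKER_DIR="$REPO_DIR/artifacts/bin/Microsoft.Spark.Worker/Debug/netcoreapp3.1/linux-x64/publish"
```

Note that export only affects the current terminal session, so spark-submit has to be started from the same terminal.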
Once that is done, I am ready to run the Sql.Batch.Basic example.
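Running the example goes through spark-submit and the DotnetRunner class from the Scala extensions JAR. The following is a sketch, assuming DOTNET_WORKER_DIR is exported in the same terminal, SPARK_HOME points to a Spark 3.0 installation (hence the microsoft-spark-3-0 module), and the paths from the previous steps. The JAR glob stands in for the exact file name, which depends on the Scala and .NET for Apache Spark versions.

```shell
REPO_DIR="${REPO_DIR:-$HOME/dotnet.spark}"
EXAMPLE="Sql.Batch.Basic"

# Assumption: Spark 3.0 is installed, so the microsoft-spark-3-0 JAR is the matching one.
SPARK_JAR=$(ls "$REPO_DIR"/src/scala/microsoft-spark-3-0/target/microsoft-spark-3-0_*.jar | head -n 1)
EXAMPLES_DIR="$REPO_DIR/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/netcoreapp3.1/linux-x64/publish"

# DotnetRunner launches the .NET examples app; Sql.Batch.Basic reads the
# people.json sample file that ships with the Spark distribution.
(cd "$EXAMPLES_DIR" && spark-submit \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master local \
  "$SPARK_JAR" \
  ./Microsoft.Spark.CSharp.Examples "$EXAMPLE" \
  "$SPARK_HOME/examples/src/main/resources/people.json")
```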
Getting files out of the container
I think it is fair to say that the dotnet-spark dev image can save you a lot of time if you need to build .NET for Apache Spark yourself.
Finally, one question remains: how do you get your compiled files out of the container?
Docker provides the docker cp command for that, which copies files and folders between a container and the local filesystem.
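As an example, this sketch copies the published worker from the dotnet-spark-dev container to the current directory on the host. It assumes the repository clone sits in the home directory of the container’s default user and uses the Debug, netcoreapp3.1 and linux-x64 output path from above; asking the container for $HOME via docker exec is my own workaround for not hard-coding that directory.

```shell
CONTAINER="dotnet-spark-dev"

# Assumption: the clone lives in the home directory of the container's default user.
# Single quotes make sure $HOME is expanded inside the container, not on the host.
CONTAINER_HOME=$(docker exec "$CONTAINER" sh -c 'echo "$HOME"')
WORKER_OUT="$CONTAINER_HOME/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/netcoreapp3.1/linux-x64/publish"

# docker cp CONTAINER:SRC_PATH DEST_PATH copies from the container to the host.
# If ./Microsoft.Spark.Worker does not exist yet, it is created and receives
# the contents of the publish directory.
docker cp "$CONTAINER:$WORKER_OUT" ./Microsoft.Spark.Worker
```

Run this on the host, not in the VS Code terminal inside the container.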
Thank you very much for reading/watching and have a great time!