.NET for Apache Spark 0.4.0 has been released.
If you want to test it out, you might find my Docker image useful.
Details are available at https://hub.docker.com/r/3rdman/dotnet-spark
The image is based on Ubuntu 18.04, Apache Spark 2.4.3 with Hadoop 2.7, .NET Core 2.1.801 and .NET for Apache Spark 0.4.0. It is intended for testing .NET for Apache Spark without having to install the required components manually.
By default, the container starts one Spark master instance and two worker instances. You can change the number of worker instances by setting the environment variable SPARK_WORKER_INSTANCES in your docker run command, as shown in the example below.
docker run -d --name dotnet-spark -e SPARK_WORKER_INSTANCES=1 -p 8080:8080 -p 8081:8081 3rdman/dotnet-spark:0.4.0-linux
Once the container is running, open an interactive terminal to play around.
docker exec -it dotnet-spark /bin/bash
By default, the Spark master web UI listens on port 8080, and the worker web UIs start at port 8081. The port number increases with each additional worker instance, up to the number set in SPARK_WORKER_INSTANCES.
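The docker run example above only publishes ports 8080 and 8081, so the web UIs of additional workers are not reachable from the host unless their ports are published too. The following is a minimal sketch of how the -p flags could be generated for a given number of workers. It assumes that each additional worker UI uses the next port after 8081 (8082, 8083, and so on); the function name port_flags is hypothetical and not part of the image. The docker run command is only printed, not executed.

```shell
# Build the -p flags for the master UI (8080) and N worker UIs.
# Assumption: worker UI ports are consecutive, starting at 8081.
port_flags() {
  workers="$1"
  flags="-p 8080:8080"
  i=0
  while [ "$i" -lt "$workers" ]; do
    port=$((8081 + i))
    flags="$flags -p $port:$port"
    i=$((i + 1))
  done
  printf '%s\n' "$flags"
}

# Print (do not run) the docker run command for three workers.
WORKERS=3
echo "docker run -d --name dotnet-spark -e SPARK_WORKER_INSTANCES=$WORKERS $(port_flags "$WORKERS") 3rdman/dotnet-spark:0.4.0-linux"
```

Copy the printed command into your shell to start the container with all worker UIs published.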
The HelloSpark example from https://github.com/dotnet/spark/blob/master/docs/getting-started/ubuntu-instructions.md is available in the image under /dotnet/HelloSpark.
Please have a look at the instructions at the URL above or the README.txt file in the /dotnet/HelloSpark folder.
If you want to run the example on the Spark workers, use the following command in the interactive terminal:
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master spark://$HOSTNAME:$SPARK_MASTER_PORT microsoft-spark-2.4.x-0.4.0.jar dotnet HelloSpark.dll
Spark’s log files are located in /spark/logs inside the container.
Enjoy playing around!