Start Using Zeppelin Swiftly With Docker
If you don't want to read the story part, just skip ahead and follow the steps.
Hi! I am a software developer. I like to code and, of course, I use different environments. Zeppelin is like a GUI-backed REPL with several extra features. It is very similar to Jupyter. If you want to use Scala with Jupyter, you have to add a Jupyter kernel manually. I personally started using Zeppelin for Scala (with Apache Spark) because the Scala interpreter is built in, unlike in Jupyter.
Zeppelin also supports other interpreters, such as Python (with Apache Spark), SparkSQL, Hive, Markdown, and Shell.
0. Thanks to dylanmei for creating this Docker build and making our lives easier. Don't forget to star his repository.
1. If you don't have Docker, install it.
2. I prefer to use docker-compose for the sake of brevity.
3. Create the docker-compose.yml file.
zeppelin:
  image: dylanmei/zeppelin
  container_name: zeppelin
  environment:
    ZEPPELIN_PORT: 8080
    ZEPPELIN_JAVA_OPTS: >-
      -Dspark.driver.memory=1g
      -Dspark.executor.memory=2g
    MASTER: local[*]
  ports:
    - 8080:8080
  volumes:
    - ./zeppelin/data:/usr/zeppelin/data
    - ./zeppelin/notebook:/usr/zeppelin/notebook
4. Before running docker-compose, create the volume folders:
mkdir zeppelin
mkdir zeppelin/data
mkdir zeppelin/notebook
The zeppelin folder should be in the same folder as the docker-compose.yml file.
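The folder setup can also be done in one repeatable step. This is only a convenience sketch, not part of the original setup: the function name `prepare_zeppelin_dirs` is made up here, and all it assumes is that you run it from the folder that holds docker-compose.yml.

```shell
# Hypothetical helper: create the volume folders next to docker-compose.yml.
prepare_zeppelin_dirs() {
  # Refuse to run from the wrong folder: the compose file must be a sibling.
  if [ ! -f docker-compose.yml ]; then
    echo "docker-compose.yml not found in $(pwd)" >&2
    return 1
  fi
  # -p creates parent folders as needed and does not fail if they already exist.
  mkdir -p zeppelin/data zeppelin/notebook
}
```

Running it twice is harmless, which is handy if you script the whole setup.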
Once we run "sudo docker-compose up", Docker pulls the dylanmei/zeppelin image from Docker Hub if we don't have it locally already.
5. Create and start the container:
sudo docker-compose up
After a while, localhost:8080 should be reachable.
You can put your data files under "./zeppelin/data" on the host. The volume mapping makes them visible inside the container under /usr/zeppelin/data, so you can read them from code like this:
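Rather than refreshing the browser, you can poll the port until Zeppelin answers. This is a minimal sketch, assuming `curl` is installed on the host; the function name `wait_for_zeppelin` and its arguments (URL, number of attempts, seconds between attempts) are made up for illustration.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Usage: wait_for_zeppelin URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_zeppelin() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o: discard the body,
    # --max-time: give up on a single request after 5 seconds.
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "Zeppelin is reachable at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Gave up waiting for $url" >&2
  return 1
}
```

For example, `wait_for_zeppelin http://localhost:8080` tries up to 30 times, two seconds apart, with the defaults above.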
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{ col, max, when }
import org.apache.spark.sql.{ DataFrame, SaveMode }
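import org.apache.spark.sql._

// Reuse the running Spark session, or create one if none exists.
val sparkSession = SparkSession.builder.appName("sample").getOrCreate()

// Read a JSON file whose records may span multiple lines.
// PERMISSIVE mode keeps malformed records instead of failing the read.
val jsonDF = sparkSession.read
  .option("multiLine", true)
  .option("mode", "PERMISSIVE")
  .json("/usr/zeppelin/data/data.json")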
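To try the read without real data, you can drop a small file into the mounted folder from the host. The sketch below writes a made-up, pretty-printed JSON array to ./zeppelin/data/data.json; the field names and values are placeholders, not data from this post. Because each record spans several lines, it is the kind of file the multiLine option is meant for.

```shell
# Create the data folder if needed, then write a hypothetical sample file.
mkdir -p zeppelin/data
cat > zeppelin/data/data.json <<'EOF'
[
  {
    "id": 1,
    "name": "alpha"
  },
  {
    "id": 2,
    "name": "beta"
  }
]
EOF
```

With that file in place, the read should give you a DataFrame with two rows, one per object in the array.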
Thanks for reading!