Apache Zeppelin is a web-based notebook that enables interactive data exploration, visualization, and collaboration. It supports a wide range of data sources, including Apache Spark, Hadoop, and relational databases. With Zeppelin Notebooks, you can:
- Write and execute code in various languages, such as Python, Scala, and SQL
- Visualize data with interactive charts, tables, and maps
- Share and collaborate on notebooks with others in real time
- Use built-in support for Apache Spark, Hadoop, and other data processing frameworks
Zeppelin Notebooks are ideal for data scientists, analysts, and engineers who want to quickly explore, visualize, and share insights from large datasets.
1) Install Docker
Check this article for this step
2) Set up Spark
Check this article for this step
3) Set up Zeppelin Notebooks as a Docker Container
mkdir -p /opt/zeppelin/logs
mkdir -p /opt/zeppelin/notebook
mkdir -p /opt/zeppelin/drivers
docker run -u $(id -u) -d \
-p 12080:8080 \
-p 4040:4040 \
-v /opt/zeppelin/logs:/opt/zeppelin/logs \
-v /opt/zeppelin/notebook:/opt/zeppelin/notebook \
-v /opt/spark:/opt/spark \
-v /opt/zeppelin/drivers:/opt/zeppelin/drivers \
-e ZEPPELIN_LOG_DIR='/opt/zeppelin/logs' \
-e ZEPPELIN_NOTEBOOK_DIR='/opt/zeppelin/notebook' \
-e SPARK_HOME=/opt/spark \
--name zeppelin apache/zeppelin:0.10.0
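Once the container is started, it can help to confirm that Zeppelin answers on the mapped host port before moving on. Below is a minimal Python sketch, assuming Zeppelin's REST endpoint /api/version is reachable at http://localhost:12080 (the host port mapped above). The helper names version_url and zeppelin_is_up are illustrative and not part of Zeppelin.

```python
import urllib.error
import urllib.request


def version_url(base_url: str) -> str:
    """Build the URL of Zeppelin's version endpoint from a base URL."""
    return base_url.rstrip("/") + "/api/version"


def zeppelin_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the version endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(version_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling zeppelin_is_up("http://localhost:12080") should return True a short while after docker run. If it keeps returning False, docker logs zeppelin is a reasonable first place to look.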
4) Configure Authentication for Zeppelin
1. Log in to the container as root
sudo docker exec -u root -t -i <container-id> /bin/bash
root@<container> apt update
root@<container> apt install vim
root@<container> apt install nano
2. Enable Shiro
By default, the conf directory contains shiro.ini.template. This file is only an example; it is strongly recommended to create your own shiro.ini from it with the following commands, then edit the [users] and [roles] sections as shown:
cd /opt/zeppelin/conf
cp shiro.ini.template shiro.ini
vim shiro.ini
[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
# To enable admin user, uncomment the following line and set an appropriate password.
#admin = password1, admin
xyzadmin = <password>, admin, analytics
user1 = <password>, analytics
[roles]
analytics = *
admin = *
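A typo in a role name is easy to miss in shiro.ini, so a quick sanity check before restarting can be useful. The following is a rough Python sketch that reads the [users] and [roles] sections with the standard configparser module. It assumes the simple layout shown above (Shiro's INI dialect is not identical to configparser's), and the sample passwords and function names are made up for illustration.

```python
import configparser

# Sample mirroring the layout above; the passwords are placeholders.
SAMPLE_SHIRO_INI = """
[users]
xyzadmin = secret1, admin, analytics
user1 = secret2, analytics

[roles]
analytics = *
admin = *
"""


def _parse(ini_text: str) -> configparser.ConfigParser:
    # interpolation=None so '%' in values is taken literally;
    # strict=False tolerates repeated keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep user and role names case-sensitive
    parser.read_string(ini_text)
    return parser


def user_roles(ini_text: str) -> dict:
    """Map each user in [users] to the roles listed after the password."""
    parser = _parse(ini_text)
    roles = {}
    for user, value in parser["users"].items():
        fields = [field.strip() for field in value.split(",")]
        roles[user] = fields[1:]  # fields[0] is the password
    return roles


def undefined_roles(ini_text: str) -> set:
    """Return roles assigned to users but not declared in [roles]."""
    parser = _parse(ini_text)
    defined = set(parser["roles"]) if parser.has_section("roles") else set()
    assigned = {role for rs in user_roles(ini_text).values() for role in rs}
    return assigned - defined
```

To check the real file, pass open("/opt/zeppelin/conf/shiro.ini").read() to undefined_roles; an empty set means every role assigned to a user is declared in [roles].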
Restart Zeppelin
cd /opt/zeppelin/bin
zeppelin-daemon.sh restart
If the changes do not take effect, also restart the Docker container
sudo docker restart <container-id>
3. Secure the Websocket channel
Set the property zeppelin.anonymous.allowed to false in conf/zeppelin-site.xml. If you don’t have this file yet, copy conf/zeppelin-site.xml.template to conf/zeppelin-site.xml.
<property>
<name>zeppelin.anonymous.allowed</name>
<value>false</value>
<description>For Auth</description>
</property>
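To double-check that the property was saved with the intended value, the file can be read back programmatically. This is a small Python sketch using the standard xml.etree.ElementTree module. It assumes the usual layout of property elements under a single configuration root, and get_property is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample with the same property as above, wrapped in an assumed root element.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>zeppelin.anonymous.allowed</name>
    <value>false</value>
    <description>For Auth</description>
  </property>
</configuration>"""


def get_property(xml_text: str, name: str):
    """Return the <value> text of the named <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None
```

Against the real file, get_property(open("/opt/zeppelin/conf/zeppelin-site.xml").read(), "zeppelin.anonymous.allowed") should return "false" after the edit.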
Restart Zeppelin
cd /opt/zeppelin/bin
zeppelin-daemon.sh restart
If the changes do not take effect, also restart the Docker container
sudo docker restart <container-id>
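After the restart, one way to confirm that authentication is active is to try a login through Zeppelin's REST API, which accepts a form-encoded POST to /api/login with the fields userName and password. The sketch below is a hedged Python illustration: the base URL assumes the port mapping from step 3, and the function names are made up for this example.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_login_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST to /api/login."""
    body = urllib.parse.urlencode({"userName": user, "password": password})
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/login",
        data=body.encode("utf-8"),
        method="POST",
    )


def login_succeeds(base_url: str, user: str, password: str, timeout: float = 5.0) -> bool:
    """Send the login request; True only if the server answers HTTP 200."""
    request = build_login_request(base_url, user, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Rejected credentials arrive as an HTTP error; network failures land here too.
        return False
```

With the users from shiro.ini above, login_succeeds("http://localhost:12080", "user1", "<password>") should be True, and a wrong password should give False.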
Useful links:
- https://zeppelin.apache.org/docs/0.7.1/security/shiroauthentication.html#1-enable-shiro
- http://shiro.apache.org/configuration.html#Configuration-INISections
5) Configure Interpreters
1. Download JDBC Driver for Postgres
Download page: https://jdbc.postgresql.org/download.html
Direct download of the version used here:
wget https://jdbc.postgresql.org/download/postgresql-42.2.23.jar
Place the jar on the host at /opt/zeppelin/drivers/postgresql-42.2.23.jar. That directory is mounted into the container at the same path by the docker run command in step 3.
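A failed download can leave an HTML error page saved under the jar's name, which only shows up later as a confusing "driver not found" error. As a rough check, the Python sketch below verifies that the file is a readable archive containing the PostgreSQL driver class org.postgresql.Driver. The function name is hypothetical.

```python
import zipfile


def jar_contains_class(jar_path: str, class_name: str = "org.postgresql.Driver") -> bool:
    """Return True if jar_path is a readable archive containing class_name."""
    if not zipfile.is_zipfile(jar_path):
        # Missing file, or something that is not a zip/jar archive.
        return False
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For the file from this step, jar_contains_class("/opt/zeppelin/drivers/postgresql-42.2.23.jar") should return True.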
2. Configure Spark Interpreter in Zeppelin
Zeppelin > Interpreters > spark
spark.jars = /opt/zeppelin/drivers/postgresql-42.2.23.jar
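With the driver jar on spark.jars, a %pyspark paragraph can read a PostgreSQL table through Spark's built-in JDBC data source. The sketch below is one possible way to do it, not the only one. The helper names and the example table are assumptions; the connection placeholders match the ones used elsewhere in this article.

```python
def postgres_jdbc_options(host: str, database: str, table: str,
                          user: str, password: str) -> dict:
    """Options for Spark's JDBC data source, targeting PostgreSQL."""
    return {
        "url": f"jdbc:postgresql://{host}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def read_table(spark, options: dict):
    """Load the table as a DataFrame; `spark` is the session Zeppelin provides."""
    return spark.read.format("jdbc").options(**options).load()
```

In a notebook paragraph this could be used as df = read_table(spark, postgres_jdbc_options("<db-server-address>", "<db-name>", "public.my_table", "<db-user>", "<password>")), followed by df.show(5). Here public.my_table is a made-up table name.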
3. Configure JDBC Interpreter in Zeppelin
Zeppelin > Interpreters > jdbc
- default.url = jdbc:postgresql://<db-server-address>/<db-name>
- default.user = <db-user>
- default.password = <password>
Available Interpreters
- %jdbc
- %spark
- %spark.sql
- %pyspark
- %sh