How to Set Up Zeppelin Notebooks on Ubuntu

Apache Zeppelin is a web-based notebook that enables interactive data exploration, visualization, and collaboration. It supports a wide range of data sources, including Apache Spark, Hadoop, and relational databases. With Zeppelin Notebooks, you can:

- Write and execute code in various languages, such as Python, Scala, and SQL
- Visualize data with interactive charts, tables, and maps
- Share and collaborate on notebooks with others in real time
- Use built-in support for Apache Spark, Hadoop, and other data processing frameworks

Zeppelin Notebooks are ideal for data scientists, analysts, and engineers who want to quickly explore, visualize, and share insights from large datasets.

1) Install Docker

Check this article for this step

2) Set Up Spark

Check this article for this step

3) Set Up Zeppelin Notebooks as a Docker Container

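# Create the host directories and hand them to the user the container runs as (-u below)
sudo mkdir -p /opt/zeppelin/logs
sudo mkdir -p /opt/zeppelin/notebook
sudo mkdir -p /opt/zeppelin/drivers
sudo chown -R "$(id -u):$(id -g)" /opt/zeppelin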

docker run -u $(id -u) -d \
           -p 12080:8080 \
           -p 4040:4040 \
           -v /opt/zeppelin/logs:/opt/zeppelin/logs \
           -v /opt/zeppelin/notebook:/opt/zeppelin/notebook \
           -v /opt/spark:/opt/spark \
           -v /opt/zeppelin/drivers:/opt/zeppelin/drivers \
           -e ZEPPELIN_LOG_DIR='/opt/zeppelin/logs' \
           -e ZEPPELIN_NOTEBOOK_DIR='/opt/zeppelin/notebook' \
           -e SPARK_HOME=/opt/spark \
           --name zeppelin apache/zeppelin:0.10.0
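
The container can take a little while before it starts serving requests. As a rough sketch, the helper below polls Zeppelin's REST endpoint /api/version on the host port mapped above (12080) until it answers. The function name wait_for_zeppelin, the 2-second pause, and the default of 30 attempts are illustrative assumptions, not part of Zeppelin or Docker.

```shell
# wait_for_zeppelin is a hypothetical helper, not part of Zeppelin or Docker.
# It polls GET /api/version until the server answers or the attempts run out.
wait_for_zeppelin() {
  base_url=$1
  attempts=${2:-30}   # assumed default: 30 tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors, -sS keeps it quiet except for errors
    if curl -fsS -o /dev/null "${base_url}/api/version"; then
      echo "Zeppelin is up after ${i} attempt(s)"
      return 0
    fi
    sleep 2           # assumed pause between tries
    i=$((i + 1))
  done
  echo "Zeppelin did not answer at ${base_url}" >&2
  return 1
}
```

For example, wait_for_zeppelin http://localhost:12080 can be run right after docker run; if it gives up, docker logs --tail 20 zeppelin is a reasonable next place to look.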

4) Configure Authentication for Zeppelin

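1. Log in to the container as root. Because the container was started with --name zeppelin, that name also works in place of <container-id>. Inside the container you are already root, so sudo is not needed there.

sudo docker exec -u root -t -i <container-id> /bin/bash
root@<container># apt update
root@<container># apt install -y vim nano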

2. Enable Shiro

The conf directory ships with shiro.ini.template, which is only an example. Copy it to shiro.ini, then define the accounts in [users] and their roles in [roles]:

cd /opt/zeppelin/conf
cp shiro.ini.template shiro.ini
vim shiro.ini

[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
# To enable admin user, uncomment the following line and set an appropriate password.
#admin = password1, admin
xyzadmin = <password>, admin, analytics
user1 = <password>, analytics

  
[roles]
analytics = *
admin = *
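
A role that is assigned to a user but never listed under [roles] is an easy mistake to miss. The sketch below scans a shiro.ini for that case. It assumes the simple "name = password, role, ..." layout shown above and passwords without commas; check_shiro_roles is a hypothetical helper name, not a Zeppelin or Shiro tool.

```shell
# check_shiro_roles is a hypothetical helper, not part of Zeppelin or Shiro.
# It prints every role assigned in [users] that has no entry in [roles]
# and returns non-zero if it finds one.
check_shiro_roles() {
  awk '
    BEGIN { section = ""; bad = 0 }
    /^[ \t]*(#|$)/      { next }                     # comments and blank lines
    /^[ \t]*\[users\]/  { section = "users"; next }
    /^[ \t]*\[roles\]/  { section = "roles"; next }
    /^[ \t]*\[/         { section = "other"; next }  # any other section
    section == "users" {
      eq = index($0, "=")
      if (eq == 0) next
      # Everything after the first "=" is: password, role1, role2, ...
      count = split(substr($0, eq + 1), parts, ",")
      for (i = 2; i <= count; i++) {
        role = parts[i]
        gsub(/[ \t]/, "", role)
        if (role != "") used[role] = 1
      }
    }
    section == "roles" {
      eq = index($0, "=")
      if (eq == 0) next
      role = substr($0, 1, eq - 1)
      gsub(/[ \t]/, "", role)
      defined[role] = 1
    }
    END {
      for (role in used)
        if (!(role in defined)) { print "undefined role: " role; bad = 1 }
      exit bad
    }
  ' "$1"
}
```

Run inside the container as check_shiro_roles /opt/zeppelin/conf/shiro.ini; no output and a zero exit status mean every assigned role is defined.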

Restart Zeppelin:

cd /opt/zeppelin/bin
./zeppelin-daemon.sh restart

If the changes do not take effect, also restart the Docker container:

sudo docker restart <container-id>
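
Once Zeppelin is back up, the new accounts can be checked from the host without opening a browser. This sketch assumes Zeppelin's REST login endpoint (POST /api/login with the form fields userName and password) and the host port 12080 mapped earlier; zeppelin_login_status is a hypothetical helper name.

```shell
# zeppelin_login_status is a hypothetical helper, not part of Zeppelin.
# It posts the credentials as form fields and prints only the HTTP status code.
zeppelin_login_status() {
  base_url=$1
  user=$2
  password=$3
  # --data-urlencode implies a POST and escapes special characters in the values
  curl -s -o /dev/null -w '%{http_code}' \
    --data-urlencode "userName=${user}" \
    --data-urlencode "password=${password}" \
    "${base_url}/api/login"
}
```

For example, zeppelin_login_status http://localhost:12080 xyzadmin '<password>' should print 200 when the account from shiro.ini is accepted; any other code suggests the login was not accepted.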

3. Secure the WebSocket channel

Set the property zeppelin.anonymous.allowed to false in conf/zeppelin-site.xml. If you don’t have this file yet, copy conf/zeppelin-site.xml.template to conf/zeppelin-site.xml.

<property>
  <name>zeppelin.anonymous.allowed</name>
  <value>false</value>
  <description>For Auth</description>
</property>
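
Before restarting, it can help to confirm what the file actually says. The sketch below prints the value of a named property from zeppelin-site.xml. It assumes <name> and <value> sit on separate lines, as in the snippet above, and reports only the first match; site_property is a hypothetical helper name.

```shell
# site_property is a hypothetical helper, not part of Zeppelin.
# It prints the first <value> that follows <name>NAME</name> in the given file,
# assuming <name> and <value> sit on separate lines.
site_property() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")      # drop everything up to the opening tag
      sub(/<\/value>.*/, "")    # drop the closing tag and anything after it
      print
      exit
    }
  ' "$1"
}
```

Inside the container, site_property /opt/zeppelin/conf/zeppelin-site.xml zeppelin.anonymous.allowed should print false. If the copied template already contains this property with another value, it is probably safer to edit that entry than to add a second one.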

Restart Zeppelin:

cd /opt/zeppelin/bin
./zeppelin-daemon.sh restart

If the changes do not take effect, also restart the Docker container:

sudo docker restart <container-id>

5) Configure Interpreters

1. Download the JDBC driver for PostgreSQL

Download page: https://jdbc.postgresql.org/download.html

Direct link for version 42.2.23: https://jdbc.postgresql.org/download/postgresql-42.2.23.jar

wget https://jdbc.postgresql.org/download/postgresql-42.2.23.jar

Place the JAR on the host at /opt/zeppelin/drivers/postgresql-42.2.23.jar. That directory is mounted into the container at the same path.
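
The download and the move into the drivers directory can be combined into one step. In the sketch below, both function names are hypothetical, and the assumption that other driver versions follow the same URL pattern as the link above is mine, not something the download page guarantees.

```shell
# Both functions are hypothetical helpers, not part of Zeppelin or PostgreSQL.
# pg_driver_url prints the download URL for a driver version,
# following the pattern of the 42.2.23 link (assumed to hold for other versions).
pg_driver_url() {
  echo "https://jdbc.postgresql.org/download/postgresql-$1.jar"
}

# fetch_pg_driver downloads the JAR into a directory unless it is already there.
fetch_pg_driver() {
  version=$1
  target_dir=${2:-/opt/zeppelin/drivers}
  # -nc skips the download if the file exists; -P sets the target directory
  wget -nc -P "$target_dir" "$(pg_driver_url "$version")"
}
```

For example, fetch_pg_driver 42.2.23 places the JAR directly in /opt/zeppelin/drivers on the host.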

2. Configure the Spark interpreter in Zeppelin

Zeppelin > Interpreters > spark

spark.jars = /opt/zeppelin/drivers/postgresql-42.2.23.jar

3. Configure the JDBC interpreter in Zeppelin

Zeppelin > Interpreters > jdbc

default.url = jdbc:postgresql://<db-server-address>/<db-name>
default.user = <db-user>
default.password = <password>
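
The default.url above leaves out the port, in which case the PostgreSQL JDBC driver falls back to 5432. When the database listens elsewhere, the port goes after the host. As a small sketch, with pg_jdbc_url being a hypothetical helper name, the value can be assembled like this:

```shell
# pg_jdbc_url is a hypothetical helper, not part of Zeppelin or the JDBC driver.
# It prints a PostgreSQL JDBC URL in the form jdbc:postgresql://host:port/database.
pg_jdbc_url() {
  host=$1
  database=$2
  port=${3:-5432}   # PostgreSQL's default port
  echo "jdbc:postgresql://${host}:${port}/${database}"
}
```

For example, pg_jdbc_url <db-server-address> <db-name> 5433 prints the string to paste into default.url for a server on port 5433.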

Available Interpreters

%jdbc
%spark
- %spark.sql
- %pyspark
%sh
