Techeons

How to Set Up Zeppelin Notebooks on Ubuntu

Posted on September 1, 2024

Apache Zeppelin is a web-based notebook that enables interactive data exploration, visualization, and collaboration. It supports a wide range of data sources, including Apache Spark, Hadoop, and relational databases. With Zeppelin Notebooks, you can:

  • Write and execute code in various languages, such as Python, Scala, and SQL
  • Visualize data with interactive charts, tables, and maps
  • Share and collaborate on notebooks with others in real time
  • Use built-in support for Apache Spark, Hadoop, and other data processing frameworks

Zeppelin Notebooks are ideal for data scientists, analysts, and engineers who want to quickly explore, visualize, and share insights from large datasets.


1) Install Docker

Check this article for this step

2) Setup Spark

Check this article for this step

3) Set Up Zeppelin Notebooks as a Docker Container

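# Create host directories for logs, notebooks, and JDBC drivers
# (if /opt is not writable, create them with sudo and chown them to your user)
mkdir -p /opt/zeppelin/logs
mkdir -p /opt/zeppelin/notebook
mkdir -p /opt/zeppelin/drivers
# Run Zeppelin as the current user; the web UI is published on host port 12080
docker run -u "$(id -u)" -d \
           -p 12080:8080 \
           -p 4040:4040 \
           -v /opt/zeppelin/logs:/opt/zeppelin/logs \
           -v /opt/zeppelin/notebook:/opt/zeppelin/notebook \
           -v /opt/spark:/opt/spark \
           -v /opt/zeppelin/drivers:/opt/zeppelin/drivers \
           -e ZEPPELIN_LOG_DIR='/opt/zeppelin/logs' \
           -e ZEPPELIN_NOTEBOOK_DIR='/opt/zeppelin/notebook' \
           -e SPARK_HOME=/opt/spark \
           --name zeppelin apache/zeppelin:0.10.0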
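
Zeppelin can take a little while to start answering after the container is created. The sketch below polls the web UI until it responds. It assumes the 12080 port mapping and the container name zeppelin from the command above; wait_for_http is a hypothetical helper written for this article, not part of Docker or Zeppelin.

```shell
# Poll a URL until it answers or the attempts run out.
# wait_for_http is a hypothetical helper, not a Docker or Zeppelin command.
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failures, -o: discard the body
    if curl -sf -o /dev/null "$url"; then
      echo "up after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no response from $url after $attempts attempt(s)" >&2
  return 1
}

# Usage (assumes the container name and port mapping from the step above):
#   docker ps --filter name=zeppelin
#   docker logs --tail 20 zeppelin
#   wait_for_http http://localhost:12080
```

If the helper gives up, the output of docker logs is usually the first place to look.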

4) Configure Authentication for Zeppelin

1. Log in to the container as root (the container name zeppelin set in the previous step also works in place of <container-id>)

sudo docker exec -u root -t -i <container-id> /bin/bash
root@<container> apt update
root@<container> apt install -y vim nano

2. Enable Shiro

By default, the conf directory contains shiro.ini.template. This file is only an example, and it is strongly recommended to create your own shiro.ini from it with the following commands:

cd /opt/zeppelin/conf
cp shiro.ini.template shiro.ini
vim shiro.ini

In shiro.ini, set the users and roles:

[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html#Configuration-INISections
# To enable admin user, uncomment the following line and set an appropriate password.
#admin = password1, admin
xyzadmin = <password>, admin, analytics
user1 = <password>, analytics

  
[roles]
analytics = *
admin = *
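
Before restarting, it can help to confirm which accounts are actually active, since any sample users left uncommented in the copied template would still be able to log in. The following is a minimal sketch, assuming the plain "name = password, roles" layout shown above; list_shiro_users is a hypothetical helper, not a Zeppelin or Shiro tool.

```shell
# Print the user names that are active (not commented out) in the
# [users] section of a shiro.ini file.
# list_shiro_users is a hypothetical helper, not part of Zeppelin or Shiro.
list_shiro_users() {
  awk '
    # A section header switches the [users] flag on or off
    /^\[/ { in_users = ($0 ~ /^\[users\]/); next }
    # Inside [users], skip comments and blank lines, print the name before "="
    in_users && !/^[ \t]*(#|$)/ {
      split($0, kv, "=")
      gsub(/[ \t]/, "", kv[1])
      print kv[1]
    }
  ' "$1"
}

# Usage inside the container:
#   list_shiro_users /opt/zeppelin/conf/shiro.ini
```

For the file above, only xyzadmin and user1 should be listed.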

Restart Zeppelin

cd /opt/zeppelin/bin
zeppelin-daemon.sh restart

If the changes do not take effect, also restart the Docker container:

sudo docker restart <container-id>

3. Secure the Websocket channel

Set the property zeppelin.anonymous.allowed to false in conf/zeppelin-site.xml. If you don’t have this file yet, copy conf/zeppelin-site.xml.template to conf/zeppelin-site.xml.

<property>
  <name>zeppelin.anonymous.allowed</name>
  <value>false</value>
  <description>For Auth</description>
</property>
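
To double-check the edit before restarting, the value can be read back from the file. This is a rough sketch that assumes the <value> line directly follows the <name> line, as in the snippet above; site_property is a hypothetical helper and does not parse XML properly.

```shell
# Print the value of a property in zeppelin-site.xml.
# Assumes <value> sits on the line right after <name>.
# site_property is a hypothetical helper, not a Zeppelin command.
site_property() {
  local file="$1" name="$2"
  # -F: match the name literally, -A1: also print the following line
  grep -F -A1 "<name>${name}</name>" "$file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Usage inside the container:
#   site_property /opt/zeppelin/conf/zeppelin-site.xml zeppelin.anonymous.allowed
```

An empty result suggests the property is missing or formatted differently from what the sketch expects.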

Restart Zeppelin

cd /opt/zeppelin/bin
zeppelin-daemon.sh restart

If the changes do not take effect, also restart the Docker container:

sudo docker restart <container-id>

Useful links:

  • https://zeppelin.apache.org/docs/0.7.1/security/shiroauthentication.html#1-enable-shiro
  • http://shiro.apache.org/configuration.html#Configuration-INISections

5) Configure Interpreters

1. Download JDBC Driver for Postgres

Download page: https://jdbc.postgresql.org/download.html

wget https://jdbc.postgresql.org/download/postgresql-42.2.23.jar

Place it on the host at /opt/zeppelin/drivers/postgresql-42.2.23.jar. This directory is mounted into the container at the same path.
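
A failed download can leave an HTML error page saved under the .jar name, which only shows up later as a confusing class-loading error. The sketch below downloads straight into the drivers directory and does a quick sanity check on the result. It relies on the fact that JAR files are ZIP archives and start with the bytes "PK"; is_jar is a hypothetical helper and does not validate the archive contents.

```shell
# Return success if the file starts with the ZIP magic bytes "PK".
# is_jar is a hypothetical helper; it does not verify the archive contents.
is_jar() {
  [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Usage on the host (-P sets the directory wget saves into):
#   wget -P /opt/zeppelin/drivers https://jdbc.postgresql.org/download/postgresql-42.2.23.jar
#   is_jar /opt/zeppelin/drivers/postgresql-42.2.23.jar && echo "looks like a JAR"
```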

2. Configure Spark Interpreter in Zeppelin

Zeppelin > Interpreters > spark

spark.jars = /opt/zeppelin/drivers/postgresql-42.2.23.jar

3. Configure JDBC Interpreter in Zeppelin

Zeppelin > Interpreters > jdbc

  • default.url = jdbc:postgresql://<db-server-address>/<db-name>
  • default.user = <db-user>
  • default.password = <password>
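
For reference, a filled-in default.url follows the pattern jdbc:postgresql://host:port/database, with 5432 as the PostgreSQL default port. The sketch below builds one from its parts. The host db.example.internal and the database analytics in the usage comment are made-up values, and pg_jdbc_url is a hypothetical helper.

```shell
# Build a PostgreSQL JDBC URL from host, database, and optional port.
# pg_jdbc_url is a hypothetical helper for illustration only.
pg_jdbc_url() {
  local host="$1" db="$2" port="${3:-5432}"
  printf 'jdbc:postgresql://%s:%s/%s\n' "$host" "$port" "$db"
}

# Usage (host and database names are made-up examples):
#   pg_jdbc_url db.example.internal analytics
```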

Available Interpreters

  • %jdbc
  • %spark
    • %spark.sql
    • %pyspark
  • %sh
