Techeons

Imagine | Explore | Innovate

How to setup Spark on Ubuntu

Posted on September 1, 2024

Apache Spark is an open-source distributed computing framework designed for fast data processing. It is an in-memory engine: working data is kept in memory where possible instead of being written to disk between steps.

Spark provides APIs for streaming, graph processing, SQL, and machine learning (MLlib), and supports Java, Python, Scala, and R. It is often installed on Hadoop clusters, but you can also install and configure Spark in standalone mode.

1) Install Java

java -version
# If java is not installed, install it:
sudo apt update
sudo apt install default-jre
java -version
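
The Spark 3.1.x documentation lists Java 8 and 11 as supported, and depending on the Ubuntu release, default-jre may install something newer. The sketch below checks which major version is on the PATH. The function names java_major_version and installed_java_version are made up for this example, and the parsing assumes the usual quoted version string printed by java -version.

```shell
#!/usr/bin/env bash
# Extract the major Java version from a version string such as
# "1.8.0_292" (Java 8) or "11.0.11" (Java 11).
java_major_version() {
    local version="$1"
    local major="${version%%.*}"
    if [ "$major" = "1" ]; then
        # Legacy numbering scheme: 1.8.0_292 means Java 8.
        major="${version#1.}"
        major="${major%%.*}"
    fi
    echo "$major"
}

# Read the quoted version string from `java -version` (it is written to stderr).
installed_java_version() {
    java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}'
}

if command -v java >/dev/null 2>&1; then
    major="$(java_major_version "$(installed_java_version)")"
    case "$major" in
        8|11) echo "Java $major found: listed as supported for Spark 3.1.x" ;;
        *)    echo "Java $major found: Spark 3.1.x is documented for Java 8 and 11" ;;
    esac
else
    echo "java not found on PATH"
fi
```

If the check reports a newer version, installing a specific package such as openjdk-11-jre instead of default-jre is one option.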

2) Install Scala

sudo apt install scala
scala -version

3) Create “spark” user

sudo addgroup spark
sudo adduser --ingroup spark spark
# Optional: only needed if a hadoop group already exists on this machine
sudo usermod -a -G hadoop spark
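
The usermod command fails when there is no hadoop group, which is likely on a standalone machine. A minimal guard could look like the sketch below. The helper names group_exists and user_exists are hypothetical, and it assumes getent and id are available, as they are on a stock Ubuntu system.

```shell
#!/usr/bin/env bash
# Return success if the named group exists on this system.
group_exists() {
    getent group "$1" >/dev/null 2>&1
}

# Return success if the named user exists on this system.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Only add spark to the hadoop group when both actually exist.
if user_exists spark && group_exists hadoop; then
    sudo usermod -a -G hadoop spark
else
    echo "Skipping hadoop group membership (spark user or hadoop group missing)"
fi
```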

4) Install Spark

# Older releases such as 3.1.2 are served from the Apache archive
wget https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
tar -xvzf spark-3.1.2-bin-hadoop3.2.tgz
# Move the extracted directory (not the .tgz archive) to /opt/spark
sudo mv spark-3.1.2-bin-hadoop3.2 /opt/spark
sudo chmod -R 755 /opt/spark/
sudo chown -R spark:spark /opt/spark
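# Give the spark user sudo rights: open the sudoers file
sudo visudo
# and add the spark line below the root entry:
##------------------------------
# User privilege specification
root    ALL=(ALL) ALL
spark ALL=(ALL) ALL
##------------------------------
# Switch to the spark user
sudo su - spark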
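
Before unpacking a download it is worth comparing its checksum with the SHA-512 digest that Apache publishes next to each release. The sketch below is one way to do that. The function name sha512_matches is made up for this example, and it assumes you paste the published digest in by hand, since the checksum file is not always in a form that sha512sum -c accepts.

```shell
#!/usr/bin/env bash
# Compare a file's SHA-512 digest with an expected hex digest.
# Whitespace and letter case in the expected value are ignored, because
# published checksum files are not always formatted the same way.
sha512_matches() {
    local file="$1" expected="$2" actual
    actual="$(sha512sum "$file" | awk '{print $1}')"
    expected="$(printf '%s' "$expected" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')"
    [ -n "$expected" ] && [ "$actual" = "$expected" ]
}

# Hypothetical usage: paste the digest published next to the tarball.
# EXPECTED_SHA512="..."   # value copied from the release's .sha512 file
# sha512_matches spark-3.1.2-bin-hadoop3.2.tgz "$EXPECTED_SHA512" \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```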

5) Configure Environment Variables for Spark

sudo su - spark
# Single quotes keep $PATH literal, so it is expanded at login and not now
echo 'export SPARK_HOME=/opt/spark' >> ~/.profile
echo 'export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin' >> ~/.profile
echo 'export PYSPARK_PYTHON=/usr/bin/python3' >> ~/.profile
source ~/.profile

OR

sudo su - spark
# sudo is not needed to edit your own profile
vim ~/.profile

# Add at the end of the file:
##------------------------------
export SPARK_HOME=/opt/spark
export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin
export PYSPARK_PYTHON=/usr/bin/python3
##------------------------------
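
Running the echo commands a second time appends the same lines to ~/.profile again. The sketch below adds each line only if it is not already there. The function names append_once and configure_spark_profile are hypothetical, and the optional file argument exists only so the function can be tried on a scratch file first.

```shell
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already present.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the Spark environment variables to a profile file (default: ~/.profile).
# Single quotes keep $PATH literal so it is expanded when the profile is read.
configure_spark_profile() {
    local profile="${1:-$HOME/.profile}"
    append_once 'export SPARK_HOME=/opt/spark' "$profile"
    append_once 'export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin' "$profile"
    append_once 'export PYSPARK_PYTHON=/usr/bin/python3' "$profile"
}
```

As the spark user, call configure_spark_profile with no arguments and then run source ~/.profile. Calling it again leaves the file unchanged.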

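6) SSH Config – This is needed so the start scripts can log in to the Spark worker (slave)

mkdir -p ~/.ssh
chmod 700 ~/.ssh
cd ~/.ssh/
# Generate a key with an empty passphrase, without prompting for a file name
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# If key-based login is refused, check the daemon settings (for example PubkeyAuthentication)
sudo vi /etc/ssh/sshd_config
sudo /etc/init.d/ssh reload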
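
sshd usually ignores keys when ~/.ssh or authorized_keys is writable by other users, so it helps to set the permissions explicitly and then confirm that login works without a prompt. The sketch below shows one way to do both. The function names secure_ssh_dir and can_ssh_without_password are hypothetical, and the 5-second connect timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Tighten permissions on an SSH directory so sshd will accept the keys in it.
secure_ssh_dir() {
    local dir="${1:-$HOME/.ssh}"
    mkdir -p "$dir"
    chmod 700 "$dir"
    touch "$dir/authorized_keys"
    chmod 600 "$dir/authorized_keys"
}

# Return success if key-based login works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_ssh_without_password() {
    local host="${1:-localhost}"
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1
}
```

A typical check is secure_ssh_dir followed by can_ssh_without_password localhost && echo "passwordless SSH works". Connect once interactively with ssh localhost first, because batch mode also fails while the host key is still unknown.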

7) Start Apache Spark

sudo su - spark
start-master.sh
start-workers.sh spark://localhost:7077

The Spark Master web UI should be available at http://<your-ip-address>:8080
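
The master can take a few seconds to come up, so a script that starts it may need to wait before using it. The sketch below polls a TCP port until it accepts connections. The function name wait_for_port is made up for this example; it relies on bash's /dev/tcp redirection, so it needs bash and not plain sh. Ports 8080 (web UI) and 7077 (master) are the Spark standalone defaults.

```shell
#!/usr/bin/env bash
# Wait until a TCP port accepts connections, retrying about once per second.
# Gives up and returns failure after the given number of attempts (default 30).
wait_for_port() {
    local host="$1" port="$2" timeout="${3:-30}" waited=0
    # The subshell opens a connection on file descriptor 3 and closes it on exit.
    while ! (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
        waited=$((waited + 1))
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}
```

For example, wait_for_port localhost 8080 30 && echo "Spark Master UI is up" can be run right after start-master.sh.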

Also check that spark-shell starts correctly:

spark-shell