
Techeons


How to set up Dremio on Ubuntu using Docker

Posted on September 1, 2024

Dremio is a data lakehouse platform that offers a range of tools and features to help organizations manage, process, and analyze large amounts of data. Here are some key aspects of Dremio:

Key Features:

  • Data Lakehouse Architecture: Dremio combines the benefits of data lakes and data warehouses, allowing users to store and process data in a single platform.
  • Data Ingestion: Supports ingestion from various sources, including AWS S3, Azure ADLS, and Google Cloud Storage.
  • SQL Support: Allows users to query data using standard SQL, making it accessible to a wide range of users.
  • Data Transformation: Offers tools for data transformation, aggregation, and enrichment.
  • Data Governance: Provides features for data security, access control, and data lineage.
  • Integration: Supports integration with various data tools and platforms, such as Tableau, Power BI, and Python.

Benefits:

  • Faster Insights: Dremio’s architecture and features enable faster data processing and analysis.
  • Simplified Data Management: Reduces the complexity of managing large datasets.
  • Improved Collaboration: Allows data teams to work together more effectively.

Use Cases:

  • Data Analytics: Dremio is suitable for various data analytics use cases, including business intelligence, data science, and data engineering.
  • Data Integration: Can be used to integrate data from multiple sources.
  • Data Warehousing: Offers a cost-effective alternative to traditional data warehousing solutions.

Note: This setup is not recommended for production. This guide walks you through a single-node deployment suitable for evaluation and testing.

1) Install Docker and Docker Compose

Check this article for this step

2) Set up Dremio (Open Source) as a Docker container

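# Host directory that will hold Dremio's persistent data (/opt is owned by root, hence sudo)
sudo mkdir -p /opt/dremio/dremio_data

# 9047: web UI and REST API; 31010: ODBC/JDBC client connections
docker run -d \
  --name dremio \
  -p 9047:9047 \
  -p 31010:31010 \
  -v /opt/dremio/dremio_data:/opt/dremio/data \
  dremio/dremio-oss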

Dremio should now be reachable at http://<your-ip-address>:9047. It can take a short while after docker run returns before the web UI responds.
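If you want to script the next steps, it helps to wait until Dremio actually answers on port 9047, because the container keeps starting up for a while after docker run returns. Below is a minimal Python sketch using only the standard library; the function name wait_for_dremio and its timeout defaults are illustrative assumptions, not something Dremio provides.

```python
import time
import urllib.error
import urllib.request


def wait_for_dremio(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `url` until the server answers or `timeout` seconds pass.

    Returns True as soon as any HTTP response arrives (even an error status),
    and False if the server never became reachable within the timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is listening.
            return True
        except OSError:
            # Includes urllib.error.URLError: connection refused or reset,
            # or a timeout while the container is still starting.
            time.sleep(interval)
    return False
```

For example, wait_for_dremio("http://<your-ip-address>:9047") returns True once the UI responds. Any HTTP status counts as "up" here, since even an error page means the server is listening.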

3) Configure a MinIO data store as a data source in Dremio

Official documentation: https://docs.dremio.com/current/sonar/data-sources/object/s3#configuring-s3-compatible-storage

To configure S3-compatible storage as a data source in the Dremio console:

  1. Under Advanced Options, check Enable compatibility mode.
  2. Under Advanced Options > Connection Properties, add fs.s3a.path.style.access and set the value to true.
    Note: This setting ensures that the request path is created correctly when using IP addresses or hostnames as the endpoint.
  3. Under Advanced Options > Connection Properties, add the fs.s3a.endpoint property and its corresponding server endpoint value (IP address).
    Limitation: The endpoint value cannot contain the http(s):// prefix nor can it start with the string s3. For example, if the endpoint is http://123.1.2.3:9000, the value is 123.1.2.3:9000.

Connection Properties

fs.s3a.path.style.access = true
fs.s3a.endpoint = <your-ip-address>:9000
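The endpoint rule above is easy to get wrong when copying a MinIO URL, so here is a small Python sketch that turns a URL into the two connection properties. The helper names are hypothetical, and the check only encodes the two limitations quoted from the Dremio documentation (no http(s):// prefix, no leading s3). Also keep in mind that Dremio runs inside a container here, so localhost or 127.0.0.1 would point at the Dremio container itself; use an address of the MinIO host that the container can reach.

```python
from urllib.parse import urlsplit


def s3a_endpoint(minio_url: str) -> str:
    """Turn a MinIO URL such as 'http://123.1.2.3:9000' into the value
    Dremio expects for fs.s3a.endpoint ('123.1.2.3:9000')."""
    # Add a scheme if missing so urlsplit puts host:port into .netloc.
    if "://" not in minio_url:
        minio_url = "http://" + minio_url
    parts = urlsplit(minio_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    endpoint = parts.netloc  # drops the http(s):// prefix and any trailing path
    # Documented limitation: the endpoint must not start with the string "s3".
    if endpoint.startswith("s3"):
        raise ValueError(f"endpoint must not start with 's3': {endpoint!r}")
    return endpoint


def s3a_connection_properties(minio_url: str) -> dict[str, str]:
    """Connection properties to add under Advanced Options for a MinIO source."""
    return {
        "fs.s3a.path.style.access": "true",
        "fs.s3a.endpoint": s3a_endpoint(minio_url),
    }


if __name__ == "__main__":
    for name, value in s3a_connection_properties("http://123.1.2.3:9000").items():
        print(f"{name} = {value}")
```

Running the script prints the two property lines in the same "name = value" form shown above, ready to enter in the Dremio console.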

The following steps describe how to configure your S3 source for MinIO with an encrypted connection in the Dremio console:

  1. Use OpenSSL to generate a self-signed certificate (see Securing Access to MinIO Servers), or use an existing self-signed certificate.
  2. Start the MinIO server with ./minio server [data folder] --certs-dir [certs directory].
  3. Install Dremio (in this guide, the Docker container from step 2).
  4. In the environment where Dremio runs (here, inside the dremio container), import the certificate into the JVM truststore, cacerts, located in <JAVA_HOME>/jre/lib/security on Java 8 or <JAVA_HOME>/lib/security on Java 9 and later. Run the following command from that directory, since -keystore cacerts is a relative path:
<JAVA_HOME>/bin/keytool -import -v -trustcacerts -alias alias -file cert-file -keystore cacerts -keypass changeit -storepass changeit
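Steps 1 and 4 can be scripted. The Python sketch below only builds the openssl and keytool command lines (it does not run them), so you can review them before executing anything. Several details are assumptions rather than part of the Dremio guide: MinIO reading public.crt and private.key from its --certs-dir, the -addext option (OpenSSL 1.1.1 or later), keytool being on the PATH inside the dremio container, running it as root via docker exec -u root, and the location of cacerts, which you should check for the Java version in your image.

```python
import shlex


def openssl_self_signed_cmd(ip: str, certs_dir: str, days: int = 365) -> list[str]:
    """Build an OpenSSL command that writes a self-signed cert/key pair for `ip`.

    Assumption: MinIO picks up public.crt and private.key from its --certs-dir.
    """
    return [
        "openssl", "req", "-x509", "-nodes", "-newkey", "rsa:2048",
        "-days", str(days),
        "-keyout", f"{certs_dir}/private.key",
        "-out", f"{certs_dir}/public.crt",
        "-subj", f"/CN={ip}",
        # The certificate must name the IP address clients connect to
        # (-addext needs OpenSSL 1.1.1 or later).
        "-addext", f"subjectAltName=IP:{ip}",
    ]


def keytool_import_cmd(cert_file: str, keystore: str, alias: str = "minio",
                       container: str = "dremio") -> list[str]:
    """Build a keytool import that runs inside the Dremio container.

    `cert_file` and `keystore` are paths *inside* the container; the cacerts
    location depends on the image's Java version (assumption: check it first).
    Runs as root because cacerts is normally not writable by other users.
    """
    return [
        "docker", "exec", "-u", "root", container,
        "keytool", "-import", "-v", "-trustcacerts", "-noprompt",
        "-alias", alias, "-file", cert_file,
        "-keystore", keystore, "-storepass", "changeit",
    ]


if __name__ == "__main__":
    # Hypothetical values: substitute your MinIO host IP and real paths.
    print(shlex.join(openssl_self_signed_cmd("123.1.2.3", "/path/to/certs")))
    print(shlex.join(keytool_import_cmd("/tmp/public.crt", "<path-to-cacerts>")))
```

Copy the certificate into the container before importing it, for example with docker cp public.crt dremio:/tmp/public.crt. Changes made inside a container are lost when it is removed and recreated, so the import would have to be repeated in that case.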

Want some sample CSV files to get started? This is a good resource:

  • https://github.com/datablist/sample-csv-files
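Dremio only sees a CSV file once it sits in a MinIO bucket. A minimal Python sketch for getting it there, assuming the third-party boto3 package is installed and the bucket already exists; the function names, bucket, and credentials are placeholders for illustration. The client uses path-style addressing, matching the fs.s3a.path.style.access = true setting used for the Dremio source.

```python
import csv
import io


def sample_csv_text(rows: list[dict[str, str]]) -> str:
    """Serialize rows (dicts sharing the same keys) as CSV text with a header."""
    if not rows:
        raise ValueError("rows must not be empty")
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def upload_csv_to_minio(text: str, bucket: str, key: str, endpoint_url: str,
                        access_key: str, secret_key: str) -> None:
    """Upload CSV text to an existing MinIO bucket using boto3."""
    # Third-party imports kept local so sample_csv_text works without boto3.
    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,  # e.g. "http://<your-ip-address>:9000"
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        # Path-style addressing, mirroring fs.s3a.path.style.access = true.
        config=Config(s3={"addressing_style": "path"}),
    )
    s3.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))
```

For example, upload_csv_to_minio(sample_csv_text([{"id": "1", "name": "alice"}]), "samples", "people.csv", "http://<your-ip-address>:9000", "<access-key>", "<secret-key>"). If you would rather upload one of the downloaded sample files from disk, the boto3 S3 client also offers upload_file(Filename, Bucket, Key).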

Useful links:

  • https://docs.dremio.com/current/get-started/docker-quickstart
  • https://docs.min.io/docs/how-to-secure-access-to-minio-server-with-tls
