Dremio is a data lakehouse platform that offers a range of tools and features to help organizations manage, process, and analyze large amounts of data. Here are some key aspects of Dremio:
Key Features:
- Data Lakehouse Architecture: Dremio combines the benefits of data lakes and data warehouses, allowing users to store and process data in a single platform.
- Data Ingestion: Supports ingestion from various sources, including AWS S3, Azure ADLS, and Google Cloud Storage.
- SQL Support: Allows users to query data using standard SQL, making it accessible to a wide range of users.
- Data Transformation: Offers tools for data transformation, aggregation, and enrichment.
- Data Governance: Provides features for data security, access control, and data lineage.
- Integration: Supports integration with various data tools and platforms, such as Tableau, Power BI, and Python.
Benefits:
- Faster Insights: Dremio’s architecture and features enable faster data processing and analysis.
- Simplified Data Management: Reduces the complexity of managing large datasets.
- Improved Collaboration: Allows data teams to work together more effectively.
Use Cases:
- Data Analytics: Dremio is suitable for various data analytics use cases, including business intelligence, data science, and data engineering.
- Data Integration: Can be used to integrate data from multiple sources.
- Data Warehousing: Offers a cost-effective alternative to traditional data warehousing solutions.
Note: This setup is not recommended for production. This guide helps you set up a single-node deployment that can be used for evaluation and testing.
1) Install Docker and Docker Compose
Check this article for this step
2) Set up Dremio (Open Source) as a Docker container
mkdir -p /opt/dremio/dremio_data
docker run -d \
--name dremio \
-p 9047:9047 \
-p 31010:31010 \
-v /opt/dremio/dremio_data:/opt/dremio/data \
dremio/dremio-oss
Dremio should be running on http://<your-ip-address>:9047
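Dremio can take a little while to start, so the UI may not answer right after docker run returns. Here is a minimal sketch of a readiness check, assuming curl is installed. wait_for_http is a hypothetical helper name (not part of Dremio or Docker), and the default attempt count and delay are arbitrary choices:

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers or attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
    url=$1
    attempts=${2:-30}
    delay=${3:-5}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors as failures, -s: quiet, --max-time: never hang
        if curl -fs --max-time 5 -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        # Pause only if another attempt is still to come.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    echo "no response from $url after $attempts attempts" >&2
    return 1
}
```

For example, wait_for_http http://localhost:9047 keeps trying for a few minutes with the defaults. If it gives up, docker logs dremio is the first place to look.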
3) Configure a MinIO data store as a data source in Dremio
Official documentation: https://docs.dremio.com/current/sonar/data-sources/object/s3#configuring-s3-compatible-storage
To configure S3-compatible storage as a data source in the Dremio console:
- Under Advanced Options, check Enable compatibility mode.
- Under Advanced Options > Connection Properties, add fs.s3a.path.style.access and set the value to true.
  Note: This setting ensures that the request path is created correctly when using IP addresses or hostnames as the endpoint.
- Under Advanced Options > Connection Properties, add the fs.s3a.endpoint property and its corresponding server endpoint value (IP address).
  Limitation: The endpoint value cannot contain the http(s):// prefix, nor can it start with the string s3. For example, if the endpoint is http://123.1.2.3:9000, the value is 123.1.2.3:9000.
Connection Properties
fs.s3a.path.style.access = true
fs.s3a.endpoint = <your-ip-address>:9000
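The endpoint limitation above is easy to trip over when copying a MinIO URL from a browser. The following sketch derives the property value from a full URL using plain POSIX shell. to_s3a_endpoint is a hypothetical helper name introduced here for illustration, and stripping a trailing slash is an extra assumption not stated in the Dremio documentation:

```shell
#!/bin/sh
# Hypothetical helper: turn a MinIO URL into a value usable for fs.s3a.endpoint.
to_s3a_endpoint() {
    endpoint=$1
    # Drop a leading http:// or https:// if present.
    endpoint=${endpoint#http://}
    endpoint=${endpoint#https://}
    # Drop a trailing slash, if any (assumption: Dremio wants a bare host:port).
    endpoint=${endpoint%/}
    # The value must not start with the string "s3".
    case $endpoint in
        s3*)
            echo "endpoint must not start with 's3': $endpoint" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$endpoint"
}

to_s3a_endpoint "http://123.1.2.3:9000"   # prints 123.1.2.3:9000
```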
The following steps describe how to configure your S3 source for MinIO with an encrypted connection in the Dremio console:
- Use OpenSSL to generate a self signed certificate. See Securing Access to Minio Servers or use an existing self signed certificate.
- Start up Minio server with ./minio server [data folder] --certs-dir [certs directory].
- Install Dremio.
- In your client environment where Dremio is located, install the certificate into <JAVA_HOME>/jre/lib/security with the following command:
  <JAVA_HOME>/keytool -import -v -trustcacerts -alias alias -file cert-file -keystore cacerts -keypass changeit -storepass changeit
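The certificate step in the list above could be sketched as follows. This assumes OpenSSL 1.1.1 or newer (for the -addext option) and a MinIO server that is reached by IP address. make_minio_cert is a hypothetical helper name, and the 2048-bit key and one-year validity are arbitrary choices. The file names private.key and public.crt are the ones MinIO looks for in its certs directory:

```shell
#!/bin/sh
# Hypothetical helper: create a self-signed certificate for a MinIO server
# that is reached by IP address.
# Usage: make_minio_cert CERTS_DIR SERVER_IP
make_minio_cert() {
    certs_dir=$1
    server_ip=$2
    mkdir -p "$certs_dir" || return 1
    # MinIO reads private.key and public.crt from its certs directory.
    # The subjectAltName entry lets Java clients verify the certificate
    # when they connect by IP address.
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout "$certs_dir/private.key" \
        -out "$certs_dir/public.crt" \
        -days 365 \
        -subj "/CN=$server_ip" \
        -addext "subjectAltName=IP:$server_ip"
}
```

For example, make_minio_cert ./certs 123.1.2.3 writes both files to ./certs. That directory can then be passed to --certs-dir, and public.crt is the cert-file to import with keytool.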
Want some sample CSV files to get started? This is a good resource: