DataLakehouse.help

Building a Lakehouse on Your Laptop

These directions show you how to set up a data lakehouse on your laptop for use in the other exercises on this website.

Pre-Reqs: Docker & Docker-Compose installed

Step 1 - The docker-compose.yml

The docker-compose.yml defines all the pieces of your lakehouse, which include:

  • Nessie: Catalog with Git-Like functionality for Apache Iceberg tables

  • Minio: S3-compatible object storage software that acts as our data lake storage.

  • Dremio: Data Lakehouse platform to provide an easy to use and fast point of access for the Apache Iceberg tables stored on Nessie/Minio and other sources we connect.

  • Superset: Open source BI Dashboard tool

docker-compose.yml

version: "3"

services:
  # Nessie Catalog Server Using In-Memory Store
  nessie:
    image: projectnessie/nessie:latest
    container_name: nessie
    networks:
      laptop-lakehouse:
    ports:
      - 19120:19120
  # Minio Storage Server
  minio:
    image: minio/minio:latest
    container_name: minio
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_DOMAIN=storage
      - MINIO_REGION_NAME=us-east-1
      - MINIO_REGION=us-east-1
    networks:
      laptop-lakehouse:
    ports:
      - 9001:9001
      - 9000:9000
    command: ["server", "/data", "--console-address", ":9001"]
  # Dremio
  dremio:
    platform: linux/x86_64
    image: dremio/dremio-oss:latest
    ports:
      - 9047:9047
      - 31010:31010
      - 32010:32010
    container_name: dremio
    networks:
      laptop-lakehouse:
  # Superset
  superset:
    platform: linux/x86_64
    image: alexmerced/dremio-superset
    ports:
      - 8080:8088
    container_name: superset
    networks:
      laptop-lakehouse:
networks:
  laptop-lakehouse:

* If you want the files Minio writes to be stored on your host file system, add a volume entry to the minio service:

```yaml
  minio:
    image: minio/minio:latest
    volumes:
      - /path/to/your/host/directory:/data
```

Open a terminal in the same folder as this docker-compose.yml file and run the command:

# latest versions of docker-desktop
docker compose up

# older versions
docker-compose up

This will create all the containers specified in our docker-compose.yml. If you ever need to shut them down, open another terminal in the same folder and run:

docker compose down
## or
docker-compose down
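
Before moving on to Step 2, it can help to confirm that every service is actually listening. The following is a minimal sketch, not part of the original setup: it assumes the host ports published in the docker-compose.yml above, and the names `SERVICES`, `port_open`, and `wait_for_services` are made up for illustration. It only checks that a TCP connection succeeds, not that the application behind the port has finished starting.

```python
import socket
import time

# Host ports published by the docker-compose.yml above (assumed unchanged).
SERVICES = {
    "nessie": 19120,
    "minio-api": 9000,
    "minio-console": 9001,
    "dremio-ui": 9047,
    "dremio-flight": 32010,
    "superset": 8080,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_services(services: dict, host: str = "localhost",
                      retries: int = 30, delay: float = 2.0) -> dict:
    """Poll each port until it accepts connections or the retries run out."""
    status = {name: False for name in services}
    for _ in range(retries):
        for name, port in services.items():
            if not status[name]:
                status[name] = port_open(host, port)
        if all(status.values()):
            break
        time.sleep(delay)
    return status
```

Calling `wait_for_services(SERVICES)` from a Python shell returns a dictionary mapping each service name to `True` or `False`. Dremio in particular can take a while to start, so a `False` on the first attempt is not necessarily a problem.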

Step 2 - Setting up our storage bucket

  • Open up an internet browser

  • Visit minio at http://localhost:9001

  • Log in with the username admin and the password password (these were specified in the docker-compose.yml)

  • Create a bucket, let’s call it warehouse
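
If you would rather script the bucket creation than click through the console, something like the sketch below should work. It is an assumption-laden example rather than part of the original directions: it relies on the third-party `boto3` package (installed separately), talks to Minio's S3 API on the published port 9000, and uses the credentials from the docker-compose.yml. The helper names `minio_client_kwargs` and `ensure_bucket` are hypothetical.

```python
def minio_client_kwargs(endpoint: str = "http://localhost:9000",
                        access_key: str = "admin",
                        secret_key: str = "password",
                        region: str = "us-east-1") -> dict:
    """Build keyword arguments for boto3.client() pointing at local Minio."""
    return {
        "service_name": "s3",
        "endpoint_url": endpoint,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": region,
    }


def ensure_bucket(name: str = "warehouse", **overrides) -> bool:
    """Create the bucket if it is missing; return True if it was created."""
    # Imported lazily so the helper above works without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client(**minio_client_kwargs(**overrides))
    try:
        s3.head_bucket(Bucket=name)
        return False  # bucket already exists
    except ClientError as err:
        # Only a "not found" answer means we should create the bucket.
        if err.response["Error"]["Code"] not in ("404", "NoSuchBucket"):
            raise
        s3.create_bucket(Bucket=name)
        return True
```

Running `ensure_bucket("warehouse")` from the host machine should leave you in the same state as the manual steps above.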

Step 3 - Connect Dremio to Data Lake

  • Open up a new internet browser tab

  • Visit Dremio at http://localhost:9047

  • Fill out the form to create your account

  • Then on the dashboard choose to connect a new source

  • Select Nessie as your new source

There are two sections we need to fill out, the general and storage sections:

General (Connecting to Nessie Server)

Storage Settings

(So Dremio can read and write data files for Iceberg tables)
  • For your access key, set “admin”
  • For your secret key, set “password”
  • Set root path to “/warehouse”
  • Set the following connection properties:
    • fs.s3a.path.style.access to true
    • fs.s3a.endpoint to minio:9000
    • dremio.s3.compat to true
  • Uncheck “encrypt connection” (since our local Minio instance is served over plain http)
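
The storage settings above can be written down as a small structure for reference. This is only a sketch: the dictionary keys are hypothetical labels for the form fields, not a Dremio API, and `endpoint_warnings` is an illustrative helper. The point it makes is that the endpoint must be `minio:9000`, the container name on the Docker network, because `localhost` inside the Dremio container refers to the Dremio container itself.

```python
def storage_settings(bucket: str = "warehouse",
                     endpoint: str = "minio:9000",
                     access_key: str = "admin",
                     secret_key: str = "password") -> dict:
    """Collect the values entered on the storage tab of the Nessie source."""
    return {
        "access_key": access_key,
        "secret_key": secret_key,
        "root_path": f"/{bucket}",
        "encrypt_connection": False,  # local setup runs over plain http
        "connection_properties": {
            "fs.s3a.path.style.access": "true",
            "fs.s3a.endpoint": endpoint,
            "dremio.s3.compat": "true",
        },
    }


def endpoint_warnings(endpoint: str) -> list:
    """Flag an endpoint that Dremio cannot reach from inside its container."""
    warnings = []
    # Drop an optional scheme and the port to get the bare host name.
    host = endpoint.split("://")[-1].rsplit(":", 1)[0]
    if host in ("localhost", "127.0.0.1"):
        warnings.append(
            "localhost points at the Dremio container itself; "
            "use the Minio container name instead"
        )
    return warnings
```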

Step 4 - Connecting Dremio to Superset

  • Turn on Superset with the command:
docker-compose exec superset superset init
  • Log in at http://localhost:8080/login with username admin and password admin
  • Use the following URL to establish the connection: dremio+flight://<dremio-username>:<dremio-password>@dremio:32010/?UseEncryption=false (if you are not using the docker-compose file above, replace dremio with the IP address of the machine running Dremio)
  • If you need to look up the IP address of a Docker container, use docker network ls to list your Docker networks, then docker network inspect <name_or_id> to see the details of the containers on that network
  • Click test connection
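
One thing that can trip up the connection URL is a username or password containing characters such as `@`, `/`, or `:`, which need to be percent-encoded before they go into the URL. The helper below is a minimal sketch of building the URL shown above; the function name `dremio_flight_uri` is made up for this example, and the defaults assume the docker-compose setup from Step 1.

```python
from urllib.parse import quote


def dremio_flight_uri(username: str, password: str,
                      host: str = "dremio", port: int = 32010,
                      use_encryption: bool = False) -> str:
    """Build the dremio+flight connection URL with encoded credentials."""
    user = quote(username, safe="")  # encode every reserved character
    pwd = quote(password, safe="")
    flag = "true" if use_encryption else "false"
    return f"dremio+flight://{user}:{pwd}@{host}:{port}/?UseEncryption={flag}"
```

For example, `dremio_flight_uri("alex", "p@ss/word")` produces `dremio+flight://alex:p%40ss%2Fword@dremio:32010/?UseEncryption=false`, which can be pasted into the Superset connection form.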

Testing it Out

  • Head to the SQL Runner on Dremio

  • Run the following SQL statements

CREATE TABLE nessie.names (name varchar);
INSERT INTO nessie.names VALUES ('Gnarly the Narwhal');
SELECT * FROM nessie.names;
  • Go explore your storage on Minio; you should see all the Apache Iceberg data and metadata stored in your warehouse bucket.
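
To inspect the bucket from a script instead of the Minio console, a sketch like the following lists the objects and groups them by the kind of Iceberg file they appear to be. It assumes the third-party `boto3` package, the credentials from the docker-compose.yml, and that data files are written as Parquet; the function names are hypothetical. In Iceberg, table metadata files end in `.metadata.json` and manifests and manifest lists are Avro files.

```python
from collections import Counter


def classify_key(key: str) -> str:
    """Guess what kind of Iceberg file an object key refers to."""
    if key.endswith(".parquet"):
        return "data file"
    if key.endswith(".metadata.json"):
        return "table metadata"
    if key.endswith(".avro"):
        return "manifest or manifest list"
    return "other"


def summarize_keys(keys) -> Counter:
    """Count object keys by file kind."""
    return Counter(classify_key(k) for k in keys)


def list_bucket_keys(bucket: str = "warehouse",
                     endpoint: str = "http://localhost:9000",
                     access_key: str = "admin",
                     secret_key: str = "password") -> list:
    """Return every object key in the bucket (requires boto3 and Minio)."""
    import boto3  # imported lazily; third-party dependency

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name="us-east-1",
    )
    keys = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

With the stack running, `summarize_keys(list_bucket_keys())` gives a quick count of data and metadata files after the `INSERT` above.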

Things to Be Aware Of