jacobprall/airflow-demo

Airflow ETL Demo -- Cortex Code for Data

A self-contained Airflow environment for demonstrating how Cortex Code builds a complete ETL pipeline from a single prompt -- including wiring up all infrastructure connections.

What This Demo Shows

The demo ships an Airflow environment with sample CSV sales data and a local Postgres source database. Cortex Code is given the prompt:

"Build me a nightly ETL pipeline that moves sales data from our operational Postgres database into Snowflake for analytics."

The agent then:

  1. Explores the environment (Airflow running, CSV data, Postgres source)
  2. Creates Snowflake warehouse tables (using its built-in Snowflake connection)
  3. Configures the Airflow-to-Snowflake connection via airflow connections add
  4. Creates a 5-task DAG: ingest, extract, clean, aggregate, load
  5. Triggers the pipeline and monitors it to green
  6. Verifies data landed in the Snowflake warehouse

Zero config -- the user never touches a credentials file. Cortex Code handles everything.
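The clean and aggregate stages above can be sketched outside Airflow as plain pandas transforms. This is a minimal illustration on in-memory data; the column names (`order_date`, `region`, `amount`) are assumptions for the sketch, not the demo's actual schema:

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Drop rows with missing amounts and normalize the date column.
    df = df.dropna(subset=["amount"]).copy()
    df["order_date"] = pd.to_datetime(df["order_date"]).dt.date
    return df

def aggregate(df: pd.DataFrame) -> pd.DataFrame:
    # Roll raw orders up to daily totals per region.
    return (
        df.groupby(["order_date", "region"], as_index=False)
          .agg(total_amount=("amount", "sum"), orders=("amount", "count"))
    )

# Stand-in for the extract step; in the demo this is a SELECT from Postgres.
raw = pd.DataFrame({
    "order_date": ["2025-01-01", "2025-01-01", "2025-01-02"],
    "region": ["east", "east", "west"],
    "amount": [100.0, 50.0, None],
})

daily = aggregate(clean(raw))
print(daily)
```

In the real pipeline these functions run as separate Airflow tasks, with the extract and load steps talking to Postgres and Snowflake respectively.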

Architecture

CSV Sales Data
     |
     v
[Postgres]                  operational / transactional (local container)
     |
     | Airflow extracts
     v
[Clean + Aggregate]         pandas in Airflow
     |
     v
[Snowflake Warehouse]       analytical / BI (connection configured by Cortex Code)

The source database is a local Postgres container. This could be any database -- the pipeline pattern is the same.

Prerequisites

  • Docker and Docker Compose
  • Cortex Code CLI with an active Snowflake connection

Quick Start

```shell
# 1. Enter the directory
cd airflow-demo

# 2. Copy the env file (source Postgres is pre-configured, no Snowflake creds needed)
cp .env.example .env

# 3. Start everything
docker compose up -d --build

# 4. Wait ~30-60 seconds, then open the Airflow UI
open http://localhost:8080
# Login: airflow / airflow
```

That's it. Cortex Code handles the Snowflake setup during the demo.

Project Structure

```
airflow-demo/
  .env.example                  # Airflow + source Postgres config (pre-filled)
  docker-compose.yml            # Airflow + metadata Postgres + source Postgres
  Dockerfile                    # Custom Airflow image with Python deps
  requirements.txt              # pandas, psycopg2, snowflake-connector
  dags/
    etl_sales_pipeline.py       # The 5-task pipeline Cortex Code builds
  data/
    sales_2025.csv              # ~1000 rows of sample sales data
  sql/
    create_source_tables.sql    # Source Postgres DDL (auto-runs on container start)
    create_warehouse_tables.sql # Snowflake DDL (Cortex Code runs this directly)
  scripts/
    generate_data.py            # Regenerate sample CSV data
    seed_snowflake.py           # Utility to create tables (optional, Cortex Code does this)
  demo/
    DEMO_SCRIPT.md              # Video recording guide
    prompt.txt                  # The exact prompt for Cortex Code
```
Services

| Container         | Purpose                     | Port           |
| ----------------- | --------------------------- | -------------- |
| postgres          | Airflow metadata DB         | (internal)     |
| source-db         | Operational source database | localhost:5433 |
| airflow-webserver | Airflow UI                  | localhost:8080 |
| airflow-scheduler | DAG scheduling              | (internal)     |

How the Snowflake Connection Works

Cortex Code configures the Airflow connection during the demo:

```shell
docker compose exec airflow-webserver airflow connections add snowflake_default \
    --conn-type snowflake \
    --conn-login <user> \
    --conn-password <password> \
    --conn-extra '{"account": "<account>", "warehouse": "<warehouse>", "database": "<database>", "schema": "<schema>", "role": "<role>"}'
```

The DAG reads from this connection at runtime via `BaseHook.get_connection("snowflake_default")`.
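The `--conn-extra` payload is plain JSON that Airflow stores alongside the login fields. A minimal sketch of how the DAG can assemble Snowflake connector parameters from it (placeholder values stand in for real credentials, and the merge into a single kwargs dict is illustrative, not the demo's exact code):

```python
import json

# Stand-in for the --conn-extra blob Cortex Code passes (placeholder values).
conn_extra = (
    '{"account": "my_account", "warehouse": "my_wh", '
    '"database": "my_db", "schema": "public", "role": "my_role"}'
)

# Merge login fields with the parsed extras, roughly what the DAG needs
# before calling snowflake.connector.connect(**params).
params = {"user": "my_user", "password": "my_password", **json.loads(conn_extra)}
print(sorted(params))
```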

Regenerating Sample Data

```shell
python scripts/generate_data.py
```
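The bundled script isn't reproduced here; a minimal sketch of what a generator like `scripts/generate_data.py` might do (the column names, value ranges, and row count are assumptions for illustration):

```python
import csv
import random

def generate(path: str, rows: int = 1000, seed: int = 42) -> None:
    """Write `rows` synthetic sales records to a CSV file."""
    rng = random.Random(seed)  # seeded for reproducible output
    products = ["widget", "gadget", "gizmo"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "order_date", "product", "quantity", "unit_price"])
        for i in range(rows):
            writer.writerow([
                i + 1,
                f"2025-01-{rng.randint(1, 28):02d}",
                rng.choice(products),
                rng.randint(1, 10),
                round(rng.uniform(5.0, 100.0), 2),
            ])

generate("sales_2025.csv")
```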
