Skip to content

datameisterpl/genai-s2t-energy-demo-public

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

GenAI S2T Energy Demo

Demo: Using GenAI to create Source-to-Target documentation for SQL, Python, DAX, and M code

⚠️ Please note: This repo is tailored for Business Intelligence Analysts who work with Power BI and SQL, and less frequently with Python and Git.


πŸ“‹ Table of Contents

  1. Purpose
  2. Data Model
  3. Prerequisites
  4. Quick Start
  5. Project Structure
  6. Setup Instructions
  7. Azure Data Studio Setup
  8. Running Scripts
  9. Power BI Report
  10. Troubleshooting
  11. License
  12. Contributing

🎯 Purpose

Purpose 1: GenAI Documentation Demo

The main purpose of this simple data model is to serve as props and scene for demonstrating how to build Source-to-Target (S2T) documentation using GenAI for:

  • 🐍 Python code
  • πŸ—ƒοΈ SQL code
  • Ⓜ️ M (Power Query) code
  • πŸ“Š DAX code

⚠️ Important: Due to this demo purpose, some data flow and development rules were bent. Please focus on the use case rather than the data model and Power BI reportβ€”they are background only.

Purpose 2: Portable BI Prototyping

This repo may be useful for you if:

  • βœ… You need a quick way to prototype with Power BI
  • βœ… You need a SQL database without server installation
  • βœ… You need to ingest various types of data (CSV, pdf, etc.)
  • βœ… Everything needs to be portable (no database server required)

Solution: DuckDB β€” a lightweight, in-process SQL database that runs entirely in Python. No installation, no server, just a single file.


πŸ“Š Data Model

Data Model


πŸ’Ÿ Credits

For demo model I have used transformed data from:

https://github.com/owid/energy-data

https://ourworldindata.org/energy


πŸ“¦ Prerequisites

Tool Required Download
Python 3.11+ βœ… Yes python.org
Power BI Desktop βœ… Yes Microsoft Store
Azure Data Studio Optional Microsoft
VS Code Optional code.visualstudio.com
Git Optional git-scm.com

πŸš€ Quick Start

If you want to recreate the proces of setting up prototype please follow below steps:

1. Clone repo

git clone https://github.com/datameisterpl/genai-s2t-energy-demo-public.git
cd genai-s2t-energy-demo

2. Install Python packages

pip install duckdb pandas matplotlib

3. Setup database

python scripts/python/00_setup_database.py

4. Run all ingestion scripts

python scripts/python/01_ingest_iso_to_region.py
python scripts/python/01_ingest_GDP.py
python scripts/python/01_ingest_population.py
python scripts/python/01_ingest_energy_data.py

5. SQL Transformation

A) Use Jupyter Notebook

  • pip install jupyterlab jupysql sqlalchemy duckdb duckdb-engine pandas
  • ensure you have Jupyter extension enable in Visual Studio
  • run
%load_ext sql
%sql duckdb:///../data/dev_worldtrend.duckdb
  • now use %%sql at the begining of new code cell and you enjoy working with SQL in Database from Visual Studio
  • see notebooks\sql_data_analytics.ipynb for examples

B) Use Azure Data Studio

  • Create new Notebook, use Python3 Kernel, connect to DuckBD (see details below)
  • As the tool will be retired by Microsoft soon consider option 1

6. Open Power BI report

  • Navigate to: powerbi/Energy Report.pbip to open Power BI (from Power BI Desktop)
  • Data are loaded via Python connection
  • Ensure you update <REPO_PATH> on every table to location of your repo

7. (Optional) Export Samples for GenAI Prompt

  • update config.py file to define your OUTPUT_PATH
  • run sql_draft_scripts\print_all_tables.py

πŸ—‚οΈ Project Structure

genai-s2t-energy-demo/
β”‚
β”œβ”€β”€ πŸ“„ README.md                    # This file
β”œβ”€β”€ πŸ“„ config_template.py           # Template for output path config
β”œβ”€β”€ πŸ“„ config.py                    # Your personal config (gitignored)
β”œβ”€β”€ πŸ“„ .gitignore                   # Git ignore rules
β”‚
β”œβ”€β”€ πŸ“ data/
β”‚   β”œβ”€β”€ πŸ“ raw/                     # Source CSV files
β”‚   β”‚   β”œβ”€β”€ energy_data.csv
β”‚   β”‚   β”œβ”€β”€ GDP.csv
β”‚   β”‚   β”œβ”€β”€ iso_to_region.csv
β”‚   β”‚   └── population.csv
β”‚   └── πŸ“„ dev_worldtrend.duckdb    # DuckDB database file
β”‚
β”œβ”€β”€ πŸ“ images/                      # Documentation images
β”‚   └── data_model.jpg
β”‚
β”œβ”€β”€ πŸ“ notebooks/                   # Jupyter notebooks
β”‚   └── sql_data_analytics.ipynb    # SQL analytics in notebook
β”‚
β”œβ”€β”€ πŸ“ scripts/
β”‚   β”œβ”€β”€ πŸ“ python/                  # Python ingestion scripts
β”‚   β”‚   β”œβ”€β”€ DEV_setup_database.py
β”‚   β”‚   β”œβ”€β”€ 00_explore_*.py         # Data exploration scripts
β”‚   β”‚   β”œβ”€β”€ 01_ingest_*.py          # Data ingestion scripts
β”‚   β”‚   └── πŸ“ pbi_ingestion/       # Power BI data source scripts
β”‚   β”‚       β”œβ”€β”€ 01_silver_countries_and_regions.py
β”‚   β”‚       └── 02_silver_countries_all_data.py
β”‚   └── πŸ“ sql/                     # SQL transformation scripts
β”‚       β”œβ”€β”€ 01_silver_countries_and_regions.sql
β”‚       └── 02_silver_countries_all_data.sql
β”‚
β”œβ”€β”€ πŸ“ powerbi/                     # Power BI Project
β”‚   β”œβ”€β”€ Energy Report.pbip
β”‚   β”œβ”€β”€ πŸ“ Energy Report.Report/    # Report visuals & pages
β”‚   β”‚   └── πŸ“ definition/pages/
β”‚   β”‚       └── πŸ“ visuals/
β”‚   β”‚           └── *.json          # Visual configurations
β”‚   └── πŸ“ Energy Report.SemanticModel/
β”‚       └── πŸ“ definition/tables/   # Data model tables (TMDL)
β”‚           β”œβ”€β”€ Calendar.tmdl
β”‚           β”œβ”€β”€ DIM_Countries_and_Regions.tmdl
β”‚           β”œβ”€β”€ FCT__Countries_Gold_Data.tmdl
β”‚           β”œβ”€β”€ bridge_year.tmdl
β”‚           └── _Measures.tmdl
β”‚
└── πŸ“ sql_draft_scripts/           # Working SQL scripts
    β”œβ”€β”€ copy_tmdl_files.py
    β”œβ”€β”€ print_all_tables.py
    β”œβ”€β”€ print_sample.py
    └── tables_display.py

πŸ› οΈ Setup Instructions


Update Path Placeholders

After cloning, find and replace the placeholder <REPO_PATH> with your local path in these files:

SQL files β€” scripts/sql/*.sql β†’ connection strings
TMDL files β€” powerbi/Energy Report.SemanticModel/definition/tables/*.tmdl β†’ Python script paths

Example:

Find: <REPO_PATH>
Replace: C:\\Users\\YourName\\genai-s2t-energy-demo

πŸ’‘ Tip: In VS Code, use Ctrl+Shift+H to find and replace across all files.


(Optional) Configure Output Path

If you want to export sample data to a custom folder:

  1. Copy config_template.py β†’ config.py
  2. Edit config.py:
OUTPUT_PATH = r'C:\\Your\\Custom\\Output\\Folder'

πŸͺŸ Azure Data Studio Setup

Azure Data Studio can connect to DuckDB using Python notebooks.

Step 1: Create New Notebook

File β†’ New Notebook β†’ Kernel: Python

Step 2: Run Setup Cell

Copy this code into the first cell and run it:

# === SETUP: Run this cell first each session ===
import duckdb
import pandas as pd
from IPython.display import display, HTML

# Display settings
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

# Connection (read + write)
# ⚠️ UPDATE THIS PATH to your repo location
DB_PATH = r'<REPO_PATH>\\data\\DEV_WorldTrend.duckdb'
con = duckdb.connect(DB_PATH)

# Helper function for nice SQL display
def sql(query: str):
    result = con.execute(query).df()
    html = f"""
    <div style="overflow-x: auto; width: 100%;">
        {result.to_html(index=False)}
    </div>
    <p><b>{len(result)} rows</b></p>
    """
    display(HTML(html))
    return result

print("βœ… Connected to DEV_WorldTrend.duckdb")

Step 3: Query Your Data

Now you can run SQL queries in new cells:

sql("SHOW ALL TABLES")

sql("SELECT * FROM bronze.iso_to_region LIMIT 10")

sql(
    """
    SELECT country, year, coal_consumption 
    FROM bronze.energy_data 
    WHERE iso_code = 'DEU'
    ORDER BY year DESC
    LIMIT 20
    """
)

Step 4: Close Connection (End of Session)

con.close()
print("πŸ”’ Connection closed")

Important: Close the connection before committing to Git or refreshing Power BI, otherwise you'll get database locked errors.


▢️ Running Scripts

All scripts should be run from the repository root folder:

cd genai-s2t-energy-demo

# βœ… Correct
python scripts/python/01_ingest_population.py

# ❌ Wrong (don't run from inside scripts folder)
cd scripts/python
python 01_ingest_population.py

πŸ“ˆ Power BI Report

Opening the Report

Navigate to powerbi/Energy Report.pbip and open it in Power BI Desktop.

Data Refresh

  1. Close any DuckDB connections (notebooks, scripts).
  2. In Power BI: Home β†’ Refresh.

Connection Issues?

If Power BI shows connection errors:

  • Check that <REPO_PATH> is replaced in TMDL files.
  • Ensure the DuckDB database exists: data/DEV_WorldTrend.duckdb.
  • Verify Python path in Power BI: File β†’ Options β†’ Python scripting.

🧰 Troubleshooting

"Database is locked" error

Cause: Another process has the DuckDB file open.
Fix:

  • Close Azure Data Studio notebooks
  • Close any Python scripts
  • Restart Power BI Desktop

"No module named 'duckdb'" error

Fix:

pip install duckdb

"No module named 'matplotlib'" error in Power BI

Fix:

pip install matplotlib

Power BI requires matplotlib even if your script doesn't use it.

Scripts fail with "file not found"

Cause: Running script from the wrong folder.
Fix: Always run from repo root:

cd genai-s2t-energy-demo
python scripts/python/your_script.py

Path issues after cloning

Fix: Search and replace <REPO_PATH> in all files:

  • VS Code: Ctrl+Shift+H
  • Find: <REPO_PATH>
  • Replace: C:\\Your\\Actual\\Path\\genai-s2t-energy-demo

πŸ“ License

MIT License β€” see LICENSE file.

🀝 Contributing

This is a demo repository. Feel free to fork and adapt for your own use cases.

About

"Scene and props" for generating Source To Target for Python SQL M and DAX using GenAI. Also small prototype setup with DuckDB

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors