Demo: Using GenAI to create Source-to-Target documentation for SQL, Python, DAX, and M code
- Purpose
- Data Model
- Prerequisites
- Quick Start
- Project Structure
- Setup Instructions
- Azure Data Studio Setup
- Running Scripts
- Power BI Report
- Troubleshooting
- License
- Contributing
The main purpose of this simple data model is to serve as a backdrop for demonstrating how to build Source-to-Target (S2T) documentation with GenAI for:
- Python code
- SQL code
- M (Power Query) code
- DAX code
Important: Because this is a demo, some data-flow and development rules were bent. Please focus on the use case rather than the data model and the Power BI report; they are only background.
This repo may be useful for you if:
- You need a quick way to prototype with Power BI
- You need a SQL database without a server installation
- You need to ingest various types of data (CSV, PDF, etc.)
- Everything needs to be portable (no database server required)
Solution: DuckDB, a lightweight, in-process SQL database that runs inside your Python process. There is no server to install or run, and the whole database lives in a single file.
For the demo model I used transformed data from:
https://github.com/owid/energy-data
https://ourworldindata.org/energy
| Tool | Required | Download |
|---|---|---|
| Python 3.11+ | Yes | python.org |
| Power BI Desktop | Yes | Microsoft Store |
| Azure Data Studio | Optional | Microsoft |
| VS Code | Optional | code.visualstudio.com |
| Git | Optional | git-scm.com |
To recreate the prototype setup, follow the steps below:
git clone https://github.com/datameisterpl/genai-s2t-energy-demo-public.git genai-s2t-energy-demo
cd genai-s2t-energy-demo
pip install duckdb pandas matplotlib
python scripts/python/00_setup_database.py
python scripts/python/01_ingest_iso_to_region.py
python scripts/python/01_ingest_GDP.py
python scripts/python/01_ingest_population.py
python scripts/python/01_ingest_energy_data.py
- pip install jupyterlab jupysql sqlalchemy duckdb duckdb-engine pandas
- Ensure the Jupyter extension is enabled in VS Code.
- Run:
%load_ext sql
%sql duckdb:///../data/dev_worldtrend.duckdb
- Now start a code cell with %%sql to run SQL against the database directly from VS Code.
- See notebooks/sql_data_analytics.ipynb for examples.
- Create a new notebook, select the Python 3 kernel, and connect to DuckDB (see details below).
- Microsoft is retiring Azure Data Studio, so consider Option 1 instead.
- Open powerbi/Energy Report.pbip in Power BI Desktop.
- Data is loaded through Python script connections.
- In every table, replace <REPO_PATH> with the location of your repo.
- Update config.py to define your OUTPUT_PATH.
- Run sql_draft_scripts/print_all_tables.py.
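How the export scripts actually read OUTPUT_PATH isn't shown here. One plausible pattern, sketched with the standard library only, imports config.py if it exists and falls back to a default folder otherwise. The `get_output_path` name and the `output` fallback folder are assumptions, not part of the repo.

```python
import importlib
from pathlib import Path

DEFAULT_OUTPUT = Path("output")  # hypothetical fallback when config.py is absent


def get_output_path(module_name: str = "config") -> Path:
    """Return OUTPUT_PATH from config.py, falling back to DEFAULT_OUTPUT."""
    try:
        config = importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        # Fall back only when config.py itself is missing, not when
        # config.py imports some other module that is missing.
        if err.name != module_name:
            raise
        return DEFAULT_OUTPUT
    return Path(getattr(config, "OUTPUT_PATH", DEFAULT_OUTPUT))
```

A script would then call `out_dir = get_output_path()` followed by `out_dir.mkdir(parents=True, exist_ok=True)` before writing files. Keeping config.py gitignored means each user's path never leaks into the repo.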
genai-s2t-energy-demo/
│
├── README.md                        # This file
├── config_template.py               # Template for output path config
├── config.py                        # Your personal config (gitignored)
├── .gitignore                       # Git ignore rules
│
├── data/
│   ├── raw/                         # Source CSV files
│   │   ├── energy_data.csv
│   │   ├── GDP.csv
│   │   ├── iso_to_region.csv
│   │   └── population.csv
│   └── dev_worldtrend.duckdb        # DuckDB database file
│
├── images/                          # Documentation images
│   └── data_model.jpg
│
├── notebooks/                       # Jupyter notebooks
│   └── sql_data_analytics.ipynb     # SQL analytics in notebook
│
├── scripts/
│   ├── python/                      # Python ingestion scripts
│   │   ├── DEV_setup_database.py
│   │   ├── 00_explore_*.py          # Data exploration scripts
│   │   ├── 01_ingest_*.py           # Data ingestion scripts
│   │   └── pbi_ingestion/           # Power BI data source scripts
│   │       ├── 01_silver_countries_and_regions.py
│   │       └── 02_silver_countries_all_data.py
│   └── sql/                         # SQL transformation scripts
│       ├── 01_silver_countries_and_regions.sql
│       └── 02_silver_countries_all_data.sql
│
├── powerbi/                         # Power BI Project
│   ├── Energy Report.pbip
│   ├── Energy Report.Report/        # Report visuals & pages
│   │   └── definition/pages/
│   │       └── visuals/
│   │           └── *.json           # Visual configurations
│   └── Energy Report.SemanticModel/
│       └── definition/tables/       # Data model tables (TMDL)
│           ├── Calendar.tmdl
│           ├── DIM_Countries_and_Regions.tmdl
│           ├── FCT__Countries_Gold_Data.tmdl
│           ├── bridge_year.tmdl
│           └── _Measures.tmdl
│
└── sql_draft_scripts/               # Working SQL scripts
    ├── copy_tmdl_files.py
    ├── print_all_tables.py
    ├── print_sample.py
    └── tables_display.py
After cloning, find and replace the placeholder <REPO_PATH> with your local path in these files:
- SQL files: scripts/sql/*.sql (connection strings)
- TMDL files: powerbi/Energy Report.SemanticModel/definition/tables/*.tmdl (Python script paths)
Example:
Find: <REPO_PATH>
Replace: C:\\Users\\YourName\\genai-s2t-energy-demo
Tip: In VS Code, use Ctrl+Shift+H to find and replace across all files.
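If you prefer to script the replacement, here is a minimal pathlib sketch. It assumes the two file patterns listed above are the only places the placeholder appears, and it rewrites files in place, so commit or back up first. The `replace_placeholder` name is hypothetical.

```python
from pathlib import Path

PLACEHOLDER = "<REPO_PATH>"
# File patterns from the list above, relative to the repository root.
PATTERNS = (
    "scripts/sql/*.sql",
    "powerbi/Energy Report.SemanticModel/definition/tables/*.tmdl",
)


def replace_placeholder(repo_root: Path, local_path: str, dry_run: bool = False) -> dict[str, int]:
    """Replace PLACEHOLDER with local_path in matching files.

    Returns {relative file path: number of replacements}. With dry_run=True
    nothing is written, which is useful for previewing the change.
    """
    changed: dict[str, int] = {}
    for pattern in PATTERNS:
        for file in sorted(repo_root.glob(pattern)):
            # newline="" keeps the file's original line endings (LF or CRLF) intact.
            with file.open(encoding="utf-8", newline="") as f:
                text = f.read()
            count = text.count(PLACEHOLDER)
            if count == 0:
                continue
            changed[file.relative_to(repo_root).as_posix()] = count
            if not dry_run:
                with file.open("w", encoding="utf-8", newline="") as f:
                    f.write(text.replace(PLACEHOLDER, local_path))
    return changed
```

Run it once with `dry_run=True` from the repo root to see which files would change, then again without it. Pass the path in the same form as the Replace example above, since whether backslashes need doubling depends on how each file uses the path.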
If you want to export sample data to a custom folder:
- Copy config_template.py to config.py.
- Edit config.py:
OUTPUT_PATH = r'C:\Your\Custom\Output\Folder'
Azure Data Studio can connect to DuckDB using Python notebooks.
File → New Notebook → Kernel: Python
Copy this code into the first cell and run it:
# === SETUP: Run this cell first each session ===
import duckdb
import pandas as pd
from IPython.display import display, HTML

# Display settings
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

# Connection (read + write)
# UPDATE THIS PATH to your repo location
DB_PATH = r'<REPO_PATH>\data\DEV_WorldTrend.duckdb'
con = duckdb.connect(DB_PATH)

# Helper function for nice SQL display
def sql(query: str):
    result = con.execute(query).df()
    html = f"""
    <div style="overflow-x: auto; width: 100%;">
    {result.to_html(index=False)}
    </div>
    <p><b>{len(result)} rows</b></p>
    """
    display(HTML(html))
    return result

print("Connected to DEV_WorldTrend.duckdb")
Now you can run SQL queries in new cells:
sql("SHOW ALL TABLES")
sql("SELECT * FROM bronze.iso_to_region LIMIT 10")
sql(
"""
SELECT country, year, coal_consumption
FROM bronze.energy_data
WHERE iso_code = 'DEU'
ORDER BY year DESC
LIMIT 20
"""
)
When you're done, close the connection:
con.close()
print("Connection closed")
Important: Close the connection before committing to Git or refreshing Power BI; otherwise you'll get database-locked errors.
All scripts should be run from the repository root folder:
cd genai-s2t-energy-demo
# Correct
python scripts/python/01_ingest_population.py
# Wrong (don't run from inside the scripts folder)
cd scripts/python
python 01_ingest_population.py
Navigate to powerbi/Energy Report.pbip and open it in Power BI Desktop.
- Close any DuckDB connections (notebooks, scripts).
- In Power BI: Home → Refresh.
If Power BI shows connection errors:
- Check that <REPO_PATH> is replaced in the TMDL files.
- Ensure the DuckDB database exists: data/DEV_WorldTrend.duckdb.
- Verify the Python home directory in Power BI: File → Options and settings → Options → Python scripting.
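The last two checks can also be run from Python. The sketch below is a hypothetical preflight script: run it from the repo root with the interpreter Power BI is configured to use, and it reports a missing database file, a too-old Python, or missing packages. The module list mirrors the packages this README installs; the `preflight` name is an assumption.

```python
import importlib.util
import sys
from pathlib import Path

REQUIRED_MODULES = ("duckdb", "pandas", "matplotlib")
DB_RELATIVE_PATH = Path("data") / "DEV_WorldTrend.duckdb"


def preflight(repo_root: Path, modules=REQUIRED_MODULES, min_python=(3, 11)) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {sys.version.split()[0]}")
    db_file = repo_root / DB_RELATIVE_PATH
    if not db_file.is_file():
        problems.append(f"Database not found: {db_file}")
    for name in modules:
        # find_spec returns None when this interpreter cannot import the module.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Missing module '{name}' (pip install {name})")
    return problems


if __name__ == "__main__":
    # Compare this with Power BI: File → Options and settings → Options → Python scripting.
    print("Interpreter:", sys.executable)
    for problem in preflight(Path.cwd()) or ["All checks passed"]:
        print(problem)
```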
Cause: Another process has the DuckDB file open.
Fix:
- Close Azure Data Studio notebooks
- Close any Python scripts
- Restart Power BI Desktop
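Cause: The duckdb package is not installed in the Python environment you are using.
Fix:
pip install duckdb
Cause: matplotlib is not installed.
Fix:
pip install matplotlib
Power BI requires matplotlib even if your script doesn't use it.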
Cause: Running script from the wrong folder.
Fix: Always run from repo root:
cd genai-s2t-energy-demo
python scripts/python/your_script.py
Cause: The <REPO_PATH> placeholder has not been replaced.
Fix: Search and replace <REPO_PATH> in all files:
- VS Code: Ctrl+Shift+H
- Find: <REPO_PATH>
- Replace: C:\\Your\\Actual\\Path\\genai-s2t-energy-demo
MIT License. See the LICENSE file.
This is a demo repository. Feel free to fork and adapt for your own use cases.
