Commit 85c17f6

Added 5 files (app.py, sales_data.csv, transformed_data.csv, runtime_log.txt, .github/workflows/etl_pipeline.yaml)
0 parents  commit 85c17f6

5 files changed: +116 -0 lines changed

.github/workflows/etl_pipeline.yaml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+name: ETL Pipeline Automation
+
+on:
+  push:
+    branches:
+      - main
+
+jobs:
+  etl_pipeline_job:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout Code
+        uses: actions/checkout@v4
+
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.13'
+
+      - name: Install Libraries or Dependencies
+        run: pip install streamlit pandas
+
+      - name: Run the python script
+        run: python app.py
+
+      - name: Push the changed files back to the GitHub repo
+        uses: mikeal/push-to-github-action@master
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          branch_name: 'main'

app.py

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+import streamlit as st
+import pandas as pd
+import datetime
+
+# Function to load data
+def load_data():
+    data = pd.read_csv('sales_data.csv')
+    return data
+
+# Function to transform data (adds the derived columns to df in place)
+def transform_data(df):
+    df['Date'] = pd.to_datetime(df['Date'])
+    df['Year'] = df['Date'].dt.year
+    df['Month'] = df['Date'].dt.month
+    df['Sales_to_Profit_Ratio'] = df['Sales'] / df['Profit']
+    df['Cumulative_Sales'] = df['Sales'].cumsum()
+    return df
+
+# Function to extract summary statistics
+def get_summary_statistics(df):
+    summary = df.describe()
+    return summary
+
+# Define the log file
+log_file = 'runtime_log.txt'
+
+def log_message(message):
+    with open(log_file, 'a') as f:
+        f.write(f'{datetime.datetime.now()} - {message}\n')
+
+# Main function to run the Streamlit app
+def main():
+    st.title("ETL Process for Sales Data")
+
+    # Step 1: Extract
+    st.header("Step 1: Extract")
+    data = load_data()
+    print(data)
+    print(type(data))  # data - type(<class 'pandas.core.frame.DataFrame'>)
+    st.subheader("Raw Data")
+    st.table(data)
+    df_summary_statistics = get_summary_statistics(data)
+    print(df_summary_statistics)
+
+    # Step 2: Transform
+    st.header("Step 2: Transform")
+    transformed_data = transform_data(data)
+    print(transformed_data)
+    print(type(transformed_data))  # transformed_data - type(<class 'pandas.core.frame.DataFrame'>)
+    st.subheader('Transformed Data')
+    st.table(transformed_data)
+    transformed_df_summary_statistics = get_summary_statistics(transformed_data)
+    print(transformed_df_summary_statistics)
+
+    # Step 3: Load
+    st.header("Step 3: Load data")
+    transformed_data.to_csv('transformed_data.csv', index=False)
+
+    # Generate runtime log
+    log_message('ETL job completed and data saved to transformed_data.csv')
+
+    # Show the contents of the log file in the Streamlit app
+    with open(log_file, 'r') as f:
+        contents = f.read()
+
+    st.header('Runtime Logs')
+    st.write(contents)
+
+main()
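
Since the workflow invokes `python app.py`, it may help to keep the extract, transform, and load steps callable without any Streamlit calls. The block below is a minimal sketch under that assumption, not the code from this commit: `run_etl` and its `src`/`dst` parameters are hypothetical names, and this variant of `transform_data` works on a copy so that the raw frame stays unchanged.

```python
import os
import tempfile

import pandas as pd


def transform_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return a transformed copy; the input frame is left untouched."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    out["Year"] = out["Date"].dt.year
    out["Month"] = out["Date"].dt.month
    out["Sales_to_Profit_Ratio"] = out["Sales"] / out["Profit"]
    out["Cumulative_Sales"] = out["Sales"].cumsum()
    return out


def run_etl(src: str, dst: str) -> pd.DataFrame:
    """Hypothetical UI-free entry point: read src, transform, write dst."""
    raw = pd.read_csv(src)
    transformed = transform_data(raw)
    transformed.to_csv(dst, index=False)
    return transformed


# Usage with the first two rows of sales_data.csv, written to a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sales_data.csv")
    dst = os.path.join(tmp, "transformed_data.csv")
    with open(src, "w") as f:
        f.write(
            "Date,Product,Sales,Profit\n"
            "2023-01-01,Product A,100,20\n"
            "2023-01-02,Product B,150,30\n"
        )
    result = run_etl(src, dst)
    print(result[["Sales_to_Profit_Ratio", "Cumulative_Sales"]])
```

With this split, the Streamlit `main()` would only need to call `run_etl` and render the returned frame, and a CI job could call the same function without going through the UI code.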

runtime_log.txt

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+2025-03-17 10:08:18.021671 - ETL job completed and data saved to transformed_data.csv
+2025-03-17 10:08:32.361232 - ETL job completed and data saved to transformed_data.csv
+2025-03-17 10:09:13.872924 - ETL job completed and data saved to transformed_data.csv
+2025-03-17 10:09:22.577862 - ETL job completed and data saved to transformed_data.csv
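
Each log line has the form `timestamp - message`, as written by `log_message` in app.py. As a rough sketch of how such lines could be read back, the block below parses the first two entries with the standard library. `parse_log_line` is a hypothetical helper, not part of the commit.

```python
from datetime import datetime

# The first two lines of runtime_log.txt, copied from the diff above.
log_text = """\
2025-03-17 10:08:18.021671 - ETL job completed and data saved to transformed_data.csv
2025-03-17 10:08:32.361232 - ETL job completed and data saved to transformed_data.csv
"""


def parse_log_line(line: str) -> tuple[datetime, str]:
    """Split one 'timestamp - message' line into its two parts."""
    stamp, message = line.split(" - ", 1)
    # str(datetime.now()) uses an ISO-style layout that fromisoformat accepts.
    return datetime.fromisoformat(stamp), message


entries = [parse_log_line(line) for line in log_text.splitlines() if line]
gap = entries[1][0] - entries[0][0]
print(f"{len(entries)} runs, {gap.total_seconds():.1f} s between the first two")
```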

sales_data.csv

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Date,Product,Sales,Profit
+2023-01-01,Product A,100,20
+2023-01-02,Product B,150,30
+2023-01-03,Product A,200,40
+2023-01-04,Product C,250,50
+2023-01-05,Product B,300,60
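
None of these rows has a zero `Profit`, so the `Sales / Profit` ratio computed in app.py is finite for this data. With a zero profit, pandas division would be expected to produce `inf` rather than raise. The sketch below shows one possible guard; the rows in it are hypothetical, and the commit itself does no such check.

```python
import math

import pandas as pd

# Hypothetical rows: the second has Profit == 0, unlike the committed data.
df = pd.DataFrame({"Sales": [100, 150], "Profit": [20, 0]})

# Series.where keeps values where the condition holds and puts NaN elsewhere,
# so a zero profit yields a missing ratio instead of infinity.
safe_profit = df["Profit"].where(df["Profit"] != 0)
df["Sales_to_Profit_Ratio"] = df["Sales"] / safe_profit

print(df)
```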

transformed_data.csv

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Date,Product,Sales,Profit,Year,Month,Sales_to_Profit_Ratio,Cumulative_Sales
+2023-01-01,Product A,100,20,2023,1,5.0,100
+2023-01-02,Product B,150,30,2023,1,5.0,250
+2023-01-03,Product A,200,40,2023,1,5.0,450
+2023-01-04,Product C,250,50,2023,1,5.0,700
+2023-01-05,Product B,300,60,2023,1,5.0,1000
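
The two derived numeric columns can be cross-checked against the input rows. The sketch below recomputes them with the standard library's `csv` module and `itertools.accumulate` in place of the pandas calls used in app.py, so it is an independent check rather than the commit's method.

```python
import csv
import io
from itertools import accumulate

# Rows copied from sales_data.csv in this commit.
sales_csv = """\
Date,Product,Sales,Profit
2023-01-01,Product A,100,20
2023-01-02,Product B,150,30
2023-01-03,Product A,200,40
2023-01-04,Product C,250,50
2023-01-05,Product B,300,60
"""

rows = list(csv.DictReader(io.StringIO(sales_csv)))
sales = [int(r["Sales"]) for r in rows]
profit = [int(r["Profit"]) for r in rows]

# Running total of Sales, matching the Cumulative_Sales column.
cumulative = list(accumulate(sales))
# Per-row Sales / Profit, matching the Sales_to_Profit_Ratio column.
ratios = [s / p for s, p in zip(sales, profit)]

print(cumulative)
print(ratios)
```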
