This project applies predictive analytics to the Indian UPI (Unified Payments Interface) ecosystem to forecast transaction demand and detect potential fraud. It combines Simple Linear Regression for demand forecasting and Logistic Regression for fraud classification, providing a dual perspective on digital payment trends and risks.
├── data/
│ ├── UPI Demand Prediction data.xlsx
│ ├── UPI Fraud Prediction data.csv
│ ├── UPI Fraud model testing data.xlsx
├── python files/
│ ├── demand_forecasting.ipynb
│ ├── fraud_detection.ipynb
├── Report on UPI Project.pdf [ A detailed report on the project ]
├── Executive Summary.pdf [ Summary of the project ]
└── README.md
- Forecast UPI transaction demand using Simple Linear Regression (SLR).
- Detect potential fraudulent transactions using Logistic Regression.
- Evaluate model performance using R² and RMSE (regression) and accuracy, precision, recall, and F1-score (classification).
- Generate actionable insights for fintech and policymakers.
See the attached file "Report on UPI Project.pdf" for the detailed report. A short summary of the report follows.
- Demand dataset: 84 monthly observations (2018–2024) with 11 features.
- Features: GDP growth, smartphone penetration, internet users, POS terminals, PMJDY accounts, repo rate, etc.
- Source: RBI, NPCI, MoSPI, and public records.
- Fraud dataset: 10,000 transaction records across 19 attributes.
- Features: transaction amount, frequency, device ID, location, failed attempts, etc.
- Source: Kaggle, plus a simulated dataset used for testing the model.
- Data Collection & Preprocessing
  - Handled missing values, encoded categorical data, scaled features.
- Exploratory Data Analysis
  - Detected outliers, correlations, and feature relationships.
- Model Building
  - Simple Linear Regression for demand forecasting.
  - Logistic Regression for fraud detection.
- Model Evaluation
  - SLR: R², RMSE.
  - Logistic Regression: Accuracy, Precision, Recall, F1-score, ROC-AUC.
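
The evaluation metrics named above can be sketched in plain Python. This is a minimal illustration and not the code from the notebooks: the function names `regression_metrics` and `classification_metrics` and the toy values are assumptions made for the example, and ROC-AUC is left out for brevity.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R², RMSE) for paired actual and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)

def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy values, purely illustrative (not project data).
r2, rmse = regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])
report = classification_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                                [1, 1, 0, 0, 0, 1, 1, 0])
```

In practice a library such as scikit-learn provides equivalent metric functions; the hand-written versions here only make the definitions explicit.
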
- No missing values in either dataset.
- Detected outliers using boxplots and Z-scores.
- High correlation between internet usage, smartphone penetration, and UPI demand.
- Visualizations: heatmaps, scatterplots, boxplots.
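
The two outlier checks mentioned above (Z-scores and the boxplot rule) can be sketched as follows. The cut-offs are assumptions: the summary does not state which thresholds were used, so the example uses the common conventions of |z| > 3 and 1.5 × IQR beyond the quartiles. The function names and the sample amounts are hypothetical.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Indices of values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs((v - mean) / std) > threshold]

def iqr_outliers(values, k=1.5):
    """Indices of values outside the boxplot whiskers (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical transaction amounts with one extreme value at the end.
amounts = [120, 135, 150, 110, 140, 125, 130, 145, 115, 155,
           128, 132, 138, 122, 148, 118, 142, 126, 136, 5000]

z_flags = zscore_outliers(amounts)
iqr_flags = iqr_outliers(amounts)
```

Both checks flag only the last value in this sample. On skewed data such as payment amounts the two rules can disagree, so it may be worth comparing them before dropping or capping any records.
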
- Significant Predictors (demand model): Smartphone Penetration (p<0.001), Internet Users (p=0.005), PMJDY Accounts (p=0.017)
- Model Performance: R² = 0.998, Adj R² = 0.997
- Predicted UPI Demand (Jan 2025): ≈ 272.25 million transactions
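
For readers comparing R² with adjusted R², the standard adjustment penalises R² for the number of predictors. The sketch below shows the formula only. The observation count of 84 comes from the dataset description, while the predictor count of 10 is an assumption (the summary lists 11 features but does not say how many entered the model), and `adjusted_r2` is a hypothetical helper name.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / dof

# n_obs=84 is from the dataset description; n_predictors=10 is an assumption.
adj = adjusted_r2(0.998, n_obs=84, n_predictors=10)
```

Because the reported R² is rounded to three decimals, the value computed this way need not reproduce the reported adjusted R² exactly.
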
- Significant Predictors (fraud model): FailedAttempts, LocationRiskScore, Amount
- Model Accuracy: 83%
- Precision (Fraud): 0.78 | Recall (Fraud): 0.80 | F1-score: 0.79
- Detected 84 users with >70% fraud probability.
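
The step from a fitted logistic regression to a list of high-risk users can be sketched as below. The weights, intercept and feature values are hypothetical placeholders, not the fitted coefficients from the project; only the 0.70 probability cut-off and the reported precision and recall come from the summary above.

```python
import math

def fraud_probability(features, weights, intercept):
    """Logistic model: sigmoid of the linear combination of the features."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def flag_high_risk(probabilities, threshold=0.70):
    """Indices of records whose fraud probability exceeds the threshold."""
    return [i for i, p in enumerate(probabilities) if p > threshold]

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical weights for (FailedAttempts, LocationRiskScore, Amount in thousands).
weights = [0.9, 2.0, 0.05]
intercept = -4.0
records = [[0, 0.1, 2.0], [5, 0.9, 40.0], [1, 0.3, 5.0]]

probs = [fraud_probability(r, weights, intercept) for r in records]
flagged = flag_high_risk(probs)

# Consistency check on the reported figures: P = 0.78, R = 0.80.
reported_f1 = round(f1_score(0.78, 0.80), 2)
```

The last line confirms that the reported F1-score of 0.79 agrees with the reported precision and recall. Lowering the threshold would flag more users at the cost of more false positives, so the cut-off is a policy choice rather than a property of the model.
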
- Smartphone and internet penetration are the strongest drivers of UPI growth.
- Seasonal spikes correspond with festivals and e-commerce events.
- Failed login attempts and high-risk geolocations are major fraud predictors.
- Predictive analytics can significantly improve proactive fraud monitoring.
Conclusions:
- Predictive analytics provides actionable insights for both demand and fraud patterns.
- Regularized regression models improve stability and interpretability.
Recommendations:
- Enhance digital literacy and smartphone access.
- Implement geolocation-based fraud scoring.
- Flag high-value transactions for verification.
- Continuously retrain models with new data.
- Linear assumptions may not capture all relationships.
- Some behavioral and contextual data were unavailable.
- Future models can use non-linear techniques (Random Forest, XGBoost) and real-time streaming data.