A production-style fraud analytics project for identifying suspicious financial transactions using behavioural signals, anomaly detection, and supervised machine learning.
This repository is designed as a portfolio project for banking, fintech, payments, risk analytics, and financial crime roles. It packages a complete notebook-based workflow into a GitHub-ready structure with documentation, a requirements.txt for reproducible installs, and a scored output file.
The goal is to populate the target field is_fraud for a transaction dataset where confirmed fraud labels are not initially available. The workflow combines:
- behavioural feature engineering
- rule-based fraud heuristics
- unsupervised anomaly detection
- weak supervision through proxy fraud labels
- supervised probability scoring
- out-of-time validation
- transaction-level reason codes for explainability
This mirrors how fraud analytics teams often start when historical fraud confirmations are incomplete or delayed.
Fraud detection in financial transactions is typically not a simple classification exercise. In real-world environments, fraud is:
- rare
- dynamic
- adversarial
- operationally constrained by review capacity
- sensitive to time leakage and concept drift
Because of that, this project uses a more realistic workflow than a basic random train/test split. The modelling sequence follows core principles used in financial fraud monitoring and model governance:
- chronological train/test separation
- anomaly detection fit on historical data only
- thresholding based on operational block-rate logic
- explainability through top model drivers
Input file:
data/finance_fraud_data_copy.csv
Notebook:
notebooks/Finance_Fraud_Pred.ipynb
Scored output:
outputs/finance_fraud_scored.csv
The workflow begins by loading the transaction dataset, validating required fields, and parsing transaction timestamps.
Key fields used include:
- Transaction_ID
- Timestamp
- Customer_ID
- Amount
- Merchant_Category
- Distance_from_Home
- Device_Type
- IP_Risk_Score
- Avg_Spending_Habit
- Is_Weekend
- Is_Night_Transaction
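The loading and validation step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `load_transactions` is hypothetical, and dropping rows with unparseable timestamps is an assumption. A tiny in-memory CSV stands in for data/finance_fraud_data_copy.csv.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = [
    "Transaction_ID", "Timestamp", "Customer_ID", "Amount",
    "Merchant_Category", "Distance_from_Home", "Device_Type",
    "IP_Risk_Score", "Avg_Spending_Habit", "Is_Weekend",
    "Is_Night_Transaction",
]


def load_transactions(source):
    """Read a CSV, check the required columns, and parse timestamps."""
    df = pd.read_csv(source)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Assumption: unparseable timestamps become NaT and those rows are dropped.
    df["Timestamp"] = pd.to_datetime(df["Timestamp"], errors="coerce")
    return df.dropna(subset=["Timestamp"]).reset_index(drop=True)


# Illustrative sample: the second row has a broken timestamp.
sample_csv = io.StringIO(
    "Transaction_ID,Timestamp,Customer_ID,Amount,Merchant_Category,"
    "Distance_from_Home,Device_Type,IP_Risk_Score,Avg_Spending_Habit,"
    "Is_Weekend,Is_Night_Transaction\n"
    "T1,2024-01-05 10:15:00,C1,120.5,Grocery,2.1,Mobile,0.10,100.0,0,0\n"
    "T2,not-a-date,C1,80.0,Grocery,1.0,Mobile,0.20,100.0,0,0\n"
)
transactions = load_transactions(sample_csv)
```

With the real file, the call would be `load_transactions("data/finance_fraud_data_copy.csv")`.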
The notebook derives transaction-level and customer-level fraud signals such as:
- time features: hour, day of week, month
- transaction amount log transform
- amount relative to spending habit
- customer average amount and standard deviation
- transaction z-score versus customer behaviour
- customer-category usage rate
- customer-device usage rate
These are standard behavioural risk indicators because fraud often shows up as a deviation from a customer's known patterns rather than as a large amount alone.
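One way the features listed above could be derived with pandas is sketched below. The output column names (`amount_log`, `amount_vs_habit`, `amount_zscore`, `cust_category_rate`, `cust_device_rate`) are illustrative assumptions, and customer statistics are computed over the whole frame for brevity; a stricter setup would compute them from the training period only.

```python
import numpy as np
import pandas as pd


def add_behavioural_features(df):
    """Add time, amount, and customer-behaviour features (illustrative names)."""
    out = df.copy()
    out["hour"] = out["Timestamp"].dt.hour
    out["day_of_week"] = out["Timestamp"].dt.dayofweek
    out["month"] = out["Timestamp"].dt.month
    out["amount_log"] = np.log1p(out["Amount"])
    # Ratio of this amount to the customer's usual spend (NaN if habit is 0).
    out["amount_vs_habit"] = out["Amount"] / out["Avg_Spending_Habit"].replace(0, np.nan)

    by_customer = out.groupby("Customer_ID")["Amount"]
    out["cust_amount_mean"] = by_customer.transform("mean")
    out["cust_amount_std"] = by_customer.transform("std").fillna(0.0)
    # z-score versus the customer's own history; 0 when std is undefined or 0.
    std = out["cust_amount_std"].replace(0, np.nan)
    out["amount_zscore"] = ((out["Amount"] - out["cust_amount_mean"]) / std).fillna(0.0)

    # Share of the customer's transactions in this category / on this device.
    cust_count = out.groupby("Customer_ID")["Transaction_ID"].transform("count")
    for col, name in [("Merchant_Category", "cust_category_rate"),
                      ("Device_Type", "cust_device_rate")]:
        pair_count = out.groupby(["Customer_ID", col])["Transaction_ID"].transform("count")
        out[name] = pair_count / cust_count
    return out


sample = pd.DataFrame({
    "Transaction_ID": ["T1", "T2", "T3", "T4"],
    "Timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-02 13:30",
        "2024-01-06 23:10", "2024-01-03 08:00",
    ]),
    "Customer_ID": ["C1", "C1", "C1", "C2"],
    "Amount": [100.0, 100.0, 400.0, 50.0],
    "Avg_Spending_Habit": [100.0, 100.0, 100.0, 50.0],
    "Merchant_Category": ["Grocery", "Grocery", "Travel", "Grocery"],
    "Device_Type": ["Mobile", "Mobile", "Desktop", "Mobile"],
})
features = add_behavioural_features(sample)
```

In this sample, transaction T3 is four times the customer's usual spend and is their only Travel purchase, so both `amount_vs_habit` and `cust_category_rate` mark it as unusual.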
Transactions are sorted by time and split chronologically into:
- training set = earlier transactions
- test set = later transactions
This is important in fraud analytics because it reduces look-ahead bias and better reflects production deployment, where models are always applied to future transactions.
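A chronological split can be sketched in a few lines. The helper name `chronological_split` and the 80/20 proportion are assumptions for illustration; the notebook may use a different cut-off.

```python
import pandas as pd


def chronological_split(df, train_fraction=0.8):
    """Sort by time and cut once, so every test row is later than every train row."""
    ordered = df.sort_values("Timestamp").reset_index(drop=True)
    cut = int(len(ordered) * train_fraction)
    return ordered.iloc[:cut], ordered.iloc[cut:]


# Ten transactions supplied in shuffled time order.
sample = pd.DataFrame({
    "Transaction_ID": [f"T{i}" for i in range(10)],
    "Timestamp": pd.to_datetime("2024-01-01")
    + pd.to_timedelta([5, 2, 9, 0, 7, 1, 8, 3, 6, 4], unit="D"),
})
train, test = chronological_split(sample, train_fraction=0.8)
```

Unlike a random split, no transaction in `train` is newer than any transaction in `test`.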
A transparent rules layer assigns risk points using patterns such as:
- elevated IP risk
- unusual spend relative to customer norm
- suspicious distance from home
- rare device or merchant usage
- night-time or weekend activity
This produces:
- rule_risk_score
- rule_flag
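A points-based rules layer of this kind might look like the sketch below. Every threshold and point value here is a made-up assumption chosen for illustration, as are the engineered column names (`amount_vs_habit`, `cust_device_rate`, `cust_category_rate`); the notebook's actual rules may differ.

```python
import pandas as pd


def apply_rules(df, flag_at=3):
    """Sum risk points per transaction and flag those at or above `flag_at`."""
    points = (
        (df["IP_Risk_Score"] > 0.7).astype(int) * 2          # elevated IP risk
        + (df["amount_vs_habit"] > 3.0).astype(int) * 2      # unusual spend
        + (df["Distance_from_Home"] > 100).astype(int)       # far from home
        + ((df["cust_device_rate"] < 0.1)
           | (df["cust_category_rate"] < 0.1)).astype(int)   # rare device or merchant
        + ((df["Is_Night_Transaction"] == 1)
           | (df["Is_Weekend"] == 1)).astype(int)            # night or weekend
    )
    out = df.copy()
    out["rule_risk_score"] = points
    out["rule_flag"] = (points >= flag_at).astype(int)
    return out


sample = pd.DataFrame({
    "IP_Risk_Score": [0.1, 0.9],
    "amount_vs_habit": [1.0, 5.0],
    "Distance_from_Home": [2.0, 250.0],
    "cust_device_rate": [0.8, 0.05],
    "cust_category_rate": [0.8, 0.5],
    "Is_Night_Transaction": [0, 1],
    "Is_Weekend": [0, 0],
})
scored = apply_rules(sample)
```

Because each rule contributes a visible number of points, an analyst can read off why a transaction was flagged without consulting a model.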
Because confirmed fraud labels are unavailable at the outset, the notebook uses Isolation Forest to detect anomalous transactions.
Important design choice:
- the anomaly model is fit on training data only
- the anomaly threshold is derived from training data only
- the test period is scored as unseen future behaviour
This produces:
- iso_anom_score
- iso_flag
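The train-only fitting and thresholding described above can be sketched with scikit-learn's `IsolationForest`. The helper name `score_anomalies`, the 5% anomaly rate, and the synthetic two-feature data are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def score_anomalies(X_train, X_test, anomaly_rate=0.05, seed=42):
    """Fit on training data only; derive the flag threshold from training scores."""
    model = IsolationForest(n_estimators=200, random_state=seed)
    model.fit(X_train)
    # score_samples is lower for abnormal points, so negate: higher = more anomalous.
    train_score = -model.score_samples(X_train)
    test_score = -model.score_samples(X_test)
    threshold = np.quantile(train_score, 1.0 - anomaly_rate)
    return train_score, test_score, threshold


rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
# Test period: ordinary points plus one extreme transaction appended last.
X_test = np.vstack([rng.normal(size=(100, 2)), [[10.0, 10.0]]])

iso_anom_score_train, iso_anom_score_test, threshold = score_anomalies(X_train, X_test)
iso_flag_test = (iso_anom_score_test > threshold).astype(int)
```

The test rows never influence the trees or the threshold, which is what keeps the test period a fair stand-in for unseen future behaviour.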
A weakly supervised target is created using:
pseudo_is_fraud = rule_flag OR iso_flag
This is a practical approach when building an initial fraud discovery model before confirmed case labels are available.
A Logistic Regression model is then trained on the proxy labels to produce a stable fraud risk score.
Output score:
fraud_probability
This stage converts a rough fraud discovery signal into a scalable scoring model.
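The proxy-label and scoring steps can be sketched together as follows. The two features, the stand-in flag logic, the scaling step, and `class_weight="balanced"` are all assumptions made for this example rather than settings taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400
train = pd.DataFrame({
    "amount_zscore": rng.normal(size=n),
    "IP_Risk_Score": rng.uniform(size=n),
})
# Stand-ins for the outputs of the rules layer and the Isolation Forest.
train["rule_flag"] = (train["IP_Risk_Score"] > 0.8).astype(int)
train["iso_flag"] = (train["amount_zscore"].abs() > 2.0).astype(int)

# Weak label: a transaction is proxy-fraud if either detector flags it.
train["pseudo_is_fraud"] = (
    (train["rule_flag"] == 1) | (train["iso_flag"] == 1)
).astype(int)

feature_cols = ["amount_zscore", "IP_Risk_Score"]
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(train[feature_cols], train["pseudo_is_fraud"])

# Column 1 of predict_proba is the probability of the positive (proxy-fraud) class.
train["fraud_probability"] = model.predict_proba(train[feature_cols])[:, 1]
```

The binary flags only say whether a transaction crossed a line; the fitted probability ranks every transaction on a continuous scale, which is what makes threshold tuning possible later.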
Model performance is assessed on the out-of-time test set using:
- ROC-AUC
- PR-AUC
- precision
- recall
- F1 score
- confusion matrix
A block threshold is chosen to hit a target operational block rate, that is, the share of transactions the business is prepared to block or review. This mirrors fraud operations, where institutions typically size alert volumes to investigation capacity and false-positive tolerance.
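Block-rate thresholding and the metrics listed above can be sketched as below. The 2% block rate, the helper name `threshold_for_block_rate`, and the synthetic scores are assumptions for illustration; in this project the "true" labels on the test set would be the proxy labels, not confirmed fraud.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def threshold_for_block_rate(scores, block_rate):
    """Score cut-off above which roughly `block_rate` of transactions fall."""
    return float(np.quantile(scores, 1.0 - block_rate))


# Synthetic out-of-time test set: about 5% positives with higher scores on average.
rng = np.random.default_rng(1)
y_true = (rng.uniform(size=1000) < 0.05).astype(int)
raw = 3.0 * y_true + rng.normal(-2.0, 1.0, size=1000)
fraud_probability = 1.0 / (1.0 + np.exp(-raw))

threshold = threshold_for_block_rate(fraud_probability, block_rate=0.02)
is_fraud = (fraud_probability > threshold).astype(int)

metrics = {
    "roc_auc": roc_auc_score(y_true, fraud_probability),   # ranking quality
    "pr_auc": average_precision_score(y_true, fraud_probability),
    "precision": precision_score(y_true, is_fraud),
    "recall": recall_score(y_true, is_fraud),
    "f1": f1_score(y_true, is_fraud),
}
cm = confusion_matrix(y_true, is_fraud)  # rows: actual, columns: predicted
```

ROC-AUC and PR-AUC are computed from the continuous score, while precision, recall, F1, and the confusion matrix depend on the chosen threshold, so changing the block rate changes only the latter group.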
The notebook also includes:
- global feature importance for the logistic model
- transaction-level reason codes based on feature contribution
This improves analyst usability and supports model governance, investigation review, and auditability.
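For a linear model, one simple way to produce both views is to use coefficient magnitudes for global importance and coefficient times standardised value for per-transaction contributions. The sketch below assumes that approach; the helper name `reason_codes`, the three features, and the synthetic labels are illustrative, and the notebook may compute contributions differently.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def reason_codes(model, scaler, X, top_n=2):
    """Return the top_n features pushing each row's score upward."""
    z = scaler.transform(X)
    contrib = z * model.coef_[0]              # per-feature share of the log-odds
    order = np.argsort(-contrib, axis=1)[:, :top_n]
    names = np.array(X.columns)
    return [", ".join(names[row]) for row in order]


rng = np.random.default_rng(0)
X = pd.DataFrame({
    "IP_Risk_Score": rng.uniform(size=300),
    "amount_zscore": rng.normal(size=300),
    "Distance_from_Home": rng.exponential(10.0, size=300),
})
y = (X["IP_Risk_Score"] > 0.7).astype(int)    # synthetic proxy label

scaler = StandardScaler().fit(X)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X), y)

# Global importance: absolute coefficients on standardised features.
importance = pd.Series(np.abs(model.coef_[0]), index=X.columns).sort_values(ascending=False)

# Local explanation for a single high-IP-risk transaction.
case = pd.DataFrame({
    "IP_Risk_Score": [0.99],
    "amount_zscore": [0.0],
    "Distance_from_Home": [10.0],
})
codes = reason_codes(model, scaler, case, top_n=2)
```

Standardising first keeps coefficients comparable across features measured in different units, which is why the scaler is applied before reading contributions.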
financial-transaction-fraud-detection/
├── README.md
├── LICENSE
├── requirements.txt
├── .gitignore
├── data/
│   └── finance_fraud_data_copy.csv
├── notebooks/
│   └── Finance_Fraud_Pred.ipynb
├── outputs/
│   └── finance_fraud_scored.csv
└── src/
    └── fraud_model_pipeline.py
git clone <your-repo-url>
cd financial-transaction-fraud-detection
pip install -r requirements.txt
jupyter notebook notebooks/Finance_Fraud_Pred.ipynb
Run the notebook from top to bottom.
Main output fields:
- fraud_probability: continuous risk score from the supervised model
- is_fraud: final binary fraud flag based on the selected block rate
- reason_codes: top contributors for flagged transactions
These outputs can be used to simulate:
- transaction blocking
- analyst review queues
- fraud investigation prioritization
- threshold backtesting
This project incorporates several practices that align with good fraud analytics discipline:
- out-of-time validation instead of leakage-prone random splitting
- anomaly detection for weak-label environments
- operational threshold selection rather than arbitrary 0.50 cutoffs
- transaction explainability via reason codes
- transparent rules layer alongside machine learning
While this is a portfolio project rather than a regulated production model, the structure reflects concepts that are relevant to global financial fraud teams, including model monitoring, behavioural analysis, alert triage, and explainability.
A future version could add:
- PSI and drift monitoring
- monthly performance backtesting
- SHAP-based local explanations
- challenger models such as Random Forest or XGBoost
- API deployment for real-time scoring
- Docker packaging
- unit tests and CI workflow
Omotayo Owolabi
Financial Analysis | Risk Analytics | Behavioural Modelling | Fraud Detection
This project is released under the MIT License.