
Financial Transaction Fraud Detection using Behavioural Analytics

A production-style fraud analytics project for identifying suspicious financial transactions using behavioural signals, anomaly detection, and supervised machine learning.

This repository is designed as a strong portfolio project for banking, fintech, payments, risk analytics, and financial crime roles. It packages a complete notebook-based workflow into a GitHub-ready structure with clear documentation, reproducible dependencies, and well-defined deliverables.

Project objective

The goal is to populate the target field is_fraud for a transaction dataset where confirmed fraud labels are not initially available. The workflow combines:

  • behavioural feature engineering
  • rule-based fraud heuristics
  • unsupervised anomaly detection
  • weak supervision through proxy fraud labels
  • supervised probability scoring
  • out-of-time validation
  • transaction-level reason codes for explainability

This mirrors how fraud analytics teams often start when historical fraud confirmations are incomplete or delayed.

Why this project matters

Fraud detection in financial transactions is typically not a simple classification exercise. In real-world environments, fraud is:

  • rare
  • dynamic
  • adversarial
  • operationally constrained by review capacity
  • sensitive to time leakage and concept drift

Because of that, this project uses a more realistic workflow than a basic random train/test split. The modelling sequence follows core principles used in financial fraud monitoring and model governance:

  • chronological train/test separation
  • anomaly detection fit on historical data only
  • thresholding based on operational block-rate logic
  • explainability through top model drivers

Dataset

Input file:

  • data/finance_fraud_data_copy.csv

Notebook:

  • notebooks/Finance_Fraud_Pred.ipynb

Scored output:

  • outputs/finance_fraud_scored.csv

Methodology

1. Data loading and validation

The workflow begins by loading the transaction dataset, validating required fields, and parsing transaction timestamps.

Key fields used include:

  • Transaction_ID
  • Timestamp
  • Customer_ID
  • Amount
  • Merchant_Category
  • Distance_from_Home
  • Device_Type
  • IP_Risk_Score
  • Avg_Spending_Habit
  • Is_Weekend
  • Is_Night_Transaction

2. Behavioural feature engineering

The notebook derives transaction-level and customer-level fraud signals such as:

  • time features: hour, day of week, month
  • transaction amount log transform
  • amount relative to spending habit
  • customer average amount and standard deviation
  • transaction z-score versus customer behaviour
  • customer-category usage rate
  • customer-device usage rate

These are standard behavioural risk indicators: fraud is often characterised by deviation from a customer's known patterns rather than by transaction amount alone.
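As a minimal pandas sketch of these derivations (the raw columns match the dataset fields listed above; the derived column names are illustrative, not necessarily those used in the notebook):

```python
import numpy as np
import pandas as pd

def add_behavioural_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive time, amount, and customer-deviation signals (derived names are illustrative)."""
    out = df.copy()
    ts = pd.to_datetime(out["Timestamp"])
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek
    out["month"] = ts.dt.month
    out["log_amount"] = np.log1p(out["Amount"])
    # Per-customer behaviour baselines
    grp = out.groupby("Customer_ID")["Amount"]
    out["cust_avg_amount"] = grp.transform("mean")
    out["cust_std_amount"] = grp.transform("std").fillna(0.0)
    # Z-score of this transaction versus the customer's own history
    # (customers with a single transaction or zero variance get 0.0)
    denom = out["cust_std_amount"].replace(0, np.nan)
    out["amount_zscore"] = ((out["Amount"] - out["cust_avg_amount"]) / denom).fillna(0.0)
    return out
```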

3. Out-of-time validation design

Transactions are sorted by time and split chronologically into:

  • training set = earlier transactions
  • test set = later transactions

This is important in fraud analytics because it reduces look-ahead bias and better reflects production deployment, where models are always applied to future transactions.
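A chronological split of this kind can be sketched as follows; the 80/20 fraction is an assumption for illustration, not taken from the notebook:

```python
import pandas as pd

def out_of_time_split(df: pd.DataFrame, ts_col: str = "Timestamp", train_frac: float = 0.8):
    """Sort chronologically; earlier rows become train, later rows become test."""
    df = df.sort_values(ts_col).reset_index(drop=True)
    cut = int(len(df) * train_frac)
    return df.iloc[:cut], df.iloc[cut:]
```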

4. Rules-based behavioural scoring

A transparent rules layer assigns risk points using patterns such as:

  • elevated IP risk
  • unusual spend relative to customer norm
  • suspicious distance from home
  • rare device or merchant usage
  • night-time or weekend activity

This produces:

  • rule_risk_score
  • rule_flag
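A points-based rules layer along these lines might look like the sketch below; the specific weights and cutoffs are illustrative assumptions, not the notebook's actual rule set:

```python
import pandas as pd

def score_rules(df: pd.DataFrame, flag_threshold: int = 3) -> pd.DataFrame:
    """Transparent points-based risk layer; weights and cutoffs are illustrative."""
    out = df.copy()
    points = (
        2 * (out["IP_Risk_Score"] > 0.8).astype(int)        # elevated IP risk
        + 2 * (out["amount_zscore"].abs() > 3).astype(int)  # unusual spend vs customer norm
        + 1 * (out["Distance_from_Home"] > 100).astype(int) # suspicious distance from home
        + 1 * out["Is_Night_Transaction"].astype(int)       # night-time activity
    )
    out["rule_risk_score"] = points
    out["rule_flag"] = (points >= flag_threshold).astype(int)
    return out
```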

5. Unsupervised anomaly detection

Because confirmed fraud labels are unavailable at the outset, the notebook uses Isolation Forest to detect anomalous transactions.

Important design choice:

  • the anomaly model is fit on training data only
  • the anomaly threshold is derived from training data only
  • the test period is scored as unseen future behaviour

This produces:

  • iso_anom_score
  • iso_flag
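The train-only fit and train-derived threshold can be sketched with scikit-learn's IsolationForest; the contamination rate and estimator count here are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def fit_anomaly_detector(X_train: np.ndarray, X_test: np.ndarray, contamination: float = 0.02):
    """Fit on history only; derive the threshold from training scores; score the future period as unseen."""
    iso = IsolationForest(n_estimators=200, contamination=contamination, random_state=42)
    iso.fit(X_train)
    train_scores = -iso.score_samples(X_train)  # higher = more anomalous
    threshold = np.quantile(train_scores, 1 - contamination)
    test_scores = -iso.score_samples(X_test)
    return test_scores, (test_scores >= threshold).astype(int)
```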

6. Proxy fraud labels

A weakly supervised target is created using:

pseudo_is_fraud = rule_flag OR iso_flag

This is a practical approach when building an initial fraud discovery model before confirmed case labels are available.

7. Supervised fraud probability model

A Logistic Regression model is then trained on the proxy labels to produce a stable fraud risk score.

Output score:

  • fraud_probability

This stage converts a rough fraud discovery signal into a scalable scoring model.
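A hedged sketch of this stage using scikit-learn's LogisticRegression on the proxy labels; the balanced class weighting is an assumption to offset the rarity of flagged rows:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_fraud_scorer(X_train, y_train, X_test):
    """Fit a logistic scorer on proxy labels and return calibrated-ish probabilities for the test period."""
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    )
    model.fit(X_train, y_train)
    fraud_probability = model.predict_proba(X_test)[:, 1]
    return model, fraud_probability
```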

8. Validation and operational thresholding

Model performance is assessed on the out-of-time test set using:

  • ROC-AUC
  • PR-AUC
  • precision
  • recall
  • F1 score
  • confusion matrix

A block threshold is chosen using a target operational block rate. This aligns with fraud operations where institutions often manage alert volumes through investigation capacity and false-positive tolerance.
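Choosing a cutoff from a target block rate reduces to a quantile of the score distribution, as in this sketch (the 1% default is an illustrative assumption):

```python
import numpy as np

def block_threshold(scores: np.ndarray, target_block_rate: float = 0.01) -> float:
    """Pick the probability cutoff so roughly target_block_rate of transactions are blocked."""
    return float(np.quantile(scores, 1.0 - target_block_rate))
```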

9. Explainability

The notebook also includes:

  • global feature importance for the logistic model
  • transaction-level reason codes based on feature contribution

This improves analyst usability and supports model governance, investigation review, and auditability.
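For a linear model, transaction-level reason codes can be approximated as coefficient-times-feature contributions; this sketch (with hypothetical feature names) shows the idea:

```python
import numpy as np

def reason_codes(coefs: np.ndarray, x_std: np.ndarray, names: list, top_k: int = 3) -> list:
    """Return the top positively contributing features for one standardised transaction vector."""
    contrib = coefs * x_std                      # per-feature contribution to the log-odds
    order = np.argsort(contrib)[::-1][:top_k]    # largest contributions first
    return [names[i] for i in order if contrib[i] > 0]
```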

Repository structure

financial-transaction-fraud-detection/
├── README.md
├── LICENSE
├── requirements.txt
├── .gitignore
├── data/
│   └── finance_fraud_data_copy.csv
├── notebooks/
│   └── Finance_Fraud_Pred.ipynb
├── outputs/
│   └── finance_fraud_scored.csv
└── src/
    └── fraud_model_pipeline.py

How to run

1. Clone the repository

git clone <your-repo-url>
cd financial-transaction-fraud-detection

2. Install dependencies

pip install -r requirements.txt

3. Launch Jupyter

jupyter notebook

4. Open and run

notebooks/Finance_Fraud_Pred.ipynb

Run the notebook from top to bottom.

Recommended interpretation of outputs

Main output fields:

  • fraud_probability: continuous risk score from the supervised model
  • is_fraud: final binary fraud flag based on the selected block rate
  • reason_codes: top contributors for flagged transactions

These outputs can be used to simulate:

  • transaction blocking
  • analyst review queues
  • fraud investigation prioritization
  • threshold backtesting

Financial-risk and fraud standards reflected in the workflow

This project incorporates several practices that align with good fraud analytics discipline:

  • out-of-time validation instead of random leakage-prone splitting
  • anomaly detection for weak-label environments
  • operational threshold selection rather than arbitrary 0.50 cutoffs
  • transaction explainability via reason codes
  • transparent rules layer alongside machine learning

While this is a portfolio project rather than a regulated production model, the structure reflects concepts that are relevant to global financial fraud teams, including model monitoring, behavioural analysis, alert triage, and explainability.

Potential extensions

A future version could add:

  • PSI and drift monitoring
  • monthly performance backtesting
  • SHAP-based local explanations
  • challenger models such as Random Forest or XGBoost
  • API deployment for real-time scoring
  • Docker packaging
  • unit tests and CI workflow

Author

Omotayo Owolabi
Financial Analysis | Risk Analytics | Behavioural Modelling | Fraud Detection

License

This project is released under the MIT License.
