
Financial Transaction Fraud Detection using Behavioural Analytics

A production-style fraud analytics project for identifying suspicious financial transactions using behavioural signals, anomaly detection, and supervised machine learning.

This repository is designed as a strong portfolio project for banking, fintech, payments, risk analytics, and financial crime roles. It packages a complete notebook-based workflow into a GitHub-ready structure with clear documentation, reproducible dependencies, and well-defined deliverables.

Project objective

The goal is to populate the target field is_fraud for a transaction dataset where confirmed fraud labels are not initially available. The workflow combines:

  • behavioural feature engineering
  • rule-based fraud heuristics
  • unsupervised anomaly detection
  • weak supervision through proxy fraud labels
  • supervised probability scoring
  • out-of-time validation
  • transaction-level reason codes for explainability

This mirrors how fraud analytics teams often start when historical fraud confirmations are incomplete or delayed.

Why this project matters

Fraud detection in financial transactions is typically not a simple classification exercise. In real-world environments, fraud is:

  • rare
  • dynamic
  • adversarial
  • operationally constrained by review capacity
  • sensitive to time leakage and concept drift

Because of that, this project uses a more realistic workflow than a basic random train/test split. The modelling sequence follows core principles used in financial fraud monitoring and model governance:

  • chronological train/test separation
  • anomaly detection fit on historical data only
  • thresholding based on operational block-rate logic
  • explainability through top model drivers

Dataset

Input file:

  • data/finance_fraud_data_copy.csv

Notebook:

  • notebooks/Finance_Fraud_Pred.ipynb

Scored output:

  • outputs/finance_fraud_scored.csv

Methodology

1. Data loading and validation

The workflow begins by loading the transaction dataset, validating required fields, and parsing transaction timestamps.

Key fields used include:

  • Transaction_ID
  • Timestamp
  • Customer_ID
  • Amount
  • Merchant_Category
  • Distance_from_Home
  • Device_Type
  • IP_Risk_Score
  • Avg_Spending_Habit
  • Is_Weekend
  • Is_Night_Transaction

2. Behavioural feature engineering

The notebook derives transaction-level and customer-level fraud signals such as:

  • time features: hour, day of week, month
  • transaction amount log transform
  • amount relative to spending habit
  • customer average amount and standard deviation
  • transaction z-score versus customer behaviour
  • customer-category usage rate
  • customer-device usage rate

These are standard behavioural risk indicators: fraud is often characterised by deviation from a customer's known patterns rather than by transaction amount alone.
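As a minimal pandas sketch of these derivations (the raw columns match the dataset fields listed above; the derived column names are illustrative, not necessarily those used in the notebook):

```python
import numpy as np
import pandas as pd

def add_behavioural_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive time, amount, and customer-deviation signals (derived names are illustrative)."""
    out = df.copy()
    ts = pd.to_datetime(out["Timestamp"])
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek
    out["month"] = ts.dt.month
    out["log_amount"] = np.log1p(out["Amount"])
    # Per-customer behaviour baselines
    grp = out.groupby("Customer_ID")["Amount"]
    out["cust_avg_amount"] = grp.transform("mean")
    out["cust_std_amount"] = grp.transform("std").fillna(0.0)
    # Z-score of this transaction versus the customer's own history
    # (customers with a single transaction or zero variance get 0.0)
    denom = out["cust_std_amount"].replace(0, np.nan)
    out["amount_zscore"] = ((out["Amount"] - out["cust_avg_amount"]) / denom).fillna(0.0)
    return out
```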

3. Out-of-time validation design

Transactions are sorted by time and split chronologically into:

  • training set = earlier transactions
  • test set = later transactions

This is important in fraud analytics because it reduces look-ahead bias and better reflects production deployment, where models are always applied to future transactions.
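A chronological split of this kind can be sketched as follows; the 80/20 fraction is an assumption for illustration, not taken from the notebook:

```python
import pandas as pd

def out_of_time_split(df: pd.DataFrame, ts_col: str = "Timestamp", train_frac: float = 0.8):
    """Sort chronologically; earlier rows become train, later rows become test."""
    df = df.sort_values(ts_col).reset_index(drop=True)
    cut = int(len(df) * train_frac)
    return df.iloc[:cut], df.iloc[cut:]
```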

4. Rules-based behavioural scoring

A transparent rules layer assigns risk points using patterns such as:

  • elevated IP risk
  • unusual spend relative to customer norm
  • suspicious distance from home
  • rare device or merchant usage
  • night-time or weekend activity

This produces:

  • rule_risk_score
  • rule_flag
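A points-based rules layer along these lines might look like the sketch below; the specific weights and cutoffs are illustrative assumptions, not the notebook's actual rule set:

```python
import pandas as pd

def score_rules(df: pd.DataFrame, flag_threshold: int = 3) -> pd.DataFrame:
    """Transparent points-based risk layer; weights and cutoffs are illustrative."""
    out = df.copy()
    points = (
        2 * (out["IP_Risk_Score"] > 0.8).astype(int)        # elevated IP risk
        + 2 * (out["amount_zscore"].abs() > 3).astype(int)  # unusual spend vs customer norm
        + 1 * (out["Distance_from_Home"] > 100).astype(int) # suspicious distance from home
        + 1 * out["Is_Night_Transaction"].astype(int)       # night-time activity
    )
    out["rule_risk_score"] = points
    out["rule_flag"] = (points >= flag_threshold).astype(int)
    return out
```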

5. Unsupervised anomaly detection

Because confirmed fraud labels are unavailable at the outset, the notebook uses Isolation Forest to detect anomalous transactions.

Important design choice:

  • the anomaly model is fit on training data only
  • the anomaly threshold is derived from training data only
  • the test period is scored as unseen future behaviour

This produces:

  • iso_anom_score
  • iso_flag
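The train-only fit and train-derived threshold can be sketched with scikit-learn's IsolationForest; the contamination rate and estimator count here are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def fit_anomaly_detector(X_train: np.ndarray, X_test: np.ndarray, contamination: float = 0.02):
    """Fit on history only; derive the threshold from training scores; score the future period as unseen."""
    iso = IsolationForest(n_estimators=200, contamination=contamination, random_state=42)
    iso.fit(X_train)
    train_scores = -iso.score_samples(X_train)  # higher = more anomalous
    threshold = np.quantile(train_scores, 1 - contamination)
    test_scores = -iso.score_samples(X_test)
    return test_scores, (test_scores >= threshold).astype(int)
```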

6. Proxy fraud labels

A weakly supervised target is created using:

pseudo_is_fraud = rule_flag OR iso_flag

This is a practical approach when building an initial fraud discovery model before confirmed case labels are available.

7. Supervised fraud probability model

A Logistic Regression model is then trained on the proxy labels to produce a stable fraud risk score.

Output score:

  • fraud_probability

This stage converts a rough fraud discovery signal into a scalable scoring model.
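A hedged sketch of this stage using scikit-learn's LogisticRegression on the proxy labels; the balanced class weighting is an assumption to offset the rarity of flagged rows:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_fraud_scorer(X_train, y_train, X_test):
    """Fit a logistic scorer on proxy labels and return calibrated-ish probabilities for the test period."""
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    )
    model.fit(X_train, y_train)
    fraud_probability = model.predict_proba(X_test)[:, 1]
    return model, fraud_probability
```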

8. Validation and operational thresholding

Model performance is assessed on the out-of-time test set using:

  • ROC-AUC
  • PR-AUC
  • precision
  • recall
  • F1 score
  • confusion matrix

A block threshold is chosen using a target operational block rate. This aligns with fraud operations where institutions often manage alert volumes through investigation capacity and false-positive tolerance.
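Choosing a cutoff from a target block rate reduces to a quantile of the score distribution, as in this sketch (the 1% default is an illustrative assumption):

```python
import numpy as np

def block_threshold(scores: np.ndarray, target_block_rate: float = 0.01) -> float:
    """Pick the probability cutoff so roughly target_block_rate of transactions are blocked."""
    return float(np.quantile(scores, 1.0 - target_block_rate))
```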

9. Explainability

The notebook also includes:

  • global feature importance for the logistic model
  • transaction-level reason codes based on feature contribution

This improves analyst usability and supports model governance, investigation review, and auditability.
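For a linear model, transaction-level reason codes can be approximated as coefficient-times-feature contributions; this sketch (with hypothetical feature names) shows the idea:

```python
import numpy as np

def reason_codes(coefs: np.ndarray, x_std: np.ndarray, names: list, top_k: int = 3) -> list:
    """Return the top positively contributing features for one standardised transaction vector."""
    contrib = coefs * x_std                      # per-feature contribution to the log-odds
    order = np.argsort(contrib)[::-1][:top_k]    # largest contributions first
    return [names[i] for i in order if contrib[i] > 0]
```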

Repository structure

financial-transaction-fraud-detection/
├── README.md
├── LICENSE
├── requirements.txt
├── .gitignore
├── data/
│   └── finance_fraud_data_copy.csv
├── notebooks/
│   └── Finance_Fraud_Pred.ipynb
├── outputs/
│   └── finance_fraud_scored.csv
└── src/
    └── fraud_model_pipeline.py

How to run

1. Clone the repository

git clone <your-repo-url>
cd financial-transaction-fraud-detection

2. Install dependencies

pip install -r requirements.txt

3. Launch Jupyter

jupyter notebook

4. Open and run

notebooks/Finance_Fraud_Pred.ipynb

Run the notebook from top to bottom.

Recommended interpretation of outputs

Main output fields:

  • fraud_probability: continuous risk score from the supervised model
  • is_fraud: final binary fraud flag based on the selected block rate
  • reason_codes: top contributors for flagged transactions

These outputs can be used to simulate:

  • transaction blocking
  • analyst review queues
  • fraud investigation prioritization
  • threshold backtesting

Financial-risk and fraud standards reflected in the workflow

This project incorporates several practices that align with good fraud analytics discipline:

  • out-of-time validation instead of random leakage-prone splitting
  • anomaly detection for weak-label environments
  • operational threshold selection rather than arbitrary 0.50 cutoffs
  • transaction explainability via reason codes
  • transparent rules layer alongside machine learning

While this is a portfolio project rather than a regulated production model, the structure reflects concepts that are relevant to global financial fraud teams, including model monitoring, behavioural analysis, alert triage, and explainability.

Potential extensions

A future version could add:

  • PSI and drift monitoring
  • monthly performance backtesting
  • SHAP-based local explanations
  • challenger models such as Random Forest or XGBoost
  • API deployment for real-time scoring
  • Docker packaging
  • unit tests and CI workflow

Author

Omotayo Owolabi
Financial Analysis | Risk Analytics | Behavioural Modelling | Fraud Detection

License

This project is released under the MIT License.
