This notebook explores machine learning techniques for predicting credit card payment defaults, with a focus on k-Nearest Neighbors (kNN) and neural networks. It uses the "Default of Credit Card Clients" dataset, which contains records for 30,000 clients. The notebook relies on the following libraries:
- Pandas for data manipulation
- NumPy for numerical operations
- Matplotlib and Seaborn for data visualization
- scikit-learn for machine learning tasks
- Keras for deep learning
Data Exploration
- Data summary statistics and unique counts for columns.
- Exploration of default rates among different genders and marital statuses.
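The exploration steps above can be sketched with pandas roughly as follows. This is a minimal illustration on a tiny synthetic table, not the notebook's actual code: the column names (`SEX`, `MARRIAGE`, `AGE`, `default`) and all values are assumptions made for the example.

```python
import pandas as pd

# Tiny synthetic stand-in for the dataset. Column names and values are
# assumptions for illustration only.
df = pd.DataFrame({
    "SEX": [1, 2, 2, 1, 2, 1, 2, 2],
    "MARRIAGE": [1, 2, 1, 2, 2, 1, 3, 1],
    "AGE": [24, 35, 41, 29, 52, 33, 46, 38],
    "default": [1, 0, 0, 1, 0, 0, 1, 0],
})

summary = df.describe()        # count, mean, std, min, quartiles, max per column
unique_counts = df.nunique()   # number of distinct values per column

# The mean of a 0/1 target within a group is that group's default rate.
rate_by_sex = df.groupby("SEX")["default"].mean()
rate_by_marriage = df.groupby("MARRIAGE")["default"].mean()

print(unique_counts)
print(rate_by_sex)
print(rate_by_marriage)
```

Taking the mean of a binary target per group is a compact way to get default rates without counting defaults and totals separately.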
Data Visualization
- Boxplots and histograms to understand the distribution of the variables.
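A histogram and a boxplot of the kind described above might look like the sketch below. It uses plain Matplotlib on synthetic data. The column names (`AGE`, `LIMIT_BAL`, `default`) and the choice of which variables to plot are assumptions, since this summary does not say which variables the notebook visualizes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in data; column names are assumed for illustration.
df = pd.DataFrame({
    "AGE": rng.integers(21, 70, size=200),
    "LIMIT_BAL": rng.integers(1, 50, size=200) * 10_000,
    "default": rng.integers(0, 2, size=200),
})

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: overall shape of one variable's distribution.
ax_hist.hist(df["AGE"].to_numpy(), bins=20)
ax_hist.set_xlabel("AGE")
ax_hist.set_ylabel("count")

# Boxplot: compare one variable's spread between the two target classes.
groups = [df.loc[df["default"] == c, "LIMIT_BAL"].to_numpy() for c in (0, 1)]
ax_box.boxplot(groups)
ax_box.set_xticklabels(["no default", "default"])
ax_box.set_ylabel("LIMIT_BAL")

fig.tight_layout()
```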
k-Nearest Neighbors (kNN) Classifier
- A kNN model is trained to predict 'default' status.
- The value of k is tuned on a validation set; the best value found is 4.
Model Evaluation
- The models are evaluated with ROC-AUC, accuracy, and a confusion matrix.
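For reference, the three metrics can be computed with scikit-learn as in the sketch below. The labels and scores are made-up values chosen so the results are easy to check by hand, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Made-up ground truth and predicted probabilities of default.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.6, 0.3, 0.2])

# Hard predictions from an assumed 0.5 threshold.
y_pred = (y_score >= 0.5).astype(int)

auc = roc_auc_score(y_true, y_score)      # uses the scores, not the hard labels
accuracy = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)     # rows: true class, columns: predicted class

# For binary labels the matrix unpacks as TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
true_positive_rate = tp / (tp + fn)

print(auc, accuracy, true_positive_rate)
print(cm)
```

ROC-AUC is computed from the continuous scores, while accuracy and the confusion matrix depend on the chosen threshold, so the two kinds of metric can disagree when comparing models.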
kNN with Segmentation
- k-Means clustering is applied to segment the data based on age.
- kNN models are built for each segment.
Keras Neural Network Model
- A neural network is trained with Keras; it outperforms the kNN models.
We use the kNN algorithm for classification. The dataset is split 70/30 into training and validation sets. We choose k=4 because it gives the best results on the validation set.
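A minimal sketch of this procedure is shown below, assuming scikit-learn. The data is synthetic (`make_classification`), and the feature scaling, the candidate range for k, and the use of validation accuracy as the selection metric are all assumptions, since they are not specified here. On this synthetic data the selected k will not necessarily be 4.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the credit data (assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.78, 0.22], random_state=0
)

# 70/30 split into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# kNN is distance-based, so features are standardized (assumed step).
# The scaler is fit on the training set only to avoid leaking validation data.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)

# Try a range of k values and record validation accuracy for each.
val_accuracy = {}
for k in range(1, 21):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train_s, y_train)
    val_accuracy[k] = model.score(X_val_s, y_val)

best_k = max(val_accuracy, key=val_accuracy.get)
print(best_k, val_accuracy[best_k])
```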
We apply k-Means clustering to segment the population by age, then build a separate kNN model for each segment. Each segment's model performs similarly, but with a higher true positive rate than the non-segmented kNN model.
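The segmentation approach could be sketched as follows. This is an illustration on synthetic data, not the notebook's code: the number of clusters (3), the feature layout with age in the first column, and the reuse of k=4 in every segment are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 600
# Synthetic data: column 0 is age, the rest are generic features (assumption).
age = rng.integers(21, 70, size=n).astype(float)
other = rng.normal(size=(n, 3))
X = np.column_stack([age, other])
y = (other[:, 0] + rng.normal(scale=0.5, size=n) > 0.8).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Cluster on the age column only. n_clusters=3 is an assumed value.
n_segments = 3
kmeans = KMeans(n_clusters=n_segments, n_init=10, random_state=0)
seg_train = kmeans.fit_predict(X_train[:, [0]])
seg_val = kmeans.predict(X_val[:, [0]])

# Fit one kNN model per age segment and record its true positive rate
# (recall on the positive class) on that segment's validation rows.
segment_models = {}
segment_tpr = {}
for seg in range(n_segments):
    tr = seg_train == seg
    va = seg_val == seg
    model = KNeighborsClassifier(n_neighbors=4).fit(X_train[tr], y_train[tr])
    segment_models[seg] = model
    segment_tpr[seg] = recall_score(y_val[va], model.predict(X_val[va]), zero_division=0)

print(segment_tpr)
```

Fitting the clustering on the training rows and only calling `predict` on the validation rows keeps the segment boundaries independent of the validation data.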
A neural network is built with Keras and compared against the kNN models. It performs best of all the models.
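The network's architecture is not described here, so the following is only a sketch of what a small Keras binary classifier might look like. The layer sizes, optimizer, epoch count, and synthetic data are all assumptions, and it uses the Keras API bundled with TensorFlow.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
# Synthetic stand-in features and a binary target (assumption).
X = rng.normal(size=(400, 10)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# Hypothetical architecture: two small hidden layers and a sigmoid output
# that yields a probability for the positive (default) class.
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# validation_split=0.3 holds out the last 30% of rows, mirroring a 70/30 split.
history = model.fit(
    X, y, validation_split=0.3, epochs=5, batch_size=32, verbose=0
)

probs = model.predict(X, verbose=0)  # shape (n_samples, 1), values in [0, 1]
```

For a fair comparison with the kNN models, the predicted probabilities would be scored with the same metrics on the same validation rows.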
