Skip to content

clt449/ClickPredictML

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 

Repository files navigation

ClickPredictML

Predicting Ad Clicks

Project Summary: Data Analysis and Modeling for Click Prediction

Background: Goal is to analyze data related to ad clicks and predict whether a user will click on an ad.

Project Steps:

  1. Data Exploration and Preprocessing:

    • Explored the dataset containing features like 'Daily Time Spent on Site,' 'Age,' 'Area Income,' 'Daily Internet Usage,' and more.
    • Conducted basic data preprocessing tasks such as handling missing values and categorical variables.
  2. Exploratory Data Analysis (EDA):

    • Used various data visualization techniques to understand the relationships between features and the target variable ('Clicked on Ad').
    • Created correlation heatmaps, pair plots, and other visualizations to identify patterns and insights.
  3. Data Modeling:

    • Chose the Logistic Regression and Random Forest models for classification due to their suitability for the problem.
    • Split the dataset into training and testing sets.
    • Utilized preprocessing techniques including standardization and scaling.
  4. Model Evaluation:

    • Trained both models on the training data and evaluated their performance on the test data.
    • Calculated accuracy, precision, recall, and F1-score for both models.
    • Generated ROC curves and calculated AUC to assess model discrimination.
  5. Hyperparameter Tuning:

    • Performed hyperparameter tuning for both Logistic Regression and Random Forest models using GridSearchCV.
    • Identified optimal hyperparameters to improve model performance.
  6. Model Comparison and Selection:

    • Evaluated the performance of both models based on accuracy, precision, recall, and AUC.
    • Considered model interpretability, complexity, and generalization.

Results:

  • Both models demonstrated strong performance, with accuracy scores around 96-99% on the test set.
  • Logistic Regression achieved a slightly higher AUC, indicating better overall discrimination.
  • Random Forest achieved higher training accuracy but exhibited slightly lower precision and recall compared to Logistic Regression.

Recommendation:

  • The choice between Logistic Regression and Random Forest depends on your priorities:
    • If interpretability and simpler deployment are important, consider Logistic Regression due to its competitive performance and high interpretability.
    • If achieving a slightly higher accuracy is a priority, and you are willing to trade off some interpretability, Random Forest is a strong option.

Takeaway: Through this project, you've demonstrated your proficiency in data analysis, machine learning, and model evaluation. You've gained insights into different aspects of model performance and have the flexibility to choose a model based on the specific needs of your future roles. Your ability to explore, preprocess, model, and evaluate data showcases your skills in making data-driven decisions and could help set you apart in your career transition.

About

Predicting Ad Clicks

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors