data-preprocessing

Here are 45 public repositories matching this topic...

data-prep-kit / data-prep-kit

Open source project for data preparation for GenAI applications

python data spark malware code-quality data-preprocessing ray data-preparation deduplication data-prep finetuning data-preprocessing-pipelines datacuration large-language-models llm llmapps large-scale-data-processing datarecipes

Updated Mar 13, 2026
HTML

danielhanchen / sciblox

sciblox - Easier Data Science and Machine Learning

python data-science machine-learning data-mining sklearn data-visualization imputation data-analysis data-preprocessing boosting

Updated Jul 28, 2017
HTML

sharmaroshan / Numpy-and-Pandas

NumPy and pandas are among the most important building blocks for getting started in Data Science, Analytics, Machine Learning, Business Intelligence, and Business Analytics. This tutorial aims to help beginners learn the core concepts of NumPy and pandas and get started with Machine Learning and Data Science.

data-science machine-learning numpy pandas feature-extraction data-analysis data-preprocessing aggregation feature-engineering dataframe pandas-profiling

Updated Apr 12, 2020
HTML

karamolegkos / EverAnalyzer

EverAnalyzer is my thesis in the Department of Digital Systems of the University of Piraeus. EverAnalyzer is a platform for collecting, preprocessing, processing and analyzing Big Data from the Twitter platform.

java data-science big-data spark mongodb hadoop jsp mahout data-analytics data-collection data-preprocessing data-processing hadoop-mapreduce sparkmllib everanalyzer

Updated Sep 2, 2022
HTML

MezbanS / Real-Estate.

This project creates a statistical model to predict demand for loans in each region of the USA based on monthly family income and rental costs. The results are displayed on a dashboard that is updated periodically as new data is retrieved.

exploratory-data-analysis data-preprocessing data-modeling data-reporting

Updated Oct 21, 2023
HTML

aenni0409 / DataScienceJob

My side project about data scientist jobs

python text-mining plotly data-wrangling data-preprocessing

Updated Jan 28, 2019
HTML

MJanbandhu / PortugeseBankMarketingProject

The data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

data-science machine-learning-algorithms data-preprocessing bank-marketing-analysis data-science-projects data-encoding mohit-janbandhu

Updated Mar 2, 2024
HTML

UzoigweC / EasyVisa

Model to facilitate visa processing and approvals

eda xgboost adaboost data-preprocessing gradient-boosting gridsearchcv business-insights stacking-classifier bagging-and-random-forest

Updated Mar 10, 2024
HTML

akhyear / house_price_prediction

This project explores a Kaggle dataset to build predictive models for house prices. Includes data cleaning, EDA, feature engineering and model training using ensemble methods.

data-science machine-learning evaluation eda model-selection ensemble-learning data-preprocessing feature-engineering gradient-boosting

Updated Feb 9, 2026
HTML

TamerDotWork / vesper

Agentic data intelligence tool using LangChain & Pandas for automated dataset cleaning, governance, and quality analysis.

python machine-learning automation etl ml pandas data-analysis data-preprocessing data-preparation data-cleaning data-governance etl-automation ai-assistant langchain data-cleaning-and-preprocessing agentic-tool-platform dataset-quality agentic-tool ai-assistant-offline

Updated Jan 18, 2026
HTML

digital-mila / WineQuality

Final project for DSCI 100: Developed a KNN classification model in R to predict wine quality using physicochemical properties. Conducted data preprocessing, feature selection, and cross-validation to evaluate model performance.

data-science machine-learning r cross-validation feature-selection classification data-analysis academic-project data-preprocessing wine-quality knn-model physicochemical-analysis

Updated Jan 21, 2025
HTML

DerJator / IntroML

Exercises for the lecture "Introduction to ML". Focus is on data preprocessing and simple machine learning algorithms.

machine-learning data-preprocessing

Updated Jul 30, 2024
HTML

EricaYanoshak / AI-Purchase-Behavior-Project

This project focuses on predicting customer purchase behavior using machine learning models, with an emphasis on feature importance.

machine-learning logistic-regression data-preprocessing predictive-modeling decision-tree-classifier binary-classification smote random-forest-classifier customer-segmentation marketing-analytics stacking-ensemble model-optimization model-implimentation

Updated Dec 25, 2024
HTML

HariprasadManimozhi / data-preparation

Data preparation on raw data using Python

python data-preprocessing

Updated May 13, 2020
HTML

Wolverick1996 / MSc-Thesis

equality data-analysis data-preprocessing equity fairness gender-gap

Updated Oct 5, 2021
HTML

Mohansharmila / Office-furniture-analysis-using-Rstudio

In this project, two clustering approaches are used: hierarchical clustering and K-means clustering. Both are unsupervised learning techniques that group related data points showing similar behaviour in the dataset, without reference to an outcome variable.

html algorithms rstudio data-visualization data-analysis data-preprocessing clustering-algorithm k-means-algorithm

Updated Oct 29, 2024
HTML

kamecon / TFM_Kschool

Master's thesis (Trabajo de Fin de Máster) for KSchool

r reporting python3 data-wrangling data-preprocessing factor-analysis logit

Updated Sep 4, 2017
HTML

vl831227 / EasyVisa

Ensemble Techniques: Analyze visa applicant data, build a predictive model to facilitate the visa approval process, and, based on the factors that most significantly influence visa status, recommend suitable profiles for applicants whose visas should be certified or denied.

eda data-preprocessing business-insights ensemble-techniques stacking-classifier customer-profiling boosting-classifier-adaboost boosting-classifier-gradient-boosting boosting-classifier-xgboost hyperparameter-tuning-using-gridsearchcv

Updated Jun 20, 2025
HTML

Omaewayoshiekinoroyo / LeafNet_Final

This is the final-project repository for the LeafNet application, built by Team Ampera (Asimo class, Group 3) in the Artificial Intelligence Mastery Program organized by Orbit Future Academy.

javascript css python training html php data-science machine-learning front-end computer-vision back-end python3 artificial-intelligence transfer-learning data-preprocessing performance-evaluation model-deployment

Updated Jun 5, 2023
HTML

Srking501 / futurelearn_mooc

Summative coursework for CSC8631 Data Management and Exploratory Data Analysis

data-science data-mining deployment exploratory-data-analysis eda data-visualization data-preprocessing crisp-dm

Updated Nov 18, 2023
HTML
