Open source project for data preparation for GenAI applications
Updated Mar 13, 2026 (HTML)
sciblox - Easier Data Science and Machine Learning
NumPy and pandas are among the most important building blocks for getting started in Data Science, Analytics, Machine Learning, Business Intelligence, and Business Analytics. This tutorial aims to help beginners learn the core concepts of NumPy and pandas and get started with Machine Learning and Data Science.
EverAnalyzer is my thesis project for the Department of Digital Systems at the University of Piraeus. It is a platform for collecting, preprocessing, processing, and analyzing Big Data from the Twitter platform.
This project builds a statistical model to predict loan demand in each region of the USA based on monthly family income and rental costs. The results are displayed on a dashboard that is updated periodically as new data is retrieved.
My side project on data science
The data relates to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).
Model to facilitate visa processing and approvals
This project explores a Kaggle dataset to build predictive models for house prices. Includes data cleaning, EDA, feature engineering and model training using ensemble methods.
Agentic data intelligence tool using LangChain & Pandas for automated dataset cleaning, governance, and quality analysis.
Final project for DSCI 100: Developed a KNN classification model in R to predict wine quality using physicochemical properties. Conducted data preprocessing, feature selection, and cross-validation to evaluate model performance.
Exercises for the lecture "Introduction to ML". The focus is on data preprocessing and simple machine learning algorithms.
This project focuses on predicting customer purchase behavior using machine learning models, with an emphasis on feature importance.
This project uses two clustering approaches: hierarchical clustering and K-means clustering. Both are unsupervised learning techniques that group related data points showing similar behaviour in the dataset, without reference to an outcome variable.
Master's Final Project (Trabajo de Fin de Máster), KSchool
Ensemble Techniques: Analyze visa applicant data, build a predictive model to streamline the visa approval process, and, based on the factors that significantly influence visa status, recommend a suitable profile for applicants whose visas should be certified or denied.
This is the final project repository for the LeafNet application, built by Team Ampera (Asimo class, Group 3) in the Artificial Intelligence Mastery Program organized by Orbit Future Academy.
Summative coursework for CSC8631 Data Management and Exploratory Data Analysis