Cleanly is an interactive Streamlit-based tool that allows you to upload, clean, and analyze CSV files easily. It provides functionalities for data cleaning, outlier removal, data visualization, and exploratory data analysis (EDA).
👉 Try it now: Launch Cleanly App - No installation required!
- 📂 CSV File Upload: Upload CSV files for processing.
- 🧹 Data Cleaning:
- 🗑️ Remove duplicate rows.
- 🩹 Handle missing values (fill with mean or mode based on data type).
- 🚫 Outlier Removal:
- 📊 Based on Z-Score.
- 📈 Based on Interquartile Range (IQR).
- 📏 Data Normalization: Scale numeric values between 0 and 1.
- 🔢 Categorical Column Encoding: Convert categorical columns into numeric values.
- ✏️ Rename and Delete Columns: Modify or delete specific columns.
- 🔍 Row Filtering and Sorting: Filter rows based on specific values and sort data.
- 📊 Data Visualization:
- 📉 Histograms.
- 📊 Bar charts.
- 🟢 Scatter plots.
- 🔗 Correlation matrix with automated insights.
- 📚 Data Grouping: Group data by columns and apply aggregation functions like mean, sum, count, etc.
- 🔎 Enhanced Data Exploration:
- 📋 Data type analysis with first value examples
- 📊 Distribution of column data types visualization
- 🔍 Automatic outlier detection with statistical measurements
- 📈 Top outlier columns visualization
- 💾 Download Processed Data: Download the cleaned and processed CSV file.
Before running the project, ensure you have the following dependencies installed:
- 🐍 Python 3.7 or higher
- 📦 Streamlit
- 📦 Pandas
- 📦 NumPy
- 📦 Seaborn
- 📦 Matplotlib
Click the "Browse files" button to upload a CSV file from your computer.
View the original data, the number of rows and columns, duplicates, and missing values.
Use the dropdown menu to select actions such as cleaning data, removing outliers, normalizing data, etc.
Generate visualizations such as histograms, bar charts, scatter plots, and the correlation matrix to analyze the data.
Once you have finished processing the data, click the download button to save the cleaned CSV file.
- Descriptive Statistics: Generate summary statistics (count, mean, std, min, max, quartiles) for numeric columns.
- Correlation Matrix: Displays the correlation between numeric columns as a heatmap with automated insights.
- Remove Duplicates: Identifies and removes duplicate rows in the dataset.
- Handle Missing Values:
- For numeric columns, missing values are filled with the mean.
- For categorical columns, missing values are filled with the mode.
- Z-Score: Removes rows where numeric values have a Z-Score greater than a threshold (default: 3).
- IQR (Interquartile Range): Removes rows where numeric values fall outside the IQR range.
- Normalize Data: Scale numeric columns to a range between 0 and 1 for better comparability.
- Encode Categorical Columns: Convert text/categorical columns to numeric values for analysis.
- Rename Columns: Change column names for better clarity or consistency.
- Delete Specific Columns: Remove unnecessary columns from the dataset.
- Filter Rows: Filter data based on specific column values with options for different comparison types.
- Sort Data: Order the dataset by any column in ascending or descending order.
- Histograms: Displays the distribution of numeric values.
- Bar Charts: Shows the frequency of categorical values.
- Scatter Plots: Visualizes the relationship between two numeric columns.
- Correlation Matrix: Displays the correlation between numeric columns as a heatmap.
- Group data by a selected column and apply aggregation functions such as
mean,sum,count,max, ormin.
- Download the processed CSV file with all the applied changes.
- 📂 Upload a raw CSV file.
- 🧹 Select "Clean Data" to remove duplicates and handle missing values.
- 🚫 Use "Remove Outliers (Z-Score)" to eliminate outliers.
- 📊 Generate a histogram to analyze the distribution of a numeric column.
- 💾 Download the cleaned file by clicking "Download Cleaned CSV."
Below is a preview of Cleanly's interface. The application provides an intuitive dashboard where you can upload your CSV files, clean your data, and perform exploratory data analysis with just a few clicks.
If you want to contribute to this project, please follow these steps:
- 🍴 Fork the repository.
- 🌿 Create a new branch for your feature (
git checkout -b feature/new-feature). - 💻 Make your changes and commit them (
git commit -m "Added new feature"). - 📤 Push your branch (
git push origin feature/new-feature). - 🔁 Open a Pull Request.
This project is licensed under the MIT License. See the LICENSE file for more details.

