The project is a data-driven web application designed to assist users in exploring rental property options based on their preferences. It integrates data filtering, scoring, machine learning predictions, and recommendation systems to create a comprehensive solution for rental property searches. Users can filter apartments by attributes such as the number of bedrooms, bathrooms, and price range. They can also assign weights to these criteria to prioritize results.
This application is valuable for potential renters, real estate agents, and property managers, offering a streamlined platform to evaluate properties based on personalized preferences.
The application uses a dataset of rental properties (10k_data.pickle) containing attributes such as:
- Property details: Title, description, and address.
- Attributes: Number of bedrooms, bathrooms, square footage, price, and amenities.
- Location data: City, state, latitude, and longitude.
- Additional information: Pet policies, photo availability, and source.
The dataset was preprocessed to clean and validate entries:
- Removed duplicate and invalid data.
- Standardized columns (e.g., handling missing values and creating additional columns like half_bathrooms).
- Ensured proper formatting of location fields.
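The cleaning steps above could look roughly like the following pandas sketch. The column names (price, bathrooms, city, state), the rule that non-positive prices are invalid, the default of 1.0 for missing bathroom counts, and the reading of half_bathrooms as "the bathroom count has a fractional part" are all assumptions made for illustration; the project's actual rules are not documented here.

```python
import pandas as pd


def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate and invalid rows, fill gaps, and tidy location fields."""
    out = df.drop_duplicates()
    # Assumed validity rule: a listing needs a positive price.
    out = out[out["price"] > 0].copy()
    # Assumed default: treat a missing bathroom count as one bathroom.
    out["bathrooms"] = out["bathrooms"].fillna(1.0)
    # Assumed meaning of half_bathrooms: 1 when the count is fractional (e.g. 1.5).
    out["half_bathrooms"] = (out["bathrooms"] % 1 > 0).astype(int)
    # Normalize location text so equal places compare equal.
    out["city"] = out["city"].str.strip().str.title()
    out["state"] = out["state"].str.strip().str.upper()
    return out.reset_index(drop=True)


# Tiny hand-made sample (not taken from the real dataset).
raw = pd.DataFrame({
    "price": [1200, 1200, -5, 900, 1500],
    "bathrooms": [1.5, 1.5, 1.0, None, 2.0],
    "city": [" austin", " austin", "Dallas", "houston ", "Austin"],
    "state": ["tx", "tx", "TX", "tx", "TX"],
})
cleaned = clean_listings(raw)
```

In this sample the second row is an exact duplicate and the third has a negative price, so three rows survive.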
The application is driven by multiple algorithms:
- Data Filtering:
- Filters apartments based on user-selected criteria, such as state, price range, and ratings for bathrooms and bedrooms.
- Scoring System:
- Scores apartments based on weighted criteria using a custom scoring algorithm (ScoreDistribution).
- Sorts properties based on their relevance to user preferences.
- Machine Learning Prediction:
- A Random Forest Regressor predicts rental prices based on attributes such as square footage, number of bedrooms and bathrooms, and city.
- Recommendations:
- Utilizes PCA and pairwise distance calculations to find similar apartments based on top-ranked properties.
- Search Functionality:
- Uses text-based search (Simple_Search) to match user queries with apartment details, leveraging cosine similarity for relevance scoring.
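To illustrate how filtering and weighted scoring can fit together, here is a minimal sketch. It is not the project's ScoreDistribution implementation, whose internals are not described here. The function names filter_listings and weighted_score, the column names, and the linear "closeness to target" formula are hypothetical choices for the example.

```python
import numpy as np
import pandas as pd


def filter_listings(df, state, min_price, max_price):
    """Keep listings in one state whose price falls in [min_price, max_price]."""
    mask = (df["state"] == state) & df["price"].between(min_price, max_price)
    return df[mask].copy()


def weighted_score(df, targets, weights):
    """Return a score in [0, 1] per row; 1 means every criterion hits its target."""
    total_weight = sum(weights.values())
    score = np.zeros(len(df))
    for col, weight in weights.items():
        spread = df[col].max() - df[col].min()
        if spread == 0:
            # Every row has the same value, so the column cannot separate them.
            closeness = np.ones(len(df))
        else:
            # Assumed formula: closeness falls linearly with distance from the target.
            closeness = (1 - (df[col] - targets[col]).abs() / spread).clip(0, 1).to_numpy()
        score += (weight / total_weight) * closeness
    return pd.Series(score, index=df.index)


# Hand-made sample listings (hypothetical values).
listings = pd.DataFrame({
    "state": ["TX", "TX", "TX", "CA"],
    "price": [1000, 1500, 3000, 1200],
    "bedrooms": [1, 2, 3, 2],
    "bathrooms": [1.0, 2.0, 2.0, 1.0],
})
candidates = filter_listings(listings, "TX", 800, 3500)
scores = weighted_score(
    candidates,
    targets={"bedrooms": 2, "bathrooms": 2},
    weights={"bedrooms": 3, "bathrooms": 1},
)
ranked = candidates.assign(score=scores).sort_values("score", ascending=False)
```

Dividing each weight by the total keeps the score in a fixed range, so changing one weight shifts the ranking without changing the scale.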
- Streamlit:
- Builds the web interface for user interaction.
- Displays apartment details, visualizations, and predictions.
- Backblaze:
- Stores and retrieves large data files and machine learning models.
- Pandas:
- Handles data manipulation, cleaning, and transformation.
- Scikit-learn:
- Implements machine learning algorithms, including Random Forest and PCA.
- Numpy:
- Supports mathematical and numerical operations, especially in scoring and pairwise calculations.
- Pickle:
- Serializes and deserializes the cleaned dataset and trained machine learning models for faster runtime performance.
- CountVectorizer:
- Transforms text data into numerical vectors for text-based search.
- Cosine Similarity:
- Evaluates the similarity between user queries and property descriptions.
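The CountVectorizer and cosine similarity pairing listed above can be sketched as follows. This is an illustration rather than the project's Simple_Search code. The function name search, the use of English stop words, and the sample descriptions are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def search(query, documents, top_k=3):
    """Return (document index, similarity) pairs for the best-matching documents."""
    # Build a bag-of-words vocabulary from the documents only.
    vectorizer = CountVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(documents)
    # Project the query into the same vocabulary; unseen words are ignored.
    query_vec = vectorizer.transform([query])
    sims = cosine_similarity(query_vec, doc_matrix).ravel()
    order = sims.argsort()[::-1][:top_k]
    # Drop documents that share no vocabulary with the query.
    return [(int(i), float(sims[i])) for i in order if sims[i] > 0]


# Hypothetical listing descriptions.
descriptions = [
    "Sunny two bedroom apartment with pool and gym",
    "Cozy studio near downtown, pets allowed",
    "Spacious loft with gym access",
]
hits = search("studio downtown pets", descriptions)
```

Because the vectors hold raw word counts, a long description is not favored over a short one: cosine similarity compares direction, not length.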
- Data Integrity:
- The dataset must be free of bias or inaccuracies to ensure fair and accurate property evaluations.
- Cleaning and validation processes mitigate potential data quality issues.
- Privacy:
- The dataset does not contain personally identifiable information (PII).
- Accessibility:
- The interface is designed to be intuitive and accessible to a wide range of users.
- Prediction Limitations:
- Predictions and recommendations are based on historical data and may not reflect future market trends. Clear disclaimers are provided.
- Bias in Algorithms:
- Machine learning models may inherit biases present in the dataset. Continuous evaluation and retraining can mitigate this risk.
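One way to make the evaluation step concrete is to score the price model on held-out listings each time it is retrained and compare it against a trivial baseline. The sketch below uses synthetic data generated in place of the real dataset; the column names, the price formula, and the choice of mean absolute error are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; values are generated, not real listings.
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "square_feet": rng.integers(400, 2000, n),
    "bedrooms": rng.integers(1, 5, n),
    "bathrooms": rng.integers(1, 4, n),
    "city": rng.choice(["Austin", "Dallas", "Houston"], n),
})
# Assumed price relationship, used only so the model has a signal to learn.
data["price"] = (
    500 + 0.8 * data["square_feet"] + 150 * data["bedrooms"] + rng.normal(0, 50, n)
)

# One-hot encode the city so the forest receives numeric inputs only.
X = pd.get_dummies(
    data[["square_feet", "bedrooms", "bathrooms", "city"]], columns=["city"]
)
y = data["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Compare against always predicting the mean training price.
mae = mean_absolute_error(y_test, model.predict(X_test))
baseline_mae = mean_absolute_error(y_test, np.full(len(y_test), y_train.mean()))
```

Tracking the same held-out metric across retraining runs, ideally broken down by city or price band, gives an early signal when the model starts to treat some groups of listings worse than others.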
- Renters:
- Users looking for apartments that meet their budget and preferences.
- Real Estate Agents:
- Professionals seeking to streamline property recommendations for clients.
- Property Managers:
- Managers who use the platform to identify market trends and compare property prices.
- Researchers:
- Analysts who study rental trends and user preferences.
- Filtered Search:
- Real-time filtering based on user preferences.
- Custom Scoring:
- Allows users to rank properties based on their unique priorities.
- Price Prediction:
- A machine learning model estimates rental prices for listings.
- Recommendations:
- Provides suggestions for similar apartments based on user-selected top properties.
- Interactive Visualizations:
- Includes scatter plots for geographical distribution and box plots for price vs. amenities.
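The similar-apartment recommendations can be sketched with PCA followed by pairwise distances. This is a minimal illustration, not the project's code. The function name recommend_similar, the standardization step, the two-component projection, and the sample feature rows are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import StandardScaler


def recommend_similar(features, top_idx, n_components=2, k=2):
    """Return indices of the k listings closest to any of the seed listings."""
    # Standardize first so price does not dominate the smaller-valued columns.
    scaled = StandardScaler().fit_transform(features)
    reduced = PCA(n_components=n_components).fit_transform(scaled)
    # Distance from every seed listing to every listing in the reduced space.
    dists = pairwise_distances(reduced[top_idx], reduced)
    nearest = dists.min(axis=0)
    # Exclude the seeds themselves from the recommendations.
    nearest[top_idx] = np.inf
    return np.argsort(nearest)[:k].tolist()


# Hypothetical rows: price, square feet, bedrooms.
features = np.array([
    [1000, 600, 1],
    [1050, 620, 1],
    [2500, 1400, 3],
    [2600, 1450, 3],
    [1100, 650, 1],
    [4000, 2200, 4],
], dtype=float)
similar = recommend_similar(features, top_idx=[0], k=2)
```

Here listing 0 is the seed, and the two other small one-bedroom listings (indices 1 and 4) come back as its neighbors.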