This repository contains tools and techniques for text summarization. It leverages advanced natural language processing (NLP) methods to generate concise and meaningful summaries of textual content. The project is designed for researchers, developers, and enthusiasts looking to experiment with or implement text summarization models.
The dataset used for training and evaluation is located in the data directory.
To set up the project environment, follow these steps:
- Clone the repository:
    git clone https://github.com/farukaplan/Text-Summarization
    cd Text-Summarization
- Create a virtual environment (optional but recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the required packages:
    pip install -r requirements.txt
The main implementation is provided in the Jupyter Notebook text_summarization.ipynb. To run the notebook:
- Ensure Jupyter is installed:
    pip install jupyter
- Launch Jupyter Notebook:
    jupyter notebook
- Open and execute the text_summarization.ipynb notebook to train and evaluate the model.
- Note: this project was developed on Google Colab, so the file path to the dataset may differ on your machine. Adjust it accordingly.
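One way to handle the path difference between Google Colab and a local checkout is to resolve the dataset directory in one place at the top of the notebook. The sketch below is only an illustration: the function name `resolve_data_dir`, the environment variable `TEXT_SUM_DATA_DIR`, and both default paths are assumptions, not names taken from the notebook.

```python
import os
from pathlib import Path


def resolve_data_dir(colab_dir="/content/Text-Summarization/data", local_dir="data"):
    """Return the dataset directory, preferring an explicit override.

    Both default paths are assumptions for illustration; change them to
    wherever the data directory actually lives in your setup.
    """
    # Hypothetical environment variable that lets the user force a path.
    override = os.environ.get("TEXT_SUM_DATA_DIR")
    if override:
        return Path(override)
    # Colab sessions expose a /content directory; use its presence as a hint.
    if Path("/content").is_dir():
        return Path(colab_dir)
    # Otherwise assume the notebook runs from the repository root.
    return Path(local_dir)


data_dir = resolve_data_dir()
print(f"Using dataset directory: {data_dir}")
```

With a helper like this, the notebook's data-loading cells can build file paths from `data_dir` instead of hard-coding a Colab-specific location.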
The model's performance is evaluated using the ROUGE metric. Detailed results, including training and validation accuracy and loss curves, are documented in the text_summarization_report.pdf file.
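For readers unfamiliar with ROUGE, the following is a minimal sketch of ROUGE-N computed on whitespace tokens. It is not the evaluation code used in the notebook or the report, which may rely on a dedicated library and different tokenization; the function name `rouge_n` and the example sentences are assumptions for illustration.

```python
from collections import Counter


def rouge_n(reference, candidate, n=1):
    """Compute ROUGE-N precision, recall and F1 on lowercased whitespace tokens."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref_counts = ngrams(reference)
    cand_counts = ngrams(candidate)
    # Clipped overlap: an n-gram counts only as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1)
print(scores)
```

Recall measures how much of the reference summary the generated summary covers, while precision measures how much of the generated summary appears in the reference. Setting `n=2` gives a ROUGE-2 style score over bigrams.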
Contributions to enhance the project are welcome. To contribute:
- Fork the repository.
- Create a new branch.
- Make your changes.
- Submit a pull request.
This project was developed by Faruk Kaplan and Mert Altekin.
If you prefer a video walkthrough of the code, see this YouTube video: https://youtu.be/o95-X_zDRkU