Dodd-Frank Act Summarization Project

This project implements three text summarization methods and applies them to the Dodd-Frank Wall Street Reform and Consumer Protection Act. It is aimed at senior analysts and experts who need technical summaries that lose as little context and detail as possible.

🎯 Project Goals

Generate high-quality summaries with minimal loss of context and detail
Keep summaries concise without dropping essential provisions
Maintain readability and natural flow
Utilize technical language appropriate for senior analysts

📋 Features

Summarization Methods Implemented

Hierarchical Summarization - Structures summaries by titles and sections, maintaining the original document organization
Hybrid Extractive-Abstractive Summarization - Combines relevance ranking with abstractive summarization for efficiency
Agglomerative Clustering Summarization - Uses clustering techniques to group related content before summarization
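
As a rough illustration of the hierarchical method, the sketch below splits the Act's text on title and section headings, summarizes each section, and then summarizes the section summaries for each title. The heading patterns (`TITLE_RE`, `SECTION_RE`), the helper names, and the `summarize` callable are assumptions made for this example. They are not taken from the notebook, and the real patterns depend on how the PDF text was extracted.

```python
import re
from typing import Callable, Dict

# Assumed heading shapes, e.g. "TITLE I ..." and "SEC. 101. ..." at line start.
TITLE_RE = re.compile(r"^TITLE\s+([IVXLC]+)\b.*$", re.MULTILINE)
SECTION_RE = re.compile(r"^SEC\.\s+(\d+[A-Z]?)\..*$", re.MULTILINE)


def split_on(pattern: re.Pattern, text: str) -> Dict[str, str]:
    """Map each heading line matched by `pattern` to the text that follows it."""
    matches = list(pattern.finditer(text))
    parts = {}
    for i, m in enumerate(matches):
        # A part runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parts[m.group(0).strip()] = text[m.end():end].strip()
    return parts


def hierarchical_summary(text: str, summarize: Callable[[str], str]) -> Dict[str, str]:
    """Summarize each section, then summarize the section summaries per title."""
    result = {}
    for title, body in split_on(TITLE_RE, text).items():
        sections = split_on(SECTION_RE, body)
        # Fall back to the whole title body when no section headings are found.
        pieces = [summarize(s) for s in sections.values()] or [summarize(body)]
        result[title] = summarize("\n".join(pieces))
    return result
```

Here `summarize` stands in for whatever LLM call produces a summary. Passing it as a parameter lets the splitting logic be checked with a trivial function before any API is involved.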

Evaluation Metrics

Readability Assessment: Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Index
Technical Complexity: Weighted scoring based on financial and legal terminology
Coverage Analysis: BERTScore evaluation for precision, recall, and F1 scores
Conciseness Measurement: Word count analysis
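
To show how the three readability scores are computed, here is a minimal standard-library sketch of the published formulas. The syllable counter is a crude vowel-group heuristic and the sentence splitter is naive (it will split on abbreviations such as "SEC."), so scores will differ somewhat from those of a dedicated readability library. The function names are hypothetical and not taken from the notebook.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels, with a floor of one syllable."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text: str) -> dict:
    """Compute Flesch Reading Ease, Flesch-Kincaid Grade Level and Gunning Fog."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    wps = len(words) / len(sentences)   # average words per sentence
    spw = sum(syllables) / len(words)   # average syllables per word
    # Gunning Fog treats words of three or more syllables as "complex".
    complex_ratio = sum(1 for n in syllables if n >= 3) / len(words)
    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * complex_ratio),
    }
```

Lower Flesch Reading Ease and higher grade-level scores indicate denser text, which is expected for summaries written for a specialist audience.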

🚀 Quick Start

Prerequisites

Python 3.8 or higher
Azure OpenAI API access
Git

Installation

Clone the repository

```
git clone <your-repo-url>
cd summarizing_dodd_frank_project
```

Create a virtual environment

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies
```
pip install -r requirements.txt
```

Set up environment variables

```
cp .env.template .env
# Edit .env with your Azure OpenAI credentials
```

Download the Dodd-Frank Act PDF
- Place the PDF file in the data/ directory as DODD_FRANK.pdf
- You can download it from: Congress.gov

Running the Project

Open the Jupyter notebook

```
jupyter notebook "summarize_dodd (1).ipynb"
```

Execute cells in order to:
- Clean and preprocess the PDF text
- Run different summarization methods
- Evaluate summary quality
- Generate comparison visualizations
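
The cleaning step might look something like the sketch below, which assumes the PDF has already been extracted to a plain string by some other tool. The function name `clean_pdf_text` and the specific rules are illustrative assumptions, not the notebook's actual preprocessing.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Normalize text extracted from a PDF before chunking."""
    # Normalize line endings; form feeds usually mark page breaks.
    text = raw.replace("\r\n", "\n").replace("\x0c", "\n")
    # Drop lines that contain only a page number.
    text = re.sub(r"(?m)^[ \t]*\d+[ \t]*$\n?", "", text)
    # Rejoin words hyphenated across a line break: "regula-\ntion" -> "regulation".
    # Limited to lowercase letters so that names like "Dodd-\nFrank" are not fused;
    # a genuinely hyphenated lowercase word split at the hyphen would still be joined.
    text = re.sub(r"([a-z])-\n([a-z])", r"\1\2", text)
    # Turn single newlines inside a paragraph into spaces; keep blank lines.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs, then excess blank lines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Running headers and footers vary between PDF renderings, so any rule for stripping them would need to be tuned against the actual extracted text.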

📁 Project Structure

```
summarizing_dodd_frank_project/
├── data/
│   └── DODD_FRANK.pdf          # Source PDF document
├── generated_summary_examples/  # Example outputs from different methods
├── .env.template               # Environment variables template
├── .gitignore                  # Git ignore rules
├── requirements.txt           # Python dependencies
├── README.md                  # This file
└── summarize_dodd (1).ipynb  # Main analysis notebook
```

🔧 Configuration

Environment Variables

Create a .env file with the following variables:

```
AZURE_OPENAI_API_KEY=your_api_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=chat
```
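
A small check like the following can catch a missing or empty variable before the first API call fails. It is a sketch that assumes the values from `.env` have already been loaded into the process environment (for example by a dotenv loader). The helper name `load_azure_settings` is hypothetical.

```python
import os

# The four variables listed above.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
)


def load_azure_settings(env=None) -> dict:
    """Return the Azure OpenAI settings, raising if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_azure_settings()` near the top of the notebook reports every missing variable at once and names them without printing any secret values.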

Customization Options

Summary Length: Adjust top_n parameter in hybrid method
Clustering: Modify num_clusters in agglomerative clustering
Technical Terms: Update the technical_terms dictionary for complexity scoring

📊 Results and Evaluation

The notebook scores each summarization approach on the evaluation metrics listed under Features and compares the results:

Hierarchical Method: Best structure preservation, highest detail retention
Hybrid Method: Most efficient, good balance of speed and quality
Agglomerative Clustering: Good thematic organization, moderate efficiency

🔒 Security Notes

Never commit API keys to version control
Use environment variables or .env files for sensitive configuration
The .gitignore file excludes sensitive files and temporary outputs

🙏 Acknowledgments

Dodd-Frank Act source: Congress.gov
LangChain framework for LLM integration
Azure OpenAI for language model access
Various Python libraries for NLP and machine learning

📞 Support

For questions or issues, please open an issue in the GitHub repository.

Note: This project is designed for educational and research purposes. Ensure compliance with all applicable terms of service when using external APIs and data sources.
