Commit 829a07b

Update LLM.txt
1 parent 2825828 commit 829a07b

1 file changed, 118 additions and 128 deletions

llm.txt

Lines changed: 118 additions & 128 deletions
@@ -1,143 +1,133 @@
11
# LLM.txt - Repository Status and Documentation
22

33
## Repository Overview
4-
**Repository:** mastering_llms_workshop_dhs2025
5-
**Purpose:** Full Day Workshop on Mastering LLMs
4+
**Repository:** mastering_llms_workshop_dhs2025
5+
**Purpose:** Full Day Workshop on Mastering LLMs: Training, Fine-Tuning, and Best Practices
66
**License:** GNU General Public License v3.0
77

88
## Current Repository Structure
99
```
1010
mastering_llms_workshop_dhs2025/
11-
├── README.md # Basic repository description
11+
├── README.md # Main repository overview and setup
1212
├── LICENSE # GPL v3.0 license
13-
├── .gitignore # Comprehensive Python gitignore
1413
├── copilot-instructions.md # Development guidelines and best practices
15-
└── llm.txt # This documentation file
14+
├── llm.txt # This documentation file
15+
└── docs/
16+
├── assets/ # Images and supporting files
17+
├── module_01_lm_fundamentals/
18+
│ ├── 01_text_representation.ipynb
19+
│ ├── 02_contextual_embeddings.ipynb
20+
│ └── README.md
21+
├── module_02_llm_building_blocks/
22+
│ ├── 01_transformers.ipynb
23+
│ ├── 02_transformers_pipelines.ipynb
24+
│ ├── 03_training_language_models.ipynb
25+
│ ├── 04_llm_training_and_scaling.ipynb
26+
│ └── README.md
27+
├── module_03_instruction_tuning_and_alignment/
28+
│ ├── 01_instruction_tuning_llama_txt2py.ipynb
29+
│ ├── 02_RLHF_phi2.ipynb
30+
│ ├── 03_zephyr_alignment_dpo.ipynb
31+
│ └── README.md
32+
├── module_04_llm_apps/
33+
│ ├── 01_retrieval_augmented_llm_app.ipynb
34+
│ ├── 02_dspy_demo.ipynb
35+
│ ├── 03_mcp_getting_started.ipynb
36+
│ ├── app.py
37+
│ ├── constants.py
38+
│ ├── mcp_chatbot.py
39+
│ ├── openai_mcp_handler.py
40+
│ ├── README.md
41+
│ └── ...
42+
└── ...
1643
```
1744

1845
## Development Setup and Requirements
19-
20-
### Python Requirements
21-
- **Minimum Version:** Python 3.11+
22-
- **Dependency Management:** Poetry (required)
23-
- **Development Environment:** Jupyter Lab (preferred for workshop content)
24-
25-
### Key Technologies and Frameworks
26-
- Poetry for dependency management
27-
- Jupyter Lab for interactive development
28-
- Python 3.11+ for modern language features
29-
30-
### Installation Instructions
31-
Currently, the repository is in the initial setup phase. Once workshop modules are added:
32-
33-
1. Ensure Python 3.11+ is installed
34-
2. Install Poetry if not already available:
35-
```bash
36-
curl -sSL https://install.python-poetry.org | python3 -
37-
```
38-
3. Clone the repository:
39-
```bash
40-
git clone https://github.com/raghavbali/mastering_llms_workshop_dhs2025.git
41-
cd mastering_llms_workshop_dhs2025
42-
```
43-
4. Install dependencies (when pyproject.toml is added):
44-
```bash
45-
poetry install
46-
```
47-
5. Activate the virtual environment:
48-
```bash
49-
poetry shell
50-
```
51-
6. Launch Jupyter Lab:
52-
```bash
53-
jupyter lab
54-
```
55-
56-
## Workshop Structure (Planned)
57-
The workshop will follow a modular structure with the following planned components:
58-
59-
### Module Organization
60-
- **01_foundations/**: Introduction to LLMs and basic concepts
61-
- **02_prompt_engineering/**: Prompt design and optimization techniques
62-
- **03_fine_tuning/**: Model customization and training
63-
- **04_deployment/**: Production deployment considerations
64-
- **utils/**: Shared utilities and helper functions
65-
66-
### File Naming Conventions
67-
- Jupyter notebooks: `##_descriptive_name.ipynb`
68-
- Python modules: `snake_case_naming.py`
69-
- Documentation: `descriptive-name.md`
70-
71-
## Current Status
72-
73-
### Completed Items
74-
- [x] Repository initialization
75-
- [x] License setup (GPL v3.0)
76-
- [x] Comprehensive .gitignore for Python projects
77-
- [x] Basic README.md structure
78-
- [x] Copilot instructions for development guidelines
79-
- [x] Initial llm.txt documentation
80-
81-
### Pending Items
82-
- [ ] Poetry project configuration (pyproject.toml)
83-
- [ ] Workshop module structure creation
84-
- [ ] Jupyter notebook templates
85-
- [ ] Core utility functions
86-
- [ ] Requirements and dependencies definition
87-
- [ ] Detailed README.md with workshop overview
88-
- [ ] Sample datasets and resources
89-
- [ ] Testing framework setup
90-
91-
## Documentation Guidelines
92-
93-
### Maintenance Requirements
94-
- Update this llm.txt file with each significant change
95-
- Keep README.md synchronized with workshop structure
96-
- Document all new dependencies and their purposes
97-
- Include installation and setup instructions
98-
- Track workshop progression and learning objectives
99-
100-
### Content Standards
101-
- Use clear, educational language appropriate for workshop participants
102-
- Include practical examples and hands-on exercises
103-
- Provide both theoretical background and implementation details
104-
- Support multiple skill levels and learning paths
105-
106-
## Resource Requirements
107-
108-
### Computational Needs
109-
- Standard laptop/desktop for basic workshop modules
110-
- GPU access recommended for fine-tuning exercises
111-
- Cloud alternatives for resource-intensive tasks
112-
- Internet connection for accessing pre-trained models
113-
114-
### Software Dependencies
115-
- Python 3.11+ runtime
116-
- Poetry package manager
117-
- Jupyter Lab environment
118-
- Git for version control
119-
- Additional ML/AI libraries (to be specified)
120-
121-
## Contributing Guidelines
122-
123-
### Development Workflow
124-
1. Follow the guidelines in copilot-instructions.md
125-
2. Use Poetry for all dependency management
126-
3. Create meaningful, descriptive filenames
127-
4. Update documentation with code changes
128-
5. Test workshop materials before submitting
129-
6. Keep commits atomic and well-documented
130-
131-
### Quality Standards
132-
- Ensure all code works with Python 3.11+
133-
- Use type hints and proper documentation
134-
- Include practical examples in all workshop modules
135-
- Validate end-to-end workshop experience
136-
- Maintain consistency across all materials
137-
138-
## Contact and Support
139-
For questions about this workshop or repository structure, please refer to the main repository documentation or open an issue on GitHub.
46+
- **Python Version:** 3.11 or above (required for all code and notebooks)
47+
- **Preferred Environment:** Jupyter Lab (for interactive development, tutorials, and prototyping)
48+
- **Dependency Management:** Poetry (planned); for now, install dependencies as needed for each notebook or module
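The version requirement above can be checked before running any notebook. The following is a minimal sketch, not part of the repository: the helper name `meets_minimum` and the constant `MINIMUM_VERSION` are hypothetical, and only the 3.11 threshold comes from the text.

```python
import sys

# Assumption: the threshold mirrors the "3.11 or above" requirement stated above.
MINIMUM_VERSION = (3, 11)


def meets_minimum(version, minimum=MINIMUM_VERSION):
    """Return True when the (major, minor) part of `version` is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


# Report the result for the running interpreter without aborting.
print("Python version OK" if meets_minimum(sys.version_info) else "Python 3.11+ required")
```

Comparing tuples avoids string parsing pitfalls such as "3.9" sorting after "3.11".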
49+
50+
## Key Technologies and Frameworks
51+
- Python 3.11+
52+
- Jupyter Lab
53+
- PyTorch, Hugging Face Transformers, Datasets, Tokenizers
54+
- LangChain, DSPy, ChromaDB, Model Context Protocol (MCP)
55+
- Additional libraries as specified in individual notebooks
56+
57+
## Module-wise Overview
58+
59+
### Module 01: Language Model Fundamentals
60+
- **Notebooks:**
61+
- 01_text_representation.ipynb: Covers tokenization, vectorization, and word embeddings (Word2Vec, FastText)
62+
- 02_contextual_embeddings.ipynb: Contextual embeddings using transformer models (BERT, MiniLM)
63+
- **Learning Objectives:**
64+
- Understand text representation and embedding techniques
65+
- Compare static and contextual embeddings
66+
- Visualize and apply embeddings in NLP tasks
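To make the tokenization and vectorization steps listed above concrete, here is a small sketch using only the standard library. It is an illustration and not the notebook's code: it uses simple count vectors rather than Word2Vec or FastText, and the function names and sample sentences are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine_similarity(a: list[int], b: list[int]) -> float:
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical sample documents, chosen only for illustration.
docs = ["The cat sat on the mat.", "A cat lay on a mat.", "Stocks fell sharply today."]
tokenized = [tokenize(d) for d in docs]
vocabulary = sorted({tok for toks in tokenized for tok in toks})
vectors = [vectorize(toks, vocabulary) for toks in tokenized]

print(cosine_similarity(vectors[0], vectors[1]))  # two cat sentences share words
print(cosine_similarity(vectors[0], vectors[2]))  # no shared words
```

Count vectors are static: a word contributes the same way regardless of its neighbours, which is the limitation contextual embeddings address.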
67+
68+
### Module 02: LLM Building Blocks
69+
- **Notebooks:**
70+
- 01_transformers.ipynb: Transformer architecture and self-attention
71+
- 02_transformers_pipelines.ipynb: Hugging Face pipelines for NLP tasks
72+
- 03_training_language_models.ipynb: Fine-tuning and training LLMs
73+
- 04_llm_training_and_scaling.ipynb: Scaling, optimization, and distributed training
74+
- **Model Card:** codeparrot-ds/README.md (fine-tuned GPT-2)
75+
- **Learning Objectives:**
76+
- Master transformer internals and practical pipelines
77+
- Train and fine-tune LLMs
78+
- Understand scaling and optimization strategies
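As a rough illustration of the self-attention mechanism named above, the sketch below computes scaled dot-product attention in plain Python. It makes a deliberate simplification that is an assumption of this example, not something from the notebooks: queries, keys, and values are all the raw input vectors, with no learned projection matrices, heads, or masking.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product self-attention with queries = keys = values = x."""
    d = len(x[0])
    outputs = []
    for query in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * value[j] for w, value in zip(weights, x)) for j in range(d)])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical 2-dimensional token vectors
print(self_attention(tokens))
```

Each output row is a mixture of all input rows, so every token's new representation depends on the whole sequence.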
79+
80+
### Module 03: Instruction Tuning and Alignment
81+
- **Notebooks:**
82+
- 01_instruction_tuning_llama_txt2py.ipynb: Supervised instruction tuning
83+
- 02_RLHF_phi2.ipynb: RLHF with Phi-2
84+
- 03_zephyr_alignment_dpo.ipynb: DPO and advanced alignment
85+
- **Learning Objectives:**
86+
- Instruction tuning for LLMs
87+
- RLHF and preference optimization
88+
- Direct Preference Optimization (DPO)
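For readers new to DPO, the per-example loss can be written in a few lines. This is a sketch of the standard DPO objective for a single preference pair, assuming the summed log-probabilities of the chosen and rejected responses have already been computed under both the policy and a frozen reference model. The function name, argument names, and the default `beta=0.1` are illustrative choices and are not taken from the notebook.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log(sigmoid(beta * margin difference))."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed in an overflow-safe form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical log-probabilities: the policy favours the chosen answer more than the reference does.
print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

When the policy and the reference agree exactly, the loss equals log 2; it drops below that as the policy shifts probability toward the chosen response relative to the reference.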
89+
90+
### Module 04: LLM Applications
91+
- **Notebooks & Scripts:**
92+
- 01_retrieval_augmented_llm_app.ipynb: Retrieval-Augmented Generation (RAG)
93+
- 02_dspy_demo.ipynb: DSPy for prompt engineering
94+
- 03_mcp_getting_started.ipynb: Model Context Protocol (MCP)
95+
- app.py, mcp_chatbot.py, openai_mcp_handler.py, etc.: Application code
96+
- **Learning Objectives:**
97+
- Build real-world LLM applications (RAG, LangChain, DSPy, MCP)
98+
- Tool/function calling and workflow automation
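The retrieval-augmented pattern mentioned above boils down to two steps: pick the passages most relevant to a question, then place them in the prompt. The sketch below shows that flow with a deliberately crude word-overlap retriever standing in for an embedding search over a vector store. The function names, prompt wording, and sample passages are hypothetical, and no LLM is called.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share and keep the top k."""
    query_tokens = tokenize(query)
    ranked = sorted(documents, key=lambda doc: len(query_tokens & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a prompt that grounds the question in the retrieved passages."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}"


# Hypothetical knowledge base for illustration only.
documents = [
    "DPO aligns a model using preference pairs.",
    "RAG retrieves passages and adds them to the prompt.",
    "Transformers rely on self-attention.",
]
question = "How does RAG use retrieved passages?"
prompt = build_prompt(question, retrieve(question, documents, k=1))
print(prompt)
```

In a fuller application the overlap score would typically be replaced by embedding similarity, and the assembled prompt would be sent to a model.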
99+
100+
## Documentation and Maintenance Standards
101+
- All modules and notebooks are documented with clear markdown cells and outputs
102+
- Each module contains a README.md with:
103+
- Title, description, and table of contents (with links to notebooks)
104+
- Learning objectives and summary
105+
- Attribution and timestamp
106+
- Main README.md provides workshop overview, setup, and navigation
107+
- **Keep llm.txt and all READMEs updated with each significant change**
108+
- Follow copilot-instructions.md for best practices, naming, and structure
109+
110+
## Code Quality and Testing
111+
- All code targets Python 3.11+; type hints are optional
112+
- Notebooks are tested for end-to-end execution
113+
114+
## Resource and Accessibility Guidelines
115+
- Designed for low-resource and cloud environments
116+
- A GPU is recommended for training and fine-tuning, but alternatives are provided
117+
- Content accessible for both beginners and advanced users
118+
- Multiple learning paths and practical exercises included
119+
120+
## Contribution and Version Control
121+
- Make atomic commits and use conventional commit messages
122+
- Update documentation and llm.txt with code changes
123+
- Test all workshop materials before merging
124+
- Keep style and approach consistent across modules
125+
126+
## Learning Resources and References
127+
- See module READMEs and notebooks for links to papers, docs, and external resources
128+
- Workshop website: https://raghavbali.github.io/mastering_llms_workshop/
140129

141130
---
142-
**Last Updated:** Initial creation
143-
**Next Review:** Upon addition of first workshop module
131+
**Last Updated:** August 10, 2025
132+
**Generated by:** GitHub Copilot (LLM)
133+
---
