1 | 1 | # LLM.txt - Repository Status and Documentation |
2 | 2 |
3 | 3 | ## Repository Overview |
4 | | -**Repository:** mastering_llms_workshop_dhs2025 |
5 | | -**Purpose:** Full Day Workshop on Mastering LLMs |
| 4 | +**Repository:** mastering_llms_workshop_dhs2025 |
| 5 | +**Purpose:** Full Day Workshop on Mastering LLMs: Training, Fine-Tuning, and Best Practices |
6 | 6 | **License:** GNU General Public License v3.0 |
7 | 7 |
8 | 8 | ## Current Repository Structure |
9 | 9 | ``` |
10 | 10 | mastering_llms_workshop_dhs2025/ |
11 | | -├── README.md # Basic repository description |
| 11 | +├── README.md # Main repository overview and setup |
12 | 12 | ├── LICENSE # GPL v3.0 license |
13 | | -├── .gitignore # Comprehensive Python gitignore |
14 | 13 | ├── copilot-instructions.md # Development guidelines and best practices |
15 | | -└── llm.txt # This documentation file |
| 14 | +├── llm.txt # This documentation file |
| 15 | +└── docs/ |
| 16 | + ├── assets/ # Images and supporting files |
| 17 | + ├── module_01_lm_fundamentals/ |
| 18 | + │ ├── 01_text_representation.ipynb |
| 19 | + │ ├── 02_contextual_embeddings.ipynb |
| 20 | + │ └── README.md |
| 21 | + ├── module_02_llm_building_blocks/ |
| 22 | + │ ├── 01_transformers.ipynb |
| 23 | + │ ├── 02_transformers_pipelines.ipynb |
| 24 | + │ ├── 03_training_language_models.ipynb |
| 25 | + │ ├── 04_llm_training_and_scaling.ipynb |
| 26 | + │ └── README.md |
| 27 | + ├── module_03_instruction_tuning_and_alignment/ |
| 28 | + │ ├── 01_instruction_tuning_llama_txt2py.ipynb |
| 29 | + │ ├── 02_RLHF_phi2.ipynb |
| 30 | + │ ├── 03_zephyr_alignment_dpo.ipynb |
| 31 | + │ └── README.md |
| 32 | + ├── module_04_llm_apps/ |
| 33 | + │ ├── 01_retrieval_augmented_llm_app.ipynb |
| 34 | + │ ├── 02_dspy_demo.ipynb |
| 35 | + │ ├── 03_mcp_getting_started.ipynb |
| 36 | + │ ├── app.py |
| 37 | + │ ├── constants.py |
| 38 | + │ ├── mcp_chatbot.py |
| 39 | + │ ├── openai_mcp_handler.py |
| 40 | + │ ├── README.md |
| 41 | + │ └── ... |
| 42 | + └── ... |
16 | 43 | ``` |
17 | 44 |
18 | 45 | ## Development Setup and Requirements |
19 | | - |
20 | | -### Python Requirements |
21 | | -- **Minimum Version:** Python 3.11+ |
22 | | -- **Dependency Management:** Poetry (required) |
23 | | -- **Development Environment:** Jupyter Lab (preferred for workshop content) |
24 | | - |
25 | | -### Key Technologies and Frameworks |
26 | | -- Poetry for dependency management |
27 | | -- Jupyter Lab for interactive development |
28 | | -- Python 3.11+ for modern language features |
29 | | - |
30 | | -### Installation Instructions |
31 | | -Currently, the repository is in initial setup phase. Once workshop modules are added: |
32 | | - |
33 | | -1. Ensure Python 3.11+ is installed |
34 | | -2. Install Poetry if not already available: |
35 | | - ```bash |
36 | | - curl -sSL https://install.python-poetry.org | python3 - |
37 | | - ``` |
38 | | -3. Clone the repository: |
39 | | - ```bash |
40 | | - git clone https://github.com/raghavbali/mastering_llms_workshop_dhs2025.git |
41 | | - cd mastering_llms_workshop_dhs2025 |
42 | | - ``` |
43 | | -4. Install dependencies (when pyproject.toml is added): |
44 | | - ```bash |
45 | | - poetry install |
46 | | - ``` |
47 | | -5. Activate the virtual environment: |
48 | | - ```bash |
49 | | - poetry shell |
50 | | - ``` |
51 | | -6. Launch Jupyter Lab: |
52 | | - ```bash |
53 | | - jupyter lab |
54 | | - ``` |
55 | | - |
56 | | -## Workshop Structure (Planned) |
57 | | -The workshop will follow a modular structure with the following planned components: |
58 | | - |
59 | | -### Module Organization |
60 | | -- **01_foundations/**: Introduction to LLMs and basic concepts |
61 | | -- **02_prompt_engineering/**: Prompt design and optimization techniques |
62 | | -- **03_fine_tuning/**: Model customization and training |
63 | | -- **04_deployment/**: Production deployment considerations |
64 | | -- **utils/**: Shared utilities and helper functions |
65 | | - |
66 | | -### File Naming Conventions |
67 | | -- Jupyter notebooks: `##_descriptive_name.ipynb` |
68 | | -- Python modules: `snake_case_naming.py` |
69 | | -- Documentation: `descriptive-name.md` |
70 | | - |
71 | | -## Current Status |
72 | | - |
73 | | -### Completed Items |
74 | | -- [x] Repository initialization |
75 | | -- [x] License setup (GPL v3.0) |
76 | | -- [x] Comprehensive .gitignore for Python projects |
77 | | -- [x] Basic README.md structure |
78 | | -- [x] Copilot instructions for development guidelines |
79 | | -- [x] Initial llm.txt documentation |
80 | | - |
81 | | -### Pending Items |
82 | | -- [ ] Poetry project configuration (pyproject.toml) |
83 | | -- [ ] Workshop module structure creation |
84 | | -- [ ] Jupyter notebook templates |
85 | | -- [ ] Core utility functions |
86 | | -- [ ] Requirements and dependencies definition |
87 | | -- [ ] Detailed README.md with workshop overview |
88 | | -- [ ] Sample datasets and resources |
89 | | -- [ ] Testing framework setup |
90 | | - |
91 | | -## Documentation Guidelines |
92 | | - |
93 | | -### Maintenance Requirements |
94 | | -- Update this llm.txt file with each significant change |
95 | | -- Keep README.md synchronized with workshop structure |
96 | | -- Document all new dependencies and their purposes |
97 | | -- Include installation and setup instructions |
98 | | -- Track workshop progression and learning objectives |
99 | | - |
100 | | -### Content Standards |
101 | | -- Use clear, educational language appropriate for workshop participants |
102 | | -- Include practical examples and hands-on exercises |
103 | | -- Provide both theoretical background and implementation details |
104 | | -- Support multiple skill levels and learning paths |
105 | | - |
106 | | -## Resource Requirements |
107 | | - |
108 | | -### Computational Needs |
109 | | -- Standard laptop/desktop for basic workshop modules |
110 | | -- GPU access recommended for fine-tuning exercises |
111 | | -- Cloud alternatives for resource-intensive tasks |
112 | | -- Internet connection for accessing pre-trained models |
113 | | - |
114 | | -### Software Dependencies |
115 | | -- Python 3.11+ runtime |
116 | | -- Poetry package manager |
117 | | -- Jupyter Lab environment |
118 | | -- Git for version control |
119 | | -- Additional ML/AI libraries (to be specified) |
120 | | - |
121 | | -## Contributing Guidelines |
122 | | - |
123 | | -### Development Workflow |
124 | | -1. Follow the guidelines in copilot-instructions.md |
125 | | -2. Use Poetry for all dependency management |
126 | | -3. Create meaningful, descriptive filenames |
127 | | -4. Update documentation with code changes |
128 | | -5. Test workshop materials before submitting |
129 | | -6. Keep commits atomic and well-documented |
130 | | - |
131 | | -### Quality Standards |
132 | | -- Ensure all code works with Python 3.11+ |
133 | | -- Use type hints and proper documentation |
134 | | -- Include practical examples in all workshop modules |
135 | | -- Validate end-to-end workshop experience |
136 | | -- Maintain consistency across all materials |
137 | | - |
138 | | -## Contact and Support |
139 | | -For questions about this workshop or repository structure, please refer to the main repository documentation or open an issue on GitHub. |
| 46 | +- **Python Version:** 3.11 or above (required for all code and notebooks) |
| 47 | +- **Preferred Environment:** Jupyter Lab (for interactive development, tutorials, and prototyping) |
| 48 | +- **Dependency Management:** Poetry (planned); for now, install dependencies as needed per notebook/module |
| 49 | + |
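The Python 3.11 requirement listed above can be checked at the top of a notebook or script. The following is a minimal sketch, not code from the repository; `meets_minimum_python` and `MIN_PYTHON` are hypothetical names introduced here for illustration.

```python
import sys

# Minimum version taken from the setup notes above.
MIN_PYTHON = (3, 11)


def meets_minimum_python(version_info=None, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`.

    Hypothetical helper for illustration; it is not part of the workshop code.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) so patch releases do not matter.
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the result for the running interpreter without aborting.
if meets_minimum_python():
    print("Python version OK")
else:
    print("Python 3.11+ is required for the workshop notebooks")
```

Passing an explicit tuple, for example `meets_minimum_python((3, 10, 9))`, makes the check easy to exercise without switching interpreters.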
| 50 | +## Key Technologies and Frameworks |
| 51 | +- Python 3.11+ |
| 52 | +- Jupyter Lab |
| 53 | +- PyTorch, HuggingFace Transformers, Datasets, Tokenizers |
| 54 | +- LangChain, DSPy, ChromaDB, Model Context Protocol (MCP) |
| 55 | +- Additional libraries as specified in individual notebooks |
| 56 | + |
| 57 | +## Module-wise Overview |
| 58 | + |
| 59 | +### Module 01: Language Model Fundamentals |
| 60 | +- **Notebooks:** |
| 61 | + - 01_text_representation.ipynb: Covers tokenization, vectorization, and word embeddings (Word2Vec, FastText) |
| 62 | + - 02_contextual_embeddings.ipynb: Contextual embeddings using transformer models (BERT, MiniLM) |
| 63 | +- **Learning Objectives:** |
| 64 | + - Understand text representation and embedding techniques |
| 65 | + - Compare static and contextual embeddings |
| 66 | + - Visualize and apply embeddings in NLP tasks |
| 67 | + |
| 68 | +### Module 02: LLM Building Blocks |
| 69 | +- **Notebooks:** |
| 70 | + - 01_transformers.ipynb: Transformer architecture and self-attention |
| 71 | + - 02_transformers_pipelines.ipynb: HuggingFace pipelines for NLP tasks |
| 72 | + - 03_training_language_models.ipynb: Fine-tuning and training LLMs |
| 73 | + - 04_llm_training_and_scaling.ipynb: Scaling, optimization, and distributed training |
| 74 | +- **Model Card:** codeparrot-ds/README.md (fine-tuned GPT-2) |
| 75 | +- **Learning Objectives:** |
| 76 | + - Master transformer internals and practical pipelines |
| 77 | + - Train and fine-tune LLMs |
| 78 | + - Understand scaling and optimization strategies |
| 79 | + |
| 80 | +### Module 03: Instruction Tuning and Alignment |
| 81 | +- **Notebooks:** |
| 82 | + - 01_instruction_tuning_llama_txt2py.ipynb: Supervised instruction tuning |
| 83 | + - 02_RLHF_phi2.ipynb: RLHF with Phi-2 |
| 84 | + - 03_zephyr_alignment_dpo.ipynb: DPO and advanced alignment |
| 85 | +- **Learning Objectives:** |
| 86 | + - Instruction tuning for LLMs |
| 87 | + - RLHF and preference optimization |
| 88 | + - Direct Preference Optimization (DPO) |
| 89 | + |
| 90 | +### Module 04: LLM Applications |
| 91 | +- **Notebooks & Scripts:** |
| 92 | + - 01_retrieval_augmented_llm_app.ipynb: Retrieval-Augmented Generation (RAG) |
| 93 | + - 02_dspy_demo.ipynb: DSPy for prompt engineering |
| 94 | + - 03_mcp_getting_started.ipynb: Model Context Protocol (MCP) |
| 95 | + - app.py, mcp_chatbot.py, openai_mcp_handler.py, etc.: Application code |
| 96 | +- **Learning Objectives:** |
| 97 | + - Build real-world LLM applications (RAG, LangChain, DSPy, MCP) |
| 98 | + - Tool/function calling and workflow automation |
| 99 | + |
| 100 | +## Documentation and Maintenance Standards |
| 101 | +- All modules and notebooks are documented with clear markdown cells and outputs |
| 102 | +- Each module contains a README.md with: |
| 103 | + - Title, description, and table of contents (with links to notebooks) |
| 104 | + - Learning objectives and summary |
| 105 | + - Attribution and timestamp |
| 106 | +- Main README.md provides workshop overview, setup, and navigation |
| 107 | +- **Keep llm.txt and all READMEs updated with each significant change** |
| 108 | +- Follow copilot-instructions.md for best practices, naming, and structure |
| 109 | + |
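The rule that each module contains a README.md could be checked mechanically. The sketch below uses only the standard library and assumes the `docs/module_*` layout shown in the repository structure; `modules_missing_readme` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path


def modules_missing_readme(docs_dir):
    """Return names of module_* directories under docs_dir that lack a README.md.

    Hypothetical helper for illustration; assumes modules live directly
    under docs_dir and are named with a `module_` prefix.
    """
    docs_path = Path(docs_dir)
    missing = []
    for module_dir in sorted(docs_path.glob("module_*")):
        # Skip stray files that happen to match the prefix.
        if module_dir.is_dir() and not (module_dir / "README.md").is_file():
            missing.append(module_dir.name)
    return missing
```

Run from the repository root, `modules_missing_readme("docs")` would return an empty list when every module directory has its README.md.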
| 110 | +## Code Quality and Testing |
| 111 | +- All code targets Python 3.11+; type hints are optional |
| 112 | +- Notebooks are tested for end-to-end execution |
| 113 | + |
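One lightweight way to spot a notebook that did not run cleanly is to inspect its saved outputs. The sketch below is an assumption about how such a check might look, not the repository's test setup: it relies on the nbformat v4 JSON layout (`cells`, `cell_type`, `outputs`, `output_type`), and `notebook_error_cells` is a hypothetical helper. It reads saved outputs only and does not re-execute anything.

```python
import json


def notebook_error_cells(notebook_json):
    """Return indices of code cells whose saved outputs include an error.

    Hypothetical helper for illustration; `notebook_json` is the text of an
    .ipynb file in the nbformat v4 layout.
    """
    notebook = json.loads(notebook_json)
    failing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        # Markdown and raw cells carry no outputs.
        if cell.get("cell_type") != "code":
            continue
        outputs = cell.get("outputs", [])
        if any(out.get("output_type") == "error" for out in outputs):
            failing.append(index)
    return failing
```

For a file on disk, the text can be supplied with `Path("notebook.ipynb").read_text(encoding="utf-8")`, where the path is a placeholder. An empty result means no saved error outputs, which is weaker than a fresh end-to-end run.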
| 114 | +## Resource and Accessibility Guidelines |
| 115 | +- Designed for low-resource and cloud environments |
| 116 | +- GPU recommended for training/fine-tuning, but alternatives provided |
| 117 | +- Content accessible for both beginners and advanced users |
| 118 | +- Multiple learning paths and practical exercises included |
| 119 | + |
| 120 | +## Contribution and Version Control |
| 121 | +- Make atomic commits and use conventional commit messages |
| 122 | +- Update documentation and llm.txt with code changes |
| 123 | +- Test all workshop materials before merging |
| 124 | +- Keep style and approach consistent across modules |
| 125 | + |
| 126 | +## Learning Resources and References |
| 127 | +- See module READMEs and notebooks for links to papers, docs, and external resources |
| 128 | +- Workshop website: https://raghavbali.github.io/mastering_llms_workshop/ |
140 | 129 |
141 | 130 | --- |
142 | | -**Last Updated:** Initial creation |
143 | | -**Next Review:** Upon addition of first workshop module |
| 131 | +**Last Updated:** August 10, 2025 |
| 132 | +**Generated by:** GitHub Copilot (LLM) |
| 133 | +--- |