yandexdataschool/nlp_course

YSDA Natural Language Processing course

  • This is the 2025 iteration of the course; materials are added as we prepare them.
  • Lecture and seminar materials for each week are in the ./week* folders; see each week's README.md for materials and instructions.
  • For technical issues, bugs in course materials, or contribution ideas, open an issue.
  • For help installing libraries and troubleshooting, see this thread.

Syllabus

  • week01 Word Embeddings

    • Lecture: Word embeddings. Distributional semantics. Count-based (pre-neural) methods. Word2Vec: learn vectors. GloVe: count, then learn. Evaluation: intrinsic vs extrinsic. Analysis and Interpretability. Interactive lecture materials and more.
    • Seminar: Playing with word and sentence embeddings
    • Homework: Embedding-based machine translation system
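The count-based (pre-neural) methods from the lecture can be sketched in a few lines: represent each word by the counts of its context words and compare words with cosine similarity. This is a toy illustration, not the course's reference implementation; the corpus and window size are made up for the example.

```python
from collections import Counter

def cooccurrence_vectors(corpus, window=2):
    """Count-based embeddings: each word is represented by the
    counts of words appearing within `window` positions of it."""
    vectors = {}
    for sent in corpus:
        for i, w in enumerate(sent):
            ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            vectors.setdefault(w, Counter()).update(ctx)
    return vectors

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda x: sum(c * c for c in x.values()) ** 0.5
    return dot / ((norm(u) * norm(v)) or 1.0)

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat chased a dog".split(),
]
vecs = cooccurrence_vectors(corpus)
# "cat" and "dog" share contexts, so their vectors end up close,
# while "cat" and "mat" share less.
```

Real pipelines weight these counts (e.g. PMI) and reduce dimensionality, or learn the vectors directly as Word2Vec and GloVe do.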
  • week02 Language Modeling

    • Lecture: Language Modeling: what does it mean? Left-to-right framework. N-gram language models. Neural Language Models: General View, Recurrent Models, Convolutional Models. Evaluation. Practical Tips: Weight Tying. Analysis and Interpretability. Interactive lecture materials and more.
    • Seminar: Build an N-gram language model from scratch
    • Homework: Neural LMs & smoothing in count-based models.
  • week03 Seq2seq and Attention

    • Lecture: Seq2seq Basics: Encoder-Decoder framework, Training, Simple Models, Inference (e.g., beam search). Attention: general, score functions, models. Transformer: self-attention, masked self-attention, multi-head attention; model architecture. Subword Segmentation (BPE). Analysis and Interpretability: functions of attention heads; probing for linguistic structure. Interactive lecture materials and more.
    • Seminar: Basic sequence to sequence model
    • Homework: Machine translation with attention
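The attention score function at the heart of the lecture's Transformer material can be written out directly. A minimal sketch of scaled dot-product attention for a single query, in plain Python rather than a tensor library; the vectors are illustrative:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(query, keys, values):
    """Scaled dot-product attention for one query:
    weights_i = softmax(q . k_i / sqrt(d)); output = sum_i weights_i * v_i."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    out = [sum(w * v[j] for w, v in zip(weights, values))
           for j in range(len(values[0]))]
    return weights, out

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
weights, out = attention([1.0, 0.0], keys, values)
# the query matches the first key, so most weight goes to the first value
```

Multi-head attention runs several such maps in parallel on projected queries, keys, and values, then concatenates the outputs.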
  • week04 Transfer Learning

    • Lecture: What is Transfer Learning? Great idea 1: From Words to Words-in-Context (CoVe, ELMo). Great idea 2: From Replacing Embeddings to Replacing Models (GPT, BERT). (A Bit of) Adaptors. Analysis and Interpretability. Interactive lecture materials and more.
    • Homework: Fine-tuning a pre-trained BERT model
  • week05 Large Language Models

    • Lecture: Scaling laws. Emergent abilities. Open-source LLMs.
    • Practice: Hands-on with open-source LLMs
  • week06 Prompting & In-Context Learning

    • Lecture: Prompting techniques. Chain-of-Thought reasoning. In-context learning: how and why it works. Analysis and Interpretability.
    • Homework: Manual prompt engineering and chain-of-thought reasoning
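Few-shot in-context learning and chain-of-thought prompting amount to careful prompt assembly. A hypothetical helper (the function name and prompt format are ours, not the course's) that builds a few-shot chain-of-thought prompt:

```python
def build_cot_prompt(examples, question):
    """Assemble a few-shot chain-of-thought prompt: each example shows
    a question, a worked reasoning trace, and the final answer; the
    target question ends mid-reasoning for the model to continue."""
    parts = []
    for q, reasoning, answer in examples:
        parts.append(f"Q: {q}\nA: Let's think step by step. {reasoning} "
                     f"So the answer is {answer}.")
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

examples = [
    ("What is 3 + 4 * 2?",
     "Multiplication comes first: 4 * 2 = 8. Then 3 + 8 = 11.",
     "11"),
]
prompt = build_cot_prompt(examples, "What is 2 + 5 * 3?")
```

The demonstrations condition the model to emit a reasoning trace before its answer, which is the effect the lecture's chain-of-thought material analyzes.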
  • week07 Fine-tuning (PEFT & RLHF)

    • Lecture: Parameter-efficient fine-tuning (LoRA, adapters). Reinforcement Learning from Human Feedback (RLHF).
    • Seminar + Homework
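The idea behind LoRA can be shown with plain lists-of-lists: freeze the weight W and learn only a rank-r update B·A, scaled by alpha/r. A sketch under those assumptions (dimensions and values are illustrative, not from the course materials):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def lora_effective_weight(W, A, B, alpha):
    """LoRA: keep the frozen weight W (d_out x d_in) and train only a
    low-rank update, with B (d_out x r) and A (r x d_in).
    The effective weight is W + (alpha / r) * B @ A."""
    r = len(A)
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

d_out, d_in, r = 4, 4, 1
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
A = [[0.1] * d_in]                  # r x d_in, trainable
B = [[0.2] for _ in range(d_out)]   # d_out x r, trainable
W_eff = lora_effective_weight(W, A, B, alpha=2.0)
# trainable parameters: r*(d_in + d_out) = 8, vs d_in*d_out = 16 for full FT
```

The savings grow with model size: for large d and small r, r·(d_in + d_out) is orders of magnitude smaller than d_in·d_out.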
  • week08 Efficiency

    • Lecture: Quantization. Distillation. Pruning. Speculative decoding.
    • Homework
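Of the techniques above, quantization is the easiest to sketch: map floats to int8 with a single scale factor. A minimal symmetric per-tensor example (real systems often quantize per channel or per group, which this toy version ignores):

```python
def quantize_int8(xs):
    """Symmetric int8 quantization: scale by max|x| / 127 and
    round to integers clamped to [-127, 127]."""
    scale = max(abs(x) for x in xs) / 127 or 1.0  # guard all-zero input
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# each restored value is within one quantization step of the original
```

The round trip loses at most about half a step (`scale / 2`) per value, which is the accuracy/memory trade-off the lecture discusses.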
  • week09 Retrieval-Augmented Generation (RAG)

    • Lecture: Dense retrieval. RAG architectures.
    • Practice
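The retrieval step of RAG reduces to embedding a query, ranking documents by similarity, and pasting the best match into the prompt. In this sketch a bag-of-words counter stands in for a trained dense encoder, so it is term matching rather than true dense retrieval; the documents are invented for the example:

```python
from collections import Counter

def embed(text):
    """Stand-in 'encoder': bag-of-words counts. A real RAG system
    would use a trained neural encoder producing dense vectors."""
    return Counter(text.lower().split())

def score(q, d):
    dot = sum(q[t] * d[t] for t in q)
    norm = lambda v: sum(c * c for c in v.values()) ** 0.5
    return dot / ((norm(q) * norm(d)) or 1.0)

def retrieve(query, docs, k=1):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: score(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "GloVe builds word vectors from co-occurrence counts.",
    "Beam search keeps the k best partial hypotheses at each step.",
    "LoRA adds trainable low-rank matrices to frozen weights.",
]
question = "how does beam search decode?"
context = retrieve(question, docs, k=1)
prompt = f"Context: {context[0]}\n\nQuestion: {question}"
```

Swapping the encoder for a dense model changes only `embed`; the rank-then-augment structure stays the same.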
  • week10 AI Agents

    • Lecture: Agent architectures. Tool use. Memory.
    • Seminar + Homework
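A tool-using agent boils down to a loop that executes actions and accumulates results in a memory later steps can read. In a real agent an LLM would choose each next action; here the plan and the toy tools are fixed for illustration:

```python
def run_agent(plan, tools):
    """Minimal agent loop: execute a list of (tool_name, argument)
    actions, appending each result to a memory the next step can use."""
    memory = []
    for tool_name, arg in plan:
        result = tools[tool_name](arg, memory)
        memory.append(result)
    return memory

# toy tools: each takes (argument, memory) and returns a string
tools = {
    "search": lambda q, mem: f"results for '{q}'",
    "summarize": lambda _, mem: f"summary of: {mem[-1]}",
}
memory = run_agent([("search", "YSDA NLP course"), ("summarize", None)], tools)
```

Replacing the fixed plan with a model call that reads `memory` and emits the next `(tool_name, argument)` pair turns this into the lecture's agent architecture.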
  • week11 Interpretability

    • Lecture: Probing. Mechanistic interpretability.
    • Seminar + Homework
  • week12 Multimodal LLMs

  • week13 Building LLM Systems

  • week14 AI Agents in Production

Contributors & course staff

