Add chapters to video transcripts#200
nicolevanderhoeven wants to merge 29 commits into `open-telemetry:main`
Conversation
- Add create_chapters() function to generate YouTube-style timestamps using OpenAI
- Integrate chapters section into markdown output between summary and transcript
- Add comprehensive rate limiting to avoid YouTube API quota issues
- Implement get_video_transcript_with_retry() with exponential backoff
- Add robust error handling for quota exceeded and API failures
- Improve transcript validation and filtering ([Music], [Applause], etc.)
- Fix Japanese language code from 'jp' to 'ja'
- Increase batch size from 10 to 50 for better efficiency
- Add progress indicators and better logging throughout
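The validation/filtering item above boils down to stripping non-speech markers from transcript snippets; a minimal sketch (the function name and the exact marker list are my own assumptions, not the PR's code):

```python
import re

# Bracketed non-speech markers that YouTube auto-captions insert;
# the set of markers covered here is an assumption for illustration.
NON_SPEECH = re.compile(r"\[(?:Music|Applause|Laughter)\]", re.IGNORECASE)

def clean_snippets(snippets):
    """Drop non-speech markers and discard snippets left empty."""
    cleaned = []
    for text in snippets:
        text = NON_SPEECH.sub("", text).strip()
        if text:
            cleaned.append(text)
    return cleaned
```

Filtering before summarization keeps the cleaned transcript (and anything derived from it, like chapters) free of caption noise.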
- Document new chapters/timestamps generation feature
- Add comprehensive usage examples with command-line options
- Document output format and file structure
- Add Features & Reliability section covering rate limiting and error handling
- Clarify OpenAI API key requirements and AI-enhanced features
- Document multi-language transcript support
Update pydantic_core from 2.39.0 to 2.33.2 to match the version required by pydantic 2.11.9, resolving pip installation error.
Major changes:
- Upgrade youtube-transcript-api from 0.6.3 to 1.2.3 (fixes empty response issue)
- Update code to use new 1.x API (YouTubeTranscriptApi().fetch())
- Remove deprecated fallback to old static methods
- Enhance error handling for 429 rate limits with 60-120s delays
- Detect XML parse errors as potential rate limiting
- Increase max retries from 3 to 5 attempts

Root cause: YouTube changed their API, and version 0.6.3 was returning empty responses (not rate limiting). The library can now successfully fetch transcripts and generate chapters.
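The retry strategy described (up to 5 attempts, with long waits for 429 rate limits) can be sketched as exponential backoff with jitter. Everything below (names, delay bounds, the injectable `sleep` for testability) is illustrative, not the PR's exact implementation:

```python
import random
import time

def fetch_with_retry(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `fetch` until it succeeds or retries are exhausted.

    Delays grow exponentially (1x, 2x, 4x, ...) with random jitter so
    concurrent clients don't retry in lockstep.
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) * random.uniform(1.0, 2.0)
            sleep(delay)
```

Passing `sleep` as a parameter lets unit tests record the computed delays instead of actually waiting.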
Reduced unnecessary delays now that transcript API is fixed:
- Remove initial 5-15s startup delay
- Reduce pagination delays from 3-8s/5-15s to 1-3s
- Reduce inter-video delays from 10-30s to 2-5s
- Reduce API call separation from 10-20s to 2-4s

Keep essential protections:
- YouTube Data API quota error handling (60s retry)
- 429 rate limit detection and handling (60-120s retry)
- XML parse error detection
- Small delays to avoid API hammering

Result: ~3-5x faster processing while maintaining API safety.
- Detect IP block errors specifically (vs rate limiting)
- Stop retries immediately when IP is blocked (no point retrying)
- Add comprehensive troubleshooting section to README
- Provide clear workaround options for users
- Import TooManyRequests exception for better error handling

IP blocks are different from rate limits and require different solutions, like waiting 24-48 hours, switching networks, or using cookie auth.
@avillela, @danielgblanco, and @reese-lee, would you take a look at this PR when you get a chance? Thank you!
reese-lee
left a comment
Thank you for doing this. It's interesting to see the differences between some of the old AI-generated transcripts and the new one.
I think it works in general for most videos, but I did notice that with the Humans of OTel interviews, it summarizes everyone's thoughts instead of leaving them individual, whereas the whole point of a video like that IS to showcase the individuals.
Thanks @reese-lee, yeah, I think human-generated summaries/chapters are still the best... but in my experience they often don't get done. I think the AI-generated ones are a good starting point at least! Some videos will probably still be exceptions, and it's hard to include that in a general prompt for all videos.
Hi @nicolevanderhoeven, we reviewed some of the summarized transcripts, and have a couple questions:
- Add clean_ai_preamble() function to remove conversational preamble lines
- Remove phrases like 'Sure! Here are the key moments...' from chapter output
- Preserve only actual timestamp lines in generated transcripts
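A preamble cleaner of this kind can be as simple as keeping only timestamp-prefixed lines. A hedged sketch (the function name follows the commit message, but the regex and behavior are assumptions, not the PR's code):

```python
import re

# A line counts as a chapter entry if it starts with MM:SS or HH:MM:SS
# followed by a title; everything else is treated as AI chatter.
TIMESTAMP_LINE = re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?\s+\S")

def clean_ai_preamble(chapter_text):
    """Drop conversational lines, keeping only timestamped chapter lines."""
    lines = chapter_text.splitlines()
    return "\n".join(line for line in lines if TIMESTAMP_LINE.match(line.strip()))
```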
- Insert chapter timestamps inline in cleaned transcripts at matching text positions
- Add video duration constraint to prevent AI from generating timestamps beyond video length
- Improve transcript cleanup to preserve exact wording and speaker names
- Add post-processing validation to verify chapter timestamps match content
- Limit timestamp corrections to ±60 second window around original time
- Use window-based matching with chapter titles for accurate timestamp placement
- Lower AI temperature for more accurate and deterministic results
- Move chapter limit to CRITICAL INSTRUCTIONS section
- Use stronger mandatory language (MUST, NO MORE than 10)
- Add reminder at end of prompt to reinforce limit
- Prevents AI from generating 19-24 chapters per video
- Replace segment-based windowing with simpler 10-second intervals
- Remove complex window_key logic and seen_times tracking
- Create windows directly at 10-second intervals within search range
- Makes timestamp finding more predictable and easier to debug
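The simplified windowing can be sketched as fixed 10-second buckets laid over the search range (names and the snapping behavior are illustrative, not the PR's actual code):

```python
def make_windows(search_start, search_end, interval=10):
    """Return (start, end) pairs at fixed 10-second intervals.

    The range start is snapped down to a multiple of the interval so
    windows line up predictably regardless of where the search begins.
    """
    start = (search_start // interval) * interval
    return [(t, t + interval) for t in range(start, search_end, interval)]
```

Because every window boundary is a multiple of the interval, debugging a mismatched timestamp means inspecting one predictable bucket rather than tracing dynamic segment keys.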
- Test timeline building with various intervals and edge cases
- Test filtering of [Music] and [Applause] markers
- Test timestamp parsing and formatting utilities
- Test roundtrip conversions and precision handling
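The parsing/formatting utilities under test presumably resemble the following roundtrip pair (a sketch under that assumption, not the repo's actual helpers):

```python
def seconds_to_timestamp(seconds):
    """Format whole seconds as HH:MM:SS (the style YouTube chapters use)."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def timestamp_to_seconds(ts):
    """Parse HH:MM:SS or MM:SS back into an integer number of seconds."""
    parts = [int(p) for p in ts.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours (and minutes) with zero
    h, m, s = parts
    return h * 3600 + m * 60 + s
```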
- Add venv/, .venv/, env/, ENV/ directories
- Add .env file (for API keys/secrets)
- Prevents accidentally committing large virtual environments
@reese-lee Thanks for reviewing! I've just done another pass to address your comments.
Sure is! I've just added it here.
Oops, I just put in some logic to add the timestamps of the chapters within the transcript itself, but now I'm rereading this and thinking that you meant the descriptions, not just the timestamps. Before I change this, I just wanted to double-check what you're asking for here. Currently (here's an example of a generated transcript), there is a chapter at the beginning. Would you prefer for that line to look like:

### Guest introduction: Diana

**Reese:** Diana, welcome. And thank you guys.

instead? Do you want the timestamp there at that point too, or just the heading for the chapter description?
@nicolevanderhoeven I think the example you had there would be great. Having the chapters in the transcripts as different sections would be great :) So, instead of Something like this?
```python
videos = []
next_page_token = None
page_count = 0
max_pages = 1  # Limit to 1 page (50 videos max) to avoid pagination issues
```
If this is only 1, and we want to limit this scrape to 50 videos for the whole channel, then do we need the while loop? If getting all videos for a channel is challenging, I'd opt for removing this method (and associated docs) and thinking about how we can add it safely.
```python
if i < len(videos) - 1:  # Don't sleep after the last video
    delay = random.uniform(2, 5)  # Random delay between 2-5 seconds
    print(f"Waiting {delay:.1f} seconds before next video...")
    time.sleep(delay)
```
Why do we need to wait? Is it related to limits?
Yep, this is a proactive delay to avoid hitting the YouTube rate limits. It might not be such a big deal if people are just running the script to fetch a few new videos, but I ran into rate limits a lot when regenerating the transcripts for 44 videos. I recommend keeping the delay even if it does slow down generation.
There is quite a lot of code related to text transformation contributed in this PR, considering we already have

I don't have access to ChatGPT to test this, but I've created a Gemini Gem with the same system prompt we're currently using here, and added the following to the prompt:

With that, I got the following result with the latest OTel Night in Berlin:

I'm not against having any necessary code here, but considering we already use ChatGPT for the transcript cleanup, I think we can use it to create these chapters and link back to the seconds?
- Add timeline skeleton to chapter generation prompt showing actual timestamps every 30s
- Remove skip logic for 00:00:00 chapter heading in transcript insertion
- Reuse existing time-to-text mapping logic for consistency
- Fixes issue where chapter timestamps could be off by several minutes
- Pass video summary to chapter generation to guide topic selection
- Add explicit guidance to skip small talk not mentioned in summary
- Fix timeline sampling to cover entire video duration dynamically
  * Was limited to 40 samples (20 minutes) regardless of video length
  * Now uses ~55 samples distributed across full video duration
  * Sample interval adjusts based on video length (min 15s)
- Update prompt to emphasize reviewing ENTIRE video before selecting chapters
- Add better logging showing sample count, interval, and video duration
- Fixes issues with:
  * Chapters for unimportant small talk (e.g., weather discussion)
  * Missing chapters in second half of longer videos
  * AI only seeing beginning of video
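The dynamic sampling fix can be sketched as deriving the interval from the video length and flooring it at 15 seconds, so samples always span the whole video (illustrative names, not the PR's code):

```python
def sample_times(duration_seconds, target_samples=55, min_interval=15):
    """Return ~target_samples timestamps spread across the full video.

    Short videos fall back to the 15-second floor; long videos widen the
    interval so the samples still cover the entire duration instead of
    only the first 20 minutes.
    """
    interval = max(min_interval, duration_seconds // target_samples)
    return list(range(0, duration_seconds, interval))
```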
- Remove unused TooManyRequests import and defensive fallback
- Add TestYouTubeAPIErrorHandling class to validate exception API contract
- Tests ensure YouTube API exceptions are importable and catchable
- Will catch breaking changes if youtube-transcript-api is upgraded
…ards

- Add max_pages parameter (default 100) to prevent infinite loops
- Add max_retries (3) for quota exceeded errors
- Track page_count and retry_count for better control flow
- Make loop condition explicit: while page_count < max_pages
- Reset retry_count on successful requests
- Raise RuntimeError when max retries exceeded

This addresses reviewer feedback about the risks of using while True loops.
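The bounded loop described in this commit can be sketched as follows; `fetch_page` and the caught error type stand in for the real YouTube Data API call and its quota exception:

```python
def get_all_pages(fetch_page, max_pages=100, max_retries=3):
    """Paginate with explicit bounds instead of `while True`.

    `fetch_page(token)` returns (items, next_token) and may raise on
    quota errors; next_token of None means the last page was reached.
    """
    items, token, page_count, retry_count = [], None, 0, 0
    while page_count < max_pages:
        try:
            page_items, token = fetch_page(token)
        except RuntimeError:
            retry_count += 1
            if retry_count > max_retries:
                raise RuntimeError("max retries exceeded")
            continue  # retry the same page, not the next one
        retry_count = 0  # reset on each successful request
        items.extend(page_items)
        page_count += 1
        if token is None:
            break
    return items
```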
- Extract video ID fetching logic into _fetch_playlist_video_ids
- Extract video details fetching logic into _fetch_video_details_batch
- Simplify main get_playlist_videos function to coordinate the two helpers
- Each function now has a single, clear responsibility
- Improve code readability and maintainability
- Removed 200ms delay in _fetch_video_details_batch
- Removed 1-3 second delays in _fetch_playlist_video_ids and get_channel_videos
- Removed 2-4 second delay between playlist and channel fetches in main
- Fixed retry bug in _fetch_video_details_batch where continue would skip to the next batch instead of retrying the failed one
- Added proper retry loop with max_retries tracking per batch
- 403 quota handlers now properly catch and retry after 60 second wait
- Add explanatory comment in transcripts.py explaining that the 2-5s delay prevents YouTube rate limiting (429 errors)
- Update README to accurately reflect actual delays used (2-5s, not 10-30s)
- Clarify distinction between proactive delays and reactive retry waits
- Condense multiline IP blocking error messages to 2 lines
- Shorten rate limiting error messages
- Simplify XML parse error messages
- All changes reference README for detailed troubleshooting
- Instruct AI to output only the chapter list without preamble text
- Prevent conversational phrases like 'Here are the chapters' or 'Sure, I'll help'
- Addresses reviewer feedback to handle this in the prompt rather than post-processing
- Extract _build_time_to_text_mapping() for building time-to-text mapping
- Extract _get_window_texts() for getting text from time windows
- Extract _extract_key_words() for keyword extraction with filtering
- Extract _calculate_line_score() for scoring line matches
- Extract _find_best_insertion_line() for finding best insertion points
- Extract _build_transcript_with_chapters() for final transcript assembly
- Main function now acts as orchestrator (36 lines, down from 113)
- Fix syntax error on line 318 (invalid character in return statement)

Benefits: improved testability, readability, and maintainability
Agree with @danielgblanco. Don't think we need this atm.
@nicolevanderhoeven before we continue reviewing this PR, I would like to understand the core problem to solve. From experimentation it looks like the transcript cleanup process via ChatGPT can potentially solve this problem with a modification to the system prompt. Do you think the code contributed can benefit us in the long term, as opposed to solely relying on LLMs to handle the transcript cleanup?
@avillela and @danielgblanco: Are your objections to the idea of having chapters in video transcripts in general, or to the text transformation code vs. just adding to the AI prompt?

If it's the first: the intention of this PR is to automatically generate chapters in a format such that they can be copied and pasted into the YouTube description of a video. When this is done, YouTube adds sections to the video that:
Here's a video explanation on whether chapters are useful. I apologize if this is information you already know, but I thought it would be good to get it out here and make sure we're on the same page. :) YouTube also expects the chapters in a specific format, otherwise they don't get parsed correctly. So the "Table of Contents" example posted by Daniel might be useful when people are reading through the transcript, but not for the YouTube video itself.

If your objections are instead about the second (the text transformation code vs. a pure AI prompt approach): I definitely considered doing it entirely via AI. What I found was that the AI was inconsistently bad. At this point I've generated and regenerated all 44 of the transcripts for the OTel channel multiple times, and when it was purely an AI prompt, with no checks or text transformations, both the format and the accuracy suffered. It would often pick chapter descriptions that were not at all useful despite my "only choose important moments" prompt, or the chapter description did not match the timestamp. So I'd see things like:

Those sorts of errors really erode the usefulness of chapters, so gradually I added more and more checks and tried to catch some things in code when I could. My thinking was that formatting and tests are things that can be done by code, whereas determining the key moments and chapters to highlight is something that AI still does well and would be difficult to do with code alone. I only tested with OpenAI, so I'm not sure if Gemini does a better job.

Personally I'd be more comfortable keeping the checks and text transformation code in, just for consistency. I do this for my own videos and find it extremely useful not to have to manually check the chapters, so I really value the script getting it right the first time, and I'm okay with having a bit more code to do so. Let me know how you'd like to proceed.
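For reference, the format constraints YouTube applies when parsing chapters out of a description (first chapter at 0:00, timestamps in ascending order, at least three chapters) can be checked mechanically. A minimal validator sketch, not exhaustive, with illustrative names:

```python
import re

# A chapter line: optional hours, minutes:seconds, then a title.
CHAPTER = re.compile(r"^(\d{1,2}:)?\d{1,2}:\d{2}\s+.+")

def valid_youtube_chapters(lines):
    """Check the main rules YouTube uses to recognize chapter lists:
    at least three entries, first at 0:00, strictly ascending times."""
    if len(lines) < 3 or not all(CHAPTER.match(l) for l in lines):
        return False

    def secs(line):
        parts = [int(p) for p in line.split()[0].split(":")]
        return sum(p * 60 ** i for i, p in enumerate(reversed(parts)))

    times = [secs(l) for l in lines]
    return times[0] == 0 and times == sorted(set(times))
```

A check like this could run as a final gate on generated chapters before they're pasted into a description, catching formatting drift regardless of which model produced them.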
Thank you so much for all the context @nicolevanderhoeven. We (@open-telemetry/sig-end-user-maintainers) have discussed this. Our main concern was not about adding chapters; we think that's very valuable and a great idea. However, I must personally say I had glossed over the fact that this can also help add descriptions back to YouTube, so it needs to be in a specific format. So, this is then doubly useful!

Our concern is about having a substantial amount of code added here (larger than the current script) for the purpose of adding chapters to summaries that have already been generated via ChatGPT. The main reason we're hesitant is that we need to ensure the code that's in this repo solves issues that cannot be solved in other off-the-shelf ways. We'd be much in favour of extending the system prompt we give ChatGPT in the cleanup, when calling

It's clear that the non-determinism of LLMs can be an issue, but I'd argue that it's an issue in both the summaries and the chapter identification. If we can make it more deterministic by tuning parameters, I think we can benefit in both summaries and chapter identification. We don't foresee having to re-do all transcripts every time we change this code in the future, so we also think there's an element of reviewing new video transcripts (and chapters) as they're generated in new PRs.
Thanks for the clarification, @danielgblanco! I understand your concern more clearly now. Let me take another pass at it (tweaking the temperature is something I didn't try) with that in mind! :)
This PR adds chapter generation to the video transcript Python script originally added here, adding a new function that uses OpenAI to create chapters (timestamped sections of the video). When added to the YouTube description of a video, these chapters allow viewers to skip to relevant portions of the video and also give everyone a better idea of what the video entails.
In addition, I made the following improvements to video transcription:
- Added an `env.example` to show what the `.env` file should look like.