RTMP vs HLS vs LL‑HLS • Why “just WebSockets” doesn’t scale • Latency math • Bitrate/ABR basics • Core terminology
Quick mental model
- Ingest (publisher → server): often RTMP (or SRT)
- Delivery (server/CDN → viewers): often HLS / DASH / LL‑HLS
- Primary tradeoff: Latency ↔ scale ↔ stability
- High-level systems: RTMP vs HLS, and HLS vs LL‑HLS
- Why we can’t do a “simple” publisher → server → clients setup
- Latency calculations + bandwidth math
- Glossary of key terms
- Appendix: commands
Suggested local lab tools: OBS (publisher), nginx‑rtmp (ingest), VLC (player), FFmpeg/FFprobe (inspect frames, timestamps, keyframes).
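If you want a one-command local ingest for the lab, a minimal `docker-compose.yml` sketch is below. The `tiangolo/nginx-rtmp` image and the `live` application name are assumptions (any nginx‑rtmp image works; the application name depends on the image's bundled nginx config):

```yaml
# docker-compose.yml: minimal local RTMP ingest lab (image choice is an assumption)
services:
  rtmp:
    image: tiangolo/nginx-rtmp   # nginx built with the rtmp module
    ports:
      - "1935:1935"              # default RTMP port used by OBS/FFmpeg
```

Bring it up with `docker compose up -d` (see the appendix), then point OBS at `rtmp://localhost:1935/live` with stream key `test`.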
- What it is: RTMP (Real‑Time Messaging Protocol), a long-lived connection from a broadcaster (e.g., OBS) to an ingest server.
- Where it fits: Publisher → ingest side.
- Why it’s used: Stable persistent streaming from a single sender into the platform.
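If you don't want to run OBS, a hedged ffmpeg sketch that publishes a local file to the ingest endpoint as if it were live (the input file name and encoder settings are illustrative assumptions):

```sh
# Loop a local file and push it to the RTMP ingest in real time.
# -re paces reads at native frame rate; -g 60 gives a 2 s keyframe interval at 30 fps.
ffmpeg -re -stream_loop -1 -i input.mp4 \
  -c:v libx264 -preset veryfast -b:v 3000k -g 60 \
  -c:a aac -b:a 128k \
  -f flv rtmp://localhost:1935/live/test
```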
- What it is: HLS (HTTP Live Streaming), HTTP-based segmented streaming: manifests (`.m3u8`) plus media segments (`.ts` or `.m4s`).
- Where it fits: Origin/CDN → viewer side.
- Why it’s used: Scales via CDNs because segments are cacheable HTTP objects; supports ABR (adaptive bitrate).
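For intuition, a minimal (illustrative, not spec-complete) media playlist: the `.m3u8` lists segment durations and URIs, and the player re-fetches it as new segments appear. Each segment is a plain HTTP object a CDN can cache.

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:100
#EXTINF:6.000,
segment100.ts
#EXTINF:6.000,
segment101.ts
#EXTINF:6.000,
segment102.ts
```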
- What it is: LL‑HLS (Low-Latency HLS), an HLS variant that reduces latency by delivering partial segments (chunks) and reducing playlist update delay.
- Key mechanisms: partial segments (`#EXT-X-PART`), preload hints, and blocking playlist reload (server control).
- Result: Can reduce end-to-end latency from ~15–30s (traditional HLS) to ~2–5s (LL‑HLS), while still using HTTP/CDNs.
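A hedged sketch of what those mechanisms look like in a playlist (tag values, segment names, and part durations are illustrative):

```
#EXTM3U
#EXT-X-VERSION:9
#EXT-X-TARGETDURATION:4
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.5
#EXT-X-PART-INF:PART-TARGET=0.5
#EXTINF:4.000,
segment100.m4s
#EXT-X-PART:DURATION=0.5,URI="segment101.part0.m4s",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.5,URI="segment101.part1.m4s"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="segment101.part2.m4s"
```

Players fetch the half-second parts as they are published instead of waiting for the full 4-second segment, and the server can hold a playlist request open until the next part exists (blocking reload).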
| System | Best for | Pros | Cons |
|---|---|---|---|
| RTMP (ingest) | Broadcaster → platform ingest | Persistent connection, simple broadcaster tooling (OBS), good ingest stability | Not ideal for large-scale viewer delivery directly; not CDN-friendly for massive fan-out |
| HLS (delivery) | Massive scale to viewers | CDN caching, ABR support, works well over HTTP, resilient to many network issues | Higher latency (segment + buffer), more components (packaging, manifest updates) |
| LL‑HLS (delivery) | Lower-latency at large scale | 2–5s latency possible, still HTTP/CDN compatible | More requests/overhead, more operational complexity than standard HLS |
Rule of thumb: Use RTMP/SRT for ingest, and HLS/LL‑HLS for delivery to many viewers. Delivery choice depends on latency requirements.
The naive model: “Broadcaster sends bytes to a server; the server keeps client connections open (WebSockets) and forwards the same bytes to everyone.” Why this breaks down at scale:
- Timed media, not a file: Video/audio are sequences of frames/samples with timestamps. Late data is effectively “wrong”, not just “slow”.
- Codec dependencies: Most frames (P/B) depend on other frames. New viewers often can’t start decoding until the next keyframe (I-frame).
- Fan-out cost: One publisher at 6 Mbps with 1,000,000 viewers implies ~6 Tbps outbound if your origin pushes to everyone. That’s why we use CDNs and cacheable HTTP objects (HLS segments).
- Backpressure: With push, slow viewers create per-connection queues. Memory grows, tail latency grows, and servers destabilize.
- Adaptive bitrate (ABR): A single bitrate stream fails users on poor networks. ABR requires multiple renditions (1080p/720p/480p/360p).
- Operational reality: Pull-based segmented delivery is easier to scale, cache, retry, and observe.
- Encoder: Converts raw frames to compressed bitstream (H.264/H.265/AV1). Can be software (CPU) or hardware (media engine).
- Transcoder: Produces multiple resolutions/bitrates for ABR (the “ladder”).
- Packager: Wraps compressed audio/video into streaming containers/segments (TS/fMP4/CMAF) and publishes manifests (HLS playlists); a combined transcode + packaging sketch follows this list.
- Decoder: On viewer device/browser. Often hardware-accelerated for performance and battery.
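As a rough illustration of the transcoder and packager steps, a hedged ffmpeg sketch that turns one RTMP input into a two-rendition HLS ladder. Bitrates, resolutions, and the `out/` directory are illustrative assumptions, not a production encoding config:

```sh
# One input → two video renditions (720p/480p) + audio, packaged as HLS.
# master.m3u8 lists the variants; out/stream_0.m3u8 and out/stream_1.m3u8 list segments.
# (Create the out/ directory first.)
ffmpeg -i rtmp://localhost:1935/live/test \
  -filter_complex "[0:v]split=2[v1][v2];[v1]scale=-2:720[v1o];[v2]scale=-2:480[v2o]" \
  -map "[v1o]" -c:v:0 libx264 -b:v:0 3000k \
  -map "[v2o]" -c:v:1 libx264 -b:v:1 1500k \
  -map 0:a -c:a:0 aac -b:a:0 128k \
  -map 0:a -c:a:1 aac -b:a:1 128k \
  -f hls -hls_time 6 -hls_list_size 6 \
  -var_stream_map "v:0,a:0 v:1,a:1" -master_pl_name master.m3u8 \
  out/stream_%v.m3u8
```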
Key insight: “Just forward bytes” works for small audiences. At internet scale, you need stateless pull, cacheable objects, and ABR.
- OBS streams to RTMP ingest: `rtmp://localhost:1935/live/test`
- VLC connects later and may wait for the next keyframe to start decoding.
- Changing OBS keyframe interval (e.g., 2s → 10s) increases worst-case join delay.
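To see this directly, a hedged ffprobe sketch that prints frame type and timestamp for the live stream; the spacing between `key_frame=1` frames is the worst-case join delay (Ctrl+C to stop):

```sh
# Print key_frame (1 = I-frame), pict_type (I/P/B), and pts_time for each frame.
ffprobe -v error -select_streams v:0 \
  -show_entries frame=key_frame,pict_type,pts_time \
  -of default=noprint_wrappers=1 \
  rtmp://localhost:1935/live/test
```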
Total latency ≈ capture + encode + network + buffering + decode + render
- RTMP: often ~1–5 seconds end-to-end (varies widely by encoder settings and player buffering).
- Why: persistent stream transport, but the player still needs a buffer for stability.
- HLS: typical segment duration 4–6 seconds
- Typical startup buffer: 2–3 segments
- Ballpark latency: ~15–30 seconds
Example math: If segment = 6s and player buffers 3 segments, latency can be ≈ 6s (segment completion) + 12s (buffer) + overhead ≈ 18–25s.
- LL‑HLS: partial segment size ~200–500 ms
- Buffer target: a few partials
- Ballpark latency: ~2–5 seconds
Example math: If part = 0.5s and buffer = 3 parts, latency can be ≈ 0.5s (part ready) + 1.5s (buffer) + overhead ≈ 2–3s.
- Pixel count scaling: More pixels per frame generally need more bits to preserve similar perceptual quality.
- Bits-per-pixel intuition: Many practical ladders keep a roughly similar “bits per pixel per second” range across resolutions.
- ABR safety margin: Players pick a rendition below measured bandwidth to avoid rebuffering.
Uncompressed bitrate approximation:
bitrate ≈ width × height × bits_per_pixel × fps
Example: 1920×1080, 24 bpp (RGB), 30 fps ≈ 1.49 Gbps (far too large) → compression brings it down to single‑digit Mbps.
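To compare against a real encoded stream, a hedged ffprobe sketch that reads resolution, frame rate, and overall bitrate from a capture (assuming the `sample.flv` recorded in the appendix):

```sh
# width/height and frame rate come from the video stream; bit_rate from the container.
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height,r_frame_rate:format=bit_rate \
  -of default=noprint_wrappers=1 sample.flv
```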
Key takeaways:
- Why video is a timed stream (frames, timestamps) and not “just bytes”.
- Why keyframe interval affects join latency.
- Why CDN-friendly segmented delivery scales better than pushing to millions of sockets.
- What makes LL‑HLS lower latency (partials, blocking playlist reload, preload hints).
| Term | Definition | Why it matters |
|---|---|---|
| Frame | A single video image in time (part of a timed sequence). | Playback requires frames to arrive decodable and on time. |
| I-frame (Keyframe) | Intra-coded frame; self-contained image (like a JPEG in concept). | Entry point for new viewers; enables seeking; larger in size. |
| P-frame | Predicted from past reference frame(s) using motion vectors + residuals. | Smaller than I-frames but depends on previous frames. |
| B-frame | Bi-directional predicted using both past and future reference frames. | Better compression but introduces reordering; impacts latency/buffering. |
| FPS (Frames Per Second) | Number of frames displayed per second (e.g., 30 fps → 33ms/frame). | Defines timing cadence and impacts bitrate and motion smoothness. |
| Bitrate | Bits per second used for video/audio (e.g., 6 Mbps). | Higher bitrate can improve quality but requires more bandwidth. |
| PTS (Presentation Timestamp) | When a frame should be displayed. | Needed to play timed media correctly and keep A/V in sync. |
| DTS (Decoding Timestamp) | When a frame must be decoded (may differ from PTS when B-frames exist). | Explains why decode order ≠ display order; affects buffering/latency. |
| GOP (Group of Pictures) | Pattern/interval between keyframes (I-frame distance). | Controls join time and recovery; shorter GOP → faster join but more bits. |
| Buffering | Holding media ahead of playback to absorb network jitter/throughput drops. | More buffer → more stability but higher latency. |
| ABR (Adaptive Bitrate) | Client switches between multiple renditions based on bandwidth/conditions. | Prevents rebuffering and improves QoE across diverse networks. |
| Encoder / Decoder | Encoder compresses raw frames; decoder reconstructs frames for display. | Core compute for video; can be hardware-accelerated. |
| Transcoder | Creates multiple versions (resolutions/bitrates) from one input stream. | Enables ABR ladders (1080p/720p/480p/360p). |
| Packager | Wraps encoded streams into containers/segments + produces manifests (HLS playlists). | Makes delivery CDN-friendly and browser/player-friendly. |
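To see PTS vs DTS divergence (and why B-frames force reordering) on a real capture, a hedged ffprobe sketch over packet timestamps, again assuming the `sample.flv` from the appendix:

```sh
# Keyframe packets are flagged with K; if the encoder uses B-frames,
# pts_time and dts_time differ (decode order ≠ display order).
ffprobe -v error -select_streams v:0 \
  -show_entries packet=pts_time,dts_time,flags \
  -of default=noprint_wrappers=1 sample.flv | head -n 40
```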
Practical debug hint: `ffmpeg -vf showinfo` outputs per-frame info (`type`, `iskey`, `pts_time`, `duration_time`). Keyframes show `iskey:1`.
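Building on that hint, a small sketch that filters the showinfo log down to keyframes only (showinfo writes to stderr, hence the redirect):

```sh
# Keep only keyframe lines; each shows pts_time, so spacing between lines ≈ GOP length in seconds.
ffmpeg -hide_banner -i rtmp://localhost:1935/live/test \
  -vf showinfo -f null - 2>&1 | grep 'iskey:1'
```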
- Start the local ingest server: `docker compose up -d`
- Server: `rtmp://localhost:1935/live`
- Stream key: `test`
- Full URL: `rtmp://localhost:1935/live/test`
- Inspect frames on the live stream: `ffmpeg -i rtmp://localhost:1935/live/test -vf showinfo -f null -`
- Record a 10-second sample: `ffmpeg -y -i rtmp://localhost:1935/live/test -t 10 -c copy sample.flv`

created with ❤️ by Chatgpt & Nikhil