Designing a Video Streaming System: Netflix/Prime Architecture Deep Dive


Building a video streaming platform that serves millions of concurrent viewers, processes petabytes of video content, and delivers seamless playback requires sophisticated engineering across multiple domains. This comprehensive guide explores the architecture, algorithms, and design patterns needed to build a scalable streaming system like Netflix or Amazon Prime Video.

Requirements Analysis

Functional Requirements

Content Management:

  • Video upload and ingestion
  • Video processing and transcoding
  • Content metadata management
  • Thumbnail and preview generation

Streaming:

  • Video playback with adaptive bitrate
  • Support for multiple devices and platforms
  • Resume playback from last position
  • Download for offline viewing

User Features:

  • User authentication and profiles
  • Watch history and recommendations
  • Search and discovery
  • Playlists and favorites
  • Ratings and reviews

Content Delivery:

  • Global content distribution
  • Low latency streaming
  • High quality video delivery
  • Bandwidth optimization

Non-Functional Requirements

  • Scalability: Support 200 million users and up to 20 million concurrent streams at peak
  • Availability: 99.99% uptime
  • Latency: Video start time < 2 seconds
  • Quality: Support up to 4K resolution and multiple audio tracks
  • Bandwidth: Optimize for varying network conditions

Capacity Estimation

Traffic Estimates

  • Daily Active Users (DAU): 200 million
  • Peak Concurrent Viewers: 10% of DAU = 20 million
  • Average Session Duration: 60 minutes
  • Average Bitrate: 5 Mbps (HD quality)
  • Peak Bandwidth: 20M × 5 Mbps = 100 Tbps

Storage Estimates

  • Content Library: 50,000 titles
  • Average Title Size: 2 hours × 5 Mbps = 4.5 GB
  • Total Library Size: 50K × 4.5 GB = 225 TB (single quality)
  • Multiple Qualities: 4 qualities (480p, 720p, 1080p, 4K) ≈ 900 TB (an upper bound, since lower resolutions use proportionally less)
  • Annual New Content: 10,000 titles/year ≈ 180 TB/year across all qualities

Processing Estimates

  • Upload Rate: 1,000 videos/day
  • Average Video Length: 2 hours
  • Transcoding Time: roughly real-time, so ~2 hours per video
  • Concurrent Transcoding: 1,000 videos × 2 hours = 2,000 transcoding-hours/day ≈ 85 servers running around the clock; provision headroom for upload bursts and for producing multiple renditions in parallel

System APIs

Core APIs

uploadVideo(userId, videoFile, metadata)
- Upload video for processing
- Returns: videoId, uploadUrl

getVideo(videoId)
- Get video metadata and streaming URLs
- Returns: video metadata, streaming URLs for different qualities

streamVideo(videoId, quality, startTime)
- Get streaming URL for video segment
- Returns: segment URL, next segment URL

searchVideos(query, filters)
- Search video library
- Returns: List of matching videos

getRecommendations(userId)
- Get personalized video recommendations
- Returns: List of recommended videos

updateWatchHistory(userId, videoId, position)
- Update user's watch position
- Returns: success status
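
These API sketches map naturally onto HTTP endpoints. Below is a minimal illustration in Python using FastAPI; the route paths, request model, and in-memory store are assumptions for the sketch, not the actual service interface.

# Minimal sketch of two core APIs as HTTP endpoints.
# Paths, the request model, and the in-memory store are illustrative
# assumptions; a real service would persist history in Cassandra (see below).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
watch_history = {}  # (user_id, video_id) -> position in seconds

class HistoryUpdate(BaseModel):
    video_id: str
    position_seconds: int

@app.get("/videos/{video_id}")
def get_video(video_id: str):
    # Would fetch metadata and per-quality manifest URLs from the metadata service
    return {"videoId": video_id,
            "manifests": {"hls": f"/videos/{video_id}/playlist.m3u8"}}

@app.post("/users/{user_id}/history")
def update_watch_history(user_id: str, update: HistoryUpdate):
    # Persist the resume position so playback can continue on any device
    watch_history[(user_id, update.video_id)] = update.position_seconds
    return {"status": "ok"}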

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Client Applications                       │
│              (Web, Mobile, TV, Gaming Console)              │
└───────────────┬───────────────────────────────┬─────────────┘
                │                               │
        ┌───────▼────────┐             ┌────────▼────────┐
        │  CDN Edge      │             │  API Gateway   │
        │  Servers       │             │  (REST API)    │
        └───────┬────────┘             └────────┬────────┘
                │                               │
        ┌───────▼───────────────────────────────▼───────┐
        │         CDN Origin / API Servers              │
        └───────┬───────────────────────────────┬───────┘
                │                               │
    ┌───────────▼───────────┐         ┌─────────▼─────────┐
    │  Video Processing    │         │  Metadata Service  │
    │  Pipeline            │         │  (Cassandra/MySQL) │
    └───────────┬───────────┘         └─────────┬─────────┘
                │                               │
    ┌───────────▼───────────────────────────────▼─────────┐
    │         Object Storage (Video Files)                 │
    │         (S3, GCS, Azure Blob)                        │
    └──────────────────────────────────────────────────────┘

Detailed Component Design

Video Processing Pipeline

Stage 1: Upload and Ingestion

Approach 1: Direct Upload to Storage

Creator          API Server        Object Storage
-------          ----------        -------------
  |--Upload-------->|                    |
  |                 |--Store------------->|
  |                 |<--URL---------------|
  |<--Success-------|                    |

Pros:

  • Simple architecture
  • Direct to storage, no intermediate step

Cons:

  • Long upload time blocks API
  • No validation before storage

Approach 2: Chunked Upload with Validation

Creator          API Server        Temporary Storage    Object Storage
-------          ----------        ----------------    -------------
  |--Init Upload-->|                    |                    |
  |<--Upload URL---|                    |                    |
  |                 |                    |                    |
  |--Upload Chunk-->|                    |                    |
  |                 |--Store Chunk------>|                    |
  |                 |                    |                    |
  |--Complete------>|                    |                    |
  |                 |--Validate---------|                    |
  |                 |--Move to Final------------------------>|
  |<--Success-------|                    |                    |

Pros:

  • Resume interrupted uploads
  • Validate before final storage
  • Better error handling

Cons:

  • More complex
  • Requires temporary storage

Decision: Use chunked upload for large files, direct upload for small files.
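
To make the chunked path concrete, the client-side sketch below uploads a file in fixed-size parts and then asks the server to finalize. The endpoint URLs, chunk size, and response fields are assumptions about a hypothetical upload API, not a specific product's interface.

# Client-side sketch of a chunked upload (endpoints and fields are assumed).
import requests

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB parts; tune for typical client bandwidth
API = "https://api.example.com"

def chunked_upload(path, metadata):
    # 1. Initialize an upload session and obtain an upload ID
    upload_id = requests.post(f"{API}/uploads", json=metadata).json()["uploadId"]
    # 2. Send each chunk with its index; an interrupted upload can resume
    #    by asking the server which indices it already has
    with open(path, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            requests.put(f"{API}/uploads/{upload_id}/chunks/{index}", data=chunk)
            index += 1
    # 3. Server validates the chunks and assembles them into final storage
    return requests.post(f"{API}/uploads/{upload_id}/complete").json()["videoId"]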

Stage 2: Transcoding

Challenge: Convert single video into multiple formats and qualities.

Transcoding Requirements:

  • Multiple resolutions: 480p, 720p, 1080p, 4K
  • Multiple bitrates per resolution
  • Multiple codecs: H.264, VP9, AV1
  • Multiple audio tracks: Different languages, audio descriptions
  • Subtitle generation: Extract and convert subtitles

Transcoding Pipeline:

Raw Video → Validation → Transcode (Parallel) → Quality Check → Storage

                ├─ 480p (H.264, VP9)
                ├─ 720p (H.264, VP9)
                ├─ 1080p (H.264, VP9)
                ├─ 4K (H.264, VP9, AV1)
                └─ Audio Tracks (Multiple languages)

Transcoding Architecture:

Option 1: Centralized Transcoding Farm

Upload → Queue → Transcoding Farm → Storage

Pros:

  • Centralized management
  • Easy to monitor and optimize

Cons:

  • Single point of failure
  • Hard to scale

Option 2: Distributed Transcoding

Upload → Queue → Distributed Workers → Storage

Pros:

  • Horizontal scaling
  • Fault tolerant

Cons:

  • More complex coordination
  • Network overhead

Option 3: Cloud-Based Transcoding

Upload → Cloud Transcoding Service → Storage

Pros:

  • Managed service
  • Auto-scaling
  • Pay per use

Cons:

  • Vendor lock-in
  • Cost at scale

Decision: Use distributed transcoding workers with auto-scaling.
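
A worker in such a pool might look like the sketch below: it pulls a job from a queue and shells out to ffmpeg once per target rendition. The queue client and job shape are assumptions; the ffmpeg flags shown are standard, though a real pipeline adds codec tuning, audio tracks, and segmenting.

# Sketch of a distributed transcoding worker (queue client and job shape assumed).
import subprocess

RENDITIONS = [           # (name, resolution, target video bitrate)
    ("480p",  "854x480",   "1000k"),
    ("720p",  "1280x720",  "2500k"),
    ("1080p", "1920x1080", "5000k"),
]

def transcode(job):
    for name, size, bitrate in RENDITIONS:
        # H.264 video + AAC audio at the target resolution and bitrate
        subprocess.run([
            "ffmpeg", "-i", job["source"],
            "-c:v", "libx264", "-b:v", bitrate, "-s", size,
            "-c:a", "aac",
            f"{job['output_dir']}/{name}.mp4",
        ], check=True)

def run_worker(queue):
    while True:
        job = queue.receive()     # block until a job is available
        transcode(job)
        queue.acknowledge(job)    # ack only after success so failures are retried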

Transcoding Optimization:

  • Parallel Processing: Transcode multiple qualities simultaneously
  • GPU Acceleration: Use GPUs for faster transcoding
  • Adaptive Quality: Adjust quality based on content complexity
  • Caching: Cache transcoded segments for similar content

Stage 3: Storage Organization

Storage Structure:

videos/
  {videoId}/
    {quality}/
      {codec}/
        segment_0001.m4s
        segment_0002.m4s
        ...
    manifest.mpd (DASH) or playlist.m3u8 (HLS)
    thumbnails/
      thumbnail_1.jpg
      thumbnail_2.jpg
      ...

Storage Strategy:

  • Hot Content: Frequently accessed, store in fast storage (SSD)
  • Warm Content: Moderately accessed, standard storage
  • Cold Content: Rarely accessed, archive storage

Lifecycle Management: Automatically move content between tiers based on access patterns.
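
On S3-style object storage, this tiering can be expressed as a lifecycle policy. The boto3 sketch below uses object age as a simple proxy for access frequency; the bucket name, prefix, and day thresholds are assumptions.

# Sketch of an S3 lifecycle policy for tiering video objects.
# Bucket, prefix, and thresholds are illustrative assumptions.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="video-content",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-videos-by-age",
            "Filter": {"Prefix": "videos/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30,  "StorageClass": "STANDARD_IA"},  # warm tier
                {"Days": 180, "StorageClass": "GLACIER"},      # cold tier
            ],
        }]
    },
)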

Content Delivery Network (CDN)

CDN Architecture

Purpose: Serve video content from edge locations close to users.

CDN Layers:

User → Edge Server (L1) → Regional Cache (L2) → Origin Server

  • Edge Server: Closest to user, serves cached content
  • Regional Cache: Serves content for a region, fetches from origin if not cached
  • Origin Server: Source of truth, stores all content

Caching Strategy

Cache-Aside Pattern:

1. Check edge cache
2. If miss, check regional cache
3. If miss, fetch from origin
4. Store in cache for future requests

Cache Invalidation:

  • TTL-Based: Content expires after a time period
  • Purge API: Manually purge specific content
  • Version-Based: Include version in URL; a new version invalidates the old

Cache Key Design:

{videoId}/{quality}/{codec}/segment_{number}
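
Putting the key scheme and the cache-aside flow together, a tiered lookup might look like the following sketch; the edge, regional, and origin clients are illustrative assumptions.

# Sketch of a tiered cache-aside lookup for one video segment.
# edge_cache, regional_cache, and origin are hypothetical store clients.
def segment_key(video_id, quality, codec, number):
    return f"{video_id}/{quality}/{codec}/segment_{number:04d}"

def fetch_segment(key, edge_cache, regional_cache, origin):
    data = edge_cache.get(key)              # 1. check the edge first
    if data is None:
        data = regional_cache.get(key)      # 2. fall back to the regional tier
        if data is None:
            data = origin.get(key)          # 3. last resort: the origin
            regional_cache.set(key, data)
        edge_cache.set(key, data)           # populate caches on the way back
    return data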

CDN Selection

Option 1: Build Own CDN

Pros:

  • Full control
  • No vendor lock-in
  • Custom optimizations

Cons:

  • High capital investment
  • Operational complexity
  • Global infrastructure needed

Option 2: Use Commercial CDN

Options: CloudFront, Cloudflare, Fastly, Akamai

Pros:

  • Managed service
  • Global infrastructure
  • DDoS protection included

Cons:

  • Cost at scale
  • Less control
  • Vendor dependency

Option 3: Hybrid

Pros:

  • Use CDN for edge, own for origin
  • Balance cost and control

Cons:

  • More complex

Decision: Use commercial CDN (CloudFront/Cloudflare) for edge, own infrastructure for origin.

Adaptive Bitrate Streaming

Problem

Users have varying network conditions. Fixed bitrate causes:

  • High bitrate: Buffering on slow connections
  • Low bitrate: Poor quality on fast connections

Solution: Adaptive Bitrate Streaming

How It Works: Client monitors network conditions and requests appropriate quality segments.

Protocols:

HLS (HTTP Live Streaming):

  • Apple’s protocol
  • Segments in .m3u8 playlist
  • Widely supported

DASH (Dynamic Adaptive Streaming over HTTP):

  • MPEG standard
  • More flexible than HLS
  • Better for complex scenarios

Smooth Streaming:

  • Microsoft’s protocol
  • Less common now

Decision: Support both HLS and DASH for maximum compatibility.

Adaptive Algorithm

Client-Side Algorithm:

1. Start with medium quality
2. Monitor download speed
3. Calculate buffer level
4. If buffer low → switch to lower quality
5. If buffer high and speed good → switch to higher quality
6. Avoid frequent switching (hysteresis)

Metrics to Monitor:

  • Download Speed: Bytes per second
  • Buffer Level: Seconds of video buffered
  • Segment Download Time: Time to download segment
  • Rebuffering Events: Number of stalls

Quality Switching Logic:

# Runnable sketch of the switching logic; lower()/higher() step one
# rung down/up the quality ladder (helpers assumed).
def choose_quality(current, buffer_level_s, download_speed_bps, threshold_bps):
    if buffer_level_s < 5:                  # buffer nearly empty: avoid a stall
        return lower(current)
    if buffer_level_s > 30 and download_speed_bps > threshold_bps:
        return higher(current)              # healthy buffer + throughput: step up
    return current                          # hold steady (hysteresis)

Server-Side Hints:

Option 1: Client-Only Decision

  • Client decides based on its metrics
  • Pros: Simple, works offline
  • Cons: May not have full picture

Option 2: Server-Assisted

  • Server provides recommendations
  • Pros: Better decisions
  • Cons: More complex, requires server communication

Decision: Use client-side algorithm with server hints for optimal experience.

Video Segmentation

Segment Size Trade-offs

Small Segments (2-4 seconds):

  • Pros: Faster adaptation, lower latency
  • Cons: More requests, more overhead

Large Segments (10 seconds):

  • Pros: Fewer requests, less overhead
  • Cons: Slower adaptation, higher latency

Decision: Use 4-6 second segments for balance.
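
For example, a 4-second HLS packaging pass over an already-encoded rendition might look like this sketch (file names are illustrative):

# Sketch: package an encoded rendition into ~4-second HLS segments with ffmpeg.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "720p.mp4",
    "-c", "copy",                       # segment only; no re-encode
    "-f", "hls", "-hls_time", "4",      # target ~4-second segments
    "-hls_playlist_type", "vod",
    "-hls_segment_filename", "segment_%04d.ts",
    "playlist.m3u8",
], check=True)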

Segment Structure

HLS Media Playlist (.m3u8):

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXTINF:4.0,
segment_0001.ts
#EXTINF:4.0,
segment_0002.ts
...

DASH Manifest:

<MPD>
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="480p" bandwidth="1000000">
        <SegmentTemplate initialization="init_480p.mp4"
                         media="segment_$Number$.m4s"
                         duration="4" timescale="1" startNumber="1"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Database Design

Schema Design

Videos Table:

video_id (PK)
title
description
duration
release_date
genre
rating
thumbnail_url
created_at

Video_Qualities Table:

video_id (FK)
quality (480p, 720p, 1080p, 4K)
codec (H264, VP9, AV1)
bitrate
manifest_url (HLS/DASH)
storage_path

Users Table:

user_id (PK)
email
username
subscription_tier
created_at

Watch_History Table:

user_id (FK)
video_id (FK)
watch_position (seconds)
last_watched_at
completed (boolean)

Recommendations Table:

user_id (FK)
video_id (FK)
score
generated_at

Database Choice

  • Video Metadata: PostgreSQL (complex queries, relationships)
  • Watch History: Cassandra (high write volume, time-series data)
  • Recommendations: Redis (fast access, temporary data)
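
For the write-heavy watch-history path, one plausible Cassandra layout partitions by user so a profile's resume positions are a single-partition read. The keyspace and exact schema below are assumptions:

# Sketch of a Cassandra watch-history table (keyspace and schema assumed).
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("streaming")
session.execute("""
    CREATE TABLE IF NOT EXISTS watch_history (
        user_id         uuid,
        video_id        uuid,
        watch_position  int,        -- seconds into the title
        last_watched_at timestamp,
        completed       boolean,
        PRIMARY KEY ((user_id), video_id)   -- partition per user
    )
""")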

Search and Discovery

Search Architecture

  • Full-Text Search: Elasticsearch for video search
  • Index Fields: Title, description, cast, genre, tags

Search Features:

  • Fuzzy matching
  • Autocomplete
  • Faceted search (filter by genre, year, etc.)
  • Relevance ranking
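
A search request covering fuzzy matching and a genre facet might look like the sketch below; the index name and field names are assumptions.

# Sketch of a fuzzy, faceted Elasticsearch query (index and fields assumed).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
results = es.search(index="videos", query={
    "bool": {
        "must": {
            "multi_match": {
                "query": "strangr things",              # tolerates the typo
                "fields": ["title^3", "description", "cast"],
                "fuzziness": "AUTO",
            }
        },
        "filter": [{"term": {"genre": "sci-fi"}}],      # faceted filter
    },
})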

Recommendation System

Approach 1: Collaborative Filtering

How It Works: Recommend based on similar users’ preferences.

Pros:

  • Works without content analysis
  • Discovers unexpected content

Cons:

  • Cold start problem (new users/items)
  • Popularity bias

Approach 2: Content-Based Filtering

How It Works: Recommend based on content similarity.

Pros:

  • No cold start for content
  • Explainable recommendations

Cons:

  • Limited diversity
  • Requires content analysis

Approach 3: Hybrid

How It Works: Combine collaborative and content-based.

Pros:

  • Best of both approaches
  • Handles edge cases

Cons:

  • More complex

Decision: Use hybrid approach for best results.

Recommendation Pipeline:

User Profile → Feature Extraction → Similarity Calculation → Ranking → Top N Recommendations

Features:

  • Watch history
  • Ratings
  • Genre preferences
  • Time of day preferences
  • Device type
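
A simple way to blend the two signals is a weighted score per candidate, as in the sketch below; the weight and the two scoring functions are assumptions standing in for trained models.

# Sketch of hybrid recommendation scoring (weight and scorers assumed).
def hybrid_score(user, video, w_collab=0.6):
    cf = collaborative_score(user, video)  # similar-user signal (assumed model)
    cb = content_score(user, video)        # content-similarity signal (assumed model)
    return w_collab * cf + (1 - w_collab) * cb

def recommend(user, candidates, n=20):
    # Rank all candidate videos and keep the top N
    ranked = sorted(candidates, key=lambda v: hybrid_score(user, v), reverse=True)
    return ranked[:n]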

Scalability Patterns

Horizontal Scaling

  • Stateless API Servers: Scale horizontally
  • Load Balancing: Distribute requests across servers
  • Database Sharding: Partition data by video_id or user_id

Caching Strategy

Multi-Level Caching:

  1. CDN Cache: Edge locations
  2. Application Cache: Redis for metadata
  3. Database Cache: Query result caching
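
At the application layer, the Redis tier typically fronts the metadata database with a cache-aside read and a TTL, as in this sketch (key scheme and TTL are assumptions):

# Sketch of Redis cache-aside for video metadata (key scheme and TTL assumed).
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def get_video_metadata(video_id, db):
    key = f"video:meta:{video_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit
    meta = db.fetch_video(video_id)        # hypothetical database call
    r.set(key, json.dumps(meta), ex=3600)  # 1-hour TTL bounds staleness
    return meta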

Cache Invalidation:

  • TTL-based expiration
  • Event-based invalidation
  • Version-based keys

Content Pre-positioning

Strategy: Pre-cache popular content at edge locations.

Popularity Prediction:

  • Historical views
  • Trending algorithms
  • Regional preferences
  • Time-based patterns

Pre-positioning Algorithm:

# Runnable sketch; the content/edge objects and threshold are assumed.
def maybe_pre_cache(content, edge, threshold):
    if content.popularity > threshold and edge.cache_space_available:
        edge.pre_cache(content)   # push the title to the edge ahead of demand

Real-World Implementations

Netflix Architecture

  • CDN: Own CDN (Open Connect) + ISPs
  • Transcoding: Distributed transcoding farm
  • Storage: S3-compatible object storage
  • Protocols: HLS and DASH
  • Adaptive Algorithm: Proprietary client-side algorithm
  • Recommendations: Machine learning-based hybrid system

Amazon Prime Video

  • CDN: CloudFront
  • Transcoding: AWS Elemental MediaConvert
  • Storage: S3
  • Protocol: HLS
  • Adaptive Algorithm: Client-side with CloudFront metrics

YouTube

  • CDN: Google’s global CDN
  • Transcoding: Distributed transcoding
  • Storage: Google Cloud Storage
  • Protocol: DASH primarily
  • Adaptive Algorithm: Advanced client-side algorithm with server hints

Trade-offs Summary

CDN: Own vs Commercial:

  • Own: Full control, high investment
  • Commercial: Managed, cost at scale

Transcoding: Centralized vs Distributed:

  • Centralized: Simpler, harder to scale
  • Distributed: Scalable, more complex

Segment Size: Small vs Large:

  • Small: Faster adaptation, more overhead
  • Large: Less overhead, slower adaptation

Adaptive Algorithm: Client vs Server:

  • Client: Simple, works offline
  • Server: Better decisions, more complex

Conclusion

Designing a video streaming system requires expertise across multiple domains: video processing, CDN architecture, adaptive streaming, and recommendation systems. The key is optimizing for user experience while managing costs and complexity.

Key decisions:

  1. Chunked upload for large files
  2. Distributed transcoding with auto-scaling
  3. Commercial CDN for edge delivery
  4. HLS and DASH protocols
  5. Client-side adaptive algorithm with server hints
  6. Hybrid recommendation system
  7. Multi-level caching strategy

By understanding these components and making informed trade-offs, we can build streaming systems that scale to millions of users while delivering high-quality video experiences.