Designing a Video Streaming System: Netflix/Prime Architecture Deep Dive
Building a video streaming platform that serves millions of concurrent viewers, processes petabytes of video content, and delivers seamless playback requires sophisticated engineering across multiple domains. This comprehensive guide explores the architecture, algorithms, and design patterns needed to build a scalable streaming system like Netflix or Amazon Prime Video.
Requirements Analysis
Functional Requirements
Content Management:
- Video upload and ingestion
- Video processing and transcoding
- Content metadata management
- Thumbnail and preview generation
Streaming:
- Video playback with adaptive bitrate
- Support for multiple devices and platforms
- Resume playback from last position
- Download for offline viewing
User Features:
- User authentication and profiles
- Watch history and recommendations
- Search and discovery
- Playlists and favorites
- Ratings and reviews
Content Delivery:
- Global content distribution
- Low latency streaming
- High quality video delivery
- Bandwidth optimization
Non-Functional Requirements
Scalability: Support 200 million users, 20 million peak concurrent streams
Availability: 99.99% uptime
Latency: Video start time < 2 seconds
Quality: Support up to 4K resolution, multiple audio tracks
Bandwidth: Optimize for varying network conditions
Capacity Estimation
Traffic Estimates
Daily Active Users (DAU): 200 million
Peak Concurrent Viewers: 10% of DAU = 20 million
Average Session Duration: 60 minutes
Average Bitrate: 5 Mbps (HD quality)
Peak Bandwidth: 20M × 5 Mbps = 100 Tbps
Storage Estimates
Content Library: 50,000 titles
Average Title Size: 2 hours × 5 Mbps = 4.5 GB
Total Library Size: 50K × 4.5 GB = 225 TB (single quality)
Multiple Qualities: 4 qualities (480p, 720p, 1080p, 4K) ≈ 900 TB
Annual New Content: 10,000 titles/year ≈ 180 TB/year
Processing Estimates
Upload Rate: 1,000 videos/day
Average Video Length: 2 hours
Transcoding Time: ~1× real-time = 2 hours per video
Transcoding Capacity: 1,000 videos × 2 hours = 2,000 server-hours/day ≈ 85 servers running continuously (more with multiple renditions and peak headroom)
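The back-of-envelope numbers above can be checked with a short script. The constants are the assumptions stated in this section; nothing here is measured data.

```python
# Back-of-envelope capacity check using the assumptions in this section.
DAU = 200_000_000
PEAK_CONCURRENT = int(DAU * 0.10)      # 10% of DAU
AVG_BITRATE_MBPS = 5                   # HD quality

# Peak bandwidth: concurrent viewers x per-stream bitrate (Mbps -> Tbps)
peak_bandwidth_tbps = PEAK_CONCURRENT * AVG_BITRATE_MBPS / 1_000_000
print(f"Peak bandwidth: {peak_bandwidth_tbps:.0f} Tbps")

# Storage: a 2-hour title at 5 Mbps (megabits -> gigabytes)
title_gb = 2 * 3600 * AVG_BITRATE_MBPS / 8 / 1000
library_tb = 50_000 * title_gb / 1000
print(f"Per-title size: {title_gb:.1f} GB, library: {library_tb:.0f} TB")

# Transcoding fleet: 1,000 uploads/day x 2 server-hours each
server_hours_per_day = 1000 * 2
steady_state_servers = server_hours_per_day / 24
print(f"Steady-state transcoding servers: {steady_state_servers:.0f}")
```

Running this reproduces the 100 Tbps, 4.5 GB, and 225 TB figures, and shows the transcoding fleet needs roughly 85 always-on servers before accounting for parallel renditions or peak headroom.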
System APIs
Core APIs
uploadVideo(userId, videoFile, metadata)
- Upload video for processing
- Returns: videoId, uploadUrl
getVideo(videoId)
- Get video metadata and streaming URLs
- Returns: video metadata, streaming URLs for different qualities
streamVideo(videoId, quality, startTime)
- Get streaming URL for video segment
- Returns: segment URL, next segment URL
searchVideos(query, filters)
- Search video library
- Returns: List of matching videos
getRecommendations(userId)
- Get personalized video recommendations
- Returns: List of recommended videos
updateWatchHistory(userId, videoId, position)
- Update user's watch position
- Returns: success status
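The API list above can be sketched as plain functions over in-memory stores. The function names and return shapes follow the list; the store layout, URL formats, and field names are illustrative assumptions, not a real service contract.

```python
import uuid

# In-memory stand-ins for the metadata store and watch-history table.
VIDEOS: dict[str, dict] = {}
WATCH_HISTORY: dict[tuple[str, str], int] = {}

def upload_video(user_id: str, metadata: dict) -> dict:
    """Register an upload; returns the new videoId and a (stub) upload URL."""
    video_id = str(uuid.uuid4())
    VIDEOS[video_id] = {"owner": user_id, **metadata,
                        "qualities": ["480p", "720p", "1080p"]}
    return {"videoId": video_id,
            "uploadUrl": f"https://upload.example.com/{video_id}"}

def get_video(video_id: str) -> dict:
    """Return metadata plus per-quality streaming manifest URLs."""
    meta = VIDEOS[video_id]
    urls = {q: f"https://cdn.example.com/videos/{video_id}/{q}/manifest.mpd"
            for q in meta["qualities"]}
    return {"metadata": meta, "streamingUrls": urls}

def update_watch_history(user_id: str, video_id: str, position: int) -> dict:
    """Persist the user's playback position (seconds)."""
    WATCH_HISTORY[(user_id, video_id)] = position
    return {"success": True}
```

A real deployment would put these behind the API gateway, with the stores backed by the databases chosen later in this design.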
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│ Client Applications │
│ (Web, Mobile, TV, Gaming Console) │
└───────────────┬───────────────────────────────┬─────────────┘
│ │
┌───────▼────────┐ ┌────────▼────────┐
│ CDN Edge │ │ API Gateway │
│ Servers │ │ (REST API) │
└───────┬────────┘ └────────┬────────┘
│ │
┌───────▼───────────────────────────────▼───────┐
│ CDN Origin / API Servers │
└───────┬───────────────────────────────┬───────┘
│ │
┌───────────▼───────────┐ ┌─────────▼─────────┐
│ Video Processing │ │ Metadata Service │
│ Pipeline │ │ (Cassandra/MySQL) │
└───────────┬───────────┘ └─────────┬─────────┘
│ │
┌───────────▼───────────────────────────────▼─────────┐
│ Object Storage (Video Files) │
│ (S3, GCS, Azure Blob) │
└──────────────────────────────────────────────────────┘
Detailed Component Design
Video Processing Pipeline
Stage 1: Upload and Ingestion
Approach 1: Direct Upload to Storage
Creator API Server Object Storage
------- ---------- -------------
|--Upload-------->| |
| |--Store------------->|
| |<--URL---------------|
|<--Success-------| |
Pros:
- Simple architecture
- Direct to storage, no intermediate step
Cons:
- Long upload time blocks API
- No validation before storage
Approach 2: Chunked Upload with Validation
Creator API Server Temporary Storage Object Storage
------- ---------- ---------------- -------------
|--Init Upload-->| | |
|<--Upload URL---| | |
| | | |
|--Upload Chunk-->| | |
| |--Store Chunk------>| |
| | | |
|--Complete------>| | |
| |--Validate---------| |
| |--Move to Final------------------------>|
|<--Success-------| | |
Pros:
- Resume interrupted uploads
- Validate before final storage
- Better error handling
Cons:
- More complex
- Requires temporary storage
Decision: Use chunked upload for large files, direct upload for small files.
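The chunked-upload flow in Approach 2 can be sketched as server-side session bookkeeping: chunks land in temporary storage, and on completion the assembled file is validated before being moved to final storage. The checksum-based validation and all names here are illustrative assumptions.

```python
import hashlib

class ChunkedUploadSession:
    """Server-side bookkeeping for a chunked upload.

    Chunks land in temporary storage (a dict here); complete() validates
    the assembled bytes against a client-supplied checksum before the
    "move to final storage" step in the flow above.
    """

    def __init__(self, upload_id: str, expected_sha256: str):
        self.upload_id = upload_id
        self.expected_sha256 = expected_sha256
        self.chunks: dict[int, bytes] = {}   # temporary storage

    def upload_chunk(self, index: int, data: bytes) -> None:
        self.chunks[index] = data            # idempotent: retries overwrite

    def complete(self, final_storage: dict) -> bool:
        blob = b"".join(self.chunks[i] for i in sorted(self.chunks))
        if hashlib.sha256(blob).hexdigest() != self.expected_sha256:
            return False                     # reject before final storage
        final_storage[self.upload_id] = blob
        return True
```

Because chunks are indexed, an interrupted upload can resume by re-sending only the missing indices, which is the main advantage over the direct-upload approach.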
Stage 2: Transcoding
Challenge: Convert single video into multiple formats and qualities.
Transcoding Requirements:
- Multiple resolutions: 480p, 720p, 1080p, 4K
- Multiple bitrates per resolution
- Multiple codecs: H.264, VP9, AV1
- Multiple audio tracks: Different languages, audio descriptions
- Subtitle generation: Extract and convert subtitles
Transcoding Pipeline:
Raw Video → Validation → Transcode (Parallel) → Quality Check → Storage
│
├─ 480p (H.264, VP9)
├─ 720p (H.264, VP9)
├─ 1080p (H.264, VP9)
├─ 4K (H.264, VP9, AV1)
└─ Audio Tracks (Multiple languages)
Transcoding Architecture:
Option 1: Centralized Transcoding Farm
Upload → Queue → Transcoding Farm → Storage
Pros:
- Centralized management
- Easy to monitor and optimize
Cons:
- Single point of failure
- Hard to scale
Option 2: Distributed Transcoding
Upload → Queue → Distributed Workers → Storage
Pros:
- Horizontal scaling
- Fault tolerant
Cons:
- More complex coordination
- Network overhead
Option 3: Cloud-Based Transcoding
Upload → Cloud Transcoding Service → Storage
Pros:
- Managed service
- Auto-scaling
- Pay per use
Cons:
- Vendor lock-in
- Cost at scale
Decision: Use distributed transcoding workers with auto-scaling.
Transcoding Optimization:
- Parallel Processing: Transcode multiple qualities simultaneously
- GPU Acceleration: Use GPUs for faster transcoding
- Adaptive Quality: Adjust quality based on content complexity
- Caching: Cache transcoded segments for similar content
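The parallel fan-out across the rendition ladder can be sketched with a worker pool. The bitrate ladder and ffmpeg-style command flags are illustrative assumptions; this sketch only builds the commands rather than executing them, since a real worker would shell out to ffmpeg (or hand the job to a transcoding service).

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative rendition ladder: (name, width, codec, bitrate).
LADDER = [
    ("480p",  854,  "libx264", "1000k"),
    ("720p",  1280, "libx264", "3000k"),
    ("1080p", 1920, "libx264", "6000k"),
    ("4k",    3840, "libx264", "16000k"),
]

def build_command(src: str, name: str, width: int,
                  codec: str, bitrate: str) -> list[str]:
    """Build one ffmpeg-style transcode command (not executed here)."""
    return ["ffmpeg", "-i", src,
            "-vf", f"scale={width}:-2",      # scale width, keep aspect ratio
            "-c:v", codec, "-b:v", bitrate,
            f"{name}/output.mp4"]

def transcode_all(src: str) -> list[list[str]]:
    """Fan all renditions out to a worker pool, mirroring the pipeline above."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        jobs = [pool.submit(build_command, src, *r) for r in LADDER]
        return [j.result() for j in jobs]
```

In a distributed deployment each submitted job would instead go onto the queue and be picked up by an auto-scaled worker, as the chosen architecture describes.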
Stage 3: Storage Organization
Storage Structure:
videos/
  {videoId}/
    {quality}/
      {codec}/
        segment_0001.m4s
        segment_0002.m4s
        ...
        manifest.mpd (DASH) or playlist.m3u8 (HLS)
    thumbnails/
      thumbnail_1.jpg
      thumbnail_2.jpg
      ...
Storage Strategy:
Hot Content: Frequently accessed, store in fast storage (SSD)
Warm Content: Moderately accessed, standard storage
Cold Content: Rarely accessed, archive storage
Lifecycle Management: Automatically move content between tiers based on access patterns.
Content Delivery Network (CDN)
CDN Architecture
Purpose: Serve video content from edge locations close to users.
CDN Layers:
User → Edge Server (L1) → Regional Cache (L2) → Origin Server
Edge Server: Closest to the user, serves cached content
Regional Cache: Serves content for its region, fetches from origin on a miss
Origin Server: Source of truth, stores all content
Caching Strategy
Cache-Aside Pattern:
1. Check edge cache
2. If miss, check regional cache
3. If miss, fetch from origin
4. Store in cache for future requests
Cache Invalidation:
TTL-Based: Content expires after a set time period
Purge API: Manually purge specific content
Version-Based: Include a version in the URL; a new version invalidates the old
Cache Key Design:
{videoId}/{quality}/{codec}/segment_{number}
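The cache key scheme and the cache-aside lookup steps above can be sketched together; dicts stand in for the edge, regional, and origin tiers.

```python
def cache_key(video_id: str, quality: str, codec: str, segment: int) -> str:
    """Build a key matching the {videoId}/{quality}/{codec}/segment_{n} scheme."""
    return f"{video_id}/{quality}/{codec}/segment_{segment:04d}"

def fetch_segment(key: str, edge: dict, regional: dict, origin: dict) -> bytes:
    """Cache-aside lookup: edge -> regional -> origin, filling caches on miss."""
    if key in edge:
        return edge[key]                   # 1. edge hit
    if key in regional:
        edge[key] = regional[key]          # 2. regional hit, populate edge
        return edge[key]
    data = origin[key]                     # 3. miss, fetch from origin
    regional[key] = data                   # 4. store for future requests
    edge[key] = data
    return data
```

Zero-padding the segment number keeps keys lexicographically sortable, which simplifies debugging and prefix-based purges.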
CDN Selection
Option 1: Build Own CDN
Pros:
- Full control
- No vendor lock-in
- Custom optimizations
Cons:
- High capital investment
- Operational complexity
- Global infrastructure needed
Option 2: Use Commercial CDN
Options: CloudFront, Cloudflare, Fastly, Akamai
Pros:
- Managed service
- Global infrastructure
- DDoS protection included
Cons:
- Cost at scale
- Less control
- Vendor dependency
Option 3: Hybrid
Pros:
- Use CDN for edge, own for origin
- Balance cost and control
Cons:
- More complex
Decision: Use commercial CDN (CloudFront/Cloudflare) for edge, own infrastructure for origin.
Adaptive Bitrate Streaming
Problem
Users have varying network conditions. Fixed bitrate causes:
- High bitrate: Buffering on slow connections
- Low bitrate: Poor quality on fast connections
Solution: Adaptive Bitrate Streaming
How It Works: Client monitors network conditions and requests appropriate quality segments.
Protocols:
HLS (HTTP Live Streaming):
- Apple’s protocol
- Segments in .m3u8 playlist
- Widely supported
DASH (Dynamic Adaptive Streaming over HTTP):
- MPEG standard
- More flexible than HLS
- Better for complex scenarios
Smooth Streaming:
- Microsoft’s protocol
- Less common now
Decision: Support both HLS and DASH for maximum compatibility.
Adaptive Algorithm
Client-Side Algorithm:
1. Start with medium quality
2. Monitor download speed
3. Calculate buffer level
4. If buffer low → switch to lower quality
5. If buffer high and speed good → switch to higher quality
6. Avoid frequent switching (hysteresis)
Metrics to Monitor:
- Download Speed: Bytes per second
- Buffer Level: Seconds of video buffered
- Segment Download Time: Time to download segment
- Rebuffering Events: Number of stalls
Quality Switching Logic:
if bufferLevel < 5 seconds:
    quality = lowerQuality()
elif bufferLevel > 30 seconds and downloadSpeed > threshold:
    quality = higherQuality()
else:
    quality = currentQuality
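The switching logic above can be made concrete, including the hysteresis step from the algorithm list. The quality ladder, per-quality bitrates, and the 1.5× safety margin are illustrative assumptions.

```python
QUALITIES = ["480p", "720p", "1080p", "4k"]
# Illustrative per-quality bitrates in Mbps.
BITRATE = {"480p": 1.0, "720p": 3.0, "1080p": 6.0, "4k": 16.0}

def next_quality(current: str, buffer_s: float, download_mbps: float) -> str:
    """Buffer-and-throughput rule with hysteresis.

    Stepping up requires measured throughput to exceed the next level's
    bitrate by a 1.5x margin, so the player doesn't flap between levels.
    """
    i = QUALITIES.index(current)
    if buffer_s < 5 and i > 0:
        return QUALITIES[i - 1]            # protect against rebuffering
    if (buffer_s > 30 and i < len(QUALITIES) - 1
            and download_mbps > 1.5 * BITRATE[QUALITIES[i + 1]]):
        return QUALITIES[i + 1]            # safe to step up
    return current                         # hold steady otherwise
```

The asymmetric thresholds (step down aggressively at a 5 s buffer, step up conservatively above 30 s) reflect that a stall hurts the viewing experience far more than a few minutes at a lower quality.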
Server-Side Hints:
Option 1: Client-Only Decision
- Client decides based on its metrics
- Pros: Simple, works offline
- Cons: May not have full picture
Option 2: Server-Assisted
- Server provides recommendations
- Pros: Better decisions
- Cons: More complex, requires server communication
Decision: Use client-side algorithm with server hints for optimal experience.
Video Segmentation
Segment Size Trade-offs
Small Segments (2-4 seconds):
- Pros: Faster adaptation, lower latency
- Cons: More requests, more overhead
Large Segments (10 seconds):
- Pros: Fewer requests, less overhead
- Cons: Slower adaptation, higher latency
Decision: Use 4-6 second segments for balance.
Segment Structure
HLS Media Playlist (.m3u8):
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXTINF:4.0,
segment_0001.ts
#EXTINF:4.0,
segment_0002.ts
...
DASH Manifest:
<MPD>
  <Period>
    <AdaptationSet>
      <Representation id="480p" bandwidth="1000000">
        <SegmentTemplate media="segment_$Number$.m4s" duration="4" startNumber="1"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
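Generating the HLS playlist above is straightforward string assembly; a sketch (VOD case, fixed segment duration, names matching the example):

```python
import math

def hls_media_playlist(segment_count: int, segment_dur: float = 4.0) -> str:
    """Generate a minimal VOD HLS media playlist like the example above.

    EXT-X-TARGETDURATION must be at least the longest segment duration,
    rounded up to an integer.
    """
    lines = ["#EXTM3U",
             "#EXT-X-VERSION:3",
             f"#EXT-X-TARGETDURATION:{math.ceil(segment_dur)}",
             "#EXT-X-MEDIA-SEQUENCE:0"]
    for i in range(1, segment_count + 1):
        lines.append(f"#EXTINF:{segment_dur:.1f},")
        lines.append(f"segment_{i:04d}.ts")
    lines.append("#EXT-X-ENDLIST")   # marks a complete (VOD) playlist
    return "\n".join(lines)
```

For live streams the ENDLIST tag is omitted and the playlist is re-published as new segments become available, with MEDIA-SEQUENCE advancing as old segments roll off.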
Database Design
Schema Design
Videos Table:
video_id (PK)
title
description
duration
release_date
genre
rating
thumbnail_url
created_at
Video_Qualities Table:
video_id (FK)
quality (480p, 720p, 1080p, 4K)
codec (H264, VP9, AV1)
bitrate
manifest_url (HLS/DASH)
storage_path
Users Table:
user_id (PK)
email
username
subscription_tier
created_at
Watch_History Table:
user_id (FK)
video_id (FK)
watch_position (seconds)
last_watched_at
completed (boolean)
Recommendations Table:
user_id (FK)
video_id (FK)
score
generated_at
Database Choice
Video Metadata: PostgreSQL (complex queries, relationships)
Watch History: Cassandra (high write volume, time-series data)
Recommendations: Redis (fast access, temporary data)
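The core tables from the schema above can be expressed as DDL. SQLite is used here only to check that the tables are well-formed; production would use PostgreSQL and Cassandra as chosen above, and column types are illustrative.

```python
import sqlite3

DDL = """
CREATE TABLE videos (
    video_id      TEXT PRIMARY KEY,
    title         TEXT NOT NULL,
    description   TEXT,
    duration      INTEGER,          -- seconds
    release_date  TEXT,
    genre         TEXT,
    rating        REAL,
    thumbnail_url TEXT,
    created_at    TEXT
);
CREATE TABLE watch_history (
    user_id         TEXT,
    video_id        TEXT REFERENCES videos(video_id),
    watch_position  INTEGER,        -- seconds
    last_watched_at TEXT,
    completed       INTEGER,        -- boolean
    PRIMARY KEY (user_id, video_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```

The composite primary key on watch_history makes position updates an upsert per (user, video) pair, which matches the updateWatchHistory API.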
Search and Discovery
Search Architecture
Full-Text Search: Elasticsearch for video search
Index Fields: Title, description, cast, genre, tags
Search Features:
- Fuzzy matching
- Autocomplete
- Faceted search (filter by genre, year, etc.)
- Relevance ranking
Recommendation System
Approach 1: Collaborative Filtering
How It Works: Recommend based on similar users’ preferences.
Pros:
- Works without content analysis
- Discovers unexpected content
Cons:
- Cold start problem (new users/items)
- Popularity bias
Approach 2: Content-Based Filtering
How It Works: Recommend based on content similarity.
Pros:
- No cold start for content
- Explainable recommendations
Cons:
- Limited diversity
- Requires content analysis
Approach 3: Hybrid
How It Works: Combine collaborative and content-based.
Pros:
- Best of both approaches
- Handles edge cases
Cons:
- More complex
Decision: Use hybrid approach for best results.
Recommendation Pipeline:
User Profile → Feature Extraction → Similarity Calculation → Ranking → Top N Recommendations
Features:
- Watch history
- Ratings
- Genre preferences
- Time of day preferences
- Device type
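The hybrid approach can be sketched as a weighted blend of the two signals, followed by the ranking step from the pipeline above. The blend weight is an illustrative assumption; real systems learn it offline.

```python
def hybrid_score(collab: float, content: float, alpha: float = 0.7) -> float:
    """Blend collaborative and content-based scores.

    alpha weights the collaborative signal; (1 - alpha) weights the
    content-based signal. The 0.7 default is illustrative.
    """
    return alpha * collab + (1 - alpha) * content

def top_n(candidates: dict[str, tuple[float, float]], n: int = 3) -> list[str]:
    """Rank candidate videos by blended score, highest first.

    candidates maps video_id -> (collaborative_score, content_score).
    """
    return sorted(candidates,
                  key=lambda v: hybrid_score(*candidates[v]),
                  reverse=True)[:n]
```

Because the content-based term never goes to zero, a brand-new title with no interaction data still gets a nonzero score, which is exactly the cold-start gap the hybrid approach is meant to cover.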
Scalability Patterns
Horizontal Scaling
Stateless API Servers: Scale horizontally
Load Balancing: Distribute requests across servers
Database Sharding: Partition data by video_id or user_id
Caching Strategy
Multi-Level Caching:
- CDN Cache: Edge locations
- Application Cache: Redis for metadata
- Database Cache: Query result caching
Cache Invalidation:
- TTL-based expiration
- Event-based invalidation
- Version-based keys
Content Pre-positioning
Strategy: Pre-cache popular content at edge locations.
Popularity Prediction:
- Historical views
- Trending algorithms
- Regional preferences
- Time-based patterns
Pre-positioning Algorithm:
if content.popularity > threshold and edge.cache_space_available:
    pre_cache(content, edge)
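The rule above can be made runnable as a greedy planner: take the most popular titles that clear the threshold while edge space remains. The field names and threshold are illustrative assumptions.

```python
def plan_precache(titles: list[dict], edge_free_gb: float,
                  threshold: float = 0.8) -> list[str]:
    """Greedy pre-positioning plan for one edge location.

    titles: dicts with illustrative fields id, popularity (0..1), size_gb.
    Returns the ids to pre-cache, most popular first, within capacity.
    """
    plan = []
    for t in sorted(titles, key=lambda t: t["popularity"], reverse=True):
        if t["popularity"] >= threshold and t["size_gb"] <= edge_free_gb:
            plan.append(t["id"])
            edge_free_gb -= t["size_gb"]   # consume edge cache space
    return plan
```

In practice the popularity input would come from the prediction signals listed above (historical views, trends, regional and time-based patterns) rather than a static score.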
Real-World Implementations
Netflix Architecture
CDN: Own CDN (Open Connect), with appliances deployed inside ISP networks
Transcoding: Distributed transcoding farm
Storage: S3-compatible object storage
Protocol: HLS and DASH
Adaptive Algorithm: Proprietary client-side algorithm
Recommendations: Machine learning-based hybrid system
Amazon Prime Video
CDN: CloudFront
Transcoding: AWS Elemental MediaConvert
Storage: S3
Protocol: HLS
Adaptive Algorithm: Client-side with CloudFront metrics
YouTube
CDN: Google’s global CDN
Transcoding: Distributed transcoding
Storage: Google Cloud Storage
Protocol: DASH primarily
Adaptive Algorithm: Advanced client-side algorithm with server hints
Trade-offs Summary
CDN: Own vs Commercial:
- Own: Full control, high investment
- Commercial: Managed, cost at scale
Transcoding: Centralized vs Distributed:
- Centralized: Simpler, harder to scale
- Distributed: Scalable, more complex
Segment Size: Small vs Large:
- Small: Faster adaptation, more overhead
- Large: Less overhead, slower adaptation
Adaptive Algorithm: Client vs Server:
- Client: Simple, works offline
- Server: Better decisions, more complex
Conclusion
Designing a video streaming system requires expertise across multiple domains: video processing, CDN architecture, adaptive streaming, and recommendation systems. The key is optimizing for user experience while managing costs and complexity.
Key decisions:
- Chunked upload for large files
- Distributed transcoding with auto-scaling
- Commercial CDN for edge delivery
- HLS and DASH protocols
- Client-side adaptive algorithm with server hints
- Hybrid recommendation system
- Multi-level caching strategy
By understanding these components and making informed trade-offs, we can build streaming systems that scale to millions of users while delivering high-quality video experiences.