Designing a Video Streaming System: Netflix/Prime Architecture Deep Dive


Building a video streaming platform that serves millions of concurrent viewers, processes petabytes of video content, and delivers seamless playback requires sophisticated engineering across multiple domains. This comprehensive guide explores the architecture, algorithms, and design patterns needed to build a scalable streaming system like Netflix or Amazon Prime Video.

Requirements Analysis

Functional Requirements

Content Management:

  • Video upload and ingestion
  • Video processing and transcoding
  • Content metadata management
  • Thumbnail and preview generation

Streaming:

  • Video playback with adaptive bitrate
  • Support for multiple devices and platforms
  • Resume playback from last position
  • Download for offline viewing

User Features:

  • User authentication and profiles
  • Watch history and recommendations
  • Search and discovery
  • Playlists and favorites
  • Ratings and reviews

Content Delivery:

  • Global content distribution
  • Low latency streaming
  • High quality video delivery
  • Bandwidth optimization

Non-Functional Requirements

  • Scalability: Support 200 million users and up to 20 million concurrent streams at peak
  • Availability: 99.99% uptime
  • Latency: Video start time < 2 seconds
  • Quality: Support up to 4K resolution and multiple audio tracks
  • Bandwidth: Optimize for varying network conditions

Capacity Estimation

Traffic Estimates

  • Daily Active Users (DAU): 200 million
  • Peak Concurrent Viewers: 10% of DAU = 20 million
  • Average Session Duration: 60 minutes
  • Average Bitrate: 5 Mbps (HD quality)
  • Peak Bandwidth: 20M × 5 Mbps = 100 Tbps

Storage Estimates

  • Content Library: 50,000 titles
  • Average Title Size: 2 hours × 5 Mbps = 4.5 GB
  • Total Library Size: 50K × 4.5 GB = 225 TB (single quality)
  • Multiple Qualities: 4 qualities (480p, 720p, 1080p, 4K) ≈ 900 TB (an upper bound, since lower resolutions use proportionally less)
  • Annual New Content: 10,000 titles/year ≈ 180 TB/year across all qualities

Processing Estimates

  • Upload Rate: 1,000 videos/day
  • Average Video Length: 2 hours
  • Transcoding Time: roughly real-time, so ~2 hours per video
  • Concurrent Transcoding: 1,000 videos × 2 hours = 2,000 transcoding-hours/day ≈ 85 servers running around the clock; provision headroom for upload bursts and for producing multiple renditions in parallel

System APIs

Core APIs

uploadVideo(userId, videoFile, metadata)
- Upload video for processing
- Returns: videoId, uploadUrl

getVideo(videoId)
- Get video metadata and streaming URLs
- Returns: video metadata, streaming URLs for different qualities

streamVideo(videoId, quality, startTime)
- Get streaming URL for video segment
- Returns: segment URL, next segment URL

searchVideos(query, filters)
- Search video library
- Returns: List of matching videos

getRecommendations(userId)
- Get personalized video recommendations
- Returns: List of recommended videos

updateWatchHistory(userId, videoId, position)
- Update user's watch position
- Returns: success status
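
These API sketches map naturally onto HTTP endpoints. Below is a minimal illustration in Python using FastAPI; the route paths, request model, and in-memory store are assumptions for the sketch, not the actual service interface.

# Minimal sketch of two core APIs as HTTP endpoints.
# Paths, the request model, and the in-memory store are illustrative
# assumptions; a real service would persist history in Cassandra (see below).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
watch_history = {}  # (user_id, video_id) -> position in seconds

class HistoryUpdate(BaseModel):
    video_id: str
    position_seconds: int

@app.get("/videos/{video_id}")
def get_video(video_id: str):
    # Would fetch metadata and per-quality manifest URLs from the metadata service
    return {"videoId": video_id,
            "manifests": {"hls": f"/videos/{video_id}/playlist.m3u8"}}

@app.post("/users/{user_id}/history")
def update_watch_history(user_id: str, update: HistoryUpdate):
    # Persist the resume position so playback can continue on any device
    watch_history[(user_id, update.video_id)] = update.position_seconds
    return {"status": "ok"}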

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Client Applications                       │
│              (Web, Mobile, TV, Gaming Console)              │
└───────────────┬───────────────────────────────┬─────────────┘
                │                               │
        ┌───────▼────────┐             ┌────────▼────────┐
        │  CDN Edge      │             │  API Gateway   │
        │  Servers       │             │  (REST API)    │
        └───────┬────────┘             └────────┬────────┘
                │                               │
        ┌───────▼───────────────────────────────▼───────┐
        │         CDN Origin / API Servers              │
        └───────┬───────────────────────────────┬───────┘
                │                               │
    ┌───────────▼───────────┐         ┌─────────▼─────────┐
    │  Video Processing    │         │  Metadata Service  │
    │  Pipeline            │         │  (Cassandra/MySQL) │
    └───────────┬───────────┘         └─────────┬─────────┘
                │                               │
    ┌───────────▼───────────────────────────────▼─────────┐
    │         Object Storage (Video Files)                 │
    │         (S3, GCS, Azure Blob)                        │
    └──────────────────────────────────────────────────────┘

Detailed Component Design

Video Processing Pipeline

Stage 1: Upload and Ingestion

Approach 1: Direct Upload to Storage

Creator          API Server        Object Storage
-------          ----------        -------------
  |--Upload-------->|                    |
  |                 |--Store------------->|
  |                 |<--URL---------------|
  |<--Success-------|                    |

Pros:

  • Simple architecture
  • Direct to storage, no intermediate step

Cons:

  • Long upload time blocks API
  • No validation before storage

Approach 2: Chunked Upload with Validation

Creator          API Server        Temporary Storage    Object Storage
-------          ----------        ----------------    -------------
  |--Init Upload-->|                    |                    |
  |<--Upload URL---|                    |                    |
  |                 |                    |                    |
  |--Upload Chunk-->|                    |                    |
  |                 |--Store Chunk------>|                    |
  |                 |                    |                    |
  |--Complete------>|                    |                    |
  |                 |--Validate---------|                    |
  |                 |--Move to Final------------------------>|
  |<--Success-------|                    |                    |

Pros:

  • Resume interrupted uploads
  • Validate before final storage
  • Better error handling

Cons:

  • More complex
  • Requires temporary storage

Decision: Use chunked upload for large files, direct upload for small files.
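
To make the chunked path concrete, the client-side sketch below uploads a file in fixed-size parts and then asks the server to finalize. The endpoint URLs, chunk size, and response fields are assumptions about a hypothetical upload API, not a specific product's interface.

# Client-side sketch of a chunked upload (endpoints and fields are assumed).
import requests

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB parts; tune for typical client bandwidth
API = "https://api.example.com"

def chunked_upload(path, metadata):
    # 1. Initialize an upload session and obtain an upload ID
    upload_id = requests.post(f"{API}/uploads", json=metadata).json()["uploadId"]
    # 2. Send each chunk with its index; an interrupted upload can resume
    #    by asking the server which indices it already has
    with open(path, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            requests.put(f"{API}/uploads/{upload_id}/chunks/{index}", data=chunk)
            index += 1
    # 3. Server validates the chunks and assembles them into final storage
    return requests.post(f"{API}/uploads/{upload_id}/complete").json()["videoId"]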

Stage 2: Transcoding

Challenge: Convert single video into multiple formats and qualities.

Transcoding Requirements:

  • Multiple resolutions: 480p, 720p, 1080p, 4K
  • Multiple bitrates per resolution
  • Multiple codecs: H.264, VP9, AV1
  • Multiple audio tracks: Different languages, audio descriptions
  • Subtitle generation: Extract and convert subtitles

Transcoding Pipeline:

Raw Video → Validation → Transcode (Parallel) → Quality Check → Storage

                ├─ 480p (H.264, VP9)
                ├─ 720p (H.264, VP9)
                ├─ 1080p (H.264, VP9)
                ├─ 4K (H.264, VP9, AV1)
                └─ Audio Tracks (Multiple languages)

Transcoding Architecture:

Option 1: Centralized Transcoding Farm

Upload → Queue → Transcoding Farm → Storage

Pros:

  • Centralized management
  • Easy to monitor and optimize

Cons:

  • Single point of failure
  • Hard to scale

Option 2: Distributed Transcoding

Upload → Queue → Distributed Workers → Storage

Pros:

  • Horizontal scaling
  • Fault tolerant

Cons:

  • More complex coordination
  • Network overhead

Option 3: Cloud-Based Transcoding

Upload → Cloud Transcoding Service → Storage

Pros:

  • Managed service
  • Auto-scaling
  • Pay per use

Cons:

  • Vendor lock-in
  • Cost at scale

Decision: Use distributed transcoding workers with auto-scaling.
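
A worker in such a pool might look like the sketch below: it pulls a job from a queue and shells out to ffmpeg once per target rendition. The queue client and job shape are assumptions; the ffmpeg flags shown are standard, though a real pipeline adds codec tuning, audio tracks, and segmenting.

# Sketch of a distributed transcoding worker (queue client and job shape assumed).
import subprocess

RENDITIONS = [           # (name, resolution, target video bitrate)
    ("480p",  "854x480",   "1000k"),
    ("720p",  "1280x720",  "2500k"),
    ("1080p", "1920x1080", "5000k"),
]

def transcode(job):
    for name, size, bitrate in RENDITIONS:
        # H.264 video + AAC audio at the target resolution and bitrate
        subprocess.run([
            "ffmpeg", "-i", job["source"],
            "-c:v", "libx264", "-b:v", bitrate, "-s", size,
            "-c:a", "aac",
            f"{job['output_dir']}/{name}.mp4",
        ], check=True)

def run_worker(queue):
    while True:
        job = queue.receive()     # block until a job is available
        transcode(job)
        queue.acknowledge(job)    # ack only after success so failures are retried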

Transcoding Optimization:

  • Parallel Processing: Transcode multiple qualities simultaneously
  • GPU Acceleration: Use GPUs for faster transcoding
  • Adaptive Quality: Adjust quality based on content complexity
  • Caching: Cache transcoded segments for similar content

Stage 3: Storage Organization

Storage Structure:

videos/
  {videoId}/
    {quality}/
      {codec}/
        segment_0001.m4s
        segment_0002.m4s
        ...
    manifest.mpd (DASH) or playlist.m3u8 (HLS)
    thumbnails/
      thumbnail_1.jpg
      thumbnail_2.jpg
      ...

Storage Strategy:

  • Hot Content: Frequently accessed, store in fast storage (SSD)
  • Warm Content: Moderately accessed, standard storage
  • Cold Content: Rarely accessed, archive storage

Lifecycle Management: Automatically move content between tiers based on access patterns.
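
On S3-style object storage, this tiering can be expressed as a lifecycle policy. The boto3 sketch below uses object age as a simple proxy for access frequency; the bucket name, prefix, and day thresholds are assumptions.

# Sketch of an S3 lifecycle policy for tiering video objects.
# Bucket, prefix, and thresholds are illustrative assumptions.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="video-content",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-videos-by-age",
            "Filter": {"Prefix": "videos/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30,  "StorageClass": "STANDARD_IA"},  # warm tier
                {"Days": 180, "StorageClass": "GLACIER"},      # cold tier
            ],
        }]
    },
)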

Content Delivery Network (CDN)

CDN Architecture

Purpose: Serve video content from edge locations close to users.

CDN Layers:

User → Edge Server (L1) → Regional Cache (L2) → Origin Server

  • Edge Server: Closest to user, serves cached content
  • Regional Cache: Serves content for a region, fetches from origin if not cached
  • Origin Server: Source of truth, stores all content

Caching Strategy

Cache-Aside Pattern:

1. Check edge cache
2. If miss, check regional cache
3. If miss, fetch from origin
4. Store in cache for future requests

Cache Invalidation:

  • TTL-Based: Content expires after a time period
  • Purge API: Manually purge specific content
  • Version-Based: Include version in URL; a new version invalidates the old

Cache Key Design:

{videoId}/{quality}/{codec}/segment_{number}
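
Putting the key scheme and the cache-aside flow together, a tiered lookup might look like the following sketch; the edge, regional, and origin clients are illustrative assumptions.

# Sketch of a tiered cache-aside lookup for one video segment.
# edge_cache, regional_cache, and origin are hypothetical store clients.
def segment_key(video_id, quality, codec, number):
    return f"{video_id}/{quality}/{codec}/segment_{number:04d}"

def fetch_segment(key, edge_cache, regional_cache, origin):
    data = edge_cache.get(key)              # 1. check the edge first
    if data is None:
        data = regional_cache.get(key)      # 2. fall back to the regional tier
        if data is None:
            data = origin.get(key)          # 3. last resort: the origin
            regional_cache.set(key, data)
        edge_cache.set(key, data)           # populate caches on the way back
    return data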

CDN Selection

Option 1: Build Own CDN

Pros:

  • Full control
  • No vendor lock-in
  • Custom optimizations

Cons:

  • High capital investment
  • Operational complexity
  • Global infrastructure needed

Option 2: Use Commercial CDN

Options: CloudFront, Cloudflare, Fastly, Akamai

Pros:

  • Managed service
  • Global infrastructure
  • DDoS protection included

Cons:

  • Cost at scale
  • Less control
  • Vendor dependency

Option 3: Hybrid

Pros:

  • Use CDN for edge, own for origin
  • Balance cost and control

Cons:

  • More complex

Decision: Use commercial CDN (CloudFront/Cloudflare) for edge, own infrastructure for origin.

Adaptive Bitrate Streaming

Problem

Users have varying network conditions. Fixed bitrate causes:

  • High bitrate: Buffering on slow connections
  • Low bitrate: Poor quality on fast connections

Solution: Adaptive Bitrate Streaming

How It Works: Client monitors network conditions and requests appropriate quality segments.

Protocols:

HLS (HTTP Live Streaming):

  • Apple’s protocol
  • Segments in .m3u8 playlist
  • Widely supported

DASH (Dynamic Adaptive Streaming over HTTP):

  • MPEG standard
  • More flexible than HLS
  • Better for complex scenarios

Smooth Streaming:

  • Microsoft’s protocol
  • Less common now

Decision: Support both HLS and DASH for maximum compatibility.

Adaptive Algorithm

Client-Side Algorithm:

1. Start with medium quality
2. Monitor download speed
3. Calculate buffer level
4. If buffer low → switch to lower quality
5. If buffer high and speed good → switch to higher quality
6. Avoid frequent switching (hysteresis)

Metrics to Monitor:

  • Download Speed: Bytes per second
  • Buffer Level: Seconds of video buffered
  • Segment Download Time: Time to download segment
  • Rebuffering Events: Number of stalls

Quality Switching Logic:

# Runnable sketch of the switching logic; lower()/higher() step one
# rung down/up the quality ladder (helpers assumed).
def choose_quality(current, buffer_level_s, download_speed_bps, threshold_bps):
    if buffer_level_s < 5:                  # buffer nearly empty: avoid a stall
        return lower(current)
    if buffer_level_s > 30 and download_speed_bps > threshold_bps:
        return higher(current)              # healthy buffer + throughput: step up
    return current                          # hold steady (hysteresis)

Server-Side Hints:

Option 1: Client-Only Decision

  • Client decides based on its metrics
  • Pros: Simple, works offline
  • Cons: May not have full picture

Option 2: Server-Assisted

  • Server provides recommendations
  • Pros: Better decisions
  • Cons: More complex, requires server communication

Decision: Use client-side algorithm with server hints for optimal experience.

Video Segmentation

Segment Size Trade-offs

Small Segments (2-4 seconds):

  • Pros: Faster adaptation, lower latency
  • Cons: More requests, more overhead

Large Segments (10 seconds):

  • Pros: Fewer requests, less overhead
  • Cons: Slower adaptation, higher latency

Decision: Use 4-6 second segments for balance.
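
For example, a 4-second HLS packaging pass over an already-encoded rendition might look like this sketch (file names are illustrative):

# Sketch: package an encoded rendition into ~4-second HLS segments with ffmpeg.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "720p.mp4",
    "-c", "copy",                       # segment only; no re-encode
    "-f", "hls", "-hls_time", "4",      # target ~4-second segments
    "-hls_playlist_type", "vod",
    "-hls_segment_filename", "segment_%04d.ts",
    "playlist.m3u8",
], check=True)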

Segment Structure

HLS Media Playlist (.m3u8):

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXTINF:4.0,
segment_0001.ts
#EXTINF:4.0,
segment_0002.ts
...

DASH Manifest:

<MPD>
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="480p" bandwidth="1000000">
        <SegmentTemplate initialization="init_480p.mp4"
                         media="segment_$Number$.m4s"
                         duration="4" timescale="1" startNumber="1"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Database Design

Schema Design

Videos Table:

video_id (PK)
title
description
duration
release_date
genre
rating
thumbnail_url
created_at

Video_Qualities Table:

video_id (FK)
quality (480p, 720p, 1080p, 4K)
codec (H264, VP9, AV1)
bitrate
manifest_url (HLS/DASH)
storage_path

Users Table:

user_id (PK)
email
username
subscription_tier
created_at

Watch_History Table:

user_id (FK)
video_id (FK)
watch_position (seconds)
last_watched_at
completed (boolean)

Recommendations Table:

user_id (FK)
video_id (FK)
score
generated_at

Database Choice

  • Video Metadata: PostgreSQL (complex queries, relationships)
  • Watch History: Cassandra (high write volume, time-series data)
  • Recommendations: Redis (fast access, temporary data)
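
For the write-heavy watch-history path, one plausible Cassandra layout partitions by user so a profile's resume positions are a single-partition read. The keyspace and exact schema below are assumptions:

# Sketch of a Cassandra watch-history table (keyspace and schema assumed).
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("streaming")
session.execute("""
    CREATE TABLE IF NOT EXISTS watch_history (
        user_id         uuid,
        video_id        uuid,
        watch_position  int,        -- seconds into the title
        last_watched_at timestamp,
        completed       boolean,
        PRIMARY KEY ((user_id), video_id)   -- partition per user
    )
""")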

Search and Discovery

Search Architecture

  • Full-Text Search: Elasticsearch for video search
  • Index Fields: Title, description, cast, genre, tags

Search Features:

  • Fuzzy matching
  • Autocomplete
  • Faceted search (filter by genre, year, etc.)
  • Relevance ranking
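
A search request covering fuzzy matching and a genre facet might look like the sketch below; the index name and field names are assumptions.

# Sketch of a fuzzy, faceted Elasticsearch query (index and fields assumed).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
results = es.search(index="videos", query={
    "bool": {
        "must": {
            "multi_match": {
                "query": "strangr things",              # tolerates the typo
                "fields": ["title^3", "description", "cast"],
                "fuzziness": "AUTO",
            }
        },
        "filter": [{"term": {"genre": "sci-fi"}}],      # faceted filter
    },
})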

Recommendation System

Approach 1: Collaborative Filtering

How It Works: Recommend based on similar users’ preferences.

Pros:

  • Works without content analysis
  • Discovers unexpected content

Cons:

  • Cold start problem (new users/items)
  • Popularity bias

Approach 2: Content-Based Filtering

How It Works: Recommend based on content similarity.

Pros:

  • No cold start for content
  • Explainable recommendations

Cons:

  • Limited diversity
  • Requires content analysis

Approach 3: Hybrid

How It Works: Combine collaborative and content-based.

Pros:

  • Best of both approaches
  • Handles edge cases

Cons:

  • More complex

Decision: Use hybrid approach for best results.

Recommendation Pipeline:

User Profile → Feature Extraction → Similarity Calculation → Ranking → Top N Recommendations

Features:

  • Watch history
  • Ratings
  • Genre preferences
  • Time of day preferences
  • Device type
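
A simple way to blend the two signals is a weighted score per candidate, as in the sketch below; the weight and the two scoring functions are assumptions standing in for trained models.

# Sketch of hybrid recommendation scoring (weight and scorers assumed).
def hybrid_score(user, video, w_collab=0.6):
    cf = collaborative_score(user, video)  # similar-user signal (assumed model)
    cb = content_score(user, video)        # content-similarity signal (assumed model)
    return w_collab * cf + (1 - w_collab) * cb

def recommend(user, candidates, n=20):
    # Rank all candidate videos and keep the top N
    ranked = sorted(candidates, key=lambda v: hybrid_score(user, v), reverse=True)
    return ranked[:n]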

Scalability Patterns

Horizontal Scaling

  • Stateless API Servers: Scale horizontally
  • Load Balancing: Distribute requests across servers
  • Database Sharding: Partition data by video_id or user_id

Caching Strategy

Multi-Level Caching:

  1. CDN Cache: Edge locations
  2. Application Cache: Redis for metadata
  3. Database Cache: Query result caching
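
At the application layer, the Redis tier typically fronts the metadata database with a cache-aside read and a TTL, as in this sketch (key scheme and TTL are assumptions):

# Sketch of Redis cache-aside for video metadata (key scheme and TTL assumed).
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def get_video_metadata(video_id, db):
    key = f"video:meta:{video_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit
    meta = db.fetch_video(video_id)        # hypothetical database call
    r.set(key, json.dumps(meta), ex=3600)  # 1-hour TTL bounds staleness
    return meta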

Cache Invalidation:

  • TTL-based expiration
  • Event-based invalidation
  • Version-based keys

Content Pre-positioning

Strategy: Pre-cache popular content at edge locations.

Popularity Prediction:

  • Historical views
  • Trending algorithms
  • Regional preferences
  • Time-based patterns

Pre-positioning Algorithm:

# Runnable sketch; the content/edge objects and threshold are assumed.
def maybe_pre_cache(content, edge, threshold):
    if content.popularity > threshold and edge.cache_space_available:
        edge.pre_cache(content)   # push the title to the edge ahead of demand

Real-World Implementations

Netflix Architecture

  • CDN: Own CDN (Open Connect) + ISPs
  • Transcoding: Distributed transcoding farm
  • Storage: S3-compatible object storage
  • Protocols: HLS and DASH
  • Adaptive Algorithm: Proprietary client-side algorithm
  • Recommendations: Machine learning-based hybrid system

Amazon Prime Video

  • CDN: CloudFront
  • Transcoding: AWS Elemental MediaConvert
  • Storage: S3
  • Protocol: HLS
  • Adaptive Algorithm: Client-side with CloudFront metrics

YouTube

  • CDN: Google’s global CDN
  • Transcoding: Distributed transcoding
  • Storage: Google Cloud Storage
  • Protocol: DASH primarily
  • Adaptive Algorithm: Advanced client-side algorithm with server hints

Trade-offs Summary

CDN: Own vs Commercial:

  • Own: Full control, high investment
  • Commercial: Managed, cost at scale

Transcoding: Centralized vs Distributed:

  • Centralized: Simpler, harder to scale
  • Distributed: Scalable, more complex

Segment Size: Small vs Large:

  • Small: Faster adaptation, more overhead
  • Large: Less overhead, slower adaptation

Adaptive Algorithm: Client vs Server:

  • Client: Simple, works offline
  • Server: Better decisions, more complex

Conclusion

Designing a video streaming system requires expertise across multiple domains: video processing, CDN architecture, adaptive streaming, and recommendation systems. The key is optimizing for user experience while managing costs and complexity.

Key decisions:

  1. Chunked upload for large files
  2. Distributed transcoding with auto-scaling
  3. Commercial CDN for edge delivery
  4. HLS and DASH protocols
  5. Client-side adaptive algorithm with server hints
  6. Hybrid recommendation system
  7. Multi-level caching strategy

By understanding these components and making informed trade-offs, we can build streaming systems that scale to millions of users while delivering high-quality video experiences.