
Design Youtube/Netflix

Topics:

  1. How to design the schema for video segments so that the data partitions well?

  2. How are the CDN and the Redis cache used?

  3. How to achieve low latency?

  4. How to de-duplicate videos?

  5. How to implement search?

  6. Mention Vitess as a database abstraction layer.

  7. How does adaptive streaming work?

  8. How to do video recommendation?

Functional Requirement

  1. Users can view videos.

  2. Users can see a list of recommended videos on the homepage.

  3. Users can search for videos by keyword.

  4. Users can upload videos.

Optional:

  1. Uploaded videos can be reviewed and censored.

  2. Users can like and dislike videos.

  3. Users can add comments to videos.

Non-functional Requirement:

  1. Low latency

  2. Highly available

  3. Scalable

  4. Fault tolerant

  5. Availability > consistency

  6. Read-heavy (reads > writes)

High Level Design

  1. The client sends a GET request to the API Gateway to fetch videos.

  2. The request is routed to the Video service.

  3. The Video service determines the user id for the client and fetches recommended videos from the DB.

  4. The Video service sends back a response with a list of videos.

  5. The client displays the videos on the page.

API

/v1/recommendation GET

/v1/video GET (stream video)
Request
json {
  screen_resolution,
  user_bitrate: determines the quality of the video chunks
}
Response
json {
  "clips": {
    "clip0": {

    },
    "clip1": {

    },
    ...
  }
}

/v1/video POST (upload video)
Request
json {
  user_id,
  video_file: video file to upload,
  category_id: "Entertainment", "Engineering" or "Science",
  title,
  description,
  tags,
  default_language,
  privacy_settings: PUBLIC, PRIVATE etc.
}

Response
201 Created

/v1/video PUT

/v1/video DELETE

/v1/video/feedback POST (thumbs up or thumbs down)

Data Schema

User Table
id
email
username
pw
dob

Video Table
id
title
description
upload_date
channel_id
likes_count
dislikes_count
views_count
video_URI
privacy_level
default_lang
first_clip_id
clips_mapping: {0: clipid0, 1: clipid1...}

Channel Table
id
channel_name
user_id
subscribers
description
category_id

Recommendation Table

Clip Table
clip_id (primary key)
clip_offset (sort key)
next_clip_id
video_id
author_id
clip_path: can be an S3 path under which the URLs for the different encodings and resolutions are stored.
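
To illustrate how the Clip table might be used during playback, here is a minimal sketch that resolves a playback position to the clip that contains it. It assumes that clip_offset is the clip's start time in seconds and that the clips of one video are returned sorted by clip_offset (the sort key). The function name and the example path are hypothetical, not part of the schema above.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Clip:
    clip_id: str
    video_id: str
    clip_offset: float  # assumption: start time within the video, in seconds
    clip_path: str      # hypothetical, e.g. "s3://example-bucket/v1/clip0/"


def clip_for_position(clips_sorted: List[Clip], position_seconds: float) -> Clip:
    """Return the clip whose time range contains position_seconds.

    clips_sorted must be ordered by clip_offset, which is what a query on
    the sort key would return for a single video.
    """
    offsets = [c.clip_offset for c in clips_sorted]
    # Index of the last clip that starts at or before the requested position.
    i = bisect.bisect_right(offsets, position_seconds) - 1
    if i < 0:
        raise ValueError("position is before the first clip")
    return clips_sorted[i]
```

A seek then costs one binary search over the clip offsets instead of a walk along the next_clip_id chain.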

Clip Encodings (Store it in CDN storage)
clip_id
encoded_content (binary format)

Comments Table
id
video_id
user_id
posted_date
comment_text
likes_count
dislikes_count

Category Table

Scale

2B DAU. How many new videos are uploaded per day?

QPS: 20B requests/day (10 per DAU) / ~10^5 seconds/day = 2*10^5 = 200k QPS
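
The QPS estimate can be checked with a few lines of arithmetic. This sketch compares the back-of-envelope rounding (10^5 seconds per day) with the exact 86,400 seconds. The function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400; the estimate rounds this to 10^5


def average_qps(requests_per_day: float,
                seconds_per_day: float = SECONDS_PER_DAY) -> float:
    """Average queries per second for a given daily request volume."""
    return requests_per_day / seconds_per_day


daily_requests = 20e9  # 20B requests per day, from the estimate above
print(f"rounded: {average_qps(daily_requests, 1e5):,.0f} QPS")
print(f"exact:   {average_qps(daily_requests):,.0f} QPS")
```

The exact figure is a little higher than 200k, and peak traffic will be higher than the daily average, so 200k is best read as an order-of-magnitude number.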

Storage:

Avg video length: 5 mins

Size before compression: 600MB

Size after compression: 30MB

500 hrs of video are uploaded every minute

6MB to store a minute of video (30MB / 5 mins)

total = 6MB * 500 * 60 = 180,000MB per minute = 180GB per minute; 180GB * 24 * 60 = 259TB per day, or about 94.6PB per year.

Bandwidth

Upload bandwidth for uncompressed video at 120MB/min (600MB / 5 mins): 500 hr/min * 60 min/hr * 120MB/min * 8 bits/byte / 60 s/min = 480Gbps
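
The storage and bandwidth estimates above can be reproduced as follows. This is a sketch using decimal units (1GB = 1000MB) and only the inputs stated in the notes. The function names are illustrative.

```python
UPLOAD_HOURS_PER_MIN = 500        # hours of video uploaded each minute
COMPRESSED_MB_PER_MIN = 30 / 5    # 30MB per 5-minute video -> 6MB/min
RAW_MB_PER_MIN = 600 / 5          # 600MB per 5-minute video -> 120MB/min

# Minutes of video arriving per wall-clock minute.
VIDEO_MIN_PER_MIN = UPLOAD_HOURS_PER_MIN * 60


def storage_gb_per_minute() -> float:
    return VIDEO_MIN_PER_MIN * COMPRESSED_MB_PER_MIN / 1000


def storage_tb_per_day() -> float:
    return storage_gb_per_minute() * 60 * 24 / 1000


def storage_pb_per_year() -> float:
    return storage_tb_per_day() * 365 / 1000


def ingest_gbps() -> float:
    """Upload bandwidth for the uncompressed video, in gigabits per second."""
    mb_per_second = VIDEO_MIN_PER_MIN * RAW_MB_PER_MIN / 60
    return mb_per_second * 8 / 1000  # megabytes -> megabits -> gigabits


print(f"{storage_gb_per_minute():.0f} GB/min, "
      f"{storage_tb_per_day():.1f} TB/day, "
      f"{storage_pb_per_year():.1f} PB/year, "
      f"{ingest_gbps():.0f} Gbps ingest")
```

Note that the yearly storage figure comes out in petabytes, not terabytes.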

  1. Use DynamoDB to serve the large volume of read requests; a well-designed partition key and sort key make the data easy to partition.

DynamoDB limits: 1,000 WCU/s and 3,000 RCU/s per partition.

  2. Use a Redis cache to store frequently read data, such as viral videos posted by big influencers.
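
One common way to put a cache in front of the database is the cache-aside pattern. The sketch below is an assumption about how that could look here, not a prescribed design: it uses an in-memory dictionary with a TTL as a stand-in for Redis and a plain callable as a stand-in for the DynamoDB read, so all class and method names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis: key -> (value, expiry timestamp)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[dict, float]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._data[key] = (value, self._clock() + self._ttl)


class VideoMetadataReader:
    """Cache-aside read path: try the cache, fall back to the database."""

    def __init__(self, cache: TTLCache,
                 load_from_db: Callable[[str], Optional[dict]]) -> None:
        self._cache = cache
        self._load_from_db = load_from_db  # stand-in for a DynamoDB read
        self.db_reads = 0

    def get(self, video_id: str) -> Optional[dict]:
        key = f"video:{video_id}"
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        self.db_reads += 1
        row = self._load_from_db(video_id)
        if row is not None:
            self._cache.set(key, row)  # populate the cache on a miss
        return row
```

With this shape, repeated reads of a viral video hit the cache until the TTL expires, so only the first read in each TTL window reaches the database partition that holds it.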

How to deduplicate video?

Assume 50 out of every 500 hours of video uploaded to YouTube are duplicates. Since one minute of video requires 6MB of storage, the duplicated content uploaded each minute takes up:

(50*60) mins * 6MB/min = 18GB

If we avoid video duplication, we can save 18GB * 60 * 24 * 365 = about 9.5 petabytes of storage per year.

There is also a copyright issue: no content creator wants their content plagiarized.
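
The savings figure follows from scaling the per-minute duplicate volume to a full year. A short sketch of that arithmetic, using decimal units and illustrative function names:

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def duplicate_gb_per_minute(duplicate_hours_per_min: float = 50,
                            mb_per_video_min: float = 6) -> float:
    """Storage taken by duplicates uploaded in one wall-clock minute."""
    return duplicate_hours_per_min * 60 * mb_per_video_min / 1000


def duplicate_pb_per_year() -> float:
    """The same volume accumulated over a year, in petabytes."""
    return duplicate_gb_per_minute() * MINUTES_PER_YEAR / 1_000_000


print(f"{duplicate_gb_per_minute():.0f} GB per minute, "
      f"{duplicate_pb_per_year():.2f} PB per year")
```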

Options:

  1. Locality-sensitive hashing.

  2. Block matching algorithms, phase correlation

  3. AI

Ateliere's proprietary FrameDNA™ AI/ML technology revolutionizes video management by fingerprinting each frame upon ingest. This allows for an accurate comparison of video files. This advanced technology not only helps in identifying duplicate content but also assists in detecting any alterations or tampering within the video files. Additionally, the system's efficient storage management capabilities ensure that only the most relevant and original content is preserved, optimizing storage resources and reducing unnecessary duplication.
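
As an illustration of the locality-sensitive hashing option, the sketch below fingerprints a single 8x8 grayscale thumbnail with an average hash and indexes the 64-bit result in four 16-bit bands. It is a toy under stated assumptions: a real pipeline would fingerprint many frames per video, and the class, function names and thresholds here are hypothetical and unrelated to any vendor product.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def average_hash(pixels: List[int]) -> int:
    """64-bit fingerprint of an 8x8 grayscale thumbnail.

    Bit i is 1 when pixel i is brighter than the mean, so uniform
    brightness changes leave the hash unchanged.
    """
    if len(pixels) != 64:
        raise ValueError("expected 64 grayscale values")
    mean = sum(pixels) / 64
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class LSHIndex:
    """Bands a 64-bit hash; hashes sharing any band become candidates."""

    def __init__(self, bands: int = 4, bits: int = 64) -> None:
        self._bands = bands
        self._band_bits = bits // bands
        self._buckets: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        self._hashes: Dict[str, int] = {}

    def _band_keys(self, h: int) -> List[Tuple[int, int]]:
        mask = (1 << self._band_bits) - 1
        return [(i, (h >> (i * self._band_bits)) & mask)
                for i in range(self._bands)]

    def add(self, video_id: str, h: int) -> None:
        self._hashes[video_id] = h
        for key in self._band_keys(h):
            self._buckets[key].add(video_id)

    def near_duplicates(self, h: int, max_distance: int = 3) -> List[str]:
        # With 4 bands, hashes that differ in at most 3 bits must agree on
        # at least one whole band, so no true match is missed at this cutoff.
        candidates: Set[str] = set()
        for key in self._band_keys(h):
            candidates |= self._buckets.get(key, set())
        return sorted(v for v in candidates
                      if hamming(self._hashes[v], h) <= max_distance)
```

The banding step keeps lookups cheap: a new upload is compared only against videos that share a bucket, rather than against the whole catalog.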

Adaptive Streaming

While the content is being served, the user's bandwidth is monitored. Since the video is divided into chunks that are each available in several qualities, the quality of each chunk can be chosen according to the current network conditions.

The adaptive bitrate algorithm can be based on four parameters:

  1. End-to-end available bandwidth (from a CDN/servers to a specific client)

  2. The device capabilities of the user.

  3. Encoding techniques used.

  4. The buffer space at the client.
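
The four parameters above can be combined in many ways. The following is one minimal sketch of a selection rule, not a description of any production player: the bitrate ladder, the 0.8 safety factor and the 5-second low-buffer threshold are all assumed values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rendition:
    height: int        # vertical resolution, e.g. 720
    bitrate_kbps: int  # encoded bitrate of this rendition


# Hypothetical encoding ladder (parameter 3: encoding techniques used).
LADDER = [
    Rendition(240, 400),
    Rendition(480, 1000),
    Rendition(720, 2500),
    Rendition(1080, 5000),
]


def choose_rendition(ladder: List[Rendition],
                     bandwidth_kbps: float,
                     device_max_height: int,
                     buffer_seconds: float,
                     low_buffer_seconds: float = 5.0,
                     safety: float = 0.8) -> Rendition:
    """Pick the rendition for the next chunk."""
    # Parameter 2: drop renditions the device cannot display.
    playable = sorted((r for r in ladder if r.height <= device_max_height),
                      key=lambda r: r.bitrate_kbps)
    if not playable:
        raise ValueError("no rendition fits this device")
    # Parameters 1 and 4: spend only part of the measured bandwidth, and
    # be twice as cautious when the client buffer is nearly empty.
    share = safety / 2 if buffer_seconds < low_buffer_seconds else safety
    budget = bandwidth_kbps * share
    best = playable[0]  # the lowest rendition is the fallback
    for r in playable:
        if r.bitrate_kbps <= budget:
            best = r
    return best


print(choose_rendition(LADDER, 4000, 1080, 20.0))
```

Running the rule before every chunk request lets quality step down quickly when the buffer drains and climb back once bandwidth recovers.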

Recommendation

YouTube recommends videos to users based on their profile, taking into account factors such as their interests, view and search history, subscribed channels, topics related to content they have already viewed, and activity on content such as comments and likes.

YouTube filters videos in two phases:

  1. Candidate generation: millions of YouTube videos are filtered down to hundreds based on the user's history and current context.

  2. Ranking: the ranking phase scores videos based on their features and on the user's interests and history. Hundreds of videos are ranked and filtered down to a few dozen during this phase.
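
The two phases can be sketched as a cheap filter followed by a more detailed score. The example below is a deliberately simplified stand-in: topic overlap plays the role of candidate generation and a hand-written score plays the role of the ranking model. The Video fields and both scoring rules are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Video:
    video_id: str
    topics: FrozenSet[str]
    avg_watch_fraction: float  # assumed feature: share of the video watched


def generate_candidates(catalog: List[Video], user_topics: Set[str],
                        watched: Set[str], limit: int = 100) -> List[Video]:
    """Phase 1: cheaply narrow the catalog to unwatched, on-topic videos."""
    matches = [v for v in catalog
               if v.video_id not in watched and v.topics & user_topics]
    matches.sort(key=lambda v: (-len(v.topics & user_topics), v.video_id))
    return matches[:limit]


def rank(candidates: List[Video], user_topics: Set[str],
         top_k: int = 10) -> List[Video]:
    """Phase 2: score each candidate in more detail and keep the best."""
    def score(v: Video) -> float:
        topic_match = len(v.topics & user_topics) / len(v.topics)
        return topic_match * v.avg_watch_fraction

    return sorted(candidates, key=lambda v: (-score(v), v.video_id))[:top_k]
```

Splitting the work this way means the expensive scoring only runs on the few hundred candidates that survive the first phase.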

Collaborative Filtering

A technique used in recommendation systems that predicts a user's interests from the preferences of many users.

User-based collaborative filtering: this approach recommends items by finding similar users. For example, if user X likes items A, B and C, and user Y likes items A, B and D, the system infers that X might also like item D because Y likes it.
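
The X and Y example can be written as a small user-based collaborative filter. This sketch measures user similarity with the Jaccard index over liked items, which is one simple choice among several, and weights each unseen item by the similarity of the users who liked it. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two like-sets: shared items over all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(likes: Dict[str, Set[str]], user: str,
              top_k: int = 5) -> List[str]:
    """Recommend items liked by similar users that `user` has not liked."""
    mine = likes[user]
    scores: Dict[str, float] = defaultdict(float)
    for other, theirs in likes.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        if similarity == 0:
            continue  # users with nothing in common contribute nothing
        for item in theirs - mine:
            scores[item] += similarity
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_k]


likes = {
    "X": {"A", "B", "C"},
    "Y": {"A", "B", "D"},
    "Z": {"E"},
}
print(recommend(likes, "X"))  # ['D']
```

User Z shares no items with X, so E is never suggested, while D is recommended because Y's tastes overlap with X's.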

2005-2011: Optimizing for clicks & views

2012: Optimizing for watch time

2015-2016: Optimizing for satisfaction: shares, likes and dislikes, and the "not interested" button.


Last updated 1 year ago

https://blog.hootsuite.com/how-the-youtube-algorithm-works/
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf