
Design Twitter

Topics

  1. SQL vs NoSQL

  2. How to do index on db schema?

  3. How to do pagination on news feed?

  4. As the number of users, requests, and data size grow, how do we scale the system?

  5. Celebrities: celebrity accounts have very different data patterns from regular accounts. Do we need a separate message queue to fan out their tweets?

  6. How to deal with thundering herd problem?

  7. How to validate and limit malicious user behavior (e.g. sending 100 tweets in one minute)?

  8. How to block sensitive words in a tweet?

  9. How to design hashtags?

Functional Requirements

  1. Post tweets: a registered user can post one or more tweets.

  2. View user or home timeline.

  3. Delete tweets: a user can delete one or more tweets.

  4. Follow or unfollow.

  5. Like/dislike

  6. Reply to tweet.

  7. Retweet.

  8. Search tweet.

  9. Hashtags.

  10. Do we support media types like video or images?

Non-functional Requirements

  1. Highly available

  2. Low latency

  3. Read-heavy: read/write ratio of 10000:1

  4. Eventual consistency: availability takes priority over consistency.

Scale

QPS

100M active users -> 500M tweets per day (about 5.8K writes per second).

Each tweet averages a fanout of 10 deliveries -> 5B total tweets delivered on fanout each day.

10B read requests per day -> about 1.2 * 10^5 QPS (10B / 86,400 seconds).

10B searches per month.

Data

Tweet id: 8 bytes
user id: 32 bytes
text: 140 bytes
total: 1KB

1KB * 500M tweets per day = 500GB per day
500GB * 30 days = 15TB per month
15TB * 36 months = 540TB, about 0.5PB in three years.
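
The estimates above can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: 1KB is treated as 1,000 bytes, a month as 30 days, and traffic as evenly spread over the day (peak QPS would be higher). The variable names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Inputs taken from the Scale section.
tweets_per_day = 500_000_000
fanout_per_tweet = 10
reads_per_day = 10_000_000_000
searches_per_month = 10_000_000_000
tweet_size_bytes = 1_000  # assumption: 1KB == 1,000 bytes

# Average request rates (uniform traffic assumed).
write_qps = tweets_per_day / SECONDS_PER_DAY
fanout_qps = tweets_per_day * fanout_per_tweet / SECONDS_PER_DAY
read_qps = reads_per_day / SECONDS_PER_DAY
search_qps = searches_per_month / (30 * SECONDS_PER_DAY)

# Storage growth for tweet records only.
storage_per_day_bytes = tweets_per_day * tweet_size_bytes
storage_per_month_bytes = storage_per_day_bytes * 30
storage_three_years_bytes = storage_per_month_bytes * 36

print(f"write QPS:  {write_qps:,.0f}")
print(f"fanout QPS: {fanout_qps:,.0f}")
print(f"read QPS:   {read_qps:,.0f}")
print(f"search QPS: {search_qps:,.0f}")
print(f"storage per day:   {storage_per_day_bytes / 1e9:,.0f} GB")
print(f"storage per month: {storage_per_month_bytes / 1e12:,.0f} TB")
print(f"storage, 3 years:  {storage_three_years_bytes / 1e15:.2f} PB")
```

The read rate comes out at roughly 116K requests per second, and three years of tweets at roughly 0.54PB.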

API

Post tweets
POST /v1/tweet
Request:
{
  user_id,
  content,
  access_type, // private, public, etc.
  tweet_type?, // video, image?
  media_url?,
}

View home timeline
GET /v1/home
Request: 
{
  user_id,
  page_number, // pagination
  page_size,
  user_location,
}
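
As a rough illustration of how `page_number` and `page_size` could be applied on the server, here is a minimal offset-pagination sketch. The function name is hypothetical, `page_number` is assumed to be 1-based, and the timeline is assumed to be a list of tweet ids already sorted newest first.

```python
from typing import List


def get_home_timeline_page(
    timeline: List[int], page_number: int, page_size: int
) -> List[int]:
    """Return one page of tweet ids from a newest-first timeline.

    page_number is 1-based here (an assumption, not part of the API above).
    """
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be >= 1")
    start = (page_number - 1) * page_size
    # Slicing past the end of the list simply returns fewer (or no) items.
    return timeline[start:start + page_size]


# Example: ten tweet ids, newest first.
timeline = list(range(100, 90, -1))
print(get_home_timeline_page(timeline, page_number=1, page_size=3))
print(get_home_timeline_page(timeline, page_number=4, page_size=3))
```

One known weakness of offset pagination is that items can shift between pages when new tweets arrive while the user is scrolling; a cursor based on the last seen tweet id is a common alternative.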

Follow or unfollow
POST /v1/follow (unfollow)
Request:
{
  user_id,
  followed_user_id,
}

GET /v1/search

Data Schema

DynamoDB

User Table
{
  user_id, (partition_key)
  profile_url,
  bio,
  userhandle,
  encrypted_pw,
  email,
}

Tweets Table
{
  tweet_id, (partition_key)
  user_id,
  content,
  created_at, (sort_key)
  is_deleted,
}

Follow Table
{
  following_id, (partition_key)
  follower_id,
  followee_id,
  updated_at, (sort_key)
}

HashTag Table
{
  hashtag_id,
  name,
  reference_counter,
}

HashTagRef Table
{

}

High Level Diagram

  1. The client sends a request, either to post a tweet or to view the home timeline.

  2. The web server reads from and writes to the DB.

  3. The response is returned to the client.

Single points of failure:

  1. Web servers

  2. DB

Posting a tweet:

  1. The user sends a post-tweet request to the load balancer, which routes it to one of the web servers.

  2. The web server writes the record to the DB.

Viewing a tweet:

  1. The user sends a read request for the home timeline.

  2. The web server reads the tweets from the DB.

  3. The same tweets might be read multiple times by different followers.

This puts a heavy read load on the DB, so we can instead fan out at write time when a tweet is posted, using two structures in the cache:

userId: listOfFollowers
timeline cache, userId: listOfTweets

  1. For each user, store the list of followers in the cache.

  2. Push each new tweet into the timeline inbox of every follower in the cache.

Posting a tweet:

  1. The user sends a post-tweet request to the load balancer, which routes it to one of the web servers.

  2. The web server writes the record to the DB.

  3. The tweet is fanned out and stored in each follower's tweet list in the cache.

Viewing a tweet:

  1. The user sends a read request for the home timeline.

  2. The web server reads from Redis first to see whether the timeline cache has any tweets.

  3. If there are not enough, it reads the remaining tweets from the DB.

  4. The same tweets might be read multiple times by different followers.

Pros:

  1. Fanout on write reduces the load on reads.

Cons:

  1. If the tweet poster is a celebrity, the fanout overhead is huge.

  2. Redis itself needs to scale.
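
The fanout-on-write flow can be sketched with in-memory dictionaries standing in for the DB and for Redis. Everything here is illustrative: the class name, method names, and the `TIMELINE_LIMIT` value are assumptions, and the fallback read from the DB when the cache is short is left out to keep the example small.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set

TIMELINE_LIMIT = 800  # hypothetical cap on cached tweets per home timeline


class TimelineService:
    def __init__(self) -> None:
        # Stand-in for the Tweets table: tweet_id -> record.
        self.tweets_db: Dict[int, dict] = {}
        # Cache structure "userId: listOfFollowers".
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        # Cache structure "timeline cache, userId: listOfTweets" (newest first).
        self.timeline_cache: Dict[str, Deque[int]] = defaultdict(
            lambda: deque(maxlen=TIMELINE_LIMIT)
        )
        self._next_id = 1

    def follow(self, follower_id: str, followee_id: str) -> None:
        self.followers[followee_id].add(follower_id)

    def post_tweet(self, user_id: str, content: str) -> int:
        # Step 2: write the record to the DB.
        tweet_id = self._next_id
        self._next_id += 1
        self.tweets_db[tweet_id] = {"user_id": user_id, "content": content}
        # Step 3: fan out the tweet id to every follower's timeline inbox.
        # With maxlen set, appendleft drops the oldest id once the cap is hit.
        for follower_id in self.followers[user_id]:
            self.timeline_cache[follower_id].appendleft(tweet_id)
        return tweet_id

    def home_timeline(self, user_id: str, limit: int = 20) -> List[dict]:
        # Read path: serve tweet ids from the cache, then hydrate the records.
        tweet_ids = list(self.timeline_cache[user_id])[:limit]
        return [self.tweets_db[tweet_id] for tweet_id in tweet_ids]


svc = TimelineService()
svc.follow("bob", "alice")
svc.post_tweet("alice", "hello")
svc.post_tweet("alice", "world")
print([t["content"] for t in svc.home_timeline("bob")])
```

The write cost is proportional to the number of followers, which is exactly why the celebrity case needs separate handling.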

Celebrity case:

Instead of fanning out celebrity tweets, we cache them in Redis as well, so they can be pulled when a follower reads the timeline:

Celebrity tweets, userId: listOfTweets
userId: listOfCelebritiesIdTheUserIsFollowing
userId: isCelebrity
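
One way to read the three cache structures above is as a hybrid: regular tweets are pushed at write time, and celebrity tweets are pulled and merged at read time. The sketch below assumes that interpretation. The function name and the `(created_at, content)` tuple layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Tweet = Tuple[int, str]  # (created_at, content); a simplified stand-in


def home_timeline(
    user_id: str,
    timeline_cache: Dict[str, List[Tweet]],
    celebrity_tweets: Dict[str, List[Tweet]],
    followed_celebrities: Dict[str, Set[str]],
    limit: int = 20,
) -> List[Tweet]:
    """Merge pushed (fanned-out) tweets with pulled celebrity tweets."""
    # Tweets that were fanned out to this user at write time.
    merged = list(timeline_cache.get(user_id, []))
    # Tweets from followed celebrities are pulled at read time instead.
    for celebrity_id in followed_celebrities.get(user_id, set()):
        merged.extend(celebrity_tweets.get(celebrity_id, []))
    # Newest first, then cut to the requested size.
    merged.sort(key=lambda tweet: tweet[0], reverse=True)
    return merged[:limit]


timeline_cache = {"bob": [(3, "from alice"), (1, "from dave")]}
celebrity_tweets = {"celeb": [(4, "celeb new"), (2, "celeb old")]}
followed_celebrities = {"bob": {"celeb"}}
print(home_timeline("bob", timeline_cache, celebrity_tweets, followed_celebrities))
```

This moves the cost for celebrity tweets from write time (one write per follower) to read time (one extra lookup per followed celebrity).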

Deep Dive

How to deal with thundering herd problem?

Cascading failure: when some servers crash, the remaining servers have to take on the additional load. If they cannot handle it either, they crash one by one.

Rate limiting: Token Bucket / Leaky Bucket / Sliding Window

For each request, we can track the user in an in-memory map or in Redis, with the user as the key and a token count as the value. Each request from the same user decreases the count by one, and when it reaches zero we return a temporary error to the user. With a token bucket, the count is refilled at a fixed rate.

This might be space-intensive if there are too many users.
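
A minimal per-user token bucket could look like the sketch below. It keeps state in an in-memory dictionary; in a multi-server deployment the same state would have to live in a shared store such as Redis. The class name, parameters, and the injectable `clock` are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        # user_id -> (tokens left, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        # A user seen for the first time starts with a full bucket.
        tokens, last = self.buckets.get(user_id, (float(self.capacity), now))
        # Refill in proportion to the elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        if tokens < 1:
            self.buckets[user_id] = (tokens, now)
            return False  # caller should return a temporary error
        self.buckets[user_id] = (tokens - 1, now)
        return True


# Example: at most 100 tweets per minute per user (hypothetical limit).
limiter = TokenBucketLimiter(capacity=100, refill_per_second=100 / 60)
print(limiter.allow("user-1"))
```

Memory grows with the number of distinct users seen, which matches the space concern above; expiring idle entries is one way to bound it.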

Context: when a celebrity tweet trends, it is difficult to determine the load beforehand, and autoscaling might not work either.

Redis

userid: listOfFollowers
timeline cache, userId: listOfTweets
-------------------------------------
Celebrity tweets, userId: listOfTweets
userId: listOfCelebritiesIdTheUserIsFollowing
userId: isCelebrity
  • Keep only several hundred tweets for each home timeline in the Memory Cache

  • Keep only active users' home timeline info in the Memory Cache

    • If a user has not been active in the past 30 days, we can rebuild the timeline from the database:

      • Query the User Graph Service to determine who the user is following

      • Get the tweets from the database and add them to the Memory Cache
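
The rebuild path for an inactive user can be sketched as follows. The two dictionaries are stand-ins for the User Graph Service and the tweet store, and the function name, the `(created_at, tweet_id)` tuple layout, and the default cap of 300 are all assumptions.

```python
import heapq
from typing import Dict, List, Set, Tuple

Tweet = Tuple[int, int]  # (created_at, tweet_id); a simplified stand-in


def rebuild_home_timeline(
    user_id: str,
    following: Dict[str, Set[str]],
    tweets_by_user: Dict[str, List[Tweet]],
    max_tweets: int = 300,
) -> List[int]:
    """Return the newest tweet ids from everyone the user follows."""
    candidates: List[Tweet] = []
    # Step 1: ask the graph who this user is following.
    for followee_id in following.get(user_id, set()):
        # Step 2: collect that user's tweets from the tweet store.
        candidates.extend(tweets_by_user.get(followee_id, []))
    # Keep only the newest max_tweets entries, newest first.
    newest = heapq.nlargest(max_tweets, candidates)
    return [tweet_id for _, tweet_id in newest]


following = {"bob": {"alice", "dave"}}
tweets_by_user = {
    "alice": [(1, 101), (5, 105)],
    "dave": [(3, 103)],
    "eve": [(9, 109)],
}
print(rebuild_home_timeline("bob", following, tweets_by_user, max_tweets=2))
```

The returned list is what would be written back into the timeline cache, so later reads are served from memory again.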

How to update cache?

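  1. Cache-aside

The application is responsible for reading from and writing to storage; the cache does not interact with the DB directly. On a cache miss, the application reads from the DB and adds the entry to the cache.

  2. Write-through

The application uses the cache as the main data store, reading and writing data to it without talking to the DB itself; the cache synchronously writes each update through to the DB.

Pros: reads are fast.

Cons:

  • Write-through is slow overall, because every write has to reach both the cache and the DB before it completes.

  • Most data written might never be read, which can be minimized with a TTL.

  3. Write-behind

  4. Refresh-ahead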
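
The difference between the first two strategies can be shown with a toy database and a dictionary as the cache. This is a sketch only: `Database`, `cache_aside_read`, and `WriteThroughCache` are hypothetical names, and TTLs and eviction are left out.

```python
from typing import Dict, Optional


class Database:
    """Toy key-value store that counts reads, standing in for the real DB."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.reads = 0

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.rows.get(key)

    def put(self, key: str, value: str) -> None:
        self.rows[key] = value


def cache_aside_read(key: str, cache: Dict[str, str], db: Database) -> Optional[str]:
    """Cache-aside: the application checks the cache, then falls back to the DB."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value  # populate the cache for the next reader
    return value


class WriteThroughCache:
    """Write-through: the application only talks to the cache layer."""

    def __init__(self, db: Database) -> None:
        self.db = db
        self.entries: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        # The write completes only after both the DB and the cache are updated.
        self.db.put(key, value)
        self.entries[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.entries.get(key)


db = Database()
db.put("tweet:1", "hello")
cache: Dict[str, str] = {}
cache_aside_read("tweet:1", cache, db)  # miss: goes to the DB
cache_aside_read("tweet:1", cache, db)  # hit: served from the cache
print(db.reads)
```

In the cache-aside case the second read never touches the DB; in the write-through case a value is readable from the cache immediately after it is written, at the cost of a slower write.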

How to do censorship for tweets?

  1. Train a model, or use an existing model, to flag tweets?

  2. Call the GPT-3.5 API?


Last updated 1 year ago
