Software Engineer Interview Handbook

Message Queue

Why use a message queue?

  • Decoupling. Message queues eliminate the tight coupling between components so they can be updated independently.

  • Improved scalability. We can scale producers and consumers independently based on traffic load. For example, during peak hours, more consumers can be added to handle the increased traffic.

  • Increased availability. If one part of the system goes offline, the other components can continue to interact with the queue.

  • Better performance. Message queues make asynchronous communication easy. Producers can add messages to a queue without waiting for a response, and consumers process messages whenever they are ready.
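A minimal in-process sketch of the decoupling and asynchrony described above, assuming Python's standard `queue.Queue` as a stand-in for a real broker. The producer enqueues and moves on without waiting; a separate consumer thread drains the queue at its own pace. The function name `run_pipeline` and the uppercase "work" are illustrative only.

```python
import queue
import threading

def run_pipeline(messages):
    """Producer enqueues without waiting; a consumer thread drains the queue independently."""
    q = queue.Queue()
    processed = []

    def consumer():
        while True:
            msg = q.get()             # blocks until a message is available
            if msg is None:           # sentinel: the producer is done
                q.task_done()
                break
            processed.append(msg.upper())  # stand-in for real work
            q.task_done()

    worker = threading.Thread(target=consumer)
    worker.start()

    for m in messages:
        q.put(m)                      # returns immediately; no response is awaited
    q.put(None)

    q.join()                          # wait until every message has been handled
    worker.join()
    return processed

print(run_pipeline(["order-created", "order-paid"]))  # ['ORDER-CREATED', 'ORDER-PAID']
```

Because the producer only depends on the queue, either side could be replaced or scaled without touching the other.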

Why use Kafka vs. SQS or other message brokers?

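JMS (Java Message Service, a Java API) vs. AMQP (Advanced Message Queuing Protocol, a wire-level protocol): two common standards for talking to traditional message brokers.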

Traditional message brokers: ActiveMQ, SQS, RabbitMQ

Log-based message brokers: Kafka, Amazon Kinesis.

Log-based message brokers persist messages to disk by appending them to a log, similar to a write-ahead log.

Pros:

  • It supports fan-out messaging (multiple consumers subscribing to the same topic) naturally: several consumers can read the log independently, and reading a message does not delete it from the log.

  • Ordering is guaranteed for consumers reading from a single partition, since messages are read in offset order.
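A simplified model of the two pros above: one partition is an append-only list, and every consumer keeps its own offset, so reads never remove anything and each consumer sees messages in offset order. The class names `PartitionLog` and `Consumer` are hypothetical and not part of any Kafka client API.

```python
class PartitionLog:
    """Append-only log for one partition; reads never delete messages (simplified model)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1   # offset of the new message

    def read(self, offset, max_messages=10):
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Each consumer tracks its own offset, so many consumers can read the same log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


log = PartitionLog()
for event in ["e0", "e1", "e2"]:
    log.append(event)

analytics, billing = Consumer(log), Consumer(log)
print(analytics.poll())   # ['e0', 'e1', 'e2']
print(billing.poll(2))    # ['e0', 'e1'] (independent offset; nothing was deleted)
```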

Cons:

  • Within a consumer group, the number of nodes sharing the work of consuming a topic is at most the number of partitions in that topic. Each partition is consumed by a single node, which has to process all of that partition's messages, so you can't split messages from a single partition across multiple nodes.

As a workaround, you could manually have one node process only odd-numbered offsets and another only even-numbered offsets, or hand messages off to a thread pool, but both approaches are complicated and not recommended.

  • If a single message is slow to process, it holds up the processing of subsequent messages in that partition.
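A small worked example of this head-of-line blocking, assuming one consumer processes a partition strictly in order. Finish times are cumulative sums of per-message processing times (in milliseconds; the numbers are hypothetical), so one slow message pushes back every message behind it.

```python
from itertools import accumulate

def finish_times(processing_ms):
    """Finish time of each message when one consumer processes a partition sequentially."""
    return list(accumulate(processing_ms))

print(finish_times([10, 10, 10]))    # [10, 20, 30]
print(finish_times([5000, 10, 10]))  # [5000, 5010, 5020]: the fast messages wait behind the slow one
```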

When to use Kafka?

  • Messages are fast to process.

  • You need ordering guarantees.

When to use a traditional message broker (SQS, ActiveMQ)?

  • Messages are expensive to process and you want to parallelize processing on a message-by-message basis.

  • Order is not important.
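A sketch of message-by-message parallelism, assuming a thread pool stands in for a fleet of competing consumers on a traditional queue. Each message is handled independently by whichever worker is free, so completion order follows processing cost rather than arrival order. The message IDs and costs are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def handle(message_id, cost_seconds):
    time.sleep(cost_seconds)       # stand-in for expensive per-message work
    return message_id

messages = {"m1": 0.3, "m2": 0.1, "m3": 0.2}   # hypothetical per-message costs

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(handle, mid, cost) for mid, cost in messages.items()]
    # Collect results as they finish; this order is not the arrival order.
    completion_order = [f.result() for f in as_completed(futures)]

print(completion_order)
```

Total wall time is roughly the cost of the slowest message instead of the sum, which is the trade-off that makes this model attractive when order does not matter.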

Kafka

Topics

A topic is a particular stream of data. It consists of one or more partitions, each an ordered, immutable sequence of messages.

Partitions and Offset

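Topics are split into partitions. Each message within a partition gets an incremental ID called an offset. Offsets are only meaningful within their own partition, which is why ordering is guaranteed per partition rather than across a whole topic.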
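A minimal sketch of a topic split into partitions, where each partition hands out its own incremental offsets. Keyed messages are routed by hashing the key, so the same key always lands in the same partition. Assumption: `zlib.crc32` is used purely for a deterministic illustration; Kafka's default partitioner hashes keys with murmur2. The `Topic` class is hypothetical.

```python
import zlib

class Topic:
    """A topic split into partitions; each partition assigns its own incremental offsets."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: crc32 stands in for Kafka's real default partitioner (murmur2).
        return zlib.crc32(key.encode()) % len(self.partitions)

    def send(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        offset = len(self.partitions[p]) - 1
        return p, offset


orders = Topic("orders", num_partitions=3)
print(orders.send("user-42", "created"))   # (partition, offset)
print(orders.send("user-42", "paid"))      # same key -> same partition, next offset
```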

Brokers

A Kafka cluster consists of one or more brokers, and partitions are spread across them. After connecting to any broker (a bootstrap broker), you are connected to the entire cluster, because any broker can return metadata about all brokers and the partitions they host.
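A simplified illustration of why one broker is enough to discover the whole cluster: every broker can serve the same metadata map from (topic, partition) to the broker hosting it. Assumption: partitions are placed round-robin here for readability; this is not Kafka's actual replica assignment algorithm, and the `Cluster` class and broker IDs are hypothetical.

```python
class Cluster:
    """Simplified cluster: partitions are spread across brokers round-robin."""

    def __init__(self, broker_ids, topic_partitions):
        # topic_partitions: {"orders": 3, ...}; placement is illustrative only.
        self.brokers = set(broker_ids)
        self.metadata = {}
        i = 0
        for topic, count in topic_partitions.items():
            for p in range(count):
                self.metadata[(topic, p)] = broker_ids[i % len(broker_ids)]
                i += 1

    def bootstrap(self, broker_id):
        """Any broker can answer a metadata request for the whole cluster."""
        if broker_id not in self.brokers:
            raise ValueError(f"unknown broker {broker_id}")
        return dict(self.metadata)


cluster = Cluster([101, 102, 103], {"orders": 3, "payments": 2})
meta = cluster.bootstrap(102)     # connect to one broker...
print(meta[("payments", 1)])      # ...and learn where every partition lives: 102
```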

Lag

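A consumer is lagging when it can't keep up with the producers' messages. Lag is expressed as the number of offsets the consumer is behind the head (latest offset) of the partition.

recovery time = lag (messages) / (messages consumed per second - messages produced per second)

This only holds when the consume rate exceeds the produce rate; otherwise the lag never shrinks.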
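A worked example of the lag and recovery-time formula, assuming constant produce and consume rates. Lag is computed as the partition's latest offset minus the consumer's committed offset; all numbers and parameter names are hypothetical.

```python
def consumer_lag(log_end_offset, committed_offset):
    """Messages the consumer still has to read in one partition."""
    return log_end_offset - committed_offset

def recovery_time_seconds(lag, consume_per_sec, produce_per_sec):
    """Time to catch up, assuming constant rates; infinite if the consumer is not faster."""
    net_rate = consume_per_sec - produce_per_sec
    if net_rate <= 0:
        return float("inf")   # the lag never shrinks
    return lag / net_rate


lag = consumer_lag(log_end_offset=1_500_000, committed_offset=300_000)   # 1,200,000 messages
print(recovery_time_seconds(lag, consume_per_sec=5_000, produce_per_sec=3_000))  # 600.0
```

The consumer gains 2,000 messages per second on the producers, so 1,200,000 messages of lag take 600 seconds (10 minutes) to clear.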

Consumer Group

Consumers can be grouped together for a given topic to maximize read throughput. Each consumer in a group reads from a mutually exclusive set of partitions, so horizontal scaling of a consumer group is bounded by the number of partitions: consumers beyond that number sit idle.

We use multiple consumer groups when we need to perform different operations on the same topics, since each group reads every message independently. For example, a separate group might run real-time analysis on the same data.
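A simplified round-robin assignor that shows both points: within one group each partition goes to exactly one consumer, so extra consumers get nothing, while a second group receives its own full assignment of the same partitions. This is a sketch; Kafka ships several assignment strategies and this function is not any of them verbatim.

```python
def assign_round_robin(partitions, consumers):
    """Give each partition to exactly one consumer in the group (simplified round-robin)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2, 3]
print(assign_round_robin(partitions, ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}
print(assign_round_robin(partitions, ["c1", "c2", "c3", "c4", "c5", "c6"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3], 'c5': [], 'c6': []}  c5 and c6 sit idle

# A second consumer group gets its own full assignment of the same partitions.
print(assign_round_robin(partitions, ["analytics-1"]))
# {'analytics-1': [0, 1, 2, 3]}
```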

Vertical Scaling

A single-threaded consumer preserves processing order within each partition.

AWS Kinesis lets you configure the batch size of records sent to your Lambda function.

A multi-threaded consumer has two pitfalls:

  1. An offset might be committed before its record has been processed; if the consumer then crashes, that record is never processed.

  2. Message processing order can't be guaranteed, since messages from the same partition could be processed in parallel.
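One common way to avoid the first pitfall is to commit only up to the end of a contiguous run of finished offsets. A minimal sketch, assuming Kafka's convention that the committed offset is the next offset to read; the function `next_commit_offset` is hypothetical and not a Kafka client API.

```python
def next_commit_offset(committed, completed):
    """
    committed: next offset the group would resume from (everything below it is done).
    completed: offsets finished by worker threads, possibly out of order.
    Advance only over a contiguous run of completed offsets, so an
    unfinished record is never skipped by the commit.
    """
    offset = committed
    while offset in completed:
        offset += 1
    return offset


# Offsets 0-2 are already committed; threads finished 3, 4 and 6, but 5 is still running.
print(next_commit_offset(3, {3, 4, 6}))  # 5 -> offset 6 must wait until 5 finishes
```

This keeps at-least-once delivery (offset 6 may be reprocessed after a crash) but never silently drops offset 5. It does not address the second pitfall: order is still lost.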

Kafka Exactly Once Support

Use the Kafka Streams API and set “processing.guarantee” to “exactly_once” (the default value is “at_least_once”). In Kafka 3.0 and later, “exactly_once” is deprecated in favor of “exactly_once_v2”.

When processing.guarantee is configured for exactly-once, Kafka Streams gives its internal embedded producer client a transactional ID, which enables the idempotence and transactional messaging features, and sets its consumer client to read-committed mode so that it only fetches messages from committed transactions of the upstream producers.
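A toy model of the two mechanisms named above, for intuition only. Assumptions: the broker drops a retried write whose (producer ID, sequence number) it has already seen, and a read-committed reader skips records whose transaction has not committed. Real Kafka tracks sequences per producer per partition and enforces them more strictly; `IdempotentPartition` and its methods are hypothetical.

```python
class IdempotentPartition:
    """
    Toy partition: drops retried duplicates via a (producer_id, sequence) check and
    hides records of uncommitted transactions from read_committed readers.
    Not Kafka's actual implementation.
    """

    def __init__(self):
        self.records = []            # (producer_id, txn_id, value)
        self.last_seq = {}           # producer_id -> highest sequence number written
        self.committed_txns = set()

    def append(self, producer_id, seq, value, txn_id):
        if seq <= self.last_seq.get(producer_id, -1):
            return False             # duplicate from a retry: ignore it
        self.last_seq[producer_id] = seq
        self.records.append((producer_id, txn_id, value))
        return True

    def commit(self, txn_id):
        self.committed_txns.add(txn_id)

    def read(self, isolation="read_uncommitted"):
        return [v for _, txn, v in self.records
                if isolation == "read_uncommitted" or txn in self.committed_txns]


p = IdempotentPartition()
p.append("producer-1", 0, "debit", txn_id="t1")
p.append("producer-1", 0, "debit", txn_id="t1")   # network retry of the same send
p.append("producer-1", 1, "credit", txn_id="t1")
print(p.read("read_committed"))   # [] -> transaction t1 is not committed yet
p.commit("t1")
print(p.read("read_committed"))   # ['debit', 'credit'] -> each written exactly once
```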

Push vs Pull?

Push

Pros:

  1. Instant communication. Messages are pushed as soon as they arrive.

  2. Clients can be thinner, since they don't need a polling mechanism.

  3. Saves network bandwidth, since there are no polls that come back empty.

Cons:

  1. Messages may be missed if the network is unreliable.

  2. The client must accept inbound connections from the server, so its firewall has to be opened for them.

  3. The consumer might be overwhelmed if it can't keep up with the producer's speed.

Pull

Pros:

  1. The client makes only outbound connections, so no client-side firewall changes are needed.

  2. The consumer controls the consumption speed.

Cons:

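  1. There may be a delay before a message is actually processed (up to the polling interval).

  2. The consumer implementation is thicker because it needs a polling mechanism.

  3. Uses more network bandwidth, for example on polls that return no messages.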
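A side-by-side sketch of the "overwhelmed consumer" con of push and the "consumer controls the speed" pro of pull. Assumption: the push client has a fixed-size buffer that drops overflow; real push systems may apply backpressure instead. The functions `push` and `pull` are hypothetical.

```python
from collections import deque

def push(messages, buffer_capacity):
    """Push model: the server sends everything; a full client buffer drops the overflow."""
    buffer, dropped = [], []
    for m in messages:
        if len(buffer) < buffer_capacity:
            buffer.append(m)
        else:
            dropped.append(m)
    return buffer, dropped

def pull(broker_queue, max_records):
    """Pull model: the consumer asks for at most max_records when it is ready."""
    batch = []
    while broker_queue and len(batch) < max_records:
        batch.append(broker_queue.popleft())
    return batch


messages = [f"m{i}" for i in range(5)]

received, dropped = push(messages, buffer_capacity=3)
print(received, dropped)                  # ['m0', 'm1', 'm2'] ['m3', 'm4']

broker_queue = deque(messages)
print(pull(broker_queue, max_records=3))  # ['m0', 'm1', 'm2']
print(pull(broker_queue, max_records=3))  # ['m3', 'm4'] (nothing lost; read at the consumer's pace)
```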


Last updated 1 year ago
