Design Chat App (WhatsApp)
Topics:
Pros and cons of HTTP vs WebSocket
How to scale Redis Pub/Sub?
How to guarantee message delivery?
What DB to use and why?
Functional Requirements
Users can send and receive messages.
Users can join group chats.
Message delivery acknowledgement: sent, delivered, and read.
Push notifications
User status: whether users are online or offline.
Optional:
Do we support sending images/files?
Do we support recalling a message?
Do we support group chat?
How to add friends?
Non-functional Requirements
Low latency
Highly available
Consistency: messages should be delivered in the order they were sent. Users must see the same chat history on all devices.
High Level Design
E2E
User A and user B each open a connection to the chat server.
User A sends a message to the chat server.
The chat server acknowledges receipt to user A (sent).
The chat server sends the message to user B and stores it in the DB.
User B sends an acknowledgement to the chat server.
The chat server notifies user A that the message has been delivered.
When user B reads the message, the application notifies user A that B has read it.
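The sent/delivered/read acknowledgements can be modeled as statuses that only move forward. A minimal Python sketch, assuming acks can arrive late or twice (so a stale "delivered" must not overwrite "read"); the `DeliveryStatus` and `advance` names and the `PENDING` state are hypothetical:

```python
from enum import IntEnum


class DeliveryStatus(IntEnum):
    """Ack states from the E2E flow, ordered so a later state compares greater."""
    PENDING = 0    # hypothetical: created on the client, not yet acked by the server
    SENT = 1       # chat server acknowledged receipt to the sender
    DELIVERED = 2  # recipient's client acknowledged the message
    READ = 3       # recipient opened the message


def advance(current: DeliveryStatus, incoming: DeliveryStatus) -> DeliveryStatus:
    """Apply an ack; acks may arrive late or duplicated, so never move backwards."""
    return max(current, incoming)


status = DeliveryStatus.PENDING
for ack in [DeliveryStatus.SENT, DeliveryStatus.READ, DeliveryStatus.DELIVERED]:
    status = advance(status, ack)
print(status.name)  # READ
```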
API
HTTP
Remove a conversation
View conversation history
Get conversation Detail
Get Friends List
WebSocket
Create a conversation
Send message
Acknowledgement Handler on client
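One way to guarantee delivery from client to server is for the client's acknowledgement handler to track every unacknowledged message and resend it after a timeout. A minimal sketch, assuming a client-generated `client_msg_id` that the server uses to drop duplicates; `AckHandler`, the 5-second timeout, and the 3-attempt budget are hypothetical, and `send` stands in for writing a WebSocket frame:

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PendingMessage:
    client_msg_id: str
    payload: dict
    sent_at: float
    attempts: int = 1


@dataclass
class AckHandler:
    """Hypothetical client-side tracker: resend any message the server has not acked."""
    send: Callable[[dict], None]  # e.g. writes a frame to the WebSocket
    timeout_s: float = 5.0        # assumed resend timeout
    max_attempts: int = 3         # assumed retry budget before surfacing an error
    pending: Dict[str, PendingMessage] = field(default_factory=dict)

    def send_message(self, conversation_id: str, text: str, now: float) -> str:
        # A client-generated id lets the server de-duplicate retries.
        msg_id = str(uuid.uuid4())
        payload = {"client_msg_id": msg_id, "conversation_id": conversation_id, "text": text}
        self.pending[msg_id] = PendingMessage(msg_id, payload, now)
        self.send(payload)
        return msg_id

    def on_ack(self, client_msg_id: str) -> None:
        # Server confirmed receipt ("sent"); stop tracking the message.
        self.pending.pop(client_msg_id, None)

    def tick(self, now: float) -> List[str]:
        """Resend timed-out messages; return ids that exhausted their retries."""
        failed = []
        for msg in list(self.pending.values()):
            if now - msg.sent_at < self.timeout_s:
                continue
            if msg.attempts >= self.max_attempts:
                failed.append(msg.client_msg_id)
                del self.pending[msg.client_msg_id]
            else:
                msg.attempts += 1
                msg.sent_at = now
                self.send(msg.payload)
        return failed


frames = []
handler = AckHandler(send=frames.append)
a = handler.send_message("conv-1", "hi", now=0.0)
b = handler.send_message("conv-1", "there", now=0.0)
handler.on_ack(a)
handler.tick(now=6.0)  # only the unacked message is resent
print(len(frames))  # 3
```

Retries make client-to-server delivery at-least-once; de-duplicating on `client_msg_id` on the server makes the visible effect effectively exactly-once. Time is passed in explicitly so the logic can be tested without real timers.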
Data Schema
Message Table
User Table
Conversation Table
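The notes list the three tables without columns. A minimal sketch of one possible shape, with every field name hypothetical: messages are keyed by `(conversation_id, message_id)` so that loading one conversation's history is a single ordered range read.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ConversationKind(Enum):
    DIRECT = "direct"
    GROUP = "group"


@dataclass
class User:
    user_id: int
    name: str
    last_seen_at: int  # epoch ms; hypothetical column backing online/offline status


@dataclass
class Conversation:
    conversation_id: int
    kind: ConversationKind
    member_ids: List[int] = field(default_factory=list)


@dataclass
class Message:
    conversation_id: int  # partition key: one conversation's history lives together
    message_id: int       # sort key: increases within a conversation
    sender_id: int
    body: str
    created_at: int       # server timestamp, epoch ms


def history(messages: List[Message], conversation_id: int, limit: int = 50) -> List[Message]:
    """'View conversation history': newest `limit` messages of one conversation, oldest first."""
    rows = sorted((m for m in messages if m.conversation_id == conversation_id),
                  key=lambda m: m.message_id)
    return rows[-limit:]
```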
Scale
2B users, 100B messages per day.
QPS: 100*10^9 messages / 10^5 seconds per day (rounded up from 86,400) = 10^6 = 1M QPS.
Storage:
100 bytes per message
100B messages * 100 bytes = 10^13 bytes = 10^10 KB = 10^7 MB = 10 TB per day
Keeping messages for 30 days: 30 * 10 TB = 300 TB.
Bandwidth:
10 TB / 10^5 seconds = 10^7 MB / 10^5 seconds = 100 MB per second
Number of servers:
WhatsApp handles 10M connections on a single server
2B / 10M = 200 servers (if every user is connected at once)
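The estimates above can be checked with a few lines of arithmetic. A sketch using the same inputs and the same rounding (a day is taken as 10^5 seconds); the function name is hypothetical:

```python
SECONDS_PER_DAY = 10**5  # rounded up from 86,400, as in the estimate above


def estimate(users: int, messages_per_day: int, bytes_per_message: int,
             retention_days: int, connections_per_server: int) -> dict:
    daily_bytes = messages_per_day * bytes_per_message
    return {
        "qps": messages_per_day // SECONDS_PER_DAY,
        "storage_per_day_tb": daily_bytes / 10**12,
        "storage_retained_tb": daily_bytes * retention_days / 10**12,
        "ingress_mb_per_s": daily_bytes / SECONDS_PER_DAY / 10**6,
        "servers": -(-users // connections_per_server),  # ceiling division
    }


result = estimate(2 * 10**9, 100 * 10**9, 100, 30, 10 * 10**6)
print(result)
# {'qps': 1000000, 'storage_per_day_tb': 10.0, 'storage_retained_tb': 300.0,
#  'ingress_mb_per_s': 100.0, 'servers': 200}
```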
How to scale Redis Pub/Sub?
Modern Redis server capability:
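100 GB of memory and a gigabit network can handle about 100,000 subscriber pushes per second.
Max 10k client connections (Redis's default maxclients setting).
Throughput: 1M QPS / 10^5 pushes per second = 10 Redis servers.
Memory: 2B users -> 20B channels (about 10 per user) * 20 bytes = 400*10^9 bytes = 400 GB.
At 100 GB per server, memory alone needs 4 Redis servers, so throughput (10 servers) is the binding constraint.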
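The notes do not say how channels are spread across the Redis servers. One common option, swapped in here as an assumption, is consistent hashing: each channel name hashes to a point on a ring and is owned by the next shard clockwise, so adding a server remaps only the channels that land on it. The sketch also takes the larger of the throughput and memory server counts; shard names like `redis-0` are hypothetical:

```python
import bisect
import hashlib
import math
from typing import List, Tuple


def redis_servers_needed(qps: int, pushes_per_server: int, channels: int,
                         bytes_per_channel: int, memory_per_server: int) -> int:
    """Take the larger of the throughput-bound and memory-bound server counts."""
    by_throughput = math.ceil(qps / pushes_per_server)
    by_memory = math.ceil(channels * bytes_per_channel / memory_per_server)
    return max(by_throughput, by_memory)


class HashRing:
    """Consistent hashing: maps a channel name to one Redis shard."""

    def __init__(self, shards: List[str], vnodes: int = 100):
        # Each shard gets `vnodes` points on the ring to even out the load.
        self._ring: List[Tuple[int, str]] = []
        for shard in shards:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def shard_for(self, channel: str) -> str:
        # First virtual node clockwise from the channel's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(channel)) % len(self._keys)
        return self._ring[idx][1]


shards = [f"redis-{i}" for i in range(10)]
ring = HashRing(shards)
print(ring.shard_for("user:42"))
```

With the figures above, `redis_servers_needed(10**6, 10**5, 20 * 10**9, 20, 100 * 10**9)` returns 10. Redis 7.0 and later also offer sharded Pub/Sub in cluster mode (`SSUBSCRIBE`/`SPUBLISH`), which assigns channels to hash slots without a client-side ring.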
How to maintain message ordering?
Device timestamps cannot be trusted for ordering: a message might be sent from a device whose clock is wrong.
To adjust for incorrect device clocks, one approach is to log three timestamps:
1. The time at which the event occurred, according to the device clock.
2. The time at which the event was sent to the server, according to the device clock.
3. The time at which the event was received by the server, according to the server clock.
offset = (3) - (2), assuming network delay is negligible
real time = (1) + offset
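The offset correction can be sketched as follows, assuming the three timestamps are stored on each event in seconds; field and function names are hypothetical. Like the formulas above, it ignores network delay, which is folded into the offset:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    msg_id: str
    device_event_time: float    # (1) when the event occurred, device clock
    device_send_time: float     # (2) when it was sent, device clock
    server_receive_time: float  # (3) when the server received it, server clock


def estimated_real_time(e: Event) -> float:
    # offset = (3) - (2); assumes network delay is negligible.
    offset = e.server_receive_time - e.device_send_time
    # real time = (1) + offset
    return e.device_event_time + offset


def order_events(events: List[Event]) -> List[str]:
    return [e.msg_id for e in sorted(events, key=estimated_real_time)]


# Device A's clock runs one hour (3600 s) ahead; device B's clock is correct.
a = Event("a", device_event_time=3700.0, device_send_time=3705.0, server_receive_time=105.0)
b = Event("b", device_event_time=102.0, device_send_time=103.0, server_receive_time=103.0)
print(order_events([b, a]))  # ['a', 'b']  (raw device times would put b first)
```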
Decentralized approach for Pub/Sub message delivery (at-most-once), presence (join room, leave room), and caching.
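A minimal single-node sketch of what at-most-once delivery with room presence means: a publish reaches only the members joined at that moment, and a failed or missed push is not retried or stored. It does not model the decentralized, multi-node part; `RoomPubSub` and its methods are hypothetical:

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[str], None]


class RoomPubSub:
    """Hypothetical single-node sketch: presence per room plus fire-and-forget fan-out."""

    def __init__(self) -> None:
        self._rooms: DefaultDict[str, Dict[str, Handler]] = defaultdict(dict)

    def join(self, room: str, user_id: str, handler: Handler) -> None:
        self._rooms[room][user_id] = handler

    def leave(self, room: str, user_id: str) -> None:
        self._rooms[room].pop(user_id, None)

    def present(self, room: str) -> List[str]:
        return sorted(self._rooms[room])

    def publish(self, room: str, text: str) -> int:
        """At-most-once: only members present right now get the message; no retry, no storage."""
        delivered = 0
        for handler in list(self._rooms[room].values()):
            try:
                handler(text)
                delivered += 1
            except Exception:
                pass  # a failed push is simply lost
        return delivered


inbox: DefaultDict[str, List[str]] = defaultdict(list)
ps = RoomPubSub()
ps.join("room-1", "alice", inbox["alice"].append)
ps.join("room-1", "bob", inbox["bob"].append)
ps.publish("room-1", "hello")
ps.leave("room-1", "bob")
n = ps.publish("room-1", "still there?")  # bob is gone, so he never sees this
```

This is why at-most-once Pub/Sub alone cannot guarantee delivery: messages must also be persisted in the DB and acknowledged so that offline or reconnecting clients can fetch what they missed.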