Dog Sitting App
Questions:
If this app is location based, how much is the radius we are considering?
For booking, what preferences/filters we are considering?
Do we care about payments?
Do we care about canceling the booking mechanism?
Do we deal with logistic throughout entire dogsitting session?
Do we consider differnt dog sitting types like boarding, drop-in or dog-walking?
Functional Requirement
Dog owner will be able to see dog sitters nearby, ideally within a radius?
Dog owner will be able to finish bookings.
Dog sitter will be able to see bookings from dog owners.
Non-functional Requirement
Low latency
Highly available
High concurrency, a lot of dog owners might try to book the same dog sitter.
High Level Design
Reservation Service
Receives reservation request and reserves dog sitters. It also tracks for listing inventory as listings are reserved or reservations are cancelled.
Dogsitter and dogowner can both cancel a reservation, view upcoming reservations etc.
Listing Service
Search a list of listings based on time, location.
Listing Management Service
Create, update, and delete listings for dogowner.
Scale
How many dogsitters and how many dogowners?
1.5M dog owners. 1M dog sitters.
How many bookings per day?
1M*24*0.2 = 4M
4*10^6 / 10^5 = 40 QPS not very high = peak = 80 QPS.
Reserve dogsitters: 80 QPS -> Order book page 800 QPS -> View listings 8000 QPS.
Data Schema
Listing table:
id
dogsitter_id
type: boarding, dog_walking, drop_in
description:
date
start_time
end_time
is_available
version
Reservation table:
id
dogsitter_id
dogowner_id
payout
start_time
end_time
status: Pending, Cancelled, Rejected, Paid -> Refunded
Rate table:
id
listing_id (PK)
date (PK)
rate
Profile table:
user_id
profile_pic
location
review
description
Geohash_index table
user_id, geohash
API
How to avoid double booking?
The same user clicks on book button multiple times.
Multiple users try to book same room at the same time.
Same user
Client side implementation. On the website we can disable the button once the booking request is sent. This approach is not very reliable since user can maybe disable javascript bypassing client check.
Idempotent API. Add an idempotency key in the reservation API request. We can assign a idempotency key reservation_id to avoid double reservation.
Different user
User1 and User2 both check the listing, they both see the same listing available, they may or maynot click the booking button at the same time but both of their requests will go through. We will end up with two reservation records with the same listing.
The isolation property in ACID means database transactions must complet their tasks independently from other transactions. So data changes made by transaction 1 are not visible to transaction 2 until transaction 1 is completed.
Option 1: Pessimistic Locking
Pessimistic concurrency control, prevents simultaneous updates by placing a lock on a record as soon as one user starts to update it. Other users who attempt to update the record have to wait until the first user has released the lock.
For MySQL:
works by locking the rows returned by a selection query.
Pros:
Prevents applications from updating data that is being changed.
Easy to implement and it avoids conflicts by serializing updates.
Useful for write heavy application when data-contention is heavy.
Cons:
Deadlocks may occur when multiple resources are locked. Writing deadlock-free application code could be challenging.
If a transaction is locked for too long, other transactions cannot access the resource. This has a significant impact on database performance, especially when transactions are long lived.
Option 2: Optimistic Locking
Optimistic concurrency control, allows multiple concurrent users to attempt to update the same resource.
A new column called version is added to the database table.
Before a user modifies the database row, the application read the version number of the row.
When the user updates the row, the application increases the version number by 1 and write back to db.
A database validation check is put in place: the next version number should exceed the current version number by 1. Otherwise, validation fails and user tries again from step 2.
Pros:
Prevents applications from updating stale data.
We don't need to lock db resources. version number control is on application level.
Optimistic lock is good when data-contention and concurrency is low.
Cons:
Poor performance when data contention is heavy.
Why? When there are a lot of clients try to reserve the same hotel room, only one of them will succeed and the rest of client requests will have to retry.
Option 3: Database constraints
Instead of using is_available, we can use availablity initialized to 1, and we add the following DB constraint:
Pros:
Easy to implement, works well when data contention is low.
Cons:
The db constraint cannot be version controlled easily like application code.
Not all db support constraints, if we do data migration in the future, it might cause problems.
How to hold reservation for 10 minutes?
Use pessimistic lock with a timeout of 10 minutes?
Scale
Most of the components here are stateless, we can simply scale by adding more servers. but DB contains all the states so it might become the bottleneck.
A single MySQL server can handle roughly 2000 read QPS. modern server can have 512 GB RAM.
Geohash index table
The memory is only around 2GB so it can fit on a single MySQL server, we can add read replicas to handle the large amount of read QPS.
Listing table
Size of the table: # dogsitter * 3 types of listing * 365 = 100k*3*365 = 109,500,000 = 109M rows, this can fit into a single server but it would become single point of failures we can add replications across multiple regions and availability zones.
What if data is growing 100 times and it cannot fit in one server. Suppose we have 70*1000 = 70k QPS. We can shard the table by listing_id to 32 shards, and each shard will have 2000 QPS which is acceptable for MySQL's load.
Cache
Listing data only cares about current and future listing so we can set TTL to expire old data automatically.
CDN for static file like images, videos etc.
Data consistency across different micro-services
Two phase commit:
2PC is a database protocol used to guarantee atomic transaction commit across multiple nodes, either all nodes succeeded or all nodes failed. 2PC is blocking and a single node failure would block the process until the node has recovered.
Saga
Saga is a sequence of local transactions. Each transaction updates and publishes a message to trigger the next transaction step. If a step fails, saga executes compensating transactions to undo changes.
2PC works as a single commit to perform ACID transaction while Saga consists of multiple steps and rely on eventual consistency.
Last updated