Design Inventory
System Design: not the usual Visa payment system, but inventory management. Design a system where customers, grocery sellers, and grocery buyers all need to concurrently update item quantities and purchase/sale records. The interviewer barely engaged the whole time; it felt like talking to a wall. Every five minutes I stopped to confirm we were discussing what he actually wanted, and he brushed it off with a perfunctory "good, good." In the end it wrapped up hastily and he left for another meeting...
Asked: if we build a page that displays various categories (food, clothes, ...), and clicking into each category also shows its sub-categories, how should the database be designed? I felt my answer here was only so-so.
System Design: a question I had never seen. Even if I had prepped the classic payment question, I'm not sure it would have helped; I got a completely new one. The problem: there are users, delivery couriers, and vendors, all updating data around warehouse inventory counts. APIs must return within 1s. Asked: which database to choose, how to monitor, and what the single points of failure are. I hadn't really studied system design before, so this was a half-unprepared interview. I can't believe I actually rambled through a whole hour.
shopping experience for customer
retailer app: check inventory {store, item, quantity}
customer app: add item to cart {store, item, quantity, order}
shopper: checkout {store, item, quantity, order}. No particularly good ideas here; the discussion mainly covered how to decrement inventory and how to make sure we never sell beyond the available stock.
How frequently does the retailer update the inventory?
Retailers are able to check inventory, add/update inventory, delete inventory...
Customers are able to check inventory, add item to cart and place an order.
Shoppers can update and check out orders.
Highly available - user will be able to do grocery shopping at any time.
Highly scalable - the system can handle a large volume of reads/writes.
Low latency
Consistency - if multiple users place orders on the same item, we need to accurately reflect the item's availability.
How much scale are we looking at? And what is the read:write ratio?
75k stores, 500M products on the shelves.
2M active users place multiple orders of tens of items in their cart every month.
600k Shoppers.
One month -> 10M orders -> 10M / 30 ≈ 0.33M orders/day ≈ 3.3*10^5 / 10^5 s ≈ 3 QPS for order placement.
With ~15 items per order, 3 * 15 = 45 QPS for checkout validation requests.
75k stores, 500M items, 200 locations
On average a user will make 20 searches per basket before they checkout.
500*10^6 / (75*10^3) = (500/75)*10^3 ≈ 6.7*10^3, i.e. on the order of 10^4 items per store.
1 store -> ~10^4 items
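The back-of-envelope numbers above can be sanity-checked with a short sketch. The 10^5-seconds-per-day shortcut and the 15-items-per-order figure follow the rough assumptions in these notes:

```python
# Back-of-envelope capacity estimation using the numbers from the notes above.
orders_per_month = 10_000_000
orders_per_day = orders_per_month / 30             # ~333k orders/day
seconds_per_day = 10**5                            # estimation shortcut for 86,400
order_qps = orders_per_day / seconds_per_day       # ~3.3 QPS
items_per_order = 15                               # "tens of items" per basket
checkout_qps = round(order_qps) * items_per_order  # ~3 * 15 = 45 QPS

total_items = 500_000_000
stores = 75_000
items_per_store = total_items / stores             # ~6.7k, order of 10^4

print(round(order_qps, 1), checkout_qps, round(items_per_store))
```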
Retailer partners update inventory once a day for availability of all items on Instacart.
Despite the estimated availability, purchased items are not reserved until our shoppers pick them up in the store.
Create cart order
Fetch item availability from Cache.
Create order and order items.
After a shopper updates an order item's status:
Fetch item availability and version number.
Update Item Availability table with version number for optimistic lock.
Update order item status.
If shopper1 and shopper2 both mark the same item as found in the same retail store, we might end up with an inaccurate quantity in the item availability table.
The isolation property in ACID means database transactions must complete their tasks independently from other transactions. So data changes made by transaction 1 are not visible to transaction 2 until transaction 1 is completed.
Pessimistic concurrency control, prevents simultaneous updates by placing a lock on a record as soon as one user starts to update it. Other users who attempt to update the record have to wait until the first user has released the lock.
For MySQL:
SELECT ... FOR UPDATE locks the rows returned by the selection query.
Pros:
Prevents applications from updating data that is being changed.
Easy to implement and it avoids conflicts by serializing updates.
Useful for write-heavy applications where data contention is heavy.
Cons:
Deadlocks may occur when multiple resources are locked. Writing deadlock-free application code could be challenging.
If a transaction holds a lock for too long, other transactions cannot access the resource. This has a significant impact on database performance, especially when transactions are long-lived.
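The pessimistic pattern above can be sketched in a few lines. MySQL would use SELECT ... FOR UPDATE inside a transaction; as a stand-in, this sketch uses SQLite's BEGIN IMMEDIATE, which grabs the write lock up front so a second writer must wait (or time out). The table and file names are illustrative:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.gettempdir(), "inventory_demo.db")

# Writer 1: set up the table, then open a transaction that takes the write lock.
conn1 = sqlite3.connect(db, timeout=0.1)
conn1.execute("DROP TABLE IF EXISTS item_availability")
conn1.execute("CREATE TABLE item_availability (item_id INTEGER PRIMARY KEY, qty INTEGER)")
conn1.execute("INSERT INTO item_availability VALUES (1, 10)")
conn1.commit()

conn1.execute("BEGIN IMMEDIATE")  # acquire the write lock eagerly
conn1.execute("UPDATE item_availability SET qty = qty - 1 WHERE item_id = 1")

# Writer 2: with a tiny timeout, its attempt to lock fails instead of blocking.
conn2 = sqlite3.connect(db, timeout=0.1)
try:
    conn2.execute("BEGIN IMMEDIATE")
    blocked = False
except sqlite3.OperationalError:  # "database is locked"
    blocked = True

conn1.commit()  # release the lock; writer 2 can now read the committed value
qty = conn2.execute("SELECT qty FROM item_availability WHERE item_id = 1").fetchone()[0]
print(blocked, qty)  # True 9
```

The key property is that the second writer cannot even read-for-update until the first releases the lock, which is exactly the serialization (and the waiting) described above.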
Optimistic concurrency control, allows multiple concurrent users to attempt to update the same resource.
A new column called version is added to the database table.
Before a user modifies a database row, the application reads the row's version number.
When the user updates the row, the application increments the version number by 1 and writes it back to the db.
A database validation check is put in place: the new version number must exceed the current version number by exactly 1. Otherwise, validation fails and the user retries from step 2.
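The version-column scheme above amounts to a compare-and-swap in the WHERE clause: the UPDATE only matches if nobody bumped the version in between. A minimal sketch with illustrative table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_availability "
    "(item_id INTEGER PRIMARY KEY, qty INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO item_availability VALUES (1, 10, 0)")

def decrement(conn, item_id):
    """Try once to decrement qty; return True on success, False on conflict."""
    qty, version = conn.execute(
        "SELECT qty, version FROM item_availability WHERE item_id = ?", (item_id,)
    ).fetchone()
    cur = conn.execute(
        "UPDATE item_availability SET qty = ?, version = ? "
        "WHERE item_id = ? AND version = ?",
        (qty - 1, version + 1, item_id, version),
    )
    # rowcount == 0 means another writer bumped the version first; caller retries.
    return cur.rowcount == 1

ok = decrement(conn, 1)
qty, version = conn.execute(
    "SELECT qty, version FROM item_availability WHERE item_id = 1"
).fetchone()
print(ok, qty, version)  # True 9 1
```

On conflict the application re-reads the row and retries, which is the retry loop referenced in the cons below.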
Pros:
Prevents applications from updating stale data.
We don't need to lock db resources; version-number control happens at the application level.
Optimistic locking is a good fit when data contention and concurrency are low.
Cons:
Poor performance when data contention is heavy.
Why? When a lot of shoppers try to mark the same grocery item, only one of them will succeed and the rest of the client requests have to retry.
Similar to optimistic locking.
Instead of using is_available, we can use an availability counter initialized to 1, and add a DB constraint that availability must stay non-negative (CHECK (availability >= 0)), so any decrement that would oversell is rejected by the database.
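A minimal sketch of this constraint-based variant (names are illustrative): the database itself rejects an update that would drive availability below zero, so no application-side version check is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_availability ("
    " item_id INTEGER PRIMARY KEY,"
    " availability INTEGER CHECK (availability >= 0))"
)
conn.execute("INSERT INTO item_availability VALUES (1, 1)")

# First shopper claims the item: 1 -> 0, allowed.
conn.execute("UPDATE item_availability SET availability = availability - 1 WHERE item_id = 1")

# Second shopper tries the same decrement: 0 -> -1 violates the CHECK constraint.
try:
    conn.execute("UPDATE item_availability SET availability = availability - 1 WHERE item_id = 1")
    oversold = True
except sqlite3.IntegrityError:  # the DB rejects going negative
    oversold = False

availability = conn.execute(
    "SELECT availability FROM item_availability WHERE item_id = 1"
).fetchone()[0]
print(oversold, availability)  # False 0
```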
Pros:
Easy to implement, works well when data contention is low.
Cons:
The db constraint cannot be version-controlled as easily as application code.
Not all databases support such constraints; if we do a data migration in the future, it might cause problems.
For customers browsing item catalogs within a retailer store and placing an order, we can actually serve all the read requests from the cache, since the item is not sold until the shopper picks it up from the store.
Only when the shopper picks up the item from the store do we start updating item availability, resolving conflicts with the optimistic lock.
After the DB is updated, we update the cache.
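The read path and the pickup-time write path above can be sketched with plain dicts standing in for the store (all names here are illustrative):

```python
db = {"apple": 10}   # stands in for the Item Availability table
cache = {}           # stands in for Redis/Memcached

def browse(item):
    """Customer read path: serve from cache, fall back to DB and populate."""
    if item not in cache:
        cache[item] = db[item]
    return cache[item]

def shopper_pickup(item):
    """Shopper pickup: update the DB first (the optimistic-lock update would
    go here), then refresh the cache so reads see the new quantity."""
    db[item] -= 1
    cache[item] = db[item]

browse("apple")          # warms the cache with 10
shopper_pickup("apple")  # DB and cache both drop to 9
print(cache["apple"], db["apple"])  # 9 9
```

Updating the cache only after the DB commit keeps the cache a (possibly slightly stale) view of the source of truth, never ahead of it.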
There are normally two ways we can scale the DB: either we add replicas or we shard the data.
Add Read Replicas:
Pros:
Can help ease up read load.
Cons:
Higher chance of change conflicts if users read a stale version-column value from a lagging replica.
Partition data into multiple databases
Move data to cold storage, for example item availability table from past dates.
Shard by retailer_id
Each retailer is unique and location-based, so we are effectively splitting request traffic from different retailers.
Ingesting scores generated by ML models into DB storage for fast, bulk retrieval. In both methods, we store scores in the DB and update them to address low-latency use cases.
Full sync
The ML availability scoring service updates a table in Snowflake multiple times a day with the refreshed availability scores of items. The DB ingestion workers read the Snowflake table periodically and upsert the availability scores for refreshed items to ensure no scores are stale.
Lazy Score Refresh
Scores are updated on demand when an item appears in search results. The refresh happens in background jobs and uses Kinesis to aggregate the updates to the DB.
The new model combines three components for general, trending and real-time availability scores.
General Score: describes typical item availability patterns and is recalculated weekly.
Trending Score: quantifies short-term deviations from typical patterns and is recomputed daily or hourly.
Real-time Score: based on the latest observations, using an event-driven streaming architecture that employs Kafka and Flink to deliver signals sourced from customer applications and retailer systems.
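One way to picture combining the three components is a freshness-gated blend. The source does not say how the scores are actually merged, so the weights, TTL, and function below are entirely hypothetical:

```python
def blended_availability(general, trending, realtime, realtime_age_s):
    """Hypothetical blend of the three availability scores described above."""
    # Trust the real-time signal only while it is fresh (assumed 10-minute TTL).
    if realtime is not None and realtime_age_s < 600:
        return realtime
    # Otherwise fall back to a weighted mix of the slower-moving scores
    # (0.7 / 0.3 weights are made up for illustration).
    return 0.7 * general + 0.3 * trending

print(blended_availability(0.9, 0.5, 0.2, realtime_age_s=60))  # fresh signal wins: 0.2
print(blended_availability(0.9, 0.5, None, realtime_age_s=0))  # fallback blend
```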