
Proximity Service

Functional Requirements

  1. Who are our end users? There are two sides: businesses (toB) and consumers (toC).

  2. The system has a serving side (search) and an ingestion side (business updates).

  3. What is the search radius? What is the maximum radius allowed?

  4. How quickly must updates to business information become visible?

Non-functional Requirements

  1. Highly available

  2. Low latency

  3. Consistency requirements?

  4. Read-heavy: reads far outnumber writes

QPS

Assume we have 100M users, each making 5 search queries a day. A day has 86,400 seconds, which we round up to 10^5.

100M * 5 / 10^5 = 10^8 * 5 / 10^5 = 5000 QPS
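
The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `average_qps` is made up for illustration, and the second call shows the unrounded figure when the exact 86,400 seconds per day is used.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400; the estimate above rounds this to 10**5


def average_qps(users: int, queries_per_user_per_day: int,
                seconds_per_day: int = 10**5) -> float:
    """Average queries per second, assuming traffic is spread evenly over the day."""
    return users * queries_per_user_per_day / seconds_per_day


print(average_qps(100_000_000, 5))                          # 5000.0
print(round(average_qps(100_000_000, 5, SECONDS_PER_DAY)))  # 5787
```

This is an average; peak traffic is normally higher.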

APIs

GET v1/places?longitude=xxx&latitude=xxx&radius=xxx

Response:

[business1, business2, business3 ...]

Data Schema

Design Options


Option 1: Store the business with only longitude and latitude

Run a plain range query over longitude and latitude, with the radius converted to degrees:

user_longitude-radius <= longitude <= user_longitude+radius
user_latitude-radius <= latitude <= user_latitude+radius
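
A minimal sketch of this range query follows, assuming businesses are plain dictionaries with `latitude` and `longitude` keys and that the radius is given in kilometers. The helper names and the 111.32 km-per-degree constant are illustrative approximations, and the sketch ignores the poles and the antimeridian.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def bounding_box(lat: float, lon: float, radius_km: float):
    """Return (min_lat, max_lat, min_lon, max_lon) around a point."""
    dlat = radius_km / KM_PER_DEGREE_LAT
    # One degree of longitude shrinks with the cosine of the latitude.
    dlon = radius_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def search(businesses, lat: float, lon: float, radius_km: float):
    """Linear scan that keeps businesses inside the bounding box."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    return [
        b for b in businesses
        if min_lat <= b["latitude"] <= max_lat
        and min_lon <= b["longitude"] <= max_lon
    ]


businesses = [
    {"id": "a", "latitude": 37.7750, "longitude": -122.4195},
    {"id": "b", "latitude": 37.9000, "longitude": -122.4194},
]
print([b["id"] for b in search(businesses, 37.7749, -122.4194, 1.0)])  # ['a']
```

The scan touches every business, and a database would have to intersect two separate range conditions, which is why this option scales poorly.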

Option 2: Evenly divided grid

Segment the entire map into a number of evenly sized grid cells.

Query -> Look up only the grid cell that the user's location belongs to.

Pros:

  • More efficient than option 1.

Cons:

  • Businesses are unevenly distributed, so some cells hold far more businesses than others.

  • Not flexible when the user zooms in or out: a fixed cell size cannot show an appropriate number of businesses at different zoom levels.
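
As a rough illustration of the evenly divided grid, the sketch below maps each coordinate to a cell and indexes businesses by cell. The cell size of 0.01 degrees and all function names are assumptions chosen for the example, not values from the design.

```python
import math
from collections import defaultdict

CELL_SIZE_DEG = 0.01  # hypothetical cell size in degrees


def cell_id(lat: float, lon: float, cell_size: float = CELL_SIZE_DEG):
    """Map a coordinate to the (row, column) of its fixed-size grid cell."""
    return (math.floor((lat + 90) / cell_size),
            math.floor((lon + 180) / cell_size))


def build_index(businesses):
    """Group business ids by the grid cell they fall into."""
    index = defaultdict(list)
    for b in businesses:
        index[cell_id(b["latitude"], b["longitude"])].append(b["id"])
    return index


def lookup(index, lat: float, lon: float):
    """Return the ids of businesses in the same cell as the user."""
    return index.get(cell_id(lat, lon), [])


businesses = [
    {"id": "a", "latitude": 37.7749, "longitude": -122.4194},
    {"id": "b", "latitude": 37.7751, "longitude": -122.4191},
    {"id": "c", "latitude": 37.7851, "longitude": -122.4194},
]
index = build_index(businesses)
print(lookup(index, 37.7750, -122.4193))  # ['a', 'b']
```

A user standing near a cell border would miss businesses just across it, so a real lookup would also need to check adjacent cells.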


Option 3: Geohashing

Reduce the two-dimensional longitude and latitude data to a one-dimensional string of letters and digits by recursively dividing the world into smaller and smaller grids with each additional bit.

Pros:

  • Very efficient and fits use cases at any precision.

  • Not straightforward to implement, but many out-of-the-box libraries and solutions exist.
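
To make the encoding concrete, here is a small sketch of the standard geohash algorithm: bits alternate between longitude and latitude, each bit halves the remaining range, and every 5 bits become one base-32 character. In practice an existing library would normally be used; the function name `geohash_encode` is only illustrative.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a coordinate as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits form one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(42.605, -5.603, 5))  # ezs42
```

Because each extra character only refines the same cell, a shorter geohash is always a prefix of a longer one for the same point, which is what makes prefix queries possible.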

Option 4: Quadtree

Build an in-memory quadtree by recursively subdividing the two-dimensional space into four quadrants until the contents of each grid cell meet a certain criterion, for example, at most 100 businesses.
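
A minimal in-memory quadtree might look like the sketch below. The class and method names are made up for illustration, the leaf capacity of 100 follows the example criterion above, and the depth limit is an added assumption to stop endless splitting when many businesses share one coordinate.

```python
from dataclasses import dataclass, field

MAX_BUSINESSES_PER_LEAF = 100  # example criterion from the text
MAX_DEPTH = 20                 # assumed guard against endless splitting


@dataclass
class QuadTreeNode:
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    depth: int = 0
    businesses: list = field(default_factory=list)  # (id, lat, lon) tuples
    children: list = field(default_factory=list)    # empty for a leaf

    def insert(self, business_id, lat, lon, capacity=MAX_BUSINESSES_PER_LEAF):
        if self.children:
            self._child_for(lat, lon).insert(business_id, lat, lon, capacity)
            return
        self.businesses.append((business_id, lat, lon))
        if len(self.businesses) > capacity and self.depth < MAX_DEPTH:
            self._split(capacity)

    def _split(self, capacity):
        mid_lat = (self.min_lat + self.max_lat) / 2
        mid_lon = (self.min_lon + self.max_lon) / 2
        d = self.depth + 1
        # Order: south-west, south-east, north-west, north-east.
        self.children = [
            QuadTreeNode(self.min_lat, mid_lat, self.min_lon, mid_lon, d),
            QuadTreeNode(self.min_lat, mid_lat, mid_lon, self.max_lon, d),
            QuadTreeNode(mid_lat, self.max_lat, self.min_lon, mid_lon, d),
            QuadTreeNode(mid_lat, self.max_lat, mid_lon, self.max_lon, d),
        ]
        pending, self.businesses = self.businesses, []
        for business_id, lat, lon in pending:
            self._child_for(lat, lon).insert(business_id, lat, lon, capacity)

    def _child_for(self, lat, lon):
        mid_lat = (self.min_lat + self.max_lat) / 2
        mid_lon = (self.min_lon + self.max_lon) / 2
        return self.children[(2 if lat >= mid_lat else 0) + (1 if lon >= mid_lon else 0)]

    def leaf_for(self, lat, lon):
        """Walk from this node down to the leaf that contains the point."""
        node = self
        while node.children:
            node = node._child_for(lat, lon)
        return node


root = QuadTreeNode(-90.0, 90.0, -180.0, 180.0)
for bid, lat, lon in [("a", 10, 10), ("b", 20, 20), ("c", -10, -10)]:
    root.insert(bid, lat, lon, capacity=2)
print([b[0] for b in root.leaf_for(15, 15).businesses])  # ['a', 'b']
```

Dense areas end up with small leaves and sparse areas with large ones, which is the density adaptation a fixed grid lacks.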

Geohash vs Quadtree

| Geohash | Quadtree |
| --- | --- |
| Easy to use and implement; no need to build a tree. | Need to build a tree; harder to implement. |
| Supports returning businesses within a specific radius, but not k-nearest businesses. | Good fit for k-nearest businesses: it can automatically adjust the query range until it returns k results. |
| Precision and grid size are fixed; the grid size cannot be adjusted based on item density. | Dynamically adjusts the grid size based on population density. |
| Updating or removing a business is as easy as deleting that geohash record. | Updating the index is more complicated than with geohash. To remove a business, we need to traverse from the root to the leaf node. A locking mechanism is also required if multiple threads modify the tree. We also need to think about rebalancing the tree; a possible fix is to over-allocate the ranges. |


Business Table

| Column | Type |
| --- | --- |
| id | string |
| name | string |
| longitude | float |
| latitude | float |
| geohash | string |

Serving Algorithm

  1. Convert the user's location to a geohash with a precision based on the radius.

  2. Start with the geohash of the user's location, calculate its neighboring geohashes, and add them all to a list.

  3. For each geohash in the list, fetch businesses:

SELECT * FROM geohash_index WHERE geohash LIKE '9q8zn%'

  4. Filter these results by calculating the distance between each business and the user's location, keeping only businesses within the search radius.

  5. Rank the result list and return it to the client.
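
The last two steps (distance filter and ranking) could be sketched as follows. This assumes candidates are dictionaries with `latitude` and `longitude` keys, uses the haversine formula for distance, and ranks purely by distance; a real service may rank on other signals as well. The function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def filter_and_rank(candidates, user_lat, user_lon, radius_km):
    """Keep candidates within the radius, nearest first."""
    scored = [
        (haversine_km(user_lat, user_lon, b["latitude"], b["longitude"]), b)
        for b in candidates
    ]
    within = [(d, b) for d, b in scored if d <= radius_km]
    within.sort(key=lambda pair: pair[0])
    return [b for _, b in within]


candidates = [
    {"id": "far", "latitude": 37.8249, "longitude": -122.4194},
    {"id": "mid", "latitude": 37.7799, "longitude": -122.4194},
    {"id": "near", "latitude": 37.7759, "longitude": -122.4194},
]
ranked = filter_and_rank(candidates, 37.7749, -122.4194, 1.0)
print([b["id"] for b in ranked])  # ['near', 'mid']
```

The filter is needed because geohash cells are rectangles: the fetched cells cover the search circle but also include businesses outside it.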

High Level Diagram

Caching

Caching is not a solid win because:

  • The workload is read-heavy and the dataset is relatively small (about 1.7GB), so it fits in the working set of any modern database server. The queries are therefore not I/O bound and should run almost as fast as an in-memory cache.

  • If reads become the bottleneck, we can add more read replicas to improve read throughput.

Cache key selection

  • Location coordinates (latitude, longitude).

    Cons:

    • The location returned by the device is not always accurate and changes slightly every time.

    • The user can move.

    • The hit rate is terrible if we use the location as the key.

  • Geohash and business id

| Key | Value |
| --- | --- |
| geohash | a list of business ids |
| business id | business entity |

According to the requirements, the user can select one of four radii: 500m, 1km, 2km and 5km. Because a longer geohash denotes a smaller cell, these radii map to geohash lengths 6, 5, 5, and 4 respectively. We can cache data per precision, keyed like geohash_4, geohash_5 and geohash_6.
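
One way to build such cache keys is sketched below. A longer geohash denotes a smaller cell, so this sketch assigns length 6 to the 500m radius and length 4 to the 5km radius. The key format `geohash_<length>:<prefix>`, the function name, and the sample geohash are assumptions for illustration only.

```python
# Assumed mapping from search radius (meters) to geohash length:
# smaller radius -> longer geohash -> smaller cell.
RADIUS_M_TO_GEOHASH_LENGTH = {500: 6, 1000: 5, 2000: 5, 5000: 4}


def cache_key(full_geohash: str, radius_m: int) -> str:
    """Build a cache key from a geohash truncated to the radius's precision."""
    length = RADIUS_M_TO_GEOHASH_LENGTH[radius_m]
    if len(full_geohash) < length:
        raise ValueError("geohash is shorter than the required precision")
    return f"geohash_{length}:{full_geohash[:length]}"


print(cache_key("9q8zn2", 500))   # geohash_6:9q8zn2
print(cache_key("9q8zn2", 2000))  # geohash_5:9q8zn
print(cache_key("9q8zn2", 5000))  # geohash_4:9q8z
```

Since 1km and 2km share length 5, only three precisions need to be cached, which matches the three key families above.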

Memory

Redis storage: 8 bytes x 200M x 3 precisions = 4.8GB, roughly 5GB


Last updated 1 year ago
