A data structure that is commonly used to partition a two-dimensional space by recursively subdividing it into four quadrants until content of grids meet certain criteria.
Overview
Although similar to geohashing, quadtree is a in-memory solution, not a database solution.
quadtree structure
Algorithm
Memory
Leaf Node Memory
Name
Size
Top-left and bottom-right to identify the grid
32 bytes (8 bytes x 4)
List of business IDs (for example: maximum 100)
8 bytes x 100
Total
832 bytes
Internal Node Memory
Internal nodes doesn't contain business list information, it only holds the four references to its children.
Name
Size
Top-left and bottom-right coordinates
32 bytes (8 bytes x 4)
Pointer to 4 children
32 bytes (8 bytes x 4)
Total
64 bytes
Total Memory Calculation
For example: 200 Million businesses to store and each leaf can only have maximum of 100 businesses.
public void buildQuadtree(TreeNode node) {
if (countNumberOfBusinessesInCurrentGrid(node) > 100) {
node.subdivide();
for (TreeNode child : node.getChildren()) {
buildQuadTree(child);
}
}
}
def nearest_neighbors(self, point, count=10):
"""
Returns the nearest points of a given point, sorted by distance
(closest first).
The desired point does not need to exist within the quadtree, but
does need to be within the tree's boundaries.
Args:
point (Point): The desired location to search around.
count (int): Optional. The number of neighbors to return. Default
is `10`.
Returns:
list: The nearest `Point` neighbors.
"""
# Algorithm description:
# * Search down to find the smallest node around the desired point,
# retaining a stack of nodes visited on the way down.
# * Reverse the visited stack, so that it's now in
# smallest/closest-to-largest/furthest order.
# * Iterate over the node stack.
# * Collect the points from the current node & it's children.
# * Sort the points by euclidean distance, using
# `euclidean_compare`, since the actual distance doesn't matter
# for now.
# * Add them to the "found" results.
# * If the "found" count is greater-than-or-equal to the desired
# count, break out of the loop.
# * If the stack is exhausted, we have all the points in the entire
# quadtree & can just return them.
# * Otherwise, we now have a decent set of results, ordered by
# distance. But we are not done. It's possible/probable that there
# are other nearby quadnodes that weren't touched by the search
# BUT are physically closer.
# * Take our furthest point and use it as a radius for a search
# "circle".
# * We'll actually just create a bounding box, which is
# computationally cheaper & we already have methods that
# support it.
# * Using that radius as a distance to the *edge* (not a corner),
# we create a box big enough to fit the search circle.
# * Collect all the points within that bounding box.
# * Re-sort them by euclidean distance (again, using
# `euclidean_compare`).
# * Slice it to match the desired count & return them.
point = self.convert_to_point(point)
nearest_results = []
# Check to see if it's within our bounds first.
if not self._root.contains_point(point):
return nearest_results
# First, find the target node.
node, searched_nodes = self._root.find_node(point)
# Reverse the order, as they come back in coarse-to-fine order, which
# is the opposite of nearby points.
searched_nodes.reverse()
seen_nodes = set()
seen_points = set()
# From here, we'll work our way backwards out through the nodes.
for node in searched_nodes:
# Mark the node as already checked.
seen_nodes.add(node)
local_points = []
for pnt in node.all_points():
if pnt in seen_points:
continue
seen_points.add(pnt)
local_points.append(pnt)
local_points = sorted(
local_points, key=lambda lpnt: euclidean_compare(point, lpnt)
)
nearest_results.extend(local_points)
if len(nearest_results) >= count:
break
# Slice off any extras.
nearest_results = nearest_results[:count]
if len(seen_nodes) == len(searched_nodes):
# We've exhausted everything. Return what we've got.
return nearest_results[:count]
search_radius = euclidean_distance(point, nearest_results[-1])
search_bb = BoundingBox(
point.x - search_radius,
point.y - search_radius,
point.x + search_radius,
point.y + search_radius,
)
bb_results = self._root.within_bb(search_bb)
nearest_results = sorted(
bb_results, key=lambda lpnt: euclidean_compare(point, lpnt)
)
return nearest_results[:count]