Constrained Nearest Neighbor Queries

Constrained nearest neighbor queries [FSAA01] can be defined as nearest neighbor queries that are constrained to a specified region. This type of queries is targeted towards the users who are particularly interested in nearest neighbors in a region bounded by certain spatial conditions, rather than in searching for nearest neighbors in the entire data space.

There are many applications of this variant of nearest neighbor queries. For example, a user may wish to find $ 5$ nearest gas stations in San Francisco from his location. Clearly, the constrained region is the map of San Francisco. Similarly, a user may issue a query to find $ 10$ nearest neighbors from his location to the North. In this example, the constrained region is the sub-space that lies to the North of the user. Recall, six constrained nearest neighbor queries are issued to find reverse nearest neighbor of a query [SAA00].

Ferhatosmanoglu et al. [FSAA01] propose a solution based on the best-first traversal of R-tree. Their proposed pruning criteria makes their technique optimal with respect to the number of I/O accesses. The proximity comparisons in [RKV95], based on the Euclidean distance metric use the notion of $ mindist(q,M)$ and $ minmaxdist(q,M)$ where $ q$ is a query point and $ M$ is a minimum bounding rectangle (MBR). More specifically, $ mindist(q,M)$ is the shortest distance from $ q$ to the given MBR $ M$ and $ minmaxdist(q,M)$ is the minimum distance from $ q$ to the furthest point on the closest face of the MBR $ M$ . Fig. [*] illustrates these definitions. These metrics ensure that for a minimum bounding rectangle $ M$ , there is at least one data point within the distance range of $ [mindist(q,M),minmaxdist(q,M)]$ . Consider a 1-NN query, an MBR $ M'$ and its child entries can be pruned if $ mindist(q,M')>minmaxdist(q,M)$ because by the definition it is known that $ M$ contains at least one object that is closer than all objects in $ M'$ . Note that $ M'$ and its child can also be pruned if there exists a point $ p$ such that $ dist(p,q)<mindist(q,M')$ .

Figure: Illustration of $ mindist(q,M,R)$ and $ minmaxdist(q,M,R)$
\includegraphics[width=6.0in]{background/fig/definitions.eps}

Ferhatosmanoglu et al. notice that the above definitions of $ mindist$ and $ minmaxdist$ used in [RKV95] cannot yield correct results so they re-define these distance metrics as follows: Let $ R$ be the constrained region, $ M$ be a minimum bounding rectangle (MBR) and $ q$ be a query point, $ mindist(q,M,R)$ is the minimum distance between $ q$ and the part of MBR that lies in $ R$ . Consider the example of Fig. [*], $ mindist(q,M,R)$ is the minimum distance of $ q$ to the shaded area of $ M$ . Note that $ mindist(q,M)$ can never be greater than $ mindist(q,M,R)$ ( $ mindist(q,M)\leq mindist(q,M,R)$ ) hence this definition provides a tighter pruning condition. $ minmaxdist(q,M,R)$ is the distance between $ q$ and the furthest point on the closest face of $ M$ that is fully contained in $ R$ . In Fig.[*], there is only one edge of $ M$ that completely lies inside $ R$ , so $ minmaxdist(q,M,R)$ is the distance from $ q$ to the farthest point on this edge as shown. It can be seen that with the previous definition of $ minmaxdist(q,M)$ , it cannot be guaranteed that there is at least one object that lies in the range $ [mindist(q,M),minmaxdist(q,M)$ because that point may lie outside the region $ R$ . According to the new definition, the condition that the edge should be fully contained in $ R$ guarantees that there is at least one object that lies in $ R$ within the range $ [mindist(q,M),minmaxdist(q,M,R)]$ .

Figure: Computation of a Constrained Nearest Neighbor Query
\includegraphics[width=6.0in]{background/fig/const-example.eps}

The authors prove that their approach is optimal with respect to the number of I/O accesses. Consider the example of Fig. [*], there algorithm accesses only the MBR $ E$ . The MBRs $ A$ and $ B$ are pruned because they do not lie in the constrained region $ R$ . MBR $ C$ is pruned because $ mindist(q,C,R)>minmaxdist(q,E,R)$ . Since $ mindist(q,B,R)<mindist(q,D,R)$ , $ B$ is accessed first and an object $ p$ is found. The MBR $ D$ is pruned because $ mindist(q,D,R)>dist(p,q)$ . The algorithm terminates and reports $ p$ as the answer.

To the best of our knowledge, there does not exist any previous work on continuous monitoring of constrained nearest neighbor queries. In chapter [*], we propose an efficient solution for the problem of continuous monitoring of constrained nearest neighbor queries.

Muhammad Aamir Cheema 2007-10-11