The most popular variant of nearest neighbor query is reverse nearest neighbor queries that focuses on the inverse relation among points. A reverse nearest neighbor (RNN) query is to find all the objects for which is their nearest neighbor. A reverse nearest neighbor query is formally defined below.
Definition: Given a set of objects and a query object , a reverse nearest neighbor query is to find a set of objects so that for any object and , .
set of a query may be empty or may have one or more elements. Korn et al. [KM00] defined the RNN queries and provided a large number of applications. For example, a two-dimensional RNN query may ask the set of customers affected by the opening of a new store outlet location in order to inform the relevant customers. This query can also be used to identify the location which maximizes the number of potential customers. Consider another example, an RNN query may be issued to find the stores outlets that are affected by opening a new store outlet at some specific location. Note that in first example, there are two different sets (stores and customers) involved in RNN query whereas in second example there is only one set (stores). Korn et al. defined two variants of RNN queries. A bichromatic query (the first example) is to find the reverse nearest neighbors where the underlying data set consists of two different types of objects. A monochramtic RNN query (the second exampe) is to find the reverse nearest neighbors where the data set contains only one type of objects. The problem of reverse nearest neighbors is extensively studied in past few years [SAA00,YL01,SRAA01,MZ02,LNY03,SFT03,TPL04,XZKD05,YPMT05,XZ06,TYM06,ABK$^+$06]. Below we briefly describe the most popular and general algorithms only.
Korn and Muthukrishnan [KM00] answer RNN query by pre-calculating a circle of each object such that the nearest neighbor of lies on the perimeter of the circle as shown in Fig. . The MBR of all these circles are indexed by an R-tree called RNN-tree. The problem of RNN query is reduced to a point location query on RNN-tree that returns all the circle containing . For examples, circle of and contain so both are the reverse nearest neighbors of . Yang et al. [YL01] improved their method by RdNN-tree. Similar to RNN-tree, a leaf node of the RdNN tree contains the circles of data points. The intermediate nodes contain the minimum bounding rectangles (MBR) of underlying points along with the maximum distance from every point in the sub-tree to its nearest neighbor. The problem with above mentioned techniques is that they rely on pre-computation and cannot deal with efficient updates. In order to alleviate this problem, Lin et. al [LNY03] introduced a method for bulk insertion in the RdNN-tree.
Stanoi et al. [SAA00] eliminate the need for pre-computing the nearest neighbors of all data points by utilizing an interesting property of reverse nearest neighbor queries. Their approach is called SAA hereafter. SAA utilizes the fact that for any two-dimensional RNN query , the number of reverse nearest neighbors cannot exceed six. The algorithm divides the space around in six regions of equal size to as shown in Fig. . They observe that for a nearest neighbor object of in region ; either is the RNN of or there is no RNN in . For example, is the nearest neighbor of in region but it is not the RNN because lies closer to it than . Consequently, there is no RNN of in and the algorithm does not need to consider other objects in this region. Based on this interesting property, SAA answers RNN queries in two steps. In first step, for each of the six regions a nearest neighbor is found. All these nearest neighbors form a candidate list. In second step, a NN query is issued for each object in the candidate list and the algorithm discards the objects from candidate list for which is not the nearest neighbor.
SAA suffers with the curse of dimensionality. The number of regions to be searched for candidate objects increases exponentially with the dimensionality. Singh et al. [SFT03] propose a solution that performs better than SAA in high-dimensional space. Given a system parameter , their proposed approach first computes nearest neighbors of by using conventional R-tree approach. These objects form the candidate list. In the second step, they eliminate the objects from candidate list that are closer to some other object in candidate list than . Finally, for each object in candidate list, a boolean range query with range is issued. A boolean range query is different from a conventional range query in the sense that it terminates as soon as the first object is found. Clearly, the objects for which the boolean range query does not return any object are the RNNs of . However, depending on the value of , their approach may miss some reverse nearest neighbor that was not retrieved by nearest neighbor search in first step of the algorithm.
Tao et al. [TPL04] utilize the idea of perpendicular bisector to reduce the search space. Let be a line segment joining two points and , a perpendicular bisector of the two points and is perpendicular to the line and passes through the midpoint of . Consider the example of Fig. where a perpendicular bisector between two points and is shown. This perpendicular bisector divides the space into two half planes and where contains and contains . It can be noted that there cannot be any RNN in the plane other than ( is closer to all points in than ). Based on this property, any MBR can be pruned that falls completely in . The MBR is pruned in the example of Fig. . Even if some MBR does not fall completely in a half plane of an object , it can be pruned if it lies entirely in the union of and where is the half-plane of another object . Fig. shows the pruning of MBR because it falls in the union of half-planes and . Their approach can also be extended to answer R NN queries that is to find all objects for which is one of their nearest neighbors.
To the best of our knowledge, the only algorithm that can answer continuous reverse nearest neighbor queries is proposed by Xia et al. [XZ06]. They utilize the idea presented in [SAA00]. Consider the example of Fig. where a continuous reverse nearest neighbor (CRNN) query is shown along with six regions to . The monitoring region of consists of two parts: pie-regions and circ-regions. A pie-region in a partition is a pie centered at having the nearest neighbor of in on the perimeter. Let be the nearest neighbor of in a region , the circ-region is a circle centered at having or an object nearer to than on its perimeter. The shadowed areas are the six pie-regions and the circles are six circ-regions. Intuitively, the pie-regions monitor the updates that will change the candidate list and may possibly change the results. The updates in circ-regions can make a previous result a false positive or can make a previous false positive an RNN. In other words, any update outside the monitoring region (the union of pie-region and circ-regions) cannot affect the results of query and can be ignored.
Muhammad Aamir Cheema 2007-10-11