Reverse Nearest Neighbor Queries

The most popular variant of nearest neighbor query is reverse nearest neighbor queries that focuses on the inverse relation among points. A reverse nearest neighbor (RNN) query

is to find all the objects for which

is their nearest neighbor. A reverse nearest neighbor query is formally defined below.

Definition: Given a set of objects

and a query object

, a reverse nearest neighbor query is to find a set of objects

so that for any object $p\in P$ and $r\in RNN$ , $dist(r,q)\leq dist(r,p)$ .

set of a query

may be empty or may have one or more elements. Korn et al. [KM00] defined the RNN queries and provided a large number of applications. For example, a two-dimensional RNN query may ask the set of customers affected by the opening of a new store outlet location in order to inform the relevant customers. This query can also be used to identify the location which maximizes the number of potential customers. Consider another example, an RNN query may be issued to find the stores outlets that are affected by opening a new store outlet at some specific location. Note that in first example, there are two different sets (stores and customers) involved in RNN query whereas in second example there is only one set (stores). Korn et al. defined two variants of RNN queries. A bichromatic query (the first example) is to find the reverse nearest neighbors where the underlying data set consists of two different types of objects. A monochramtic RNN query (the second exampe) is to find the reverse nearest neighbors where the data set contains only one type of objects. The problem of reverse nearest neighbors is extensively studied in past few years [SAA00,YL01,SRAA01,MZ02,LNY03,SFT03,TPL04,XZKD05,YPMT05,XZ06,TYM06,ABK$^+$06]. Below we briefly describe the most popular and general algorithms only.

Korn and Muthukrishnan [KM00] answer RNN query by pre-calculating a circle of each object

such that the nearest neighbor of

lies on the perimeter of the circle as shown in Fig.

. The MBR of all these circles are indexed by an R-tree called RNN-tree. The problem of RNN query is reduced to a point location query on RNN-tree that returns all the circle containing

. For examples, circle of

and

contain

so both are the reverse nearest neighbors of

. Yang et al. [YL01] improved their method by RdNN-tree. Similar to RNN-tree, a leaf node of the RdNN tree contains the circles of data points. The intermediate nodes contain the minimum bounding rectangles (MBR) of underlying points along with the maximum distance from every point in the sub-tree to its nearest neighbor. The problem with above mentioned techniques is that they rely on pre-computation and cannot deal with efficient updates. In order to alleviate this problem, Lin et. al [LNY03] introduced a method for bulk insertion in the RdNN-tree.

**Figure:** The Objects and are the Reverse Nearest Neighbors of
$\includegraphics[width=5.0in]{background/fig/korn.eps}$

Stanoi et al. [SAA00] eliminate the need for pre-computing the nearest neighbors of all data points by utilizing an interesting property of reverse nearest neighbor queries. Their approach is called SAA hereafter. SAA utilizes the fact that for any two-dimensional RNN query

, the number of reverse nearest neighbors cannot exceed six. The algorithm divides the space around

in six regions of equal size

as shown in Fig.

. They observe that for a nearest neighbor object

in region

; either

is the RNN of

or there is no RNN in

. For example,

is the nearest neighbor of

in region

but it is not the RNN because

lies closer to it than

. Consequently, there is no RNN of

and the algorithm does not need to consider other objects in this region. Based on this interesting property, SAA answers RNN queries in two steps. In first step, for each of the six regions a nearest neighbor is found. All these nearest neighbors form a candidate list. In second step, a NN query is issued for each object in the candidate list and the algorithm discards the objects from candidate list for which

is not the nearest neighbor.

**Figure:** Illustration of SAA
$\includegraphics[width=5.0in]{background/fig/pieNN.eps}$

SAA suffers with the curse of dimensionality. The number of regions to be searched for candidate objects increases exponentially with the dimensionality. Singh et al. [SFT03] propose a solution that performs better than SAA in high-dimensional space. Given a system parameter

, their proposed approach first computes

nearest neighbors of

by using conventional R-tree approach. These

objects form the candidate list. In the second step, they eliminate the objects from candidate list that are closer to some other object in candidate list than

. Finally, for each object

in candidate list, a boolean range query with range

is issued. A boolean range query is different from a conventional range query in the sense that it terminates as soon as the first object is found. Clearly, the objects for which the boolean range query does not return any object are the RNNs of

. However, depending on the value of

, their approach may miss some reverse nearest neighbor that was not retrieved by

nearest neighbor search in first step of the algorithm.

Tao et al. [TPL04] utilize the idea of perpendicular bisector to reduce the search space. Let

be a line segment joining two points

and

, a perpendicular bisector $\perp(A,B)$ of the two points

and

is perpendicular to the line

and passes through the midpoint

. Consider the example of Fig.

where a perpendicular bisector $\perp(p,q)$ between two points

and

is shown. This perpendicular bisector divides the space into two half planes

and

where

contains

and

contains

. It can be noted that there cannot be any RNN in the plane

other than

(

is closer to all points in

than

). Based on this property, any MBR can be pruned that falls completely in

. The MBR

is pruned in the example of Fig.

. Even if some MBR does not fall completely in a half plane $PL_{p_1}$ of an object

, it can be pruned if it lies entirely in the union of $PL_{p_1}$ and $PL_{p_2}$ where $PL_{p_2}$ is the half-plane of another object

. Fig.

shows the pruning of MBR

because it falls in the union of half-planes $PL_{p_1}$ and $PL_{p_2}$ . Their approach can also be extended to answer R

NN queries that is to find all objects for which

is one of their

nearest neighbors.

**Figure:** Half-Plane Pruning
[Pruning with one point] $\includegraphics[scale=0.42]{background/fig/tao1.eps}$ [Pruning with two points] $\includegraphics[scale=0.42]{background/fig/tao2.eps}$

To the best of our knowledge, the only algorithm that can answer continuous reverse nearest neighbor queries is proposed by Xia et al. [XZ06]. They utilize the idea presented in [SAA00]. Consider the example of Fig.

where a continuous reverse nearest neighbor (CRNN) query

is shown along with six regions

. The monitoring region of

consists of two parts: pie-regions and circ-regions. A pie-region in a partition

is a pie centered at

having the nearest neighbor of

on the perimeter. Let

be the nearest neighbor of

in a region

, the circ-region is a circle centered at

having

or an object nearer to

than

on its perimeter. The shadowed areas are the six pie-regions and the circles are six circ-regions. Intuitively, the pie-regions monitor the updates that will change the candidate list and may possibly change the results. The updates in circ-regions can make a previous result a false positive or can make a previous false positive an RNN. In other words, any update outside the monitoring region (the union of pie-region and circ-regions) cannot affect the results of query and can be ignored.

**Figure:** The Monitoring Region of a Continuous RNN Query
$\includegraphics[width=5.0in]{background/fig/crnn.eps}$