Reverse Nearest Neighbor Queries

The most popular variant of nearest neighbor query is reverse nearest neighbor queries that focuses on the inverse relation among points. A reverse nearest neighbor (RNN) query $ q$ is to find all the objects for which $ q$ is their nearest neighbor. A reverse nearest neighbor query is formally defined below.

Definition: Given a set of objects $ P$ and a query object $ q$ , a reverse nearest neighbor query is to find a set of objects $ RNN$ so that for any object $ p\in P$ and $ r\in RNN$ , $ dist(r,q)\leq dist(r,p)$ .

$ RNN$ set of a query $ q$ may be empty or may have one or more elements. Korn et al. [KM00] defined the RNN queries and provided a large number of applications. For example, a two-dimensional RNN query may ask the set of customers affected by the opening of a new store outlet location in order to inform the relevant customers. This query can also be used to identify the location which maximizes the number of potential customers. Consider another example, an RNN query may be issued to find the stores outlets that are affected by opening a new store outlet at some specific location. Note that in first example, there are two different sets (stores and customers) involved in RNN query whereas in second example there is only one set (stores). Korn et al. defined two variants of RNN queries. A bichromatic query (the first example) is to find the reverse nearest neighbors where the underlying data set consists of two different types of objects. A monochramtic RNN query (the second exampe) is to find the reverse nearest neighbors where the data set contains only one type of objects. The problem of reverse nearest neighbors is extensively studied in past few years [SAA00,YL01,SRAA01,MZ02,LNY03,SFT03,TPL04,XZKD05,YPMT05,XZ06,TYM06,ABK$^+$06]. Below we briefly describe the most popular and general algorithms only.

Korn and Muthukrishnan [KM00] answer RNN query by pre-calculating a circle of each object $ p$ such that the nearest neighbor of $ p$ lies on the perimeter of the circle as shown in Fig. [*]. The MBR of all these circles are indexed by an R-tree called RNN-tree. The problem of RNN query is reduced to a point location query on RNN-tree that returns all the circle containing $ q$ . For examples, circle of $ a$ and $ e$ contain $ q$ so both are the reverse nearest neighbors of $ q$ . Yang et al. [YL01] improved their method by RdNN-tree. Similar to RNN-tree, a leaf node of the RdNN tree contains the circles of data points. The intermediate nodes contain the minimum bounding rectangles (MBR) of underlying points along with the maximum distance from every point in the sub-tree to its nearest neighbor. The problem with above mentioned techniques is that they rely on pre-computation and cannot deal with efficient updates. In order to alleviate this problem, Lin et. al [LNY03] introduced a method for bulk insertion in the RdNN-tree.

Figure: The Objects $ a$ and $ e$ are the Reverse Nearest Neighbors of $ q$
\includegraphics[width=5.0in]{background/fig/korn.eps}

Stanoi et al. [SAA00] eliminate the need for pre-computing the nearest neighbors of all data points by utilizing an interesting property of reverse nearest neighbor queries. Their approach is called SAA hereafter. SAA utilizes the fact that for any two-dimensional RNN query $ q$ , the number of reverse nearest neighbors cannot exceed six. The algorithm divides the space around $ q$ in six regions of equal size $ S_0$ to $ S_5$ as shown in Fig. [*]. They observe that for a nearest neighbor object $ o_i$ of $ q$ in region $ S_i$ ; either $ o_i$ is the RNN of $ q$ or there is no RNN in $ S_i$ . For example, $ o_1$ is the nearest neighbor of $ q$ in region $ S_0$ but it is not the RNN because $ o_0$ lies closer to it than $ q$ . Consequently, there is no RNN of $ q$ in $ S_0$ and the algorithm does not need to consider other objects in this region. Based on this interesting property, SAA answers RNN queries in two steps. In first step, for each of the six regions a nearest neighbor is found. All these nearest neighbors form a candidate list. In second step, a NN query is issued for each object in the candidate list and the algorithm discards the objects from candidate list for which $ q$ is not the nearest neighbor.

Figure: Illustration of SAA
\includegraphics[width=5.0in]{background/fig/pieNN.eps}

SAA suffers with the curse of dimensionality. The number of regions to be searched for candidate objects increases exponentially with the dimensionality. Singh et al. [SFT03] propose a solution that performs better than SAA in high-dimensional space. Given a system parameter $ K$ , their proposed approach first computes $ K$ nearest neighbors of $ q$ by using conventional R-tree approach. These $ K$ objects form the candidate list. In the second step, they eliminate the objects from candidate list that are closer to some other object in candidate list than $ q$ . Finally, for each object $ p$ in candidate list, a boolean range query with range $ dist(p,q)$ is issued. A boolean range query is different from a conventional range query in the sense that it terminates as soon as the first object is found. Clearly, the objects for which the boolean range query does not return any object are the RNNs of $ q$ . However, depending on the value of $ K$ , their approach may miss some reverse nearest neighbor that was not retrieved by $ K$ nearest neighbor search in first step of the algorithm.

Tao et al. [TPL04] utilize the idea of perpendicular bisector to reduce the search space. Let $ AB$ be a line segment joining two points $ A$ and $ B$ , a perpendicular bisector $ \perp(A,B)$ of the two points $ A$ and $ B$ is perpendicular to the line $ AB$ and passes through the midpoint $ M$ of $ AB$ . Consider the example of Fig. [*] where a perpendicular bisector $ \perp(p,q)$ between two points $ p$ and $ q$ is shown. This perpendicular bisector divides the space into two half planes $ PL_q$ and $ PL_p$ where $ PL_q$ contains $ q$ and $ PL_p$ contains $ p$ . It can be noted that there cannot be any RNN in the plane $ PL_p$ other than $ p$ ($ p$ is closer to all points in $ PL_p$ than $ q$ ). Based on this property, any MBR can be pruned that falls completely in $ PL_p$ . The MBR $ N_1$ is pruned in the example of Fig. [*]. Even if some MBR does not fall completely in a half plane $ PL_{p_1}$ of an object $ p_1$ , it can be pruned if it lies entirely in the union of $ PL_{p_1}$ and $ PL_{p_2}$ where $ PL_{p_2}$ is the half-plane of another object $ p_2$ . Fig. [*] shows the pruning of MBR $ N_2$ because it falls in the union of half-planes $ PL_{p_1}$ and $ PL_{p_2}$ . Their approach can also be extended to answer R$ k$ NN queries that is to find all objects for which $ q$ is one of their $ k$ nearest neighbors.

Figure: Half-Plane Pruning

[Pruning with one point]\includegraphics[scale=0.42]{background/fig/tao1.eps} [Pruning with two points]\includegraphics[scale=0.42]{background/fig/tao2.eps}

To the best of our knowledge, the only algorithm that can answer continuous reverse nearest neighbor queries is proposed by Xia et al. [XZ06]. They utilize the idea presented in [SAA00]. Consider the example of Fig. [*] where a continuous reverse nearest neighbor (CRNN) query $ q$ is shown along with six regions $ S_0$ to $ S_5$ . The monitoring region of $ q$ consists of two parts: pie-regions and circ-regions. A pie-region in a partition $ S_i$ is a pie centered at $ q$ having the nearest neighbor of $ q$ in $ S_i$ on the perimeter. Let $ p_i$ be the nearest neighbor of $ q$ in a region $ S_i$ , the circ-region is a circle centered at $ p_i$ having $ q$ or an object nearer to $ p_i$ than $ q$ on its perimeter. The shadowed areas are the six pie-regions and the circles are six circ-regions. Intuitively, the pie-regions monitor the updates that will change the candidate list and may possibly change the results. The updates in circ-regions can make a previous result a false positive or can make a previous false positive an RNN. In other words, any update outside the monitoring region (the union of pie-region and circ-regions) cannot affect the results of query and can be ignored.

Figure: The Monitoring Region of a Continuous RNN Query
\includegraphics[width=5.0in]{background/fig/crnn.eps}
Muhammad Aamir Cheema 2007-10-11