The most popular variant of nearest neighbor query is reverse nearest neighbor queries that
focuses on the inverse relation among points. A reverse nearest neighbor (RNN) query
is
to find all the objects for which
is
their nearest neighbor. A reverse nearest neighbor query is formally defined below.
Definition: Given a set of objects
and a query object
, a reverse nearest neighbor query is to find a
set of objects
so that for any object
and
,
.
set of a query
may be empty or may have one or more elements.
Korn et al. [KM00] defined the RNN queries and provided a large number
of applications. For example, a two-dimensional RNN query may ask the set of customers affected
by the opening of a new store outlet location in order to inform the relevant customers. This
query can also be used to identify the location which maximizes the number of potential customers.
Consider another example, an RNN query may be issued to find the stores outlets that are affected
by opening a new store outlet at some specific location.
Note that in first example, there are two different sets (stores
and customers) involved in RNN query whereas in second example there is only one set (stores).
Korn et al. defined two variants of RNN queries. A bichromatic query (the first example)
is
to find the reverse nearest neighbors where the underlying data set consists of two different types of objects.
A monochramtic RNN query (the second exampe) is to find the reverse nearest neighbors where the
data set contains only one type of objects.
The problem of reverse nearest neighbors is extensively
studied in past few years [SAA00,YL01,SRAA01,MZ02,LNY03,SFT03,TPL04,XZKD05,YPMT05,XZ06,TYM06,ABK$^+$06]. Below
we briefly describe the most popular and general algorithms only.
Korn and Muthukrishnan [KM00] answer RNN query by pre-calculating a circle of each object
such that the
nearest neighbor of
lies on the perimeter of the circle as shown in Fig.
.
The MBR of all these circles
are indexed by an R-tree called RNN-tree. The problem of RNN query is reduced to a point location query
on RNN-tree that returns all the circle containing
. For examples, circle of
and
contain
so both are the reverse nearest neighbors of
.
Yang et al. [YL01] improved their method by RdNN-tree. Similar to RNN-tree, a leaf node of the RdNN tree contains the circles of data points. The intermediate nodes contain the minimum bounding rectangles
(MBR) of underlying points along with
the maximum distance from every point in the sub-tree to its nearest neighbor.
The problem with above mentioned techniques is that they rely on pre-computation and cannot deal
with efficient updates. In order to alleviate this problem, Lin et. al [LNY03]
introduced a method for bulk insertion in the RdNN-tree.
Stanoi et al. [SAA00] eliminate the need for pre-computing the nearest neighbors
of all data points by utilizing an interesting property of reverse nearest neighbor queries. Their approach
is called SAA hereafter. SAA utilizes
the fact that for any two-dimensional RNN query
, the number of reverse nearest neighbors cannot
exceed six. The algorithm divides the space around
in six regions of equal size
to
as shown in Fig.
. They observe that for a nearest neighbor object
of
in region
; either
is the RNN of
or there is no RNN in
. For example,
is the
nearest neighbor of
in region
but it is not the RNN because
lies closer to it than
. Consequently, there is no RNN of
in
and the algorithm does not need to consider other objects
in this region. Based on this interesting property, SAA answers RNN queries in two steps. In first step,
for each of the six regions a nearest neighbor is found. All these nearest neighbors form a candidate
list. In second step, a NN query is issued for each object in the candidate list and the algorithm discards
the objects from candidate list for which
is not the nearest neighbor.
SAA suffers with the curse of dimensionality. The number of regions to be searched for candidate
objects increases exponentially with the dimensionality. Singh et al. [SFT03]
propose a solution that performs better than SAA in high-dimensional space. Given a system parameter
, their proposed approach first computes
nearest neighbors of
by using conventional
R-tree approach. These
objects form the candidate list. In the second step, they eliminate
the objects from candidate list that are closer to some other object in candidate list than
. Finally,
for each object
in candidate list, a boolean range query with range
is issued. A boolean
range query is different from a conventional range query in the sense that it terminates as soon as
the first object is found. Clearly, the objects for which the boolean range query does not return any object
are the RNNs of
. However, depending on the value of
, their approach may miss some reverse nearest
neighbor that was not retrieved by
nearest neighbor search in first step of the algorithm.
Tao et al. [TPL04] utilize the idea of perpendicular bisector to reduce the
search space. Let
be a line segment joining two points
and
, a perpendicular bisector
of the two points
and
is perpendicular to
the line
and passes through the midpoint
of
.
Consider the example of Fig.
where a perpendicular bisector
between two points
and
is shown. This perpendicular bisector
divides the space into two half planes
and
where
contains
and
contains
. It can be noted that there cannot be any RNN in the plane
other than
(
is closer to all points in
than
). Based on this property,
any MBR can be pruned that falls completely in
. The MBR
is pruned in the example
of Fig.
. Even if some MBR does not fall completely in a half plane
of an object
, it can be pruned if it lies entirely in the union of
and
where
is the half-plane of another object
. Fig.
shows the pruning of MBR
because
it falls in the union of half-planes
and
. Their approach can also be extended
to answer R
NN queries that is to find all objects for which
is one of their
nearest neighbors.
To the best of our knowledge, the only algorithm that can answer continuous reverse nearest neighbor queries
is proposed by Xia et al. [XZ06]. They utilize the idea presented in [SAA00].
Consider the example of Fig. where a continuous reverse nearest neighbor (CRNN) query
is shown along with six regions
to
. The monitoring region of
consists of two parts: pie-regions and
circ-regions. A pie-region in a partition
is a pie centered at
having the nearest neighbor
of
in
on the perimeter. Let
be the nearest neighbor of
in a region
, the circ-region
is a circle centered at
having
or an object nearer to
than
on its perimeter.
The shadowed areas are the six pie-regions and the circles are six circ-regions. Intuitively, the pie-regions
monitor the updates that will change the candidate list and may possibly change the results. The updates
in circ-regions can make a previous result a false positive or can make a previous false positive an RNN. In other
words,
any update outside the monitoring region (the union of pie-region and circ-regions) cannot affect the results
of query and can be ignored.
Muhammad Aamir Cheema 2007-10-11