Aggregate Nearest Neighbor Queries

Aggregate nearest neighbor (ANN) queries are also called group nearest neighbor queries (GNN) so we will use these names interchangeably hereafter. An aggregate nearest neighbor query retrieves the data points with the smallest sum of distances to all query points in a query group $ Q$ . The formal definition is given below.

Definition: Given a set of points $ P$ and a group of queries $ Q$ , an aggregate $ k$ nearest neighbor query is to find a set of points $ R$ that contains $ k$ objects such that for any $ p\in (P-R)$ and any $ p'\in R$ , $ dist(p',Q)\leq dist(p,Q)$ where for any object $ o$ , $ dist(o,Q)=\sum_{\forall q_i\in Q}dist(o,q_i)$ .

As an example, consider few friends want to meet at some city so that the overall distance travelled is minimized. In this case the friends form the query group and the cities are data points. An aggregate nearest neighbor query will return the city so that the total distance travelled by all of them is minimum. Aggregate nearest neighbor queries can also be used for clustering and outlier detection. Papadias et al. [PSTM04] give another interesting application of aggregate nearest neighbor queries. The operability and speed of very large circuit depends on the relative distance between various components, the aggregate nearest neighbor queries can be issued to find the abnormalities and guide relocation of components [NO97].

Papadias et al. [PSTM04] present three algorithms to answer aggregate nearest neighbor queries called multiple query method (MQM), single point method (SPM) and minimum bounding method (MBM). All these algorithms traverse R-tree to answer aggregate nearest neighbor (ANN) queries. Since their experiments show that MBM outperforms other two algorithms, below we briefly describe MBM.

To prune the search space, MBM uses a minimum bounding rectangle $ M$ that contains all query points in $ Q$ . The algorithm traverses R-tree and prunes the nodes that cannot contain any candidate point. They prune the nodes based on two observations. Let $ M$ be the MBR of $ Q$ , and $ best\_dist$ be the distance of the best ANN found so far. A node $ N$ can be pruned if $ mindist(N,M)\geq best\_dist/n$ where $ mindist(N,M)$ is the minimum distance between nodes $ N$ and $ M$ and $ n$ is the cardinality of $ Q$ . Consider the example of Fig. [*], where the node $ N_1$ can be pruned because $ mindist(N_1,M)=3>(best\_dist/2=2.5)$ . This pruning can also be applied on data points. Whenever a data point $ p$ is retrieved, first its minimum distance from the MBR $ M$ is calculated and the data point $ p$ can be pruned if $ mindist(p,Q)>best\_dist/n$ .

Figure: Node $ N_1$ can be pruned
\includegraphics[width=7.0in]{background/fig/gnn1.eps}

Second observation provides a tighter pruning bound. Any node $ N$ can be pruned if $ \sum_{\forall q_i\in Q}mindist(N,q_i)\geq best\_dist$ where $ mindist(N,q_i)$ is the minimum distance between the node $ N$ and a query point $ q_i$ . In the example of Fig. [*], the node $ N_2$ is pruned because $ mindist(N_2,q_1)+mindist(N_2,q_2)=6>best\_dist$ , .

Li et al. [LLHH05] propose two ellipse-based pruning methods for the aggregate nearest neighbor queries. Yiu et al. [YMP05] solve the problem of aggregate nearest neighbors on road-network. In all above mentioned queries, the aim is to minimize the total sum of the distances to the query points from a point $ p$ . Mouratidis et al. [MHP05] introduce two other variants of aggregate nearest neighbor queries. In first variant the aim is to find a point $ p\in P$ with the smallest distance from any query point in $ Q$ . We call such queries as minimum aggregate nearest neighbor queries and define them as follows.

Minimum ANN Queries: Given a set of data points $ P$ and a group of query points $ Q$ , a minimum ANN query is to find a point $ p\in P$ such that for any other point $ p'\in P$ , $ mindist(p,Q)\leq mindist(p',Q)$ where for any object $ o$ , $ mindist(o,Q)=min_{\forall q_i\in Q}dist(o,q_i)$ .

Consider, for an example, that few friends need to buy one laptop and there are many shops in the city that sells their required laptop. They decide that the person who has some shop nearest to his home will go to buy the laptop. They issue a minimum ANN query. Their houses can be considered as data points and the shops can be treated as query points.

Another variant of ANN is to find the object $ p$ that has lowest maximum distance from query points in $ Q$ . We name such queries MinMax ANN queries and define as follows.

MinMax ANN Queries: Given a set of data points $ P$ and a group of query points $ Q$ , a MinMax ANN query is to find a point $ p\in P$ such that for any other point $ p'\in P$ , $ maxdist(p,Q)\leq maxdist(p',Q)$ where for any object $ o$ , $ maxdist(o,Q)=max_{\forall q_i\in Q}dist(o,q_i)$ .

Consider as an example that there are many proposed locations to establish a fire brigade unit in a small town. The mayor wants to establish the fire brigade unit at a place so that if any of the houses in town catches fire, the distance to that house from fire brigade unit is minimum compared to the other proposed location. Clearly, the target is to find a location so that the distance of farthest house from the fire brigade unit is minimized. A MinMax ANN query will serve the purpose. The houses in this case are query points and proposed locations are data points.

Mouratidis et al. also propose the solution of these variants of aggregate nearest neighbor queries that is very similar to CPM algorithm discussed in Section [*].

Muhammad Aamir Cheema 2007-10-11