Distributed Processing of $ k$ NN Queries

In previous problem settings, a central server is solely responsible for answering spatial queries assuming that the clients have no or very few computation and storage capability. Clients send queries to server, the server computes the results and sends answer back to the clients. However, database community has also investigated the option of utilizing the computation and storage capabilities of clients. One of the main design principles is to develop efficient mechanisms that utilize the computational power at mobile objects, leading to significant savings in terms of server load and/or messaging cost when compared to solutions relying on central processing of location information at the server

Prabhakar et al. [PXK$^+$02] proposed a method to continuously monitor range queries, called Q-index. They monitor the static range queries over moving data objects. The main idea is to send a safe region to each moving object with the guarantee that the result of any query will not change unless the object leaves its safe region. More specifically, each object $ p$ is assigned a circular or rectangular region such that $ p$ needs to issue a location update only if it leaves its region otherwise it does not affect the results of any query. They build R-tree index on queries instead of moving objects due to intensive updates of moving objects. Each object probes the index to find the query it influences. Cai et al. [CHC04] propose MQM, another range monitoring method. The workspace is divided into rectangular sub-domains and each object is aware only of the range queries intersecting its resident region which may consist of many sub-domains. Each object reports location update only when it crosses the boundary of any of these queries. When an object leaves its resident region, the server computes a new resident region and sends it to the object. The server computes the resident region by using a binary partitioning tree which maintains for each sub-division of the workspace the queries that intersect it. The number of sub-domains that form an object's resident region depends on how many queries it can store and process.

In both of the above approaches, the messaging cost of mobile objects is reduced because mobile objects report location updates only when the result of some query is expected to be changed. Gediket al. [GL04] propose an approach called MobiEyes that significantly reduces the server load and the messaging cost. The idea of MobiEyes is similar to that proposed in [CHC04] but MobiEyes can answer the moving range queries which latter fails to answer. More specifically, the work space is partitioned using a grid and for each query a monitoring region is maintained which is the union of grid cells the query can potentially intersect. Any object that falls in monitoring region of a query receives the information about query position and velocity. The objects report to the server when they enter or leave the predicted query region. Note that this way the objects monitor their spatial relationships with queries locally. Whenever a query moves out of its current cell or changes its velocity, it notifies the server of this change and server relays such position change information to the appropriate subset of objects through broadcasts.

Hu et al. [HXL05] presented first distributed approach for monitoring continuous nearest neighbor queries. They also employ the idea of safe regions which is a rectangular region that guarantees that the results of any query will not change as long as the object is inside its safe region. Fig. [*] shows two queries where $ Q_1$ is a nearest neighbor query and $ Q_2$ is a range query. The safe regions of two objects $ a$ and $ b$ are $ S_a$ and $ S_b$ , respectively, shown shaded in the figure. The current results of both queries will not change if both objects remain in their safe regions.

Figure: Example of Safe Regions
\includegraphics[width=5.0in]{background/fig/generic.eps}

Initial results of query are calculated using best-first search (BFS) on R-tree. The moving objects issue location updates to the server only when they leave their safe regions. Server receives the location updates and incrementally finds the new results of affected queries and computes the new safe regions for the appropriate objects. Consider the example of Fig. [*], where $ a$ moves out of $ S_a$ to a new location $ a'$ , the result of $ Q_1$ becomes undecided as either of the two objects could be the nearest neighbor. To resolve the ambiguity, the server needs to know the exact location of $ b$ so server asks object $ b$ to send its current location and computes the new result of $ Q_1$ . Safe regions of both the objects $ a$ and $ b$ are also needed to be updated.

Mouratidis et al. [MPBT05] propose a threshold-based algorithm that aims to minimize the communication cost between server and the data objects. Their proposed method can be used for multiple queries with unknown motion pattern and for any distance definition. The basic idea is, for each $ k$ NN query $ q$ the objects are categorized in two sets. The objects that belong to the result are called inner objects and the remaining objects are called outer objects. For each object $ p$ , a threshold is calculated so that as long as the object is within the range defined by the threshold it does not change the result of the query. For clear illustration, consider the example of Fig. [*] where the three nearest neighbors $ p_1$ , $ p_2$ and $ p_3$ are inner objects and $ p_4$ , $ p_5$ and $ p_6$ are outer objects. The thresholds values are $ t_1$ , $ t_2$ and $ t_3$ which define the range for each object such that if its distance from $ q$ lies within the range the result is guaranteed to be unchanged. Each threshold is set to the middle of distances of two consecutive neighbors of the query. More specifically, the distance range for $ p_1$ is $ [0,t_1)$ , for $ p_2$ is $ [t_1,t_2)$ , for $ p_3$ is $ [t_2,t_3)$ and for remaining objects $ p_4$ ,$ p_5$ and $ p_6$ is $ [t_3,\infty)$ .

Figure: Threshold-Based Algorithm for Monitoring a 3-NN Query
\includegraphics[width=5.0in]{background/fig/threshold.eps}

When an object moves out of the range, it reports to the server and the server updates the results and may assign new threshold. Consider the example when $ p_1$ moves to $ p'_1$ , the order of the two NNs may be changed. Server asks for the location of $ p_2$ and if it is closer to $ q$ than $ p_1$ , the results are changed accordingly. Otherwise, there is no result change and only the new threshold $ t_1$ is computed and objects $ p_1$ and $ p_2$ are informed of this new threshold. Clearly, all other objects are not required to be involved during this update.

Now consider the case where $ p_3$ moves to $ p'_3$ and issues a location update. Any of the outer objects may become the new $ 3^{rd}$ nearest neighbor depending on their current locations. Server broadcasts a request to all the outer objects to send their latest locations and updates $ p_6$ as the new $ 3^{rd}$ -NN. A new threshold $ t_3$ is computed based on the new positions of $ 3^{rd}$ -NN ($ p_6$ ) and $ 4^{th}$ -NN ($ p_3$ ) and server broadcasts this threshold value to all outer objects and $ p_6$ .

Muhammad Aamir Cheema 2007-10-11