Performance Analysis

First we present the proof that our algorithm answers the $ k$ NN queries by accessing minimum number of cells and then we compare it with CPM.

Proof of optimality and correctness: Given a query $ q$ , in our continuous $ k$ NN algorithm, the minimal set of cells are all accessed and only these cells are accessed.4.

Proof

Let $ q_{old}.dist_k$ be the distance of $ k^{th}$ nearest neighbor from $ q$ before the update and $ q_{new}.dist_k$ be the distance of $ k^{th}$ nearest neighbor after the result has been updated. For the case, when $ q_{new}.dist_k<q_{old}.dist_k$ , our algorithm updates the results without accessing any cell (i.e., in case when number of incoming updates is greater than outgoing updates). So we only consider the case when $ q_{new}.dist_k\geq q_{old}.dist_k$ .

First we identify the minimal set of cells $ C$ that has to be accessed in order to guarantee the correctness then we will show that our algorithm does not access any cell $ c' \notin C$ .

Corollary 1: Only the cells $ c$ that has $ maxdist(c,q)>q_{old}.dist_k$ can contain the new nearest neighbor.

Corollary 2: Only the cells $ c$ that has $ mindist(c,q)\leq q_{new}.dist_k$ can contain the new nearest neighbor.

From corollaries 1 and 2, it can be inferred that the minimal set of cells is $ C$ , such that for all $ c\in C$ , $ maxdist(c,q)>q_{old}.dist_k$ and $ mindist(c,q)\leq q_{new}.dist_k$ .

First we show that our algorithm does not access any unnecessary cell. To update the results, our algorithm starts by calling CircularTrip with radius $ q_{old}.dist_k$ . As the radius of subsequent calls to CircularTrip is always greater than $ q_{old}.dist_k$ , it can be immediately verified that no cell $ c$ can be returned such that $ maxdist(c,q)<q_{old}.dist_k$ (satisfies corollary [*]). Moreover, our algorithm accesses cells in strictly ascending order and stops as soon the next cell has $ mindist
(c,q)\geq q_{new}.dist_k$ (satisfies corollary [*]).

Now as a proof of correctness we show that our algorithm accesses all the cells in minimal set of cells $ C$ . The algorithm starts by calling CircularTrip with radius $ q_{old}.dist_k$ and the cells $ c$ that have $ maxdist(c,q)\geq q_{old}.dist_k>mindist(c,q)$ are retrieved. Algorithm iteratively calls CircularTrip with radius increased by $ \delta$ unless $ k$ NNs are found. According to lemma [*], this increment of $ \delta$ guarantees that no cell is missed.

Note that initial computation can be considered a special case of update handling where $ q_{old}.dist_k$ is zero.

As compared with CPM, our CircularTrip-based algorithm has following advantages.

To illustrate that CPM accesses unnecessary cells during update handling, we present a concrete example in Fig. [*], where update handling of a $ k (=4)$ NN query $ q$ is shown. Fig. [*] shows the initial computation of $ q$ where the result is $ q.kNN=\{p_1,p_2,p_3,p_4\}$ . The light-shaded cells are accessed during this computation and form the visit list. The dark-shaded cells and four rectangles ( $ L_5, U_6, R_6$ and $ D_6$ ) are the entries in heap $ H$ after the initial results have been computed by CPM..

Figure: An Example of Updates Handling by CPM

[A $ k (=4)$ -NN query evaluated by CPM]\includegraphics[width=3.0in]{kNN/fig/performance-cpm-1.eps} [$ p_4$ is deleted, $ \{p_1,p_2,p_3,p_5\}$ is updated result ]\includegraphics[width=3.0in]{kNN/fig/performance-cpm-2.eps}

Consider an update deletes the object $ p_4$ , CPM cannot update the result by starting from the heap $ H$ because the new result $ p_5$ is in visit list (a cell that was deheaped from $ H$ ). In order to update the results, CPM first needs to access all the cells in visit list and then it continues with heap $ H$ . It stops when the next entry (cell or rectangle) in heap has $ mindist\geq dist(p_5,q)$ . Fig. [*] shows the update of the results. During this update, CPM accesses light-shaded cells of Fig. [*]. The heap $ H$ contains the dark-shaded cells and four rectangles ( $ L_6,U_6,R_7$ and $ D_6$ ) after the result has been updated.

Fig. [*] shows the update handling of the same query by our CircularTrip based algorithm. When $ p_4$ is deleted, our algorithm calls CircularTrip with radius set as $ dist(p_4,q)$ as shown by dotted circle in Fig. [*]. The new result $ p_5$ is found and the algorithm calls CircularTrip with radius $ dist(p_5,q)$ just to confirm that it does not miss any result. The algorithm accesses only the shaded cells of Fig. [*] and it can be seen that number of cells accessed by our algorithm is much smaller than that of CPM (compare the light-shaded cells of Fig. [*] and Fig. [*].

Figure: An Example of Updates Handling by CircularTrip-based Algorithm

[A $ k (=4)$ -NN query]\includegraphics[width=3.0in]{kNN/fig/performance-circ-1.eps} [$ p_4$ is deleted from $ q.kNN$ and $ p_5$ is added]\includegraphics[width=3.0in]{kNN/fig/performance-circ-2.eps}

CPM stores visit list and search heap $ H$ to use them during updates. The size of visit list and heap $ H$ can be approximated to be $ (4+\lceil 2\times q.dist_k/\delta \rceil ^2)$ . (i.e., four rectangles + cells in rectangle of side length $ 2\times q.dist_k$ ).

The size of both the visit list and search heap $ H$ never decreases unless $ q$ reports location update in which case both the visit list and heap $ H$ are deleted. Even when the influence region is shrunk, the size of visit list and heap $ H$ remains unchanged. On the other hand, when the influence region expands the visit list and heap $ H$ also expand.

Once CPM receives the location update from a query $ q$ , $ q$ and all of its related information (i.e., $ q.kNN$ and the book-keeping information) are deleted and then a new continuous query is issued on the new location and computed from the scratch. Clearly, this approach does not utilize the previously computed information. Based on density function, our algorithm makes an optimistic estimate and utilizes the previously computed information if it is expected to be more efficient. Otherwise, the results are updated by deleting and treating $ q$ as a new query. In any case, our algorithm performs better than CPM (our experiments demonstrate that the initial computation module of CircularTrip based algorithm is faster than initial computation module of CPM because we do not need to maintain extra book-keeping information).

Muhammad Aamir Cheema 2007-10-11