Initial $ k$ NN Computation

The basic idea of $ k$ NN computation algorithm is to access the cells around query point $ q$ round by round. A round $ C_i$ contains all the cells that intersect the circle of radius $ r_i = r_0 + i \delta$ centered at $ q$ . To collect a round $ C_i$ , we call our CircularTrip algorithm with radius set as $ r_i$ . Formally, $ C_i = \{\forall c \vert  mindist(c, q) < r_i \leq maxdist(c, q)\}$ . $ r_0$ is the the first circle's radius. Obviously, $ r_0$ is at most $ maxdist(c^q, q)$ ; otherwise cell $ c^q$ will not be accessed by the algorithm. Examples of rounds are shown as the shaded cells in Fig. [*]. In each round, the algorithm accesses the cells in ascending order of their $ mindist(c,q)$ . The algorithm terminates when the next cell to be accessed has $ mindist(c,q) \geq
q.dist_k$ .

Figure: A Nearest Neighbor Query
\includegraphics[width=3.0in]{kNN/fig/initialNN.eps}

The detailed $ k$ NN computation algorithm is shown in Algorithm [*]. In the algorithm, the cells $ c$ in a round are maintained in a heap $ H$ so that they are accessed in ascending order of their $ mindist(c,q)$ . Initially, all cells of round $ C_0$ are inserted into a heap. In each iteration, if the top entry $ e^H$ of the heap with $ mindist(e^H, q)$ not smaller than $ q.dist_k$ ($ = \infty$ , initially), it is immediately known that $ k$ NN have been found and the computation terminates. Otherwise, $ dist(p,q)$ is computed for every data point $ p \in e^H$ and $ q.kNN$ and $ q.dist_k$ are updated if $ dist(p,q)$ is smaller than $ q.dist_k$ . After all the cells in the heap are processed, the current radius $ r$ is checked with $ q.dist_k$ . If $ r$ is smaller than $ q.dist_k$ , all the cells in the next round with radius as $ \min\{r +
\delta, q.dist_k\}$ are collected. Among them, the cells which were not processed before (i.e., $ q$ is not in their influence list) are inserted into the heap and examined in the same way till $ k$ NN are found.

Lemma

In a grid consisting of cells with size $ \delta \times \delta$ , given a cell $ c$ and a query point $ q$ where $ c$ does not contain $ q$ , $ \delta \leq
maxdist(c, q) - mindist(c, q) \leq \sqrt{2}\delta$ .

According to Lemma, a cell is intersected by at most two consecutive circles (e.g., the dark shaded cells in Fig. [*]). Although these cells are encountered twice during $ k$ NN computation (i.e., these cells appear in two rounds), they are accessed once only. This is because for a query $ q$ (1) our $ k$ NN algorithm only accesses the cells where $ q$ is not in their influence lists; and (2) $ q$ will be inserted into its influence list after a cell is accessed. In fact, Theorem [*] proves the upper bound of the total number of times the cells are encountered in our algorithm.

Theorem

In $ k$ NN algorithm, the total number of times the cells are encountered is at most $ 1.27$ times of the number cells in the minimum set of cells.

Proof

Let $ R$ be $ q.dist_k / \delta$ and $ \vert C_i\vert$ be the number of cells in round $ C_i$ . The total number of times the cells are encountered is at most $ 4 R^2$ and the number of cells in the minimum set of cells is at least $ \pi R^2$ . As a result, the ratio of them is at most $ 4 R^2 / \pi R^2 = 1.27$ .


\begin{algorithm}
% latex2html id marker 4442
[htb]
\caption{\bf ComputeNN( $G$...
...t of $c$;
\ENDIF
\ENDWHILE
\RETURN $q.kNN$;
\end{algorithmic}\end{algorithm}

Muhammad Aamir Cheema 2007-10-11