Handling Query Updates

To handle update $ \langle q.id, q_{pre}, q_{cur} \rangle$ of query $ q$ , the straight forward way is to delete its results and all of its related information (e.g., remove $ q$ from the influence lists of related cells), and then issue a new query on location $ q_{cur}$ and compute it from scratch. Clearly, this update computation does not utilize the previous computed information. Let $ q_{pre}.dist_k$ ( $ q_{cur}.dist_k$ ) be the distance of $ k^{th}$ NN from $ q$ when $ q$ at $ q_{pre}$ ($ q_{cur}$ ). The $ k$ NN results of $ q$ at location $ q_{pre}$ and $ q_{cur}$ share nothing if and only if $ dist(q_{pre},
q_{cur}) > q_{pre}.dist_k + q_{cur}.dist_k$ . Without loss of generality, we assume dataset follows uniform distribution. Consequently, $ q_{pre}.dist_k = q_{cur}.dist_k$ . Note that for other distributions, the similar formula can be derived by density functions. Therefore, in order to maximize sharing previous computed results, we delete the previous query and issue a new query only when query point moves farther than $ 2 q_{pre}.dist_k$ .

Handling single query update is described as follows.

$ \bullet$ Case 1: $ dist(q_{pre}, q_{cur}) \geq 2q_{pre}.dist_k$ . Query $ q$ and its $ k$ NN result are discarded first. Then, we remove $ q$ from the influence lists of cells which lie in the rectangle containing the circle with center at $ q_{pre}$ and radius as $ q_{pre}.dist_k$ . After that, a new continuous $ k$ NN query is issued on $ q_{cur}$ .

$ \bullet$ Case 2: $ dist(q_{pre}, q_{cur}) < 2q_{pre}.dist_k$ . We first calculate $ dist(p, q_{cur})$ for every $ k$ NN $ p$ of $ q_{pre}$ and update $ q_{cur}.dist_k$ as the maximum of them. i.e., $ q_{cur}.dist_k=max_{\forall p\in q.kNN}$ $ dist(p, q_{cur})$ . Then, recompute $ k$ NN for $ q_{cur}$ in the similar way to the initial $ k$ NN computation. The first circle's radius $ r_0$ is $ max\{max(c^q,q),q_{pre}.dist_k - dist(q_{pre}, q_{cur})\}$ . During computation, the data points in $ q_{pre}.kNN$ are skipped as they have been computed already. Clearly, the cells with $ maxdist(c, q_{pre})
\leq q_{pre}.dist_k$ are also skipped as all data points in these cells belong to $ q_{pre}.kNN$ .

Figure: Handling Query Updates

[$ p_1$ is found in first round]\includegraphics[width=3.0in]{kNN/fig/queryUpdates-1.eps} [$ p_1$ is confirmed as NN]\includegraphics[width=3.0in]{kNN/fig/queryUpdates-2.eps}

Muhammad Aamir Cheema 2007-10-11