Reproducibility for SLICE

This page contains the source code, data sets, and scripts for our paper "SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries", published in ICDE 2014. We also published a paper in PVLDB 2015, "Reverse k Nearest Neighbors Query Processing: Experiments and Analysis", which focuses on a comprehensive experimental study of RkNN query processing. The source code and data sets for the PVLDB 2015 paper can be accessed by clicking here.

Copyright and Disclaimer

Copyright (c) 2013, Shiyu Yang, Muhammad Aamir Cheema
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* The names of the authors may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Download

Click here to download the package. Note that this package contains 32-bit binaries. To use the package on a 64-bit system, download the 64-bit binaries and use them to replace the files in the ./bin folder.

This package contains the following:

* Source code of the SLICE, InfZone, and six-regions approaches (see the files in the ./code folder).
* A program (see ./bin/datagen_multid) to generate synthetic data sets following Uniform and Normal distributions.
* A program (see ./bin/rtree_gen_2d) to generate R-trees from the raw data files stored in ./Raw_files.
* A real data set of points of interest (POIs) in North America (see ./realdata/NA.cnode). A program (see ./bin/norm) is also provided that normalizes the data set to a unit space and randomly divides it into two disjoint sets of almost equal sizes.
* Scripts to generate the synthetic and real data sets (./createRaw.sh and ./createReal.sh) and the R-trees (./createRTree.sh) used in the paper, a script to run the experiments shown in the paper (./runExperiments.sh), and a script to draw the figures shown in the paper (./drawFig.sh).
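
To make the normalization and split step more concrete, here is a minimal Python sketch of the idea. It is not the code of ./bin/norm: the function names, the per-axis min-max scaling, and the seeded shuffle are assumptions made for illustration, and the actual program may normalize or split differently.

```python
import random

def normalize_to_unit_space(points):
    """Rescale 2D points into [0, 1] x [0, 1].

    Per-axis min-max scaling is an assumption; the real tool may differ.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    # Guard against a zero range when all points share a coordinate.
    span_x = (max_x - min_x) or 1.0
    span_y = (max_y - min_y) or 1.0
    return [((x - min_x) / span_x, (y - min_y) / span_y) for x, y in points]

def split_in_two(points, seed=0):
    """Randomly partition points into two disjoint lists whose sizes differ by at most one."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```

Calling split_in_two(normalize_to_unit_space(points)) on a list of (x, y) tuples returns two disjoint lists that together contain every normalized point exactly once.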

Instructions

To reproduce the experimental results in the paper, follow the steps below.

* Run createRaw.sh. It will create the synthetic data sets using the parameters specified in the paper. The raw data files will be stored in ./Raw_files.
* Run createReal.sh. It will randomly split the real data set into two disjoint data sets of almost equal sizes. The script also normalizes the data to a unit space. The raw files are stored in ./Raw_files.
* Run createRTree.sh. It will create R-trees for all the data sets generated in the previous steps. The R-trees will be stored in ./RTrees.
* Run runExperiments.sh. It will run the three algorithms for the data sets and settings shown in the paper. The results (statistics) of each algorithm for each experiment are stored in ./results.
* Run drawFig.sh. It will draw the figures shown in the paper using the result files in the ./results folder. Because CPU times differ across systems, the figures may not look optimal. This can be fixed by modifying the scripts appropriately.
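
The steps above can also be chained from a small driver. The Python sketch below is a hypothetical convenience wrapper, not part of the package: the script names come from the instructions above, while invoking them through bash from the package root and the run_pipeline helper itself are assumptions.

```python
import subprocess

# Script names are taken from the instructions above. Running them with
# "bash" from the package root is an assumption.
STEPS = [
    "createRaw.sh",
    "createReal.sh",
    "createRTree.sh",
    "runExperiments.sh",
    "drawFig.sh",
]

def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each script in order and stop at the first failure."""
    completed = []
    for script in steps:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later steps never run on incomplete inputs.
        runner(["bash", "./" + script], check=True)
        completed.append(script)
    return completed
```

Calling run_pipeline() from the package root would execute the five scripts in sequence. The runner parameter exists only so the ordering can be checked without actually launching the scripts.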

FAQs

Q. Why is the number of I/Os for the six-regions approach different each time I run an experiment?
A. As mentioned in the paper, a buffer of 10 pages is used for the six-regions approach. Since the buffer uses a random eviction strategy, the number of I/Os differs each time an experiment is run (although the difference is not large). Since SLICE and InfZone do not use the buffer, the number of I/Os for these algorithms does not change.
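
The effect of random eviction can be illustrated with a toy simulation. The Python sketch below is not the buffer implementation from the package: the count_ios function and the cyclic access pattern are hypothetical, and only the 10-page buffer size and the random eviction policy come from the answer above.

```python
import random

def count_ios(page_accesses, buffer_size, rng):
    """Count page reads for an access sequence using a buffer with random eviction."""
    buffer = []  # pages currently held in the buffer
    ios = 0
    for page in page_accesses:
        if page in buffer:
            continue  # buffer hit: no I/O
        ios += 1  # buffer miss: the page is read from disk
        if buffer_size == 0:
            continue  # no buffer at all: every access costs one I/O
        if len(buffer) >= buffer_size:
            # Evict a uniformly random page to make room.
            buffer.pop(rng.randrange(len(buffer)))
        buffer.append(page)
    return ios

# Hypothetical access pattern that cycles over 15 distinct pages.
accesses = [i % 15 for i in range(200)]
for seed in range(3):
    # Different random streams may evict different pages, so counts can vary.
    print(count_ios(accesses, 10, random.Random(seed)))
# Without a buffer the count is always len(accesses).
print(count_ios(accesses, 0, random.Random(0)))
```

With a buffer, the count depends on which pages happen to be evicted, so it can change from run to run unless the random seed is fixed. Without a buffer, the count depends only on the access sequence.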

This list of FAQs will be maintained (until mentioned otherwise). If you have any questions, please contact us.

Reporting Bugs

We have extensively tested the source code by comparing the results of the three algorithms with a brute-force algorithm for more than a million queries on various data sets using different parameters. However, we cannot guarantee that the source code is bug-free. If you spot a bug, please contact me (email given on the home page) or Shiyu Yang (yangs AT CSE dot UNSW dot EDU dot AU).
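
For readers who want to run a similar sanity check, a brute-force baseline can be sketched in a few lines of Python. This is not the brute-force program used in our tests: the function name, the bichromatic setting (separate facility and user sets), and the strict tie handling are assumptions made for illustration.

```python
import math

def brute_force_rknn(query, facilities, users, k):
    """Return the users for which `query` is among their k nearest facilities.

    `facilities` holds the competing facilities (excluding the query).
    Ties are resolved in favor of the query, which is an assumption.
    """
    result = []
    for user in users:
        dist_to_query = math.dist(user, query)
        # Count facilities strictly closer to this user than the query is.
        closer = sum(1 for f in facilities if math.dist(user, f) < dist_to_query)
        if closer < k:
            result.append(user)
    return result
```

The output of a faster algorithm can then be compared against brute_force_rknn on the same inputs, for example by comparing the two result lists after sorting them.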