Reproducibility for RkNN Experiments



This page contains source code, data sets, pseudocodes, and scripts for our paper "Reverse k Nearest Neighbors Query Processing: Experiments and Analysis" published in PVLDB 2015.

Copyright and Disclaimer

Copyright (c) 2014, Shiyu Yang, Muhammad Aamir Cheema
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* The names of the authors may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Download

Click here to download the package. Note that this package contains 32-bit binaries. If you compile the code to 64-bit binaries, you will need to use 64-bit R-tree generator. In that case, download 64-bit data (and R-tree) generator and replace these files with the files in ./bin folder in the original package.

This package contains the following: Click here and download the real data sets. You will need to move these files to ./Raw_files folder found in the extracted package.

Instructions

To reproduce the experimental results in the paper, run runAll.sh or follow the steps below.

Pseudocode

As mentioned in the paper, we implemented some simple yet effective optimizations in each algorithm. Click here to download the pseudocodes that we followed in our implementation. We have tried our best to implement each algorithm in an efficient and fair manner. If you have suggestions to improve certain algorithm, please contact us.

FAQs
This list of FAQs will be maintained (until mentioned otherwise). If you have any question, please contact us.

  1. Question. We downloaded the code and tried to run the executables but got the following error messgae:
    /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory
    Answer. It appears that you are trying to run 32-bit binaries on a 64 bit system that is missing some of the 32 bits libraries. Please either use 64-bit binaries (link provided in Download section above) or install the 32 bit libraries. For more details, click here

  2. Question. Why the number of I/Os for six-regions and FINCH are different each time I run the same experiment.
    Answer. Since the buffer uses random eviction strategy, the number of I/Os may be different each time an experiment is run especially when the number of buffers used is small (although the difference is not huge). Since TPL, TPL++, InfZone and SLICE do not use the buffer, the number of I/Os for these algorithms do not change.

  3. Question. I modified the script ./createRaw.sh to generate two small synthetic data sets containing 1000 and 1001 points, respectively (see below).
    for size in 1000 1001
    do
          ./datagen_multid $size unif 2
    done

    I noticed that the first 1000 points of the two synthetic data sets are always the same. Is there something wrong with the data generatore (./bin/datagen_multid)?
    Answer. The data generator works fine on a few machines we tested, therefore, we are not able to reproduce the error for extensive testing. However, another user had reported the same problem. Note that the psuedo-random generator takes time(NULL) as a seed, i.e. srand( time (NULL) ). The apparent reason of the reported problem is that the same seed is being passed for each call to ./datagen_multid (may be the data sets being generated are too small???). This problem can be fixed by adding sleep 1 in ./createRaw.sh before every call to ./datagen_multid. This ensures that it gets a different seed every time it is called. If you are curious how the data generator works, you can download the source code of the data generator.
Reporting Bugs

We have extensively tested the source code by comparing the results of all algorithms with a Brute Force algorithm for more than a Million queries on various data setts using different parameters. However, we cannot guarantee that the source code is bug-free. If you spot a bug, please contact me (email given at the home page) or Shiyu Yang (yangs AT CSE dot UNSW dot EDU dot AU).