How to replicate the experiments described in the FSE'15 paper?
- Please download the pre-configured VM here (sha1). We recommend the latest version of VirtualBox to run the VM. The user/password is "qcoral". You will need a CPU with support for hardware virtualization (VT-x/AMD-V) to run the virtual machine; most modern Intel and AMD CPUs support it.
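Before importing, you may want to check the download against the published sha1 checksum. A minimal sketch, assuming the appliance was saved as qcoral-vm.ova (a placeholder; use your actual filename):

```shell
# Compute the sha1 of the downloaded appliance and compare it by hand
# against the checksum published next to the download link.
# NOTE: "qcoral-vm.ova" is a placeholder filename.
sha1sum qcoral-vm.ova
```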
- Follow the instructions in the "REPLICATION" section below.
How to run the VM with VirtualBox?
- Download and install the latest version of VirtualBox.
- Open VirtualBox, click "File" -> "Import Appliance".
- Follow the instructions on the dialogs.
- Select the entry for "qCORAL VM" on the left side and click on "Start".
- After boot, log in with user/password "qcoral" (the username and password are the same).
- Ignore any messages about unsupported software or updates that may appear :)
- Navigate to the qcoral folder.
REPLICATION
This artifact is organized as follows:
- src/ - Source code for qCORAL
- docs/ - Additional documentation for qCORAL (like the grammar for constraints)
- libs/ - Libraries used by qCORAL
- inputs/ - Benchmarks and other input files used in our experiments. The benchmarks used in our FSE'15 paper are located in the inputs/dwqcoral folder. See the README there for more info about the organization of the subjects.
- scripts/ - Scripts to run our experiments.
- output/ - The output of the scripts mentioned above is written here.
- raw-results/ - The raw results from our experiments. See the README there for more information about the files. The results for the experiments with qCORAL (not Mathematica) were redone on a different machine (an EC2 c4.4xlarge instance from Amazon AWS), so there may be small differences between these results and the ones described in our paper. Our observations and conclusions still hold, though.
BASELINE RESULTS
How to reproduce the discretization results
- From the scripts folder:
- Run ./run_experiments_discretization_qcoral.sh. The raw data will be written to ../output/data-discretization.csv.
Each entry in the output file contains the name of the subject, statistics (estimate, standard deviation of the variance estimate, time, ...), and the "mode": XYZ_discrete-3 if the discretization was done with three regions, and XYZ_discrete-6 if it was done with six regions.
- Run ./process_discretization_results.r ../output/data-discretization.csv. A csv file with statistics computed from the raw data (avg. estimate and avg. time, grouped by the input file and the "mode") will be printed to the terminal.
This script should take one to two hours to run.
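The "mode" field also makes it easy to slice the raw csv before processing. A hedged sketch with grep (the exact mode strings depend on the subject names in your output file):

```shell
# Keep only the entries from runs that used three discretization regions.
grep 'discrete-3' ../output/data-discretization.csv
```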
Reproducing from our raw data
- Run ./process_discretization_results.r ../raw-results/data-discretization.csv.
Mathematica results
We translated the benchmarks to Mathematica scripts. You can find them in the folder inputs/dwqcoral/mathematica. Check the README file and runMath.sh inside the folder for instructions on how to run the experiments.
Those scripts need a lot of memory to run and take a long time to finish (we manually aborted at least one after 40+ hours of execution). Also, the VM does not have Mathematica installed - you will need to obtain a license and install the tool yourself to run the scripts.
If you want to translate qCORAL input files to the Mathematica format, check the classes coral.util.visitors.MathematicaVisitor and coral.util.callers.MathematicaCaller.
Reproducing from our raw data
- From the raw-results/mathematica folder:
- Run ./sumResults.py exponential/* normal/* twonormal/*. A csv file will be printed to the terminal. Due to a mistake on our part, some results in the 'exponential' folder do not have the execution time available. Please see the README file for more details about this issue and other relevant information.
QCORAL RESULTS
Distribution-Aware results
- From the scripts folder:
- Run ./run_qcoral_dw_experiments.sh. The raw results will be sent to the file ../output/distaware/qcoral-dw-nonincremental.csv.
- Run ./process_dw_results.r ../output/distaware/qcoral-dw-nonincremental.csv. A csv file with statistics computed from the raw data (avg. estimate, avg. standard deviation and avg. time, grouped by the input file) will be printed to the terminal.
This script should take one to two hours to execute.
Reproducing from our raw data
- Run ./process_dw_results.r ../raw-results/distaware/qcoral-dw-nonincremental.csv.
Iterative sampling results
- From the scripts folder:
- Run ./run_qcoral_iterative_experiments.sh. Multiple files will be written to the ../output/incremental folder.
- (Only if you aren't using the provided VM and you have a multicore CPU) Before running the script, change the -j1 argument to parallel on line 30 to the number of parallel processes you want to run (if you want to run 4 processes in parallel, use -j4). This will speed up the experiment, but make sure there is enough memory to run all the JVMs (log files appearing in your home folder are a sign of trouble).
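If you prefer not to open an editor, a one-liner like the following can bump the job count. This is a sketch only: it assumes the flag appears literally as -j1 in the script, and only where intended; inspect the script before running it.

```shell
# Replace GNU parallel's "-j1" (one job) with "-j4" (four jobs) in place.
# Assumes "-j1" occurs only in the intended spot; check the script first.
sed -i 's/-j1/-j4/' run_qcoral_iterative_experiments.sh
```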
Each csv file in the output folder contains measurements from one distinct execution. The lines contain information about the estimation process after each iterative step of the algorithm. The last line of each file will probably be incomplete, since we kill the process after 31 minutes.
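If you inspect one of these csv files by hand, you may want to drop the possibly truncated final line first. One way to do so, assuming GNU head (which accepts negative line counts):

```shell
# Print every line except the last (possibly incomplete) one.
# "some-run.csv" is a placeholder for one of the per-execution files.
head -n -1 some-run.csv
```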
Due to the large size of the raw data (~11 GB), we will need to sample it:
- Run ./sample_iterative_results.sh ../output/incremental ../output/sampled-incremental. This will take every 30th line from the input files and write them to files with the same name in the folder ../output/sampled-incremental.
- Run ./process_iterative_results.sh ../output/sampled-incremental ../output/processed-incremental. This script will write multiple files to the ../output/processed-incremental folder, parameterized by the subject name and the step size of the incremental algorithm:
- incremental_stdev_vs_time_$SUBJECT_$STEP.csv: Contains the average standard deviation and the average time grouped by the number of samples and the heuristic used (baseline, local...).
- incremental_stdev_vs_time_$SUBJECT_$STEP.pdf: Plots of the csv file with the same name (as in Figure 5). The VM does not have a window system installed, but you can connect via ssh and copy the pdf files for later viewing.
- incremental_time_to_precision_$SUBJECT_$STEP.csv: Contains the average time needed by each heuristic to reach a determined precision.
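The "every 30th line" sampling performed by sample_iterative_results.sh can be illustrated with awk (a sketch only; the actual script may treat headers or offsets differently):

```shell
# Keep every 30th line of a run file; "run.csv" is a placeholder name.
awk 'NR % 30 == 0' run.csv > sampled.csv
```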
This experiment will take a few days, depending on the number of processors available on your machine (if you aren't running it from the provided VM; see above). To reduce the execution time, you can remove some of the PRNG seeds from the inputs/seeds-5 file. Each seed corresponds to ~24h of computation time.
Reproducing from our raw data
- Run ./process_iterative_results.sh ../raw-results/incremental/ ../output/processed-incremental