Comparison of density estimation methods for astronomical datasets
Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, PO Box 407, 9700 AK
e-mail: firstname.lastname@example.org; email@example.com; firstname.lastname@example.org
2 Kapteyn Astronomical Institute, University of Groningen, PO Box 800, 9700 AV Groningen, The Netherlands
e-mail: email@example.com; firstname.lastname@example.org
Accepted: 2 May 2011
Context. Galaxies are strongly influenced by their environment. Quantifying the galaxy density is a difficult but critical step in studying the properties of galaxies.
Aims. We aim to determine the differences between density estimation methods and their applicability to astronomical problems. We study the performance of four density estimation techniques: k-nearest neighbors (kNN), adaptive Gaussian kernel density estimation (DEDICA), a special case of adaptive Epanechnikov kernel density estimation (MBE), and the Delaunay tessellation field estimator (DTFE).
Methods. The density estimators are applied to six artificial datasets and to three astronomical datasets: the Millennium Simulation and two samples from the Sloan Digital Sky Survey (SDSS). We compare the performance of the methods in two ways: first, by measuring the integrated squared error and Kullback-Leibler divergence of each method with respect to the parametric densities of the datasets (in the case of the artificial datasets); second, by examining the applicability of the densities to studying the properties of galaxies in relation to their environment (for the SDSS datasets).
Results. The adaptive-kernel-based methods, especially MBE, estimate the density more accurately than the other methods and have stronger predictive power in astronomical use cases.
Conclusions. We recommend the modified Breiman estimator (MBE) as a fast and reliable method to quantify the environment of galaxies.
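As an illustration of the two error measures named in the Methods paragraph, the following is a minimal sketch rather than the authors' implementation. It assumes a one-dimensional standard normal as the known parametric density, a simple kNN estimator of the form k / (N V), a uniform evaluation grid, and the direction D(p_true || p_est) for the Kullback-Leibler divergence. All function names, the sample size, the value of k, and the grid are hypothetical choices made for this example only.

```python
import numpy as np


def knn_density_1d(samples, grid, k):
    """1D kNN estimate p(x) = k / (N * 2 * r_k(x)), where r_k(x) is the
    distance from x to its k-th nearest sample (one common form; assumed here)."""
    dists = np.abs(grid[:, None] - samples[None, :])  # (n_grid, N) distances
    r_k = np.sort(dists, axis=1)[:, k - 1]            # k-th smallest per grid point
    return k / (len(samples) * 2.0 * r_k)


def integrated_squared_error(p_est, p_true, dx):
    """Riemann-sum approximation of the integral of (p_est - p_true)^2."""
    return float(np.sum((p_est - p_true) ** 2) * dx)


def kl_divergence(p_true, p_est, dx):
    """Riemann-sum approximation of the integral of p_true * log(p_true / p_est).
    The direction of the divergence is an assumption of this sketch."""
    mask = p_true > 0
    return float(np.sum(p_true[mask] * np.log(p_true[mask] / p_est[mask])) * dx)


def gaussian_pdf(x):
    """Standard normal density, standing in for a known parametric density."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


# Artificial dataset: samples drawn from the known density (seeded for repeatability).
rng = np.random.default_rng(0)
samples = rng.standard_normal(2000)

# Uniform grid on which both densities are compared.
grid = np.linspace(-3.0, 3.0, 601)
dx = grid[1] - grid[0]

p_true = gaussian_pdf(grid)
p_knn = knn_density_1d(samples, grid, k=50)

ise = integrated_squared_error(p_knn, p_true, dx)
kl = kl_divergence(p_true, p_knn, dx)
print(f"ISE = {ise:.4g}, KL = {kl:.4g}")
```

Lower values of both measures indicate an estimate closer to the true density. Because the kNN estimate is not normalized, the grid-based Kullback-Leibler value computed this way is not guaranteed to be non-negative; comparing several estimators on the same grid and dataset is the intended use.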
Key words: methods: data analysis / methods: statistical / methods: miscellaneous
© ESO, 2011