PyCosmic: a robust method to detect cosmics in CALIFA and other fiber-fed integral-field spectroscopy datasets

B. Husemann; S. Kamann; C. Sandin; S. F. Sánchez; R. García-Benito; D. Mast

doi:10.1051/0004-6361/201220102

Home

All issues

Volume 545 (September 2012)

A&A, 545 (2012) A137

Full HTML

Free Access

Issue		A&A Volume 545, September 2012


Article Number		A137
Number of page(s)		10
Section		Numerical methods and codes
DOI		https://doi.org/10.1051/0004-6361/201220102
Published online		20 September 2012

A&A 545, A137 (2012)

PyCosmic: a robust method to detect cosmics in CALIFA and other fiber-fed integral-field spectroscopy datasets^⋆

B. Husemann¹, S. Kamann¹, C. Sandin¹, S. F. Sánchez²^,3, R. García-Benito² and D. Mast²^,3

¹ Leibniz-Institut für Astrophysik Potsdam (AIP), An der Sternwarte 16, 14482 Potsdam, Germany
e-mail: bhusemann@aip.de
² Instituto de Astrofísica de Andalucía (CSIC), Camino Bajo de Huétor s/n Aptdo. 3004, 18080 Granada, Spain
³ Centro Astronómico Hispano Alemán de Calar Alto (CSIC-MPIA), 4004 Almería, Spain

Received: 25 July 2012
Accepted: 8 August 2012

Abstract

Context. Detecting cosmic ray hits (cosmics) in fiber-fed integral-field spectroscopy (IFS) data of single exposures is a challenging task because of the complex signal recorded by IFS instruments. Existing detection algorithms are commonly found to be unreliable in the case of IFS data, and the optimal parameter settings are usually unknown a priori for a given dataset.

Aims. The Calar Alto legacy integral field area (CALIFA) survey generates hundreds of IFS datasets for which a reliable and robust detection algorithm for cosmics is required as an important part of the fully automatic CALIFA data reduction pipeline. Such a new algorithm needs to be tested against the performance of the commonly used algorithms L.A.Cosmic and DCR. General recommendations for the usage and optimal parameter settings of each algorithm have not yet been systematically studied for fiber-fed IFS datasets to guide users in their choice.

Methods. We developed a novel algorithm, PyCosmic, which combines the edge-detection algorithm of L.A.Cosmic with a point-spread function convolution scheme. We generated mock data to compute the efficiency of different algorithms for a wide range of characteristic fiber-fed IFS datasets using the Potsdam Multi-Aperture Spectrophotometer (PMAS) and the VIsible MultiObject Spectrograph (VIMOS) IFS instruments as representative cases.

Results. PyCosmic is the only algorithm that achieves an acceptable detection performance for CALIFA data. We find that PyCosmic is the most robust tool with a detection rate of ≳90% and a false detection rate ≲5% for any of the tested IFS data. It has one less free parameter than the L.A.Cosmic algorithm. Only for strongly undersampled IFS data does L.A.Cosmic exceed the performance of PyCosmic by a few per cent. DCR never reaches the efficiency of the other two algorithms and should only be used if computational speed is a concern. Thus, PyCosmic appears to be the most versatile cosmics detection algorithm for IFS data. It is implemented in the new CALIFA data reduction pipeline as well as in recent versions of the multi-instrument IFS pipeline P3D. Although PyCosmic has been optimized for IFS data, we have also successfully applied it to longslit data and anticipate that good results will be achieved with imaging data.

Key words: techniques: image processing / instrumentation: miscellaneous

^⋆

PyCosmic is freely available as a Python-based stand-alone program at http://pycosmic.sf.net for download.

© ESO, 2012

1. Introduction

The identification and rejection of artefacts on charged-couple device (CCD) detectors caused by cosmic ray hits (hereafter cosmics) is a persisting problem for the reduction and analysis of astronomical data. Combining multiple images of the same object or field is considered the best method to identify cosmics because it is less likely that the same pixel is affected in several images. Sophisticated algorithms that detect and reject outlier pixels during the combination of exposures were developed, for example, for the Hubble Space Telescope (e.g., Fruchter & Hook 1997). However, there are often cases where only a single exposure is available or multiple exposures cannot be combined. This happens frequently with fiber-fed integral field spectroscopic (IFS) data, where the effects of differential atmospheric refraction, instrument flexure, or a variable sky brightness during the sequence of exposures prevent reliable detection of cosmics by image comparison.

Various techniques have been developed to detect and reject cosmics in single CCD exposures. They use different methods like trained neural networks (Salzberg et al. 1995), convolution with a point-spread function (PSF, Rhoads 2000), Laplacian edge detection (van Dokkum 2001, hereafter D01), image statistics (Pych 2004, hereafter P04), or a fuzzy logic approach (Shamir 2005). A detailed performance evaluation of the different algorithms on single astronomical images was presented by Farage & Pimbblet (2005). Their tests revealed that the D01 algorithm, also known as L.A.Cosmic, performed well on imaging data. The algorithm of P04, known as DCR, did not perform as well on images, but was much less computationally expensive and primarily designed for spectroscopic data.

Currently, a thorough evaluation of the performance of cosmics detection algorithms for fiber-fed IFS data is missing. Signals in such data are much more complex because a spectrum from each individual fiber is recorded along a discrete trace on the CCD, with little gaps between spectra. Thus, edge-like structures are introduced, and bright object or night-sky emission lines are more likely to be misclassified as cosmics, which is why automatic data-reduction pipelines generally avoid including this crucial step in the reduction process (e.g., Barnsley et al. 2012). Sophisticated methods to detect cosmics in data from fiber-fed multi-object spectrographs were presented (Zhu et al. 2009; Wang et al. 2009) and show excellent results, but their parameter choices seem arbitrary for the L.A.Cosmic and DCR algorithms as their prime reference. Additionally, there is no public code available to make an independent check of their results and to verify whether the algorithm works also with IFS data.

For the Calar Alto legacy integral field area (CALIFA) survey (Sánchez et al. 2012) and other IFS studies using the same instrument (e.g., Sandin et al. 2008), it was discovered that the available algorithms always selected night-sky or object-emission lines as cosmics for the IFS data. An initial attempt to reduce the high false detection rate for CALIFA data by using a simplified Laplacian edge detection algorithm was implemented into the R3D reduction package (Sánchez 2006) and was only partly successful. Although it reduced the number of false detections, a significant number of cosmics were undetected.

In this paper, we present a novel algorithm called PyCosmic. It combines the iterative Laplacian edge detection scheme with a PSF convolution approach. We evaluate the performance of PyCosmic against the most popular algorithms available, DCR and L.A.Cosmic, on realistic mock data for different IFSs and compare the results with illustrative examples on observed raw data. We then provide general recommendations regarding the use of detection algorithms with fiber-fed IFS data.

The different algorithms used in this study are briefly described in Sect. 2. Results of our detailed performance and parameter study on IFS mock data are then presented in Sect. 3, followed by results obtained for real data in Sect. 4. Finally, we provide general recommendations as a guide for other IFS users in Sect. 5.

2. Outline of different cosmics detection algorithms

In the following, we briefly describe the three algorithms used in our comparative study, including our novel PyCosmic algorithm, to understand their basic differences.

2.1. DCR, count statistics on subframes

A simple and fast algorithm was presented by P04, which uses count statistics to detect cosmics as outliers in the histogram of pixel counts. To do this, the image I is first split into small overlapping subframes I_i that are treated separately. These subframes are intentionally kept small, ≤ 100 pixels, to consider only a local distribution of counts. Thereafter, the mode ⟨ I_i ⟩ and standard deviation σ_{I_i} are calculated for all pixels in a subframe. To remove the influence of high-value pixels, ⟨ I_i ⟩ and σ_i are calculated a second time, this time only including pixel values m that satisfy ⟨ I_i ⟩ − ξσ_i < m < ⟨ I_i ⟩ + ξσ_i, where ξ is an arbitrary threshold value.

The subsequent steps are: (i) construct a histogram h(I_i) using all pixel values m, (ii) search for the first empty histogram bin with a value m₀ that is higher than ⟨ I_i ⟩ , and (iii) find the first gap [m₁,m₂] in the histogram with zero number counts that fulfils (m₂ − m₁) > ξσ_i and m₁ > m₀.

If such a gap exists, then all pixels in I_i with a value higher than m₁ are masked as cosmics. Masked pixels (including neighbor pixels inside a so-called “growing radius” of one to two pixels) are then replaced with the mean value of a set of nearby pixels. In most applications of DCR, the growing radius is set to one pixel to fully cover the boundaries of the cosmics, but we use a zero-growing radius to achieve a fair comparison with the results of other algorithms. Furthermore, to account for multiple pixel cosmics, the algorithm is run iteratively. There are three free parameters that have to be set: the shape of the subframes I_i, the threshold value ξ, and the number of iterations.

Fig. 1

Visual outline of the intermediate steps of our PyCosmic algorithm, which is based on L.A.Cosmic.

2.2. L.A.Cosmic, the Laplacian edge detection approach

D01 was the first to use the Laplacian operator for the detection of cosmics in astronomical images. In its discrete form, the Laplacian operator can be written as $\nabla^{2} f = \begin{matrix} ⎧ \\ ⎪ \\ ⎪ \\ ⎪ \\ ⎨ \\ ⎪ \\ ⎪ \\ ⎪ \\ ⎩ \end{matrix} \begin{matrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{matrix} \begin{matrix} ⎫ \\ ⎪ \\ ⎪ \\ ⎪ \\ ⎬ \\ ⎪ \\ ⎪ \\ ⎪ \\ ⎭ \end{matrix} .$ $\begin{equation} \nabla^2f = \left\{\begin{matrix} 0 & -1 & 0\\ -1 & 4 & -1\\ 0 & -1 & 0\end{matrix}\right\}. \end{equation}$ (1)Convolved images using this operator will highlight sharp edges because it removes a smooth signal and increases the contrast of isolated strong pixels.

The algorithm starts by subsampling the bias-subtracted image I by a factor f_s = 2. This subsampling is required to avoid attenuation of cosmics by negative cross patterns when convolving the image with the discrete Laplacian kernel $ℒ^{(2)} = \nabla^{2} f \otimes I^{(2)},$ $\begin{equation} \mathcal{L}^{(2)}=\nabla^2f\otimes I^{(2)}, \end{equation}$ (2)where I⁽²⁾ is the subsampled image and ⊗ is the convolution operator. All negative values in ℒ⁽²⁾ are set to zero before the image is resampled to its original size. We refer to the resulting image as ℒ⁺.

Cosmics are identified in ℒ⁺ with respect to the expected noise of each pixel. The noise properties of ℒ⁺ and I are nearly equal for higher standard deviations (D01), which is why I can be used to estimate the noise (N), $N = \frac{1}{g} {(g (M_{5} \otimes I^{)} + {σ_{rn}^{2}}^{)}}^{\frac{1}{2}},$ $\begin{equation} N=\frac{1}{g}\left(g\left(M_5\otimes I\right)+\sigma_{\text{rn}}^2\right)^{\frac{1}{2}}\!,\label{eq:noise} \end{equation}$ (3)where g is the gain [e ${}^{-}{ADU}^{-1}$ $\hbox{$^{-}\text{ADU}^{-1}$}$ ], σ_rn the readout noise [e⁻], and M_n an n × n median filter (here n = 5 pixels). Deviations from the expected noise are calculated as $S = ℒ^{+} / (f_{s} N) .$ $\begin{equation} S=\mathcal{L}^{+}/\left(f_{\text{s}}N\right)\!. \end{equation}$ (4)Signals of real objects remain in S because of Poisson noise and the pixel sampling of smooth intensity profiles (cf. Fig. 1). This component, the sampling flux, can be significant if the signal is high or if the PSF is poorly sampled. Extended structures, which are larger than about five pixels, can be removed from S using a 5 × 5 median filter; S′ = S − M₅ ⊗ S. The first criterion for detection of cosmics demands S′ > σ_lim, where a typical limiting value is σ_lim = 5.

In addition, it will be difficult to distinguish cosmics from stars in a critically sampled image, i.e., close to Nyquist sampling, because they are very similar on small scales. Such point sources can, however, be distinguished from cosmics by their symmetry. An image ℱ is calculated that contains only symmetric fine structure on scales of 2–3 pixels $ℱ = M_{3} \otimes I - [M_{7} \otimes (M_{3} \otimes I^{)}] .$ $\begin{equation} \mathcal{F}=M_3\otimes I-\left[M_7\otimes\left(M_3\otimes I\right)\right]. \end{equation}$ (5)The second criterion states that the contrast between ℒ⁺ and ℱ is greater than a limiting value, ℒ⁺/ℱ > f_lim, where typical limiting values for images are f_lim = 2.0–5.0.

Cosmics are finally identified as pixels that satisfy both criteria, although cosmics are mostly larger than a single pixel. While detection probability is higher for pixels on the edge of large multiple-pixel features, it may be negligible for pixels within the feature. Arbitrarily large cosmics can be fully detected by iteratively applying the rejection process as described above. After each iteration, the newly identified cosmics are replaced with the median of nearby unmasked pixels. In total, there are four free parameters to set: the threshold value σ_lim, the limiting value f_lim, the number of iterations, and a special parameter σ_frac. This last undocumented parameter is a factor that is used to reduce the σ_lim threshold of neighbor pixels within a growing radius to detect cosmics in them as well. By default, the growing radius is set to one pixel. By definition, the effective growing radius is set to zero when σ_frac = 1.0.

2.3. PyCosmic, combining edge detection with a PSF convolution approach

Although L.A.Cosmic performs extremely well on imaging data, it is much less effective with fiber-fed IFS data. Spectra of several hundred fibers are dispersed along one axis of the CCD and are closely packed on the other axis with small separation. This introduces a highly asymmetric situation on small pixel scales, which is why neither of the L.A.Cosmic criteria are robust in this case. The longslit spectroscopy version of L.A.Cosmic invokes a model fit to the sky and object spectra. However, this scheme is difficult to apply to fiber-fed IFS data due to the comparatively inhomogeneous distribution of spectra on the CCD. It is practically impossible to fit thousands of profiles without introducing additional edges.

Our novel approach replaces the second criterion of L.A.Cosmic to avoid the simple median smoothing. Instead, we take advantage of the smooth two-dimensional shape of the spectrograph PSF in contrast to the highly asymmetric cosmics. This combines PSF-matched filtering (Rhoads 2000) and Laplacian edge detection of cosmics in a simple and effective way.

To discriminate between real signal and cosmics, we first smooth the bias-subtracted image I by convolving it with a two-dimensional Gaussian kernel G(w) (where w is the full width at half maximum (FWHM) of the Gaussian in pixels) and divide I by the smoothed image, $R = \frac{I}{I \otimes G (w)} \cdot$ $\begin{equation} R = \frac{I}{I \otimes G(w)}\cdot \end{equation}$ (6)The idea is to increase the contrast of higher frequency signal compared to the object signal, so that w should be larger than the width of the cosmics and smaller than FWHM of the spectrograph PSF (θ). We note that a similar approach was independently used by Conselice (2003) to define the clumpiness parameter as a measure for the high-frequency components in the morphology of galaxies. The artificial smoothing of the data is a computationally easy task and the most natural choice to capture the high-frequency components in a signal. Cosmics with R ≫ 1 appear surrounded by a halo with values of R ≪ 1. When R ≈ 1, however, there is a homogeneous structure. We further increase the contrast of cosmics in R by convolving it with the Laplacian kernel after subsampling R by a factor of two, $ℛ^{(2)} = \nabla^{2} f \otimes R^{(2)} .$ $\begin{equation} \mathcal{R}^{(2)}=\nabla^2f\otimes R^{(2)}. \end{equation}$ (7)Again, negative values in ℛ⁽²⁾ are set to 0, and the image is re-sampled to its original size; we refer to this result as ℛ⁺.

We then replace the second criterion of L.A.Cosmic by ℛ⁺ > r_lim, in order to minimize the false detection of real signal in fiber-fed IFS data. Snapshots of intermediate images of the process steps are shown in Fig. 1, which give a visual impression of the algorithm.

Before the Laplacian convolution is applied to the image in each subsequent iteration, it is necessary to replace all the pixels of the detected cosmics with “good” values. Yet, it is very difficult to restore the original information of these pixel in IFS data, particularly near bright emission lines. An artificial edge could be created that can cause the cosmics to expand into the unaffected signal of the line. However, it is straightforward to mask all pixels of already detected cosmics in the convolution operation I ⊗ G(w). In this way, we minimize the artificial extension of cosmics into bright object data. PyCosmic does not employ the σ_frac parameter and thus has one less free parameter than L.A.Cosmic.

Fig. 2

Simulated images to estimate the minimum threshold value for r_lim. Simulated thumbnail images and intermediate images to compute ℛ⁺ are shown for a pure emission line feature considering three different instrumental PSFs with θ = 2.5, 2.0, and 1.5 pixels FWHM. The maximum pixel value is provided for each subimage. The maximum in ℛ⁺ corresponds to the minimum value for r_lim to avoid misclassification as cosmics.

The threshold value r_lim depends on two independent parameters: the FWHM w of the Gaussian kernel G(w) and the FWHM of the instrumental PSF (θ). We consider here that θ and w are both identical for the dispersion and the cross-dispersion directions on the CCD. This is a realistic assumption for most IFS instruments, except when pixels are binned differently on the x- and the y-axes during CCD read-out. In that case, the axis with the smallest θ value is used as reference to select r_lim with respect to w of the round Gaussian kernel. To estimate a minimum r_lim value for a given setup, we simulated and processed snapshot images of a single emission line for a grid of θ and w values. In Fig. 2 we show the ℛ⁺ image for a few simulated cases with 1.5 ≤ θ ≤ 2.5 and w = 2. The maximum value in ℛ⁺ defines the absolute minimum value of r_lim to avoid misidentification of object signal as cosmics. The results from the parameter study with 1.0 ≤ θ ≤ 4.2 and 1.0 ≤ w ≤ 4.5 are summarized in Fig. 3, which serves as a guideline for selecting an appropriate r_lim for any dataset. However, these are idealized values in the sense that no noise, no underlying continuum, and no cross-talk between the different fibers have been taken into account. Hence, the optimal r_lim threshold for real data should be slightly higher.

Fig. 3

Absolute minimum values of r_lim as a function of θ and w. These values were determined from noise-free simulations and set a hard lower limit to avoid frequent false detections of object signal as cosmics. They are estimated from the instrument-specific value of w. The shaded area highlights the regime of significantly undersampled data, where more care is needed.

Table 1

Detection quality of the best parameters for the simulated data.

3. Performance tests of the detection algorithms

3.1. Simulation of mock fiber-fed IFS data

We prepared dedicated simulations to test the performance of PyCosmic against DCR and L.A.Cosmic. In order to obtain unbiased results from the simulations, it is important to ensure that the signal distribution and shapes of cosmics are as realistic as possible. As we mentioned earlier, the σ-clipping is unfeasible for the majority of IFS data. Instead, we use dark frames to extract our template cosmics masks for two telescopes/instruments: the Potsdam Multi-Aperture Spectrophotometer (PMAS, Roth et al. 2005) at the 3.5 m Calar Alto telescope and the VIsible MultiObject Spectrograph (VIMOS, Le Févre et al. 2003) at the Very Large Telescope. Dark frames are ideally suited for our purpose because they do not contain any signal, yet their long exposure times (~1800 s) are comparable to those of typical science frames. We determined the noise level in the dark frames and selected all outliers above a 5σ-threshold as cosmics. Given the low dark current and read-out noise of the detectors, the 5σ limit corresponds to ~ 30 counts, the minimum signal of recovered cosmics.

The PMAS instrument offers two integral-field units (IFUs): a lens array (LArr) with 16 × 16 lenses and a simple bundle of 382 fibers (PPak, Kelz et al. 2006), with the latter used in the CALIFA survey (Sánchez et al. 2012). VIMOS is a versatile imaging and multi-object spectrograph, which also includes a lens-array IFU. We simulated IFS raw data including the night-sky spectra as the main signal, which is understood well from existing observations for the three following IFU instruments/setups:

1)
PMAS-LArr IFU with a R1200 grating backward (bw)mounted and a 2 × 2 CCD binning, leading to θ ~ 1.5 pixels;
2)
PMAS-PPak IFU with a V500 grating and a 2 × 2 CCD binning resulting in θ ~ 2.4 pixels (CALIFA survey setup);
3)
VIMOS IFU with the mid resolution (MR) grism operating at θ ~ 3 pixels.

These IFU setups cover a wide range in data sampling from significantly undersampled to very well sampled data. We consider them representative of all fiber-fed IFUs that are currently in operation. Synthetic IFS data for each setup were produced using reduced fiberflats, traces, and dispersion masks from real observations. The same observed night-sky spectrum, scaled to 1800 s effective exposure time, was used as input signal for all fibers. Afterwards, Possion and read-out noise were added to the simulated images, as well as empirical cosmics from the dark frames.

Fig. 4

Performance and parameter study of the DCR, the L.A.Cosmic, and the PyCosmic algorithms applied to simulated VIMOS IFU data in the MR mode with θ = 3 pixels. The first row of all panels compares the detection rate (P_d) of cosmics, the second row compares the false detection rate (P_f), and the third row shows the combined efficiency (ϵ). The primary parameter that controls the performance of the algorithms is varied along the abscissa of each column of panels, while a secondary parameter is shown as different symbols connected by dotted lines as assigned in the legend.

3.2. Parameter study to reach optimal performance

We did not include additional signal from astronomical objects, given that the simulated IFU data of the night sky already include continuum and bright emission line features similar to the characteristics of astronomical objects. The goal here was to test the performance of the detection algorithms when the signal in each spectrum is dominated by the sky rather than the typically fainter object signal.

In order to properly compare the performances of each algorithm, we tested them with a grid of input parameters because the optimal ones are unknown a priori for a given dataset. From the simulations, we defined the number of detectable cosmics N_c, which were 5σ above the noise of the simulated image before the cosmics were added. We defined the detection rate as P_d = N_d/N_c, with the number of detected cosmics that match the input mask (N_d). Pixels that were misclassified as cosmics by the algorithms are false detections (N_f), expressed as a false detection rate P_f = N_f/N_c. We defined the detection efficiency as ϵ = N_d/(N_c + N_f), which has a value of ϵ = 1 for ideal detection rates and 0 < ϵ < 1 when the detection was incomplete or the number of false detection was non-zero.

Each of the algorithms has free parameters that need to be chosen by the user:

a)
DCR: subframe size of I_i, limiting sigma factor ξ, the number of iterations, and growing radius (set to zero pixels);
b)
L.A.Cosmic: significance σ_lim, threshold f_lim, threshold σ_frac, and the number of iterations;
c)
PyCosmic: significance σ_lim, threshold r_lim appropriate for the chosen value of w and the instrument specific value of θ, and the number of iterations.

We consistently used a maximum number of six iterations in all cases for our tests. For L.A.Cosmic and PyCosmic, we set the significance level to σ_lim = 5 to achieve comparable results. The algorithm performance as a function of input parameters is summarized in Figs. 4–6 for the three IFS instruments. Surprisingly, we found that the achievable performance was strongly dependent on the IFS instrument characteristics.

For the simulated VIMOS IFU data, which is representative of well-sampled raw data with θ ~ 3 pixels, the L.A.Cosmic and PyCosmic algorithms performed almost equally well at their best-parameter settings with a detection rate of P_d ~ 95%. PyCosmic achieved a slightly higher detection (P_d = 96.1%) with a marginally higher false detection rate (P_f = 3.5%) compared to L.A.Cosmic (P_d = 94.5%, P_f = 3.3%). The detection rate of P_d = 80% for DCR may not be sufficient for many applications.

Interestingly, L.A.Cosmic showed the poorest performance for PMAS-PPak IFU data critically sampled at θ ≲ 2.4 pixels. The instrumental characteristics responsible for this substandard performance remain unclear. All parameter configurations gave P_f ≳ 40%, which is unacceptable. PyCosmic was clearly the best algorithm, with a high detection rate and an accompanied low false detection rate (P_f < 1.5%). DCR performed as poorly as L.A.Cosmic with low detection and high false detection rates.

Simulated data for the highly undersampled PMAS-LArr setup (θ ~ 1.5 pixels) are clearly domains of L.A.Cosmic, because it was initially optimized for strongly undersampled Wide-Field Plenetary Camera 2 Hubble Space Telescope images. L.A.Cosmic reached a high detection rate at an acceptable false detection rate. Nevertheless, PyCosmic achieved a similar efficiency when the smoothing kernel width was set to w = 1.0 pixels, significantly smaller than the instrumental PSF. We expected DCR to perform poorly on undersampled IFU data because real signal and cosmics are hard to distinguish from count statistics. This is confirmed by our simulations.

Table 1 summarizes the algorithm parameters together with their corresponding P_d and P_f rates for each setup that gives the highest efficiency. For these “optimal” parameter settings, we made an additional statistical comparison to further elucidate strengths and weaknesses of each algorithm. The results are shown in Fig. 7. L.A.Cosmic and PyCosmic behaved similarly well as P_d steeply rose to P_d = 100% above the threshold significance of 5σ. The DCR algorithm in all cases had a shallower curve and reached 100% efficiency at much higher significance than the other two. Concerning misclassified pixels, we found that PyCosmic tended to detect pixels with low counts as cosmics, while L.A.Cosmic did not.

Fig. 5

Same as Fig. 4 for simulated PMAS-PPak data, θ = 2.4 pixels (CALIFA survey type data).

Fig. 6

Same as Fig. 4 for the simulated PMAS-LArr data that are heavily undersampled, with θ = 1.5 pixels.

Fig. 7

Detailed statistics of the detection rate and spuriously detected pixels for the three simulated raw datasets (PyCosmic → black, L.A.Cosmic → green, and DCR → blue). Left panels: detection rate in per cent as a function of the cosmic ray significance, i.e., cosmics counts divided by the Poisson and the read-out noise of the simulated pixel. Right panels: histogram of pixel counts for false detection compared to the count histogram of the entire image (grey area). The counts are given in logarithmic units.

4. Illustrative examples of cosmics detection for observed IFS data

Although our simulations closely matched real observations, it is difficult to simulate realistic data, including object signal, given the complexity of IFS data. Thus, we similarly processed data from real observations for the different IFUs to check whether results from the simulations agreed well with those of observed data. The optimal parameters as inferred from the simulations are used for the different algorithms and are applied to a typical raw frame from the CALIFA survey taken with the PMAS-PPak IFU at 900 s exposure time, to 1100 s frame of a low-redshift galaxy taken with the VIMOS IFU in MR mode, and to a 1800 s frame exposure of the center of a globular cluster taken with the PMAS-LArr IFU using the R1200(bw) setup. Representative subframes and corresponding cosmic masks recovered by the algorithms are shown in Fig. 8 for the different datasets.

In general, results obtained for real data reflected the outcome of the mock data analysis. Cosmics of the selected subframes are illustrative of the strengths and weaknesses of the algorithms. In case of PMAS-PPak data, we clearly see that DCR had problems detecting cosmics if the underlying signal was already quite high, yet it had few false detections at the same time. The false detections were a huge problem for L.A.Cosmic, not only in the simulations, because bright emission lines in real data were also classified as cosmics. PyCosmic almost perfectly detected cosmics and was by far the best algorithm for the PMAS-PPak instrument. This confirmed the necessity to develop a new algorithm for the CALIFA survey.

5. Guidelines for algorithm selection and optimal parameter settings

Based on the performance of the individual algorithms on simulated and real data, we try to provide useful guidelines here for users that need to tackle the problem of cosmics detection during the reduction of IFS data.

5.1. DCR

The performance of the algorithm depends only weakly on the subframe size. We recommend a symmetric size of the order of 15 × 15 pixels. The main parameter to be set properly is ξ, which should be ξ ~ 3 to achieve the highest performance. An exception to these recommendations are undersampled data, where the subframe size seems to be important. A much higher value of ξ needs to be chosen in this case (ξ ≥ 8) to achieve an acceptable false detection rate (see Fig. 6). Because of the intrinsically lower detection rate compared to the other two algorithms, DCR should only be used in case the highest computational speed outweighs all other concerns.

5.2. L.A.Cosmic

The results of L.A.Cosmic are mildly dependent on the growing radius. The middle columns of Figs. 4–6 show the best results using a zero growing radius. In this case, σ_frac = 1.0, which is our recommended value. The improvement that can be achieved with this parameter is relatively small when compared to models using a growing radius of one pixel and σ_frac = 0.5. Additionally, P_d and P_f vary smoothly with f_lim above a certain limit, but it is difficult to predict an optimal value of f_lim for a given instrument. In general, a value of f_lim > 5 is required even for well-sampled data. This is in contrast to the behavior and guidelines for imaging data given by D01. While L.A.Cosmic is able to reach an excellent performance for some IFS data, it is not a robust algorithm because it fails to produce acceptable results for certain IFU instrument configurations (see Fig. 5). L.A.Cosmic may be appropriate for undersampled IFS data, but it should not be applied to other IFS datasets without careful checking of the results.

5.3. PyCosmic

The width of the smoothing kernel should be set to w ≲ θ; otherwise, the optimal performance of PyCosmic cannot be reached. This is most evident in undersampled data, where the maximum efficiency decreases substantially with increasing w (Fig. 6). For any given combination of w and θ, r_lim determines the efficiency of the detection. In extreme cases, as with undersampled data, the tolerance in r_lim to reach the optimal efficiency is small. However, comparing the theoretically derived minimum values of r_lim (cf. Fig. 3) with the best values of the simulation, we consistently found that the optimal r_lim threshold needs to be a factor of ~ 2 larger than estimated from Fig. 3. With these parameter settings, PyCosmic provides the most robust detection efficiency for any IFS instrument configuration with an efficiency of ϵ ≳ 90%, well-defined parameter settings, and the possibility of reducing the number of false detections P_f to nearly zero.

5.4. Handling cosmics in IFS data reduction

All algorithms attempt to restore the information of pixels that are affected by cosmics. Nevertheless, the restored signal should be considered unreliable given the signal structure in IFS data. Instead of the common practice of simply processing the “cleaned” image, we emphasize that bad pixels can be nicely handled during the spectra extraction process when an optimal extraction scheme is used (e.g., Horne 1986; Sharp & Birchall 2010). Given that optimal extraction algorithms always assume a certain shape of the signal on the CCD, it is easy to mask bad pixels and restore the signal at the cost of a higher associated variance. When too many pixels are affected by cosmics on the raw frame for a given spectral-resolution element, they should be flagged as bad elements that are propagated through the reduction pipeline to the final data product. We consider this to be the best possible scheme to handle artefacts caused by cosmics in IFS data.

Fig. 8

Comparison of cosmics detected in real observation taken with three different IFS instruments. Three representative subframes of the raw images (left column of thumbnail images) were chosen to allow a good comparison for the results of the DCR, the L.A.Cosmic, and the PyCosmic algorithms. Pixel masks of detected cosmics are shown in the three right panel columns.

6. Conclusions

In this paper we have presented a novel detection algorithm for cosmics in single exposures called PyCosmic. The algorithm combines Laplacian edge detection with a PSF convolution approach. We systematically compared the performance of our new algorithm against other standard detection algorithms, DCR (Pych 2004) and L.A.Cosmic (van Dokkum 2001), on simulated and real images from fiber-fed IFS instruments. With the aid of these detailed comparison tests, we provide general recommendations for the use of these algorithms for the detection of cosmics in IFS data.

We have found that DCR does not reach a detection efficiency equivalent to that of L.A.Cosmic and PyCosmic. Therefore, we cannot recommend its use for IFS data in general, except when computational speed is critical. The strength of the L.A.Cosmic algorithm is that it works best for undersampled IFS data. However, a significant drawback is that the minimum false detection rate achievable for a given IFS data is entirely set by the characteristics of the instrument and cannot be reduced by changing any parameter settings. This peculiarity is most evident for PMAS-PPak IFU data from the CALIFA survey (Sánchez et al. 2012), where the false detection rate of L.A.Cosmic is P_f ≳ 40%. Our PyCosmic algorithm reduces the false detection rate with different parameter settings and solves this problem effectively. It has replaced the simplified R3D routine (based solely on a Laplacian edge detection scheme) in the reduction pipeline of the CALIFA survey.

PyCosmic is the most robust detection algorithm for cosmics in fiber-fed IFS data. In combination with well-characterized optimal parameter settings, it is well-suited for automatic usage for very large datasets. CALIFA is already a huge IFS survey by current standards that has significantly benefited from the development of PyCosmic. The next generation of IFS instruments like the Sydney-AAO Multi-object IFS (SAMI, Croom et al. 2012) or the IFU project Mapping Nearby Galaxies at APO (MaNGA) is already being built or is planned to carry out even larger IFS surveys. These surveys will deliver IFS data for thousands of galaxies in the near future, which will certainly benefit from robust data reduction algorithms such as PyCosmic.

The PyCosmic algorithm has recently been implemented in the versatile multi-IFU reduction software P3D (Sandin et al. 2010) and is also available as a Python-based stand-alone program¹ so that it can be easily used or even added to any existing IFS reduction pipeline. Although PyCosmic has been optimized for IFS data, we have also applied it successfully to longslit data and anticipate that good results will be achieved with imaging data.

¹

PyCosmic is available for download at http://pycosmic.sf.net

Acknowledgments

We thank the anonymous referee for a very prompt report with instructive comments that improved the quality and presentation of the article. Furthermore, we thank Ana Monreal Ibero for valuable comments and Barry Rothberg for several suggestions that increased the readability of the manuscript. B.H. gratefully acknowledge the support by the DFG via grant Wi 1369/29-1. C.S. was supported by the grant PTDESY-05A12BA1, and R.G.-B. acknowledges support from the Spanish Ministerio de Ciencia e Innovacion through grant AYA2010-15081.

References

Barnsley, R. M., Smith, R. J., & Steele, I. A. 2012, Astron. Nachr., 333, 101 [NASA ADS] [CrossRef] [Google Scholar]
Conselice, C. J. 2003, ApJS, 147, 1 [NASA ADS] [CrossRef] [Google Scholar]
Croom, S. M., Lawrence, J. S., Bland-Hawthorn, J., et al. 2012, MNRAS, 421, 872 [NASA ADS] [Google Scholar]
Farage, C. L., & Pimbblet, K. A. 2005, PASA, 22, 249 [Google Scholar]
Fruchter, A., & Hook, R. N. 1997, in SPIE Conf. Ser. 3164, ed. A. G. Tescher, 120 [Google Scholar]
Horne, K. 1986, PASP, 98, 609 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Kelz, A., Verheijen, M. A. W., Roth, M. M., et al. 2006, PASP, 118, 129 [NASA ADS] [CrossRef] [Google Scholar]
Le Févre, O., Saisse, M., Mancini, D., et al. 2003, in SPIE Conf. Ser. 4841, eds. M. Iye, & A. F. M. Moorwood, 1670 [Google Scholar]
Pych, W. 2004, PASP, 116, 148 [NASA ADS] [CrossRef] [Google Scholar]
Rhoads, J. E. 2000, PASP, 112, 703 [NASA ADS] [CrossRef] [Google Scholar]
Roth, M. M., Kelz, A., Fechner, T., et al. 2005, PASP, 117, 620 [NASA ADS] [CrossRef] [Google Scholar]
Salzberg, S., Chandar, R., Ford, H., Murthy, S. K., & White, R. 1995, PASP, 107, 279 [NASA ADS] [CrossRef] [Google Scholar]
Sánchez, S. F. 2006, AN, 327, 850 [Google Scholar]
Sánchez, S. F., Kennicutt, R. C., Gil de Paz, A., et al. 2012, A&A, 538, A8 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Sandin, C., Becker, T., Roth, M. M., et al. 2010, A&A, 515, A35 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Sandin, C., Schönberner, D., Roth, M. M., et al. 2008, A&A, 486, 545 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Shamir, L. 2005, Astron. Nachr., 326, 428 [NASA ADS] [CrossRef] [Google Scholar]
Sharp, R., & Birchall, M. N. 2010, PASA, 27, 91 [NASA ADS] [CrossRef] [Google Scholar]
van Dokkum, P. G. 2001, PASP, 113, 1420 [NASA ADS] [CrossRef] [Google Scholar]
Wang, J.-M., Chen, Y.-M., Hu, C., et al. 2009, ApJ, 705, L76 [NASA ADS] [CrossRef] [Google Scholar]
Zhu, J., Zhu, Z., Wang, C., & Ye, Z. 2009, PASA, 26, 69 [NASA ADS] [CrossRef] [Google Scholar]

All Tables

Table 1

Detection quality of the best parameters for the simulated data.

In the text

All Figures

	Fig. 1 Visual outline of the intermediate steps of our PyCosmic algorithm, which is based on L.A.Cosmic.
In the text

Fig. 2

Simulated images to estimate the minimum threshold value for r_lim. Simulated thumbnail images and intermediate images to compute ℛ⁺ are shown for a pure emission line feature considering three different instrumental PSFs with θ = 2.5, 2.0, and 1.5 pixels FWHM. The maximum pixel value is provided for each subimage. The maximum in ℛ⁺ corresponds to the minimum value for r_lim to avoid misclassification as cosmics.

In the text

	Fig. 3 Absolute minimum values of r_lim as a function of θ and w. These values were determined from noise-free simulations and set a hard lower limit to avoid frequent false detections of object signal as cosmics. They are estimated from the instrument-specific value of w. The shaded area highlights the regime of significantly undersampled data, where more care is needed.
In the text

Fig. 4

Performance and parameter study of the DCR, the L.A.Cosmic, and the PyCosmic algorithms applied to simulated VIMOS IFU data in the MR mode with θ = 3 pixels. The first row of all panels compares the detection rate (P_d) of cosmics, the second row compares the false detection rate (P_f), and the third row shows the combined efficiency (ϵ). The primary parameter that controls the performance of the algorithms is varied along the abscissa of each column of panels, while a secondary parameter is shown as different symbols connected by dotted lines as assigned in the legend.

In the text

	Fig. 5 Same as Fig. 4 for simulated PMAS-PPak data, θ = 2.4 pixels (CALIFA survey type data).
In the text

	Fig. 6 Same as Fig. 4 for the simulated PMAS-LArr data that are heavily undersampled, with θ = 1.5 pixels.
In the text

Fig. 7

Detailed statistics of the detection rate and spuriously detected pixels for the three simulated raw datasets (PyCosmic → black, L.A.Cosmic → green, and DCR → blue). Left panels: detection rate in per cent as a function of the cosmic ray significance, i.e., cosmics counts divided by the Poisson and the read-out noise of the simulated pixel. Right panels: histogram of pixel counts for false detection compared to the count histogram of the entire image (grey area). The counts are given in logarithmic units.

In the text

	Fig. 8 Comparison of cosmics detected in real observation taken with three different IFS instruments. Three representative subframes of the raw images (left column of thumbnail images) were chosen to allow a good comparison for the results of the DCR, the L.A.Cosmic, and the PyCosmic algorithms. Pixel masks of detected cosmics are shown in the three right panel columns.
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Barnsley, R. M., Smith, R. J., & Steele, I. A. 2012, Astron. Nachr., 333, 101 [NASA ADS] [CrossRef] [Google Scholar]

[2] Conselice, C. J. 2003, ApJS, 147, 1 [NASA ADS] [CrossRef] [Google Scholar]

[3] Croom, S. M., Lawrence, J. S., Bland-Hawthorn, J., et al. 2012, MNRAS, 421, 872 [NASA ADS] [Google Scholar]

[4] Farage, C. L., & Pimbblet, K. A. 2005, PASA, 22, 249 [Google Scholar]

[5] Fruchter, A., & Hook, R. N. 1997, in SPIE Conf. Ser. 3164, ed. A. G. Tescher, 120 [Google Scholar]

[6] Horne, K. 1986, PASP, 98, 609 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[7] Kelz, A., Verheijen, M. A. W., Roth, M. M., et al. 2006, PASP, 118, 129 [NASA ADS] [CrossRef] [Google Scholar]

[8] Le Févre, O., Saisse, M., Mancini, D., et al. 2003, in SPIE Conf. Ser. 4841, eds. M. Iye, & A. F. M. Moorwood, 1670 [Google Scholar]

[9] Pych, W. 2004, PASP, 116, 148 [NASA ADS] [CrossRef] [Google Scholar]

[10] Rhoads, J. E. 2000, PASP, 112, 703 [NASA ADS] [CrossRef] [Google Scholar]

[11] Roth, M. M., Kelz, A., Fechner, T., et al. 2005, PASP, 117, 620 [NASA ADS] [CrossRef] [Google Scholar]

[12] Salzberg, S., Chandar, R., Ford, H., Murthy, S. K., & White, R. 1995, PASP, 107, 279 [NASA ADS] [CrossRef] [Google Scholar]

[13] Sánchez, S. F. 2006, AN, 327, 850 [Google Scholar]

[14] Sánchez, S. F., Kennicutt, R. C., Gil de Paz, A., et al. 2012, A&A, 538, A8 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[15] Sandin, C., Becker, T., Roth, M. M., et al. 2010, A&A, 515, A35 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[16] Sandin, C., Schönberner, D., Roth, M. M., et al. 2008, A&A, 486, 545 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[17] Shamir, L. 2005, Astron. Nachr., 326, 428 [NASA ADS] [CrossRef] [Google Scholar]

[18] Sharp, R., & Birchall, M. N. 2010, PASA, 27, 91 [NASA ADS] [CrossRef] [Google Scholar]

[19] van Dokkum, P. G. 2001, PASP, 113, 1420 [NASA ADS] [CrossRef] [Google Scholar]

[20] Wang, J.-M., Chen, Y.-M., Hu, C., et al. 2009, ApJ, 705, L76 [NASA ADS] [CrossRef] [Google Scholar]

[21] Zhu, J., Zhu, Z., Wang, C., & Ye, Z. 2009, PASA, 26, 69 [NASA ADS] [CrossRef] [Google Scholar]

PyCosmic: a robust method to detect cosmics in CALIFA and other fiber-fed integral-field spectroscopy datasets⋆

1. Introduction

2. Outline of different cosmics detection algorithms

2.1. DCR, count statistics on subframes

2.2. L.A.Cosmic, the Laplacian edge detection approach

2.3. PyCosmic, combining edge detection with a PSF convolution approach

3. Performance tests of the detection algorithms

3.1. Simulation of mock fiber-fed IFS data

3.2. Parameter study to reach optimal performance

4. Illustrative examples of cosmics detection for observed IFS data

5. Guidelines for algorithm selection and optimal parameter settings

5.1. DCR

5.2. L.A.Cosmic

5.3. PyCosmic

5.4. Handling cosmics in IFS data reduction

6. Conclusions

Acknowledgments

References

All Tables

All Figures

PyCosmic: a robust method to detect cosmics in CALIFA and other fiber-fed integral-field spectroscopy datasets^⋆