A multi-scale, multi-wavelength source extraction method: getsources

A. Men’shchikov; Ph. André; P. Didelon; F. Motte; M. Hennemann; N. Schneider

doi:10.1051/0004-6361/201218797

Home

All issues

Volume 542 (June 2012)

A&A, 542 (2012) A81

Full HTML

Free Access

Issue		A&A Volume 542, June 2012


Article Number		A81
Number of page(s)		31
Section		Astronomical instrumentation
DOI		https://doi.org/10.1051/0004-6361/201218797
Published online		13 June 2012

A&A 542, A81 (2012)

A multi-scale, multi-wavelength source extraction method: getsources

A. Men’shchikov, Ph. André, P. Didelon, F. Motte, M. Hennemann and N. Schneider

Laboratoire AIM Paris-Saclay, CEA/DSM–CNRS–Université Paris Diderot, IRFU, Service d’Astrophysique, Centre d’Études de Saclay, Orme des Merisiers, 91191 Gif-sur-Yvette, France
e-mail: alexander.menshchikov@cea.fr

Received: 9 January 2012
Accepted: 16 April 2012

Abstract

We present a multi-scale, multi-wavelength source extraction algorithm called getsources. Although it has been designed primarily for use in the far-infrared surveys of Galactic star-forming regions with Herschel, the method can be applied to many other astronomical images. Instead of the traditional approach of extracting sources in the observed images, the new method analyzes fine spatial decompositions of original images across a wide range of scales and across all wavebands. It cleans those single-scale images of noise and background, and constructs wavelength-independent single-scale detection images that preserve information in both spatial and wavelength dimensions. Sources are detected in the combined detection images by following the evolution of their segmentation masks across all spatial scales. Measurements of the source properties are done in the original background-subtracted images at each wavelength; the background is estimated by interpolation under the source footprints and overlapping sources are deblended in an iterative procedure. In addition to the main catalog of sources, various catalogs and images are produced that aid scientific exploitation of the extraction results. We illustrate the performance of getsources on Herschel images by extracting sources in sub-fields of the Aquila and Rosette star-forming regions. The source extraction code and validation images with a reference extraction catalog are freely available.

Key words: stars: formation / infrared: ISM / submillimeter: ISM / methods: data analysis / techniques: image processing / techniques: photometric

© ESO, 2012

1. Introduction

The Herschel Space Observatory (Pilbratt et al. 2010) provides the best opportunity to study the earliest stages of star formation. Prestellar cores and young (Class 0) protostars emit the bulk of their luminosities at wavelengths 80–400 μm, which makes the Herschel imaging instruments PACS (Poglitsch et al. 2010) and SPIRE (Griffin et al. 2010) with their 6 wavebands from 70 to 500 μm perfect for performing a census of these objects down to 0.01–0.1 M_⊙ in the nearby (distances D ≲ 500 pc) molecular cloud complexes. In particular, the Herschel Gould Belt survey (André et al. 2010) aims at probing the link between diffuse cirrus-like structures and compact cores with the main goal to understand the physical mechanisms of the formation of prestellar cores out of the diffuse medium, which is crucial for understanding the origin of stellar masses. Furthermore, the Herschel HOBYS survey (Motte et al. 2010) aims at performing a census of massive young stellar objects, providing accurate bolometric luminosities and envelope masses for homogeneous and complete samples of the progenitors of massive stars.

Preparing for these two Herschel surveys, we had evaluated a few popular source extraction algorithms to check whether they could be used in our surveys. The main problem was the absence of any multi-wavelength extraction technique. None of the methods was designed to handle multi-wavelength data, making it necessary to match the independent catalogs obtained at different wavelengths using an association radius as a free parameter. This posed very serious problems for detecting and measuring sources in the Herschel images with angular resolutions differing by a factor of ~7. A direct consequence of the mismatch in resolutions is that the degree to which sources in a region may be blended depends very strongly on the waveband.

With such great differences in resolution, one cannot just match independent catalogs without introducing unknown and potentially very large errors in the association of sources detected across different wavebands and in their measured properties. Studying star formation in the nearest clouds, one expects to resolve many crowded regions in the highest-resolution images at the shortest wavelengths, but at the same time, one would progressively “lose” sources within much larger beams at the longer wavelengths. In effect, the fluxes of such sources on the long-wavelength side of their spectral energy distributions (SEDs) would have large and unknown errors when matching independent extraction catalogs, greatly reducing the extraction quality (detection completeness and reliability, as well as the accuracy of the derived properties).

For large-scale projects, such as the Herschel Gould Belt and HOBYS surveys, one needs a fully automated source extraction method that can find as many sources as possible from the images in all bands, reliably distiguishing them from variable backgrounds and noise, deblending them as accurately as possible in the crowded regions by preserving and utilizing all information obtained from the higher-resolution images at shorter wavelengths. A very serious problem with some existing source extraction methods is that they do not allow sources to overlap, thus deblending of crowded regions is impossible. One cannot expect from such methods high levels of detection completeness or consistently accurate flux measurements in realistic conditions.

Note that most existing methods have been developed and oriented for use in different areas of astronomy, thus their performances for a specific project must be carefully analyzed before an appropriate method can be chosen. The SPIRE SAG3 consortium has been testing several popular source extraction algorithms using simulated skies of various degrees of complexity and those benchmarks will be published elsewhere (Men’shchikov et al., in prep.). Below we introduce the reader to the basic concepts of those techniques, to place our new method¹ in a wider context.

1.1. Existing methods of source extraction

Most of the algorithms trying to solve the same (non-trivial) problem of source extraction originated from different ideas.

Stutzki & Guesten (1990)’s gaussclumps (developed for position-velocity data cubes in molecular-line studies of molecular clouds) performs least-squares fits of a Gaussian shape to the brightest peak constrained to keep the position and amplitude of the fitted shape close to the image maximum. Then it subtracts the fit from the image, producing the residuals image, and fits a new Gaussian shape to the brightest peak in the residuals. The iterations continue until the total intensity of all subtracted clumps is equal to the integrated intensity of the original image or there are no significant peaks left.

Williams et al. (1994)’s clumpfind (aimed also for molecular clouds data and position-velocity cubes) contours an image at a number of levels, starting from the brightest peak in the image. It descends down to a minimum contour level, marking as clumps along the way all connected areas of pixels that are above the contour level. This technique was “motivated by how the eye decomposes the maps into clumps” and it “mimics what an infinitely patient observer would do” (Williams et al. 1994). One should be aware that this method ignores the backgrounds in which sources are observed.

Bertin & Arnouts (1996)’s sextractor (designed for use with optical and near-IR images in extragalactic astronomy) estimates and subtracts background using sigma clipping and spline interpolation, then uses thresholding to find sources in the background-subtracted image, deblends them if they overlap, and measures their positions and sizes using intensity moments. A very useful property of this versatile algorithm is that it allows using a detection image that differs from the observed image and it can match sources with previously obtained catalogs.

CUPID²’s reinhold (oriented primarily to analyzing clumps in submillimeter data cubes) identifies “pixels within the image which mark the edges of emission clumps, producing a set of rings around the clumps. However, these structures can be badly affected by noise in the data and so need to be cleaned up. This is done using cellular automata which first dilate the rings or shells, and then erode them. After cleaning, all pixels within each ring or shell are assumed to belong to a single clump” (see the reference in the footnote).

CUPID’s fellwalker (also oriented towards clumps in submillimeter data cubes) finds image peaks by tracing the line of the steepest ascent, considering every pixel with a value above a specified threshold as a starting point for a walk to a peak along the steepest gradient. Having reached a peak, it searches for an even higher pixel intensity in a neighborhood; when found, the algorithm switches to that pixel and continues uphill. If a peak is found that is higher than all pixels in its neighborhood, a clump has been detected and the algorithm marks all pixels visited along the way as belonging to the clump.

Motte et al. (2003, 2007)’s mre-gcl (created for studies of star formation using ground-based submillimeter continuum imaging) combines gaussclumps with the image filtering based on a wavelet decomposition. The algorithm decomposes an image in spatial scales using an isotropic wavelet decomposition with the multi-resolution code mr_transform (Starck & Murtagh 2006), subtracts all scales larger than the largest scale of interest from the original image, and uses gaussclumps to detect and measure sources in the filtered image. Then the user defines each source’s largest extent as twice its measured size and repeats the decomposition and filtering steps for each source, runs gaussclumps again with the aim to improve the measurements of sizes and fluxes.

Molinari et al. (2011)’s cutex (developed for studying star formation with Herschel) attempts to overcome the difficulty of thresholding of an entire image: highly-variable backgrounds were expected in star-forming regions and indeed observed with Herschel. It analyzes multi-directional second derivatives of the original image and performs curvature thresholding to isolate compact sources out of extended emission, then fits variable-size elliptical Gaussians (adding also a planar background) at their positions. The algorithm can fit up to 8 Gaussians simultaneously in crowded areas, if sources are closer than two observational beam sizes. This method works only for compact sources with sizes up to approximately 3 times the beam size.

Kirk et al. (in prep.)’s csar (developed for use with the BLAST and Herschel images) is another method that defines clumps in terms of connected pixels. A source is defined as a region of connected pixels bound by a closed isophotal contour that contains at least one pixel that is at 3σ (where σ is the standard deviation) above the bounding contour level. The algorithm starts contouring just below the peak on the image and walks down until some predefined background is reached; sources are considered finished just before they become connected to others. Sources are not allowed to overlap and no attempt is made to assign flux outside of the closed contours to any source. The technique was designed with a “purpose of replicating what a (trained) human would do with an image if extracting sources manually” (Kirk, priv. comm.).

Crowded regions that are frequently observed with Herschel; the deblending of overlapping sources is the origin of major uncertainties. Whereas clumpfind, reinhold, fellwalker, and csar merely partition the image between sources not allowing them to overlap, gaussclumps, sextractor, mre-gcl, and cutex can deblend overlapping sources, which is quite an essential property for obtaining accurate results in crowded regions. We feel the need to stress here that the observed images are only projections of the complex three-dimensional reality onto the plane of the sky and, as such, it is a fundamental source of major uncertainties in the interpretation of observations and in the derived properties of objects. As we know from Herschel observations (e.g., Men’shchikov et al. 2010; Arzoumanian et al. 2011), the interstellar medium is highly filamentary, thus the filaments’ orientations play a very significant role in the appearance of the regions we observe.

1.2. Introducing a new approach

In this paper, we present the source extraction algorithm getsources developed with the aim to overcome the shortcomings of the existing methods and provide researchers in this Herschel era with a better extraction tool. For details on the astrophysical context of this work, we refer to Appendix A.

To clarify our terminology, we shall use the term noise to refer to the statistical instrumental noise including possible contributions from any other signals that are not astrophysical in nature, i.e. which are not related to the emission of the areas in space we are observing. In contrast, the term background will refer to the (filamentary) astrophysical backgrounds, such as cirruses or molecular clouds, containing the sources we are observing and objects we are studying (e.g., stars, protostars, cores, etc.)³. In order to reduce possible confusion, we are going to clearly distinguish between the morphologically-simple (convex, not very elongated) sources of emission as determined by the source extraction algorithms and the objects of specific astrophysical nature that are selected from the entire extraction catalog on the basis of all available information (besides the images) and some additional assumptions, criteria, and techniques, depending on the specific interests of a researcher.

The problem of detecting sources in continuum images usually reduces to finding significant intensity peaks, as such images provide us with just complex intensity distributions of the sky over an observed area. At the present state of the art, source extraction procedures do not know anything about the astrophysical nature or true physical properties of the objects that produced the emission of those significant peaks. An extraction algorithm can only detect sources (that are possibly harboring our objects of interest) and determine their apparent two-dimensional intensity distributions above the variable background and noise, and measure their apparent properties at each wavelength. This is the only information contained in the images, which is available to any extraction algorithm. The purpose of source extraction is to detect as many real sources as possible (distinguishing them from noise and background fluctuations) and to measure their apparent properties as accurately as possible⁴.

A catalog of such sources serves as the fundamental basis for all subsequent in-depth studies of the various objects of different nature. Only after we have detected significant sources of emission and measured all possible apparent properties (peak intensities, integrated fluxes, sizes, etc.), can we utilize those results to infer the real astrophysical nature and properties of the objects⁵. At this step, one should combine all the information in different wavebands from the extraction catalog and also from previous or follow-up observations, as well as any other available information, and classify the objects according to their physical nature and use them to derive new astrophysical knowledge. In what follows, we only consider the sources and leave the definition of objects to the future papers exploring various astrophysical applications of this new extraction algorithm.

Unlike other source extraction algorithms (except mre-gcl, Sect. 1.1), the new method analyzes fine spatial decompositions of original images across a wide range of scales and across all wavelengths (Sect. 2.2). As part of its multi-wavelength design, getsources removes the noise and background fluctuations from the decomposed images (Sect. 2.3) separately in each band, and constructs a set of wavelength-independent detection images (Sect. 2.4) that preserve information in both spatial and wavelength dimensions as well as possible. Sources are detected in the combined detection images by following the evolution of their segmentation masks across all spatial scales (Sect. 2.5). Measurements of the source properties are performed in the original images at each wavelength after the background has been subtracted by interpolation under the sources’ “footprints” and after overlapping sources have been deblended (Sect. 2.6). To facilitate visual analysis of the extraction results and various steps of the algorithm, a number of useful images are created for each waveband (Sect. 2.7). Based on the results of the initial extraction, detection images are “flattened” to produce much more uniform noise and background fluctuations in preparation for the second, final extraction (Sect. 2.8). The performance of getsources for Herschel images is illustrated on small sub-fields of the Aquila and Rosette star-forming regions (Sect. 3).

2. The getsources extraction method

The fundamental problem in extracting sources from observed images is that all spatial scales are mixed together and the intensity of any given pixel contains an unknown contribution from the noise, background, and surrounding blended sources. The central problem in accurate source extraction is to separate those contributions from the signal of the real sources.

The main idea of getsources is to analyze decompositions of original images (at each wavelength) across a wide range of spatial scales separated by only a small amount (typically ~ 3−5%). Replacing originals with a set of strongly filtered images brings several significant advantages. Each of the “single scales” contains non-negligible signals from only a relatively narrow range of spatial scales, mostly only from those sources (and the noise and background fluctuations) which have sizes similar to the scale considered. In effect, this automatically filters out all contributions of the noise, background, or overlapping sources on irrelevant (much smaller and larger) spatial scales. An immediate benefit is that such a filtering allows one to manipulate entire single-scale images as a whole and use thresholding to separate sources from the background and noise in the observed images (see Sect. 2.3 for details). Furthermore, considering the same spatial scales across a wide range of wavelengths allows one to sum up single-scale images at all wavelengths in combined (wavelength-independent) single-scale detection images and thus preserve the high-resolution information across all wavebands, minimizing the effect of degrading resolutions. Besides providing a substantial “super-resolution” effect, this eliminates the need of matching multiple catalogs obtained with different beams and reduces the matching and measurement errors.

Fig. 1

Main processing blocks of the getsources algorithm described in Sects. 2.1–2.7.

The extraction method is represented by 7 processing blocks shown in Fig. 1; they will be described below in Sects. 2.1–2.7. In order to make a clear distinction between images and various other parameters, the images are denoted throughout this paper by the capital calligraphic characters (e.g., $\hbox{$\mathcal{A}, \mathcal{B}, \mathcal{C}$}$ ; see Appendix B for a list of all symbols and definitions). The following subsections describe the algorithm in full detail.

2.1. Preparing observed and detection images

The first step (Fig. 1) towards the source extraction is to convert the original images ℐ_λ at all wavelengths to the same grid. This means to transform them into the observed images ℐ_λO, all with the same numbers of pixels, the same pixel size, aligned across wavelengths as accurately as possible (covering the same area on the sky), the same reference pixel and its coordinates. In practice, this is done by resampling all images to the same pixel size using the astronomical utility SWarp (Bertin et al. 2002).

Note that the alignment of images must be carefully checked before extracting sources with getsources, because of its multi-wavelength design. Images in all wavebands will be combined together in wavelength-independent detection images (Sect. 2.4) that will only be good if the originals are aligned within one pixel; significant misalignments of the images can create spurious sources⁶.

Fig. 2

Single-scale decomposition (Sect. 2.2). The central 0 $\overset{°}{.}$ $\hbox{${\fdg}$}$ 44 × 0 $\overset{°}{.}$ $\hbox{${\fdg}$}$ 44 sub-field of the detection image ℐ_λD of a 1° × 1° simulated star-forming region at 350 μm (upper left). Its single-scale images ℐ_λDj are shown at the scales indicated (left to right, top to bottom) for j = 17,30,43,57,70, N_S = 99, f_S = 1.053, S₁ = 4′′, S_{N_S} = 660′′ (see Eq. (1)). The image dimensions are 1800 × 1800 pixels and the pixel size Δ = 2′′. The scales were selected to be separated by a factor of 2 to illustrate the spatial decomposition. The scale sizes S_j are visualized by the yellow-black circles and annotated at the bottom of the panels. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a linear function of intensity in MJy/sr.

The observed images ℐ_λO are only used to measure properties of detected sources, at the end of an extraction (Fig. 1, Sect. 2.6). Most of the processing in the algorithm is done on the detection images ℐ_λD, which in the simplest case may be the same as ℐ_λO, but in general can significantly differ from the latter. Any transformations of the observed images that have a potential to improve detection quality (such as completeness, reliability) can be used as ℐ_λD; this can be convolution, multiplication by weight images, subtraction of baseline images, etc. For example, one may want to sacrifice a little bit of the nominal angular resolution in order to reduce the unphysical pixel-to-pixel noise present in the images (on scales smaller than the observational beam size O_λ) using convolution $\hbox{$\mathcal{I}_{{\!\lambda}{\rm D}} = \mathcal{G}_{\lambda} * \mathcal{I}_{{\!\lambda}{\rm O}}$}$ , where $\hbox{$\mathcal{G}_{\lambda}$}$ is the smoothing Gaussian with full width at half maximum (FWHM) chosen to slightly degrade (by ~5%) the image resolution O_λ. This suppresses unphysical noise and small-scale artifacts in ℐ_λD that otherwise may become enhanced in the smallest-scale decomposed images. Being the default way of creating detection images in getsources, such smoothing is not required and may be skipped, if not deemed beneficial. Another example of a detection image that may be useful in some applications is a column density map produced by pixel-to-pixel SED fitting of the observed images.

The last part of the preparation is creating the observational masks ℳ_λ. Those are images with the pixel values of either 1 or 0, defining the areas in the original images that we are interested in. In practice, some areas of the observed rectangular images may not have been covered, some other areas may contain high noise or artifacts. The mask images are used by getsources to exclude from processing any area of ℐ_λD in which the mask has zero values; in the simplest ideal case, ℳ_λ has values of 1 in all pixels. Very noisy areas have the potential to affect the cleaning and detection algorithms described below and every effort must be made to exclude such areas using carefully prepared observational masks.

2.2. Decomposing detection images in spatial scales

The spatial decomposition is done by convolving the original images with circular Gaussians and subtracting them from one another (we call this procedure successive unsharp masking): $ℐ_{λ D j} = 𝒢_{j - 1} * ℐ_{λ D} - 𝒢_{j} * ℐ_{λ D} (j = 1, 2, ..., N_{S}),$ $\begin{equation} \mathcal{I}_{{\!\lambda}{\rm D}{j}} = \mathcal{G}_{j-1} * \mathcal{I}_{{\!\lambda}{\rm D}} - \mathcal{G}_{j} * \mathcal{I}_{{\!\lambda}{\rm D}} \,\,(j{\,=\,}1, 2,\dots, N_{\rm S}), \label{successive.unsharp.masking} \end{equation}$ (1)where ℐ_λD is the detection image (Sect. 2.1) at wavelength λ, ℐ_λDj are its “single-scale” decompositions, $\hbox{$\mathcal{G}_{j}$}$ are the smoothing Gaussian beams ( $\hbox{$\mathcal{G}_0$}$ is a two-dimensional delta-function). The latter have FWHM sizes S_j = f_S S_j − 1 in the range 2 Δ ≲ S_j ≲ S_max, where Δ is the pixel size, f_S > 1 is the scale factor, S_max is the maximum spatial scale considered, and the number of scales N_S depends on the value of f_S (typically ≈ 1.05) and S_max. We adopt $S_{\max} = 3 \max {A_{λ}^{\max}}$ $\hbox{$S_{\rm max}{\,=\,}3\,\max\,\{A^{\rm max}_{\lambda}\}$}$ , where $A_{λ}^{\max}$ $\hbox{$A^{\rm max}_{\lambda}$}$ is the maximum FWHM sizes of sources to be extracted and an upper limit for S_max is the image size⁷.

Equation (1) implicitly assumes that the convolved images are properly rescaled to conserve their total flux; therefore, the original image can be recovered by summing up all scales: $ℐ_{λ D} = \sum_{j = 1}^{N_{S}} ℐ_{λ D j} + 𝒢_{N_{S}} * ℐ_{λ D} .$ $\begin{equation} \mathcal{I}_{{\!\lambda}{\rm D}} = \sum\limits_{j=1}^{N_{\rm S}}\mathcal{I}_{{\!\lambda}{\rm D}{j}} + \mathcal{G}_{N_{\rm S}} * \mathcal{I}_{{\!\lambda}{\rm D}}.\label{recovered.original.image} \end{equation}$ (2)Before convolution, the input images ℐ_λD are expanded from the edges of the areas covered by the observational masks ℳ_λ towards the image edges and the entire images are further expanded on all sides by a large enough number of pixels (3 S_j/Δ) in order to avoid undesirable border effects. Both expansions are performed using the pixel values at the edges of the masks and images, respectively; after convolution, the images are reduced back to their original size.

Small values of f_S ensure the best spatial resolution of the single scales, just like fine mesh sizes always better resolve structures in numerical methods. For practical reasons, the minimum value of f_S is 1.03 and the maximum value of N_S is 99. For large f_S, the single scales actually contain mixture of a wide range of scales, and faint small-scale structures become completely diluted by the contribution of irrelevant scales. Again, this is very similar to any finite-difference numerical methods, where structures smaller than a few grid zones disappear within large structures resolved by coarse grids⁸.

Fig. 3

Single-scale removal of noise and background (Sect. 2.3). The field of Fig. 2 is shown as the full clean image ℐ_λD C at 350 μm (upper left) that accumulates clean images over all scales. The same set of spatial scales is displayed in the single-scale images ℐ_λDj C (left to right, top to bottom), cleaned of noise and background with an iterative procedure described in Sect. 2.3. All intensity peaks visible in the scales belong to the sources; most of the noise and background fluctuations (visible in Fig. 2) have been removed. The scale sizes S_j are visualized by the yellow-black circles and annotated at the bottom of the panels. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a function of the square root of intensity in MJy/sr.

To illustrate the spatial decomposition and all other processing steps of getsources, we shall use images of a simulated star-forming region that we constructed well before the launch of Herschel in order to have a reasonably realistic model for testing various aspects of our future observational program (the source extraction methods, instruments simulators, etc.); we refer to Appendix C for more details. The spatial decomposition of images using convolution has a clear interpretation in terms of the Fourier transform. Interested readers are referred to Appendix D, where we present the Fourier amplitudes for the individual components of the simulated sky and for a few selected scales of the decomposed images, as well as examples from the actual Herschel observations⁹.

A sub-field of the simulated region at 350 μm (Fig. 2, upper left) clearly shows all ingredients: the cirrus background, protostars (bright compact peaks), and fainter starless cores of various sizes, from completely unresolved to very extended. Many sources vanish into the background and also many sources are blended with others. However, the single-scale decompositions filter out emission at all irrelevant scales and display the sources with a much higher contrast than the full image does. The decomposition of Eq. (1) thus naturally selects sources of specific sizes, which become best visible in the images with matching scales. The negative rings around bright sources are the direct consequence of the successive unsharp masking, the subtraction of an image convolution with a larger smoothing beam from an image convolved with a smaller beam. All peaks at the first four scales shown in Fig. 2 are identifiable with the corresponding peaks in the full image. However, the situation becomes more problematic as we proceed to larger scales, such as that displayed in the last panel of Fig. 2. For even larger scales, up to the entire image size, intensities from sources of all sizes become so heavily mixed and diluted in the large smoothing beams that one cannot disentangle the individual structures anymore.

This demonstrates the need to set a reasonable upper limit for the sources to be extracted¹⁰. Indeed, if huge structures were to be allowed (with sizes by orders of magnitude larger than O_λ), they would also have to be used in the background subtraction and deblending, both of which become increasingly inaccurate on very large scales. For accurate removal of the background, one has to either approximate or interpolate it; both approaches become highly uncertain when large distances are involved. Deblending of overlapping structures requires a good approximation of their intensity distributions, which also becomes inaccurate on very large scales. This would also greatly reduce the quality of the detection and measurements for the majority of “normal” sources. Many of the latter would be fully contained within the much larger (sub-structured) sources and to accurately measure them, one has to consider the large sources as their background¹¹.

Fig. 4

Single-scale cleaning residuals (Sect. 2.3). The field of Fig. 2 is shown as the reconstructed image ℐ_λD R of the residuals at 350 μm (upper left) that accumulates cleaning residuals from all scales. The same set of spatial scales is displayed in the single-scale images of the residuals ℐ_λDj R (left to right, top to bottom). The cleaning procedure left no significant intensity peaks of the simulated objects in the residuals, only the noise- and background-dominated pixels (cf. Fig. 7). The scale sizes S_j are visualized by the yellow-black circles and annotated at the bottom of the panels. The color coding is a linear function of intensity in MJy/sr.

Fig. 5

Using skewness and kurtosis for iterating accurate thresholds ϖ_λj in the cleaning process (Sect. 2.3). The upper panels show histograms for the full image ℐ_λD at 350 μm (left) and for the decomposed images ℐ_λDj at the 9′′ and 36′′ scales (middle, right) before cleaning. The lower panels display histograms for the reconstructed residuals ℐ_λD R (left) and for the residuals ℐ_λDj R of the two decomposed images of the upper panels (middle, right). The vertical lines in the left panels indicate the standard deviation σ_λ (short dash, green) and 6 σ_λ (long dash, magenta) computed in the full image ℐ_λD. In the other panels they indicate the converged values of the standard deviation σ_λj and 6 σ_λj in the single-scale images ℐ_λDj, as well as the final thresholds ϖ_λj (solid, red). The histogram of the residuals ℐ_λD R of the cleaning process, reconstructed from all spatial scales (lower left) has much greater symmetry and resembles a Gaussian distribution, whereas the histogram of the full image ℐ_λD (red, copied from the upper-left panel) is highly asymmetric. Both $s_{λ}^{\max}$ $\hbox{$s^{\rm max}_{\lambda}$}$ and $k_{λ}^{\max}$ $\hbox{$k^{\,\rm max}_{\lambda}$}$ have a value of 3.17 and the corresponding variable factors n_λj have the values of 4.52 and 4.09 for the two single-scale images. The width of the intensity bins is 1 MJy/sr in the left panels and 0.004 MJy/sr in all other panels.

2.3. Cleaning single scales of noise and background

Before one can use the single-scale detection images ℐ_λDj for source extraction, they need to be cleaned of the contributions of noise and background to make sure that most (if not all) non-zero pixels belong to real sources. The noise and background fluctuations in the far-infrared and submillimeter images of interstellar clouds, such as those coming from the Herschel Gould Belt and HOBYS surveys, do not follow Gaussian statistics. A great advantage of the fine spatial decomposition employed by getsources (Sect. 2.2) is that the emission fluctuations in the decomposed images of interstellar clouds or Galactic cirrus become approximately Gaussian. Thus, significant departures from Gaussian distribution in the single-scale images indicate the presence of sources. For more details and illustrations in terms of the power spectra of the components of the actual Herschel images, we refer to Appendix E; see also Fig. 5 below.

Cleaning can be done by global intensity thresholding of the single-scale images, as the larger-scale background has been effectively filtered out by the spatial decomposition. Unlike the original images ℐ_λO or ℐ_λD that often have a very strong and highly-variable background, the entire single-scale images are “flat” in the sense that all signals on considerably larger scales have been removed (see Fig. 2). Another advantage of this single-scale cleaning is that the noise contribution depends very significantly on the scale. For example, the small-scale noise gets heavily diluted at large scales, where extended sources become best visible. In effect, in a reconstructed clean image ℐ_λD C = ∑ _jℐ_λDj C one can see large structures better (deeper) than in ℐ_λD.

To clean the single-scale images, we designed an iterative algorithm¹² that automatically finds at each scale a cut-off level that separates the signal of significant sources from those of the noise and background. At the first scale (j = 1) it computes the cut-off (threshold) ϖ_λj = n_λj σ_λj for the image ℐ_λDj ℳ_λ, where σ_λj is the standard deviation over the entire image and n_λj is a variable factor having an initial value of n_λ1 = 6 (this j = 1 value was found to be optimal in our tests). Then the procedure masks out all pixels with the values | I_λj | ≥ ϖ_λj and repeats the calculation of σ_λj over the remaining pixels, estimating a new threshold, which is generally lower than the one at the previous iteration. The procedure masks out bright pixels again and iterates further, always computing σ_λj at | I_λj | < ϖ_λj, outside the peaks and hollows, until ϖ_λj converges (Δϖ_λj < 1%). To produce a clean single-scale image ℐ_λDj C, all pixels with I_λj < ϖ_λj are zeroed, which (ideally) leaves non-zero only those pixels that belong to significant peaks from sources. Several examples of clean images are displayed in the last five panels of Fig. 3.

In addition, the low-intensity pixels | I_λj | < ϖ_λj define the single-scale images of the residuals ℐ_λDj R, as well as the reconstructed image of the residuals ℐ_λD R = ∑ _jℐ_λDj R. The images are shown in Fig. 4, where one can clearly see that the single-scale residuals are much “flatter” than those accumulated over all scales. This illustrates why the single-scale cleaning can be performed on the entire images using thresholding, in contrast to the full images (for more illustrations, see Figs. 2, 5). Furthermore, the flattening step of our method (Sect. 2.8) ensures that also the standard deviations of the single-scale residuals become as uniform as possible over the entire image.

Starting with the second scale, j = 2, the factor n_λj is allowed to become lower than its initial value of n_λ1 = 6 at j = 1, depending on the image and the scale. Being an important parameter for the iterations to converge to accurate cut-off levels, n_λj must be accurately chosen. Empirical evidence shows that if n_λj were smaller than an optimal value, the iterations would converge to ϖ_λj that is too low, resulting in noise peaks contaminating the clean images. On the other hand, if n_λj were larger than its optimal value, the iterations would converge to a value of ϖ_λj not deep enough, thus some fainter sources present in ℐ_λDj would be missing in the clean images ℐ_λDj C.

To determine the appropriate values of n_λj, one can use the higher-order statistical quantities, skewness and kurtosis $s_{λj} = μ_{3 λj} σ_{λj}^{-3}, k_{λj} = μ_{4 λj} σ_{λj}^{-4} - 3,$ $\begin{equation} s_{{\lambda}{j}}{\,=\,}\mu_{3{\lambda}{j}}\,\sigma_{{\lambda}{j}}^{-3},\,\,\, k_{{\lambda}{j}}{\,=\,}\mu_{4{\lambda}{j}}\,\sigma_{{\lambda}{j}}^{-4}-3, \label{skewness} \end{equation}$ (3)where μ_3λj and μ_4λj are the third and fourth moments about the mean (both s_λj and k_λj are zero for a standard normal distribution). The idea is that when the pixel distribution of the residuals ℐ_λDj R becomes too asymmetric (large | s_λj | ) or too peaked (large k_λj), the optimal value of n_λj must actually have been lower. Thus, having iterated the cut-offs ϖ_λ1 at the first scale, getsources computes the upper limits to s_λj and k_λj given by an empirical formula $s_{λ}^{\max} = k_{λ}^{\max} = \begin{matrix} \max \end{matrix} {2.14 \ln (\frac{I_{λ D 1}^{\max}}{σ_{λ 1}} + 220) - 11.3, 0.25},$ $\begin{equation} s^{\,\rm max}_{\lambda}{\,=\,}k^{\,\rm max}_{\lambda}{\,=\,}\max\left\{2.14\,\ln \left(\frac{I^{\,\rm max}_{{\!\lambda}{\rm D}{1}}}{\sigma_{{\lambda}{1}}}+220\right)-11.3,0.25\right\}, \label{nsigma.limits} \end{equation}$ (4)where $I_{λ D 1}^{\max}$ $\hbox{$I^{\,\rm max}_{{\!\lambda}{\rm D}{1}}$}$ is the maximum pixel intensity over ℐ_λD1 ℳ_λ.

When iterations at scale j converge to a threshold ϖ_λj, getsources computes s_λj and k_λj in the image of the residuals ℐ_λDj R. If $| s_{λj} | > s_{λ}^{\max}$ $\hbox{$|s_{{\lambda}{j}}|{\,>\,}s^{\,\rm max}_{\lambda}$}$ or $k_{λj} > k_{λ}^{\max}$ $\hbox{$k_{{\lambda}{j}}{\,>\,}k^{\,\rm max}_{\lambda}$}$ , the algorithm slightly (by a few percent) reduces n_λj, whose initial value is n_λj − 1 from the previous scale, and re-iterates ϖ_λj. This procedure ensures that s_λj and k_λj always stay within the empirical bounds in the process of obtaining the thresholds and cleaning the single-scale images. Extensive experimentation has shown that the limits of Eq. (4) work very well for all images that we have tested¹³.

Fig. 6

Two types of combined single-scale images (Sect. 2.4). Schematically shown are the image ℐ_Dj C (left) used to detect sources and track the evolution of their shapes and the image $ℐ_{D j C}^{'}$ $\hbox{$\mathcal{I}^{\prime}_{{\rm D}{j}{\,\rm C}}$}$ (right) used to determine the characteristic scales and initial footprints.

The pixel distributions shown in Fig. 5 illustrate the cleaning algorithm. The histogram for the original image ℐ_λD shown in the upper-left panel contains all spatial scales and is therefore very wide and asymmetric; it cannot be used to separate sources from noise and background using global thresholding. All scales are blended together in such images and any threshold would enable one to either find only the brightest peaks losing most fainter sources or create many spurious peaks from the pixels belonging to the variable background or noise. In contrast, the highly-filtered single-scale images ℐ_λDj contain only narrow ranges of spatial scales and thus the histograms of the residuals ℐ_λDj R (representing the background and noise fluctuations) are much narrower and symmetric, resembling a Gaussian distribution (Fig. 5). Having no signals from all irrelevant larger scales, the images are much “flatter” and therefore well suited for the cleaning algorithm. The single-scale histograms show that using the upper limits of Eq. (4) for skewness and kurtosis helps in correcting the variable factor n_λj and thus in getting deeper thresholds ϖ_λj and much better detection of fainter sources while avoiding creation of spurious sources.

2.4. Combining clean single scales over all wavelengths

The cleaning algorithm outlined in Sect. 2.3 is applied to the single-scale detection images ℐ_λDj independently in each waveband. Clearly, getsources also works the same way for just one image at a single wavelength. It is necessary to fully complete “monochromatic” extractions as they provide the footprints of all sources and also the estimates of the sizes $A_{λ}^{\max}$ $\hbox{$A^{\rm max}_{\lambda}$}$ of the largest sources, the information used by our flattening procedure (Sect. 2.8). Combining wavelengths means utilizing all information across all bands and this should be used to improve the detection and measurement qualities over the simple approach of matching separate catalogs obtained in each waveband independently. Whereas combining independent catalogs on the basis of the association radius is possible for a few images obtained with similar angular resolutions, this usual approach of associating sources between wavelengths would introduce large and unknown errors when applied to Herschel images whose resolutions differ by as much as a factor of ~7.

In general, it is impossible to combine images at different wavelengths in a meaningful way using full images containing signals on all spatial scales. The fine spatial decomposition employed by getsources (Sect. 2.2) that filters out signals from all irrelevant scales, together with the single-scale cleaning (Sect. 2.3) enable us to create wavelength-independent clean images that accumulate only significant intensity peaks from all wavelengths, representing potential sources. The combined images must be normalized because of highly varying intensities in different bands; there is no need to preserve the spectral behavior of sources in the images used only for detection (Sect. 2.5). Indeed, the wavelength-dependent properties of all sources will be measured in the observed images ℐ_λO (Sect. 2.6) after all sources have been detected.

The cleaning procedure described in Sect. 2.3 works well when the small-scale noise and background fluctuations are relatively uniform across the image and there are no strong artifacts. It is not unusual, however, for the observed images to contain quite variable noise and various types of artifacts, including those from the map-making process. In order to reduce possible contamination of the clean images with the pixels belonging to the noise peaks or artifacts, getsources additionally employs a lower limit on the number of pixels N_Πλ in a cluster of connected pixels that may remain in single-scale images: $N_{Π λ}^{\min} = \begin{matrix} \max \end{matrix} {\frac{1}{3} π {(\frac{O_{λ}}{2 Δ})}^{2}, N_{Π}^{\min}},$ $\begin{equation} N^{\,\rm min}_{{\Pi}{\lambda}} = \max \left\{ \frac{1}{3}\,\pi \left(\frac{O_{\lambda}}{2 \Delta}\right)^{2}\!, N^{\,\rm min}_{\Pi} \right\},\label{minpix} \end{equation}$ (5)where O_λ is the observational beam size, Δ is the pixel size, and the default value of the parameter $N_{Π}^{\min}$ $\hbox{$N^{\,\rm min}_{\Pi}$}$ is 4. Small clusters with $N_{Π λ} < N_{Π λ}^{\min}$ $\hbox{$N_{{\Pi}{\lambda}} < N^{\,\rm min}_{{\Pi}{\lambda}}$}$ are removed from the single-scale images ℐ_λDj C before the latter are combined.

Fig. 7

Combination of single-scale images (Sect. 2.4). The field of Fig. 2 is shown as source masks ℳ_DC accumulated over all scales and wavebands (upper left) that give a summary view of how the sources, made visible by the cleaning (cf. Fig. 3), change their shapes and sizes. The same set of spatial scales is displayed in the combined clean single-scale images ℐ_Dj C (left to right, top to bottom) that accumulate information at those scales from all wavelengths. For better visibility, the values displayed in the masks image are limited to 300 and in the normalized images ℐ_Dj C they are limited to 0.07, 0.3, 0.3, 0.6, and 0.6. The scale sizes S_j are visualized by yellow-black circles and annotated at the bottom of the panels. The color coding is a logarithmic function of intensity.

Fig. 8

Evolution of sources in the clean single-scale images (Sect. 2.5). The full image of the source at 350 μm with a size of 3 $\overset{'}{.}$ $\hbox{${\farcm}$}$ 7 × 3 $\overset{'}{.}$ $\hbox{${\farcm}$}$ 7 (left) has been cut out of the corresponding panel in Fig. 2 (the source is located south-east of the image center, it is resolved in all wavebands). The other panels show the source in single-scale images ℐ_λDj C at the scales of 18, 36, 138, and 199′′, maximum intensities in the panels being 0.31, 0.89, 3.09, and 1.84 MJy/sr, respectively. The scale sizes S_j are visualized by the blue circles and annotated at the bottom of the panels. The color coding is a linear function of intensity.

When combining single scales from different wavebands, getsources only sums up limited ranges of spatial scales, depending on the angular resolution and the sizes of sources that can be found in the images. The range of scales is limited from below because the smallest scales may contain decreasing (and progressively noisier) contribution for the images obtained at longer wavelengths with poorer resolutions. Single scales within a factor of 3 below the nominal resolution in each band may still contain considerable signal from the sources. Accordingly, the lower limit of scales S_j being combined defaults to max{S₁,0.33 O_λ}, allowing for a substantial “super-resolution” in getsources. On large scales, the range of scales is also limited by the (wavelength-dependent) maximum sizes $A_{λ}^{\max}$ $\hbox{$A^{\rm max}_{\lambda}$}$ of sources (see Sect. 2.2). Initial guesses for $A_{λ}^{\max}$ $\hbox{$A^{\rm max}_{\lambda}$}$ are given as input to getsources, however, they are accurately determined during the initial extraction and used in the final extraction (Sect. 2.8).

Fig. 9

Single-scale source detection (Sect. 2.5). The field of Fig. 2 is shown as the initial footprint image ℱ_D after the source detection (upper left). The same set of spatial scales is displayed in the single-scale segmentation images ℐ_Dj S (left to right, top to bottom) showing the source segmentation masks determined from ℐ_Dj C (Fig. 7). The masks were obtained and analyzed by the detection procedure described in Sect. 2.5. The scale sizes S_j are visualized by the yellow-black circles and annotated at the bottom of the panels. The color coding is a function of the square root of the source number (which makes up the actual pixel values in these segmentation images).

It is necessary to define two sets of wavelength-combined single-scale images (illustrated in Fig. 6) for different purposes. The images ℐ_Dj C are used to detect sources and track the evolution of their shapes across all spatial scales (see Sect. 2.5) and the images $ℐ_{D j C}^{'}$ $\hbox{$\mathcal{I}^{\prime}_{{\rm D}{j}{\,\rm C}}$}$ are used to follow the dependence of the peak intensities of detected sources on scales. The first set of combined detection images is normalized such that all cleaning thresholds become equal to 1 in each band: $ℐ_{D j C} = \frac{1}{N} \sum_{λ} \frac{f_{λj}}{ϖ_{λj}} \begin{matrix} \max \end{matrix} {ℐ_{λ D j C}, 𝒯_{λ j}},$ $\begin{equation} \mathcal{I}_{{\rm D}{j}{\,\rm C}}{\,=\,}\frac{1}{N}\sum_{\lambda}\frac{f_{{\lambda}{j}}\,}{\varpi_{{\lambda}{j}}} \max \left\{ \mathcal{I}_{{\!\lambda}{\rm D}{j}{\,\rm C}}, \mathcal{T}_{{\!\lambda}{j}} \right\}, \label{combining.waves.1} \end{equation}$ (6)where $\hbox{$\mathcal{T}_{{\!\lambda}{j}}$}$ is the threshold image (all pixels of which are equal to ϖ_λj), N is the number of wavelengths, and f_λj gradually “turns on” the scales smaller than the observational beam O_λ (for larger scales, f_λj = 1): $f_{λj} = {(\frac{S_{j}}{O_{λ}})}^{3}, \begin{matrix} \max \end{matrix} {S_{1}, 0.33 O_{λ}} \leq S_{j} \leq O_{λ} .$ $\begin{equation} f_{{\lambda}{j}}{\,=\,}\left(\frac{S_{\!j}}{O_{\lambda}}\right)^3, \,\,\, \max \left\{S_{\!1}, 0.33\,O_{\lambda}\right\}{\,\le\,}S_{\!j}{\,\le\,}O_{\lambda}. \label{normalization} \end{equation}$ (7)The renormalization of images, the threshold ϖ_λj in the curly brackets, and the turn-on factor f_λj ensure that the images from different wavebands become smoothly combined in ℐ_Dj C (Eq. (6)) with no discontinuities and that there are no large changes in the combined images between any two adjacent scales (Fig. 6, left).

The normalization of ℐ_λDj C to a common threshold before summing them up is the most natural way of combining wavelengths to maximize sensitivity. It removes, however, the natural dependence of the source brightness on scales, which is used by our detection algorithm to determine the characteristic scale and initial footprint for each source (Sect. 2.5). Our second set of combined images allows that, as they are normalized only at the smallest scale (there is no reason to use the factor 1/N here): $ℐ_{D j C}^{'} = \sum_{λ} \frac{w_{λ}}{I_{λ D 1 C}^{\max}} ℐ_{λ D j C},$ $\begin{equation} \mathcal{I}^{\prime}_{{\rm D}{j}{\,\rm C}}{\,=\,}\sum_{\lambda} \frac{w_{\lambda}}{I^{\,\rm max}_{{\!\lambda}{\rm D}{1}{\,\rm C}}}\,\mathcal{I}_{{\!\lambda}{\rm D}{j}{\,\rm C}}, \label{combining.waves.2} \end{equation}$ (8)where $I_{λ D j C}^{\max}$ $\hbox{$I^{\,\rm max}_{{\!\lambda}{\rm D}{j}{\,\rm C}}$}$ is the maximum intensity over ℐ_λDj C and the weight w_λ enhances contributions of the images with higher angular resolutions: $w_{λ} = {(\frac{O̅}{O_{λ}})}^{γ},$ $\begin{equation} w_{\lambda}=\left(\frac{\bar{O}}{O_{\lambda}}\right)^{\gamma}, \label{weighting} \end{equation}$ (9)where $\hbox{$\bar{O} = N^{-1} \sum_{\lambda} O_{\lambda}$}$ is the average observational beam size and γ > 1 is a weighting exponent with a default value of 6. The aim of such weighting is to effectively separate the contributions of different wavebands over the intensities when computing $ℐ_{D j C}^{'}$ $\hbox{$\mathcal{I}^{\prime}_{{\rm D}{j}{\,\rm C}}$}$ (Eq. (8)). After the weighting, the summation of ℐ_λDj C practically does not alter their individual intensity profiles (Fig. 6, right). If a source exists in some high-resolution single-scale images at a certain wavelength, the dependence of its peak intensity across scales is fully preserved in $ℐ_{D j C}^{'}$ $\hbox{$\mathcal{I}^{\prime}_{{\rm D}{j}{\,\rm C}}$}$ . In the detection process, the source position, characteristic size, and footprint are largely determined by the high-resolution images and the spatial scale where it becomes the brightest (Sect. 2.5). The initial size and footprint obtained in the detection step are recomputed during the measurement iterations (Sect. 2.6).

Figure 7 illustrates our method of combining wavelengths. The upper-left panel shows an image of single-scale masks accumulated over scales and all Herschel wavebands, obtained from the clean single-scale images: $ℳ_{DC} = \sum_{λ} \sum_{j} \frac{1}{ϖ_{λj}} \begin{matrix} \min \end{matrix} {ℐ_{λ D j C}, 𝒯_{λ j}} = \sum_{λ} \sum_{j} \frac{ℐ_{λ D j C}}{ℐ_{λ D j C}},$ $\begin{equation} \mathcal{M}_{{\rm D}{\rm C}}{\,=\,}\sum_{\lambda}\sum_{j} \frac{1}{\varpi_{{\lambda}{j}}}\min\,\left\{\mathcal{I}_{{\!\lambda}{\rm D}{j}{\,\rm C}},\mathcal{T}_{{\!\lambda}{j}}\right\} = \sum_{\lambda}\sum_{j} \frac{\mathcal{I}_{{\!\lambda}{\rm D}{j}{\,\rm C}}}{\mathcal{I}_{{\!\lambda}{\rm D}{j}{\,\rm C}}}, \label{accumulated.masks} \end{equation}$ (10)where $\hbox{$\mathcal{T}_{{\!\lambda}{j}}$}$ is the threshold image and, of course, the division of the image by itself can only be done in non-zero pixels. The accumulated mask image presents a cumulative view of how the sources made visible by cleaning change their shape and size across all scales and wavelengths. All sources that can possibly be found in the clean images have left their mark in ℳ_DC. The other panels of Fig. 7 display the combined detection images ℐ_Dj C for several single scales, which accumulate information from all wavebands. Comparing them with the corresponding panels of Fig. 3, one can verify that each combined scale is indeed populated by more sources than any of the clean single scales at individual wavelengths.

2.5. Detecting sources in combined single-scale images

As can be seen in Figs. 3 and 7, sources “evolve” from small to large scales in single-scale images. In the clean images ℐ_Dj C, the sources appear at some relatively small scales (smaller than their actual size), become the brightest at a scale roughly equal to their size, and eventually vanish at significantly larger scales (see Fig. 8 for an illustration). The idea of our source detection scheme is to analyze all ℐ_Dj C (j = 1,...,N_S), identifying the sources and tracking the “evolution” of their shapes from small to large scales. For that purpose, getsources employs the Tint Fill algorithm (Smith 1979)¹⁴, developed for coloring arbitrary shapes in digital images. The algorithm assumes that the sets of pixels belonging to any shape are 4-connected, i.e. that for any pair of pixels Π_l and Π_m in the shape, there is a path from Π_l to Π_m through pixels in that shape, such that neighboring pixels in the path are connected to each other only by their sides (Fig. 10). Given a set of 4-connected (side-connected) pixels, each one having the same property (e.g., color), the algorithm fills all (and only) pixels of that shape with a new value¹⁵. The algorithm has been slightly generalized to be suitable for the source detection in single-scale images by replacing color with another pixel property of having a non-zero intensity.

The modified version of the Tint Fill algorithm looks, at each scale (j = 1,2,...,N_S), for 4-connected areas of non-zero pixels in ℐ_Dj C and assigns the value of the running source number i to each of those connected pixels, producing a segmentation image ℐ_Dj S. The latter consists of the segmentation masks of the sources found at scale j and previous smaller scales (getsources always analyzes single-scale images from small to large scales). A source mask is the area of 4-connected pixels (with values i) in ℐ_Dj S, all those pixels having non-zero intensity in ℐ_Dj C at j or at smaller scales. The masks uniquely identify the sources and they allow tracking them across all single scales (Fig. 9).

In order to better disentangle and follow the evolution of blended sources in ℐ_Dj C, the algorithm splits the images between their maximum and ϖ_j by a number of intensity levels ω_j l with a spacing δlnω_j l = 0.1 or smaller. At each scale j, our source detection procedure works on a sequence of “partial” images ℐ_{Dj C l} = max { ℐ_Dj C,ω_j l } , from top to the bottom, where the last partial image is the entire single-scale image ℐ_Dj C. For better efficiency, we only consider those levels ω_j l that increase the number of pixels in ℐ_{Dj C l} with respect to ℐ_{Dj C l − 1} by at least $δ N_{Π j}^{\min} = \begin{matrix} \max \end{matrix} \begin{matrix} ⎧ \\ ⎪ \\ ⎪ \\ ⎨ \\ ⎪ \\ ⎪ \\ ⎩ \end{matrix} \frac{1}{3} π {(\frac{S_{j}}{2 Δ})}^{2},δ N_{Π}^{\min} \begin{matrix} ⎫ \\ ⎪ \\ ⎪ \\ ⎬ \\ ⎪ \\ ⎪ \\ ⎭ \end{matrix},$ $\begin{equation} {\delta}N^{\,\rm min}_{{\Pi}{j}} = \max \left\{ \frac{1}{3}\,\pi \left(\frac{S_{\!j}}{2 \Delta}\right)^{2}\!, {\delta}N^{\,\rm min}_{\Pi} \right\}, \label{minpix.level} \end{equation}$ (11)where Δ is the pixel size and $δ N_{Π}^{\min} =$ $\hbox{${\delta}N^{\,\rm min}_{\Pi}{\,=\,}$}$ 9. The levels ω_j l with $δ N_{Π j} < δ N_{Π j}^{\min}$ $\hbox{${\delta}N_{{\Pi}{j}}{\,<\,}{\delta}N^{\,\rm min}_{{\Pi}{j}}$}$ are skipped by the detection algorithm. Processing a partial image ℐ_{Dj C l} within scale j, getsources first gives the coordinates of the sources already found at previous scales and levels to the modified Tint Fill algorithm and the latter fills the evolved shapes of the “old” sources with their numbers i. Then it checks all remaining pixels of ℐ_{Dj C l} to find new 4-connected areas of non-zero pixels that first appear at the current scale j; when found, the segmentation mask of each “new” source is filled with its new number i.

When an isolated source vanishes at a certain scale, its pixels are freed and may eventually become occupied by its neighbor or another (significantly larger) source that may appear there at a larger scale. One of two touching sources can also disappear from larger scales, when it becomes fainter at progressively larger scales and finally merges with the brighter neighbor (i.e., when the saddle point between the peaks disappears). When two or more isolated sources become connected in a single-scale image, their segmentation masks are not allowed to overlap. A boundary between two sources is maintained along the normal to the straight line connecting their centers (the normal being defined at a position along the line where intensity is at minimum); the sources can still grow at larger scales in the perpendicular direction. This boundary exists, however, only within a limited range of scales, until one of the touching sources vanishes or merges with another. The algorithm is perfectly able to handle hierarchical structures, as the segmentation masks can overlap at different scales for sources of significantly dissimilar dimensions. Whether a larger source containing smaller sources is detected as such or considered as the background, depends on the relative difference in the characteristic scales between the large and small sources, on the location of the small sources within the larger structure, and on the relative brightness of the peaks. If the source sizes are significantly different, the hierarchical structures are detected; a number of examples can easily be found in Fig. 9 (upper-left), Fig. 17 (right), and Fig. 18 (bottom).

Fig. 10

Topology of the pixel connections (Sect. 2.5). Pixels of the red shape are 4-connected and thus the shape may represent a source. Pixels of the dark blue shape are not 4-connected: there are no paths connecting any pair of the blue pixels such that any neighboring pixels in the paths are connected to each other only by their sides. Note, however, that there are two 4-connected sub-areas in the blue shape, which could represent sources.

One can show that a resolved isolated circular source i with the FWHM sizes A_i = B_i would have its maximum peak intensity I_ij in a single-scale image with a smoothing beam S_j ≈ A_i. Recall that the spatial decomposition (Eq. (1)) is based on convolution; the latter acts as a natural selector of scales in the decomposed images (cf. Sect. 2.2). Indeed, convolving with small beams (S_j ≪ A_i) would have almost no effect on the source, whereas using extended beams (S_j ≫ A_i) would greatly dilute the source. In both these extremes, sucessive unsharp masking produces decreasing peak intensities (Fig. 8) while creating the strongest signal for the sources with sizes A_i ≈ S_j; completely unresolved sources are the brightest at spatial scales S_j ≲ O_λ.

The scale j_F, where a source is the brightest, provides the best initial estimate for its actual FWHM size A_i ( = B_i) = S_{j_F}. This footprinting scale defines a source footprint, i.e. the entire area of the image pixels making non-negligible contribution to the total flux of the source. For unresolved sources, the footprints cover circular areas ≈ $π O_{λ}^{2}$ $\hbox{${\pi} O_{\lambda}^2$}$ , whereas for well-resolved sources with intrinsically Gaussian intensity distributions, they cover elliptical areas ≈ πA_iB_i. During the detection process, getsources creates initial footprints (cf. Fig. 9, upper-left) with full sizes of A_i F ( = B_i F) = 1.15 (2 S_{j_F}) = 2.3 S_{j_F}, where the empirical factor 1.15 was found to improve results in our tests. The footprints generally become elliptically-shaped in the measurement iterations (Fig. 15), reflecting the source shapes obtained from intensity moments (Sect. 2.6).

In a multi-wavelength extraction, the combined images are sums of rescaled images over all individual wavebands (Eq. (8)) using the weighting (Eq. (9)) that effectively separates ℐ_λDj C in the combined detection images $ℐ_{D j C}^{'}$ $\hbox{$\mathcal{I}^{\prime}_{{\rm D}{j}{\,\rm C}}$}$ . Although the weighting biases S_{j_F} towards higher-resolution images (makes the initial footprints smaller), this does not affect the results, as the sizes found in the detection process are replaced with the values obtained from the monochromatic images ℐ_λDj C during the measurement iterations (Sect. 2.6).

Coordinates of a source are computed from the moments of intensities (Appendix F) in a limited range of scales, only up to a scale 4 times larger than the one it first appeared at or up to the footprinting scale j_F, whichever is smaller. This gives more accurate positions than if recomputed at even larger scales, since the latter tend to mix in more and more of the signals from the surroundings and thus distort the single-scale intensity distribution of sources. To improve the positional accuracy even further, getsources uses only the pixels with values greater than I_ij/1.4, where I_ij is the peak intensity of a source i and the number in the denominator has been found empirically.

In general, observed images contain structures at all scales and there is a real danger of creating spurious sources by mistakenly detecting noise peaks or background fluctuations that happen to lie on top of larger structures. Although the latter may be relatively faint, they tend to “amplify” the small-scale noise and background and make the small structures appear as significant sources. This can be especially problematic if the local uncertainties of the peak intensities (Sect. 2.6) are not possible to estimate due to crowding, as the fluctuations outside the large structures may be noticeably smaller and thus the signal-to-noise ratio (S/N) may easily be overestimated.

There is a mechanism in getsources to greatly reduce this possibility. The idea is that small-scale peaks on top of larger structures tend to survive up to larger scales than they would do otherwise, without the “support” of the underlying structures. The latter tend to make the peaks eventually touch each other or merge at some spatial scale and this is used by the algorithm to identify and discard insignificant intensity peaks. Noise peaks on top of larger structures become diluted if considered relative to the intensity level contributed by the larger structure.

Fig. 11

Measurement iterations (Sect. 2.6). The processing block of measurements from Fig. 1 is shown at the top. It is sub-divided here in 5 algorithmic steps that are executed in iterations, until the footprint images ℱ_λ have converged, i.e. stable distributions of source footprints at each wavelength have been obtained. The deblending block itself includes iterations to disentangle contributions of many overlapping sources to the intensity of a pixel that belongs to all of them.

When getsources analyzes the intensity levels ω_j l of the partial images ℐ_{Dj C l}, it finds the level where two or more sources first touch each other and computes the contrast C_i = I_ij/ω_j l, where I_ij is the source maximum intensity in the image. The contrast of touching sources is checked to decide whether the sources should be treated as significant ones. The basis for this decision is the fact that when an insignificant source touches another source (either real or spurious), its contrast becomes quite low due to the underlying intensity of a larger structure. In practice, getsources requires that any real source must have its contrast C_i above the minimum value $C_{i}^{\min} = η \begin{matrix} \max \end{matrix} {C_{i A} C_{i E}, 1},$ $\begin{equation} C^{\,\rm min}_{i}{\,=\,}\eta \max \left\{ C_{i\,\rm A}{\,}C_{i\,\rm E}, 1 \right\}, \label{minimum.contrast} \end{equation}$ (12)where η is the configuration parameter of getsources (with the default value of 1.35), C_i A is the amplification factor, and C_i E is the elongation factor: $C_{i A} = \begin{matrix} \max \end{matrix} {\frac{1}{2} (\frac{F_{i lo}}{F_{i hi}}), 1}, C_{i E} = \begin{matrix} \max \end{matrix} {\frac{1}{2.5} (\frac{a_{i}}{b_{i}}), 1},$ $\begin{equation} C_{i\,\rm A}{\,=\,}\max \left\{ \frac{1}{2} \left(\frac{F_{i\,\rm lo}}{F_{i\,\rm hi}}\right), 1 \right\}, \,\, C_{i\,\rm E}{\,=\,}\max \left\{ \frac{1}{2.5} \left(\frac{a_{i}}{b_{i}}\right), 1 \right\}, \label{amplification.elongation} \end{equation}$ (13)where F_i lo is the flux integrated over the source segmentation mask below the current intensity level ω_j l (at which the source first touches another source) and F_i hi is the flux integrated above that level; a_i and b_i are the major and minor lengths of the source segmentation mask. These factors describe two aspects of the behavior of noise peaks on top of larger structures: the amplification factor increases with a stronger contribution of the underlying structure, whereas the elongation factor becomes large when the structure has filamentary shape. For C_i A C_i E ≤ 1, the source is considered real if its contrast C_i ≥ η (Eq. (12)); larger values of the product C_i A C_i E raise the “barrier” higher. For $C_{i} < C_{i}^{\min}$ $\hbox{$C_{i} < C^{\,\rm min}_{i}$}$ the current source is flagged as spurious and its segmentation pixels are freed to be used by more prominent sources. If some or all of the touching peaks have contrasts above $C_{i}^{\min}$ $\hbox{$C^{\,\rm min}_{i}$}$ , they survive the test and their evolution is followed further to larger scales.

Fig. 12

Background interpolation scheme (Sect. 2.6). The central red pixels belong to the source, defining its footprint in this example, whereas the surrounding blue pixels are those of the background. At each pixel of the source, the background is linearly interpolated in the four main directions based on the values of the pixels just outside the footprint (highlighted in darker blue), the resulting four values per pixel being averaged. Such interpolation probes the actual background variations around the sources, and thus the interpolated background is more complex and realistic than just a planar surface.

Having detected all sources in the combined images ℐ_Dj C over all spatial scales of interest and collected the information in a detection catalog, getsources now returns to the actual observed images ℐ_λO in each waveband for measurements of the basic properties of the sources, such as their peak intensity (flux), total (integrated) flux, and size.

2.6. Measuring and cataloging properties of sources

All measurements are performed in the observed images ℐ_λO for known positions (x_i,y_i) of all sources, obtained in the detection process (Sect. 2.5) and not recomputed anymore¹⁶. The measurements must be done together with the background subtraction, deblending, and improvements of the footprints; these are non-trivial interconnected problems that require iterations (Fig. 11).

Fig. 13

Deblending overlapping sources (Sect. 2.6). Three identical sources (A, B, C) with the same sizes and with peak intensities normalized to unity are overlapping with their footprints. For clarity of the figure, the sources are not shown with their individual profiles, but rather with their blended intensity distribution ΣABC; with noise added, the profile transforms into ΣABCn. The deblending profiles A_M, B_M, and C_M (Eq. (14)) are defined at the source positions by the sizes and peak intensities estimated from the background-subtracted image ℐ_λO BS. At each pixel, the deblending algorithm splits its intensity I_ΣABCn between each of the overlapping sources, according to the fraction of the shape’s intensity (Eq. (15)). The resulting deblended profiles of each source, A_D, B_D, and C_D, are shown in the right panel.

Fig. 14

Deblending overlapping sources (Sect. 2.6). Background-subtracted overlapping sources at 350 μm (left) are separated into two individual sources, cataloged under the numbers 244 and 242 (middle, right). This blended pair is clearly visible half-way north from the field centers in Figs. 2, 3, 7, 9, 17, 18; the annulus around the source 244 is highlighted in Fig. 16. Comparison with the known true model parameters shows that the peak intensities measured for these deblended sources are in error by − 1.1% and − 5.1% and the total fluxes were calculated with errors of − 0.5% and − 12%, respectively. The color coding is a linear function of intensity in MJy/sr.

To properly measure parameters of a source, one has first to determine and subtract the background. As discussed in Sect. 1, we define sources as significant intensity peaks detected by the algorithm and whose entire contribution to the image is bound by their footprints. Based on this definition, getsources determines the background by linearly interpolating pixel intensities in ℐ_λO under the source footprints (Figs. 12, 15). The interpolation is done in the four main directions (two image axes and two diagonals), based on the pixels just outside the footprints, which do not belong to any source. For each pixel, values from the 4 directions are averaged to produce the background intensity at the pixel. This procedure results in the image of clean background ℐ_λO CB and the background-subtracted image ℐ_λO BS = ℐ_λO − ℐ_λO CB (Figs. 14, 17). In very crowded areas of images with many overlapping sources it is not possible to probe the background around each of the sources. In this case, the interpolation gives the best possible background estimate based on the nearest background pixels available around the blended areas¹⁷.

Fig. 15

Converged footprints of the measured sources (Sect. 2.6). The field of Fig. 2 is shown at 70, 100, 160, 250, 350, 500 μm (left to right, top to bottom) as the footprints of all detected sources after the measurement iterations. The pixel values are the standard deviations σ_iλ P, due to the local noise and background variations, estimated for each source in an elliptical annulus around its footprint (Fig. 16). Strongly elliptical or too large footprints may appear at those wavelengths where some sources are too faint to be measurable. For such sources, the information is essentially lost and the intensity moments cannot provide meaningful estimates of their sizes and orientation. The color coding is a function of the square root of intensity in MJy/sr.

Fig. 16

Annuli around measured sources (Sect. 2.6). The field of Fig. 2 is shown as the true intensities of the model sources at 350 μm convolved to a 17′′ resolution (left), the image of annuli $\hbox{$\mathcal{A}_{\lambda}$}$ of all detected sources (middle) slightly modified to highlight the annulus area around the source 244 from Fig. 14, and the product $\hbox{$\mathcal{A}_{\lambda}\,\mathcal{I}_{{\!\lambda}{\rm O}}$}$ (right) to visuaize the actual observed intensities used to compute the flux uncertainties σ_iλ P shown in Fig. 15. The corresponding footprint images ℱ_λ are presented in Fig. 15 and the observed image ℐ_λO is displayed in Fig. 18. The color coding in the left panel is a function of the square root of intensity, in the other panels it is a linear function of intensity in MJy/sr.

Having created the background-subtracted images ℐ_λO BS, the algorithm deblends values of pixels in overlapping sources and computes peak intensities F_iλ P, and total fluxes F_iλ T for each source i. At the first measurement iteration, it uses the initial size estimate S_{j_F} obtained during the detection (Sect. 2.5), whereas in the subsequent iterations, the size and orientation from a previous iteration are used. Our iterative algorithm employs deblending shapes, the two-dimensional analogs of the function (Moffat 1969) $I_{iλ M} = F_{iλ P} {(1 + (r / R_{0})^{2})}^{- ζ},$ $\begin{equation} I_{{i}{\lambda}{\,\rm M}} = F_{{i}{\lambda}{\,\rm P}}\,\left(1+(r/R_0)^2\right)^{-\zeta}, \label{moffat.function} \end{equation}$ (14)where r is the radial distance from the peak and R₀ is a function of the FWHM shape (A_iλ, B_iλ, Θ_iλ) of a source i. The power-law exponent is fixed at ζ = 10 to have stronger, more realistic wings compared to the exponential wings of a Gaussian (ζ → ∞)¹⁸. The deblending shapes (Eq. (14)) are used to split the intensity I_λO BS of a pixel between the source i and all overlapping sources according to a fraction of the profiles’ intensities at that pixel: $I_{i λ} = \frac{| I_{iλ M} |}{\sum_{k} | I_{kλ M} |} I_{λ O BS},$ $\begin{equation} I_{i{\,\lambda}}{\,=\,}\frac{|I_{{i}{\lambda}{\,\rm M}}|}{\sum\limits_{k} |I_{{k}{\lambda}{\,\rm M}}|}\,I_{{\!\lambda}{\rm O}{\,\rm BS}}, \label{deblending.formula} \end{equation}$ (15)where the summation is done over all sources whose footprints contain the pixel (Figs. 13, 14). The deblending iterations start with the original (blended) intensities I_λO BS at the positions of each source and perform the splitting of the pixel values until the intensity I_iλ(x_i,y_i) at the center of each source converges to its deblended peak intensity F_iλ P. The deblended intensities I_iλ within the footprint ellipses (Fig. 15) are then used to integrate the total fluxes F_iλ T, as well as their FWHM shapes (A_iλ, B_iλ, Θ_iλ) from the intensity moments (Appendix F).

Local uncertainties of the peak intensities F_iλ P are given by the standard deviations σ_iλ P estimated in the observed images ℐ_λO in an elliptical annulus defined around each source i just outside its footprint¹⁹. To ensure that the uncertainties are statistically meaningful, the images of annuli $\hbox{$\mathcal{A}_{\lambda}$}$ (Fig. 16) are constructed by requiring that the area of any annulus must contain 50 areas of the observational beam O_λ. In the crowded fields, such as the one used for the illustrations in this paper, the footprints and annuli may overlap quite heavily (Figs. 15, 16) and therefore not all pixels can be used, as many of them belong to other sources. In such cases, getsources obtains an estimate of σ_iλ P as local as possible by expanding the outer edge of the annulus outwards, until its usable area of non-zero pixels becomes 50 O_λ (Fig. 16, middle).

Fig. 17

Background-subtracted sources (Sect. 2.6). The field of Fig. 2 is shown as the true intensities of the model sources at 350 μm convolved to a 17′′ resolution (left), the background-subtracted image ℐ_λO BS at 350 μm (middle), and the composite 3-color RGB image (500, 250, 160 μm) created using the images of the deblending shapes of each extracted source (right). For the true model intensity distribution (no background) to be more comparable to the background-subtracted image, it is shown above 5 MJy/sr. The color coding is a function of the square root of intensity in MJy/sr.

Uncertainties σ_iλ T of the total fluxes F_iλ T are estimated under the following assumptions. If a source footprint contains N_B observational beams O_λ, then the error of the sum of intensities over the footprint area will be the square root of the quadratic sum of the individual errors, since the beam measurements are statistically independent. Assuming the individual errors to be identical and equal to σ_iλ P, the total flux uncertainty is $σ_{iλ T} = σ_{iλ P} \frac{(A_{i F λ} B_{i F λ})^{1 / 2}}{1.15 (2 O_{λ})},$ $\begin{equation} \sigma_{{i}{\lambda}{\,\rm T}}{\,=\,}\sigma_{{i}{\lambda}{\,\rm P}}\, \frac{(A_{{i}{\,\rm F}{\lambda}}\,B_{{i}{\,\rm F}{\lambda}})^{1/2}}{1.15\,(2\,O_{\lambda})},\label{total.flux.error} \end{equation}$ (16)where A_i Fλ and B_i Fλ are the major and minor axes of the footprint ellipse (cf. Eq. (20)), O_λ is assumed to be a circular Gaussian, and the empirical factor 1.15 has been introduced in Sect. 2.5.

With the peak intensities and their uncertainties estimated for each source, one can define the standard S/N ratio Ω_iλ = F_iλ P/σ_iλ P, to quantify how prominent the sources are against the noise and background fluctuations in their immediate environments. Conventional practice is to use the quantity for defining reliability criteria to avoid contamination of the extraction catalogs with spurious detections; at the same time, σ_iλ P determines the errors of the measured fluxes. However, in contrast to the traditional source extraction methods, getsources performs detection on highly-filtered images ℐ_Dj C that are quite different from the measurement images ℐ_λO and it is important to make a clear distinction between the detection and measurement qualities.

In the wavelength-dependent detection images ℐ_λDj C, we define the contrasts C_iλj = I_iλj/ϖ_λj, similar to those introduced in Sect. 2.5. At the footprinting scale j_F the contrast is, within a factor of n_λj (Sect. 2.3), the monochromatic detection significance $Ξ_{iλ} = \frac{I_{iλ j_{F}}}{σ_{λ j_{F}}} = n_{λj} C_{iλj} .$ $\begin{equation} \Xi_{{i}{\lambda}}{\,=\,}\frac{I_{{i}{\lambda}{j_{\rm F}}}}{\sigma_{{\lambda}{j_{\rm F}}}}{\,=\,}n_{{\lambda}{j}} \,C_{{i}{\lambda}{j}}.\label{siginificance}\vspace*{-2mm} \end{equation}$ (17)Although Ξ_iλ is the single-scale analog of Ω_iλ, the quantity σ_{λj_F} evaluates the level of uncertainties at the footprinting scale in the single-scale detection images, whereas σ_iλ P quantifies the level of fluctuations during measurements in the full observed images.

Fig. 18

Visualization of the measured and cataloged sources (Sect. 2.7). The field of Fig. 2 is shown at 70, 100, 160, 250, 350, 500 μm (left to right, top to bottom) with the extraction ellipses (FWHM) of the measurable sources (F_iλ P > σ_iλ P and F_iλ T > σ_iλ T) overlaid on top of the observed images ℐ_λO. The default condition, that a tentative source must be detected in at least two bands, was used. Only the protostellar cores are visible at 70 and 100 μm, whereas at 160 μm cold starless cores starts to appear, becoming clearly visible in the SPIRE bands. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a function of the square root of intensity in MJy/sr.

Extraction of an isolated source situated in approximately uniform background and noise results in Ξ_iλ ≈ Ω_iλ. For sources in more complicated environments, however, Ξ_iλ gives a cleaner and more accurate estimate of the detection quality and a better criterion for selecting real sources. The measured value of Ω_iλ inversely depends on the fluctuation level of highly-variable backgrounds, such as those observed by Herschel in Galactic regions. The single-scale cleaning of ℐ_λDj (Sect. 2.3) filters out all irrelevant spatial scales and thus substantially reduces the fluctuations outside sources. Therefore, the significance of detections by getsources may well be considerably higher than the conventional Ω_iλ would indicate²⁰.

Since getsources is a multi-wavelength extraction method and the detection is generally performed in the combined images ℐ_Dj C, it would be useful to define another quantity to measure the significance of source detection more globally, across all wavebands. To obtain a meaningful quantity, one cannot use the combined images, because they contain renormalized and summed up monochromatic images (Eq. (6)). Assuming that we have a set of independent images ℐ_λD, it makes sense to define the global significance as $Ξ_{i} = {(\sum_{λ} Ξ_{iλ}^{2})}^{1 / 2} .$ $\begin{equation} \Xi_{i}{\,=\,}\left(\sum_{\lambda}\,\Xi_{{i}{\lambda}}^2\right)^{1/2}. \label{global.significance} \end{equation}$ (18)We consider two levels of the robustness of source detection: the reliable level Ξ_rel and tentative level Ξ_ten (default values of 7 and 5, respectively). Reliable sources, i.e. those with Ξ_iλ ≥ Ξ_rel in at least one waveband used for detection, are cataloged without checking whether they are detected at any other wavelength. Tentative sources, i.e. those with Ξ_iλ < Ξ_rel in all wavebands used for detection, are kept in the final catalog only if $Ξ_{iλ} \geq Ξ_{ten} n_{\det}^{- 1 / 2}$ $\hbox{$\Xi_{{i}{\lambda}}{\,\ge\,}\Xi_{\rm ten}\,n_{\rm det}^{-1/2}$}$ in at least n_det wavebands (n_det defaults to 1 and 2 for the monochromatic and multi-wavelength extractions, respectively). For any cataloged source, Ξ_i ≥ Ξ_ten, and for reliable sources, Ξ_i ≥ Ξ_rel.

Having measured the properties of all detected sources at each wavelength, getsources completely removes those that are likely to be spurious. The algorithm treats removal of such sources with great care, dividing the measurement iterations into three phases. During the first phase, only non-detections are removed from the catalog, i.e. those sources that do not fulfill the above requirement of the simultaneous detection in at least n_det wavebands, all other detected sources are given a chance to converge. During the second phase, the algorithm removes sources with extremely low goodness (G_i < 0.01) which we define as $G_{i} = \begin{matrix} \min \end{matrix} {\frac{1}{2} (\frac{R_{i}}{Ξ_{rel}} - 1), 1},$ $\begin{equation} G_{i}{\,=\,}\min\left\{\frac{1}{2}\left(\frac{R_{i}}{\Xi_{\rm rel}}-1\right),1\right\}, \label{goodness} \end{equation}$ (19)where $R_{i} = Ξ_{i} Ω_{i} C_{i E j_{F}}^{-1}$ $\hbox{$R_{i}{\,=\,}\Xi_{i}\,\Omega_{i}\,C^{-1}_{{i\,\rm E}{j_{\rm F}}}$}$ is the source reliability, Ω_i is the global signal-to-noise ratio defined analogously to the global significance Ξ_i (cf. Eq. (18)), and C_{i Ej_F} is the elongation factor (Eq. (13)) at the footprinting scale j_F. Finally, during the third phase, sources are also removed when their position is too close to another source (within max { 1.5,(O_min/Δ)/3 } pixels, where O_min = min { O_λ } ) of almost the same size (within a factor of 2) and they have a lower value of Ξ_i Ω_i than the other source.

Fig. 19

Flattening of the detection images (Sect. 2.8). The entire 1° × 1° simulated star-forming region at 350 μm is shown (upper left), the central area of which was used to illustrate getsources in this paper. The image $ℱ_{λ}^{*}$ $\hbox{$\mathcal{F}^{*}_{\!\lambda}$}$ of the converged footprints ℱ_λ somewhat expanded by convolution (upper middle) is used to interpolate all detected sources off the image to obtain the clean background image (upper right). The image $\hbox{$\mathcal{D}_{\!\lambda}$}$ of the standard deviations (lower left) is computed in a very small (3 × 3 pixels) sliding window and further median-filtered using a 21 × 21 pixels sliding window to reduce effects of possible artifacts. The smoothed scaling image ℐ_λF (lower middle) is obtained by convolving the image in the previous panel with a circular Gaussian beam $\hbox{$\mathcal{G}_{{\!\lambda}{\it A}}$}$ larger than the maximum size of sources extracted at 350 μm in the initial extraction. The resulting flattened image ℐ_λDF (lower right) is obtained by dividing the original image ℐ_λD by the scaling image ℐ_λF. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a linear function of intensity in MJy/sr.

Since getsources is a multi-wavelength extraction method, globally good sources may well be detectable ( $Ξ_{iλ} \geq Ξ_{ten} n_{\det}^{- 1 / 2}$ $\hbox{$\Xi_{{i}{\lambda}}{\,\ge\,}\Xi_{\rm ten}\,n_{\rm det}^{-1/2}$}$ ) or measurable (Ω_iλ ≥ 1) in only a limited number of wavebands. Indeed, it is quite usual that sources are either insignificant or non-measurable at some wavelengths and this generally leads to their footprints being very unreliable. In order to produce most accurate measurements, the footprints of such sources are removed from the corresponding footprint images ℱ_λ in that waveband. This becomes very important in crowded regions with many overlapping sources. Since the quality of both the background subtraction and deblending depends directly on the accuracy of the accumulated footprints, ℱ_λ must always remain as clean as possible, free of the footprints of insignificant or non-measurable sources. This wavelength-dependent removal is done starting from the second phase; however, the measurements of such sources are still kept in the catalog from the first phase of iterations.

After the removal of bad sources, getsources analyzes the spatial distribution of all sources at each wavelength and flags them to provide some useful information on global properties of a source. The single-digit flag f_i is assigned to sources according to the following definitions. A source is called isolated (f_i = 0) if it is not blended with any other source at any wavelength. Two sources are called blended (f_i = 1) if the intersection area of their footprints is greater than 20% of either of the footprints in at least one waveband. A source is called sub-structured (f_i = 2) if a footprint of at least one smaller source is fully contained within the inner 56% of the footprint area (or within 0.75 A_i Fλ and 0.75 B_i Fλ). A source is called sub-structuring (f_i = 3) if it causes another source to be flagged as sub-structured. In addition to the global flag f_i, we also define the monochromatic flag f_iλ that carries information on the wavelength-dependent properties for each source. Among other details, the flag identifies sources that are insignificant or non-measurable in a waveband.

Further, getsources updates the extraction catalog, where each line contains the source number, coordinates, world coordinates (using the xy2sky utility, Mink 2002), global significance, flag, and goodness, followed by the measured properties at each wavelength: $\begin{matrix} i x_{i} y_{i} α_{i} δ_{i} Ξ_{i} f_{i} G_{i} {(Ξ_{i λ} f_{i λ} F_{iλ P} σ_{iλ P} F_{iλ T} σ_{iλ T} A_{iλ} B_{iλ} Θ_{iλ})}_{N} . \end{matrix}$ $\begin{eqnarray*} i\: x_{i}\: y_{i}\: \alpha_{i}\: \delta_{i}\: \Xi_{\,i}\: f_{i}\: G_{i}\: \left(\, \Xi_{{\,i}{\lambda}}\: f_{{\,i}{\lambda}}\: F_{{i}{\lambda}{\,\rm P}}\: \sigma_{{i}{\lambda}{\,\rm P}}\: F_{{i}{\lambda}{\,\rm T}}\: \sigma_{{i}{\lambda}{\,\rm T}}\: A_{{i}{\lambda}}\: B_{{i}{\lambda}}\: \Theta_{{i}{\lambda}}\, \right)_{N}. \end{eqnarray*}$ The standard signal-to-noise ratio Ω_iλ is not cataloged, because it can easily be derived from the catalog entries. The last processing block of each measurement iteration updates the accumulated footprints ℱ_λ of all sources, based on the latest values of the source sizes and position angles. Full sizes and orientation of the footprint ellipses are computed from $A_{i F λ} = 2.3 \begin{matrix} \max \end{matrix} {O_{λ}, S_{j_{F} λ}}, B_{i F λ} = \frac{B_{iλ}}{A_{iλ}} A_{i F λ}, Θ_{i F λ} = Θ_{iλ} .$ $\begin{equation} A_{{i}{\,\rm F}{\lambda}}{\,=\,}2.3\max\,\left\{O_{\lambda}, S_{\!j_{\rm F}{\lambda}}\right\}, \,\,\, B_{{i}{\,\rm F}{\lambda}}{\,=\,}\frac{B_{{i}{\lambda}}}{A_{{i}{\lambda}}}\,A_{{i}{\,\rm F}{\lambda}}, \,\,\, \Theta_{{i}{\,\rm F}{\lambda}}{\,=\,}\Theta_{{i}{\lambda}}. \label{footprints} \end{equation}$ (20)Having obtained the improved footprints, getsources checks whether the total number of non-zero pixels in ℱ_λ has noticeably changed with respect to the previous iteration. If so, the measurement iterations continue until the total area of all footprints changes by less than 0.1%. Convergence properties of the measurement iterations vary for different fields; somewhere between 5 and 30 iterations may be required to obtain a stable pattern of all footprints for each waveband (Fig. 15) and to produce the final catalog. In the last iteration, getsources creates an additional catalog of colors from all possible combinations of total fluxes F_iλ T over all λ, a catalog of peak and background intensities, as well as three versions of the azimuthally-averaged intensity profiles (ℐ_λO, ℐ_λO BS, ℐ_{λO BSD}) computed from the original, background-subtracted, and deblended images, respectively.

2.7. Visualizing extractions

In order to facilitate visual analysis of the extraction results, getsources produces a number of useful images for each waveband. These include (but are not limited to) the background-subtracted images ℐ_λD BS, ℐ_λO BS (Fig. 17), observed images ℐ_λO with the extraction ellipses overlaid on top (Fig. 18), and detection images ℐ_λD (Fig. 19). Other images show just the central positions of the detected sources on top of the images ℐ_λD BS and ℐ_λO BS; these are useful for visualizing crowded regions, where very little can actually be seen under the ellipses. Furthermore, the source images ℐ_{iλO BSD} display the background-subtracted, deblended intensity distribution for each individual source at each wavelength (Fig. 14).

For an easy visualization of the various steps of the algorithm, getsources produces also a number of useful additional images. These include (but not limited to) the interpolated backgrounds ℐ_λD CB, ℐ_λO CB, the images of σ_iλ P (Fig. 15), the annuli $\hbox{$\mathcal{A}_{\lambda}$}$ (Fig. 16), the deblending shapes ℐ_λM (Fig. 17). The clean single-scale images ℐ_λDj C (Fig. 3), ℐ_Dj C (Fig. 7), the accumulated footprints ℱ_D (Fig. 9), and segmentation images ℐ_Dj S (Fig. 9) also remain available for an in-depth analysis and understanding of the extraction process and results.

2.8. Flattening background and noise

Despite the fine spatial filtering employed by getsources, there is one common problem that still needs special treatment. It is known that the Herschel images of Galactic regions often show highly-variable backgrounds. The standard deviations of the combined background and noise fluctuations outside sources may differ by orders of magnitude between various areas of a large image ℐ_λD. Any simple thresholding method would have a difficulty in handling such images, as the global thresholds would not be good for all areas. Although the single-scale decomposition used in our method makes the images ℐ_λDj much “flatter” and more suitable for use with global thresholding, it cannot overcome the problem completely, as the backgrounds fluctuate on all spatial scales, from the smallest to the largest ones. If the background or noise fluctuations strongly vary between some areas of the image ℐ_λD, they will still influence the intensity distributions of the single-scale images ℐ_λDj, although to a much lesser degree.

Our method adopts a special two-step approach (Fig. 20) to overcome the problem; essentially, two complete extractions are performed instead of a single one. The initial deeper extraction²¹ aims at finding all possible candidate sources, in order to remove them and create the cleanest background images, free of any sources. The background images are then used to create the standard-deviation images $\hbox{$\mathcal{D}_{\!\lambda}$}$ and convolve them with a Gaussian beam $\hbox{$\mathcal{G_{{\lambda}{\it A}}}$}$ , producing the scaling (flattening) images ℐ_λF (see below for details). Dividing the detection images by the scaling images, we obtain the detection images ℐ_λDF that are flat, in the sense that the standard deviations in their background areas (outside of the sources) are approximately the same. The flattening procedure can be expressed as $ℐ_{λ DF} = ℐ_{λ D} ℐ_{λ F}^{-1} = ℐ_{λ D} {(𝒢_{λ A} * 𝒟_{λ})}^{-1} .$ $\begin{equation} \mathcal{I}_{{\!\lambda}{\rm D}{\rm F}} = \mathcal{I}_{{\!\lambda}{\rm D}}\,\mathcal{I}^{-1}_{{\!\lambda}{\rm F}} = \mathcal{I}_{{\!\lambda}{\rm D}} \left(\mathcal{G_{{\lambda}{\it A}}} * \mathcal{D}_{\!\lambda}\right)^{-1}. \label{flattening} \end{equation}$ (21)The second and final extraction is performed the same way and with the same parameters as the initial extraction²². The only difference is that the detection images ℐ_λD are replaced with their flattened versions ℐ_λDF and that, instead of the initial guesses for the maximum sizes of the sources, the actual maximum sizes $A_{λ}^{\max}$ $\hbox{$A^{\rm max}_{\lambda}$}$ derived in the initial extraction are used. In both extractions, the measurements are performed on the same observed images ℐ_λO.

The flattening procedure is illustrated in Fig. 19. For reference, the upper-left panel shows the original detection image ℐ_λD at 350 μm of the simulated star-forming region used in this paper for illustrations of the method; the entire 1° × 1° is shown here in order to clearly see the flattening effect. The simulated background in this field has a temperature gradient along one diagonal of the image, clearly visible in the images. The dust temperature T_d linearly varies from 20 K in the upper-left corner down to 15 K in the lower-right corner, where the background appears much brighter at 350 μm. The footprint image $ℱ_{λ}^{*}$ $\hbox{$\mathcal{F}^{*}_{\!\lambda}$}$ shown in the upper-middle panel is the image ℱ_λ (Fig. 15) somewhat expanded by convolution, to make sure that the resulting clean background has no residual artifacts that would influence the quality of the final flattening image. The image $ℱ_{λ}^{*}$ $\hbox{$\mathcal{F}^{*}_{\!\lambda}$}$ is used to remove all sources from the detection image with our background interpolation scheme (Sect. 2.6).

The interpolated background in Fig. 19 is smooth and clean, except for a couple of artifacts at its lower edge. The images that are bright and variable at their edges may lead to such edge artifacts, because getsources uses convolution and interpolation. Although the images are sufficiently expanded (extrapolated) by the algorithm before convolution, they remain essentially unknown beyond their edges. This type of artifacts never happens with the entire images from real-world observations that produce very low intensities at their edges, due to the baseline subtraction. Such problems may only exist in simulated images or in sub-fields that have been cut out of full larger images²³.

From the background image getsources creates the image of standard deviations $\hbox{$\mathcal{D}_{\!\lambda}$}$ , computed in a very small (3 × 3 pixels) sliding window. The aim here is not to produce statistically-meaningful values, but to ensure that the features of $\hbox{$\mathcal{D}_{\!\lambda}$}$ remain as local as possible; much larger windows would smooth out the values, which may not be suitable for the original highest-resolution images. The image $\hbox{$\mathcal{D}_{\!\lambda}$}$ is further median-filtered using a 21 × 21 pixels sliding window to reduce the effects of possible residuals or artifacts. The same image $ℱ_{λ}^{*}$ $\hbox{$\mathcal{F}^{*}_{\!\lambda}$}$ is used again to interpolate the median-filtered pixel values under the footprints, resulting in the image shown in the lower-left panel of Fig. 19. The scaling (flattening) image ℐ_λF is obtained by convolving the filtered $\hbox{$\mathcal{D}_{\!\lambda}$}$ image with a circular Gaussian beam $\hbox{$\mathcal{G_{{\lambda}{\it A}}}$}$ of a size 3 times the maximum size of sources $A_{λ}^{\max}$ $\hbox{$A^{\rm max}_{\lambda}$}$ measured in the initial extraction; the smoothing ensures that the flattening does not distort the intensity distribution of even the largest sources. The scaling image ℐ_λF resembles fairly well the original temperature gradient that was introduced in the simulated images along their diagonal. The resulting flattened image ℐ_λDF is the original image ℐ_λD divided by ℐ_λF. The large-scale background variations clearly visible in the original detection image ℐ_λD have been mostly removed from the flattened image ℐ_λDF, making the latter suitable for the global single-scale thresholding applied by getsources in the final extraction.

Fig. 20

Flattening of the detection images (Sect. 2.8). The getsources algorithm requires two complete extractions, the initial and the final extractions (red blocks, expanded in Fig. 1; the preparation steps are shown in blue).

3. Applications to Herschel images

In Sect. 2, our multi-wavelength source extraction method was described and illustrated using the images of a simulated star-forming region. The simulation is one of our suite of benchmarks designed to aid in the development of getsources and in accurate tests of its performance in various conditions against the model images with fully known properties of the sources, background, and noise (the benchmarks will be described elsewhere; Men’shchikov et al., in prep.). In addition to the purely synthetic skies, getsources has been successfully tested on the ground-based millimeter continuum survey of the Aquila Rift complex (Maury et al. 2011), where all extracted sources have been carefully verified by eye inspection and their parameters evaluated manually.

We present here two real-life examples of getsources extractions for our Gould Belt and HOBYS surveys of the nearby and more distant star-forming regions with Herschel. For this purpose, we defined sub-fields of the observed images of Aquila and Rosette²⁴, small enough to enable readers to verify the extraction results with their own eyes. We emphasize that the only goal of this presentation is to help the readers clarify various aspects of this new source extraction method; any astrophysical analysis of the fields is beyond the scope of this paper.

3.1. A cluster of resolved prestellar cores in Aquila

The observations, data reduction, and first results for the Aquila star-forming region (part of the Herschel Gould Belt survey, adopted distance D = 260 pc) have been described by André et al. (2010); Men’shchikov et al. (2010); Könyves et al. (2010); Bontemps et al. (2010). The Aquila sub-field (365 × 365 3 $\overset{″}{.}$ $\hbox{${\farcs}$}$ 0 pixels), shows a cluster of cold prestellar cores, clearly visible in all SPIRE wavebands south-east of the bright W40 region in the Aquila field. The “cold cluster” was chosen to illustrate the performance of our new method for studying the earliest phases of low-mass star formation in the nearby regions (Figs. 21, 22).

The lower-right panel of Fig. 21 shows a 3-color composite image of the extracted sources in the Aquila sub-field, represented by their elliptical deblending shapes ℐ_λM (Eq. (14)) with the measured peak intensities, sizes, and orientations. The image gives an overview of the source properties combining the information from the 500, 250, and 160 μm bands in a single image. With just a few exceptions, all sources are red, yellow-red, and green, indicating that the radiation is emitted by cold starless objects, without any significant central energy source. Only 8 protostars (blue, white-red, and white-pink sources) are visible in the composite and PACS images, while as many as 39 cold prestellar cores become detectable and measurable in the SPIRE wavebands, as can be seen in the lower panels of Fig. 21.

Fig. 21

The Aquila sub-field (18′ × 18′) is shown as the observed images ℐ_λO at 70, 160, 250, 350, 500 μm (left to right, top to bottom) with the extraction ellipses (FWHM) of only measurable sources (F_iλ P > σ_iλ P and F_iλ T > σ_iλ T) overlaid, as well as the composite 3-color RGB image (500, 250, 160 μm) created using the images ℐ_λM of the deblending shapes of each extracted source (lower-right). The default condition, that a tentative source must be detected in at least two bands, was used. Only the protostars are visible at 70 μm, whereas at 160 μm one starless core appears, the other cold sources becoming clearly visible in the SPIRE bands. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding in the lower-right panel is linear, in all other panels it is a function of the square root of intensity in MJy/sr.

Fig. 22

The Aquila sub-field is shown as a column density image with a 37′′ resolution (left, Könyves et al. 2010), an accumulated clean combined detection image containing spatial scales of up to 80′′ (middle), obtained by summing up the single scales ℐ_Dj C in that range, with the 500 μm ellipses of all detected sources overplotted. Also shown is the clean background ℐ_λO CB (right) at 500 μm. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding in all panels is a function of the square root of column density and of intensity in MJy/sr.

The dense cold prestellar cores are clearly situated within a complex web of filamentary structures at significant intensity peaks; due to their nature, the cores must also coincide with significant column density peaks. The left panel of Fig. 22 displays a column density image of the cold cluster, demonstrating that most extracted sources are indeed centered on the column density peaks. A couple of compact sources appear to be offset from the column density peaks and intensity peaks at SPIRE wavelengths. These are the protostars prominent in the PACS bands that happened to either coincide (in projection) with those locations or form off-center in the dense clumps. We show in Fig. 22 the ellipses of all 46 detected sources; there are 7 additional sources that are not measurable at some wavelengths and therefore not visualized by ellipses in Fig. 21. The middle panel of Fig. 22 shows the same set of extraction ellipses on top of the combined detection image (at scales S_j ≤ 80′′) that clarifies why getsources found the sources at all those locations. In addition, the right panel shows the clean background image ℐ_λO CB at 500 μm that demonstrates that no significant sources in the Aquila sub-field were left undetected by getsources.

These results (visualized in Figs. 21, 22) demonstrate that getsources handles very well the multi-wavelength Herschel observations of resolved starless cores, the main ingredient of the Aquila sub-field. Although all unresolved protostars were also extracted, it is the next example that focuses on protostellar population.

3.2. Clustered unresolved protostars in Rosette

The observations, data reduction, and first results for the Rosette star-forming region (part of the HOBYS survey, adopted distance D = 1.6 kpc) have been described by Motte et al. (2010); Schneider et al. (2010); Hennemann et al. (2010); di Francesco et al. (2010). The Rosette sub-field (395 × 395 1 $\overset{″}{.}$ $\hbox{${\farcs}$}$ 4 pixels), shows a central part of the Rosette field with an extended bright area at 70 μm and a number of unresolved isolated and clustered protostars in the PACS wavebands. This sub-field of Rosette was chosen to illustrate the performance of getsources for studying faint unresolved protostars in distant star-forming regions (Figs. 23, 24).

Similarly to Fig. 21, the lower-right panel of Fig. 23 shows a 3-color composite image of the extracted sources in the Rosette sub-field, represented by their elliptical deblending shapes with the measured peak intensities, sizes, and orientations. Note, however, that the color image is strongly affected by the difference in spatial resolutions in the wavebands (by factors of ~3 and 2) than the color image shown in Fig. 21, as most of the sources in the distant Rosette sub-field are unresolved even at 70 μm. Most of the sources remain in all of the Herschel images displayed in the other panels of Fig. 23; they are deblended and remain measurable all the way to the lowest resolution of the 500 μm band. Figures 23, 24 also highlight severe problems encountered by the usual “monochromatic” algorithms when they extract sources independently at individual wavelengths and then associate sources based on their positions in the images with such greatly varying resolutions.

Fig. 23

The Rosette sub-field (9′ × 9′) is shown as as the observed images ℐ_λO at 70, 160, 250, 350, 500 μm (left to right, top to bottom) with the extraction ellipses (FWHM) of only measurable sources (F_iλ P > σ_iλ P and F_iλ T > σ_iλ T) overlaid, as well as the composite 3-color RGB image (500, 250, 70 μm) created using the images ℐ_λM of the deblending shapes of each extracted source (lower-right). The default condition, that a tentatve source must be detected in at least two bands, was used. Most of the compact sources visible at 70 μm are unresolved protostars; several groups of them, discussed in Sect. 3.2, are labeled A–F. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding in the lower-right panel is linear, in the other panels it is a function of the square root of intensity in MJy/sr.

Fig. 24

The Rosette sub-field is shown as the accumulated clean combined detection image containing spatial scales of up to 12′′ (left), obtained by summing up the single scales ℐ_Dj C in that range, with the 160 μm ellipses of all detected sources overplotted. Also shown are the background-subtracted image ℐ_λO BS (middle) and clean background ℐ_λO CB (right) at 70 μm; when added together, they make up the original 70 μm image in Fig. 23. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a function of the square root of intensity in MJy/sr.

The redder (colder) sources are mostly found in the lower half of the image, whereas the bluer sources are scattered over the Rosette sub-field. Here we will focus on the compact unresolved sources visible quite clearly in the 70 μm waveband in relatively low-background conditions, which can be used to judge whether getsources is able to separate faint peaks that are close to each other. One can count 35 such sources that are mostly clustered in groups of two, three, or four; we will limit our discussion to the six groups labeled A–F in Fig. 23. Aiming to clarify why getsources found the sources at those locations, the left panel of Fig. 24 shows the extraction ellipses of all sources on top of the combined detection image (at scales S_j ≤ 12′′). The middle panel of Fig. 24 displays the background-subtracted image giving the cleanest view of the sources and its right panel shows the clean background image ℐ_λO CB at 70 μm demonstrating, that no significant sources in the Rosette sub-field were left undetected by getsources.

Group A consists of 4 protostars of different brightness but very similar separations between the neighbors, clearly distinct at 70 μm and extracted by getsources across all wavelengths. Group B presents a more difficult case of 4 sources with decreasing brightess and distance between the members; the faintest protostar on top of group B is almost merged with its neighbor, but still has been detected and measured in 4 bands. A similar case is displayed by group C, where one faint source is situated between two other brighter sources; the faintest source is measurable only in the highest-resolution image at 70 μm. In group D we have another similar cluster of 4 sources with an extremely faint source surrounded by the brighter ones; most members of group D are measurable across all wavelengths. Group E is correctly extracted as two close companions, with only the brighter source being measurable in all wavebands. Group F is two relatively bright protostars at a very small separation. Although they are practically merged together, with a saddle point just a few percent below the peaks, the binary is detected as such and both components are measurable up to 250 μm.

The results presented in Figs. 23, 24 show that getsources handles very well the multi-wavelength Herschel observations of unresolved protostars, the main ingredient of the sub-field. It fully preserves the highest-resolution information from the 70 μm waveband and uses it to correctly identify and measure close companions in all groups of protostars at all wavelengths.

4. Conclusions

The multi-scale, multi-wavelength source extraction method getsources presented in this paper, was designed primarily for use in large far-infrared and submillimeter surveys of star-forming regions with Herschel. Instead of following the traditional approaches of extracting sources directly in the observed images (Sect. 1.1), the method analyzes highly-filtered decompositions of original images over a wide range of spatial scales (Sect. 2.2). The algorithm separates the peaks of real sources from those produced by the noise and background fluctuations (Sect. 2.3) and constructs wavelength-independent sets of combined single-scale detection images (Sect. 2.4) preserving spatial information from all wavebands. Sources are detected in the combined detection images by tracking the evolution of their segmentation masks across all spatial scales (Sect. 2.5). Source properties are measured in the original background-subtracted and deblended images at each wavelength in iterations (Sect. 2.6). Additional catalogs and images are produced to aid in the analysis of the extraction results (Sect. 2.7), complementing the main catalog of sources. Based on the results of the initial extraction, detection images are flattened to produce much more uniform noise and background fluctuations in preparation for the second, final extraction (Sect. 2.8). The performance of the new method on Herschel images was illustrated by extracting sources in small sub-fields of the Aquila and Rosette regions (Sect. 3).

There are several significant advantages of getsources over other existing methods of source extraction. (1) The fine spatial decomposition filters out irrelevant spatial scales and improves detectability, especially in the crowded regions and for extended sources. (2) The multi-wavelength design enables combining data over all wavebands, eliminating the need to match independent extraction catalogs and enabling substantial super-resolution in the images with lower spatial resolution. (3) The single-scale detection algorithm identifies sources and determines their characteristic sizes, avoiding spurious peaks on top of large structures and filaments. (4) The background subtraction and deblending, based on the wavelength-dependent footprint of each source, disentangle crowded regions with overlapping sources. (5) The extraction process is fully automatic and there are no free parameters involved: the default configuration works best in all cases that have been tested.

A disadvantage of the algorithm is that it may not be very fast and it may require considerable storage space, depending on the numbers of pixels, spatial scales, wavelengths, iterations, and potential sources detected (Appendix G); most of the space can be freed, however, after the extraction has been completed.

The method has been thoroughly tested using many simulated benchmark images and real-life observations. In particular, the overall benchmarking results (Men’shchikov et al., in prep.) have shown that getsources comes on top of the other source extraction methods that we have tested (Sect. 1.1) in both the completeness and reliability of source detection and the accuracy of measurements. The source extraction code is automated, very flexible, and easy-to-use; the code and validation images with a reference extraction catalog are freely available (see Appendix G).

¹

For a very brief summary, see Men’shchikov et al. (2010).

²

CUPID is a source extraction software package developed by the STARLINK team for use with the SCUBA2 surveys; it is a general wrapper to which additional methods can be added. See documentation: http://docs.jach.hawaii.edu/star/sun255.htx/sun255.html

³

The backgrounds we are dealing with are known to be structured at all scales, spatially fluctuating in an unknown way and thus creating the difficult problem of background removal. If the real backgrounds were just smooth large-scale structures, one would be able to approximate and subtract or filter them quite well.

⁴

Different extraction algorithms produce varying numbers of sources and estimates of their properties for the same set of images. The quality of the extraction methods can be assessed by measuring how well they are able to reconstruct known properties of model sources in simulated images.

⁵

Real physical properties of objects (e.g., their size) may be quite different from the measured apparent properties of the extracted sources at different wavelengths, whose accuracy, in turn, critically depends on the quality of the extraction algorithm.

⁶

In practice, the most accurate approach to alignment is to use images containing only small scales (see Sect. 2.2), up to ~twice the resolution in each waveband, as they show misalignments most clearly. One should carefully choose which peaks to align, as the appearance of sources may be affected by radiative transfer effects or by fluctuating backgrounds or by the close proximity to other sources.

⁷

The wavelength-dependent maximum sizes of sources are the only user-definable parameters in getsources. The actual maximum sizes depend on the observed images and the specific interest of a researcher. Before extracting sources, one has to obtain reasonable guesses of the maximum source sizes from the images and specify them in the configuration file (the parameter $A_{λ}^{\max}$ $\hbox{$A^{\rm max}_{\lambda}$}$ defaults to 6 O_λ).

⁸

Usually best results are obtained with N_S and f_S close to their limiting values. For f_S = 2, the decomposition of Eq. (1) is identical to that produced by the multi-resolution code mr_transform (Starck & Murtagh 2006) with its default linear wavelet transform (“à trous” algorithm).

⁹

Except the spatial decomposition step, where convolutions are done using a fast Fourier transform algorithm, our method has been designed to operate in the image space, which is a natural and intuitive way of source extraction.

¹⁰

Note that getsources has no fundamental limitation on the spatial scales or source sizes except they must be smaller than the image size.

¹¹

If one is interested primarily in extracting very large structures, one could first extract all smaller sources, subtract them from the original image, and then run the extraction again, targeted specifically at those structures.

¹²

A procedure similar to what is usually called “sigma clipping”.

¹³

We have applied getsources to the multi-wavelength images of a dozen of simulated fields, the ground-based (sub-)mm images of NGC 2068, NGC 2071, NGC 2264, W 43, and the Herschel images of Aquila, Cepheus, Cygnus X, IC 5146, Lupus, M 16, NGC 4559, NGC 7538, Orion B, Perseus, Pipe, Polaris, RCW 79, RCW 82, RCW 120, Rosette, Taurus, Vela, W 3, W 48.

¹⁴

Available at http://portal.acm.org/citation.cfm?id=800249.807456

¹⁵

Identification of distinct connected regions in similar algorithms is also known as connected-component labeling.

¹⁶

Positions were derived using filtered detection images and recomputing them from the observed images contaminated by irrelevant spatial scales would not make them more accurate.

¹⁷

This simple method works well (as long as one determines accurate footprints) and it is sufficient, as the background under sources is fundamentally unknown. Estimates of the background based on more complicated approaches, such as its approximation by some two-dimensional functions, always involve assumptions and free parameters, and our simulations show that they may well be less accurate than the simple linear interpolation.

¹⁸

As an example, the intensity of a circular I_iλ M at the footprint edge is by a factor of 1.565 higher than that of the corresponding Gaussian.

¹⁹

This is equivalent to the standard approach of measuring flux errors for an isolated source. Heavily crowded fields present, however, a serious problem, as no source-free annulus exist around many of the sources situated within the regions. No relevant local values of the uncertainties can be found in that case, as more distant source-free areas of images are likely to have different properties in case of highly-variable background or noise.

²⁰

Simulations show that Ξ_iλ gives a notably more accurate representation of the true model S/N and with a much lower dispersion than the conventional estimates do using the full images containing all spatial scales.

²¹

The depth is automatically adjusted by lowering 4 configuration parameters to the following values: Ξ_rel = 4, Ξ_ten = 3, n_det = 1, η = 1.1.

²²

The three parameters automatically lowered in the initial extraction are now set to their default values: Ξ_rel = 7, Ξ_ten = 5, n_det = 2, η = 1.35.

²³

A natural remedy is to define the sub-fields that are larger than the area one is interested in studying and make sure that the intensities at the edges of the extraction area are relatively low.

²⁴

The respective sub-fields are similar to the areas displayed in Fig. 5 of Könyves et al. (2010) and in Fig. 1 of Hennemann et al. (2010).

²⁵

Full description of this and other synthetic skies will be given in another paper (Men’shchikov et al., in prep.) devoted to benchmarking of source extraction algorithms.

²⁶

To optimize the processing times, it is a good idea to test available disk storage for speed. The script iospeed enables comparisons of available hard drives by reading and writing a FITS image multiple times. One can perform tests of the local and network disks, as well as of the virtual disks.

²⁷

http://gouldbelt-herschel.cea.fr/getsources, http://gouldbelt-herschel.cea.fr, http://hobys-herschel.cea.fr

Acknowledgments

The authors employed SAOImage DS9 (by William Joye) and WCSTools (by Douglas Mink) developed at the Smithsonian Astrophysical Observatory (USA), the CFITSIO library (by William D Pence) developed at HEASARC NASA (USA), the STILTS library (by Mark Taylor) developed at Bristol University (UK), the PSPLOT library (by Kevin E. Kohler) at Nova Southeastern University Oceanographic Center (USA), and SWarp (by Emmanuel Bertin) developed at Institut d’Astrophysique de Paris (France). We appreciate the feedback received from Guillaume LeLeu, Mika Juvela, Isabelle Ristorcelli, Anaëlle Maury, Quang Nguyen Luong, Doris Arzoumanian, Sacha Hony, Glenn White, Michael Reid, Alana Rivera-Ingraham, Arabindo Roy, Elaine Winston, Nick Cox, and Jason Kirk that helped improve the code and its usability. We are also grateful for useful comments on a draft of the manuscript received from Nicolas Peretto, Tracey Hill, Vera Könyves, Pedro Palmeirim, and the anonymous referee, that helped improve the clarity of this paper.

References

Alves, J. F., Lada, C. J., & Lada, E. A. 2001, Nature, 409, 159 [NASA ADS] [CrossRef] [PubMed] [Google Scholar]
André, P., Ward-Thompson, D., & Barsony, M. 2000, Protostars and Planets IV, 59 [Google Scholar]
André, P., Men’shchikov, A., Bontemps, S., et al. 2010, A&A, 518, L102 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Arzoumanian, D., André, P., Didelon, P., et al. 2011, A&A, 529, L6 [Google Scholar]
Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Bertin, E., Mellier, Y., Radovich, M., et al. 2002, in Astronomical Data Analysis Software and Systems XI, ed. D. A. Bohlender, D. Durand, & T. H. Handley, ASP Conf. Ser., 281, 228 [Google Scholar]
Bontemps, S., André, P., Könyves, V., et al. 2010, A&A, 518, L85 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
di Francesco, J., Sadavoy, S., Motte, F., et al. 2010, A&A, 518, L91 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Elmegreen, B. G., & Falgarone, E. 1996, ApJ, 471, 816 [NASA ADS] [CrossRef] [Google Scholar]
Falgarone, E., Phillips, T. G., & Walker, C. K. 1991, ApJ, 378, 186 [NASA ADS] [CrossRef] [Google Scholar]
Gong, H., & Ostriker, E. C. 2011, ApJ, 729, 120 [NASA ADS] [CrossRef] [Google Scholar]
Goodman, A. A., Barranco, J. A., Wilner, D. J., & Heyer, M. H. 1998, ApJ, 504, 223 [NASA ADS] [CrossRef] [Google Scholar]
Griffin, M. J., Abergel, A., Abreu, A., et al. 2010, A&A, 518, L3 [Google Scholar]
Hennemann, M., Motte, F., Bontemps, S., et al. 2010, A&A, 518, L84 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Johnstone, D., Wilson, C. D., Moriarty-Schieven, G., et al. 2000, ApJ, 545, 327 [NASA ADS] [CrossRef] [Google Scholar]
Kainulainen, J., Beuther, H., Henning, T., & Plume, R. 2009, A&A, 508, L35 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Kennicutt, R. C., Calzetti, D., Aniano, G., et al. 2011, PASP, 123, 1347 [NASA ADS] [CrossRef] [Google Scholar]
Könyves, V., André, P., Men’shchikov, A., et al. 2010, A&A, 518, L106 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Lagache, G., Abergel, A., Boulanger, F., Désert, F. X., & Puget, J. 1999, A&A, 344, 322 [NASA ADS] [Google Scholar]
Larson, R. B. 1981, MNRAS, 194, 809 [NASA ADS] [CrossRef] [Google Scholar]
Maury, A. J., André, P., Men’shchikov, A., Könyves, V., & Bontemps, S. 2011, A&A, 535, A77 [Google Scholar]
Men’shchikov, A., André, P., Didelon, P., et al. 2010, A&A, 518, L103 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Mink, D. J. 2002, in Astronomical Data Analysis Software and Systems XI, ed. D. A. Bohlender, D. Durand, & T. H. Handley, ASP Conf. Ser., 281, 169 [Google Scholar]
Miville-Deschênes, M.-A., Martin, P. G., Abergel, A., et al. 2010, A&A, 518, L104 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Moffat, A. F. J. 1969, A&A, 3, 455 [NASA ADS] [Google Scholar]
Molinari, S., Schisano, E., Faustini, F., et al. 2011, A&A, 530, A133 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Motte, F., André, P., & Neri, R. 1998, A&A, 336, 150 [Google Scholar]
Motte, F., André, P., Ward-Thompson, D., & Bontemps, S. 2001, A&A, 372, L41 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Motte, F., Schilke, P., & Lis, D. C. 2003, ApJ, 582, 277 [NASA ADS] [CrossRef] [Google Scholar]
Motte, F., Bontemps, S., Schilke, P., et al. 2007, A&A, 476, 1243 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Motte, F., Zavagno, A., Bontemps, S., et al. 2010, A&A, 518, L77 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Myers, P. C. 1983, ApJ, 270, 105 [NASA ADS] [CrossRef] [Google Scholar]
Pence, W. 1999, in Astronomical Data Analysis Software and Systems VIII, ed. D. M. Mehringer, R. L. Plante, & D. A. Roberts, ASP Conf. Ser., 172, 487 [Google Scholar]
Pilbratt, G. L., Riedinger, J. R., Passvogel, T., et al. 2010, A&A, 518, L1 [CrossRef] [EDP Sciences] [Google Scholar]
Poglitsch, A., Waelkens, C., Geis, N., et al. 2010, A&A, 518, L2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical recipes in FORTRAN, The art of scientific computing [Google Scholar]
Rosolowsky, E. W., Pineda, J. E., Kauffmann, J., & Goodman, A. A. 2008, ApJ, 679, 1338 [NASA ADS] [CrossRef] [Google Scholar]
Roy, A., Ade, P. A. R., Bock, J. J., et al. 2010, ApJ, 708, 1611 [NASA ADS] [CrossRef] [Google Scholar]
Schneider, S., & Elmegreen, B. G. 1979, ApJS, 41, 87 [NASA ADS] [CrossRef] [Google Scholar]
Schneider, N., Motte, F., Bontemps, S., et al. 2010, A&A, 518, L83 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Smith, A. R. 1979, in SIGGRAPH’79: Proc. of the 6th annual conference on Computer graphics and interactive techniques (New York: ACM), 276 [Google Scholar]
Stutzki, J., & Guesten, R. 1990, ApJ, 356, 513 [NASA ADS] [CrossRef] [Google Scholar]
Starck, J.-L., & Murtagh, F. 2006, Astronomical Image and Data Analysis, ed. J.-L. Starck, & F. Murtagh [Google Scholar]
Williams, J. P., de Geus, E. J., & Blitz, L. 1994, ApJ, 428, 693 [NASA ADS] [CrossRef] [Google Scholar]
Williams, J. P., Blitz, L., & McKee, C. F. 2000, Protostars and Planets IV, 97 [Google Scholar]

Appendix A: Astrophysical objects: dense cores

The primary goal of this work is to develop a source extraction method suitable for the systematic detection and measurement of dense cores in molecular clouds with Herschel: one of the main observational objectives of the Gould Belt survey (André et al. 2010) is to obtain a complete census of such prestellar cores in nearby molecular clouds.

The structure of molecular clouds is often filamentary (e.g., Schneider & Elmegreen 1979); it is also known to be highly hierarchical and self-similar over a wide range of scales (e.g., Falgarone et al. 1991). This structure can be attributed to the role of supersonic interstellar turbulence (e.g., Larson 1981) and is reasonably well described by fractal models (e.g., Elmegreen & Falgarone 1996). However, interstellar turbulence dissipates on small scales; coupled to the effects of gravity in gravitationally-bound clouds, this breaks the self-similarity of cloud structure on scales below ~ 0.1 pc (e.g., Williams et al. 2000). The latter is the typical scale below which prestellar cores, the self-gravitating condensations of gas and dust giving birth to individual stars or systems, are observed in molecular clouds (e.g., Motte et al. 1998, 2001; André et al. 2000). Prestellar cores are observed at the bottom of the hierarchy of interstellar cloud structures and depart from Larson (1981)’s self-similar scaling relations. They correspond to “coherent” regions of nearly constant and thermal velocity dispersion which do not obey Larson’s power-law linewidth vs. size relation (e.g., Myers 1983; Goodman et al. 1998). The 18′′angular resolution of Herschel at 250 μm, equivalent to 0.03 pc at a distance of 350 pc, is sufficient to resolve the typical Jeans length in the nearby clouds; this is also the characteristic diameter expected for Bonnor-Ebert-like cores.

To first order, known prestellar cores have simple, convex, not very elongated shapes, and their density structure approaches that of the Bonnor-Ebert isothermal spheroids bound by the external pressure exerted by the parent cloud (e.g., Johnstone et al. 2000; Alves et al. 2001). Conceptually, a dense core may be defined as the immediate vicinity of a local minimum in the gravitational potential of a molecular cloud, corresponding to the part of the cloud under a given local gravitational influence. While, in general, the gravitational potential cannot be inferred from observations, it turns out to be directly related to the observable column density distribution for the post-shock, filamentary cloud layers produced by supersonic turbulence in numerical simulations of cloud evolution (Gong & Ostriker 2011). In practical terms, this means that one can define a dense core (more precisely, its projection onto the plane of the sky) as the immediate vicinity of a local maximum in observed column density maps, such as those derived from Herschel imaging, where dust continuum emission is largely optically thin and directly traces column densities. The source extraction method presented in this paper offers a new approach to the detection and measurements of sources, making full use of the multi-scale, multi-wavelength nature of the source extraction problem in the case of Herschel data. Analyzing a wide range of spatial scales, our method is able to detect the hierarchical structures of molecular clouds (cf. Sect. 2.5). Unlike such techniques as the dendrogram analysis (Rosolowsky et al. 2008), however, getsources is not explicitly designed to characterize the hierarchy of structures; our main focus is on the “compact” sources, at the end of the hierarchy.

Appendix B: List of symbols

For convenience of the readers, we list and define all symbols introduced in Sect. 2 of this paper:

$\hbox{$\mathcal{A}_{\lambda}$}$	images of the annuli around all detected sources
$\hbox{$\mathcal{D}_{\!\lambda}$}$	images of local standard deviations for flattening
ℱ_D	image of the initial footprints after source detection
ℱ_λ	images of source footprints in measurement iterations
$ℱ_{λ}^{}$ $\hbox{$\mathcal{F}^{}_{\lambda}$}$	images of source footprints expanded by convolution
$\hbox{$\mathcal{G}_{j}$}$	smoothing Gaussians in successive unsharp masking
$\hbox{$\mathcal{G}_{\!\lambda}$}$	smoothing Gaussians used to create detection images
$\hbox{$\mathcal{G}_{{\!\lambda}{\it A}}$}$	smoothing Gaussians used in the flattening procedure
ℐ_Dj C	clean detection images combined over wavelengths
ℐ_{Dj C l}	partial detection images above the sub-level ω_j l
$ℐ_{D j C}^{'}$ $\hbox{$\mathcal{I}^{\prime}_{{\rm D}{j}{\,\rm C}}$}$	clean detection images combined over wavelengths
ℐ_Dj S	single-scale segmentation images of source masks
ℐ_{iλO BSD}	background-subtracted, deblended images of sources
ℐ_λ	original observed images produced by a map-maker
ℐ_λDF	flattened detection images for the final extraction
ℐ_λD	detection images: either ℐ_λO or transformed ℐ_λO
ℐ_λDj	single-scale decompositions of the images ℐ_λD
ℐ_λDj C	single-scale images cleaned of noise and background
ℐ_λD C	full clean images reconstructed from ℐ_λDj C
ℐ_λDj R	single-scale images of the cleaning residuals
ℐ_λD R	full residuals images reconstructed from ℐ_λDj R
ℐ_λF	scaling image smoothed by convolution
ℐ_λM	images of the deblending shapes of all sources
ℐ_λO	measurement images: ℐ_λ resampled to pixel Δ
ℐ_λO BS	background-subtracted images of all detected sources
ℐ_λO CB	clean-background images with all sources removed
ℳ_DC	source mask images accumulated over scales and λ
ℳ_λ	observational mask images defining areas of interest
$\hbox{$\mathcal{T}_{{\!\lambda}{j}}$}$	threshold images with all pixels set to ϖ_λj
a_i	major length of a source segmentation mask at ω_j l
A_i	major FWHM size of a source i
A_i F	major axis of the footprint ellipse of a source i
A_i Fλ	major axis of the footprint ellipse of a source i at λ
A_iλ	major FWHM size of a source i measured at λ
$A_{λ}^{\max}$ $\hbox{$A^{\rm max}_{\lambda}$}$	maximum FWHM sizes of sources to be extracted
b_i	minor length of a source segmentation mask at ω_j l
B_i	minor FWHM size of a source i
B_i F	minor axis of the footprint ellipse of a source i
B_i Fλ	minor axis of the footprint ellipse of a source i at λ
B_iλ	minor FWHM size of a source i measured at λ
C_i	contrast of a source i above the sub-level ω_j l
$C_{i}^{\min}$ $\hbox{$C^{\,\rm min}_{i}$}$	required minimum contrast C_i for real sources
C_i A	amplification factor in the detection of noise peaks
C_i E	elongation factor in the detection of noise peaks
C_{i Ej_F}	elongation factor at the footprinting scale j_F
C_iλj	contrast of a source i above the threshold ϖ_λj
f_i	global flag: information on source global properties
f_iλ	monochromatic flag: information on source at each λ
f_S	scale factor defining relative spacing between scales
f_λj	turn-on factor for combining scales when S_j < O_λ
F_i hi	flux integrated over the source mask above ω_j l
F_i lo	flux integrated over the source mask below ω_j l
F_iλ P	background-subtracted and deblended peak intensity
F_iλ T	background-subtracted and deblended total flux
G_i	goodness of an extracted source i
i	running number of a source in the extraction catalog
I_ij	peak intensity of a source i at scale j
I_iλj	peak intensity of a source i in the image ℐ_λDj C
I_{iλj_F}	peak intensity of a source i in ℐ_λDj C at scale j_F
I_λj	pixel intensity in a single-scale detection image
$I_{λ D j C}^{\max}$ $\hbox{$I^{\,\rm max}_{{\!\lambda}{\rm D}{j}{\,\rm C}}$}$	maximum intensity over the clean image ℐ_λDj C
j	running number of a decomposed spatial scale
j_F	number of the footprinting scale of a source
k_λj	kurtosis in the single-scale residuals ℐ_λDj R
$k_{λ}^{\max}$ $\hbox{$k^{\,\rm max}_{\lambda}$}$	maximum allowed value of k_λj during cleaning
l	running number of the intensity sub-level ω_j l
n_det	minimum number of λ’s a source must be detected at
n_λj	variable number of standard deviations σ_λj in ϖ_λj
N	number of wavelengths used in the source extraction
N_S	number of spatial scales in the image decomposition
N_Πj	number of pixels in a partial detection image ℐ_{Dj C l}
$N_{Π j}^{\min}$ $\hbox{$N^{\,\rm min}_{{\Pi}{j}}$}$	minimum value of N_Πj for detecting sources in ℐ_{Dj C l}
N_Πλ	number of pixels in a cluster of connected pixels
$N_{Π λ}^{\min}$ $\hbox{$N^{\,\rm min}_{{\Pi}{\lambda}}$}$	minimum value of N_Πλ in ℐ_λDj C for making ℐ_Dj C
O_λ	observational angular resolution: FWHM beam size
$\hbox{$\bar{O}$}$	observational beam size averaged over wavelengths
R_i	reliability of an extracted source i
s_λj	skewness in the single-scale residuals ℐ_λDj R
$s_{λ}^{\max}$ $\hbox{$s^{\,\rm max}_{\lambda}$}$	maximum allowed value of s_λj during cleaning
S_j	spatial scale: FWHM of a smoothing Gaussian beam
S_{j_F}	characteristic footprinting scale of a source in ℐ_Dj C
S_{j_Fλ}	characteristic footprinting scale of a source in ℐ_λDj C
S_max	largest spatial scale in a single-scale decomposition
w_λ	weight enhancing contribution of high-res. images
x_i, y_i	source coordinates obtained in the detection process
α_i,δ_i	source coordinates in the equatorial system
η	parameter in the detection of noise peaks
γ	weighting power-law exponent defining w_λ
Δ	pixel size (the same for all images in an extraction)
λ	wavelength (central wavelength of a waveband)
μ_3λj	third statistical moments about the mean value
μ_4λj	fourth statistical moments about the mean value
ϖ_λj	iterated cleaning thresholds (cut-off levels)
σ_λj	standard deviation in a single-scale image
σ_λ	standard deviation in a full image
σ_{λj_F}	standard deviation in a detection image at scale j_F
σ_iλ P	uncertainty of the peak intensity F_iλ P of a source
σ_iλ T	uncertainty of the total flux F_iλ T of a source
Π_l, Π_m	pixels l, m in the clean combined images ℐ_Dj C
Θ_i Fλ	position angle of the elongation of a footprint
Θ_iλ	position angle of the elongation of a source
Ξ_i	global significance over all wavelengths λ
Ξ_iλ	monochromatic significance of a source i at λ
Ξ_rel	reliable level of signficance for extracted sources
Ξ_ten	tentative level of signficance for extracted sources
Ω_i	global signal-to-noise ratio over all wavelengths
Ω_iλ	monochromatic signal-to-noise ratio of a source i
ω_j l	sub-levels of intensities during source detection.

Appendix C: Simulated star-forming region

To illustrate the spatial decomposition and all other processing steps of getsources in this paper, we use a simulated star-forming region that we constructed well before the launch of Herschel in order to have a reasonably realistic model; it is sufficient to give here the following brief description²⁵.

The simulated star-forming region, placed at 140 pc, consists of a synthetic (scale-free) cirrus background fitted to a typical 100 μm intensity of the backgrounds in the Gould Belt survey and scaled to all other Herschel wavebands assuming a blackbody with a dust temperature of 17.5 K (cf. Lagache et al. 1999). The background was populated with 360 starless cores and 107 protostars with realistic intensity distributions, obtained from a large grid of 129 one-dimensional dust continuum radiative transfer models. The density distribution of the cores followed the structure of critical Bonnor-Ebert spheres, whereas the protostars had ρ ∝ r^-2 density profiles; both types of objects were embedded in background spherical clouds with a uniform density. The standard isotropic interstellar radiation field was shining on the outer edges of the clouds, making the temperature profile inverted (lower in the center) in all objects. Protostars, however, restored usual temperature distributions deeper towards their central parts, as they produced an accretion luminosity (L_acc ∝ M); they had accreted half of the mass of the cores they formed from (the conceptual borderline between Class 0 and Class I objects, André et al. 2000). Starless cores consisted of low-, medium-, and high-density sub-populations and were distributed according to the M_BE ∝ R_BE relation for the isothermal Bonnor-Ebert spheres (T_BE = 7, 14, 28 K) in the area of the mass-radius diagram occupied by prestellar cores observed in the Orion and Ophiuchus star-forming regions (Motte et al. 1998, 2001). All populations span wide ranges of masses (0.01 − 10 M_⊙) and radii (0.001–0.1 pc). Random noise (at the expected levels of the instrumental noise) was added to all pixels of the simulated images and the latter were convolved to the expected observational resolutions of 5, 7, 11, 17, 24, and 35′′ in all PACS and SPIRE bands at 70, 100, 160, 250, 350, and 500 μm.

Appendix D: A look in the Fourier domain

Fig. D.1

Fourier transform of the simulated star-forming region used in this paper (Appendix C) at 350 μm with a 24′′ resolution (Fig. 19, upper-left). The top panels show the Fourier amplitudes of separate components: the model sources, synthetic background, and noise (top left to right). The full image containing all the ingredients (and all spatial scales) is displayed in the following (middle-left) panel. The remaining panels (left to right, top to bottom) show the images of the single-scale decompositions of the simulated sky (cf. Fig. 2) with the scale sizes S_j annotated at the bottom of the panels. For the original images with 2048 × 2048 2′′ pixels, the panels present the Fourier amplitudes within the range of spatial frequencies from zero to one-fourth of the Nyquist frequency (0.25 arcsec^-1). For better visibility, the pixel values are somewhat limited in range; the color coding is linear.

Fig. D.2

Fourier transform of the actual star-forming regions observed by Herschel at 350 μm with a 25′′ resolution. Shown are the amplitudes of the decomposed images of Aquila (top left to right) and Rosette (bottom left to right); the scale sizes S_j are annotated at the bottom of the panels. For the original Aquila images with 4096 × 4096 3′′ pixels and Rosette images with 2048 × 2048 3′′ pixels, the panels present the Fourier amplitudes within the range of spatial frequencies from zero to one-third of the Nyquist frequency (0.17 arcsec^-1). For better visibility, the spatial frequencies and the pixel values are somewhat limited in range; the color coding is linear.

The original images and their single-scale decompositions can be transformed into the Fourier domain. The successive unsharp masking described by Eq. (1) (Sect. 2.2) can also be formulated in terms of the Fourier transforms of the images. For those readers who are used to the tranformations between the image and Fourier domains, we present some additional images that may be useful for better understanding of getsources.

In Fig. D.1, we show the amplitudes of the complex Fourier transform for the simulated star-forming region that is used in this paper to illustrate getsources (cf. Appendix C). In the simulated images, one exactly knows all individual components: the model sources, background, and noise; the amplitudes of the three components are displayed in the top panels of Fig. D.1. The Fourier transforms of the sources and noise are clearly dominated by the fact that the original images have a resolution of the 350 μm Herschel band: higher spatial frequencies have been suppressed by the convolution with the observational beam. Although the transform of the synthetic background is also shaped by the limited angular resolution at high frequencies, the steep power spectrum P(q) ∝ q^-3 of the synthetic background carries most of its power on large scales (cf. Appendix E) and hence the amplitude remains strongly peaked at zero frequency. The full simulated image has all components added together and at each spatial frequency the Fourier amplitude becomes a mixture of the noise, background, and sources that cannot be cleanly separated anymore (Fig. D.1, middle-left panel).

Our source extraction method attempts to give a practical solution to the problem of separating sources from all other components by decomposing the original images in a large number of fine spatial scales using a procedure that involves successive convolution and subtraction of the images (cf. Sect. 2.2, Eq. (1)). In effect, such a decomposition creates a set of the filtered “single-scale” images ℐ_λDj each containing considerable signals from only a limited range of spatial scales (or frequencies) that are determined by the size S_j of the smoothing Gaussian beam $\hbox{$\mathcal{G}_{j}$}$ . Consequently, in the Fourier domain, the single-scale images look like toroidal structures of variable widths heavily overlapping with each other over a substantial range of frequencies (Fig. D.1). By construction, the simulated sky does not include any asymmetric or highly-elongated (filamentary) structures, therefore the toroids look very regular and symmetric in the Fourier domain. The actual fields observed by Herschel are, however, more complicated, containing lots of filamentary structures, that make the toroids less symmetric. To illustrate their appearance in the real observations, we present in Fig. D.2 Fourier amplitudes of a few single-scale images for the Aquila and Rosette star-forming regions.

The Fourier space representation is useful for understanding some aspects of our method, as well as for image convolution (for which we apply a discrete fast Fourier transform algorithm). However, getsources is not entirely translatable into the Fourier domain and remains fundamentally the image-oriented method of source extraction.

Appendix E: Power spectra of image components

Fig. E.1

Typical power spectra of several components contained in observed Herschel images. Red circles show the power spectrum of the SPIRE 250 μm image of the Polaris cirrus cloud taken as part of the Herschel Gould Belt survey; no attempt was made here to correct the raw power spectrum for deviations of the SPIRE 250 μm beam from a Gaussian shape (cf. Miville-Deschênes et al. 2010). This cirrus background is well described by a power law P(q) ∝ q^-3 in the range of spatial frequencies 0.02 < q < 2 arcmin^-1. Blue squares show the estimated power spectrum of the instrumental noise in a typical Herschel image at 250 μm, which is flat over more than two orders of magnitude in q. The solid curve shows the power spectrum (scaled to an arbitrary level of power) of the intensity distribution of a Gaussian source with a size of 1 arcmin (FWHM). The vertical dashed line marks the spatial frequency of 0.46 arcmin^-1, at which the contrast of such a source is maximum over the background fluctuations (assumed to follow P ∝ q^-3).

The ability to detect compact sources, such as dense cores, in the wide-field images obtained as part of the Gould Belt and HOBYS surveys with Herschel is primarily limited by the confusion arising from the small-scale cloud structure. In contrast, the instrumental noise and mapping artifacts are often negligible. An important property of the background cloud fluctuations is that they do not follow Gaussian statistics. In particular, it is well known that the power spectrum of interstellar cloud fluctuations strongly depends on spatial scales (P(q) ∝ q^-3, where q is spatial frequency; e.g., Roy et al. 2010, and references therein). This is illustrated in Fig. E.1, which shows the power spectrum of a SPIRE 250 μm image of the Polaris cirrus cloud (see also Fig. 3 of Miville-Deschênes et al. 2010). In practice, this means that the background fluctuations are much stronger on larger scales and, unlike Gaussian fluctuations, cannot be characterized by a single value of the standard deviation. Indeed, they are better described by a wide range of standard deviations, as reflected in their power spectrum (Fig. E.1). Accordingly, the probability density function (PDF) of submillimeter intensity (or column density) variations in the Herschel images of Galactic fields is not Gaussian, but typically lognormal in non-star-forming clouds, such as Polaris (Schneider et al., in prep.; see also Kainulainen et al. 2009).

In contrast to the astrophysical backgrounds, the instrumental noise fluctuations are approximately Gaussian, having a flat power spectrum, essentially independent of spatial frequency (white noise). This is illustrated in Fig. E.1, which shows the estimated power spectrum of the instrumental noise in a typical Herschel image at 250 μm, derived from the power spectrum of the PACS 70 μm image of the Polaris field with no significant cloud emission (cf. Men’shchikov et al. 2010; Miville-Deschênes et al. 2010; André et al. 2010). For a better comparison with the power-law cirrus profile in Fig. E.1, the spatial frequencies sampled in the original PACS 70 μm data were scaled down by a factor 70/250, to represent the typical range of spatial frequencies in the SPIRE 250 μm images of the Herschel Gould Belt survey. Likewise, the power spectrum of the intensity distribution of a Gaussian source with a half-maximum width D is itself Gaussian and thus flat approximately up to its half-maximum width 2^3/2ln2 (π D)^-1 in spatial frequencies.

Figure E.1 illustrates why decomposing the observed images into a finely-spaced set of filtered images is advantageous for source extraction. Given the typical power spectra of the background fluctuations, instrumental noise, and Gaussian-like sources, the maximum contrast of a source with a size D over the background fluctuations is obtained at spatial frequencies 0.46 D^-1 or, equivalently, for spatial scales close to 2.2 D. By performing source detection in a fine grid of single-scale images, getsources can automatically identify and use the scale at which the contrast of each source over the background is maximized, thereby improving source detectability.

Another great advantage of the fine spatial decomposition employed by getsources (Sect. 2.2) is that the emission fluctuations in the decomposed single-scale images of interstellar clouds follow Gaussian statistics much more closely than the cloud fluctuations in the observed images containing all spatial scales (cf. Fig. 5). This is because each single-scale image of a cirrus cloud can be characterized by a single standard deviation value much better than the original image, since it selects only a narrow range of spatial frequencies from a power-law spectrum of standard deviations, such as that shown in Fig. E.1.

Appendix F: Estimating shapes of sources

We derive the source coordinates, sizes, and orientations from the moments of the background-subtracted, deblended intensity distributions over the pixels of their footprints using the first and second moments $\begin{matrix} E (x) & = & \frac{\int x I (r) d^{2} r}{\int I (r) d^{2} r}, \\ E (y) & = & \frac{\int y I (r) d^{2} r}{\int I (r) d^{2} r}, \\ E ({x^{2}}_{˜}) & = & \frac{\int {x^{2}}_{˜} I (r) d^{2} r}{\int I (r) d^{2} r}, \\ E ({y^{2}}_{˜}) & = & \frac{\int {y^{2}}_{˜} I (r) d^{2} r}{\int I (r) d^{2} r}, \\ E (\begin{matrix} ˜ \\ x \end{matrix} \begin{matrix} ˜ \\ y \end{matrix}) & = \end{matrix}$ $\appendix \setcounter{section}{6} \begin{eqnarray} {E}(x)&=&\frac{\int{x\,I({\vec r})\,{\rm d}^2{\vec r}}}{\int{I(\vec r)\,{\rm d}^2{\vec r}}}, \\ {E}(y)&=&\frac{\int{y\,I({\vec r})\,{\rm d}^2{\vec r}}}{\int{I(\vec r)\,{\rm d}^2{\vec r}}}, \\ {E}(\tilde{x}^2) &=& \frac{\int{\tilde{x}^2\,I({\vec r})\,{\rm d}^2{\vec r}}}{\int{I(\vec r)\,{\rm d}^2{\vec r}}}, \\ {E}(\tilde{y}^2) &=& \frac{\int{\tilde{y}^2\,I({\vec r})\,{\rm d}^2{\vec r}}}{\int{I(\vec r)\,{\rm d}^2{\vec r}}}, \\ {E}(\tilde{x}\,\tilde{y}) &=& \frac{\int{\tilde{x}\,\tilde{y}\,I({\vec r})\,{\rm d}^2{\vec r}}}{\int{I(\vec r)\,{\rm d}^2{\vec r}}}, \label{moments} \end{eqnarray}$ where x,y are the Cartesian coordinates of a vector r = (x,y) and the pixel coordinates relative to the source barycenter are given by $\begin{matrix} _{˜} \\ x \end{matrix} = x - E (x)$ $\hbox{$\tilde{x}{\,=\,}x-{E}(x)$}$ and $\begin{matrix} _{˜} \\ y \end{matrix} = y - E (y)$ $\hbox{$\tilde{y}{\,=\,}y-{E}(y)$}$ . Let us write the covariance matrix $M = [\begin{matrix} E ({x^{2}}_{˜}) & E (\begin{matrix} ˜ \\ x \end{matrix} \begin{matrix} ˜ \\ y \end{matrix}) \\ E (\begin{matrix} ˜ \\ x \end{matrix} \begin{matrix} ˜ \\ y \end{matrix}) & E ({y^{2}}_{˜}) \end{matrix}] = [\begin{matrix} σ_{x}^{2} & σ_{xy} \\ σ_{yx} & σ_{y}^{2} \end{matrix}]$ $\appendix \setcounter{section}{6} \begin{equation} M = \left[ \begin{array}{cc} {E}(\tilde{x}^2) & {E}(\tilde{x}\,\tilde{y}) \\ \noalign{\medskip} {E}(\tilde{x}\,\tilde{y}) & {E}(\tilde{y}^2) \end{array} \right] = \left[ \begin{array}{cc} \sigma_x^2 & \sigma_{x y} \\ \noalign{\medskip} \sigma_{y x} & \sigma_y^2 \end{array} \right] \label{covariance} \end{equation}$ (F.6)where $σ_{x}^{2}$ $\hbox{$\sigma_x^2$}$ and $σ_{y}^{2}$ $\hbox{$\sigma_y^2$}$ are the variances and σ_xy is the covariance of the intensity moments. For such a symmetric matrix (σ_xy = σ_yx), one can find eigenvalues Λ by solving the characteristic equation $\begin{matrix} \det \end{matrix} (M - Λ I^{)} = 0,$ $\appendix \setcounter{section}{6} \begin{equation} \det\,\left(M-\Lambda I\right) = 0, \label{characteristic.equation} \end{equation}$ (F.7)where I is the identity matrix. This leads to the quadratic equation $Λ^{2} - (σ_{x}^{2} + {σ_{y}^{2}}^{)} Λ + (σ_{x}^{2} σ_{y}^{2} - {σ_{xy}^{2}}^{)} = 0$ $\appendix \setcounter{section}{6} \begin{equation} \Lambda^2 - \left(\sigma_x^2 + \sigma_y^2\right) \Lambda + \left(\sigma_x^2 \sigma_y^2 - \sigma_{x y}^2\right) = 0 \label{quadratic.equation} \end{equation}$ (F.8)with the following two roots Λ⁺ and Λ⁻: $Λ^{\pm} = \frac{σ_{x}^{2} + σ_{y}^{2}}{2} \pm {({(\frac{σ_{x}^{2} - σ_{y}^{2}}{2})}^{2} + σ_{xy}^{2})}^{1 / 2}$ $\appendix \setcounter{section}{6} \begin{equation} \Lambda^\pm = \frac{\sigma_x^2 + \sigma_y^2}{2} \pm \left(\left(\frac{\sigma_x^2 - \sigma_y^2}{2}\right)^2 \!+ \sigma_{x y}^2\right)^{1/2} \label{two.roots} \end{equation}$ (F.9)defining the major and minor axes of an ellipse. The major axis forms an angle θ with the x-axis, given by $\tan (2 θ^{)} = \frac{2 σ_{xy}}{σ_{x}^{2} - σ_{y}^{2}} \cdot$ $\appendix \setcounter{section}{6} \begin{equation} \tan \left(2 \theta\right) = \frac{2\,\sigma_{x y}}{\sigma_x^2 - \sigma_y^2}\cdot \label{angle} \end{equation}$ (F.10)The major and minor FWHM sizes, as well as the position angle (east of north) of a source i at wavelength λ, are finally computed from $\begin{matrix} A_{iλ} & = & {(Λ^{+} 8 \ln 2)}^{1 / 2}, \\ B_{iλ} & = & {(Λ^{-} 8 \ln 2)}^{1 / 2}, \\ Θ_{iλ} & = \end{matrix}$ $\appendix \setcounter{section}{6} \begin{eqnarray} A_{{i}{\lambda}}\!&=&\!\left(\Lambda^+\,8 \ln 2\right)^{1/2}, \\ B_{{i}{\lambda}}\!&=&\!\left(\Lambda^-\,8 \ln 2\right)^{1/2}, \\ \Theta_{{i}{\lambda}}\!&=&\!\pi - \arctan \theta. \label{fwhm.sizes.and.position.angle} \end{eqnarray}$

Appendix G: Installing and using getsources

The source extraction method described above has been developed by A.M. (since July 20, 2008) as a Bash script called getsources and a suite of the FORTRAN utilities executed by the script that perform most of the work. This ensures a high degree of portability and efficiency, as these two languages are the de facto standards in the worlds of the UNIX-like operating systems and numerical computations. Either Mac OS X or Linux and either ifort 11.1 or gfortran 4.5 can be used to install getsources; other systems and compilers have not been tested. A preparation script called prepareobs makes use of SWarp (Bertin et al. 2002) for image resampling and reprojection. To read and write images in the FITS format, getsources uses the CFITSIO library (Pence 1999); to convolve images, the fast Fourier transform routine rlft3 (Press et al. 1992) is used; the source coordinates (α, δ) are computed by the utility xy2sky from WCSTools (Mink 2002).

The total processing times are proportional to the numbers of pixels, spatial scales, wavelengths, iterations, and potential sources detected, depending also on the computer CPUs available. The extraction time may vary between few minutes and a week, the latter being the longest time we have experienced and it refers to a 6-wavelengths extraction for 5 $\overset{°}{.}$ $\hbox{${\fdg}$}$ 2 × 5 $\overset{°}{.}$ $\hbox{${\fdg}$}$ 2 images with 6234 × 6234 3′′ pixels, each image occupying ~ 150 MB of disk space. On the other hand, a 6-wavelength extraction for the simulated sky used for illustrations in this paper (1800 × 1800 2′′ pixels, each image taking ~ 13 MB of disk space) was completed within one day. Although at first glance that may seem a long computational time, getsources is not a real-time software; the completeness and reliability of the source extraction, not its speed, were the priorities in its design. In the kind of astronomical research we are dealing with, even one week of processing is never a limiting factor, if the work is properly planned. A researcher would spend much more time on the analysis of the information delivered by the code (catalogs, images) and on the studies of the astrophysical reality of interest (in our case, the star formation).

Processing speed depends on many circumstances in a given computing system. Users are advised to always run getsources on local (internal) hard drives physically attached to the CPUs used for extractions. At some steps, the code performs lots of read and write (I/O) operations on FITS files and the users would benefit from the fastest possible I/O throughput²⁶. Our algorithm is able to use virtual random-access-memory (RAM) disks to speed up the I/O for the images and to reduce the processing times. However, the gain compared to the speed of a local hard drive access may not be very significant. It is generally not optimal to use network-attached hard drives, as the disk access over networks is very slow compared to that of the local disks. If one must use the network storage for running getsources and has very large amounts of RAM, one might consider using the RAM disk facility of the code. This may lead up to a considerable acceleration factor in processing times and thus compensate for the inefficient hard disk access over the networks. For multi-wavelengths extractions, a natural way of cutting down the computational times is to run the first two processing blocks (Fig. 1) in parallel, one wavelength per CPU, as they are decomposed and cleaned independently of each other. If the computational time becomes an issue for enormously large images, the users may want to split them into several sub-fields to obtain extractions much faster, running the extraction in parallel on different CPUs. For example, one can get an acceleration factor of ~30 by splitting images in only 3 sub-fields of equal size (the factor assumes the same number of sources in each sub-field and that the extractions are run in parallel).

Another consideration is the storage space necessary to keep available intermediate images for many spatial scales and wavelengths until the end of the extraction process; the space needed scales between hundreds of MB to tens of GB, depending on the image size and the numbers of the scales, wavelengths, and measurement iterations. In addition to the resulting catalogs and images produced by getsources, a relatively large number of intermediate images and catalogs are also kept on the hard drive. Those are useful in case the processing is interrupted for some reason or if the user needs to restart the extraction from some previous step or to continue the measurement iterations until their convergence. Those restart images may also be very useful to inspect, in order to better understand the extraction results; however, this time-saving feature works, of course, at the expense of the disk space. It is up to the user to decide what is the biggest issue, the time or the space. Whenever the user becomes satisfied with the extraction results or the time needed to re-run the extraction is not an issue, those extra files can be removed from the hard drive by getsources.

The code is powerful, automated, flexible, easy-to-use, and very extensively tested; the algorithm is designed to be run on properly prepared images twice (the initial and final extractions) and none of a few parameters of the code need to be changed, as their default values have been carefully fine-tuned to work best in all cases. The multi-wavelength design of getsources is quite flexible and it allows one to use it in some special ways. For example, one can detect sources using only selected wavelengths (even a single image) but produce catalogs with measurements for all wavebands. It is also possible to add other non-Herschel images for either both detection and measurements or to only measurements, to use more information and get better results. One can also use special mask images to exclude problematic (e.g., saturated) areas at some of the wavelengths to avoid using those areas in combining images over wavelengths and in detecting sources. Users can re-run only selected steps of the extraction and also restart the detection and measurements from any intermediate scale or iteration. There are also other possibilities; users are welcome to request information on their specific needs from the author.

The source extraction code with an installation guide and a quick start guide are freely available upon request or they can be downloaded from our web pages²⁷. Users installing getsources on their computers are advised to test it on a multi-wavelength extraction using Herschel images of the galaxy NGC 4559 that was chosen as our validation field (the galaxy was observed as part of the KINGFISH project, see Kennicutt et al. 2011). This relatively quick extraction performed by the author can also be requested or downloaded by the users who wish to validate their installation and verify that they are able to reproduce the reference extraction results.

All Figures

	Fig. 1 Main processing blocks of the getsources algorithm described in Sects. 2.1–2.7.
In the text

Fig. 2

Single-scale decomposition (Sect. 2.2). The central 0 $\overset{°}{.}$ $\hbox{${\fdg}$}$ 44 × 0 $\overset{°}{.}$ $\hbox{${\fdg}$}$ 44 sub-field of the detection image ℐ_λD of a 1° × 1° simulated star-forming region at 350 μm (upper left). Its single-scale images ℐ_λDj are shown at the scales indicated (left to right, top to bottom) for j = 17,30,43,57,70, N_S = 99, f_S = 1.053, S₁ = 4′′, S_{N_S} = 660′′ (see Eq. (1)). The image dimensions are 1800 × 1800 pixels and the pixel size Δ = 2′′. The scales were selected to be separated by a factor of 2 to illustrate the spatial decomposition. The scale sizes S_j are visualized by the yellow-black circles and annotated at the bottom of the panels. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a linear function of intensity in MJy/sr.

In the text

Fig. 3

Single-scale removal of noise and background (Sect. 2.3). The field of Fig. 2 is shown as the full clean image ℐ_λD C at 350 μm (upper left) that accumulates clean images over all scales. The same set of spatial scales is displayed in the single-scale images ℐ_λDj C (left to right, top to bottom), cleaned of noise and background with an iterative procedure described in Sect. 2.3. All intensity peaks visible in the scales belong to the sources; most of the noise and background fluctuations (visible in Fig. 2) have been removed. The scale sizes S_j are visualized by the yellow-black circles and annotated at the bottom of the panels. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a function of the square root of intensity in MJy/sr.

In the text

Fig. 4

Single-scale cleaning residuals (Sect. 2.3). The field of Fig. 2 is shown as the reconstructed image ℐ_λD R of the residuals at 350 μm (upper left) that accumulates cleaning residuals from all scales. The same set of spatial scales is displayed in the single-scale images of the residuals ℐ_λDj R (left to right, top to bottom). The cleaning procedure left no significant intensity peaks of the simulated objects in the residuals, only the noise- and background-dominated pixels (cf. Fig. 7). The scale sizes S_j are visualized by the yellow-black circles and annotated at the bottom of the panels. The color coding is a linear function of intensity in MJy/sr.

In the text

Fig. 5

Using skewness and kurtosis for iterating accurate thresholds ϖ_λj in the cleaning process (Sect. 2.3). The upper panels show histograms for the full image ℐ_λD at 350 μm (left) and for the decomposed images ℐ_λDj at the 9′′ and 36′′ scales (middle, right) before cleaning. The lower panels display histograms for the reconstructed residuals ℐ_λD R (left) and for the residuals ℐ_λDj R of the two decomposed images of the upper panels (middle, right). The vertical lines in the left panels indicate the standard deviation σ_λ (short dash, green) and 6 σ_λ (long dash, magenta) computed in the full image ℐ_λD. In the other panels they indicate the converged values of the standard deviation σ_λj and 6 σ_λj in the single-scale images ℐ_λDj, as well as the final thresholds ϖ_λj (solid, red). The histogram of the residuals ℐ_λD R of the cleaning process, reconstructed from all spatial scales (lower left) has much greater symmetry and resembles a Gaussian distribution, whereas the histogram of the full image ℐ_λD (red, copied from the upper-left panel) is highly asymmetric. Both $s_{λ}^{\max}$ $\hbox{$s^{\rm max}_{\lambda}$}$ and $k_{λ}^{\max}$ $\hbox{$k^{\,\rm max}_{\lambda}$}$ have a value of 3.17 and the corresponding variable factors n_λj have the values of 4.52 and 4.09 for the two single-scale images. The width of the intensity bins is 1 MJy/sr in the left panels and 0.004 MJy/sr in all other panels.

In the text

	Fig. 6 Two types of combined single-scale images (Sect. 2.4). Schematically shown are the image ℐ_Dj C (left) used to detect sources and track the evolution of their shapes and the image $ℐ_{D j C}^{'}$ $\hbox{$\mathcal{I}^{\prime}_{{\rm D}{j}{\,\rm C}}$}$ (right) used to determine the characteristic scales and initial footprints.
In the text

Fig. 7

Combination of single-scale images (Sect. 2.4). The field of Fig. 2 is shown as source masks ℳ_DC accumulated over all scales and wavebands (upper left) that give a summary view of how the sources, made visible by the cleaning (cf. Fig. 3), change their shapes and sizes. The same set of spatial scales is displayed in the combined clean single-scale images ℐ_Dj C (left to right, top to bottom) that accumulate information at those scales from all wavelengths. For better visibility, the values displayed in the masks image are limited to 300 and in the normalized images ℐ_Dj C they are limited to 0.07, 0.3, 0.3, 0.6, and 0.6. The scale sizes S_j are visualized by yellow-black circles and annotated at the bottom of the panels. The color coding is a logarithmic function of intensity.

In the text

Fig. 8

Evolution of sources in the clean single-scale images (Sect. 2.5). The full image of the source at 350 μm with a size of 3 $\overset{'}{.}$ $\hbox{${\farcm}$}$ 7 × 3 $\overset{'}{.}$ $\hbox{${\farcm}$}$ 7 (left) has been cut out of the corresponding panel in Fig. 2 (the source is located south-east of the image center, it is resolved in all wavebands). The other panels show the source in single-scale images ℐ_λDj C at the scales of 18, 36, 138, and 199′′, maximum intensities in the panels being 0.31, 0.89, 3.09, and 1.84 MJy/sr, respectively. The scale sizes S_j are visualized by the blue circles and annotated at the bottom of the panels. The color coding is a linear function of intensity.

In the text

Fig. 9

Single-scale source detection (Sect. 2.5). The field of Fig. 2 is shown as the initial footprint image ℱ_D after the source detection (upper left). The same set of spatial scales is displayed in the single-scale segmentation images ℐ_Dj S (left to right, top to bottom) showing the source segmentation masks determined from ℐ_Dj C (Fig. 7). The masks were obtained and analyzed by the detection procedure described in Sect. 2.5. The scale sizes S_j are visualized by the yellow-black circles and annotated at the bottom of the panels. The color coding is a function of the square root of the source number (which makes up the actual pixel values in these segmentation images).

In the text

Fig. 10

Topology of the pixel connections (Sect. 2.5). Pixels of the red shape are 4-connected and thus the shape may represent a source. Pixels of the dark blue shape are not 4-connected: there are no paths connecting any pair of the blue pixels such that any neighboring pixels in the paths are connected to each other only by their sides. Note, however, that there are two 4-connected sub-areas in the blue shape, which could represent sources.

In the text

Fig. 11

Measurement iterations (Sect. 2.6). The processing block of measurements from Fig. 1 is shown at the top. It is sub-divided here in 5 algorithmic steps that are executed in iterations, until the footprint images ℱ_λ have converged, i.e. stable distributions of source footprints at each wavelength have been obtained. The deblending block itself includes iterations to disentangle contributions of many overlapping sources to the intensity of a pixel that belongs to all of them.

In the text

Fig. 12

Background interpolation scheme (Sect. 2.6). The central red pixels belong to the source, defining its footprint in this example, whereas the surrounding blue pixels are those of the background. At each pixel of the source, the background is linearly interpolated in the four main directions based on the values of the pixels just outside the footprint (highlighted in darker blue), the resulting four values per pixel being averaged. Such interpolation probes the actual background variations around the sources, and thus the interpolated background is more complex and realistic than just a planar surface.

In the text

Fig. 13

Deblending overlapping sources (Sect. 2.6). Three identical sources (A, B, C) with the same sizes and with peak intensities normalized to unity are overlapping with their footprints. For clarity of the figure, the sources are not shown with their individual profiles, but rather with their blended intensity distribution ΣABC; with noise added, the profile transforms into ΣABCn. The deblending profiles A_M, B_M, and C_M (Eq. (14)) are defined at the source positions by the sizes and peak intensities estimated from the background-subtracted image ℐ_λO BS. At each pixel, the deblending algorithm splits its intensity I_ΣABCn between each of the overlapping sources, according to the fraction of the shape’s intensity (Eq. (15)). The resulting deblended profiles of each source, A_D, B_D, and C_D, are shown in the right panel.

In the text

Fig. 14

Deblending overlapping sources (Sect. 2.6). Background-subtracted overlapping sources at 350 μm (left) are separated into two individual sources, cataloged under the numbers 244 and 242 (middle, right). This blended pair is clearly visible half-way north from the field centers in Figs. 2, 3, 7, 9, 17, 18; the annulus around the source 244 is highlighted in Fig. 16. Comparison with the known true model parameters shows that the peak intensities measured for these deblended sources are in error by − 1.1% and − 5.1% and the total fluxes were calculated with errors of − 0.5% and − 12%, respectively. The color coding is a linear function of intensity in MJy/sr.

In the text

Fig. 15

Converged footprints of the measured sources (Sect. 2.6). The field of Fig. 2 is shown at 70, 100, 160, 250, 350, 500 μm (left to right, top to bottom) as the footprints of all detected sources after the measurement iterations. The pixel values are the standard deviations σ_iλ P, due to the local noise and background variations, estimated for each source in an elliptical annulus around its footprint (Fig. 16). Strongly elliptical or too large footprints may appear at those wavelengths where some sources are too faint to be measurable. For such sources, the information is essentially lost and the intensity moments cannot provide meaningful estimates of their sizes and orientation. The color coding is a function of the square root of intensity in MJy/sr.

In the text

Fig. 16

Annuli around measured sources (Sect. 2.6). The field of Fig. 2 is shown as the true intensities of the model sources at 350 μm convolved to a 17′′ resolution (left), the image of annuli $\hbox{$\mathcal{A}_{\lambda}$}$ of all detected sources (middle) slightly modified to highlight the annulus area around the source 244 from Fig. 14, and the product $\hbox{$\mathcal{A}_{\lambda}\,\mathcal{I}_{{\!\lambda}{\rm O}}$}$ (right) to visuaize the actual observed intensities used to compute the flux uncertainties σ_iλ P shown in Fig. 15. The corresponding footprint images ℱ_λ are presented in Fig. 15 and the observed image ℐ_λO is displayed in Fig. 18. The color coding in the left panel is a function of the square root of intensity, in the other panels it is a linear function of intensity in MJy/sr.

In the text

Fig. 17

Background-subtracted sources (Sect. 2.6). The field of Fig. 2 is shown as the true intensities of the model sources at 350 μm convolved to a 17′′ resolution (left), the background-subtracted image ℐ_λO BS at 350 μm (middle), and the composite 3-color RGB image (500, 250, 160 μm) created using the images of the deblending shapes of each extracted source (right). For the true model intensity distribution (no background) to be more comparable to the background-subtracted image, it is shown above 5 MJy/sr. The color coding is a function of the square root of intensity in MJy/sr.

In the text

Fig. 18

Visualization of the measured and cataloged sources (Sect. 2.7). The field of Fig. 2 is shown at 70, 100, 160, 250, 350, 500 μm (left to right, top to bottom) with the extraction ellipses (FWHM) of the measurable sources (F_iλ P > σ_iλ P and F_iλ T > σ_iλ T) overlaid on top of the observed images ℐ_λO. The default condition, that a tentative source must be detected in at least two bands, was used. Only the protostellar cores are visible at 70 and 100 μm, whereas at 160 μm cold starless cores starts to appear, becoming clearly visible in the SPIRE bands. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a function of the square root of intensity in MJy/sr.

In the text

Fig. 19

Flattening of the detection images (Sect. 2.8). The entire 1° × 1° simulated star-forming region at 350 μm is shown (upper left), the central area of which was used to illustrate getsources in this paper. The image $ℱ_{λ}^{*}$ $\hbox{$\mathcal{F}^{*}_{\!\lambda}$}$ of the converged footprints ℱ_λ somewhat expanded by convolution (upper middle) is used to interpolate all detected sources off the image to obtain the clean background image (upper right). The image $\hbox{$\mathcal{D}_{\!\lambda}$}$ of the standard deviations (lower left) is computed in a very small (3 × 3 pixels) sliding window and further median-filtered using a 21 × 21 pixels sliding window to reduce effects of possible artifacts. The smoothed scaling image ℐ_λF (lower middle) is obtained by convolving the image in the previous panel with a circular Gaussian beam $\hbox{$\mathcal{G}_{{\!\lambda}{\it A}}$}$ larger than the maximum size of sources extracted at 350 μm in the initial extraction. The resulting flattened image ℐ_λDF (lower right) is obtained by dividing the original image ℐ_λD by the scaling image ℐ_λF. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a linear function of intensity in MJy/sr.

In the text

	Fig. 20 Flattening of the detection images (Sect. 2.8). The getsources algorithm requires two complete extractions, the initial and the final extractions (red blocks, expanded in Fig. 1; the preparation steps are shown in blue).
In the text

Fig. 21

The Aquila sub-field (18′ × 18′) is shown as the observed images ℐ_λO at 70, 160, 250, 350, 500 μm (left to right, top to bottom) with the extraction ellipses (FWHM) of only measurable sources (F_iλ P > σ_iλ P and F_iλ T > σ_iλ T) overlaid, as well as the composite 3-color RGB image (500, 250, 160 μm) created using the images ℐ_λM of the deblending shapes of each extracted source (lower-right). The default condition, that a tentative source must be detected in at least two bands, was used. Only the protostars are visible at 70 μm, whereas at 160 μm one starless core appears, the other cold sources becoming clearly visible in the SPIRE bands. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding in the lower-right panel is linear, in all other panels it is a function of the square root of intensity in MJy/sr.

In the text

Fig. 22

The Aquila sub-field is shown as a column density image with a 37′′ resolution (left, Könyves et al. 2010), an accumulated clean combined detection image containing spatial scales of up to 80′′ (middle), obtained by summing up the single scales ℐ_Dj C in that range, with the 500 μm ellipses of all detected sources overplotted. Also shown is the clean background ℐ_λO CB (right) at 500 μm. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding in all panels is a function of the square root of column density and of intensity in MJy/sr.

In the text

Fig. 23

The Rosette sub-field (9′ × 9′) is shown as as the observed images ℐ_λO at 70, 160, 250, 350, 500 μm (left to right, top to bottom) with the extraction ellipses (FWHM) of only measurable sources (F_iλ P > σ_iλ P and F_iλ T > σ_iλ T) overlaid, as well as the composite 3-color RGB image (500, 250, 70 μm) created using the images ℐ_λM of the deblending shapes of each extracted source (lower-right). The default condition, that a tentatve source must be detected in at least two bands, was used. Most of the compact sources visible at 70 μm are unresolved protostars; several groups of them, discussed in Sect. 3.2, are labeled A–F. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding in the lower-right panel is linear, in the other panels it is a function of the square root of intensity in MJy/sr.

In the text

Fig. 24

The Rosette sub-field is shown as the accumulated clean combined detection image containing spatial scales of up to 12′′ (left), obtained by summing up the single scales ℐ_Dj C in that range, with the 160 μm ellipses of all detected sources overplotted. Also shown are the background-subtracted image ℐ_λO BS (middle) and clean background ℐ_λO CB (right) at 70 μm; when added together, they make up the original 70 μm image in Fig. 23. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a function of the square root of intensity in MJy/sr.

In the text

Fig. D.1

Fourier transform of the simulated star-forming region used in this paper (Appendix C) at 350 μm with a 24′′ resolution (Fig. 19, upper-left). The top panels show the Fourier amplitudes of separate components: the model sources, synthetic background, and noise (top left to right). The full image containing all the ingredients (and all spatial scales) is displayed in the following (middle-left) panel. The remaining panels (left to right, top to bottom) show the images of the single-scale decompositions of the simulated sky (cf. Fig. 2) with the scale sizes S_j annotated at the bottom of the panels. For the original images with 2048 × 2048 2′′ pixels, the panels present the Fourier amplitudes within the range of spatial frequencies from zero to one-fourth of the Nyquist frequency (0.25 arcsec^-1). For better visibility, the pixel values are somewhat limited in range; the color coding is linear.

In the text

Fig. D.2

Fourier transform of the actual star-forming regions observed by Herschel at 350 μm with a 25′′ resolution. Shown are the amplitudes of the decomposed images of Aquila (top left to right) and Rosette (bottom left to right); the scale sizes S_j are annotated at the bottom of the panels. For the original Aquila images with 4096 × 4096 3′′ pixels and Rosette images with 2048 × 2048 3′′ pixels, the panels present the Fourier amplitudes within the range of spatial frequencies from zero to one-third of the Nyquist frequency (0.17 arcsec^-1). For better visibility, the spatial frequencies and the pixel values are somewhat limited in range; the color coding is linear.

In the text

Fig. E.1

Typical power spectra of several components contained in observed Herschel images. Red circles show the power spectrum of the SPIRE 250 μm image of the Polaris cirrus cloud taken as part of the Herschel Gould Belt survey; no attempt was made here to correct the raw power spectrum for deviations of the SPIRE 250 μm beam from a Gaussian shape (cf. Miville-Deschênes et al. 2010). This cirrus background is well described by a power law P(q) ∝ q^-3 in the range of spatial frequencies 0.02 < q < 2 arcmin^-1. Blue squares show the estimated power spectrum of the instrumental noise in a typical Herschel image at 250 μm, which is flat over more than two orders of magnitude in q. The solid curve shows the power spectrum (scaled to an arbitrary level of power) of the intensity distribution of a Gaussian source with a size of 1 arcmin (FWHM). The vertical dashed line marks the spatial frequency of 0.46 arcmin^-1, at which the contrast of such a source is maximum over the background fluctuations (assumed to follow P ∝ q^-3).

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Alves, J. F., Lada, C. J., & Lada, E. A. 2001, Nature, 409, 159 [NASA ADS] [CrossRef] [PubMed] [Google Scholar]

[2] André, P., Ward-Thompson, D., & Barsony, M. 2000, Protostars and Planets IV, 59 [Google Scholar]

[3] André, P., Men’shchikov, A., Bontemps, S., et al. 2010, A&A, 518, L102 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[4] Arzoumanian, D., André, P., Didelon, P., et al. 2011, A&A, 529, L6 [Google Scholar]

[5] Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[6] Bertin, E., Mellier, Y., Radovich, M., et al. 2002, in Astronomical Data Analysis Software and Systems XI, ed. D. A. Bohlender, D. Durand, & T. H. Handley, ASP Conf. Ser., 281, 228 [Google Scholar]

[7] Bontemps, S., André, P., Könyves, V., et al. 2010, A&A, 518, L85 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[8] di Francesco, J., Sadavoy, S., Motte, F., et al. 2010, A&A, 518, L91 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[9] Elmegreen, B. G., & Falgarone, E. 1996, ApJ, 471, 816 [NASA ADS] [CrossRef] [Google Scholar]

[10] Falgarone, E., Phillips, T. G., & Walker, C. K. 1991, ApJ, 378, 186 [NASA ADS] [CrossRef] [Google Scholar]

[11] Gong, H., & Ostriker, E. C. 2011, ApJ, 729, 120 [NASA ADS] [CrossRef] [Google Scholar]

[12] Goodman, A. A., Barranco, J. A., Wilner, D. J., & Heyer, M. H. 1998, ApJ, 504, 223 [NASA ADS] [CrossRef] [Google Scholar]

[13] Griffin, M. J., Abergel, A., Abreu, A., et al. 2010, A&A, 518, L3 [Google Scholar]

[14] Hennemann, M., Motte, F., Bontemps, S., et al. 2010, A&A, 518, L84 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[15] Johnstone, D., Wilson, C. D., Moriarty-Schieven, G., et al. 2000, ApJ, 545, 327 [NASA ADS] [CrossRef] [Google Scholar]

[16] Kainulainen, J., Beuther, H., Henning, T., & Plume, R. 2009, A&A, 508, L35 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[17] Kennicutt, R. C., Calzetti, D., Aniano, G., et al. 2011, PASP, 123, 1347 [NASA ADS] [CrossRef] [Google Scholar]

[18] Könyves, V., André, P., Men’shchikov, A., et al. 2010, A&A, 518, L106 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[19] Lagache, G., Abergel, A., Boulanger, F., Désert, F. X., & Puget, J. 1999, A&A, 344, 322 [NASA ADS] [Google Scholar]

[20] Larson, R. B. 1981, MNRAS, 194, 809 [NASA ADS] [CrossRef] [Google Scholar]

[21] Maury, A. J., André, P., Men’shchikov, A., Könyves, V., & Bontemps, S. 2011, A&A, 535, A77 [Google Scholar]

[22] Men’shchikov, A., André, P., Didelon, P., et al. 2010, A&A, 518, L103 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[23] Mink, D. J. 2002, in Astronomical Data Analysis Software and Systems XI, ed. D. A. Bohlender, D. Durand, & T. H. Handley, ASP Conf. Ser., 281, 169 [Google Scholar]

[24] Miville-Deschênes, M.-A., Martin, P. G., Abergel, A., et al. 2010, A&A, 518, L104 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[25] Moffat, A. F. J. 1969, A&A, 3, 455 [NASA ADS] [Google Scholar]

[26] Molinari, S., Schisano, E., Faustini, F., et al. 2011, A&A, 530, A133 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[27] Motte, F., André, P., & Neri, R. 1998, A&A, 336, 150 [Google Scholar]

[28] Motte, F., André, P., Ward-Thompson, D., & Bontemps, S. 2001, A&A, 372, L41 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[29] Motte, F., Schilke, P., & Lis, D. C. 2003, ApJ, 582, 277 [NASA ADS] [CrossRef] [Google Scholar]

[30] Motte, F., Bontemps, S., Schilke, P., et al. 2007, A&A, 476, 1243 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[31] Motte, F., Zavagno, A., Bontemps, S., et al. 2010, A&A, 518, L77 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[32] Myers, P. C. 1983, ApJ, 270, 105 [NASA ADS] [CrossRef] [Google Scholar]

[33] Pence, W. 1999, in Astronomical Data Analysis Software and Systems VIII, ed. D. M. Mehringer, R. L. Plante, & D. A. Roberts, ASP Conf. Ser., 172, 487 [Google Scholar]

[34] Pilbratt, G. L., Riedinger, J. R., Passvogel, T., et al. 2010, A&A, 518, L1 [CrossRef] [EDP Sciences] [Google Scholar]

[35] Poglitsch, A., Waelkens, C., Geis, N., et al. 2010, A&A, 518, L2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[36] Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical recipes in FORTRAN, The art of scientific computing [Google Scholar]

[37] Rosolowsky, E. W., Pineda, J. E., Kauffmann, J., & Goodman, A. A. 2008, ApJ, 679, 1338 [NASA ADS] [CrossRef] [Google Scholar]

[38] Roy, A., Ade, P. A. R., Bock, J. J., et al. 2010, ApJ, 708, 1611 [NASA ADS] [CrossRef] [Google Scholar]

[39] Schneider, S., & Elmegreen, B. G. 1979, ApJS, 41, 87 [NASA ADS] [CrossRef] [Google Scholar]

[40] Schneider, N., Motte, F., Bontemps, S., et al. 2010, A&A, 518, L83 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[41] Smith, A. R. 1979, in SIGGRAPH’79: Proc. of the 6th annual conference on Computer graphics and interactive techniques (New York: ACM), 276 [Google Scholar]

[42] Stutzki, J., & Guesten, R. 1990, ApJ, 356, 513 [NASA ADS] [CrossRef] [Google Scholar]

[43] Starck, J.-L., & Murtagh, F. 2006, Astronomical Image and Data Analysis, ed. J.-L. Starck, & F. Murtagh [Google Scholar]

[44] Williams, J. P., de Geus, E. J., & Blitz, L. 1994, ApJ, 428, 693 [NASA ADS] [CrossRef] [Google Scholar]

[45] Williams, J. P., Blitz, L., & McKee, C. F. 2000, Protostars and Planets IV, 97 [Google Scholar]