Open Access
Issue
A&A
Volume 654, October 2021
Article Number A78
Number of page(s) 26
Section Numerical methods and codes
DOI https://doi.org/10.1051/0004-6361/202141533
Published online 15 October 2021

© A. Men’shchikov 2021

Licence: Creative Commons. Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Extraction methods are critically important research tools, interfacing the astronomical imaging observations with their analyses and physical interpretations. Many different methods have been applied in various studies of star formation in the recent decades to extract sources and filaments and derive their physical properties. The launch of the Herschel Space Observatory stimulated the development of a number of new source-extraction methods, for example, cutex (Molinari et al. 2011), getsources (Men’shchikov et al. 2012), csar (Kirk et al. 2013), and fellwalker (Berry 2015). Ubiquitous filamentary structures observed with Herschel prompted the creation of several filament-extraction methods, for example, disperse (Sousbie 2011), getfilaments (Men’shchikov 2013), a Hessian matrix-based method (Schisano et al. 2014), rht (Clark et al. 2014), filfinder (Koch & Rosolowsky 2015), and tm (Juvela 2016). Most of the methods provide solutions to the problem of detecting sources or filaments, whereas a complete extraction entails their accurate measurements, for which knowledge of their backgrounds is necessary. However, the backgrounds of sources and filaments embedded in the complex, filamentary molecular clouds that strongly fluctuate on all spatial scales are highly uncertain, which induces increasingly larger measurement errors for fainter structures.

The methods employ very different approaches, and it is quite reasonable to expect the qualities of their results obtained for the same observed image to be dissimilar. Experience shows that various methods perform differently on increasingly complex images, although they tend to show more comparable results when tested on the simplest images. It seems unlikely that various independent tools would provide the same or consistent results in terms of detection completeness, number of false positive (spurious) detections, and measurement accuracy. The various uncalibrated tools applied in different studies have the potential to bring about contradictory results and wrong conclusions and to create serious long-term confusion in our understanding of the observed astrophysical reality.

It is highly important to benchmark the extraction methods before their astrophysical applications. Although new extraction methods are usually validated before publication on either observed or simulated images, the test images are different for each method, have dissimilar components and complexity levels, and are not always available for independent evaluations and future comparisons with other tools. The validation images, used to test older methods at the time of their publication, are unlikely to resemble the higher complexity observed with new telescopes that have improved angular resolution, sensitivity, and dynamic range. Before the use of the older tools for such improved generations of images, their performance must be reevaluated and compared with other methods on newer images that resemble the new observations. New methods must also be tested on the same set of images to demonstrate their advantages over the older methods.

Comparisons of extraction methods using observed images cannot be conclusive. Only the proper benchmarks would be able to reveal the true qualities and capabilities of the extraction tools. In this paper, the term “benchmark” refers to a standard multiwavelength set of simulated images with fully known properties of all their components, resembling a certain type of observed image in their components and complexity. To benchmark extraction methods means to run them on the simulated images without any knowledge of the model parameters, as if such images were the true observed images. Subsequent comparisons of the resulting extraction catalogs with the truth catalogs using a reasonable set of quality estimators would determine their detection and measurement qualities, inaccuracies, and biases. It would be highly desirable, if various studies used the extraction tool that shows the best performance in benchmarks, to exclude any discrepancies caused by different methods. Notwithstanding that such an approach is sometimes practiced within research consortia, it does not solve the problem entirely, because the results and conclusions derived for the same images by independent groups with completely different tools would still likely be incompatible.

Systematic benchmarking of different extraction methods, which would guide researchers in selecting the most appropriate tool for their star-formation studies, is hard to find in the literature. A quantitative benchmarking of eight source-extraction methods, referred to by Men’shchikov et al. (2012), was instrumental in the selection of the best tool to apply for the Herschel Gould Belt Survey (HGBS, André et al. 2010) and Herschel Imaging Survey of OB Young Stellar Objects (HOBYS, Motte et al. 2010), but that work remains unpublished. It would not make sense to publish the old results now, because some of the methods have been improved over the years, while the others have become outdated and are not used for the modern, complex images. Any publication of benchmarking results for a selection of extraction tools might quickly lose its value, because it cannot include any improved and newly developed methods. In this work, a completely different approach was taken.

A recent publication (Men’shchikov 2021, hereafter referred to as Paper I) presented a multicomponent, multiwavelength benchmark resembling the images observed by Herschel in star-forming regions. The benchmark images contain a realistic filamentary cloud and hundreds of starless and protostellar cores computed by radiative transfer modeling. Fully known properties of all components allow conclusive comparisons of different methods by evaluating their extraction completeness, reliability, and goodness, along with the detection and measurement accuracies. The benchmark images, together with the truth catalogs, are made publicly available and proposed as the standard benchmark for existing and future extraction methods.

Besides the benchmark, Paper I presented getsf, the multiscale, multiwavelength source- and filament-extraction method, replacing the older getsources, getfilaments, and getimages algorithms (Men’shchikov et al. 2012; Men’shchikov 2013; Men’shchikov 2017); throughout this paper, the three predecessors of getsf are collectively named getold. The new method handles both sources and filaments consistently, separating the structural components from each other and from their backgrounds, thereby simplifying their extraction. The method produces flattened detection images with uniform levels of the residual background and noise fluctuations, which allows the use of global thresholds for detecting the structures. Independent information contained in the multiwaveband images is combined in the detection images, preserving the higher angular resolutions. Properties of the detected sources and filaments are measured in their background-subtracted images and cataloged.

This paper presents benchmarking results for source and filament extraction with getsf and for source extraction with getold, using the new benchmark from Paper I and the old benchmark from Men’shchikov et al. (2012). Instead of describing benchmarking results for an arbitrary selection of existing source-extraction tools, this paper provides researchers in star formation with an extraction quality evaluation system and the source-extraction results obtained with getsf and getold for five variants of the benchmarks with increasing complexity levels. Such an approach enables researchers to benchmark any number of source-extraction tools of their choice and evaluate improved or newly developed methods in the future. It is not unusual that researchers prefer to conduct their own benchmarking and analysis, which often is more convincing.

Extraction of filaments is more problematic than extraction of sources. Filaments are observed as two-dimensional projections that are hard to decipher and relate to their complex three-dimensional structure. Their appearance, identification, and measurements depend on the spatial scales of interest (cf. Sect. 3.4.5 in Paper I) and they usually contain sources that are either formed within the filaments or appear on them in projection. They are often heavily curved and blended, but no filament deblending algorithm is available, and their physically meaningful lengths and masses are hard to determine. Leaving these difficult problems to future dedicated studies, this paper presents the benchmark filament extraction with getsf. No such results are presented for getold, because this method was unable to reconstruct the filament with any acceptable level of accuracy.

Section 2 summarizes all properties of the old and new multiwavelength benchmarks. Section 3 introduces a system of quantities for evaluating performances of source-extraction methods. Section 4 presents the benchmarking results for several variants of the benchmark. Section 6 concludes this work.

Following Paper I, images are represented by the capital calligraphic characters (e.g., 𝒜, ℬ, 𝒞) and software names and numerical methods are typeset slanted (e.g., getsf) to distinguish them from other emphasized words. The curly brackets are used to collectively refer to either of the characters, separated by vertical lines. For example, {a|b} refers to a or b and {A|B}{a|b}c expands to A{a|b}c or B{a|b}c, as well as to Aac, Abc, Bac, or Bbc.
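The curly-bracket shorthand can be expanded mechanically. The following is a minimal sketch of such an expansion, given only as an illustration of the notation; the function name `expand` is hypothetical and is not part of getsf.

```python
import itertools
import re


def expand(pattern: str) -> list[str]:
    """Expand every {x|y|...} group in `pattern` into all combinations."""
    # Split into literal text and brace groups (the groups keep their braces).
    parts = re.split(r"(\{[^{}]*\})", pattern)
    choices = []
    for part in parts:
        if part.startswith("{") and part.endswith("}"):
            choices.append(part[1:-1].split("|"))  # alternatives of one group
        else:
            choices.append([part])                 # literal text, single choice
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand("{A|B}{a|b}c"))  # ['Aac', 'Abc', 'Bac', 'Bbc']
```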

2. Benchmarks for extraction methods

The benchmark from Men’shchikov et al. (2012) (Benchmark A) includes a relatively simple background and many blended sources, whereas the benchmark from Paper I (Benchmark B) features a more complex, strongly fluctuating, filamentary background, but it does not have blended sources. All structural components were added to each other, without any attempt to account for the physical picture that the star-forming cores are the integral parts of the filaments that, in turn, are the integral parts of the molecular clouds. This is unnecessary for a benchmark, because the existing extraction tools do not discriminate between the embedded structures and chance projections of the structural components along the line of sight.

2.1. Benchmark A

The multicomponent, multiwavelength benchmark, described by Men’shchikov et al. (2012), was constructed in 2009, before the launch of Herschel, at slightly nonstandard wavelengths (λ of 75, 110, 170, 250, 350, and 500 μm). The images on a 1800 × 1800 grid of 2″ pixels cover 1° × 1° or 2.4 pc at a distance D = 140 pc. They include three independent structural components: the background ℬλ, sources 𝒮λ, and small-scale instrumental noise 𝒩λ.

The backgrounds ℬλ were computed from a synthetic scale-free image 𝒟B. The image was scaled at each wavelength to the typical intensities of molecular clouds in the nearby star-forming regions, adopting a planar image of dust temperatures decreasing from 20 to 15 K between the upper-left and lower-right corners, with a constant value of 17.5 K along the other diagonal.

The component 𝒮λ of sources was computed from the radiative transfer models of starless cores and protostellar cores with a range of masses from 0.01 to 6 M⊙ and half-maximum sizes from ∼0.001 to 0.1 pc. The individual model images of 360 starless and 107 protostellar cores were distributed quasi-randomly, preferentially in the brighter areas of the background ℬλ, allowing them to overlap without any restrictions. A broken power-law function with the slopes dN/dlog10M of 0.3 for M ≤ 0.08 M⊙, −0.3 for 0.08 M⊙ < M ≤ 0.5 M⊙, and −1.3 for M > 0.5 M⊙ was used to determine the numbers of models per mass bin δlog10M ≈ 0.1.
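The broken power law can be illustrated with a short sketch. It rests on several assumptions that are not stated in the text: the quoted slopes are read as exponents γ in dN/dlog10M ∝ M^γ, the function is taken to be continuous at the two break masses, the bin width is adjusted slightly so that an integer number of bins spans 0.01 to 6 M⊙, and the counts are normalized to the total of 467 cores (360 + 107). The function names are hypothetical.

```python
import math

BREAKS = (0.08, 0.5)        # break masses in solar masses (from the text)
SLOPES = (0.3, -0.3, -1.3)  # assumed exponents of dN/dlog10(M) per segment


def dn_dlogm(mass):
    """Unnormalized dN/dlog10(M), assumed continuous across both breaks."""
    m1, m2 = BREAKS
    g1, g2, g3 = SLOPES
    if mass <= m1:
        return (mass / m1) ** g1
    if mass <= m2:
        return (mass / m1) ** g2
    return (m2 / m1) ** g2 * (mass / m2) ** g3


def models_per_bin(m_min=0.01, m_max=6.0, dlogm=0.1, n_total=467):
    """Return (bin-center mass, expected number of models) for each mass bin."""
    log_range = math.log10(m_max) - math.log10(m_min)
    n_bins = round(log_range / dlogm)
    step = log_range / n_bins  # close to dlogm, adjusted to fit the range
    centers = [10 ** (math.log10(m_min) + (i + 0.5) * step) for i in range(n_bins)]
    weights = [dn_dlogm(m) for m in centers]
    scale = n_total / sum(weights)  # normalization to the total is an assumption
    return [(m, w * scale) for m, w in zip(centers, weights)]


for mass, number in models_per_bin():
    print(f"{mass:8.4f} Msun  {number:6.1f}")
```

Under these assumptions, the distribution peaks in the bins near the first break at 0.08 M⊙ and falls off steeply above 0.5 M⊙.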

The final benchmark images ℐλ were obtained by adding different realizations of the random Gaussian noise 𝒩λ at 75, 110, 170, 250, 350, and 500 μm and convolving them to the slightly nonstandard Herschel resolutions of Oλ of 5, 7, 11, 17, 24, and 35″. In this paper, the set of benchmark images is extended with an additional image ℐƛ ≡ 𝒟11″ of surface density at a high angular resolution OH = 11″, derived from ℐλ at 170 − 500 μm using the algorithm hires described in Sect. 3.1.2 of Paper I.

2.2. Benchmark B

The multicomponent, multiwavelength benchmark from Paper I is based on images of a simulated star-forming region at a distance D = 140 pc. The images in all Herschel wavebands (λ of 70, 100, 160, 250, 350, and 500 μm) on a 2690 × 2690 grid of 2″ pixels cover 1.5° × 1.5° or 3.7 pc. They include emission of four independent structural components: the background cloud ℬλ, long filament ℱλ, round sources 𝒮λ, and small-scale instrumental noise 𝒩λ. A sum of the first two components 𝒞λ represents the emission of the filamentary background.
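The quoted field sizes of both benchmarks follow directly from the grid dimensions, pixel size, and distance. The short check below is only an arithmetic illustration; the function name is hypothetical.

```python
import math

ARCSEC_PER_DEG = 3600.0


def field_of_view(n_pixels, pixel_arcsec, distance_pc):
    """Return the image side in degrees and in parsecs at the given distance."""
    side_deg = n_pixels * pixel_arcsec / ARCSEC_PER_DEG
    side_pc = distance_pc * math.tan(math.radians(side_deg))
    return side_deg, side_pc


a_deg, a_pc = field_of_view(1800, 2.0, 140.0)  # Benchmark A
b_deg, b_pc = field_of_view(2690, 2.0, 140.0)  # Benchmark B
print(f"A: {a_deg:.2f} deg, {a_pc:.1f} pc")
print(f"B: {b_deg:.2f} deg, {b_pc:.1f} pc")
```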

The benchmark images were computed from the adopted surface densities and dust temperatures of the structural components (Figs. 2–4 of Paper I). The background cloud 𝒟B from Benchmark A was scaled to produce the surface densities NH2 from 1.5 × 10²¹ to 4.8 × 10²² cm⁻² and fluctuation levels differing by two orders of magnitude in its diffuse and dense areas. The spiral filament 𝒟F has a crest density of N0 = 10²³ cm⁻², a full width of W = 0.1 pc (150″) at half-maximum (FWHM), and a power-law profile NH2(θ) ∝ θ⁻³ at large distances θ from the crest. The filament is self-touching, because the two sides of the tightly curved spiral touch each other (Fig. 13), but the filament is not self-blending: there is no additive mutual contribution of the two sides. This allows the benchmark filament to have unaltered radial profiles on both sides, to test the extraction methods’ ability to reproduce the profiles without any filament deblending algorithm. The filament mass MF = 3.04 × 10³ M⊙ and length LF = 10.5 pc correspond to the linear density ΛF = 290 M⊙ pc⁻¹.
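The filament numbers can be cross-checked with simple arithmetic. The sketch below uses the small-angle approximation and hypothetical function names; the width of 0.1 pc at 140 pc corresponds to about 147″, which the text rounds to 150″.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def linear_density(mass_msun, length_pc):
    """Mean mass per unit length of the filament, in Msun per pc."""
    return mass_msun / length_pc


def angular_size_arcsec(size_pc, distance_pc):
    """Small-angle approximation of the angular size in arcseconds."""
    return size_pc / distance_pc * ARCSEC_PER_RAD


print(f"{linear_density(3.04e3, 10.5):.1f} Msun/pc")
print(f"{angular_size_arcsec(0.1, 140.0):.1f} arcsec")
```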

The resulting surface densities 𝒟C = 𝒟B + 𝒟F of the filamentary cloud are in the range of 1.7 × 10²¹ to 1.4 × 10²³ cm⁻². The dust temperatures 𝒯C have values from 15 K in the densest central areas of the filamentary cloud to 20 K in its diffuse outer parts. The surface densities 𝒟C and temperatures 𝒯C were used to compute the cloud images 𝒞λ in all Herschel wavebands, assuming optically thin dust emission.

The component 𝒟S of sources was computed from radiative transfer models of starless cores and protostellar cores, very similar to those in Benchmark A, in a wide range of masses (from 0.05 to 2 M⊙) and half-maximum sizes (from ∼0.001 to 0.1 pc). Individual surface density images of the models of 828 starless and 91 protostellar cores were distributed in the dense areas (NH2 ≥ 5 × 10²¹ cm⁻²) of the filamentary cloud 𝒟C. They were added quasi-randomly, without overlapping, at positions where their peak density exceeded the local NH2 value of the cloud. A power-law function with a slope dN/dlog10M of −0.7 was used to define the numbers of models per mass bin δlog10M ≈ 0.1.

This resulted in the surface densities 𝒟S, the intensities 𝒮λ of sources, and the emission 𝒞λ + 𝒮λ of the simulated star-forming region. The complete benchmark images ℐλ were obtained by adding different realizations of the random Gaussian noise 𝒩λ at 70, 100, 160, 250, 350, and 500 μm and convolving the images to the Herschel angular resolutions Oλ of 8.4, 9.4, 13.5, 18.2, 24.9, and 36.3″, respectively. The set of benchmark images is extended with an additional image ℐƛ ≡ 𝒟13″ of surface density at a high angular resolution OH = 13.5″ derived from ℐλ at 160 − 500 μm using the algorithm hires described in Sect. 3.1.2 of Paper I.

3. Quality evaluation system for source extractions

For comparisons of different source-extraction methods using benchmarks, it is necessary to define several quantities that would evaluate an extraction quality by comparing the positions of detected sources and their measured properties with the true values. Such a formalism was developed by the author in collaboration with Ph. André a decade ago (2010, unpublished) and used to compare getsources with seven other methods (listed in Sect. 1.1 of Men’shchikov et al. 2012). That quality evaluation system has been slightly improved and is now described below and applied to assess performances of getsf and getold in the benchmark extractions. Source extraction methods can be quantitatively compared with each other, using the definitions below and the truth catalogs of the benchmarks.

It is convenient to denote NT the true number of sources in a benchmark, NDλ the number of detected sources (acceptable at wavelength λ) whose peak coordinates match those of the model sources from the truth catalog, NGλ the number of sources among NDλ that have good measurements, and NSλ the number of spurious sources, that is the number of sources in NDλ that do not have any positional match in the truth catalog. A measurement is considered as good, if the measured quantities (fluxes, sizes) are within a factor of 2^{1/2} from its true model value; otherwise, the measurement is regarded as bad and the corresponding number of bad sources is NBλ = NDλ − NGλ.
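The good/bad classification can be sketched as follows. This is an illustration under stated assumptions: a source is counted as good only if every measured quantity lies within the factor of 2^{1/2} of its true value, and the dictionary keys `flux` and `size` are hypothetical placeholders for the measured quantities.

```python
import math

GOOD_FACTOR = math.sqrt(2.0)  # tolerance quoted in the text


def is_good_value(measured, true, factor=GOOD_FACTOR):
    """True if measured/true lies within [1/factor, factor]."""
    if measured <= 0 or true <= 0:
        return False
    ratio = measured / true
    return 1.0 / factor <= ratio <= factor


def count_good_bad(sources):
    """sources: list of (measured, true) dicts sharing the same keys.

    Returns (N_G, N_B), with N_B = N_D - N_G as in the text.
    """
    n_good = sum(
        all(is_good_value(meas[key], true[key]) for key in true)
        for meas, true in sources
    )
    return n_good, len(sources) - n_good


matched = [
    ({"flux": 1.2, "size": 10.0}, {"flux": 1.0, "size": 9.0}),   # good
    ({"flux": 2.0, "size": 10.0}, {"flux": 1.0, "size": 10.0}),  # flux off by 2
]
print(count_good_bad(matched))  # (1, 1)
```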

In the multiwavelength extraction catalogs, sources can be prominent in one waveband and completely undetectable or not measurable in another one. In the above definitions, a source n is deemed acceptable at wavelength λ, if

(1)

where Ξλn is the source detection significance, Γλn is the source goodness, Ωλn and Ψλn are the signal-to-noise ratios (S/Ns) related to the peak intensity FPλn and total flux FTλn, respectively (cf. Eqs. (41) and (42) of Paper I), {A|B}λn are the source FWHM sizes, and AFλn is the major diameter of the source footprint. The last inequality discards the sources with unrealistically small ratios AFλn/Aλn of their footprint and half-maximum sizes. The empirical set of conditions in Eq. (1) ensures that the selected subset of sources is reliable (not contaminated by significant numbers of spurious sources) and that selected sources have acceptably accurate measurements.

With the above definitions of NT and N{D|G|S}λ, it makes sense to define the source extraction completeness Cλ, reliability Rλ, and goodness Gλ as

(2)

where Rλ has been updated with respect to the original version of the system, where it was defined as 1/NSλ. The newly defined reliability is the Moffat (Plummer) profile (cf. Eq. (2) in Paper I), with Θ = 0.05 NDλ and ζ = 1/2. It has a Gaussian-like peak at NSλ = 0, slowly descends to 0.5 when 5% of NDλ are spurious sources, and decreases as 1/NSλ for NSλ ≫ 0.05NDλ.
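The three quantities can be sketched numerically. The forms below are assumptions inferred from the description: completeness and goodness are taken as the ratios NDλ/NT and NGλ/NDλ, and the reliability is written as a Moffat profile (1 + f (NSλ/Θ)²)^(−ζ) with f = 2^{1/ζ} − 1, which makes Θ the half-width at half-maximum and reproduces the stated value of 0.5 at NSλ = 0.05 NDλ. The value returned for NDλ = 0 is an arbitrary choice of this sketch.

```python
def completeness(n_detected, n_true):
    """Assumed form: fraction of the true sources that were detected."""
    return n_detected / n_true


def goodness(n_good, n_detected):
    """Assumed form: fraction of detected sources with good measurements."""
    return n_good / n_detected if n_detected else 0.0


def reliability(n_spurious, n_detected, frac=0.05, zeta=0.5):
    """Moffat-like reliability with half-maximum at frac * n_detected."""
    if n_detected == 0:
        return 0.0  # arbitrary convention for an empty catalog
    theta = frac * n_detected
    f = 2.0 ** (1.0 / zeta) - 1.0  # equals 3 for zeta = 1/2
    return (1.0 + f * (n_spurious / theta) ** 2) ** (-zeta)


print(reliability(0, 100))  # no spurious sources
print(reliability(5, 100))  # 5% spurious sources
```

For NSλ ≫ 0.05 NDλ the expression behaves as Θ/(√3 NSλ), consistent with the 1/NSλ decrease described above.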

It is useful to compute the ratios of the measured quantities to their true model values, for each acceptable source, and evaluate their mean values among NGλ sources with good measurements:

(3)

where 𝜚Eλ evaluates the accuracy of the source area. The mean ratios with their standard deviations σ{P|T|A|B}λ can be used to define the qualities of the measured source parameters as

(4)

Denoting δDλ ≡ ⟨Dλn⟩ the mean distance of the well-measurable sources from the true model peaks and σDλ the corresponding standard deviation, the positional quality is defined as

(5)

It is convenient to define the detection quality QCRλ and the measurement quality QPTEλ combining the qualities related to the independent source detection and measurement steps, as well as the overall quality Qλ of a source extraction,

(6)

The quantities defined by Eqs. (2)–(6) have values in the range [0, 1] that become unity for an imaginary perfect extraction tool that would extract all simulated sources and measure their parameters with no deviations from the true model values. The absolute values of the quantities are arbitrary and meaningless for a single extraction with a single method. The values become quite useful, however, when comparing the relative extraction qualities of two or more methods or of several extractions with a single method (with different parameters).

The quality evaluation system represented by Eqs. (2)–(6) is not unique and other formalisms might be devised and applied to the benchmark truth catalogs and the getsf and getold extraction catalogs found on the benchmarking page of the getsf website.

4. Benchmarking

The benchmark names with subscripts are used to indicate the number of the structural components. For example, Benchmark A3 contains three components (background, sources, and noise) and Benchmark B4 has four components (background, filament, sources, and noise). There are also three simpler variants of the benchmarks: B3 has no filament, A2 and B2 have no background. Below, the source extractions in {A,B}2, {A,B}3, and B4 are presented in a sequence of their increasing complexity and followed by the filament extraction in B4.

The simplest benchmarks {A,B}2 (Figs. 1 and 2) contain only two components, the model cores and noise. Most sources are clearly visible against the noise, and therefore they must be uncomplicated to detect for a variety of extraction methods. The model sources have a wide range of the FWHM sizes from the angular resolution Oλ up to Aλn ≈ 200″; therefore, the methods that limit the largest sizes of extractable sources to only a few beams are expected to miss many larger sources. The resolved models and real objects have non-Gaussian intensity distributions; therefore, the methods that assume Gaussian shapes for all sources are expected to produce less accurate measurements.

Fig. 1.

Benchmark A2 extraction of sources with getsf and getold. The original ℐλ are overlaid with the footprint ellipses from the measurement step. In the getold extraction (bottom), the nonexistent large-scale background was determined and subtracted in a preliminary run of getimages, in order to keep the general extraction scheme unaltered for all benchmarks. The images are displayed with a square-root color mapping.

Fig. 2.

Benchmark B2 extraction of sources with getsf and getold. The original ℐλ are overlaid with the footprint ellipses from the measurement step. In the getold extraction (bottom), the nonexistent large-scale background was determined and subtracted in a preliminary run of getimages, in order to keep the general extraction scheme unaltered for all benchmarks. The images are displayed with a square-root color mapping.

The benchmark variants {A,B}3 (Figs. 3 and 4) contain three components (background, sources, and noise), adding fluctuating backgrounds to the sources and uniform noise of {A,B}2. The background fluctuations in A3 are similar in both diffuse and dense areas, whereas in B3 they progressively increase in the denser areas. In the presence of the background clouds, more of the sources are expected to remain undetected and possibly more spurious sources to become cataloged. Extraction methods may perform well in A3 with its relatively simple background, but some of them would experience greater problems in B3. The benchmarks could present serious problems to those extraction tools that are not designed to handle complex backgrounds.

Fig. 3.

Benchmark A3 extraction of sources with getsf and getold. The original ℐλ are overlaid with the footprint ellipses from the measurement step. In the getold extraction (bottom), the large-scale background was determined and subtracted by getimages. The images are displayed with a square-root color mapping.

Fig. 4.

Benchmark B3 extraction of sources with getsf and getold. The original ℐλ are overlaid with the footprint ellipses from the measurement step. In the getold extraction (bottom), the large-scale background was determined and subtracted by getimages. The images are displayed with a square-root color mapping.

The most complex variant B4 (Fig. 5) contains four components (background, filament, sources, and noise), adding the dense spiral filament to the structural components of B3. The filamentary background of the sources becomes much denser and it acquires markedly different anisotropic properties (e.g., along the filament crest and in the orthogonal directions), in addition to the strong and nonuniform background fluctuations of B3. Better resembling the complexity of the interstellar clouds revealed by the Herschel observations, it further complicates the source extraction problem. Among all benchmark variants, the largest numbers of model sources are expected to vanish in the filamentary background cloud of B4.

Fig. 5.

Benchmark B4 extraction of sources with getsf and getold. The original ℐλ are overlaid with the footprint ellipses from the measurement step. In the getold extraction (bottom), the large-scale background was determined and subtracted by getimages. The images are displayed with a square-root color mapping.

In extractions with getsf and getold, it is necessary to determine the structures of interest to extract and specify their maximum sizes for each waveband ({X|Y}λ in Paper I and in Men’shchikov et al. 2012). In all benchmark extractions, this user-definable parameter was assigned the same values for both methods. The maximum sizes Xλ for sources were 16, 25, 30, 150, 150, and 150″ for the Herschel wavebands and 150″ for the surface density image. In Benchmark B, the maximum size Yλ for filaments was 350″ for all images. The getold extractions followed an improved scheme (Men’shchikov 2017): all benchmark images were first processed by getimages (using the above maximum sizes) that subtracted their large-scale backgrounds and flattened residual background and noise fluctuations. The background-subtracted and flattened images were then used in the getsources extractions.

4.1. Source extractions in Benchmarks A and B

In the standard approach to the multiwavelength benchmarking adopted in this paper, sources are detected in the wavelength-independent images combined from all (seven) wavelengths. Effects of different combinations of images for source detection on extraction qualities are discussed in Sect. 4.1.4.

In the analysis of the extractions, all acceptable sources from the catalogs were positionally matched with the truth catalogs using stilts (Taylor 2006). The matching radius was essentially a quadratic mean of the angular resolution Oλ and the true FWHM size of the model core,

(7)

The extracted sources with positions within the circles were considered the matches to the true model cores. Only those of them with errors in measurements within a factor of 2^{1/2} were evaluated in Tables C.1 and C.2 according to the system outlined in Sect. 3. For plotting the ratios of the measured and true parameters (cf. Figs. 6–10), the sources with measurement errors within a factor of 10 were used.
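The positional matching can be illustrated with a brute-force sketch in plain Python, which is not the stilts procedure used in this paper. The matching radius is assumed here to be the root mean square of Oλ and the true FWHM size, which is only one possible reading of "quadratic mean" and may differ from Eq. (7) in its normalization. Positions are assumed to be flat-sky offsets in arcseconds, and the function names are hypothetical.

```python
import math


def matching_radius(beam_arcsec, true_fwhm_arcsec):
    """Assumed form of the matching radius: RMS of beam and true FWHM size."""
    return math.sqrt(0.5 * (beam_arcsec ** 2 + true_fwhm_arcsec ** 2))


def match_sources(extracted, truth, beam_arcsec):
    """Match extracted (x, y) positions to truth entries (x, y, fwhm).

    Returns a dict {extracted index: (truth index, distance)} holding the
    nearest truth source within its matching radius, and the list of
    unmatched (spurious) extracted indices.
    """
    matches = {}
    for i, (x, y) in enumerate(extracted):
        best = None
        for j, (xt, yt, fwhm) in enumerate(truth):
            dist = math.hypot(x - xt, y - yt)
            if dist <= matching_radius(beam_arcsec, fwhm):
                if best is None or dist < best[1]:
                    best = (j, dist)
        if best is not None:
            matches[i] = best
    spurious = [i for i in range(len(extracted)) if i not in matches]
    return matches, spurious


truth = [(0.0, 0.0, 20.0), (100.0, 0.0, 10.0)]
extracted = [(3.0, 4.0), (98.0, 1.0), (300.0, 300.0)]
matches, spurious = match_sources(extracted, truth, beam_arcsec=18.2)
print(matches, spurious)
```

This simple version allows several extracted sources to match the same model core; a production cross-match would need to resolve such duplicates.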

Fig. 6.

Benchmark A2 extraction with getsf (three top rows) and getold (three bottom rows). Ratios of the measured fluxes FTλn, peak intensities FPλn, and sizes {A, B}λn to their true values (F/FT, I/IT, A/AT, and B/BT) are shown as a function of the S/N ratio Ωλn. The size ratios A/AT and B/BT are also shown as a function of the true sizes {A, B}λnT. The mean 𝜚{P|T|A|B}λ and standard deviation σ{P|T|A|B}λ of the ratios are displayed in the panels. Similar plots for λ ≤ 110 μm with only bright protostellar cores are not presented, because their measurements are quite accurate, with 𝜚{P|T|A|B}λ ≈ {0.999|0.998|0.999|0.999} and σ{P|T|A|B}λ ≈ {0.002|0.006|0.00004|0.00004}.

Fig. 7.

Benchmark B2 extraction with getsf (three top rows) and getold (three bottom rows). Ratios of the measured fluxes FTλn, peak intensities FPλn, and sizes {A, B}λn to their true values (F/FT, I/IT, A/AT, and B/BT) are shown as a function of the S/N ratio Ωλn. The size ratios A/AT and B/BT are also shown as a function of the true sizes {A, B}λnT. The mean 𝜚{P|T|A|B}λ and standard deviation σ{P|T|A|B}λ of the ratios are displayed in the panels. Similar plots for λ ≤ 100 μm with only bright protostellar cores are not presented, because their measurements are quite accurate, with 𝜚{P|T|A|B}λ ≈ {0.999|0.998|1.005|0.996} and σ{P|T|A|B}λ ≈ {0.001|0.005|0.01|0.007}.

Fig. 8.

Benchmark A3 extraction with getsf (three top rows) and getold (three bottom rows). Ratios of the measured fluxes FTλn, peak intensities FPλn, and sizes {A, B}λn to their true values (F/FT, I/IT, A/AT, and B/BT) are shown as a function of the S/N ratio Ωλn. The size ratios A/AT and B/BT are also shown as a function of the true sizes {A, B}λnT. The mean 𝜚{P|T|A|B}λ and standard deviation σ{P|T|A|B}λ of the ratios are displayed in the panels. Similar plots for λ ≤ 110 μm with only bright protostellar cores are not presented, because their measurements are quite accurate, with 𝜚{P|T|A|B}λ ≈ {0.999|0.998|1.001|0.999} and σ{P|T|A|B}λ ≈ {0.002|0.006|0.00004|0.00004}.

Fig. 9.

Benchmark B3 extraction with getsf (three top rows) and getold (three bottom rows). Ratios of the measured fluxes FTλn, peak intensities FPλn, and sizes {A, B}λn to their true values (F/FT, I/IT, A/AT, and B/BT) are shown as a function of the S/N ratio Ωλn. The size ratios A/AT and B/BT are also shown as a function of the true sizes {A, B}λnT. The mean 𝜚{P|T|A|B}λ and standard deviation σ{P|T|A|B}λ of the ratios are displayed in the panels. Similar plots for λ ≤ 100 μm with only bright protostellar cores are not presented, because their measurements are quite accurate, with 𝜚{P|T|A|B}λ ≈ {1.000|0.999|1.000|1.000} and σ{P|T|A|B}λ ≈ {0.0004|0.001|0.0002|0.0002}.

Fig. 10.

Benchmark B4 extraction with getsf (three top rows) and getold (three bottom rows). Ratios of the measured fluxes FTλn, peak intensities FPλn, and sizes {A, B}λn to their true values (F/FT, I/IT, A/AT, and B/BT) are shown as a function of the S/N ratio Ωλn. The size ratios A/AT and B/BT are also shown as a function of the true sizes {A, B}λnT. The mean 𝜚{P|T|A|B}λ and standard deviation σ{P|T|A|B}λ of the ratios are displayed in the panels. Similar plots for λ ≤ 100 μm with only bright protostellar cores are not presented, because their measurements are quite accurate, with 𝜚{P|T|A|B}λ ≈ {1.000|1.000|1.000|1.000} and σ{P|T|A|B}λ ≈ {0.0004|0.0006|0.0002|0.0002}.

4.1.1. Qualitative comparisons

Figures 1–5 visualize the source extraction results by means of three benchmark images overlaid with the footprint ellipses {A, B, ω}Fλn of all acceptably good sources n, selected by Eq. (1). The two short-wavelength images at λ < 160 μm are not shown, because only the strong unresolved peaks of protostellar cores appear there and their extraction would be uncomplicated for most methods. The 250 and 350 μm images are not presented either, because of their similarity to the three images displayed. The starless cores are best visible against their background in the surface densities 𝒟{11|13}″ that expose the dense sources more clearly than the intensities do. In {A,B}3 and B4, the images are dissimilar for getsf and getold, because getold subtracts their large-scale backgrounds in a preliminary run of getimages.

Figures 1–5 illustrate the difficulties created by the starless cores. Such cores are totally invisible or very faint at the higher resolutions of λ < 160 μm and blended with the other sources (A2 and A3) or backgrounds ({A,B}3 and B4) at the lower resolutions of the longer wavelengths. The starless cores with temperatures TD ≲ 10 K produce little emission at λ ≲ 100 μm; therefore, only the protostellar cores are extractable at the short wavelengths. Although the starless sources appear stronger at λ > 170 μm, progressively lower resolutions spread their emission over larger footprints, which makes interpolation of the fluctuating background less accurate.

The strongly overlapping sources in the crowded areas of Benchmark A become more heavily blended with each other and with their background, which makes their deblending less accurate. The backgrounds and true extents of the footprints of such blended sources are difficult, if not impossible, to determine reliably. Their footprints often become overestimated and the backgrounds underestimated, which leads to excessively large measured fluxes. For other sources that are largely isolated, the measurements of fluxes, sizes, and positions are usually more accurate. Increasing numbers of overlapping sources at the lower resolutions of longer wavelengths degrade the quality of their backgrounds further, because much more distant source-free pixels are to be used in the background interpolation.

Figures 1 and 3 demonstrate several difficult cases, when one or more narrow sources appear on top of the much wider, well-resolved starless source, referred to as a sub-structured source. If the narrow source is located close to the peak of the wide source, it is practically impossible for an automated extraction method to distinguish these two sources. Depending on the intensity distributions, the wide source may be regarded as the background of the narrow source, remaining not extracted, or it may be considered as belonging to the power-law outskirts of the narrow source. More often, such narrow sources are located off-peak of the wide source, hence they can be detected as separate sources. In both cases, however, the benchmarks reveal that their measurements are inaccurate, because of the incorrectly determined individual backgrounds of each source and the approximate nature of their deblending. Backgrounds of sources are highly uncertain (cf. Appendix A) and it is not surprising that they are even less accurate for the blended sources.

An inspection of Figs. 1–5 reveals several spurious sources, those that do not exist in the benchmarks. The spurious detections are partially or completely discarded from the final catalogs during measurements by the acceptability criteria in Eq. (1). Some spurious sources are found on the well-resolved starless sources, whose large-scale intensity peak enhances the small-scale background and noise fluctuations, making them appear as real sources. When a source extraction aims at the highest possible completeness, at finding the faintest sources, it is normal that some peaks, produced by the background and noise fluctuations, are mistakenly identified as genuine sources. A good source extraction method must, however, guarantee that the number NSλ of spurious sources in the final catalog remains below a few percent of the number of real sources NDλ. For some studies, it may be beneficial to require that a valid source must be detected and acceptable in at least two wavebands. This strategy potentially removes most of the spurious sources, together with some real sources, unfortunately. It is better not to apply such a condition when benchmarking source extraction methods, because practical applications often require extractions in a single image.
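The two-waveband condition mentioned above can be sketched as a simple catalog filter. This is only an illustration: the function name and the boolean flag array are hypothetical, and the acceptability flags themselves are assumed to have been computed elsewhere (e.g., from Eq. (1)).

```python
import numpy as np

def keep_multiband_sources(acceptable, min_bands=2):
    """Return indices of sources acceptable in at least `min_bands` wavebands.

    acceptable: boolean array of shape (n_sources, n_bands); how the flags
    are derived is outside the scope of this sketch.
    """
    acceptable = np.asarray(acceptable, dtype=bool)
    n_good_bands = acceptable.sum(axis=1)  # number of acceptable bands per source
    return np.flatnonzero(n_good_bands >= min_bands)

# Hypothetical flags for four sources in three wavebands.
flags = np.array([
    [True,  True,  False],   # kept: acceptable in two bands
    [False, True,  False],   # dropped: single-band source (real or spurious)
    [True,  True,  True],    # kept
    [False, False, False],   # dropped
])
kept = keep_multiband_sources(flags)
```

The second row shows the cost of the condition: a real source that happens to be acceptable in only one waveband is removed together with the spurious ones.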

In the surface density images 𝒟{11|13}″ in Figs. 1 and 2, the footprints of several unresolved peaks of quite extended protostellar cores appear too small. They correspond to just the unresolved central peaks and not to the entire large cores with their power-law profiles. The same sources have large sizes of their extended footprints at {160|170} and 500 μm and in the benchmarks with background (Figs. 3–5). This abnormality is caused by the derivation algorithm of the images 𝒟{11|13}″, which employs fitting of the spectral shapes Πλ of the pixels. The surface densities are known to be quite inaccurate in the pixels with strong temperature gradients along the lines of sight (e.g., Appendix A of Paper I). Such fitting problems lead to the overestimated temperatures and underestimated surface densities around the unresolved peaks. The resulting strong depressions (local minima) around the peaks of several protostellar cores in 𝒟{11|13}″ prevent the extraction methods from finding the correct footprint sizes. This happens only in the simplest benchmarks {A,B}2 with just two components (sources and noise), because the bright emission of the background and filament dilutes the temperature effect along the lines of sight within the cores.

4.1.2. Measurement accuracies

Figures 6–10 display the measurement accuracies of the peak intensity FPλn, integrated flux FTλn, and sizes {A, B}λn for each acceptable source n, represented by the ratios of their measured and true values, as functions of their S/N ratios Ωλn and true FWHM sizes AλnT. The accuracy plots are not shown for λ < 160 μm, because only the bright protostellar cores are extractable in those images and their measurements are quite precise, with errors well below 1%. The measurement results in the derived 𝒟{11|13}″ are not shown either, because they are known to be inaccurate (e.g., Appendix A of Paper I). Some of the starless cores become measurable at {160|170} μm as faint sources with Ωλn ≲ 10, the values well below the S/N of the bright protostellar cores. The faintness of the starless cores with respect to the background and noise fluctuations makes measurements of some of them inaccurate, with a large spread of errors in total fluxes, exceeding a factor of √2. Toward the longer wavelengths (250 − 500 μm), the starless cores become brighter, whereas the protostellar cores become fainter, making their Ωλn ranges overlap for the two populations of sources.

Figures 6–10 reveal that getold systematically underestimates the FWHM sizes {A, B}λn of sources by ∼20%. The problem is most clearly visible for the well-resolved sources, because for the slightly resolved or unresolved sources getold adjusts the underestimated values {A, B}λn < Oλ by setting them to the angular resolution Oλ. The main reason for the systematic deficiency is the size estimation algorithm that uses the source intensity moments, which can only be accurate for the Gaussian sources. In most practical applications, however, there are no Gaussian-shaped sources, for several reasons. Firstly, the point-spread functions (PSFs, beams) of telescopes are often non-Gaussian in their lower parts, which affects the shapes of mostly the unresolved sources. Secondly, the radiative transfer models of the starless and protostellar cores (Sect. 2) suggest that the real physical cores produce non-Gaussian intensity profiles, thereby affecting the shapes of mostly the resolved sources. Finally, the backgrounds of sources in bright fluctuating molecular clouds cannot be determined accurately, hence non-negligible over- or under-subtraction of the background of even the Gaussian sources would create non-Gaussian shapes, in both resolved and unresolved cases. The background of extracted sources is often overestimated, hence the intensity moments of the background-subtracted source would underestimate {A, B}λn. For the protostellar cores that have power-law intensity profiles at large radii, the intensity moments algorithm leads to strongly overestimated half-maximum sizes, and for the starless cores with flat-topped shapes, the intensity moments could significantly underestimate the half-maximum sizes (cf. Sect. 3.4.6 of Paper I).

Figures 6–10 demonstrate that getsf does not have such systematic problems with the FWHM sizes {A, B}λn of sources. This is because getsf evaluates them directly at the half-maximum intensity (Sect. 3.4.6 of Paper I), unlike getold that employs the source intensity moments. Direct measurements are much less affected by the background inaccuracies, but over-subtracted backgrounds of the (almost) unresolved sources could also lead to unrealistically small {A, B}λn < Oλ and underestimated fluxes FPλn and FTλn. The sizes and fluxes of such unresolved or slightly resolved sources are rectified by getsf using the correction factors (Appendix B) derived for an unresolved Gaussian source, assuming that it is the background over-subtraction that makes the source have the sub-resolution sizes {A, B}λn < Oλ. The Gaussian model is used to obtain the correction factors, not the measurements themselves. Unfortunately, similar corrections cannot be derived for the well-resolved sources, nor for the sources with underestimated backgrounds and overestimated sizes and fluxes.
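The difference between the two size estimators can be illustrated with a one-dimensional toy example. The sketch below is not the algorithm of getold or getsf: it assumes noise-free, background-free profiles on a uniform grid, and the three analytic shapes (Gaussian, power law, flat-topped) as well as the function names are chosen here purely for illustration.

```python
import numpy as np

FWHM_PER_SIGMA = 2.0 * np.sqrt(2.0 * np.log(2.0))  # Gaussian FWHM / sigma

def fwhm_from_moments(x, intensity):
    """FWHM inferred from the second intensity moment (exact only for a Gaussian)."""
    total = intensity.sum()
    centroid = (x * intensity).sum() / total
    variance = ((x - centroid) ** 2 * intensity).sum() / total
    return FWHM_PER_SIGMA * np.sqrt(variance)

def fwhm_direct(x, intensity):
    """FWHM measured directly at half maximum, with linear interpolation."""
    half = 0.5 * intensity.max()
    above = np.flatnonzero(intensity >= half)
    lo, hi = above[0], above[-1]
    # Interpolate the half-maximum crossing on each side of the peak.
    x_left = np.interp(half, [intensity[lo - 1], intensity[lo]], [x[lo - 1], x[lo]])
    x_right = np.interp(half, [intensity[hi + 1], intensity[hi]], [x[hi + 1], x[hi]])
    return x_right - x_left

x = np.linspace(-20.0, 20.0, 4001)
profiles = {
    "gaussian": np.exp(-0.5 * x**2),
    "power_law": 1.0 / (1.0 + x**2),   # slowly decaying wings
    "flat_topped": np.exp(-x**4),      # flatter top than a Gaussian
}
# Ratio of the moment-based size to the direct half-maximum size.
ratios = {name: fwhm_from_moments(x, p) / fwhm_direct(x, p)
          for name, p in profiles.items()}
```

In this toy setup the ratio is close to unity only for the Gaussian; it exceeds unity for the power-law profile and falls below unity for the flat-topped one, in line with the trends described above. The power-law value depends strongly on where the profile is truncated (here at |x| = 20), which is itself an indication of how sensitive the moments are to the adopted footprint.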

Figures 6–10 show a behavior that is qualitatively similar for the different variants of Benchmarks A and B. There is an expected general trend that the numbers of acceptable sources in the accuracy plots become lower for the backgrounds with increasing complexity, in the sequence from {A,B}2 to {A,B}3 and to B4. This is caused by the much stronger variations in the immediate surroundings of the sources, especially those located on the densest parts of the background cloud, which strongly reduce the S/N ratio Ωλn of the extracted sources. As a result, some of those sources that were acceptable in the simpler variants of the benchmarks are pushed off the acceptability domain by their lower values Ωλn < 2. In all benchmarks, the measurement errors significantly increase for the faint sources, because their estimated individual backgrounds become more strongly affected by the fluctuations of the filamentary cloud and noise. The resulting over- or underestimation of the backgrounds depends on whether the sources happen to be located on the hollow- or hill-like fluctuation, correspondingly, as well as on the other types of background inaccuracies (cf. Appendices A and B).

4.1.3. Extraction qualities

Figure 11 presents an overview of the extraction qualities of getold and getsf, displaying QCRλ, QPTEλ, and Qλ from Tables C.1 and C.2. The first two qualities conveniently evaluate the extraction methods at their independent detection and measurement steps, whereas the third one combines the two in the overall extraction quality. To facilitate their analysis, the plots display also the global qualities QCR, QPTE, and Q, the geometric mean values over the wavelengths, for each benchmark. All features of the plots in Fig. 11 can be readily understood by comparisons of the tabulated qualities (Tables C.1 and C.2).

Fig. 11.

Overview of the benchmarking results for the source extractions with getold and getsf (Tables C.1 and C.2). The top panels show the detection quality QCRλ, the measurement quality QPTEλ, and the overall quality Qλ from Eq. (6), represented by vertical bars for each wavelength (3 PACS and 3 SPIRE bands, from left to right), with an exception of the fictitious ƛ of the surface density 𝒟{11|13}″. The global qualities of the methods, QCR, QPTE, and Q, defined as the geometric means over the wavelengths, are represented by the wide gray bars. The bottom panels help visualize the ratios of the qualities for the getsf and getold extractions for each benchmark and wavelength.

The source-detection quality QCRλ is the product of the extraction completeness Cλ and reliability Rλ. As expected, the global detection quality QCR of both methods decreases from A2 to B4, toward the more complex benchmarks (Fig. 11), demonstrating better results for getsf in all benchmarks, except A3. In A3, getsf has a 13% lower quality, because of several spurious (very noisy) sources extracted at 110 and 170 μm with very low significance levels, within just a few percent above the cleaning threshold ϖλSj = 5σλSj (Sect. 3.4.2 of Paper I). At {70|75} and {100|110} μm, QCRλ shows lower values, because only the protostellar cores are detectable, whereas at {160|170} μm, some of the starless cores appear as faint detectable sources, hence the quality gets higher. For some benchmarks, QCRλ becomes significantly lower, which usually indicates that more spurious sources were extracted, hence the lower reliability Rλ.

The measurement quality QPTEλ is a product of the respective qualities QPλ, QTλ, and QEλ of peak intensity, integrated flux, and source area. The global measurement quality QPTE for getsf is better by 20% than that for getold, across all benchmarks (Fig. 11). At {70|75} and {100|110} μm, QPTEλ is within 2% of unity, because the protostellar cores are bright, hence they can be accurately measured (cf. Figs. 6–10). The faint starless cores at {160|170} μm are poorly measurable; therefore, QPTEλ becomes lower. The getsf measurement quality is substantially higher at the SPIRE wavelengths, partly because getold systematically underestimates source sizes (Sect. 4.1.2).

The overall quality Qλ is the product of QCRλ and QPTEλ, as well as of the positional quality QDλ and goodness Gλ. In line with the expectations, Qλ for both methods decreases toward the more complex benchmarks (Fig. 11). For the simpler Benchmark A, getsf has a small 10% edge over getold, whereas for Benchmark B, the getsf quality reaches the values higher by a factor of two. The quality evaluation system (Sect. 3) encapsulates all aspects of source extraction; therefore, the plots in Fig. 11, based on Tables C.1 and C.2, justify the conclusion that getsf is superior to getold in both Benchmarks A and B.
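The way the qualities combine can be summarized in a few lines. The sketch below takes the individual factors (completeness, reliability, and so on) as given inputs, because their definitions belong to Sect. 3; the function names are hypothetical, and the geometric mean assumes strictly positive per-wavelength values.

```python
import numpy as np

def detection_quality(completeness, reliability):
    """Q_CR at one wavelength: completeness times reliability."""
    return completeness * reliability

def measurement_quality(q_peak, q_total, q_area):
    """Q_PTE at one wavelength: product of the peak, flux, and area qualities."""
    return q_peak * q_total * q_area

def overall_quality(q_cr, q_pte, q_position, goodness):
    """Q at one wavelength: product of Q_CR, Q_PTE, Q_D, and G."""
    return q_cr * q_pte * q_position * goodness

def global_quality(per_band):
    """Geometric mean of a quality over the wavelengths (values must be > 0)."""
    per_band = np.asarray(per_band, dtype=float)
    return float(np.exp(np.mean(np.log(per_band))))

# Hypothetical per-wavelength values, for illustration only.
q_example = global_quality([0.25, 1.0])
```

Because all combinations are products, a single poor factor (for instance, a low reliability caused by a few spurious sources) lowers the overall quality at that wavelength, and the geometric mean propagates it into the global value without letting one good band compensate for it.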

4.1.4. Dependence on the images used for detection

In the present multiwavelength benchmarking, all seven images were combined in the wavelength-independent images for detecting sources (Sect. 3.4.3 of Paper I). To some extent, however, source extraction results must depend on the images used for source detection. Both getsf and getold combine images and detect sources with almost the same algorithms; therefore, getsf alone may be used to evaluate the dependence of the extraction qualities on the subsets of images. Only the realistic benchmark variants with backgrounds ({A,B}3 and B4) may be used in these tests to keep the amounts of results within reasonable limits.

Figure 12 presents an overview of the overall quality Qλ and its global counterpart Q for source extractions with getsf in A3, B3, and B4 using 6 subsets of images combined for detection. The full set of seven images (PDS) was discussed above (Sects. 4.1.1–4.1.3) and is shown again for completeness. The subset of six images (PS) tests the case when the surface density image 𝒟{11|13}″ is not used. The subset of four images (P3S) examines the absence of two PACS images (at {70|75} and {100|110} μm). The subset of three images (S) clarifies the effects of the source detection with only the SPIRE images. The three single-image subsets (S1, S3, and D) explore the source extractions with the 250 μm image, the 500 μm image, and the surface density image 𝒟{11|13}″, respectively. The subsets with only the PACS images are not considered, because no starless cores appear in the images at λ < 160 μm.

Fig. 12.

Overview of the dependence of the getsf extraction qualities (Tables C.3–C.5) on various subsets of wavelengths (images) used in detecting sources (cf. Sects. 3.4.3 and 3.4.4 of Paper I) in benchmarks A3, B3, and B4. The overall qualities Qλ from Eq. (6) are represented by the green vertical bars for each wavelength (3 PACS and 3 SPIRE bands), with an exception of the fictitious ƛ of the surface density 𝒟{11|13}″. The results for the subsets are annotated on the horizontal axis with the number of images used for detection. In each panel, the getsf extractions are sorted in the order of their decreasing global quality Q, defined as the geometric mean over the wavelengths and represented by the wide gray bars.

The results (Fig. 12) for the seven different cases are sorted from left to right in the order of decreasing global quality Q. In all three benchmarks, the best extraction quality is found in the subset D, when the surface density 𝒟{11|13}″ is the single image used to detect sources. It is obvious from the original images (e.g., Figs. 3–5) that the surface density image must be beneficial for source extractions, because the sources are visible there most clearly. However, this result suggests that the high-resolution 𝒟{11|13}″ may also be used alone to detect sources, with better results than in a combination with the Herschel images. In the benchmarks {A,B}3, the second best global quality Q is shown by the complete set (PDS), when all seven images are used to detect sources. In B4, however, the extraction quality with this subset of images is only the fourth, which was caused by a few more spurious sources extracted at 70 and 160 μm. The spurious peaks are clearly identifiable with the background and noise fluctuations in those images that happened to be slightly brighter than the cleaning threshold ϖλSj = 5σλSj (Sect. 3.4.2 of Paper I). Without the spurious sources, the PDS set would have the second best Q value in all three benchmarks.

When the subset S of only the three SPIRE images is used for source detection, the global quality Q becomes the fourth, the third, and the second best in the benchmarks A3, B3, and B4, correspondingly (Fig. 12). An addition of the PACS 160 μm image to the SPIRE images in P3S leads to the fifth, the fourth, and the third best Q values among all subsets, always just below the global quality for the subset S. The slightly lower (by 5 − 10%) qualities in P3S can be traced to the chance extraction of a few spurious sources at the longest PACS wavelength. Using the subset S1 with the single 250 μm image for source detection makes the global quality in A3 the third best, whereas in B3 and B4 it becomes only the sixth and fifth, respectively. The absence of the high-resolution 𝒟{11|13}″ in the subset PS of the six Herschel images makes the extraction one of the two worst ones. However, the differences between the Q values outside the top three best-quality results are very small, at the levels of a few percent. An exception is the worst extraction for S3, whose quality is well below all others in A3 and B4, because of the lowest angular resolution of the detection image.

Formally taking all the benchmarking results, it is possible to rank the getsf source extraction qualities by summing up their places in the three benchmarks shown in Fig. 12. The two best subsets of the Herschel images to be used for source detection are D (𝒟{11|13}″) and PDS (𝒟{11|13}″ together with all PACS and SPIRE images), and the next good subset is S (SPIRE images at 250 − 500 μm). The three worst subsets appear to be S1 (250 μm image), PS (all Herschel images), and S3 (500 μm image). It must be emphasized that the actual choices in real-life applications depend on the research interests. For example, if the goal is to study the protostellar cores, then the shortest PACS wavelength, where they are the brightest and with the highest resolution, is the best choice for their detection. However, if the aim is to study the starless cores that are the strongest at the SPIRE wavelengths, then the high-resolution surface density 𝒟{11|13}″ (possibly together with the 250 − 500 μm images) is likely the best choice for the source detection with getsf. This is an important decision to make when preparing for source extractions.

4.2. Filament extraction in Benchmark B4

Filaments are separated from both backgrounds ℬλY and sources 𝒮λ and detected as skeletons in their own flattened, wavelength-combined component ℱDjC (Sects. 3.2–3.4 in Paper I). The separation allows the filament crests to be traced more precisely, reducing the interference from the sources that could significantly affect the results. In the standard approach to the multiwavelength benchmarking adopted in this paper, the filament detection image is combined from six wavelengths, excluding the 70 and 100 μm images, because the simulated filament is very faint and noisy at those wavelengths.

Figure 13 shows the skeletons and footprints of eight filaments detected in B4 at the significance level ξ = 4 (Sect. 3.4.5 in Paper I). All but one of them are spurious, the short fluctuations of background that happened to be elongated and slightly denser than the filament detection threshold ϖλFj = 2σλFj (Sect. 3.4.2 in Paper I). The rate of spurious filaments can be reduced using a higher value of the skeleton significance ξ when detecting filaments. Spurious filaments usually have their lengths shorter than their widths, hence they can be discarded from further analysis after a visual inspection. This is not done by getsf automatically, because the real filaments are also often split into relatively short segments by the sources, other intersecting filaments, or background fluctuations.

Fig. 13.

Benchmark B4 extraction of filaments with getsf. The component of filaments ℱƛY is shown in the middle panel for the surface density 𝒟13″, overlaid with the skeleton 𝒦k4 (detection significance ξ = 4) obtained from the combined ℱDjC on the scales Sk ≈ 153″, corresponding to the adopted value Yƛ = 150″. The left panel displays the skeletons and the footprints Υn, with the pixel values equal to the skeleton (filament) number. The right panel shows the detected skeleton of the spiral filament, together with its one-sided normals. The first pixel of the skeleton is at the center, hence the left normals point to the center (red) and the right normals point outward (orange). Square-root color mapping, except the footprints with linear mapping.

The sides of a filament are known to getsf as left (α) or right (β) with respect to the path from the first pixel of the skeleton to its last pixel. The first pixel of the spiral skeleton is in the center, hence the left normals to the skeleton are pointing inside the loops and the right normals are pointing outward (Fig. 13). Although the one-sided footprints and normals touch each other, which indicates overlapping of the two sides, the model filament is not affected by self-blending (Sect. 2.2), hence the orthogonal profiles of each loop must follow the true model profile, unaltered by the blending that would complicate the measurements of the observed filaments. The central loops of the filament are blended, however, with the dense background cloud (Fig. 13), which makes the separated background of the filament less accurate, underestimated in the central area (Fig. 8 in Paper I). The outermost loop of the spiral filament has a more accurate background and is filament-free along the right normals. Therefore, the right-sided measurements of the filament along the outermost loop may be expected to produce more accurate results than those over the inner parts of the spiral filament that have a contribution from the strongly fluctuating background.
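The left and right sides defined with respect to the skeleton path can be illustrated with a small geometric sketch. It is not taken from getsf: the function name is hypothetical, the skeleton is assumed to be an ordered list of points in a right-handed frame (x to the right, y up), and the test spiral is assumed to wind counterclockwise from the center, which may differ from the orientation of the actual benchmark filament.

```python
import numpy as np

def one_sided_normals(skeleton_xy):
    """Unit left and right normals at each point of an ordered skeleton.

    skeleton_xy: array of shape (n_points, 2), ordered from the first to
    the last skeleton point, in a right-handed frame (an assumption here).
    """
    pts = np.asarray(skeleton_xy, dtype=float)
    tangent = np.gradient(pts, axis=0)  # finite-difference direction of travel
    tangent /= np.linalg.norm(tangent, axis=1, keepdims=True)
    # Rotating the tangent by +90 degrees gives the left-hand normal.
    left = np.column_stack([-tangent[:, 1], tangent[:, 0]])
    return left, -left

# Counterclockwise Archimedean spiral starting near the center.
t = np.linspace(0.5, 4.0 * np.pi, 400)
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])
left, right = one_sided_normals(spiral)
# A left normal points inward if it has a negative projection on the radius vector.
inward = np.einsum("ij,ij->i", left, spiral) < 0
```

For this orientation every left normal has a component toward the center and every right normal points away from it, which reproduces the geometry described above for a skeleton whose first pixel lies at the center of the spiral.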

Figure 14 presents the filament radial profiles D{α|β}(r) along the skeleton normals, median-averaged over the filament length. The standard deviations ς{α|β}(r) about the median profiles are computed separately for the positive and negative differences. The true filament profiles correspond to the model surface densities (Eq. (2) in Paper I) and display practically no differences between the filament sides. The slopes γ{α|β}(r) accurately represent the true model values, increasing from γ(r)≈1 at the half-maximum radius of 0.05 pc to γ(r)≈3 (at much larger distances r ≳ 0.3 pc). For the filament extracted with getsf in B4, the radial profiles obtained from the entire filament are less accurate, with significantly larger dispersions (Fig. 14). This is caused by the underestimated fluctuating background in the central area (Fig. 8 in Paper I) that in effect makes a substantial contribution to the background-subtracted filament ℱƛY. For a comparison, the profiles D{α|β}(r) obtained over only the outer filament loop, where its background is more accurate (Fig. 14), much better reproduce the true model surface density distribution, with much smaller dispersions of their values along the segment.
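A median profile with separate positive and negative deviations can be sketched as follows. The exact definition of the deviations used by getsf is not reproduced here; the root-mean-square of the one-sided differences from the median is an assumption made for this illustration, as are the function name and the input layout.

```python
import numpy as np

def median_profile(profiles):
    """Median radial profile and one-sided deviations about it.

    profiles: array of shape (n_normals, n_radii), one row per skeleton normal.
    Returns (median, sigma_pos, sigma_neg), each of length n_radii.
    """
    profiles = np.asarray(profiles, dtype=float)
    median = np.median(profiles, axis=0)
    diff = profiles - median
    pos = np.where(diff > 0, diff, 0.0)
    neg = np.where(diff < 0, diff, 0.0)
    # Avoid division by zero where no value lies on one side of the median.
    n_pos = np.maximum((diff > 0).sum(axis=0), 1)
    n_neg = np.maximum((diff < 0).sum(axis=0), 1)
    sigma_pos = np.sqrt((pos**2).sum(axis=0) / n_pos)
    sigma_neg = np.sqrt((neg**2).sum(axis=0) / n_neg)
    return median, sigma_pos, sigma_neg

# Hypothetical values: three normals sampled at two radii.
med, s_pos, s_neg = median_profile([[1.0, 10.0], [2.0, 10.0], [6.0, 10.0]])
```

Treating the two signs separately keeps an asymmetric scatter visible, for example when a residual background adds a positive excess along some parts of the filament but leaves the lower envelope of the profiles almost unchanged.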

Fig. 14.

Benchmark B4 extraction of filaments with getsf. Plotted are the one-sided (left and right) profiles D{α|β}(r) of the filament surface densities ℱƛY, their slopes γ{α|β}(r), multiplied by 10^21 for convenience, the reference line with a slope of −1, and the Gaussian beam with the half-maximum width OH = 13.5″. The profiles represent the median densities along the skeleton and the vertical bars are their positive and negative deviations ς{α|β}(r) about the median profile. The left panel shows the true profiles of the simulated filament 𝒟F across its crest. The middle panel presents the profiles, measured in the ℱƛY component of 𝒟13″ along the entire detected skeleton length (Fig. 13). The right panel displays the more accurate profiles, measured across the outermost skeleton loop, where the filament is least affected by the inaccuracies of its background.

From the entire filament component ℱƛY, getsf integrated the one-sided masses M{α|β} = {3.48|3.78}×10^3 M⊙, overestimated by {14|24}% with respect to the true model value (Sect. 2.2). The one-sided linear densities Λ{α|β} = {343|306} M⊙ pc−1 (from Eq. (49) of Paper I) are overestimated by {18|6}% with respect to the true model value (Sect. 2.2). The discrepancies are caused by the residual contribution of the incompletely subtracted background, underestimated by up to ∼50% in the center of the filamentary cloud (Fig. 8 in Paper I). The filament footprint covers the entire cloud (Fig. 13), which makes it especially difficult to separate the filament from its blended background.
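As a consistency check of the quoted numbers, the measured one-sided values can be divided by their stated overestimation factors; both sides should then point to roughly the same reference value. The short sketch below does only this arithmetic, taking the masses and linear densities in solar masses and solar masses per parsec. The results are approximate, because the quoted percentages are rounded, and the actual model values are those given in Sect. 2.2.

```python
# Measured one-sided values and their quoted fractional overestimates.
measured_mass = {"alpha": 3.48e3, "beta": 3.78e3}      # solar masses
mass_excess = {"alpha": 0.14, "beta": 0.24}
measured_lin_density = {"alpha": 343.0, "beta": 306.0}  # solar masses per pc
lin_density_excess = {"alpha": 0.18, "beta": 0.06}

# Reference values implied by each side: measured / (1 + excess).
implied_mass = {side: measured_mass[side] / (1.0 + mass_excess[side])
                for side in measured_mass}
implied_lin_density = {
    side: measured_lin_density[side] / (1.0 + lin_density_excess[side])
    for side in measured_lin_density
}

def relative_difference(a, b):
    """Fractional difference between two implied reference values."""
    return abs(a - b) / max(a, b)

mass_mismatch = relative_difference(implied_mass["alpha"], implied_mass["beta"])
lin_mismatch = relative_difference(implied_lin_density["alpha"],
                                   implied_lin_density["beta"])
```

The two sides agree with each other to better than one percent in both quantities, which is the level expected from the rounding of the percentages.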

Benchmark B4 provides a good test for filament extraction methods. The source extraction in B4 with getold, described in Sect. 4.1, executed also getfilaments (Paper II), an integral part of getsources. Although the method passed simpler filament extraction tests (Sect. 3 in Paper II), getfilaments was unable to properly reconstruct the filament in B4. The crest values of the filament were underestimated by a factor of ∼5 in the central area of the dense background cloud, whereas the values were either correct or overestimated within ∼40% in some segments of the outermost loop of the filament. Even though the filament one-sided widths were determined fairly accurately (within 10 − 20%), the distant fainter areas of the filament profile (beyond a radius of 0.1 pc) were completely missing. Therefore, the mass and linear density of the filament were also strongly underestimated (by factors of ∼3).

The dense spiral filament in B4 represents just the simplest benchmark. The filament crest must not create problems for any skeletonization algorithm; its detection is not the main goal of this benchmark. The simulated filament was created primarily to test the accuracy of various methods in measuring the filament profile and physical properties. Observed filaments have various masses, densities, lengths, widths, curvatures, and S/N. The filaments imaged with Herschel are embedded in strongly fluctuating backgrounds and arranged in complex networks with hundreds of interconnected segments. A proper benchmarking would require the simulated images that resemble the observations, as well as a quality evaluation system, similar to that used in this paper for testing the source-extraction methods. Realistic and rigorous benchmarking of filament-extraction methods is the subject of a future work.

5. Discussion

Astronomical images are known to be very dissimilar across the electromagnetic spectrum (e.g., Figs. 16–23 in Paper I). Therefore, the source- or filament-extraction methods, developed for different research areas and types of observed images, have heterogeneous properties and qualities. Benchmarking of the extraction tools must also depend on the research project, and the simulated images must resemble the complexity and structural components of the typical observed images. For example, if the sources of interest are all unresolved and there is no strong fluctuating background in the observed images, then the benchmark images must also contain just the unresolved sources (with a similar spatial distribution) and faint background. In this simple case, it may well be that a simple source-extraction tool employing a PSF-fitting algorithm could give more accurate results than a more general method designed to work for both unresolved and resolved sources on complex filamentary backgrounds.

The benchmarks described and applied in this study were designed to resemble the mid- to far-infrared (submm) imaging observations obtained with Herschel for the nearby star-forming regions. The simulated images contain a bright, fluctuating filamentary background cloud and starless and protostellar cores with a wide range of sizes, from unresolved to strongly resolved. By construction, these benchmark images are most suitable for testing the source- and filament-extraction methods to be applied in the studies of star formation. If observed images are significantly different, the benchmarks explored in this paper may not be directly applicable for testing extraction methods. For example, the substantial differences between the ALMA interferometric images of distant star-forming regions (e.g., Fig. 23 in Paper I) and the Herschel images of the nearby star-forming clouds required creation of dedicated benchmarks with unresolved sources and background from MHD simulations (Pouteau et al., in prep.). To make the benchmark images better resemble the real interferometric observations, they were also processed with the ALMA observations simulator.

For testing the source-extraction methods, it is the model sources that are the most important (primary) component of the benchmarks, and they must resemble the sources in real observations as closely as possible. Similarly, for testing the filament-extraction methods, it is the model filaments that are the main component, with all their parameters tabulated in a truth catalog. The other components of the benchmark images (e.g., fluctuating background, instrumental noise) just complicate the extraction of the primary component. They may be scaled up or down to create variants of the same benchmark with diverse contributions of the secondary components. For example, this paper employed several benchmark variants ({A|B}2, {A|B}3, and B4) of different complexity, expanding the applicability of the two benchmarks to other types of images of the nearby star-forming regions.

When benchmarking source- or filament-extraction methods, it is important to ensure that the simulated images contain realistic enough models of the sources or filaments that are expected to be extracted in the real-life observations. This may be a potential problem, because it requires advance knowledge of the physical reality being observed. In practice, there usually exists a good deal of previous studies that would allow the creation of the suitable primary and secondary components of the benchmarks. However, if an application of the extraction tools to the observed images shows the component properties that are significantly different from the ones simulated for the benchmarks, the latter may need to be adjusted and the testing of the methods to be repeated.

6. Conclusions

This paper described detailed benchmarking of two multiwavelength source and filament extraction methods, getsf and getold, to quantitatively evaluate their performance in Benchmarks A and B. In total, the two methods of source extraction were tested and compared using five variants of the simulated multiwavelength images of different complexity. Although the benchmarks were designed to resemble the Herschel observations of star-forming regions, the images are suitable for evaluating extraction methods for various astronomical projects and applications.

Benchmark B includes the complex fluctuating background cloud, the long dense filament, and the multitude of sources (starless and protostellar cores) with wide ranges of sizes, masses, and intensity profiles, computed with a radiative transfer code. In Benchmark A with similar properties of the structural components (no filaments), the sources are allowed to arbitrarily overlap with each other. The benchmarks enable conclusive comparisons between different methods and allow a quantitative comparison of their qualities, using the formalism given in this paper, in terms of the extraction completeness, reliability, and goodness, as well as the detection and measurement accuracies and the overall quality. All benchmark images, the truth catalogs containing the model parameters, and the reference extraction catalogs produced by the author are available for download on the getsf website (footnote 3).

The quantitative analysis of the benchmark source extractions showed that the getsf method has superior qualities in comparison with getold. The benchmark filament extraction with getsf recovered parameters of the model filament, in contrast to the extraction with getold that was unable to properly reconstruct the filament to an acceptable accuracy. An investigation of the dependence of the source extraction results on different sets of images used to detect sources suggested that the best choice for source detection with getsf is the high-resolution surface density, either alone or together with other Herschel images. The worst choice for source detection would be the lowest-resolution observed images.

The benchmarks explored in this paper are proposed as the standard benchmarks for calibrating existing and future source and filament extraction methods before any astrophysical applications of the methods. It is critically important to use only the best calibrated tools with known properties that are fully understood on the basis of the standard benchmarking. Applications of various uncalibrated extraction tools with unknown qualities that have never been quantitatively compared could lead to a proliferation of incompatible results and severe long-term problems in understanding the astrophysical reality.


Acknowledgments

This study used the cfitsio library (Pence 1999), developed at HEASARC NASA (USA), saoimage ds9 (Joye & Mandel 2003) and wcstools (Mink 2002), developed at the Smithsonian Astrophysical Observatory (USA), and the stilts software (Taylor 2006), developed at Bristol University (UK). The plot utility and ps12d library, used in this work to draw figures directly in the PostScript language, were written by the author using the psplot library (by Kevin E. Kohler), developed at Nova Southeastern University Oceanographic Center (USA), and the plotting subroutines from the MHD code azeus (Ramsey et al. 2012), developed by David Clarke and the author at Saint Mary’s University (Canada). HGBS and HOBYS are the Herschel Key Projects jointly carried out by SPIRE Specialist Astronomy Group 3 (SAG3), scientists of several institutes in the PACS Consortium (e.g., CEA Saclay, INAF-IAPS Rome, LAM/OAMP Marseille), and scientists of the Herschel Science Center (HSC).

References

  1. André, P., Men’shchikov, A., Bontemps, S., et al. 2010, A&A, 518, L102
  2. Berry, D. S. 2015, Astron. Comput., 10, 22
  3. Clark, S. E., Peek, J. E. G., & Putman, M. E. 2014, ApJ, 789, 82
  4. Joye, W. A., & Mandel, E. 2003, in Astronomical Data Analysis Software and Systems XII, eds. H. E. Payne, R. I. Jedrzejewski, & R. N. Hook, ASP Conf. Ser., 295, 489
  5. Juvela, M. 2016, A&A, 593, A58
  6. Kirk, J. M., Ward-Thompson, D., Palmeirim, P., et al. 2013, MNRAS, 432, 1424
  7. Koch, E. W., & Rosolowsky, E. W. 2015, MNRAS, 452, 3435
  8. Men’shchikov, A. 2013, A&A, 560, A63
  9. Men’shchikov, A. 2017, A&A, 607, A64
  10. Men’shchikov, A. 2021, A&A, 649, A89 (Paper I)
  11. Men’shchikov, A., André, P., Didelon, P., et al. 2012, A&A, 542, A81
  12. Mink, D. J. 2002, in Astronomical Data Analysis Software and Systems XI, eds. D. A. Bohlender, D. Durand, & T. H. Handley, ASP Conf. Ser., 281, 169
  13. Molinari, S., Schisano, E., Faustini, F., et al. 2011, A&A, 530, A133
  14. Motte, F., Zavagno, A., Bontemps, S., et al. 2010, A&A, 518, L77
  15. Pence, W. 1999, in Astronomical Data Analysis Software and Systems VIII, eds. D. M. Mehringer, R. L. Plante, & D. A. Roberts, ASP Conf. Ser., 172, 487
  16. Ramsey, J. P., Clarke, D. A., & Men’shchikov, A. B. 2012, ApJS, 199, 13
  17. Schisano, E., Rygl, K. L. J., Molinari, S., et al. 2014, ApJ, 791, 27
  18. Sousbie, T. 2011, MNRAS, 414, 350
  19. Taylor, M. B. 2006, in Astronomical Data Analysis Software and Systems XV, eds. C. Gabriel, C. Arviset, D. Ponz, & S. Enrique, ASP Conf. Ser., 351, 666

Appendix A: Fluctuating backgrounds and the measurement accuracy for faint sources

Exact shapes of the molecular clouds under faint sources are practically impossible to separate from the observed emission peaks with any acceptable accuracy. The observed backgrounds of sources fluctuate on all spatial scales. Instrumental noise further complicates the source backgrounds by adding random fluctuations on scales of the angular resolution Oλ. The background and noise fluctuations are totally blended with the sources and no source extraction method is able to precisely deblend the components. This makes the measured sizes and fluxes of faint sources uncertain, often significantly over- or underestimated, depending on the unknown shapes of the fluctuations within the source footprints. Naturally, the background inaccuracies become relatively less important for increasingly stronger sources.

Figure A.1 illustrates the problem using a simple Gaussian source 𝒢 with a FWHM size of 10″ and several differently shaped backgrounds (flat and hill- or hollow-like). To simplify matters, the source may be considered as unresolved, although extended sources are also affected by the same problem. In the simplest (unrealistic) case, the source could be observed against a constant background (ℬ1 = 1). The fluctuating backgrounds were modeled by adding positive or negative 15″ (FWHM) Gaussians with peak values of 0.5 and 0.9 to the flat background. For simplicity, the source position is assumed to be aligned with the background extrema, which is sufficient to illustrate the roots of the problem.

Fig. A.1.

Fluctuating backgrounds and measurements of faint sources. The model Gaussian source 𝒢 with a size of 10″ (FWHM) and a peak intensity of 1 corresponds to the flat background ℬ1 = 1 (red lines). The same source 𝒢 is also added to the nonuniform (hill- and hollow-like) backgrounds ℬ1.9, ℬ1.5, ℬ0.5, and ℬ0.1 (blue and green lines). The fluctuating backgrounds were obtained from the flat background ℬ1 by adding or subtracting the Gaussians with a size of 15″ (FWHM) and the peak values of 0.5 and 0.9. Extraction methods would not be able to recognize that the real backgrounds are hill- or hollow-like, hence they would instead subtract the flat backgrounds, based on the intensities just outside the apparent source footprints. Therefore, the source 𝒢 would be extracted with over- or underestimated FWHM sizes A, peak intensities FP, and total fluxes FT (the middle, left, and right columns of numbers, respectively).

The flat background would normally present no difficulties for accurate source measurements. However, the strongly fluctuating backgrounds pose severe problems for source extraction methods. Measurements of the faint sources could be quite different from the true values, depending on the sign and magnitude of the background fluctuation within their footprints (Fig. A.1). When the source is blended with a hill-like background, its shape remains very similar to a Gaussian source and contains no information that the background is not flat. The source footprint usually widens and the hill-like background contributes to the overestimated width and fluxes of the source. On the other hand, when the source is blended with a hollow-like background, its apparent footprint shrinks to the area limited by the intensity minimum that appears around the peak. As a consequence, the sizes and fluxes of such sources become underestimated, sometimes quite strongly (Fig. A.1).
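The hollow-like case can be illustrated with a minimal numerical sketch. It assumes the one-dimensional radial profiles of the simple model above (a unit-peak 10″ Gaussian on a flat background of 1, from which a 15″ Gaussian is subtracted), and it takes the flat background level to be the intensity minimum around the peak. That choice is an assumption made here for illustration only; it is not the actual algorithm of getsf or getold. The function name `measure_on_hollow` and its parameters are hypothetical.

```python
import math

K = 4.0 * math.log(2.0)  # converts a FWHM into the Gaussian exponent


def gaussian(r, fwhm, peak=1.0):
    """Radial Gaussian profile with the given FWHM and peak value."""
    return peak * math.exp(-K * (r / fwhm) ** 2)


def measure_on_hollow(depth, src_fwhm=10.0, bg_fwhm=15.0, flat=1.0,
                      r_max=30.0, step=0.01):
    """Measure a unit-peak Gaussian source sitting in a hollow-like background.

    Assumption (illustrative only): the subtracted flat background equals
    the intensity minimum around the peak, i.e. the edge of the apparent
    footprint. Returns the measured (peak, fwhm) after that subtraction.
    """
    n = int(round(r_max / step)) + 1
    radii = [i * step for i in range(n)]
    # Source plus flat background minus the hollow of the given depth.
    profile = [gaussian(r, src_fwhm) + flat - gaussian(r, bg_fwhm, depth)
               for r in radii]
    # Footprint edge: radius of the minimum intensity around the peak.
    i_min = min(range(n), key=profile.__getitem__)
    level = profile[i_min]
    peak = profile[0] - level
    # First radius at which the profile drops to the half-maximum level.
    half = level + 0.5 * peak
    i_half = next(i for i in range(i_min + 1) if profile[i] <= half)
    return peak, 2.0 * radii[i_half]


if __name__ == "__main__":
    for depth in (0.5, 0.9):
        peak, fwhm = measure_on_hollow(depth)
        print(f"hollow depth {depth}: peak = {peak:.3f}, FWHM = {fwhm:.2f} arcsec")
```

In this sketch both hollow depths give a measured peak below 1 and a width below 10″, and the deeper hollow is underestimated more strongly. The exact numbers depend on how the footprint edge is defined and need not coincide with those shown in Fig. A.1.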

It is clear that the backgrounds of sources in the benchmark simulations and real observations are much more complex than the above simple model. However, the model illustrates the fundamental reasons behind the increasingly larger inaccuracies for the sources with low S/N in Benchmarks A and B (Figs. 6–10). In general, measurement accuracy for such sources is impossible to improve, because the necessary information is practically lost when the source peak is blended with the background and noise fluctuations. Fortunately, the unresolved or slightly resolved sources are the exception, for which it is possible to (approximately) correct the underestimated sizes and fluxes.

Appendix B: Corrections for the measurements of unresolved or slightly resolved sources

The PSFs (the telescope beams) set a natural lower limit to the source sizes {A, B}λn: their values must be larger than or at least equal to the angular resolution Oλ. However, the benchmarking discussed in this paper has revealed numerous examples of sources with sizes {A, B}λn < Oλ and underestimated peak intensities FPλn and integrated fluxes FTλn. An analysis of the results showed that the underestimated parameters are related to the overestimated backgrounds of the faint sources. When the FWHM sizes are directly measured at half-maximum intensities, as in getsf (cf. Sect. 3.4.6 of Paper I), the measurements can be improved, as shown below. However, such corrections are not feasible for getold, because the sizes obtained with intensity moments often correspond to uncertain levels, significantly deviating from the half-maximum intensity.

Figure B.1 illustrates the Gaussian model, adopted by getsf to correct the underestimated sizes and fluxes of faint sources, when their measured sizes are smaller than the beam size. The model assumes that the unresolved or slightly resolved faint sources have Gaussian shapes, which is an appropriate assumption, because most telescopes have Gaussian beams in their central (upper) parts. Various deviations and artifacts that often appear in the PSFs at larger angular distances from their peak are invisible for the faint sources. The model also supposes that the actual source background is flat, which is the only reasonable assumption that could be made. Although there are many possible shapes of the fluctuating background within a source footprint, they cannot be accurately recovered from the blended source intensity distribution.

Fig. B.1.

Approximate corrections for unresolved or slightly resolved sources with overestimated backgrounds. The model Gaussian source 𝒢 with a size of 10″ (FWHM) and a peak intensity of 1 is superposed on a flat background with a constant intensity of ℬ0.6 = 0.6 (upper red lines). The increasingly overestimated backgrounds ℬ0.7, ℬ0.8, ℬ0.9, ℬ1.0, and ℬ1.1 are shown with the dotted lines of different color. When such backgrounds are subtracted (lower colored curves), the source FWHM sizes A, peak intensities FP, and total fluxes FT become increasingly underestimated. The measured A values (given in the plot) get progressively smaller than the angular resolution of 10″, which clearly indicates an increasing inaccuracy of the background. Requiring that a source cannot be narrower than the telescope beam, it is possible to improve the measurements, substantially reducing their errors.

With the above two assumptions, Fig. B.1 demonstrates how the measured properties of a Gaussian source 𝒢 would be affected by the increasingly overestimated backgrounds ℬ0.7, ℬ0.8, ℬ0.9, ℬ1.0, and ℬ1.1. In the simplest case of an intrinsically flat background, the background could be progressively overestimated for stronger instrumental noise, which would effectively represent a fluctuating background of the source. For the blended sources or those in crowded areas, absence of the source-free pixels in their immediate environments is often the reason for the background to be overestimated. Whatever the actual cause, an over-subtraction of the increasingly inaccurate background leads to the progressively underestimated FWHM sizes, peak intensities, and total fluxes. For the Gaussian 𝒢 in Fig. B.1, it is possible to determine the correction factors that would recover the true properties of the source from their underestimated values.
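For the idealized model, these relations can also be written in closed form, which the following sketch uses. It assumes a Gaussian source of true peak P and a true FWHM equal to the beam O, on a flat background that is over-subtracted by δP. The measured half-maximum width is then O[1 − log2(1 + δ)]^{1/2} and the measured peak is P(1 − δ). Inverting these expressions gives size and peak factors that are exact for this model only. They are not the empirical approximations of Eq. (B.1) implemented in getsf, and the function names are hypothetical. The total flux is left out because it depends on the adopted footprint.

```python
import math


def measured_fwhm(true_fwhm, excess):
    """FWHM of a Gaussian after over-subtracting its flat background.

    `excess` is the over-subtracted level in units of the true peak
    (0 <= excess < 1). The half maximum of the subtracted profile sits
    where the Gaussian equals (1 + excess) / 2.
    """
    return true_fwhm * math.sqrt(1.0 - math.log2(1.0 + excess))


def gaussian_corrections(measured, beam):
    """Size and peak factors for a source measured narrower than the beam.

    Assumes (illustration only) an unresolved Gaussian source on a flat
    background, so that the true FWHM equals `beam`.
    Returns (size_factor, peak_factor); both are 1 if measured >= beam.
    """
    if measured <= 0.0:
        raise ValueError("measured size must be positive")
    x = measured / beam
    if x >= 1.0:
        return 1.0, 1.0
    # Invert x**2 = 1 - log2(1 + excess) for the over-subtracted level.
    excess = 2.0 ** (1.0 - x * x) - 1.0
    return 1.0 / x, 1.0 / (1.0 - excess)


if __name__ == "__main__":
    beam = 10.0
    # True background 0.6; overestimated backgrounds 0.7 ... 1.1 as in Fig. B.1.
    for background in (0.7, 0.8, 0.9, 1.0, 1.1):
        excess = background - 0.6
        width = measured_fwhm(beam, excess)
        f_size, f_peak = gaussian_corrections(width, beam)
        print(f"B = {background:.1f}: measured A = {width:.2f}, "
              f"size factor = {f_size:.3f}, peak factor = {f_peak:.3f}")
```

For slightly resolved sources the true size exceeds the beam, so setting the corrected size equal to the beam remains a lower limit and the factors are only approximate.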

The multiplicative correction factors fSλn, fPλn, and fTλn for the sizes {A, B}λn, peak intensity FPλn, and total flux FTλn, respectively, are obtained empirically by approximating the results for the Gaussian model (Fig. B.1) using different overestimated backgrounds,

(B.1)

where the factors differ from unity only when (AλnBλn)^{1/2} < Oλ. They are applied when creating the final catalog at the end of the measurement iterations. The factors are implemented in getsf, hence the benchmark extraction catalogs discussed in this paper contain improved measurements for the faint unresolved or slightly resolved sources.

By their definition, the factors from Eq. (B.1) provide precise results for only Gaussian sources on flat backgrounds. In most cases, however, the real backgrounds of sources have more complex shapes, in which case the formulas from Eq. (B.1) provide less accurate corrections to the measured quantities. Despite being approximate, the corrections are nevertheless very useful, because they significantly improve the measurements. For example, the hollow-like backgrounds ℬ0.5 and ℬ0.1 of a Gaussian source 𝒢 from Fig. A.1 lead to substantially overestimated derived backgrounds. The corrections fSλn, fPλn, and fTλn, obtained for the 𝒢 + ℬ0.5 model improve the measurements by the factors of 1.18, 1.28, and 2.21, whereas the improvement factors for the 𝒢 + ℬ0.1 model are 1.42, 1.69, and 4.42, respectively.

Appendix C: Tabulated qualities of source extractions in Benchmarks A and B

Tables C.1 and C.2 collect all qualities of the source extractions with getsf and getold in the benchmark variants {A,B}2, {A,B}3, and B4, discussed in Sects. 4.1.1–4.1.3. Tables C.3–C.5 present the qualities of source extractions with getsf in the benchmarks {A,B}3 and B4 for different subsets of images used to detect sources, discussed in Sect. 4.1.4.

Table C.1.

Benchmarks {A,B}2 and {A,B}3 with getsf and getold in the standard approach, with all seven images combined for source detection. The extraction qualities, defined in Eqs. (2)–(6), are evaluated for only acceptable sources, cf. Eq. (1), with errors in measurements within a factor of 2^{1/2}. The numbers of model sources are NT = 459 in Benchmark A and NT = 919 in Benchmark B. Source measurements in the image of derived surface densities are known to be inaccurate (e.g., Appendix A of Paper I), hence the data are not presented.

Table C.2.

Benchmark B4 with getsf and getold in the standard approach, with all seven images combined for source detection. The extraction qualities, defined in Eqs. (2)–(6), are evaluated for only acceptable sources, cf. Eq. (1), with errors in measurements within a factor of 2^{1/2}. The number of model sources is NT = 919. Source measurements in the image of derived surface densities are known to be inaccurate (e.g., Appendix A of Paper I), hence the data are not presented.

Table C.3.

Benchmark A3 with getsf using different subsets of images for the combination over wavelengths and detection. The qualities, defined in Eqs. (2)–(6), are evaluated for only acceptable sources, cf. Eq. (1), with errors in measurements within a factor of 2^{1/2}. The number of model sources is NT = 459. Source measurements in the image of derived surface densities 𝒟11″ (at a fictitious wavelength ƛ = 175 μm) are known to be inaccurate (e.g., Appendix A of Paper I), hence the data are not presented. The extractions are sorted, from top to bottom, by their global qualities Q of 0.117, 0.102, 0.092, 0.083, 0.082, 0.078, and 0.022.

Table C.4.

Benchmark B3 with getsf using different subsets of images for the combination over wavelengths and detection. The qualities, defined in Eqs. (2)–(6), are evaluated for only acceptable sources, cf. Eq. (1), with errors in measurements within a factor of 2^{1/2}. The number of model sources is NT = 919. Source measurements in the image of derived surface densities 𝒟13″ (at a fictitious wavelength ƛ = 165 μm) are known to be inaccurate (e.g., Appendix A of Paper I), hence the data are not presented. The extractions are sorted, from top to bottom, by their global qualities Q of 0.042, 0.030, 0.030, 0.028, 0.027, 0.027, and 0.025.

Table C.5.

Benchmark B4 with getsf using different subsets of images for the combination over wavelengths and detection. The qualities, defined in Eqs. (2)–(6), are evaluated for only acceptable sources, cf. Eq. (1), with errors in measurements within a factor of 2^{1/2}. The number of model sources is NT = 919. Source measurements in the image of derived surface densities 𝒟13″ (at a fictitious wavelength ƛ = 165 μm) are known to be inaccurate (e.g., Appendix A of Paper I), hence the data are not presented. The extractions are sorted, from top to bottom, by their global qualities Q of 0.011, 0.0096, 0.0089, 0.0084, 0.0080, 0.0078, and 0.0055.
