Open Access
A&A, Volume 671, March 2023
Article Number A101
Number of pages: 28
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/202245041
Published online: 14 March 2023

© The Authors 2023

Licence: Creative Commons. Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


1 Introduction

The European Space Agency’s Euclid mission (Laureijs et al. 2011, the mission Red Book), due to start operations in 2023, is designed to provide accurate photometric, spectroscopic, and morphological data (in particular cosmic shear and clustering distributions) for billions of galaxies across 15 000 deg² of sky, using them as tracers to study the properties of the dark components of the Universe. To this end, a processing pipeline is being assembled by the Science Ground Segment, the team in charge of releasing the data to the community. This pipeline must be able to ingest, process, and analyse the raw imaging data from the satellite on a daily basis; optical data from external ground-based instruments (Rubin/LSST, DECam, CFHT, Pan-STARRS, OmegaCAM, Subaru; see Euclid Collaboration 2022b) will also be used to complement the optical and near-infrared images obtained by the two satellite photometers, VIS (observing in IE, a broad optical band; see Cropper et al. 2016) and NISP-P (observing in the three near-infrared, NIR, bands YE, JE, and HE; Maciaszek et al. 2016; Euclid Collaboration 2022c), allowing for high-quality photometric redshift estimates. The final step of the image analysis pipeline will produce a global catalogue containing all the astrometric, photometric, and morphological information about each source detected in the IE images (plus an additional sample of NIR-detected sources). This catalogue will then be exploited for scientific use by the Euclid Collaboration, and it will also be released to the community for legacy use.

The pipeline currently implements two photometric techniques (aperture and template fitting, performed with Euclid-specific versions of two public software tools, a-phot and t-phot, respectively; see Merlin et al. 2015, 2016, 2019), and a module to estimate the so-called concentration, asymmetry, smoothness (CAS) morphological parameters (non-parametric morphological features that can be used to distinguish between discs, ellipticals, compact, diffuse, symmetric and asymmetric, or clumpy objects by means of a dimensional reduction; see Conselice 2003). However, the pipeline is also foreseen to include a profile model-fitting algorithm. The Euclid Morphology Challenge (EMC) was organised with the aim of analysing and comparing the performance of various model-fitting software tools on Euclid data, in order to establish the foundations for choosing the tools that will be integrated into the official processing pipeline. The final choice will be driven by many factors, including computational performance, robustness of the algorithm, and compatibility with the current version of the pipeline; however, the accuracy of the parameter estimates will of course be the main driver. Therefore, assessing the performance of the different software packages on simulated data, for which the ground truth is known, is a necessary and fundamental step for a sound selection. Eight development teams of model-fitting software packages were invited to participate in the challenge, and five provided at least partial results.

In this paper we present the dataset created for the EMC, and we discuss the results concerning photometry. Although photometry was not the central focus of the challenge, flux measurements obtained via model-fitting techniques will be highly relevant, providing a crucial complement to the more straightforward methods already included in the pipeline. A companion paper is dedicated to the analysis of the morphological estimates (Euclid Collaboration 2023).

This paper is structured as follows. Section 2 describes the technique used to create the simulated dataset for the challenge, with some technical details given in Appendix A. In Sect. 3 we briefly present the software tools being used in the challenge, and in Sect. 4 we describe the methods used to analyse and rank the data provided by the participants. The results are then presented in Sect. 5, where we investigate the general accuracy of the photometric measurements, as well as the reliability of the estimated uncertainty budgets, with a further focus on each software package’s performance given in Appendix B. Finally, Sect. 6 presents a summary of the work and provides conclusions. All magnitudes are given in the AB system.

2 Simulating the Euclid universe

Simulated datasets are being produced and used by the Euclid Science Ground Segment to test the full processing pipeline, from image reduction to data analysis. These simulations consist of raw single exposures, including observational features and defects, and they must be processed and stacked to reach the nominal depth and be ready for scientific analysis, with background light and defects removed. To simplify this complex procedure, and to keep all details under control, for the EMC we decided to produce a tailored dataset, directly simulating background-subtracted images at the expected nominal depths of the final stacked mosaics in all bands. With this approach, we were also free to try different options, producing simulations with single- and double-component analytical profiles, as well as with realistic morphologies. In this section we explain the procedure we followed to obtain all these simulated datasets.

2.1 Catalogue and image creation

We started by creating mock cosmological catalogues with the code Egg (v1.3.1, Schreiber et al. 2017). Egg uses the statistical distributions of real galaxies as detected and classified in the five CANDELS fields (Grogin et al. 2011; Koekemoer et al. 2011) to build a simulated catalogue of a patch of the sky, complete with the properties of the objects as observed by a chosen set of pass-band filters, with a chosen pixel resolution, and to a chosen limiting magnitude. We refer the reader to the paper describing the code for a detailed description of its workflow; here we provide a short summary. Egg draws redshifts and stellar masses from observed galaxy stellar mass (M*) functions, and subsequently attributes a star-formation rate (SFR) to each galaxy from the observed SFR–M* main sequence; dust attenuation, optical colours and simple disc-plus-bulge morphologies are obtained from empirical relations established from the high-quality Hubble and Herschel observations of the CANDELS fields. Random scatter is introduced in each step to reproduce the observed distributions of each parameter. Finally, based on these observables, a suitable panchromatic spectral energy distribution (SED) is selected for each galaxy and synthetic photometry is produced by integrating the redshifted SED over the chosen broad-band filters. The galaxies are created as two-component objects, with a bulge and a disc both described by a Sérsic (1968) profile,

I(r) = Ie exp{−bn [(r/re)^(1/n) − 1]} , (1)

where Ie is the surface brightness at the effective (half-light) radius re, bn is a normalisation constant that depends on the index n, and the Sérsic indices for the bulge and disc components are nbulge = 4 and ndisc = 1. The output catalogue contains the physical and observed properties of the galaxies within a field of view (FoV) corresponding to the chosen area; the objects are placed at random positions with a fixed angular two-point correlation function, neglecting large-scale clustering beyond 3′ (i.e. beyond ~1 Mpc at z > 0.5).
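For reference, the normalisation bn in Eq. (1) can be evaluated numerically; a minimal sketch (purely illustrative, not part of the EMC pipeline) also recovers the disc conversion factor of 1.678 used below:

```python
from scipy.special import gammaincinv

def sersic_bn(n):
    """Solve P(2n, b_n) = 1/2 (regularised incomplete gamma function),
    so that the effective radius r_e encloses half of the total light."""
    return float(gammaincinv(2.0 * n, 0.5))

print(round(sersic_bn(1.0), 3))  # 1.678: n=1 disc, r_e = 1.678 x scale length
print(round(sersic_bn(4.0), 3))  # 7.669: n=4 de Vaucouleurs bulge
```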

We created five catalogues, each one covering an area of 0.482 deg² (41′.66 per side, comparable to the typical area over which each single photometric catalogue will be extracted from real data), with limiting magnitude IE = 27.1 (the nominal 1σ limit as given in Laureijs et al. 2011). The total number of simulated galaxies was about 1.5 million. We then used an Egg built-in script to obtain all the observational properties of the sources. In particular, for each galaxy the following parameters are given: position of the centroid in pixels; total flux in the simulated band; bulge-to-total flux ratio (b/t); scale length of the bulge and of the disc (defined as the radius at which the component is a factor of e less bright than it is at its centre); axis ratio for both components; and position angle for both components. For the first of the five fields (F0), we produced nine lists, to include a full multi-wavelength realisation of a Euclid sky patch: one for each of the four Euclid bands IE, YE, JE, and HE, plus five for the Rubin/LSST bands u, g, r, i, and z. The filter transmission curves are shown in Fig. 1. For the other four fields (F1–4) we only produced the IE list, since the main purpose of these simulations is the morphological analysis, which with real data will mostly be performed on the IE images, given that it will be the band with the highest resolution and depth. We point out that in the multi-band realisation the morphological parameters do not change across the spectrum, while total fluxes and b/t do; this information was not explicitly shared with the participants.

We fed these catalogues to GalSim (Rowe et al. 2015, v2.2.1), a Python package that produces simulated astronomical images (it is also used for the official Euclid simulations). We created the images at their expected native pixel scale: 0″.1 for IE (25 000 × 25 000 pixels); 0″.3 for the NIR bands; and 0″.2 for the LSST bands. After the procedure described in the following paragraphs, we resampled all the images to the IE pixel scale, since this is the procedure that will be followed in the real pipeline. To simulate the effects of point spread functions (PSFs), we provided GalSim with the Euclid Mission Database models for the IE and NIR bands (as provided by the corresponding Euclid working groups; both of them are over-sampled by a factor of 6), while for LSST we provided custom simulated PSFs created using PhoSim (Peterson et al. 2015), at the expected observed pixel scale of 0″.2 (with no over-sampling). The approximate FWHMs of these PSFs are 0″.17 (IE), 0″.54 (NIR), and 1″.00–1″.12 (LSST).
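For illustration, the core of this kind of GalSim workflow is sketched below; all numerical values are placeholders rather than the actual EMC configuration, and a Gaussian stands in for the mission PSF models:

```python
import galsim

pixel_scale = 0.1  # arcsec/pixel, the IE native scale

# Placeholder two-component galaxy: fluxes in muJy, sizes in arcsec
bulge = galsim.Sersic(n=4.0, half_light_radius=0.3, flux=2.0)
bulge = bulge.shear(q=0.8, beta=40.0 * galsim.degrees)  # axis ratio, position angle
disc = galsim.Sersic(n=1.0, half_light_radius=0.8, flux=8.0)
disc = disc.shear(q=0.5, beta=40.0 * galsim.degrees)
galaxy = bulge + disc

psf = galsim.Gaussian(fwhm=0.17)  # stand-in for the over-sampled PSF models
final = galsim.Convolve([galaxy, psf])

# Noiseless postage stamp in muJy/pixel, to be added into the full mosaic
stamp = final.drawImage(nx=128, ny=128, scale=pixel_scale)
```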

We produced noiseless galaxy profiles with GalSim, with pixel values in μJy pixel−1. We simulated three different sets of images, all having identical coordinates and total fluxes for the galaxies, which we describe in the following; see also Euclid Collaboration (2023) for further details.

  • Double Sérsic (DS) profiles, directly using the standard output of Egg, which consists of a catalogue formatted to be used with SkyMaker (Bertin 2009). In particular, this means that the dimensions of the objects are given as scale lengths, whereas GalSim requires half-light radii; the two values coincide for the bulges, while the conversion factor is 1.678 for the discs (see e.g. Graham & Driver 2005), so we applied this correction before simulating the images. Also, GalSim requires the fluxes of the two components to be given separately, while Egg outputs a total magnitude and a b/t; therefore, to assign a flux to each component we simply used the relations fbulge = b/t ftot and fdisc = (1 − b/t) ftot, where ftot = 10^(−0.4(m−ZP)), m is the magnitude of the source as given in the Egg catalogue, and ZP is the zero-point of the image (see Sect. 2.1.1).

  • Single Sérsic (SS) profiles, in which galaxies are modelled with a single Sérsic index, defined using the b/t values from the Egg catalogue as ntotal = (1 − b/t) ndisc + b/t nbulge = 3 b/t + 1. To compute the single effective radius re,tot from the two values given in the two-component catalogue, we used the following formula, calibrated empirically to obtain a good visual match between the two realisations: re,tot = [b/t re,b]^α + [(1 − b/t) re,d]^α, where α = 0.8 if re,b < re,d (98% of the cases), and α = 2.0 otherwise (these conversions are implemented in the sketch after this list). Finally, position angles and axis ratios already had the same values for bulges and discs, so we simply kept them unchanged.

  • Realistic morphologies (RM), in which galaxy stamps are created by means of a neural network using a variational autoencoder trained on observed COSMOS galaxies, as described in full detail in Lanusse et al. (2021) and Euclid Collaboration (2022a); each simulated galaxy mimics the properties of its corresponding analytical realisation. In this dataset, the biggest and brightest objects (re > 0″.2, IE < 20.5) are not simulated because of technical limitations; the list of excluded sources was provided to the participants, and accounts for approximately 1% of the total simulated galaxies. Also, the position angles of the galaxies are not constrained to be close to those of the Egg catalogue (and therefore they were not considered in the final analysis of the results). We point out that this is the first time that such a demanding test has been performed: the codes must provide an analytical fit to non-analytical shapes for which a ground-truth value is known. This is inherently a very challenging task. Moreover, the method used to create the images is not perfect. The conditioning of the latent space with galaxy morphology is not always exact, and this can introduce a systematic bias with respect to the input values (see the discussion in Euclid Collaboration 2022a); although the consistency is fully guaranteed in a statistical sense, some level of scatter remains on an object-by-object basis, meaning that the comparison with the input catalogue must be taken with caution. For more details, see Euclid Collaboration (2023).
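As referenced in the first two items above, the catalogue-level conversions (scale length to half-light radius, flux splitting, and the DS-to-SS mapping) can be condensed into a short sketch that implements the relations exactly as quoted in the text; the function names are ours, and radii are implicitly in the catalogue units:

```python
def egg_to_galsim_ds(m_tot, bt, h_disc, re_bulge, zp):
    """Egg/SkyMaker double-Sersic parameters -> GalSim inputs (DS item above)."""
    f_tot = 10.0 ** (-0.4 * (m_tot - zp))       # total flux in image units
    f_bulge, f_disc = bt * f_tot, (1.0 - bt) * f_tot
    re_disc = 1.678 * h_disc                    # disc scale length -> half-light radius
    return f_bulge, re_bulge, f_disc, re_disc

def ds_to_ss(bt, re_bulge, re_disc):
    """Double-Sersic -> single-Sersic mapping (SS item above)."""
    n_tot = 3.0 * bt + 1.0                      # (1 - b/t) n_disc + b/t n_bulge
    alpha = 0.8 if re_bulge < re_disc else 2.0  # empirical calibration
    re_tot = (bt * re_bulge) ** alpha + ((1.0 - bt) * re_disc) ** alpha
    return n_tot, re_tot
```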

Figure 2 shows the redshift distribution as a function of IE input magnitude. Egg outputs the redshift of each simulated galaxy, and although this information is not explicitly used in the present work, it might nevertheless be useful to have an idea of the global distribution; indeed, most of the analysis and figures will be presented as a function of the input IE magnitude, which correlates with redshift. For example, looking at the plot one can see that galaxies at z = 3 typically begin to be detectable at IE = 23. Figure 3 shows the distribution of simulated galaxies in the magnitude-size-b/t space for the same field (note how there is a non-negligible fraction of bulge-only objects with b/t = 1). Finally, in Fig. 4 we show the distributions of various input parameters for the IE band components and realisations (magnitude, effective radius, axis ratio, Sérsic index, bulge-to-total ratio and redshift), for the full dataset (the five fields), down to the nominal 5σ limit IE = 25.35 (Laureijs et al. 2011); we also show the COSMOS distributions (Mandelbaum et al. 2012) for reference. As expected, simulations and observations agree remarkably well, with the exception of the b/t distribution, which is more skewed towards disc galaxies in the Egg catalogue. This might have some impact on the analysis of the results (see Sect. 5).

By construction, we could not simulate irregulars, which are estimated to constitute less than 10% of the galaxies at z < 1, but up to 70% at z = 3 (e.g. Huertas-Company et al. 2015, although recent preliminary results from the James Webb Space Telescope seem to indicate a lower number). This is an obvious but unavoidable limitation of this work. The RM realisation can provide a hint about how model-fitting codes can deal with non-analytical shapes.

We then added a field of stars, to include the effects of their presence as contaminants in the fitting procedures. To obtain a realistic distribution, we took their celestial coordinates (converted to pixel positions) from one FoV of the official Euclid simulations used for Scientific Challenges1, and simply placed PSF stamps at the positions of the sources, scaling their fluxes to match the catalogue magnitudes. We excluded very bright stars (IE < 15), to avoid large regions of the simulations being affected by their presence, and also because, given the limited extension of the PSF stamps, they would saturate and create artificial defects on the images. The fraction of pixels significantly contaminated by stellar light (that is, where the surface brightness from stellar light exceeds the 1σ surface brightness per pixel) is approximately 1% in the IE images, 5% in the NIR bands, and from 1% to 7% in the LSST bands (u and z, respectively). In IE, approximately 2.5% (1.5%) of the 5σ galaxies are contaminated by stars brighter than 10% (50%) of their flux.
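Schematically, each star was added along these lines (a hypothetical helper for illustration, operating on NumPy arrays; sub-pixel shifts and image-boundary handling are omitted):

```python
def add_star(image, psf_stamp, x, y, mag, zp):
    """Paste a unit-normalised PSF stamp, scaled to the star's catalogue flux."""
    flux = 10.0 ** (-0.4 * (mag - zp))
    stamp = psf_stamp / psf_stamp.sum() * flux
    ny, nx = stamp.shape
    x0, y0 = int(round(x)) - nx // 2, int(round(y)) - ny // 2
    image[y0:y0 + ny, x0:x0 + nx] += stamp
```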

Fig. 1

Filter transmission curves used to simulate the images in this work (normalised to arbitrary units). The four filters IE, YE, JE, and HE form the Euclid set, while u, g, r, i, and z are external complementary filters (from LSST/VRO). The NIR curves are early estimates of the actual transmission functions, which are described in their latest updated version in Euclid Collaboration (2022c); however, since we only use these data to compute the input fluxes, the differences are not relevant for the present work.

Fig. 2

Distribution of redshifts as a function of IE magnitude (input values) in F0. The orange line is the running mean, the vertical dashed line the 5σ limit.

Fig. 3

Distribution of bulge-to-total ratios as a function of IE magnitude in the input catalogue of F0; the colours encode the global scale length, defined here as r = b/t re,bulge + (1 − b/t) re,disc.

Fig. 4

Distributions of various input parameters for the IE band, in all five simulated fields, for all the separate components and realisations in the challenge, as described in the legend and in the labels of the panels. For comparison, we also show the corresponding parameter distribution in COSMOS.

2.1.1 Observational noise

Once the seed images containing the sources were produced, we proceeded to add simulated observational noise. First of all, we replaced the smooth analytical profiles with stochastic realisations drawn from a Poissonian distribution, to simulate the effects of photon shot noise; we used the Python module scipy.stats.poisson.rvs for this purpose. We paid particular attention to keeping the units of the images consistent throughout the whole process: we first converted the noiseless images from μJy/pixel to observational units, using the correct image observational ZP at 1 s, to obtain consistent Poissonian realisations, which depend on the total exposure time. Only after this step did we convert the images back to μJy. To calculate the ZPs we followed the procedure described in Euclid Collaboration (2019), which we summarise in Appendix A.1. The same method was also used to produce an empty sky map containing only Gaussian noise, simulating the observational background at the desired depth. Since the images were simulated with zero background light, the Gaussian noise must have zero mean, and the standard deviation of the pixel values defines the depth of the final simulated image.
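The essence of this procedure can be sketched as follows; zp_1s denotes the 1 s zero point of Appendix A.1, the constant 23.9 is the AB magnitude of a 1 μJy source, and the function name and structure are illustrative only:

```python
import numpy as np
from scipy.stats import poisson

def add_observational_noise(img_mujy, zp_1s, t_exp, sigma_sky_mujy, seed=1):
    """Poisson shot noise plus zero-mean Gaussian sky, keeping units consistent."""
    # muJy -> expected counts over the total exposure, via the 1 s zero point
    counts_per_mujy = 10.0 ** (0.4 * (zp_1s - 23.9))
    mu = np.clip(img_mujy * counts_per_mujy * t_exp, 0.0, None)
    shot = poisson.rvs(mu, random_state=seed).astype(float)  # stochastic realisation
    img = shot / (counts_per_mujy * t_exp)                    # back to muJy
    # Zero-mean Gaussian background; sigma_sky_mujy sets the final depth
    sky = np.random.default_rng(seed + 1).normal(0.0, sigma_sky_mujy, img_mujy.shape)
    return img + sky
```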

The values used in this procedure are summarised in Table 1. The exposure times used in our simulations for the IE and NIR bands were taken from Laureijs et al. (2011), and it is worth pointing out that the actual in-flight values will be slightly different (4 × 560 s for IE and 4 × 88 s for all NIR bands). The exposure times for the LSST bands were estimated from simulated data created for one of the internal validation Scientific Challenges, and are representative of an early release (the final LSST data, after 10 yr of observations, will of course be much deeper). The limiting magnitudes and background surface brightness values, which are needed in the computations, were set to be consistent with the expected values for the Euclid Wide Survey, and were taken from a dedicated study by J. C. Cuillandre (priv. comm.); these too are slightly different from the current best estimates, which can be found in Euclid Collaboration (2022b). These small inconsistencies are due to the fact that the present work began before the latest estimates had been made available; however, they have a negligible impact on the scientific results of the EMC. It is worth pointing out that the images were produced with homogeneous noise levels, that is to say, we did not simulate regions of different depths. Finally, we summed each image containing the Poissonian realisations of the galaxies and stars with the corresponding ‘empty’ Gaussian sky noise.

Table 1

Parameters used to simulate the images.

2.1.2 Rebinning

As previously mentioned, we simulated all the images at their native pixel scales (0″.1 for IE, 0″.3 for NIR, and 0″.2 for LSST), and we then rebinned the F0 NIR and LSST images to the IE pixel scale using Swarp (Bertin et al. 2002). This is consistent with the real pipeline workflow, with some differences: in the pipeline, a newer version of the software named Swarp++ is used, and single-exposure images are combined to create the mosaics, allowing information to be gained in the process. We also rebinned the PSFs accordingly. All the rebinning processes were performed using the BILINEAR interpolation mode. We note that this resampling procedure introduces artefacts in the noise map (in particular, pixel correlations) that alter the apparent signal-to-noise ratio (S/N) of the map, so the actual uncertainties of the measurements must be computed using a dedicated RMS map, which we discuss next.
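As a sketch, a single-image rebinning of this kind can be scripted as follows (the file names are hypothetical, while the flags are standard Swarp configuration parameters):

```python
import subprocess

subprocess.run([
    "swarp", "nir_native.fits",
    "-IMAGEOUT_NAME", "nir_rebinned.fits",
    "-RESAMPLING_TYPE", "BILINEAR",   # interpolation mode used for the EMC
    "-PIXELSCALE_TYPE", "MANUAL",
    "-PIXEL_SCALE", "0.1",            # target IE pixel scale in arcsec
    "-SUBTRACT_BACK", "N",            # images are already background-subtracted
], check=True)
```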

2.2 RMS maps

It is common practice to assign uncertainties to the measurements performed on scientific images by means of a weight or an RMS map, which is often obtained from first principles during the data reduction chain. When this is not possible, it can easily be determined, at least to a first approximation, by measuring the RMS of the pixel values in ‘empty’ regions of the (non-rebinned) science frame, although such a measurement only provides information on the noise due to the unresolved sky background. For the sake of the EMC goals, we wanted to factor out any possible source of complication, and therefore ready-to-use RMS maps were provided to the participants, along with the scientific images. This is again consistent with the pipeline architecture. The procedure to build the RMS maps is described in Appendix A.2.

The RMS maps were produced at the native pixel scales of the scientific images, and we checked that the pre-resampling S/Ns are consistent with the expected values. This is shown in Fig. 5, where we plot the S/N estimated for each simulated source and check that it is equal to the expected value at the limiting magnitude (star symbols); for this test we used a-phot, forcing the measurements within 2″ apertures (as per the definition of S/N adopted in this work) at the true input positions of the sources. It is important to note that some distributions of points overlap (the three NIR bands have the same expected depth, and so do two of the LSST bands). The overall agreement with the expected values is very accurate. Finally, we proceeded to resample the maps along with the scientific images, again using Swarp, checking that the S/N values of the resampled images are correct when the RMS maps are used to estimate uncertainties.
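A minimal version of this check (the actual measurements used a-phot) sums the signal and quadratically sums the RMS map within a circular aperture; note that this simple error propagation is only valid before resampling, while the pixel noise is still uncorrelated:

```python
import numpy as np

def aperture_snr(sci, rms, x0, y0, diameter_pix):
    """Forced-aperture S/N at a known position: sum(signal) / sqrt(sum(rms**2))."""
    yy, xx = np.indices(sci.shape)
    inside = (xx - x0) ** 2 + (yy - y0) ** 2 <= (diameter_pix / 2.0) ** 2
    return sci[inside].sum() / np.sqrt((rms[inside] ** 2).sum())

# e.g. in IE: a 2 arcsec aperture at 0.1 arcsec/pixel spans 20 pixels
```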

Fig. 5

Signal-to-noise ratios of the simulated images at the native pixel scales, measured with forced photometry in 2″ diameter apertures at the input positions of the sources. Star symbols show the expected values, which are also reported in the legend.

2.3 Challenge setup

The scientific and RMS images were finally uploaded to a private online repository for the participants to download, together with lists containing the IDs and input positions of the centroids of the simulated objects down to various nominal S/N levels (100, 10, 5, and 1σ). This was done to factor out possible inaccuracies in object detection and deblending, so that the challenge could actually be focused on the accuracy of fitting photometry and morphology, without adding any further possible source of error. We point out that to obtain the lists we simply applied different cuts to the input IE magnitudes, so they must be considered coarse reference levels rather than accurate estimates of detection significance.
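Since the lists are pure magnitude cuts relative to the 1σ limit of IE = 27.1, the nominal levels translate into magnitude thresholds as follows, reproducing the 5σ depth of IE = 25.35 quoted in Sect. 2.1:

```python
import numpy as np

m_1sigma = 27.1  # nominal 1 sigma limiting magnitude in IE
for level in (100, 10, 5, 1):
    print(f"{level:3d} sigma -> IE < {m_1sigma - 2.5 * np.log10(level):.2f}")
# 100 sigma -> 22.10; 10 sigma -> 24.60; 5 sigma -> 25.35; 1 sigma -> 27.10
```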

As a visual example, Fig. 6 shows a small crop of the DS F0 images in all nine bands, while Fig. 7 shows a small crop of the DS, SS, and RM F0 IE images. A sample of the ground-truth input values of the simulated galaxies was also provided to the participants (in particular, a small portion of the multi-band F0 dataset and the whole F4 IE dataset), allowing them to check that their procedures were reasonably correct and free of evident errors.

The output requested from each participant consisted of the estimates of (i) flux, (ii) Sérsic index (in the SS and RM realisations), (iii) half-light semi-major axis, (iv) axis ratio, and (v) position angle, each with a corresponding 1σ uncertainty, for each component of the simulated objects. In particular, while the SS and RM realisations only required a single fit with free n, for the DS realisation we asked for two estimates: one fit with fixed indices (ndisc = 1 and nbulge = 4, consistent with the way the images were simulated), and another with ndisc = 1 and nbulge left free to vary.

As mentioned, the output estimates were required for the objects belonging to the list including only S/N > 5 (in IE) sources. The analysis of objects at lower S/N was explicitly mentioned as an optional output that would not influence the final comparison among the software packages.

3 Model-fitting software packages

Eight development teams of model-fitting software packages were invited to participate in the challenge; of these, five (DeepLeGATo, Galapagos-2, Morfometryka, ProFit, and SourceXtractor++) provided at least partial results. All but one are based on parametric methods, using functional forms to fit the observed light distributions, the exception being DeepLeGATo, which exploits a convolutional neural network (CNN).

Here we briefly summarise the basic properties and features of these five software packages, and point out a few important details about the procedures each one followed. It is instructive and important to note how various subtleties in the interpretation of the requests, in the usage of the provided input datasets, and in the processing methods used by each participant led to differences in the format and accuracy of the provided outputs.

3.1 DeepLeGATo

DeepLeGATo (Tuccillo et al. 2018) is a software package for estimating galaxy structure based on a supervised deep-learning approach. The code uses CNNs to perform a simple regression between an image centred on a given galaxy and its structural parameters, providing results in very short times (see Appendix B.1). In the version used in this work, the training was performed with images of fixed size (128 × 128 pixels), independently of the galaxy effective radius; this likely caused sub-optimal performance on the largest and brightest galaxies. The images used for training were not the ones provided with the EMC dataset; instead, they were idealised analytic one- or two-component Sérsic profiles, convolved with the PSF of the corresponding band, to which noise with properties similar to that in the EMC data was added. A different training was performed for each structural parameter with slightly different architectures, which are variations of standard CNNs (see Tuccillo et al. 2018, for more details). The fits were performed at the positions of the sources as given in the input files.

Because the loss function used for training is a standard mean square error, only point estimates were provided; therefore, in the implementation used for this work, the uncertainty budget (i.e. the uncertainties on the estimates) was not computed. Notably, the 5σ input source lists were subdivided into S/N bins using the other input catalogues (10 and 100σ), and different parameters were used for the fits in different bins; this can be seen in the distributions of the points in the plots (see Appendix B). While this makes the version of the software used in the challenge not directly suitable for implementation in the Euclid pipeline, because the S/N of real sources cannot be known a priori, it is worth pointing out that the code remains under development, and more recent releases work without the need for this fine-tuning of the parameters for different input data. Only the IE SS and DS fits were provided.

Fig. 6

Small region of the DS F0 realisation in the nine bands; all are shown with the same colour scale.

Fig. 7

Small crop of the three realisations of IE images. Left to right: DS, SS, RM. Notice that in RM the orientations of the galaxies are not the same as in the other two realisations.

3.2 Galapagos-2

Galapagos-2 (Häußler et al. 2022)2 is an updated and enhanced version of Galapagos (Barden et al. 2012). It provides a wrapper around either Galfit (Peng et al. 2002, 2010) for single-band fits, or GalfitM (Häußler et al. 2013) for either single- or multi-band fits, and it is specifically designed to carry out fully automated fitting of all objects in a large survey. Starting from the input images and a simple setup file, it employs SExtractor (Bertin & Arnouts 1996) for object detection, and then uses this information to automatically set up the fits. The postage-stamp size used for each fit/object depends on the estimated size of the object, as set up by the user. Using some limited input from the setup, for example enlargement factors to conservatively increase the size and shape of the object estimated by SExtractor, Galapagos-2 takes care of neighbouring objects (deblending and fitting of bright nearby objects, masking of fainter and more distant objects), estimates the sky background level with a sophisticated and robust scheme (see Barden et al. 2012 for details), and sets up the fits using SExtractor values as initial guesses, running the fitting algorithm GalfitM on all objects, starting from the brightest and using the PSF provided. Once an object has been fitted, it is simply subtracted in the fits of nearby objects. This significantly speeds up the process overall, as these bright, large objects take the longest to fit, but this only needs to be done once. In a fully automated pipeline, Galapagos-2 then reads out the results and provides one final catalogue, which contains the fitting information for all bands. GalfitM itself uses a Levenberg–Marquardt minimisation to derive the best-fit parameters and uncertainties. In the multi-band realisations of the EMC, all bands were connected via physically reasonable polynomials and fitted simultaneously, to reduce the degrees of freedom of the fit and make full use of the multi-wavelength information. The software requires one additional input image compared to what was provided, namely a weight image to flag bad pixels; since there were no bad pixels in the data, this was trivially created as a uniform image of the correct size.

All the requested outputs were provided; however, for DS F0 only a simultaneous fit was run, therefore including IE in the multi-band fitting process; in other words, there is no isolated fit for IE DS F0. This caused the IE fit for F0 to be substantially different from that of the other fields; for this reason, it was decided not to include F0 in the analysis of the IE-only results for all codes (see below).

3.3 Morfometryka

Morfometryka (Ferrari et al. 2015), written in Python, was primarily designed to measure non-parametric morphological quantities, but as a bonus it performs single Sérsic model fitting. The software takes as input a galaxy stamp (plus the PSF model), estimates the background with an iterative algorithm, segments the sources, and defines the target. It then filters out external sources using the code GalClean (de Albernaz Ferreira & Ferrari 2018). From the segmented region it calculates basic geometrical parameters (e.g. centroid, position angle, axis ratio) using light-profile moments. It then performs photometry, measuring fluxes within ellipses with the aforementioned parameters (contextually masking out point sources over the ellipse annulus with a sigma-clipping criterion). From the luminosity growth curve it establishes the Petrosian radius, inside which all the measurements are made. The Sérsic fit is performed on the 1D luminosity profile; for robustness, the 1D outputs are used as inputs for a 2D Sérsic fit of the galaxy pixels. Finally, it measures several morphometric parameters, for example concentrations, asymmetries, Gini, M20 (the former is a coefficient quantifying the inequality among values of a frequency distribution, in this case of pixel values; the latter is the second-order moment, that is to say the flux values weighted by their square distance to the centre, of the 20% brightest pixels; see Lotz et al. 2004), entropy, spirality, and curvature, among others. In a forthcoming version, the luminosity profile curvature (Lucatelli & Ferrari 2019) will be used to provide a more robust input to a parametric model-based fit of the light profile, eventually replacing the 1D Sérsic fit as a metric, mainly to mitigate a long-standing problem of Sérsic index determination (see discussion in Euclid Collaboration 2023). Only the IE SS fit was provided.
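For reference, these two quantities are defined by Lotz et al. (2004) as

\begin{align}
G &= \frac{1}{\bar{X}\,n(n-1)} \sum_{i=1}^{n} (2i - n - 1)\,X_i\,, \\
M_{20} &= \log_{10}\!\left(\frac{\sum_i M_i}{M_{\rm tot}}\right), \quad \text{while } \sum_i f_i < 0.2\,f_{\rm tot}\,,
\end{align}

where the Xi are the n pixel values sorted in increasing order, X̄ is their mean, Mi = fi[(xi − xc)² + (yi − yc)²] is the second-order moment of pixel i about the galaxy centre (the sum running over the brightest pixels), and Mtot is the sum of Mi over all pixels.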

3.4 ProFit

ProFit (Robotham et al. 2017) is a software package designed to perform Bayesian two-dimensional photometric galaxy profile modelling. It consists of a low-level C++ library accessible via a command-line interface and documented API, along with high-level R (v3.6.1) and Python interfaces. The fitting process for each object starts by running the source finder ProFound (Robotham et al. 2018)3 on a 500 × 500 pixel cutout centred on each target, to create a segmentation map and find nearby sources requiring simultaneous modelling; the output also provides a reasonable initial guess for the profile solution. The actual fitting is then performed by the Highlander core software4, which combines a genetic algorithm step with a CHARM (Turchin 1971; Smith 1984) Markov chain Monte Carlo (MCMC) process, repeated twice; each one is run for 100 steps (where the number of model realisations also scales with the number of free parameters). The CHARM algorithm is particularly useful for highly covariant parameter searches, but it is computationally expensive, because a single iteration requires sampling all parameters. Since the kind of fitting used for ProFit involves relatively few parameters, but posteriors that are sometimes highly covariant, CHARM has proven to be a powerful exploration tool. The provided solution is the combination of parameters that maximises the likelihood given the per-pixel data − model residuals. The parameter priors are implicitly assumed to be uniform. Errors are estimated from the final MCMC run, with full covariance matrix information available.

All requested data were provided. Partially building upon the effort put in the EMC, the whole ProFit pipeline has recently been developed into a new package, ProFuse (Robotham et al. 2022).

3.5 SourceXtractor++

SourceXtractor++ (Bertin et al. 2020; Kümmel et al. 2020)5 is a ground-up re-write of the widely used SExtractor2 software (Bertin & Arnouts 1996), written in C++ with a strong focus on extensibility and model-fitting photometry; the software is under active development and the results submitted to the challenge represent a snapshot in this process (the version of SourceXtractor++ used in the EMC is 0.12).

Each SourceXtractor++ run includes two stages, detection and measurement. The detection stage follows the same procedure as used in SExtractor2. Detection parameters need to be optimised as a compromise between the completeness of the true object list and the number of spurious objects extracted or deblended; over-extraction of sources increases the run-time required, and may also reduce the accuracy of morphological measurements if objects are over-deblended. The parameters for the EMC were tuned aiming for good overall performance, and therefore not for reaching 100% completeness of the input 5σ source list. The SS and DS simulations produce slightly different distributions in apparent extent and surface brightness for the galaxy images, and so the parameters governing detection were optimised separately for the different simulations.

Measurement in one or several bands is controlled via a Python configuration file with flexible model fitting at its core, which allows for simultaneous source analysis over a large number of FITS files with different pixel grids. Various components (point source, exponential disc, free Sérsic, etc.) can be used individually or in combination; reasonable priors must therefore be provided to the fitting engine in order to cover the range of parameter values and provide sensible fits. The priors chosen for the EMC are described in Appendix B.5.2.

Notably, the SourceXtractor++ pipeline used for the EMC includes a pre-processing of the images, namely the extraction and usage of PSFs from the images, performed using the PSFEx software (Bertin 2011), and a re-downsampling to the original pixel scales of the NIR and LSST images (0″.3 and 0″.2, respectively). This procedure was allowed by the guidelines of the EMC, given that no additional input data were used. However, given that no other participant did anything similar, we decided to also check the performance of the package on standard, non-pre-processed data, finding overall good agreement, with a few differences that should be taken into account when considering the overall processing cost of the pipeline; we discuss this in Appendix B.5 (see also Euclid Collaboration 2023). Additionally, the SourceXtractor++ priors were obtained by comparing the output distribution of a given morphological parameter with the equivalent distribution for the provided samples of the input true catalogues; the priors were iteratively adjusted under the constraint that a simple analytical transfer function is required to map each distribution to a Gaussian. Each parameter was calibrated independently, without including covariances; only the statistical distributions were used (i.e. there was no object-by-object comparison in the process). A detailed description of the calibration of the priors is given in Appendix B.5.2. All requested data were provided, except for the multi-band DS fit with free nbulge.
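For intuition, this kind of transfer-function check can be emulated with a simple quantile mapping; the sketch below is an empirical stand-in for the analytical transfer functions actually used, not the SourceXtractor++ implementation:

```python
import numpy as np
from scipy.stats import norm
from scipy.interpolate import interp1d

def gaussianising_map(samples):
    """Monotonic map sending the empirical distribution of `samples` to N(0, 1)."""
    s = np.sort(np.asarray(samples, dtype=float))
    q = (np.arange(s.size) + 0.5) / s.size  # plotting positions in (0, 1)
    return interp1d(s, norm.ppf(q), bounds_error=False,
                    fill_value=(norm.ppf(q[0]), norm.ppf(q[-1])))

# If the map for a fitted parameter is close to a simple analytical form,
# the corresponding prior can be considered well calibrated.
```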

4 Diagnostic metric

Given the high dimensionality of the output data, a straightforward comparison of the results was not feasible. In order to obtain a reasonably comprehensive overview of the quality of the performance, we defined an ad hoc metric. The participants provided catalogues that were matched to the input ones by means of the unique ID of each source. Then, for each run, we proceeded to estimate the difference between the input and the measured fluxes of each object, and computed averaged statistical diagnostics.

Importantly, to compute such statistics we used a subset of sources from the 5σ-limit list, including only those for which all software packages provided a meaningful fit. Recall that the default request was to provide a fit for all the sources in the nominal 5σ list; with some minor caveats and exceptions mentioned in Sect. 3, all participants complied with this request, and some also provided results for lower S/N sources. However, not all the fits were successful (i.e. some were given as NaN or default values in the output catalogues), and some were flagged as “bad” or “unreliable” in one or more codes. In general, these sources were not included in the lists used to evaluate the accuracy of the results, which therefore only included the objects for which all the software packages had provided a reliable fit. An important exception is the case of Galapagos-2, which outputs several quality flags describing which component of a galaxy can be considered reliably fitted. In particular, for the DS runs a total of five flags were provided: USE_FLAG_SS indicates that the single Sérsic fit is usable, as it did not run into fitting constraints; USE_FLAG_BULGE_CONSTR and USE_FLAG_DISK_CONSTR serve the same purpose for the bulge and disc components, respectively; in addition, USE_FLAG_BULGE_BRIGHT and USE_FLAG_DISK_BRIGHT indicate whether the bulge and the disc are relatively bright enough (b/t > 0.2 and b/t < 0.8, respectively) that their fit can in general be trusted, with the additional difficulty that b/t itself is defined via such a fit. However, all of these flags were ignored in our analysis, to avoid excessive complications in the definition of the common set of fitted sources within the submissions. While this choice certainly impacts the statistics of the results, because galaxies are taken into account that are known to violate fitting constraints (they are 4% and 13% of the total number of objects, for single- and double-component fits, respectively), we found that the effects on the overall analysis were marginal. We stress that a general user of Galapagos-2 should consider these flags, according to their purposes; a thorough description of the flags is provided in Häußler et al. (2022). See the Appendix of Euclid Collaboration (2023) for a comprehensive discussion of this topic.

To account for the impact of missing fits for each software tool, an additional term evaluating the completeness fraction of the output catalogues with respect to the full 5σ list was included in the global metric, as described below.

To build the metric, we started by considering the relative flux difference of each object with respect to the input true flux, i.e. δf = (fmeas − ftrue)/ftrue. We then used δf to evaluate three diagnostics: the bias ℬ; the dispersion 𝒟; and the outlier fraction 𝒪. In summary, the three diagnostics were first averaged over the sources belonging to bins of input magnitude (we used 15 bins to divide the full interval of simulated magnitudes, from 14 to 28), to quantify the impact of S/N; then these averages (normalised with weighting factors) were summed, and further combined with the completeness 𝒞 diagnostic, finally yielding a global score 𝒮 for each field and realisation. For IE, the values computed for each simulated field were finally averaged to obtain a single figure; while this is not strictly correct from a statistical point of view, given that the results in the different fields were very similar, we assume that the outcome is sufficiently accurate. In more detail, the four quantities were defined as follows.

  • The bias ℬbin is the median of δf in each bin of input magnitude, computed considering only the objects having |δf| ≤ 5σtrue, where σtrue is the standard deviation of δf for an ideal distribution of fluxes, which we obtained by perturbing the input true values with a random realisation of observational Gaussian noise consistent with the expected depth of each image (we imposed a minimum value corresponding to δf = 0.02, i.e. 2%, to avoid unrealistically small values of σtrue at the bright end). An unbiased measurement would yield ℬbin = 0. We then define the average value of the bias as the weighted mean of its values across the magnitude bins, ℬ = Σbins wbin|ℬbin|, where wbin is a weighting factor given by the fraction of objects in each bin of true magnitude (to give more weight to highly populated bins) multiplied by the logarithm of the median S/N in that bin (to give more weight to the fit of bright objects). We note that while the values of ℬbin can be positive or negative, ℬ is defined to be positive.

  • The dispersion 𝒟bin is the ratio between σmeas (that is to say, the standard deviation of the distribution of δf, again only including objects within 5σtrue) and σtrue, in each magnitude bin. The average dispersion is defined as 𝒟 = Σbins wbin𝒟bin.

  • The outlier fraction 𝒪bin is the number of objects having |δf| > 5σtrue divided by the total number of fitted objects in each bin. These objects fall outside the expected distribution, and we assume that their large bias is due to some systematic error in their measurement (e.g. strong contamination from neighbours, or catastrophic failure of the fit). Therefore they were not included in the statistics of ‘well-behaved’ sources, and were instead isolated into a separate diagnostic. The average outlier fraction is defined as 𝒪 = Σbins wbin𝒪bin.

  • The completeness 𝒞 is simply the number of objects for which a successful fit was provided, divided by the total number of objects in the input list of S/N > 5 sources (we do not weight this quantity by the magnitude bins). A galaxy is considered not to be fitted if there is no entry in the provided output catalogue, or if the challenge participant flagged that galaxy as a ‘bad fit’ (see the discussion above). Each software package has different ways of identifying unreliable fits, and we refer the reader to the publications describing each code for additional information. Here, we simply trusted the participants’ verdict on the reliability of their fits.

We point out that the definitions used in Euclid Collaboration (2023) are very similar, but since it is difficult to construct meaningful expectations for ideal perturbed distributions of morphological parameters (corresponding to the σtrue we use here for the fluxes), some differences were introduced. The interested reader should therefore pay attention to these details.

Finally, the global diagnostic 𝒮 for each run is defined as

𝒮 = (1/𝒞) [kℬ ℬ + k𝒟 (𝒟 − 1) + k𝒪 𝒪] , (2)

where we subtract 1 from 𝒟 when computing the final global statistics, because when the dispersion is ‘ideal’ the ratio with σtrue is 1, and we want the value of all diagnostics to be close to zero for ideal fits. In this expression, the k factors are multiplicative constants assigned to each of the three diagnostics in an attempt to reasonably weight their relative contributions.

We chose kℬ = 20.0, k𝒟 = 0.6667, and k𝒪 = 5.0. While these choices are to some extent arbitrary, it is worth pointing out that the large differences in these values do not reflect the actual weight given to each diagnostic; on the contrary, they were chosen precisely to reach a reasonable balance between the three weighted quantities. We argue that a fit might be defined as ‘optimal’ if it has, for example, 𝒞 = 1.0 (100% completeness above 5σ), ℬ < 0.015 (1.5% median bias), 𝒟 < 1.33 (dispersion no larger than 4/3 of that from the perturbed true fluxes), and 𝒪 < 0.1 (10% of outliers) in all bins of magnitude; for these exact values, applying the chosen weights one gets 𝒮 ≃ 0.3 + 0.22 + 0.5 = 1.02 (the bin weights wbin are not relevant here). So, we see that if 𝒮 ≲ 1 the fit can be considered optimal; 𝒮 ≤ 1.33 is very good; 1.33 < 𝒮 ≤ 1.67 is good; and 1.67 < 𝒮 ≤ 2.0 is acceptable. Finally, with this metric, values of 𝒮 much larger than 2 indicate a bad overall fit. We note that, when marginalising the contributions for a 100% complete fit, 𝒮 = 2.0 can be due to: a 10% overall offset; a dispersion four times σtrue; or a 40% outlier fraction.
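For concreteness, the full score can be assembled as in the following sketch. Choices not fully specified in the text are assumptions here: a base-10 logarithm for the S/N weight, the per-bin σtrue taken as the median of the per-object values, and bin weights normalised to unit sum:

```python
import numpy as np

K_B, K_D, K_O = 20.0, 0.6667, 5.0  # weighting constants chosen in the text

def global_score(delta_f, mag_true, snr, sigma_true, n_input, bin_edges):
    """S = [K_B*B + K_D*(D - 1) + K_O*O] / C for one field and realisation."""
    which = np.digitize(mag_true, bin_edges)
    w, b, d, o = [], [], [], []
    for k in range(1, len(bin_edges)):
        sel = which == k
        if not sel.any():
            continue
        df, st = delta_f[sel], np.median(sigma_true[sel])
        good = np.abs(df) <= 5.0 * st              # non-outliers
        w.append(sel.mean() * np.log10(np.median(snr[sel])))
        b.append(abs(np.median(df[good])))         # |B_bin|
        d.append(np.std(df[good]) / st)            # D_bin
        o.append(1.0 - good.mean())                # O_bin
    w = np.asarray(w) / np.sum(w)                  # normalised weights (assumption)
    B, D, O = np.sum(w * b), np.sum(w * d), np.sum(w * o)
    C = delta_f.size / n_input                     # completeness vs. the 5 sigma list
    return (K_B * B + K_D * (D - 1.0) + K_O * O) / C
```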

The diagnostics were evaluated automatically by means of Python scripts, but the results were also visualised graphically, to allow for sanity checks and a quick grasp of any particular features. Figure 8 shows an example of a diagnostic plot that we used to analyse one of the provided output catalogues; similar plots were created for the outputs for each field and realisation that each participant provided. Each dot is a single fitted galaxy, and its δf is plotted against its true input magnitude in the considered band. The dots are colour-coded by the true bulge-to-total ratio (which for the SS and RM realisations is a proxy for the Sérsic index, n = 3 b/t + 1). For each bin of magnitude, the median, standard deviation, and outlier fraction of the distribution were computed, and the values were then used to compute the diagnostics described above; the dotted lines show the 1σtrue and 5σtrue levels. Specific examples are given in Appendix B.

Because of a technical problem unrelated to the performance of the code, which went unnoticed during the run, the processing of F1 by DeepLeGATo was interrupted before the end of the input list, and the corresponding catalogue was therefore incomplete. To ease the comparisons, and considering that the problem was not caused by a bug in the code (since the processing of all the other fields ended smoothly), we decided to remove F1 entirely from the analysis, after checking that this would not favour one of the codes with respect to the others.

Finally, as already mentioned, the Galapagos-2 runs on DS F0 were performed as a simultaneous multi-band fit, including IE with the other bands; this caused the results to be significantly different from those in the other four IE fields. To avoid any impact on the evaluation, and considering the many different approaches of the other participants (DeepLeGATo only provided the IE fits, Morfometryka did not provide DS fits, ProFit only provided non-simultaneous fits, and SourceXtractor++ provided both a simultaneous and a separate fit), we decided to remove F0 from the analysis of IE DS.

In summary, IE SS and RM were analysed by averaging F0, F2, F3, and F4 (resulting in 212 000 objects for SS, and 204 229 for RM, where the bright galaxies were not simulated); IE DS by averaging F2, F3 and F4 (207 064 objects); and the other bands on DS F0 alone (because it was the only field simulated with a multi-band dataset; it contains 70 700 objects).

Fig. 8

Example of a diagnostic plot used to analyse each output catalogue provided by the participants in the EMC. Each dot shows the δf value of a fitted galaxy, colour-coded by the b/t value from the original Egg catalogue (colour bar on the right), as a function of the input true magnitude in the considered band. The black solid line is the running median of the distribution, which should be identically zero for a perfect fit. The dotted lines show the (positive and negative) 1σtrue and 5σtrue levels, used to compute the diagnostics described in the text and reported in the top left corner of the plot (see Sect. 4).

Fig. 9

Visual summary of the 𝒮-values listed in Table 2.

5 Results

In this section we discuss the results obtained with the different software packages. First of all, it is worth pointing out again that the complexity of the challenge caused a significant scatter in the interpretation of its goals and spirit by the participants. This led to substantial differences in the adopted approaches, levels of processing, and output formats. Together with the high dimensionality of the dataset, this makes a direct and comprehensive comparison of the results very challenging. In other words, the strategies and techniques adopted by the participants influenced the overall accuracy of the provided output, and this must be taken into account in the analysis, to ensure a fair overview of each code’s capabilities and limitations. Nevertheless, we believe it is possible to draw some interesting general conclusions from the comparison. We describe the overall outcomes in the following, with some particular cases discussed in more detail where necessary.

Individual diagnostic plots for all the different runs are available in an online interactive tool6. Further discussion of the results provided by each participating team is given in Appendix B, together with a summary of the computational times and memory workload required by each software package.

5.1 Global outcome

In the following, we separately analyse the three realisations SS, DS, and RM. We separate the multi-band dataset from the IE-only DS fits, since the results are significantly different (we identify the multi-band case with the addition of the letter ‘M’ to the acronym DS whenever necessary). The values of the global metric 𝒮 for each realisation are listed in Table 2 (where the values of the completeness factor 𝒞 are also reported, to allow for a better evaluation of its impact), and shown in Fig. 9. A visual summary of the values of each diagnostic used to compute 𝒮 (i.e. ℬ, 𝒟 and 𝒪), averaged over the magnitude bins and fields, is given in Fig. 10. For the multi-band case, these values are the average of the ones obtained in each of the nine individual bands. We note that in these plots the values are shown before the weighting by the k factors in Eq. (2), so the relative difference between the values obtained by any two codes for the three diagnostics does not straightforwardly reflect their final difference in 𝒮.

Because the global diagnostic quantities only provide a crude overview of the results, being averages over the input parameter space obtained with arbitrary weights, in Sect. 5.2 we also present a collection of summary plots (Figs. 11–14), showing the trends of the diagnostics as a function of the input magnitudes in all the cases of interest.

We begin by emphasising that each code proved to have its strengths and weaknesses, so the comparison of the global score 𝒮 and of its factors is only intended as a quick overview, and should by no means be taken as a rigorous evaluation and ranking of the software packages. We fully acknowledge that this is a simplified view, and therefore alternative metrics, tailored to specific science cases or assigning different weights to the considered diagnostics, could lead to different conclusions. That said, we can claim that with our metric all software packages provided acceptable to good results in at least some of the realisations. Some differences are present in a few cases (e.g. a particular realisation or band, the faint end of the simulated distribution of galaxies, etc.), but the outputs provided are typically fairly accurate. Unsurprisingly, in the DS realisation the best results were obtained in the IE runs for most of the software packages, given the band’s high resolution and depth. The multi-band dataset proved to be more demanding, because of the lower S/N and resolution of the images, and also because of the resampling procedure, which introduced noise correlations. All the participants reached at least 95% global completeness in the IE DS dataset; in the other realisations, some lower scores were obtained (see the numbers in parentheses in Table 2).

In the IE-only runs, the best global results for all codes were obtained on the SS realisation, with all software packages reaching values of 𝒮 between 1.0 and 1.7, with the exception of Morfometryka (𝒮 = 2.54), penalised by a significant bias ℬ caused by a systematic underestimation of the fluxes of bright bulge-dominated galaxies, and of all objects at faint magnitudes (see the discussions in Sect. 5.2 and in Appendix B.3). On the DS realisation of the IE band, SourceXtractor++ and DeepLeGATo reached 𝒮 < 2.0, although it must be recalled that the scheme adopted by the latter (dividing the sources according to their nominal S/N bin) introduces peculiar features in the distribution of δf (see Appendix B.1); Galapagos-2 reached 𝒮 ≃ 2.0, while ProFit was penalised by a large ℬ caused by a strong overestimation of fluxes at faint magnitudes (see Appendix B.4). Interestingly, the fits with free nbulge were in general slightly better than those with fixed nbulge = 4 (except for Galapagos-2), despite the fact that the simulated galaxies had indeed nbulge = 4.

Finally, the results for the RM realisation were less accurate than those for the other IE images. This is to be expected, given the inherently more difficult task of fitting analytical profiles to complex realistic morphologies. Interestingly, here the three codes that provided results show very similar trends; this seems to imply that, when dealing with realistic galaxy shapes, the impact of prior calibrations, pre-processing of the images, and robustness of the algorithm becomes of secondary importance with respect to the inherent difficulty of the task.

The results for the multi-band dataset show the evident impact of the different strategies followed by the participants. SourceXtractor++ (which included an image and PSF pre-processing pipeline, calibrated its priors on the provided example ground-truth data, and fitted all bands simultaneously) obtained optimal results, comparable to those for the IE band fitted alone. Galapagos-2 did perform a simultaneous fit, but without the image and PSF pre-processing, and obtained sub-optimal results. ProFit, which performed a separate fit on each band, had weaker performance due to not using information from the IE image in the multi-band fits: faint galaxies detected in IE are likely to have very low S/N, or could even be undetected, in most NIR and LSST bands, and without the IE parameters to constrain the fit, it is very difficult to properly model their light profiles in these bands. This outcome is important in highlighting how the synergy with Euclid can significantly improve the accuracy of Rubin/LSST measurements, as already pointed out by several studies (see e.g. Rhodes et al. 2017; Capak et al. 2019).

It is worth mentioning here that any statistical result showing a dependence on the bulge fraction of the sources is biased by the low fraction of simulated bulge-dominated galaxies with respect to the real Universe distribution (see Sect. 2.1), so the overall performance on real data might be worse.

Table 2

Software packages, runs, and 𝒮-values considering the common list of sources (see text for details).

Fig. 10

Visual summary of the diagnostic quantities: (left to right) absolute value of the median bias (ℬ), average ratio of σmeas and σtrue (𝒟), and outlier fraction (𝒪). In the bias panel, the points corresponding to ProFit DS multi-band runs are not shown, being off the scale (their values are 0.49 and 0.44, respectively). See text for details.

Fig. 11

Trends of the completeness 𝒞 with input IE magnitude of the output provided by the five participants to the challenge, for (top to bottom) DS, SS, and RM realisations (in the latter, points start at IE = 20.5 because bright objects were not included in the simulated images).

Fig. 12

DSb4 (top), SS (middle), and RM (bottom) summary plots for IE. Left to right: median bias ℬ; dispersion (standard deviation) 𝒟; and fraction of outliers 𝒪. Each line and colour correspond to a different code as indicated in the legend; note the different y-axis scales.

5.2. Trends of the diagnostics with input magnitudes

In Fig. 11, the trends of completeness 𝒞 are shown for each software package as a function of IE in bins of magnitude, for the three realisations DS, SS and RM (we remind the reader that for the latter only objects with IE > 20.5 were simulated).

As one might expect, the fraction of successfully fitted objects decreases with increasing magnitude for all codes, since fainter objects are generally harder to detect and fit; the only exception is DeepLeGATo, for which bright objects are more often prone to failure. This is a consequence of the fixed stamp sizes used in the current version of the software: very bright (and large) objects may extend beyond the stamp.

Overall, down to IE ≃ 23.5 all codes successfully fit more than 95% of the galaxies; the completeness then typically decreases, with the notable exception of ProFit, which always stays close to 100% for the SS and DS realisations. This behaviour is likely due to the different detection thresholds used in the software packages (we recall that Galapagos-2 uses SExtractor and ProFit uses ProFound), leading to different efficiencies in the actual detection of sources.

In each panel of Figs. 12–14, the trends of one of the diagnostics ℬbin, 𝒟bin, and 𝒪bin are shown as a function of magnitude in the considered band, for all the software packages that have provided a fit in the relevant realisations. Some individual cases of particularly notable behaviour are described in more detail in Appendix B.

Fig. 13

Multi-band summary plots for IE and the NIR bands. Left to right: median bias ℬ; dispersion (standard deviation) 𝒟; and fraction of outliers 𝒪. Top to bottom: IE, YE, JE, and HE. Each line and colour correspond to a different code as indicated in the legends; note the different y-axis scales.

5.2.1 Bias in IE runs

In the IE DSb4 runs, for Galapagos-2 and ProFit, the typical absolute bias in the measured flux ℬbin is below 1% for bright sources (IE < 23), increasing to 10–15% at the faint end. This is likely due to contamination from nearby brighter sources, and/or to the inherent difficulty in fitting low S/N objects with analytical profiles. DeepLeGATo has a less stable trend, with slightly larger values of bias for intermediate and bright sources, but a lower bias at the faint end. Finally, SourceXtractor++ shows the most stable trend, without a strong overestimation at the faint end, despite a more pronounced average bias of 2–3%.

In the SS runs, in most cases we see smoother behaviour, with monotonic trends towards an overestimation of fluxes with decreasing brightness, reaching a bias of about 5% at the faint end. Noticeably, ProFit behaves differently, slightly underestimating sources of intermediate magnitude before starting to overestimate at the faint end. An even more striking exception is Morfometryka (SS was the only dataset for which it provided output), which has a clearly declining monotonic trend, reaching δf ≃ −15% at IE ≃ 25.3.

The situation is completely different in the RM realisation, where all three of the codes that provided results have similar declining trends in the bias ℬ, with faint sources typically having fluxes underestimated by about 10–15% at IE = 24–25. This is at odds with the results from the other realisations. We checked that this is not an issue in the simulations: while minor inconsistencies between the input true fluxes in the catalogue and the actual realisations of the sources in the images can be present because of the simulation method (see Sect. 2.1), we found that the impact of this is negligible, with a typical mean offset of about 0.05% and some scatter in the values that is not sufficient to explain the global trends of the measurements. We postpone further investigation on this topic to future work, given that it does not strongly impact the present analysis; the trend is almost identical for the three considered software packages, leaving the comparison among them essentially unchanged.

Fig. 14

Same as Fig. 13, for LSST bands. Top to bottom: u, g, r, i, and z. Note the different y-axis scales.

5.2.2 Dispersion in IE runs

The dispersion 𝒟bin for all codes is typically comparable to σtrue at the faint end in all realisations. In DSb4, at IE < 23 there is a hierarchy of performance, with DeepLeGATo reaching values around 3.0 (again, probably because of the limited dimensions of the stamps), ProFit reaching around 2.0, SourceXtractor++ 1.5, and Galapagos-2 going from 0.5 to 1.5. In SS this hierarchy is less pronounced, with all codes, including Morfometryka, staying below 2.0 at all magnitudes; the evident exception is again DeepLeGATo. In the RM case we again see similar (and sub-optimal) trends for the three codes that provided results.

5.2.3 Outliers in IE runs

The outlier fraction 𝒪bin in DSb4 stays below 10% at all magnitudes for SourceXtractor++, and goes from very low values to 20% at faint magnitudes for Galapagos-2. ProFit reaches 30%, while DeepLeGATo again suffers from the limited dimensions of the stamps, reaching 100% outliers at the bright end while remaining close to zero at IE > 23. In SS, SourceXtractor++ and Galapagos-2 stay below 10% at all magnitudes, while Morfometryka has large values (around 50%) at the bright end, likely because of a sub-optimal estimate of bulge-dominated sources (see Appendix B.3). Again, in the RM case there are no major differences in the quality of the performance, except for SourceXtractor++ performing better on IE < 22 objects.

5.2.4 The free Sérsic bulge case

There is no substantial difference between the fixed and free nbulge cases in the DS realisation, so we do not show the trends for the free nbulge case. As mentioned, the latter yields slightly better results than the fixed nbulge = 4 case for SourceXtractor++, reducing the bias at all magnitudes, and for ProFit, which has better trends for all diagnostics; for Galapagos-2, the bias is almost identical in the two runs, while the dispersion and the outlier fraction have opposite trends (the free case having many more bright outliers but, simultaneously, a lower dispersion ratio for the few well-behaved sources), resulting in a similar final score (see Fig. 10).

5.2.5 The multi-band dataset

In the multi-band dataset (for which only Galapagos-2, ProFit, and SourceXtractor++ provided results; see Figs. 13 and 14) the trends are generally similar to those of the IE case, with larger values of bias at the faint end for all codes in the NIR and LSST bands. In particular, ProFit reaches ℬbin ≃ 1.5 in the NIR bands, and approximately 4 in the LSST u and g bands, likely because of the independent fits performed on low S/N sources (we do not show the corresponding points in the plots, for readability); there is also a peculiar trend in the LSST bands, with sources being underestimated at intermediate magnitudes before turning to a strong overestimation at the faint end (we were not able to find a simple explanation for this). Galapagos-2 and SourceXtractor++ obtain better (and similar) results in the NIR bands thanks to the simultaneous fit, reaching ℬbin ≃ 0.1; in the LSST bands they behave similarly as well, with SourceXtractor++ performing clearly better only in the u band. Here, the free nbulge fit generally yields slightly worse results in dispersion and outlier fraction for Galapagos-2, and slightly better results for ProFit, than the fixed nbulge fit (not shown in these plots; see again Fig. 10).

5.3. Separated bulge and disc estimates

So far we have considered total fluxes, but it is also instructive to investigate how the software packages performed in the separate flux measurements of the two components of each galaxy (bulge and disc) in the DS realisation. It is worth stressing that both estimates can individually be worse than the total flux one: their sum can be close to the true total value even when the partition between bulge and disc is not well recovered. This is linked to the accuracy of the morphological parameter estimates, discussed in Euclid Collaboration (2023).

In Figs. 15–18, we show the summary plots for bulges and discs separately. In IE, evident features are the bias trends for Galapagos-2 and ProFit, both overestimating the flux of bulges at the faint end and underestimating that of discs at all magnitudes, by approximately 5% down to IE ≃ 22 and worsening beyond; DeepLeGATo and SourceXtractor++ show a strong underestimating trend for faint bulges, while showing reasonable accuracy for discs at all magnitudes (DeepLeGATo overestimating their flux by a few percent). The dispersion and the outlier fraction are generally larger for bulges (often reaching very high values) than for discs, in particular at intermediate and faint magnitudes, where discs show nearly optimal values. For bulges, the dispersion has very similar trends for all codes; for discs, SourceXtractor++ performs best, followed by Galapagos-2, while ProFit and DeepLeGATo suffer at the bright end. SourceXtractor++ also yields the best performance concerning the outlier fraction, both for bulges and for discs.

Similar observations can be made for the other bands. In the NIR bands, the bias of bulges for Galapagos-2 and SourceXtractor++ shows a similar underestimation of 3–5% at intermediate magnitudes, and all codes show a strong overestimation at the faint end, particularly dramatic for ProFit. For discs, SourceXtractor++ stays quite firmly close to zero, Galapagos-2 shows an underestimation at the faint end balancing out the bulge overestimation, while ProFit again shows an overestimation. In the LSST bands, Galapagos-2 shows the same opposite trend for bulges and discs at faint magnitudes; interestingly, SourceXtractor++ tends towards the reverse (underestimating bulges and counterbalancing with an overestimation of discs). ProFit still shows the strong rising trend in all cases.

The trend of the dispersion for bulges is very similar for the three codes, all having 𝒟 ≃ 2.0–3.0 for the bulk of the bins in all bands; for discs, better values are found towards the faint end, again with SourceXtractor++ performing slightly better than the other two codes. Finally, bulges have very high outlier fractions, typically above 50% in all bins in the NIR bands; in the LSST bands the values are lower at the bright end, but always peak above 75% at intermediate magnitudes. Discs, on the other hand, have a lower fraction of outliers at the faint end.

Fig. 15

Trends for bulges only, in the multi-band DSb4 fit, for IE and the NIR bands. Left to right: median bias ℬ; dispersion (standard deviation) 𝒟; and fraction of outliers 𝒪. Top to bottom: IE, YE, JE, and HE. Each line and colour correspond to a different code as indicated in the legends; note the different y-axis scales.

Fig. 16

Same as Fig. 15, but for discs only.

Fig. 17

Trends for bulges only, in the multi-band DSb4 fit (as in Fig. 15), for LSST bands. Top to bottom: u, g, r, i, and z.

Fig. 18

Same as Fig. 17, but for discs only.

5.4. Colours in the multi-band dataset

Thus far, we have focussed on the accuracy of the flux estimates. However, for the multi-band dataset, it is also interesting to examine how accurately the three software packages that provided results were able to recover flux ratios among the bands, i.e. colours. Here too, the multi-dimensionality of the data requires some effort to summarise the results in a few informative plots. In each small sub-panel of Figs. 19 (Galapagos-2), 20 (ProFit), and 21 (SourceXtractor++), we show the comparison between a measured colour and its corresponding true value. For each fitted galaxy (colour-coded by its true IE magnitude), the measured colour is the y-axis coordinate and the true colour is the x-axis coordinate (so a perfect colour estimate would result in a diagonal line from the bottom left to the top right corner; for reference, we plot this line in each panel). The figures show all possible combinations of the nine bands, giving an overview of the results. In each small sub-panel, the considered colour is the x-axis band minus the y-axis band: for example, the first sub-panel in the upper left corner is the IE − YE colour, the one below it is the IE − JE colour, and so on. A distribution of points in a narrow vertical strip, which is often seen (e.g. in colours including IE), implies that while the true colours are close to zero, the measured ones have a large dispersion; if the distribution of points follows the diagonal, the colour is estimated with good accuracy.

In each panel, the values of two statistical diagnostics are also shown. Defining δc = |colourmeas − colourtrue|/(1 + colourtrue), the first number is the normalised median absolute deviation, NMAD = 1.48 ℳ(δc), computed considering the sources with δc ≤ 0.2 (ℳ is the median of the distribution); the second is η, the percentage of outliers having δc > 0.2. The larger upper panel in the figures shows these two values for the colours including IE, as reported in the legend. As a further example, the larger lower panel shows a colour-colour diagram, IE − g versus IE − HE; again, plotted are all measured colours of individual galaxies, colour-coded by their input IE magnitudes; the density contours shown as solid lines represent the true colour distribution.
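For concreteness, these two diagnostics can be computed with a few lines of code. The following minimal Python sketch implements the definitions given above; the function name and the synthetic input values are ours, for illustration only, and are not part of the EMC pipeline.

    import numpy as np

    def colour_diagnostics(colour_meas, colour_true, cut=0.2):
        # delta_c as defined in the text
        delta_c = np.abs(colour_meas - colour_true) / (1.0 + colour_true)
        # NMAD over the sources with delta_c <= cut
        nmad = 1.48 * np.median(delta_c[delta_c <= cut])
        # eta: percentage of outliers with delta_c > cut
        eta = 100.0 * np.mean(delta_c > cut)
        return nmad, eta

    # Illustrative usage on synthetic colours (not EMC data)
    rng = np.random.default_rng(0)
    c_true = rng.normal(0.5, 0.3, 10000)
    c_meas = c_true + rng.normal(0.0, 0.05, 10000)
    print(colour_diagnostics(c_meas, c_true))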

Obviously, fainter sources have more scattered distributions. In all cases, the colours including the u band are clearly the least accurate, with large NMAD and η; this is easily understandable, given that distant galaxies are typically fainter in blue bands because of intrinsic properties, dust absorption, and cosmological redshifting; moreover, the band itself is quite shallow (mlim = 23.6 at 10σ).

These diagrams again show how the different approaches used by the three participants affect the results. ProFit, using independent fits in each band, obtains the least accurate results, both in terms of NMAD and of η. Galapagos-2 and SourceXtractor++ perform significantly better, probably because of the multi-band simultaneous fit. Between the two, SourceXtractor++ yields slightly better results, most likely due to the careful prior calibration used (see Appendix B.5.2).

Fig. 19

Colour estimates for Galapagos-2. Each small sub-panel on the left part of the plot shows the measured versus true colour (magnitude difference) for a pair of bands: the colour is always x-axis label − y-axis label. The numbers in each panel are the NMAD and η, as defined in the text. The upper larger panel shows the values of NMAD and η for the colours of all bands with respect to IE, as reported in the legend. The rightmost larger panel shows the IE − g versus IE − HE colour-colour diagram; points are the measured colours for individual galaxies, while the black lines are density contour levels of the corresponding true colour distribution.

Fig. 20

Same as Fig. 19, but for ProFit.

5.5. Uncertainty budgets

We computed a final quantity of interest: the fraction of sources that have a measured flux consistent with the true value within their nominal uncertainty budget, as provided in the output catalogues; in other words, |fmeas − ftrue| < σmeas (e.g. Häußler et al. 2007). This fraction should in principle be 68.3% (assuming a Gaussian distribution of the measurements, a reasonable assumption given that the noise map is predominantly Gaussian, at least when the photon noise is not relevant, i.e. for intermediate and faint galaxies).

For the codes that provided magnitudes, we computed the errors on fluxes as σf = 0.921 f σm. For the double-component runs, we estimated a global uncertainty with the usual approach, i.e. σtot = b/t σbulge + (1 − b/t) σdisc, since we cannot consider the errors on the two components as independent. We show the results in Figs. 22 (IE) and 23 (multi-band DSb4). It is evident that uncertainties are typically strongly underestimated for bright sources, despite their per-pixel RMS being enhanced by the contribution of photon noise, while for faint sources they are better estimated, in a statistical sense (with variations from case to case).
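As a minimal sketch of this test (the function names are ours; the factor 0.921 is ln(10)/2.5, converting a magnitude error into a relative flux error):

    import numpy as np

    def sigma_flux_from_mag(flux, sigma_mag):
        # sigma_f = 0.921 f sigma_m, i.e. ln(10)/2.5 * f * sigma_m
        return 0.921 * flux * sigma_mag

    def sigma_total(bt, sigma_bulge, sigma_disc):
        # Combined bulge+disc uncertainty, treating the two errors
        # as fully correlated: sigma_tot = b/t sigma_b + (1 - b/t) sigma_d
        return bt * sigma_bulge + (1.0 - bt) * sigma_disc

    def consistent_fraction(f_meas, f_true, sigma_meas):
        # Fraction of sources with |f_meas - f_true| < sigma_meas;
        # for Gaussian errors this should be close to 0.683
        return np.mean(np.abs(f_meas - f_true) < sigma_meas)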

A possible explanation for these trends is that bright sources are fit with less uncertainty because of their high S/N, but the uncertainty budget (which is based on the information in the RMS map) does not account for the systematic errors that cause the actual offsets in the measurements. We note that in the evaluation of the bias ℬ we considered the relative offset δf, which is typically small for bright sources even when the absolute offset |fmeas − ftrue| is large.

At the faint end, ProFit yields substantially less accurate results than Galapagos-2 and SourceXtractor++, in all bands. It must be pointed out that the version of DeepLeGATo that was used for the EMC did not include a consistent algorithm for estimating the uncertainty budget; therefore, we have not included DeepLeGATo in this analysis. A new version including error estimation is under development. The uncertainty estimation in Morfometryka takes the RMS map into account; however, it only represents the uncertainty on the fit parameters, since it is extracted from the covariance matrix of the fit. For fluxes and magnitudes, the quoted uncertainty is only due to the fitted radius and the underlying region used to integrate the flux.

Fig. 21

Same as Fig. 19, but for SourceXtractor++.

Fig. 22

Summary plots showing the fraction of sources satisfying the relation |fmeas − ftrue| < σmeas in each magnitude bin, for the IE band runs. The expected fraction is 0.683 (horizontal dashed line). Left to right: DSb4, SS, RM. Each symbol and colour correspond to a different code, as indicated in the legends.

Fig. 23

Same as Fig. 22, but for the DSb4 multi-band run. Each panel refers to a band, as indicated in the titles (we note that IE is the last panel, for readability of the plot).

6 Conclusions

This is the first of two papers presenting and discussing the EMC, a project aimed at comparing the results of model-fitting software packages on a set of simulated Euclid observations, with the aim of providing a first quantitative study to define the best-suited algorithm to be included in the Euclid Science Ground Segment processing pipeline. Five model-fitting tools (DeepLeGATo, Galapagos-2, Morfometryka, ProFit, and SourceXtractor++) were tested in the challenge, providing partial or complete results. It is worth stressing that the results obtained for the EMC and discussed in this paper and in Euclid Collaboration (2023) were used by the developers to improve and, in some cases, significantly upgrade the software packages, which is per se a relevant outcome of this work.

While the companion paper (Euclid Collaboration 2023) focusses on results concerning the morphological parameters, in this work we have presented the simulated dataset and focussed on the results concerning photometric estimates only. We used the code Egg (Schreiber et al. 2017) to produce mock galaxy catalogues, which were then exploited to create synthetic images in three different realisations: two using GalSim (Rowe et al. 2015), so as to generate SS and DS analytical profiles (the latter with nbulge=4 and ndisc=1 and varying bulge-to-total luminosity ratios); and one with the method described in Euclid Collaboration (2022a) and Lanusse et al. (2021) to obtain realistic morphologies by means of a CNN technique (RM). We did not simulate irregular galaxies, which constitute a substantial fraction of the high-redshift population, and we did not include AGNs. Furthermore, the distribution of the bulge-to-total ratios of the simulated galaxies is more skewed towards disc-dominated objects than in the real Universe (see Fig. 4); this implies that any result showing a dependence on this feature must be considered as statistically optimistic.

We created five fields of view in the Euclid IE band with an area of 0.482 deg2 each, and for one of them we also simulated eight additional bands in the double Sérsic realisation (Euclid NIR YE, JE, and HE, plus the five Rubin/LSST bands u, g, r, i, and z). The images were produced with zero background flux (i.e. already background subtracted). Gaussian noise realisations mimicking the expected depths of the Euclid Wide Survey were added to the noiseless images, and the analytical profiles of the galaxies were randomised following a consistent Poissonian distribution. We simulated RMS maps including photon noise; they were provided to the participants along with the PSFs of each field, and with lists of objects with nominal S/N > 5 in IE, for which the actual centroid positions in pixel coordinates were given. The requested output consisted of the photometric (and morphological) estimates for all of the objects in these lists.
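As an illustration of this setup, the following Python sketch generates a single double Sérsic galaxy with GalSim, the tool used for the analytical realisations. All numerical values (flux, radii, the Gaussian stand-in for the PSF, and the noise level) are ours and purely illustrative; the actual EMC parameters come from the Egg catalogues and Table 1.

    import galsim

    # Illustrative parameters only; the EMC values come from Egg and Table 1
    bt, total_flux = 0.3, 1.0e5                 # bulge-to-total ratio, total counts
    bulge = galsim.Sersic(n=4, half_light_radius=0.15, flux=bt * total_flux)
    disc = galsim.Sersic(n=1, half_light_radius=0.40, flux=(1 - bt) * total_flux)
    psf = galsim.Gaussian(fwhm=0.17)            # stand-in for the provided PSF
    galaxy = galsim.Convolve([bulge + disc, psf])

    image = galaxy.drawImage(nx=128, ny=128, scale=0.1)   # IE pixel scale, 0.1"/pix
    rng = galsim.BaseDeviate(42)
    image.addNoise(galsim.PoissonNoise(rng))              # photon-noise randomisation
    image.addNoise(galsim.GaussianNoise(rng, sigma=50.0)) # Gaussian background noise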

To allow for a quantitative analysis of the results and for a comparison of the accuracy of the codes, we defined a metric as described in Sect. 4, computing three quantities related to the median bias in the flux measurements (ℬ), the mean dispersion of the measured flux distribution (𝒟), and the outlier fraction (𝒪). These quantities were evaluated on a subset of the dataset including only the sources for which all participants provided a fit; we also computed a further value quantifying the fraction of sources in the input list for which a software package succeeded in providing a fit (𝒞). Finally, we combined them in a single figure of merit 𝒮; the lower the value of 𝒮, the better the global fit, with 𝒮 ≤ 1 being considered optimal performance. The global results are summarised in Table 2 and Figs. 9–14. We also analysed how accurately colours were retrieved, and the reliability of the uncertainty estimates. The computational times and some in-depth analysis for each individual software package are discussed in the appendix.
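A schematic Python sketch of the three per-object diagnostics is given below. It is not the actual EMC implementation: the exact binning, the definition of σtrue, and the weights combining the terms into 𝒮 are those of Sect. 4, and σtrue is here taken, for simplicity, as a single expected relative-flux dispersion.

    import numpy as np

    def summary_diagnostics(f_meas, f_true, sigma_true):
        # Relative flux error of each source
        delta_f = (f_meas - f_true) / f_true
        # B: absolute value of the median bias
        bias = np.abs(np.median(delta_f))
        # D: measured dispersion compared to the expected one
        dispersion = np.std(delta_f) / sigma_true
        # O: fraction of outliers beyond 5 times the expected dispersion
        outliers = np.mean(np.abs(delta_f) > 5.0 * sigma_true)
        return bias, dispersion, outliers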

In general, all the participants provided acceptable to good results in at least some of the tests, with a few differences from case to case. A thorough comparison could only be performed with the results of ProFit, Galapagos-2, and SourceXtractor++ (the first two provided all the requested outputs, and the third only lacked one part of a set, namely the free nbulge run in the double Sérsic multi-band realisation). DeepLeGATo only provided fits for the IE analytical realisations, yielding good results with the caveat of a S/N-dependent calibration of the models, while Morfometryka only provided data for the single Sérsic set; therefore, their contribution to the challenge can only be considered as indicative of their potential.

Considering the 𝒮 values, among the three codes that provided the complete output, the best performance was obtained by SourceXtractor++ in all realisations. It is worth stressing again that the pipeline it adopted for the EMC was the most finely tuned to the dataset, with the inclusion of a substantial pre-processing of the data, and with the priors being modelled using the samples of true values that were provided along with the simulated images. In the analytical realisations (SS and DS), SourceXtractor++ and Galapagos-2 reached typical biases of the measured fluxes with respect to the true values below 1% at S/N above 10 in all bands, and at S/N above 5 in IE (in at least one of the possible configurations of the codes for which results were provided). For these two codes, the dispersion is typically slightly larger than, but comparable to, the one expected from a theoretical distribution obtained by perturbing the input true fluxes with simulated observational noise consistent with the nominal depths of the images; a small fraction of outliers (objects with a relative error in the fitted flux larger than 5 times the expected theoretical distribution) is always present. ProFit also yielded good results in SS, but was less accurate on the DS realisation (Galapagos-2 also being sub-optimal at the faint end in IE for this realisation). The three software packages performed similarly on the RM dataset, and in all cases the results were poorer than for the analytical realisations; this seems to imply that the overall quality of the fits might be sub-optimal when dealing with real data. However, further investigation is needed to assess to what extent the quality of the simulation might have impacted the results.

All codes tend to underestimate the uncertainties at the bright end, but statistically provide a reasonable estimate for intermediate and faint sources. Finally, a relevant outcome is the importance of the simultaneous fitting of the multi-band dataset: using the information provided by the deep and high-resolution IE band helps to obtain more accurate measurements in the shallower NIR bands, and noticeably also in the ground-based LSST bands, both in terms of absolute fluxes and of colours. Once again, this highlights how the synergy between the two surveys will be of paramount importance for the success of both.

We did not attempt to investigate the performance of the model-fitting algorithms in the context of the Euclid Deep Survey (see Laureijs et al. 2011). It is reasonable to expect that the results should be similar to those obtained for the Wide Survey at an equivalent S/N, although the increased level of contamination and blending might worsen the performance. Dedicated simulations would be required to quantitatively confirm this hypothesis, but this is beyond the scope of the present study.

The results of the EMC set the baseline for deciding which algorithm is to be implemented in the Euclid pipeline. One possible way to proceed might be to use the computational power of DeepLeGATo to provide fast and reliable priors to SourceXtractor++; we will investigate this option in the future. However, given the many different approaches and techniques adopted by the participants (different use of priors, different pre-processing strategies, different approaches to multi-band fitting, etc.) and the other parameters not included in the metric that should be taken into account (such as the computing time, required resources, compatibility with the current pipeline, accuracy of uncertainty budget estimates, etc.), it is important to stress that the results presented in this work and in Euclid Collaboration (2023) must be interpreted with caution. We followed one possible approach to analyse, in a compact way, a very complex and multi-layered dataset; and we emphasise that software packages other than those that obtained the best scores here might be better suited to other specific science cases. We invite the interested reader to check the full set of results using the online tool. Future work will include testing at least some of the software packages in more realistic environments using official Euclid simulations and, finally, the implementation in the Euclid pipeline of the chosen algorithms.

Acknowledgements

The Euclid Consortium acknowledges the European Space Agency and a number of agencies and institutes that have supported the development of Euclid, in particular the Academy of Finland, the Agenzia Spaziale Italiana, the Belgian Science Policy, the Canadian Euclid Consortium, the French Centre National d'Etudes Spatiales, the Deutsches Zentrum für Luft- und Raumfahrt, the Danish Space Research Institute, the Fundação para a Ciência e a Tecnologia, the Ministerio de Ciencia e Innovación, the National Aeronautics and Space Administration, the National Astronomical Observatory of Japan, the Nederlandse Onderzoekschool Voor Astronomie, the Norwegian Space Agency, the Romanian Space Agency, the State Secretariat for Education, Research and Innovation (SERI) at the Swiss Space Office (SSO), and the United Kingdom Space Agency. A complete and detailed list is available on the Euclid web site (http://www.euclid-ec.org).

Appendix A Technical information

Here we provide detailed information about the technical realisation of the noise background and of the RMS maps for the dataset.

A.1 Noise maps and zero-points

We created Gaussian background noise maps using SkyMaker (Bertin 2009). The software requires as inputs the value of the expected background surface brightness (SBbkg, in magnitudes arcsec−2) and the observational ZP at 1 second of exposure, to produce a map at the desired depth. The conceptual steps to compute the ZP, following Euclid Collaboration (2019), are the following. Given a desired limiting magnitude mlim within a given area A (e.g. in square arcseconds), the corresponding limiting flux for the total exposure time texp is

flim = g 10^(−0.4 (mlim − ZP)) texp, (A.1)

where g is the gain of the detector. The uncertainty per pixel from the background for the same source is

σbkg,pixel = sqrt( g 10^(−0.4 (SBbkg − ZP)) l^2 texp + σRON^2 ) ≃ sqrt( g 10^(−0.4 (SBbkg − ZP)) l^2 texp ), (A.2)

where l is the pixel scale, and for the second step we assume that the contribution of the read-out noise σRON is negligible. Assuming that σbkg,pixel is constant, the total noise within the aperture is

σbkg,tot = sqrt(A/l^2) σbkg,pixel, (A.3)

so if r is the radius of the aperture, A = π r^2, the background S/N reads

S/N = flim/σbkg,tot = g 10^(−0.4 (mlim − ZP)) texp / sqrt( π r^2 g 10^(−0.4 (SBbkg − ZP)) texp ). (A.4)

Finally, the formula can be inverted to make the expression for the ZP explicit:

ZP = 2 mlim − SBbkg + 5 log10(S/N) + 2.5 log10( π r^2 / (g texp) ), (A.5)

where it can be noticed that the pixel-scale terms l^2 have cancelled out.

We computed the ZP of each image by setting S/N = 10 and considering the corresponding mlim computed in 2″ apertures (so r = 1″ in the above formulas), with SBbkg and the exposure times as given in Table 1. We point out again that this method computes the ZP for 1 second of exposure (to obtain the ZP at the given exposure time, a term 2.5 log10 texp should be added).
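As a worked example, Eq. (A.5) can be implemented directly; the following Python sketch (the function name and default values are ours) returns the 1-second ZP:

    import numpy as np

    def zeropoint_1s(m_lim, sb_bkg, t_exp, snr=10.0, r=1.0, gain=1.0):
        # Eq. (A.5): ZP = 2 m_lim - SB_bkg + 5 log10(S/N)
        #            + 2.5 log10(pi r^2 / (g t_exp))
        # r is the aperture radius in arcsec (r = 1 for the 2" apertures)
        return (2.0 * m_lim - sb_bkg + 5.0 * np.log10(snr)
                + 2.5 * np.log10(np.pi * r**2 / (gain * t_exp)))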

A.2 RMS maps and photon noise

To obtain the RMS map for each scientific image, first of all we produced a map with a constant value equal to the standard deviation of the original Gaussian background noise image, which only provides information on the noise due to the unresolved sky background and undetected faint sources.

The flux of the photons arriving from galaxies follows a Poissonian distribution and contributes to the uncertainty on the actual flux estimate. This contribution (photon noise) was added to the background noise map with the following procedure. Given an arbitrary exposure time t, the pixel values St = Ct + skyt are the sum of the counts coming from the sources (Ct) and from the background (skyt). The noise per pixel Nt is found by summing in quadrature the sky contribution RMSsky,t and the photon noise contribution sqrt(Ct/g):

Nt = sqrt( RMSsky,t^2 + Ct/g ). (A.6)

Considering a science image normalised to t = 1 second, in which pixels have values S1 = St/t, with N1 being the corresponding RMS map pixel values, the S/N of each pixel must be conserved, that is

S1/N1 = St/Nt. (A.7)

This relation leads to the procedure to build the noise map N1 of the normalised image as

N1 = Nt/t. (A.8)

By combining Eqs. (A.6) and (A.8), and considering that Ct = C1 t, we finally obtain

N1 = sqrt( RMSsky,1^2 + C1/(g t) ). (A.9)
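In practice, Eq. (A.9) amounts to a one-line operation on the images; a minimal Python sketch (the names are ours) could read:

    import numpy as np

    def rms_map(science_1s, rms_sky_1s, t_exp, gain=1.0):
        # Eq. (A.9): N1 = sqrt(RMS_sky,1^2 + C1/(g t)), with C1 the source
        # counts per second; negative noise fluctuations are clipped so that
        # the Poisson term stays well defined
        c1 = np.clip(science_1s, 0.0, None)
        return np.sqrt(rms_sky_1s**2 + c1 / (gain * t_exp))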

Appendix B Computational times and separate analysis of the performance of each software package

In this section we provide information about the computational times and memory workload required by each software package to complete the runs, and analyse in more detail some particular cases in which the results were especially interesting, adding some final remarks where appropriate.

B.1 DeepLeGATo

DeepLeGATo exploits neural network models which, for the EMC, were trained on a single Tesla V100 GPU. The output was provided in three separate catalogues of different minimum S/N, according to the input source lists that were distributed to the participants; a different model was used for each S/N section, and the training time was of the order of 2 h per model. At inference, it then took a few minutes to ‘fit’ all the galaxies in a simulated field; therefore, concerning computational times, DeepLeGATo has by far the best performance, owing to its very different fitting technique.

Fig. B.1

Examples of diagnostic plots for DeepLeGATo. Top, IE SS F4 run; bottom, IE DS F4 run. See Fig. 8 for a description of this and the subsequent similar plots. We note that the colour coding is given by the value of the bulge fraction b/t in the original Egg double-Sérsic catalogue; it can be considered as a proxy for the value of the Sérsic index, n = 3 b/t + 1.

We stress again that the subdivision of the catalogue into sub-samples fitted with different models caused evident features in the global distribution of δf, though the average trends and the final 𝒮 values were generally quite good. In particular, the bright sources in each section were systematically underestimated while the faint ones were overestimated. There were also many bright outliers especially in the DS runs, with bulges typically being underestimated and discs being overestimated; this is likely to be a consequence of the CNN fitting technique (fixed dimensions of the stamps, and less training at the bright end). Figure B.1 provides an example of these features.

B.2 Galapagos-2

Galapagos-2 employs the down-hill gradient minimisation method of Galfit/GalfitM, so it is typically fast, and can reasonably be parallelised to use 16 cores (due to an IDL limitation; ways around this limit exist, as it can also run several sessions in parallel). For the EMC, the code was run on a machine with 48 cores (2.3 GHz each), 96 threads, and ample (512 GB) memory, using 12 threads (at 1.15 GHz) at a time (and 4 sessions in parallel) for the multi-band fits. The run times of the Galfit fits themselves were measured and stored, and account for the majority of the time used, with the wrapper not adding significant computational time. The typical GalfitM fitting times were approximately as follows:

  • DS F1 single-band single Sérsic fit, 160 hours (about 5.7 seconds per galaxy);

  • DS F1 single-band double Sérsic fit, 150 hours (about 5.4 seconds per galaxy);

  • DS F0 multi-band single Sérsic fit, 8000 hours (about 288 seconds per galaxy);

  • DS F0 multi-band double Sérsic fit, 10000 hours (about 360 seconds per galaxy).

Therefore the multi-band fitting, while providing the results for nine bands in parallel, took significantly more time than fitting the individual bands separately. However, as shown, it clearly provides better results for the shallower bands.

The overall results were good, with no evident pathological behaviour. The performance on IE in the multi-band fit on DS F0, i.e. obtained simultaneously with the other bands, is significantly worse than that on the other fields, where IE is measured alone (see Fig. B.2); this is not surprising, because when fitting one band individually all the parameters are optimised to minimise the residuals in that band, while when fitting more bands in parallel the structural parameters are optimised by averaging over the wavelengths (and this is particularly relevant if the band is the deepest and the one with the highest resolution, as is the case for IE). However, it should be pointed out that multi-band fits ‘fail’ less often than the combination of several single-band fits, which have independent constraints (see Häußler et al. 2013).

The total magnitude estimated with a single Sérsic fit (which is always provided by the code) is typically better than the one from the double Sérsic fit in IE, even on the DS simulation (Fig. B.3). On the contrary, the double Sérsic magnitude estimate is better for the other bands (Fig. B.4). While the latter result is expected, given that the code is fitting a double profile with two separate analytical functions (and the shallower the band, the more the multi-band fit should improve the results, so the NIR and LSST bands should benefit more from it than IE), the opposite outcome found on the IE images is more surprising. We point out that, since the other teams did not provide single Sérsic flux estimates for the DS runs, we could not check whether this particular result is a general one, or only concerns Galapagos-2.

Interestingly, the fit of the LSST i band on the DS F0 dataset by Galapagos-2 is substantially worse than the fits provided for the other bands. After an in-depth analysis, we concluded that this is actually due to the polynomial fitting method used in the software. In short, Galapagos-2 works with effective central wavelengths, and when performing a multi-band fit on a dataset including two bands with very close central wavelengths, the shallower of the two might get a sub-optimal fit with a systematic offset. This is exactly the case here, with the central wavelength of the LSST i band (λi = 754 nm) being very close to that of IE, the two filters having very different widths; see Sect. 2.1 and Fig. 1. Noticeably, this does not affect the whole catalogue, but only a fraction of the sources. Despite trying to find a correlation with other input or output parameters of the dataset, we were not able to identify what determines whether a given source is fitted well or badly. Nevertheless, we verified the reliability of our conclusion by picking one source with a bad fit, and looking at the residual images produced by Galapagos-2 in the full multi-band run, and in a test run in which the IE band information was not used to fit the i band. The result of the test is shown in Fig. B.5: while the other LSST bands are well fitted in both cases, when the IE band is used the i band has an evident residual, which flattens out when excluding IE. Of course, this cannot be considered a viable, definitive ‘solution’ to the issue, since IE is the most important and deepest band, so it cannot be excluded from the fit to improve the fit in another band; on the contrary, one might want to exclude the i band from the fit, although this would imply entirely discarding a band and its information. Therefore, for the sake of comparison with the other software packages, we agreed to keep the delivered dataset including this issue, although it non-negligibly worsens the statistics. Finally, we note that the issue is more evident when using the fluxes of the single Sérsic fit, while it is somewhat mitigated if the double Sérsic fit is considered (Galapagos-2 provides both for the DS realisation, which is the only one including LSST bands).

Fig. B.2

Galapagos-2 results on F0 (top) and F1 (bottom), DS IE band (one-component fits). The fit on F1, performed on the IE image only rather than in multi-band mode, is more accurate. See text for details.

Fig. B.3

Galapagos-2 results on F1 DS IE band, one (top) and two (bottom; nbulge fixed) component fits. The one-component (i.e. single Sérsic) fit is more accurate in IE-only runs. See text for details.

Fig. B.4

Galapagos-2 results on F0 DS LSST g band, one (top) and two (bottom; nbulge fixed) component fits. The two-component (double Sérsic) fit is more accurate for non-IE bands in the multi-band simultaneous fit. See text for details.

B.3 Morfometryka

Morfometryka runs were performed on an Intel(R) Xeon(R) CPU E5-2640 v3 2.60 GHz shared 32-core workstation, running 16 jobs in parallel at any given time. The wall time was roughly 920 minutes per field, corresponding to about 10 seconds per source. For each run, Morfometryka spends around 30% of its running time on the Sérsic profile fits, the remaining time being dedicated to other measurements; however, runtimes strongly depend on the input image size.

There is an evident dependence on morphological type in the results. Bulge-dominated galaxies (having high Sérsic index, which in the SS realisation is obtained from the bulge-to-total ratio given in the Egg catalogue, see Sect. 2.1) are strongly biased (ℬ ≃ 0.15), and therefore are considered outliers in the computation of 𝒮, thus significantly affecting the accuracy and resulting in a sub-optimal overall performance (see Fig. B.7).

Fig. B.5

Test on the Galapagos-2 LSST i band issue. In both sets of panels, for a chosen badly fitted source (ID 2158 in F0), we show, top to bottom, the scientific image, the Galapagos-2 models, and the residuals. Left to right are the LSST g, r, i, and z bands. Top set: reference run including IE information in the fit. Bottom set: test run without IE information in the fit. The residual in the i band, evident in the top set, vanishes in the bottom set, when IE is not included. See text for details.

Fig. B.6

Galapagos-2 results on F0 LSST i band. The evident double trend is due to the multi-band simultaneous fitting (see Sect. B.2).

B.4 ProFit

ProFit ran at a mean single-core time of 144 minutes per run, resulting in about 93 000 hours of total CPU time; the runs were performed on the Magnus supercomputer (operated by Pawsey in Western Australia), a multi-node cluster where each node has 24 cores (48 threads), comprising two Intel Xeon E5-2690 v3 (Haswell) 12-core CPUs.

Fig. B.7

Example of diagnostic plot for Morfometryka. This plot refers to the IE SS F2 run. The colour coding clearly shows that the bimodal distribution of points correlates with the input bulge-to-total fraction, with bulge-dominated galaxies (bluer points) being fitted with less accuracy than disc-dominated ones (redder points).

The typical single Sérsic profile fit took a little under a minute per object, and a double Sérsic fit around two minutes on average; this included the whole processing, but in practice the vast majority of the time was spent doing the ProFit optimisation, with the ProFound detection stage usually taking about one second. This was deemed enough time to find a good global solution per fit, but it was not long enough to thoroughly explore uncertainties, as shown in Sect. 5.5.

Good results were provided for the SS realisation, but much less so on the DS and RM ones. The faint end is typically strongly biased (fluxes are overestimated); the LSST bands also show a ‘U-shaped’ trend (see Fig. B.8), for which we were not able to find a simple explanation.

B.5 SourceXtractor++

Single-band runs (SS, RM, and DS on IE) were performed on a small cluster at LMU Munich, with 56 cores and 2 GB of RAM per core. Computational times were about 50 minutes per field for the single Sérsic runs, and 2 hours per field for the double Sérsic runs.

Multi-band runs were performed on a local server at the Institut Astrophysique de Paris, with 24 AMD EPYC 7402P cores (ignoring hyperthreading) running at 2.8 GHz, using an improved version of the code with increased parallelisation efficiency. The computational time was 45 hours (it is worth recalling that the NIR and LSST bands were rebinned to their original pixel scales, so their sizes were smaller than the IE one – respectively 12 500 and 8333 pixels per side).

The results were generally good, with no evident trends or pathological behaviour. Among the provided output, SourceXtractor++ typically obtained the best values for the global diagnostics. Interestingly, in the DS realisation the runs with free nbulge typically yielded slightly better results than those with fixed nbulge = 4, in terms of average bias (see Fig. B.9).

As mentioned, the processing pipeline adopted for the EMC included a substantial pre-processing of the dataset, and the priors used for the measurements were calibrated on the provided samples of true values, albeit only in a statistical sense (see below).

Fig. B.8

Example of diagnostic plot for ProFit. Top to bottom: IE SS F2; IE DS F2 (fixed bulge); and LSST g F0. The IE SS fit is substantially better than the DS fit, and there is a ‘U-shaped’ trend in the LSST bands.

B.5.1 Re-runs on raw data

The NIR and LSST images were rebinned back to their original pixel scales (while the ones provided in the dataset shared with the participants had been rebinned to the IE pixel scale of 0.1″). Additionally, the PSFs used in the fitting process were not those provided in the dataset; instead, they were extracted from the images using PSFEx. This did not require any additional input, and therefore it was not outside the guidelines of the challenge. Still, we felt it was necessary to check the performance of the software on the non-processed images and using the official PSFs, for a fairer comparison with the other participants. To this end, a few additional runs were performed by the SourceXtractor++ team (using version 0.16 of the code instead of 0.12), and we compared the results with the ones officially provided for the EMC. The values of the four diagnostics 𝒞, ℬ, 𝒟, and 𝒪, and of the global 𝒮, for both the EMC runs and the re-runs, are given in Table B.1. Some trends for the DSb4 multi-band realisation are also shown in Fig. B.10 as examples (namely the IE, HE, and g trends, since the aim was to check the impact of the re-extraction of the PSFs and of the rebinning of the images whose native pixel scale differs from the IE one). The results do not seem to improve significantly because of the pre-processing; most of the values and trends look similar, and in some cases are even slightly better in the re-run. The evident exception is given by the NIR bands, in particular concerning the median bias; including the weighting factors, the values of 𝒮 are significantly worse in the re-run, albeit still reasonably good (i.e. close to 1).

Fig. B.9

Example of diagnostic plots for SourceXtractor++. These plots refer to the IE DS F2 run; the upper panel shows the fit with nbulge = 4, while the bottom panel shows the one with nbulge left free to vary.

These re-runs required 40 minutes per field for the single Sérsic runs (SS and RM), and 1 hour and 20 minutes per field for the double Sérsic IE runs (using 28 cores). Finally, the multi-band re-run was performed on a different set of nodes, with 32 AMD EPYC 7302 16-core CPUs at 3.2 GHz, and required 67 hours.

B.5.2 SourceXtractor++ priors

As mentioned, SourceXtractor++ priors were obtained by determining an appropriate transfer function to map each prior to a Gaussian mimicking the distributions of the provided samples of the input true catalogues. Each parameter was calibrated independently, without including covariances; only the statistical distributions were used (i.e. there was no object-by-object comparison in the process). We have shown how this choice likely had a significant impact on the accuracy of the results.

Even though SourceXtractor++ models source ellipticity internally with the standard axis ratio q and position angle φ variables, it was found useful to express the priors for these two degrees of freedom in terms of two Cartesian ellipticity parameters e1 and e2 (those that usually enter weak lensing studies). More specifically, in complex notation, e = e1 + i e2 = |e| e^(2iφ), with the modulus |e| a monotonic function of the axis ratio q. Hence, a normal prior was set on both e1 and e2, centred on zero and with σ = 0.3. This was in broad agreement with the population distribution of axis ratios in the SS F4 simulation, for which the ground truth was provided. A similar prior was built for the Sérsic index, as well as for the effective radius.
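As an illustration, assuming the common weak lensing convention |e| = (1 − q)/(1 + q) (the exact modulus used in the run is not specified here), the mapping and the prior can be sketched as follows; the names are ours.

    import numpy as np

    def qphi_to_e1e2(q, phi):
        # Spin-2 mapping: e = |e| exp(2 i phi), with the assumed
        # convention |e| = (1 - q)/(1 + q)
        mod = (1.0 - q) / (1.0 + q)
        return mod * np.cos(2.0 * phi), mod * np.sin(2.0 * phi)

    # Normal prior on each Cartesian component, centred on zero, sigma = 0.3
    rng = np.random.default_rng(1)
    e1, e2 = rng.normal(0.0, 0.3, size=(2, 10000))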

Table B.1

Example values of the diagnostics for the SourceXtractor++ runs in the output catalogues, provided for the EMC (top value in each box) and in corresponding runs with non-pre-processed images and PSFs (i.e., using those provided for the challenge; bottom value in each box).

thumbnail Fig. B.10

SourceXtractor++ re-runs on non-pre-processed (top to bottom) IE, HE, and LSST g F0 DS images (empty symbols, solid lines), compared to output provided for the EMC, obtained with pre-processing of the images (full symbols, dashed lines). Note the different limits on the axes.

In multi-component models, priors were set under the assumption that the bulge and disc of each source share a common position angle, but that their axis ratios are not correlated (which is not actually the case, since axis ratios do not vary across the spectrum in the Egg catalogues). Therefore the default q and φ variables were used instead of the (e1, e2) ellipticities. A single fit across all bands was run for the bulge and disc effective radii, ellipticity, and position angle; the amplitudes of the bulge and disc components were left free to vary. Here again, the fitting procedure prefers the definition of a set of total magnitudes and a set of b/t values (with their own associated priors), instead of using the flux of each component in each band; in practice, b/t is expressed via the transform Xb/t = log10[(b/t + 0.01)/(1.01 − b/t)]. The priors on Xb/t were set to be band dependent, but with little variation.
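This transform maps b/t ∈ [0, 1] onto an approximately symmetric interval (about ±2), which is better suited to a Gaussian prior. A minimal sketch of the forward and inverse mappings (the function names are ours):

    import numpy as np

    def bt_transform(bt):
        # X_{b/t} = log10[(b/t + 0.01) / (1.01 - b/t)]
        return np.log10((bt + 0.01) / (1.01 - bt))

    def bt_inverse(x):
        # Inverse mapping back to b/t in [0, 1]
        p = 10.0**x
        return (1.01 * p - 0.01) / (1.0 + p)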

References

1. Barden, M., Häußler, B., Peng, C. Y., McIntosh, D. H., & Guo, Y. 2012, MNRAS, 422, 449
2. Bertin, E. 2009, Mem. Soc. Astron. Ita., 80, 422
3. Bertin, E. 2011, in ASP Conf. Ser., 442, 435
4. Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393
5. Bertin, E., Mellier, Y., Radovich, M., et al. 2002, in ASP Conf. Ser., 281, 228
6. Bertin, E., Schefer, M., Apostolakos, N., et al. 2020, in ASP Conf. Ser., 527, 461
7. Capak, P., Cuillandre, J.-C., Bernardeau, F., et al. 2019, arXiv e-prints [arXiv:1904.10439]
8. Conselice, C. J. 2003, ApJS, 147, 1
9. Cropper, M., Pottinger, S., Niemi, S., et al. 2016, SPIE Conf. Ser., 9904, 99040Q
10. de Albernaz Ferreira, L., & Ferrari, F. 2018, MNRAS, 473, 2701
11. Euclid Collaboration (Martinet, N., et al.) 2019, A&A, 627, A59
12. Euclid Collaboration (Bretonnière, H., et al.) 2022a, A&A, 657, A90
13. Euclid Collaboration (Scaramella, R., et al.) 2022b, A&A, 662, A112
14. Euclid Collaboration (Schirmer, M., et al.) 2022c, A&A, 662, A92
15. Euclid Collaboration (Bretonnière, H., et al.) 2023, A&A, 671, A102
16. Ferrari, F., de Carvalho, R. R., & Trevisan, M. 2015, ApJ, 814, 55
17. Graham, A. W., & Driver, S. P. 2005, PASA, 22, 118
18. Grogin, N. A., Kocevski, D. D., Faber, S. M., et al. 2011, ApJS, 197, 35
19. Häußler, B., McIntosh, D. H., Barden, M., et al. 2007, ApJS, 172, 615
20. Häußler, B., Bamford, S. P., Vika, M., et al. 2013, MNRAS, 430, 330
21. Häußler, B., Vika, M., Bamford, S. P., et al. 2022, A&A, 664, A92
22. Huertas-Company, M., Pérez-González, P. G., Mei, S., et al. 2015, ApJ, 809, 95
23. Koekemoer, A. M., Faber, S. M., Ferguson, H. C., et al. 2011, ApJS, 197, 36
24. Kümmel, M., Schefer, M., Apostolakos, N., et al. 2020, in ASP Conf. Ser., 527, 29
25. Lanusse, F., Mandelbaum, R., Ravanbakhsh, S., et al. 2021, MNRAS, 504, 5543
26. Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, arXiv e-prints [arXiv:1110.3193]
27. Lotz, J. M., Primack, J., & Madau, P. 2004, AJ, 128, 163
28. Lucatelli, G., & Ferrari, F. 2019, MNRAS, 489, 1161
29. Maciaszek, T., Ealet, A., Jahnke, K., et al. 2016, SPIE Conf. Ser., 9904, 99040T
30. Mandelbaum, R., Hirata, C. M., Leauthaud, A., Massey, R. J., & Rhodes, J. 2012, MNRAS, 420, 1518
31. Merlin, E., Fontana, A., Ferguson, H. C., et al. 2015, A&A, 582, A15
32. Merlin, E., Bourne, N., Castellano, M., et al. 2016, A&A, 595, A97
33. Merlin, E., Fortuni, F., Torelli, M., et al. 2019, MNRAS, 490, 3309
34. Peng, C. Y., Ho, L. C., Impey, C. D., & Rix, H.-W. 2002, AJ, 124, 266
35. Peng, C. Y., Ho, L. C., Impey, C. D., & Rix, H.-W. 2010, AJ, 139, 2097
36. Peterson, J. R., Jernigan, J. G., Kahn, S. M., et al. 2015, ApJS, 218, 14
37. Rhodes, J., Nichol, R. C., Aubourg, É., et al. 2017, ApJS, 233, 21
38. Robotham, A. S. G., Taranu, D. S., Tobar, R., Moffett, A., & Driver, S. P. 2017, MNRAS, 466, 1513
39. Robotham, A. S. G., Davies, L. J. M., Driver, S. P., et al. 2018, MNRAS, 476, 3137
40. Robotham, A. S. G., Bellstedt, S., & Driver, S. P. 2022, Astrophysics Source Code Library [record ascl:2204.018]
41. Rowe, B. T. P., Jarvis, M., Mandelbaum, R., et al. 2015, Astron. Comput., 10, 121
42. Schreiber, C., Pannella, M., Leiton, R., et al. 2017, A&A, 599, A134
43. Sérsic, J. L. 1968, Atlas de galaxias australes (Cordoba, Argentina: Observatorio Astronomico)
44. Smith, R. L. 1984, Oper. Res., 32, 1296
45. Tuccillo, D., Huertas-Company, M., Decencière, E., et al. 2018, MNRAS, 475, 894
46. Turchin, V. F. 1971, Theory Probabil. Applic., 16, 720

1. Scientific Challenges are official Euclid benchmark tests performed to check and validate the progress of the work in preparation for the launch of the satellite.

All Tables

Table 1

Parameters used to simulate the images.

Table 2

Software package, runs and 𝒮-values considering the common list of sources (see text for details).

Table B.1

Example values of the diagnostics for the SourceXtractor++ runs in the output catalogues, provided for the EMC (top value in each box) and in corresponding runs with non-pre-processed images and PSFs (i.e., using those provided for the challenge; bottom value in each box).

All Figures

thumbnail Fig. 1

Filter transmission curves used to simulate the images (normalised to arbitrary units) in this work. The four filters IE, YE, JE, and HE form the Euclid set, while u, g, r, i, and z are external complementary filters (from LSST/VRO). The NIR curves are early estimates of the actual transmission functions, which are described in their latest updated version in Euclid Collaboration (2022c); however, since we only use these data to compute the input fluxes, the differences are not relevant for the present work.

In the text
thumbnail Fig. 2

Distribution of redshifts as a function of IE magnitude (input values) in F0. The orange line is the running mean, the vertical dashed line the 5σ limit.

In the text
thumbnail Fig. 3

Distribution of bulge-to-total ratios as a function of IE magnitude in the input catalogue of F0; the colours encode the global scale length, defined here as r = b/t re,bulge + (1 − b/t) re,disc.

In the text
thumbnail Fig. 4

Distributions of various input parameters for the IE band, in all five simulated fields, for all the separate components and realisations in the challenge, as described in the legend and in the labels of the panels. For comparison, we also show the corresponding parameter distribution in COSMOS.

In the text
thumbnail Fig. 5

Signal-to-noise ratios of the simulated images at the native pixel scales, measured with forced photometry in 2″ diameter apertures at the input positions of the sources. Star symbols show the expected values, which are also reported in the legend.

Fig. 6

Small region of the DS F0 realisation in the nine bands; all are shown with the same colour scale.

Fig. 7

Small crop of the three realisations of IE images. Left to right: DS, SS, RM. Notice that in RM the orientations of the galaxies are not the same as in the other two realisations.

Fig. 8

Example of a diagnostic plot used to analyse each output catalogue provided by the participants in the EMC. Each dot shows the δf value of a fitted galaxy, colour-coded by the b/t value from the original Egg catalogue (colour bar on the right), as a function of the true input magnitude in the considered band. The black solid line is the running median of the distribution, which should be identically zero for a perfect fit. The dotted lines show the (positive and negative) 1σ and 5σ levels, used to compute the diagnostics described in the text and reported in the top left corner of the plot (see Sect. 4).
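For concreteness, the running median and outlier fraction shown in such plots can be sketched as follows, taking δf as already computed per source and approximating the 1σ level with the NMAD; this is an assumption on our part, and the exact definitions used in the paper are those of its Sect. 4.

    import numpy as np

    def running_diagnostics(mag, delta_f, edges):
        # Running median of delta_f and fraction of >5 sigma outliers per
        # magnitude bin; sigma is approximated here by the NMAD (our assumption).
        mag, delta_f = np.asarray(mag), np.asarray(delta_f)
        medians, outliers = [], []
        for lo, hi in zip(edges[:-1], edges[1:]):
            d = delta_f[(mag >= lo) & (mag < hi)]
            if d.size == 0:
                medians.append(np.nan)
                outliers.append(np.nan)
                continue
            m = np.median(d)
            sigma = 1.4826 * np.median(np.abs(d - m))
            medians.append(m)
            outliers.append(np.mean(np.abs(d - m) > 5.0 * sigma))
        return np.array(medians), np.array(outliers)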

Fig. 9

Visual summary of the 𝒮-values listed in Table 2.

Fig. 10

Visual summary of the diagnostic quantities: (left to right) absolute value of the median bias (ℬ), average ratio between σmeas and σtrue (𝒟), and outlier fraction (𝒪). In the bias panel, the points corresponding to the ProFit DS multi-band runs are not shown, as they fall off the scale (their values are 0.49 and 0.44, respectively). See text for details.

Fig. 11

Completeness C as a function of input IE magnitude for the output catalogues provided by the five participants in the challenge, for (top to bottom) the DS, SS, and RM realisations (in the latter, points start at IE = 20.5 because bright objects were not included in the simulated images).

Fig. 12

DSb4 (top), SS (middle), and RM (bottom) summary plots for IE. Left to right: median bias ℬ; dispersion (standard deviation) 𝒟; and fraction of outliers 𝒪. Each line and colour corresponds to a different code, as indicated in the legend; note the different y-axis scales.

Fig. 13

Multi-band summary plots for IE and the NIR bands. Left to right: median bias ℬ; dispersion (standard deviation) 𝒟; and fraction of outliers 𝒪. Top to bottom: IE, YE, JE, and HE. Each line and colour correspond to a different code as indicated in the legends; note the different y-axis scales.

Fig. 14

Same as Fig. 13, for LSST bands. Top to bottom: u, g, r, i, and z. Note the different y-axis scales.

Fig. 15

Trends for bulges only, in the multi-band DSb4 fit, for IE and the NIR bands. Left to right: median bias ℬ; dispersion (standard deviation) 𝒟; and fraction of outliers 𝒪. Top to bottom: IE, YE, JE, and HE. Each line and colour correspond to a different code as indicated in the legends; note the different y-axis scales.

Fig. 16

Same as Fig. 15, but for discs only.

Fig. 17

Trends for bulges only, in the multi-band DSb4 fit (as in Fig. 15), for LSST bands. Top to bottom: u, g, r, i, and z.

Fig. 18

Same as Fig. 17, but for discs only.

Fig. 19

Colour estimates for Galapagos-2. Each small sub-panel in the left part of the plot shows the measured versus true colour (magnitude difference) for a pair of bands; the colour is always x-axis label minus y-axis label. The numbers in each panel are the NMAD and η, as defined in the text. The upper larger panel shows the values of NMAD and η for the colours of all bands with respect to IE, as reported in the legend. The rightmost larger panel shows the IE − g versus IE − HE colour-colour diagram; points are the measured colours of individual galaxies, while the black lines are density contour levels of the corresponding true colour distribution.
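The two statistics quoted in each panel can be computed as sketched below; we assume the standard NMAD definition, and the outlier threshold used for η here is a placeholder of ours, not necessarily the value adopted in the paper.

    import numpy as np

    def colour_diagnostics(c_meas, c_true, eta_threshold=0.5):
        # NMAD and outlier fraction eta of the colour residuals;
        # eta_threshold (in mag) is a placeholder, not the paper's value.
        dc = np.asarray(c_meas) - np.asarray(c_true)
        nmad = 1.4826 * np.median(np.abs(dc - np.median(dc)))
        eta = np.mean(np.abs(dc) > eta_threshold)
        return nmad, eta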

Fig. 20

Same as Fig. 19, but for ProFit.

Fig. 21

Same as Fig. 19, but for SourceXtractor++.

Fig. 22

Summary plots showing the fraction of sources satisfying the relation |fmeas − ftrue| < σmeas in each magnitude bin, for the IE band runs. The expected fraction is 0.683 (horizontal dashed line). Left to right: DSb4, SS, RM. Each symbol and colour corresponds to a different code, as indicated in the legends.
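The plotted fraction follows directly from the catalogues; a minimal sketch (function and variable names are ours):

    import numpy as np

    def fraction_within_1sigma(mag, f_meas, f_true, sigma_meas, edges):
        # Per-magnitude-bin fraction of sources with |f_meas - f_true| < sigma_meas;
        # for well-calibrated Gaussian uncertainties this tends to 0.683.
        mag = np.asarray(mag)
        ok = np.abs(np.asarray(f_meas) - np.asarray(f_true)) < np.asarray(sigma_meas)
        fractions = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            sel = (mag >= lo) & (mag < hi)
            fractions.append(np.mean(ok[sel]) if sel.any() else np.nan)
        return np.array(fractions)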

Fig. 23

Same as Fig. 22, but for the DSb4 multi-band run. Each panel refers to a band, as indicated in the titles (we note that IE is the last panel, for readability of the plot).

Fig. B.1

Examples of diagnostic plots for DeepLeGATo. Top, IE SS F4 run; bottom, IE DS F4 run. See Fig. 8 for a description of this and the subsequent similar plots. We note that the colour coding is given by the value of the bulge fraction b/t in the original Egg double-Sérsic catalogue; it can be considered a proxy for the value of the Sérsic index, n = 3(b/t) + 1.
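The linear b/t → n mapping quoted in this caption is trivial to apply; for instance:

    def sersic_index_proxy(bt):
        # b/t = 0 (pure disc) gives n = 1; b/t = 1 (pure bulge) gives n = 4.
        return 3.0 * bt + 1.0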

Fig. B.2

Galapagos-2 results on F0 (top) and F1 (bottom), DS IE band (one-component fits). The fit on F1, performed on the IE image only rather than in multi-band mode, is more accurate. See text for details.

Fig. B.3

Galapagos-2 results on F1, DS IE band: one-component (top) and two-component (bottom; nbulge fixed) fits. The one-component (i.e. single-Sérsic) fit is more accurate in IE-only runs. See text for details.

Fig. B.4

Galapagos-2 results on F0, DS LSST g band: one-component (top) and two-component (bottom; nbulge fixed) fits. The two-component (double-Sérsic) fit is more accurate for non-IE bands in the multi-band simultaneous fit. See text for details.

Fig. B.5

Test on the Galapagos-2 LSST i-band issue. In both sets of panels, for a chosen badly fitted source (ID 2158 in F0), we show (top to bottom) the scientific image, the Galapagos-2 models, and the residuals; left to right are the LSST g, r, i, and z bands. Top set: reference run including IE information in the fit. Bottom set: test run without IE information. The residual in the i band, evident in the top set, vanishes in the bottom set, when IE is not included. See text for details.

Fig. B.6

Galapagos-2 results on F0 LSST i band. The evident double trend is due to the multi-band simultaneous fitting (see Sect. B.2).

Fig. B.7

Example of diagnostic plot for Morfometryka. This plot refers to the IE SS F2 run. The colour coding clearly shows that the bimodal distribution of points correlates with the input bulge-to-total fraction, with bulge-dominated galaxies (bluer points) being fitted with less accuracy than disc-dominated ones (redder points).

Fig. B.8

Example of diagnostic plot for ProFit. Top to bottom: IE SS F2; IE DS F2 (fixed bulge); and LSST g F0. The IE SS fit is substantially better than the DS fit, and there is a ‘U-shaped’ trend in the LSST bands.

Fig. B.9

Example of diagnostic plots for SourceXtractor++. These plots refer to the IE DS F2 run; the upper panel shows the fit with nbulge = 4, while the bottom panel shows the one with nbulge left free to vary.

Fig. B.10

SourceXtractor++ re-runs on non-pre-processed (top to bottom) IE, HE, and LSST g F0 DS images (empty symbols, solid lines), compared to output provided for the EMC, obtained with pre-processing of the images (full symbols, dashed lines). Note the different limits on the axes.

