Open Access
Issue
A&A
Volume 683, March 2024
Article Number A246
Number of page(s) 38
Section Numerical methods and codes
DOI https://doi.org/10.1051/0004-6361/202347758
Published online 28 March 2024

© The Authors 2024

Licence Creative CommonsOpen Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1 Introduction

A fundamental limitation of astronomical observations is that they only give access to the two-dimensional (2D) projection of cosmic structures onto the plane of the sky. Consequently, it is a central theme of modern astronomical and astrophysical research to resolve this degeneracy and try to reconstruct the underlying three-dimensional (3D) structures. When measuring lines, for example, we can take spectra at equally spaced positions within the area of interest and build a 3D cube of position-position-velocity (i.e. line-of-sight velocity) information. This PPV data is often used as an approximation for the intrinsic 3D position-position-position (PPP) structure of the emitting region (for further discussions, see e.g. Ballesteros-Paredes & Mac Low 2002; Beaumont et al. 2013). However, what to do when we are relying on continuum radiation, such as the thermal emission from interstellar dust grains (e.g. Molinari et al. 2010; Planck Collaboration XI 2014), is less clear. To address this challenge, we present a new deep learning approach for the reconstruction of 3D morphological information and employ multi-band observations in the wavelength regime from 12 to 1300 µm to estimate the 3D spatial distribution of dust density and temperature of star-forming cloud cores.

Thermal emission from interstellar dust grains is the dominant source of radiation across the sky at mid- and far-infrared wavelengths (see Hill et al. 2018, and references therein). This is the result of dust grains distributed throughout the interstellar medium (ISM), which are heated primarily by starlight and cool through thermal radiation (Tielens 2010; Draine 2011; Klessen & Glover 2016). The dust grains are heated to temperatures between roughly 20 and 200 K depending on the spectrum and intensity of the interstellar radiation field and the size and optical properties of the grains (see Galliano et al. 2018, for a recent review).

Dust emission remains optically thin at relatively high column densities (see, e.g. Planck Collaboration XIX 2011), hence, it provides a crucial observable to study regions of the universe that are not accessible at visible wavelengths. Observations of the dust thermal emission have been used to characterise the evolution of the universe through the study of the cosmic infrared background (CIB, see Hauser & Dwek 2001). Radiation at far infrared wavelengths registered by the Herschel satellite has allowed for the reconstruction of star formation activity across the Milky Way disk (Molinari et al. 2016; Elia et al. 2021) and within nearby star-forming regions (André et al. 2010). Polarised dust thermal emission observed by the Planck satellite has provided the opportunity to infer the first whole-sky map of the projected Galactic magnetic field (see, e.g. Planck Collaboration XXXV 2016).

Interstellar dust also plays many critical roles in galactic evolution. It is a catalyser for the formation of molecular hydrogen (H2), while sequestering select elements in solid grains (see, e.g. Gould & Salpeter 1963 or Jones & Ysard 2019). The interaction between dust grains and ultraviolet (UV) starlight releases electrons that can be the dominant source of heating for interstellar gas (Wolfire et al. 2003; Glover & Mac Low 2011). Dust grains also transfer the radiation pressure from starlight to the gas and couple it to the interstellar magnetic field through collisions (see, e.g. Draine 2003 for a review; or Reissl et al. 2018, 2023 for a microphysical model). Thus, reconstructing the distribution of the dust is crucial for understanding the physical conditions of the ISM.

Existing attempts to reconstruct the 3D distribution of dust on galactic scales often take into account distance information from stars in combination with individual extinction measurements (or the combination of Gaia with auxiliary data; see e.g. Lallement et al. 2018, 2019, 2022; Leike et al. 2020, 2022; or Zhang et al. 2023). A similar approach has also been adopted for assessing the 3D structure of individual molecular clouds (e.g. Rezaei Kh. et al. 2017, 2020; Zucker et al. 2021; Rezaei Kh. & Kainulainen 2022) or for building a realistic model of the matter distribution in the solar neighbourhood (e.g. Zucker et al. 2022, 2023). On the smaller scales of individual star-forming clumps, there have also been forward-modeling approaches introduced for the 3D dust distribution based on the combination with line data (see, e.g. Liseau et al. 2015) or from multi-frequency dust emission data by fitting overlapping Gaussian ellipsoids (Steinacker et al. 2005). Previous attempts to use machine learning to invert the radiative transfer problem have also been reported (e.g. Garcia-Cuesta et al. 2009).

In this study, we introduce a novel deep learning approach for the 3D reconstruction task, which employs a conditional invertible neural network (Ardizzone et al. 2019a,b). The latter belongs to the group of methods based on normalising flows (e.g. Kobyzev et al. 2021) and has the advantage of giving access to the full posterior distribution function. For this reason, the cINN architecture is particularly well suited for solving degenerate inverse problems and has been successfully applied to a range of subjects in astronomy. These include: the characterisation of stellar properties from photometry (Ksoll et al. 2020) or spectra (Kang et al. 2023b), prediction of exoplanet properties (Haldemann et al. 2023), analysis of emission lines in HII regions (Kang et al. 2022, 2023a), cosmic ray origin studies (Bister et al. 2022), and the reconstruction of galaxy assembly histories from numerical simulations (Eisert et al. 2023). Due to the lack of an observational ground truth sample, we have trained the cINN with data taken from numerical models of the turbulent multi-phase ISM, based on the Cloud Factory suite of simulations introduced by Smith et al. (2020), which we post-processed using detailed radiative transfer calculations (employing POLARIS, see Reissl et al. 2016, 2019) to bring them closer to the observational domain.

This paper is structured as follows. In Sect. 2, we outline the construction of the training data for our method from synthetic dust cloud simulations, including our setup for radiative transfer. Section 3 provides a summary of the invertible neural network approach, specifications of the inverse problem, implementation details, and our analysis methods. In Sect. 4, we present the evaluation of our trained models on synthetic test data and discuss the predictive performance of our approach. Lastly, Sect. 5 summarises our main results.

2 Training data

The main goal of this study is to reconstruct 3D dust distributions from the observed dust emission for sites of star formation on the scale of individual cloud cores. As we want to tackle this task with a supervised deep-learning approach, we therefore require a training data set consisting of 3D dust distributions with their properties and corresponding dust emission observations. As such a data set does not yet exist for real observations, we have turned to simulations to build a suitable database for training.

2.1 Simulation data

As a basis for our training data set we chose the AREPO-based (an adaptive Voronoi mesh hydrosolver, see Springel 2010), galactic-scale ISM hydrodynamics simulation suite Cloud Factory, introduced by Smith et al. (2020) and Izquierdo et al. (2021). The Cloud Factory self-consistently follows the formation of dense gas and molecular hydrogen with an average molecular weight of µg = 2.4 in a Milky Way-like galactic gas disc at radii 4 kpc < r < 12 kpc, including the effects of galactic scale forces, gas chemistry and cooling, and supernova feedback. The time-dependent chemical evolution is modelled as in Smith et al. (2014) using the hydrogen chemistry of Glover & Mac Low (2007) and the simplified CO treatment of Nelson & Langer (1997). It includes gas self-shielding from a UV field equivalent to that seen in the solar neighbourhood and cosmic-ray ionisation at the local rate as well.

The Cloud Factory employs a series of nested zooms with a base mass resolution of 1000 M smoothly increased to 10 M within a co-rotating box of size 3 kpc. In the co-rotating box individual cloud complexes are then selected and their resolution further increased to 0.25 M, which is equivalent to a spatial resolution better than 0.1 pc in gas with number densities higher than 109 m−3. By including the galactic scale forces that form the clouds, the Cloud Factory suite reproduces the turbulent gas motions on multiple scales as observed in the ISM. However, the current version of the suite does not include magnetic fields or other forms of stellar feedback, such as stellar winds, jets, or photoionisation.

For our analysis, we used Complex C and D, as shown in Fig. 5 of Smith et al. (2020), and only included gas at the highest resolution (0.25 M or four cells per local jeans length, whichever is higher). These molecular cloud complexes were formed in regions that had previously experienced supernova feedback and, therefore, they already contained a well developed turbulent energy cascade. They are filamentary in structure, and extend for more than 100 pc along their longest axis. The density probability density function of the entire cloud complex peaks at number densities of around 108 m−3, but extends beyond number densities of 1010 m−3, which marks the point when sink particles may form (see Tress et al. 2020). Sink particles represent regions of star formation and at this resolution, they correspond to single star systems that may actually be multiples. While sinks may form above densities of 1010 m−3, they will only be created if the gas passes energy checks to ensure it is bound and converging, so in practice the simulations include densities up to 1010−1012 m−3. Similarly, neighbouring gas cells can have material accreted by the sinks, but only when the gas becomes bound to the sink.

A high spatial resolution allows us to select the compact cloud cores on the scale of individual star-forming clumps that we want to analyse in this study. As a starting point, we extracted compact pre-stellar cloud cores (i.e. dense, cloud-like structures that have not formed a sink and are smaller than 0.2 pc) for this purpose from the Cloud Factory. However, we did not just want to model these early phases of star formation, but also more evolved star forming regions, particularly those affected by nearby stars. As the star formation prescription in the Cloud Factory forms sink particles on cluster scales (rather than individual stars), we chose to modify the raw simulation data in a post processing step by manually adding a star to model the types of evolved star-forming regions that we want to consider in this study. We guided these modifications on the example of a well-observed real-world counterpart of this type of star-forming cores, such as the nearby (d = 120 pc, Loinard et al. 2008), compact (0.1 pc), star-forming core ρ Oph A (Loren et al. 1990) in the Ophiuchus star-forming region.

We began our training set construction by cutting out cubes centered on high-density, core-like gas aggregations from the Cloud Factory in the complexes C and D, so that each cube contains a mass between 5 to 40 M and a substructure with a gas number density of at least 1010 to 1011 m−3 (i.e. the lower density end of low-mass star-forming regions). In practise, the candidate positions for these cubes are determined by applying a density threshold to column density maps generated for three different (perpendicular) viewing angles of the complexes. We note that beyond these criteria, the cubes were randomly selected and do not represent any specific real-world counterpart star-forming core. For simplicity, we matched the Cloud Factory data to a regular grid. Initially, these cubes are selected on a 64 × 64 × 64 pixel resolution, corresponding to a physical cube size of 0.4 pc. In total, we prepared a sample set of 11 036 individual cubes from The Cloud Factory in this first step. The initial large cube size serves primarily to avoid edge artefacts that can occur in the radiative transfer simulation when synthesising the dust temperature in post-processing for the later synthetic dust emission observations. For the final training data, we actually cropped out the inner 32 × 32 × 32 pixel cubes, corresponding to a total 0.2 pc edge length. This resolution and size were chosen to keep the problem simple for this proof-of-concept and to reduce the overall training data size to facilitate the data handling during the training phase of our approach.

2.2 Synthetic images

To produce synthetic dust emission observations from the selected cloud model cubes, we employed the Monte Carlo (MC) radiative transfer (RT) code POLARIS1 (Reissl et al. 2016, 2019). POLARIS calculates a dust temperature based on a given 3D density distribution and a specific dust composition, assuming an instantaneous temperature correction and a thermal equilibrium between the dust and its surroundings (for details we refer to Lucy 1999 and Bjorkman & Wood 2001). For the subsequent RT dust heating and emission simulations, we assumed a dust mass to gas mass ratio of δgd = 1% and a material composition of 37.5% graphite and 62.5% (astro)silicate for the grains typical for the ISM. The applied grain sizes are a ∊ [5 nm, 250 nm] and the number of grains, N, follows a power-law N(a) ∝ a−3.5 (see e.g. Mathis 1977; Li & Draine 2001, for further details). We emphasise that the selected dust parameters in combination with the average molecular weight of the gas, µg, define an exact conversion factor from gas to dust density. Because of this, we use the gas density as a measure of the dust density in the following, without explicitly performing the conversion to remain consistent with the Cloud Factory.

We began by preparing the models for two distinct RT setups. In the first one, we considered the dust clouds to be only subject to the diffuse interstellar radiation field (ISRF). Here, we used the parameterisation of Mathis et al. (1983) for the spectral energy distribution with an intensity of G0 = 3, which is typical for star-forming cores (Liseau et al. 2015). In the second scenario, we also added a single star inside the cube in addition to the background ISRF. We used the parameters of a typical B4-type star (R = 4.33 R, Teff = 16 000 K) in our MC dust heating simulation. We note that the dust MC heating by POLARIS in this step does not modify the ionisation state of the gas or redistribute the gas by means of radiatiave feedback. In each individual cube, we simply placed the B4-analogue star inside the inner 32 × 32 × 32 pixels, selecting a point of low gas density. This procedure roughly emulates the fact that the feedback of such a star would likely clear out its immediate surroundings.

We generated synthetic, monochromatic dust emission observations with a 32 × 32 pixel resolution (matching the resolution of the underlying dust distribution) at 23 wavelengths between 12 and 1300 µm, matching the central wavelengths of bands available at various observational facilities (see Table A.1 for a full list). We note that we only generate synthetic observations for one viewing angle for each cube. In principle, it would be possible to include multiple viewing angles to increase the size of the training dataset, but given that our simulation suite provides a sufficiently large dataset for training from different physical regions, this is not necessary here. Nevertheless, after training, we also conducted a performance test of our algorithm on an individual region observed from different viewing angles (Appendix B.1), which confirms that the reconstructed 3D structure is nearly identical.

The choice for monochromatic emission observations is again made for simplicity, as modelling the full instrument responses of the considered bands is quite complex and beyond the scope of this proof of concept. Nevertheless, we wanted to select wavelengths that are actually accessible with current observational facilities; thus, we employed the corresponding central wavelengths. Henceforth, we refer to the different wavelengths by the names of the respective instrument bands in the following. We note that we did not consider wavelengths shorter than 12 µm because the influence of scattered light becomes non-negligible in this regime, adding extra complexity. At the current stage of our development, we have not considered instrumental effects related to the point spread functions (PSFs) of the various telescopes or observational noise. Thus, we treat our synthetic observations as fully resolved at all wavelengths and uncertainty free. Properly modelling these effects is not trivial either, particularly with respect to interferometric observations with ALMA, where simulations with a dedicated processing tool such as CASA2 (CASATeam et al. 2022) would be necessary. Thus, we reserve a proper treatment of these effects for a follow-up work. Still, we want to note that accounting for uncertainties is well within the capabilities of the invertible neural network architecture used in this work, as demonstrated by Kang et al. (2023a).

We initially generated our synthetic dust emission observations assuming a distance of 3.703 × 1018 m (120 pc). To build a more generally applicable approach, we then rescaled the synthetic fluxes following: f^=f×d2dref 2,$\hat f = f \times {{{d^2}} \over {d_{{\rm{ref }}}^2}}$(1)

where d denotes the actual distance and dref is the reference distance (which the flux is scaled to) to determine a distance independent absolute flux measure. The choice of dref is arbitrary and since we operate on the logarithm of the fluxes in the following (see Sect. 3.3.1) only represents a linear offset to all fluxes, which will not notably affect the training outcome of our neural network approach. For simplicity, we set dref = 1 m, so that the offset is zero in logarithmic space.

Having a complement of observations at 23 different wavelengths for a single real cloud core is typically unrealistic. Given the complexity of the 3D reconstruction task, we started out with this unrealistically large wavelength coverage to emulate a perfect information scenario and determined the best predictive performance our approach could achieve for this proof of concept. In addition, we also investigated a second, more realistically limited scenario, where we considered synthetic observations at the central wavelengths of the following bands: WISE 22 µm, SOFIA 89 µm and 154 µm, Herschel PACS 100 µm and 160 µm, Herschel SPIRE 350 µm, and LABOCA 870 µm (see Table A.1). This particular combination of bands is inspired by real observational data, which is, for instance, available for the ρ Oph A star forming cloud (Liseau et al. 2015; Santos et al. 2019). We emphasise here that this particular wavelength selection does not necessarily preserve the most information for the given inverse problem and that there may be a much more optimal subset of seven wavelengths among our total of 23 to maximise the reconstructive performance of the approach outlined below. Determining this combination is beyond the scope of this proof of concept and we reserve this to a dedicated follow-up study.

3 Reconstruction approach

To solve the inverse problem of recovering the 3D dust temperature and dust density distribution from the observed dust emission maps, we employed a supervised deep learning approach called an invertible neural network (INN). In the following, we provide a short summary of this methodology and outline our specific setup for the 3D dust reconstruction task.

3.1 The conditional invertible neural network

The INN (Ardizzone et al. 2019a) belongs to the greater family of normalising flows (NFs, Tabak & Vanden-Eijnden 2010; Tabak & Turner 2013; Dinh et al. 2015; Rezende & Mohamed 2015; Kobyzev et al. 2021). More specifically, these are deep learning approaches that model complex distributions through sequences of invertible transformations of simpler known probability distributions (see also Kobyzev et al. 2021, for a review). Among the NF methods, the INN stands as a neural network (NN) architecture that is particularly well suited for solving degenerate inverse problems. Introducing a set of latent variables z to encode the information loss in the forward mapping xy from the physical parameters x to a set of observables y, which renders the inverse problem yx degenerate, the INN can estimate full posterior distributions p(x|y) for the target parameters. This allows this method to both highlight and in some cases even break degeneracies in solving the inverse problem.

In this study, we employ an INN architecture called conditional invertible neural network (cINN, Ardizzone et al. 2019b). During training, this method learns a mapping of the physical parameters x to the latent variables z, conditioned on the observables y, that is the forward mapping denoted as: z=f(x;c=y).${\bf{z}} = f({\bf{x}};{\bf{c}} = {\bf{y}})$(2)

In doing so, the cINN encodes all variance of the physical parameters that is not explained by the corresponding observables in the latent variables, while the training process explicitly maintains a prescribed prior distribution P(z) for the latent variables. At prediction time, the cINN can then query this encoded variance by drawing samples from the known prior distribution P(z) of the latent variables and once it has been conditioned on a new query observation y′, it can make use of its fully invertible architecture to generate corresponding samples of the posterior distribution p(x|y′) following: p(xy)~g(z;c=y) with zP(z),$p\left( {{\bf{x}}\mid {{\bf{y}}^\prime }} \right)\~g\left( {{\bf{z}};{\bf{c}} = {{\bf{y}}^\prime }} \right){\rm{ with }}{\bf{z}} \propto P({\bf{z}}),$(3)

where g(⋅; c) = f−1(⋅; c) denotes the inverse of the forward mapping, f, for fixed condition, c. For simplicity, P(z) is usually prescribed to be a multivariate normal distribution with zero mean and unit covariance. The dimension of the latent space dim(z) is per construction equal to the dimension of the target parameter space dim(x). On the other hand, as the observations are treated as a condition their dimension can become arbitrarily large. In fact, the architecture of the cINN allows for the introduction of a feature extraction network, trained in tandem with the cINN itself, to transform the input observations into a more useful (learned) representation (Ardizzone et al. 2019b).

The invertibility of the cINN is achieved by employing so called conditional affine coupling blocks (Dinh et al. 2017; Ardizzone et al. 2019b). After splitting their input vector u into two halves u1 and u2, these coupling blocks perform two complementary affine transformations: v1=u1exp(s2(u2,c))t2(u2,c),v2=u2exp(s1(v1,c))t1(v1,c),$\matrix{ {} & {{{\bf{v}}_1} = {{\bf{u}}_1} \odot \exp \left( {{s_2}\left( {{{\bf{u}}_2},{\bf{c}}} \right)} \right) \oplus {t_2}\left( {{{\bf{u}}_2},{\bf{c}}} \right)} \cr {} & {{{\bf{v}}_2} = {{\bf{u}}_2} \odot \exp \left( {{s_1}\left( {{{\bf{v}}_1},{\bf{c}}} \right)} \right) \oplus {t_1}\left( {{{\bf{v}}_1},{\bf{c}}} \right)} \cr } $(4)

to compute the halves, v1 and v2, of the output vector, v, where ⊙ and ⊕ denote elementwise multiplication and addition, respectively. Here, si and ti represent arbitrarily complex transformations of the concatenation of ui/vi and the conditioning input c. To run the network in reverse, Eq. (4) is then trivially inverted given the output vector v = (v1, v2) following: u2=(v2t1(v1,c))exp(s1(v1,c)),u1=(v1t2(u2,c))exp(s2(u2,c)),$\matrix{ {} & {{{\bf{u}}_2} = \left( {{{\bf{v}}_2} \ominus {t_1}\left( {{{\bf{v}}_1},{\bf{c}}} \right)} \right) \odot \exp \left( { - {s_1}\left( {{{\bf{v}}_1},{\bf{c}}} \right)} \right)} \cr {} & {{{\bf{u}}_1} = \left( {{{\bf{v}}_1} \ominus {t_2}\left( {{{\bf{u}}_2},{\bf{c}}} \right)} \right) \odot \exp \left( { - {s_2}\left( {{{\bf{u}}_2},{\bf{c}}} \right)} \right)} \cr } $(5)

where ⊖ denotes elementwise subtraction. As the transformations si and ti are always evaluated in the same direction in both the forward, Eq. (4) and backward pass, Eq. (5), of the coupling block, it is not necessary to choose them to be invertible themselves. In fact, si and ti do not even need to be prescribed, but can be learned instead during the training of the cINN by representing them with small sub-networks; for instance, a fully connected neural network (Ardizzone et al. 2019a,b). The specific setup of the cINN and the coupling block architecture used in this work is described in Sect. 3.3.

3.2 Single LoS reformulation

The full inverse problem of the 3D reconstruction task consists of predicting the 2 × N × N × N hypercube X of dust densities and temperatures from the K × N × N cube, Y, containing the corresponding observed dust emission in the K different wavelengths. Evidently, even for the small resolution of N = 32 that we selected, this is a very high-dimensional problem. To simplify our approach and mitigate the difficulties of high dimensionality, we therefore decided to reformulate the inverse problem. In particular, we reduced the 3D reconstruction problem to a matter of individual LoSs under the main assumption that the emission measured in any given pixel is independent of its neighbouring pixels. This independence assumption is especially valid for the perfectly resolved observation scenario that we consider here, but it should also hold (at least to first order) for real observations unless the PSF of the instrument is significantly larger than the pixel size, where smearing could become an issue. With that, we aim to use the vector of the K measured emission fluxes (eλ1,,eλK)$\left( {{e_{{\lambda _1}}}, \ldots ,{e_{{\lambda _K}}}} \right)$ of a given pixel to recover the corresponding dust density (nı,…, nN) and temperature (Tdust,1,…, Tdust,N) vectors. To avoid having to train two networks, we further combined the line-of-sight (LoS) dust densities and temperatures into a single vector, formulating the inverse problem as: K2N(eλ1,,eλK)(n1,,nN,Tdust ,1,,Tdust ,N),$\eqalign{ & {^K} \to {^{2N}} \cr & \left( {{e_{{\lambda _1}}}, \ldots ,{e_{{\lambda _K}}}} \right) \to \left( {{n_1}, \ldots ,{n_N},{T_{{\rm{dust }},1}}, \ldots ,{T_{{\rm{dust }},N}}} \right), \cr} $(6)

for a given LoS.

Within the cINN framework, the LoS emission vectors, y=(eλ1,,eλK)${\bf{y}} = \left( {{e_{{\lambda _1}}}, \ldots ,{e_{{\lambda _K}}}} \right)$, correspond to the conditioning input and the combined vector of dust densities and temperatures, while x = (n1,…, nN, Tdust,1,…, Tdust,N) denotes the target parameters. Consequently, the latent space introduced by the cINN has a dimension equal to that of x, that is: 2N.

At prediction time, a given query K × N × N cube of emission maps is first decomposed into the N2 line of sight emission vectors of length K. For each of these LoSs i (with i ∊ {1,…, N2}), the corresponding emission vector, yi, is then processed by the trained cINN, generating S samples of the full 2N-dimensional joint posterior distribution p(xi|yi) by sampling the latent space according to the known Gaussian prior distribution P(ɀ). Afterwards, we reassemble these N2 LoS prediction results of size 2N × S into the 2 × N × N × N × S hypercube, Xsamp, of the density and temperature posterior samples.

It is worth noting that with this LoS decomposition approach, our method is not limited to the 32 × 32 pixel resolution in the plane of sky, so that larger dust emission maps can be processed as long as they match the physical resolution of 6.25 × 10−3 pc per pixel of the training data. With regard to depth, however, the presented approach is always limited to the 32 pixel depth corresponding to a physical size of 0.2 pc that the cINN is trained on. A possible avenue to create a more depth flexible extension of the method presented here could be to train the cINN on data with varying per pixel resolution and providing the physical resolution as an additional conditioning input. Because this would require a substantially larger training data set and notably increases the complexity of the inverse problem (perhaps even beyond the point of feasibility), we reserve this experiment to our follow-up studies and focus on the fixed depth scenario here. In any case, with the presented approach it is always possible to tailor the training data towards the characteristic sizes of the objects that are to be analysed and train a correspondingly specialised model, as we have done here for the example of very compact, star-forming cores.

Final training data set

Following the prescription of the reformulated inverse problem, we decomposed the 2 × 11 036 training cubes into their respective LoSs, netting a total of 2 × 11036 × 32 × 32 = 22601 728 vectors. Figure A.1 shows the corresponding effective prior distributions prescribed by the training data for dust density and temperature across all pixels of these lines of sights, as well as a correlation diagram indicating the coverage in the density-temperature space. In particular, we have covered a total density range from 3.3 × 106 to 2.2 × 1013 m−3, although most of the data is concentrated between 108 and 1012 m−3. The effective prior distribution for dust temperature ranges from 6.3 to 240 K, but is fairly skewed towards the 13 to 24 K interval, so that there are comparatively a lot fewer training pixels above a temperature of 32 K. This is a direct consequence of the fact that such high dust temperatures only occur in the relatively few pixels in the vicinity of a star. Given that half of our training cubes do not contain a star, the per-pixel dust temperature prior distribution is naturally biased towards this intermediate dust temperature regime because there are simply much more pixels that are either only subject to the ISRF to begin with or far enough away from the star to avoid being heated to very high temperatures. A corresponding diagram of the prior distributions of the measured fluxes at the 23 considered wavelengths across all pixels is provided in Fig. A.2. We emphasise that as a data-driven approach, the cINN is mostly limited to the parameter space covered by the training data. Although the cINN does exhibit some capability for extrapolation beyond the limits of the learned parameter space, there is in general no guarantee for a (physically) sound prediction outcome for inputs and targets that fall outside the described ranges.

We further split this data set randomly into a training (80% of the data) and test set (20% of the data). The latter serves as held-out data that is not seen during training of the cINN to later evaluate the convergence and performance of the model. While the split is in general randomly chosen, we make sure that the held-out test set contains a subset of 100 complete cubes. For this subset, we selected the same 50 cubes twice: once subject solely to the ISRF and once in the ISRF + star configuration. The aim is to evaluate how much the radiation setup affects the prediction outcome for the dust density and temperature. While we verified the model convergence on the greater test data set, the reported performance and all diagrams presented in Sect. 4 are based on this subset of 100 coherent cubes. Although this set of 100 × 32 × 32 = 102 400 LoSs only represents 0.5% of the total data, it has been selected as a representative subset of the test data set in order to keep the memory requirements at a manageable level. For instance, storing the predicted posterior samples for these 100 cubes as an uncompressed csv table following the setup outlined further below already requires ~ 360 GB of memory.

3.3 Implementation details

We employed the Python deep learning module PYTORCH (Paszke et al. 2017) and the dedicated Framework for Easily Invertible Architectures (FrEIA3, Ardizzone et al. 2019a,b) package to implement the cINN approach. For the affine coupling blocks, we employed the Generative Flow (GLOW; Kingma & Dhariwal 2018) configuration, in which the transformations s1, t1 and s2, t2 were jointly estimated by one sub-network each, which reduces the number of sub-networks in each coupling block from four to two. As sub-networks we utilised simple, fully connected networks with three layers of size 1024 and the rectified linear unit (ReLU) activation function. As in Ardizzone et al. (2019b), we also introduced a clamping procedure in the affine transformations in Eqs. (4) and (5) to the argument, s, of the exponential functions of the form: sclamp =2απ arctan (sα),${s_{{\rm{clamp }}}} = {{2\alpha } \over \pi }\arctan \left( {{s \over \alpha }} \right),$(7)

with α = 1.9. This procedure avoids instabilities arising from exploding magnitudes of exp(s). Furthermore, we alternated the affine coupling layers with random permutation layers, which randomly (but in a fixed and thus invertible manner) permute the output vector between each coupling layer to better intermix the information between the two streams u1 and u2. Our final network architecture (as determined via hyperparameter optimisation) is made up of nine coupling blocks in total. We also employed a simple feature extraction network, consisting of a three-layer (with 512, 512, and 256 nodes, respectively) fully connected network with ReLU activation functions, trained jointly with the cINN, to process the input observations.

3.3.1 Additional data preprocessing

Prior to training, we converted both the dust density and temperature to logarithmic space. This serves to prevent issues during training that can occur when the target parameters have a large dynamic range. This is particularly notable in the case of the dust density, which covers almost seven orders of magnitude. In addition, this implicitly ensures that the predicted dust densities and temperatures are always strictly positive. Afterwards, we performed two linear scaling operations on the training data. Each element xi of the target parameters, x, was rescaled by subtraction of its mean (over the entire training set) and then by division by its standard deviation, so that the resulting distribution of the rescaled i has zero mean and unit standard deviation. For the observables, we applied a matrix whitening procedure (Hyvärinen & Oja 2000) to the M × K matrix of training observations, Y, where M is the number of training examples and K is the dimension of a single observation, y, such that the rescaled observable matrix, Ŷ, has a unit covariance matrix. Given they are linear transformations, these scaling operations are easily inverted to convert the cINN output back to the true target parameter space. The coefficients of these scaling operations were determined on the training data and at the prediction time applied in the same fashion to the new query input.

3.3.2 Training setup and sampling strategy

We trained our cINN approach via minimisation of the maximum likelihood loss, 𝓛, as described in Ardizzone et al. (2019b), namely: =Ei( f(xi;ci,Θ) 222Ji),${\cal L} = {_i}\left( {{{\left\| {f\left( {{{\bf{x}}_i};{{\bf{c}}_i},\Theta } \right)} \right\|_2^2} \over 2} - {J_i}} \right),$(8)

where Ji=det(f/x|x=xi)${J_i} = \det \left( {\partial f/{{\left. {\partial {\bf{x}}} \right|}_{{\bf{x}} = {{\bf{x}}_i}}}} \right)$ denotes the determinant of the Jacobian matrix evaluated at training instance xi and Θ represents the network weights. During training, the network weights, Θ, that minimise the loss function, 𝓛, are determined using a standard stochastic gradient descent approach. This means that after making an initial random guess for the weights, they are iteratively updated in the direction of the gradient ∇Θ𝓛 based on randomly drawn subsets (batches) of the training data until a convergence is reached. In particular, we employ the adaptive learning rate, momentum-based Adam (adaptive moment, Kingma & Dhariwal 2018) optimiser for this purpose (with β1 = β2 = 0.8). Here, we start with an initial learning rate (for Adam this is a scaling factor for the adaptive step size in the weight updates along the loss gradient) of linit = 9.642 × 10−5 and then we reduced it by a factor of γ = 0.831 every 11 epochs. In total, our models were trained for 250 epochs, using a batch size of 512 and processing 4096 batches per epoch. We also employed an L2 weight regularisation with λ = 6.093 × 10−5. This setup was determined via hyperparameter optimisation, using the Hyper-band algorithm (Li et al. 2018), a procedure that combines a random grid search approach with adaptive resource allocation and an early stopping criterion. Hyperband provides an efficient framework to test a large number of (randomly generated) hyper-parameter configurations that finds a balance between running the training in full only for configurations that appear promising early (i.e. converge fast), while also allowing for some slower converging models that might reach a better final result. For more details on the logistics of Hyperband we refer to Li et al. (2018). Training a single network with the final setup described above takes about 19 h using GPU acceleration on a NVIDIA RTX 2080Ti graphics card.

At the prediction time, we then generated S = 4096 posterior samples for each new query LoS. This number of samples is chosen as a compromise between storage requirements and sample density, although experiments with even larger sample numbers have actually not shown a notable difference in the predicted posterior distributions, so this did not seem necessary within the framework of our analysis. A trained cINN can generate this amount of samples for 1024 LoSs (that is a single cube) in about 28 s (on a NVIDIA RTX 2080Ti), making the inference of the posterior distributions of the dust properties very efficient.

3.4 Making point estimates

To better compare the cINN predictions to the ground truth hypercubes, X, in our synthetic test set, we computed a point estimate X̂ from the hypercube of posterior distribution samples, Xsamp, returned by the cINN. The most straightforward approach for this is to derive the maximum a posteriori (MAP) prediction values for the dust density and temperature in every pixel of the 3D cube, which consists of determining the most likely value of the target parameters from the corresponding posterior distribution. In the following, we describe how we tested two methods for computing the MAP estimate given the predicted posterior distribution from the cINN.

In the first approach, we treated the posterior distributions for density and temperature of each pixel individually, marginalising over all other pixels along the LoS, for which the cINN generated samples of the joint posterior distribution. From the corresponding set of posterior samples for each pixel, we identified the MAP estimate for density and temperature by employing a kernel density estimate (KDE) to first explicitly derive the probability density curve of the posterior and then find the peak of this curve. In practise, we used a Gaussian kernel function, determining the kernel bandwidth automatically with Silverman’s rule of thumb (Silverman 1986) and evaluating it on an evenly spaced grid of 1024 points (between the minimum and maximum value of the posterior samples) to determine the MAP point estimates.

The cINN does not actually generate samples from the posterior distributions of dust density and temperature of each individual pixel but, rather, from the full joint posterior for density and temperature for all pixels along a given LoS. Therefore, to be completely correct, the MAP has to be determined as the most probable combination of values in the full 64-dimensional space that the cINN constructs the posterior samples in. To find the maximum of the probability density in this very high-dimensional space and then compare it to the marginalised MAP estimate, we employed the MeanShift algorithm (Fukunaga & Hostetler 1975; Comaniciu & Meer 2002). It is a gradient ascent approach whereby, given a set of N samples, the modes of the underlying density distribution can be found. MeanShift is an iterative procedure, in which the center of a kernel window is continuously moved into the direction of the maximum increase in density until convergence is reached. Given a kernel function K(x) (e.g. a Gaussian kernel) and an initial position for the center x of the kernel window, the algorithm computes the so-called mean shift: m(x)=i=1NK(xxi)xii=1NK(xxi)x,${\bf{m}}({\bf{x}}) = {{\sum\limits_{i = 1}^N K \left( {{\bf{x}} - {{\bf{x}}_{\bf{i}}}} \right){{\bf{x}}_i}} \over {\sum\limits_{i = 1}^N K \left( {{\bf{x}} - {{\bf{x}}_{\bf{i}}}} \right)}} - {\bf{x}}$(9)

which is the difference between the kernel weighted mean and the center of the kernel window. As demonstrated by Comaniciu & Meer (2002), this vector is proportional to the estimate of the density gradient estimate obtained with the same kernel; thus, it always points in the direction of maximum increase in density. Iteratively translating the kernel window in direction of the mean shift will therefore find a (local) maximum for the underlying density distribution (Comaniciu & Meer 2002). To find all modes of the distribution (and ideally the global maximum) this approach is then repeated for other initial kernel positions, scoring the identified peaks by their corresponding (kernel) density estimate. In a post-processing step, any spurious mode detections (such as plateaus in the distribution) or very close-by modes can then be further pruned (see for example Comaniciu & Meer 2002, for further details).

In practise, we employed the scikit-learn (Pedregosa et al. 2011) Python implementation of the MeanShift algorithm, which uses a flat kernel: K(x)={ 1 if xλ0 if x>λ ,$K({\bf{x}}) = \left\{ {\matrix{ {1{\rm{ if }}{\bf{x}} \le \lambda } \hfill \cr {0{\rm{ if }}{\bf{x}} > \lambda } \hfill \cr } } \right.$(10)

where λ denotes the bandwidth. To speed up computation, this implementation provides a binned seeding strategy, where the initial guesses for the kernel starting position are selected on a discretised grid instead of testing all of the individual sample points. The coarseness of this grid is determined by the bandwidth selected forthe kernel. The automatic bandwidth selection that comes with this implementation (based on a nearest neighbour distance estimation) has, however, proven not to be robust enough for our very high-dimensional parameter space and would often select bandwidths that are too small for the kernel windows to find any data points inside of them (when used in combination with the binned-seeding approach). Since the computation time becomes prohibitively large without the binned seeding, we adopted a simple bandwidth selection procedure where we iteratively doubled an initial bandwidth guess of 32 until a bandwidth is found, with which the MeanShift algorithm converges. In practise, this simple approach leads to the selection of a bandwidth of 64 or 128 in most cases.

3.5 Spatial consistency

As we outline above, the cINN approach predicts the posterior distributions for density and temperature for a single LoS jointly. Consequently, the prediction preserves the consistency of the predicted posteriors along the LoS. Perpendicular to the LoS, however, we have (by construction) no such spatial consistency guarantee. Figure 1 provides an example of this behaviour, highlighting the gradual shift of the posteriors along the LoS, whereas perpendicular to it, they are not necessarily consistent. As a consequence, the MAP or MeanShift point estimates can often exhibit sharp discontinuities in the predicted densities and temperatures. This can be seen, for example, in the MAP estimates in the left panels of Fig. 1. As these discontinuities and sharp jumps are rather unphysical, we experimented with two approaches in order to mitigate the spatial consistency issue.

3.5.1 MNPCP point estimator

Our first approach consists of introducing a third, alternative point estimator that enforces a degree of spatial consistency perpendicular to the LoSs, which we refer to as the median neighbour pixel combined posterior (MNPCP) in the following. Figure 2 outlines the steps of the MNPCP approach. Looping over all pixels in the 3D cube of generated posterior samples, Xsamp, we first collected the samples for the current pixel and its 26 neighbouring pixels. We then determined the n and Tdust point estimates for the current pixel as the weighted median of this combined set of posterior samples. Here, each sample has been weighted according to the distance of the pixel to the query pixel using the city-block distance metric (Manhattan distance). For edge cases, we accumulate only samples from the existing neighbour pixels, meaning that no form of padding was applied. Taking, for example, a corner pixel, this means that samples from only the seven neighbour pixels are accumulated, as compared to the 26 neighbours available for an interior pixel.

3.5.2 Neighbour LoS reformulation

Aside from introducing an alternative point estimator to combat the spatial consistency issue, we also investigated whether a different reformulation of the inverse problem may improve the situation, in comparison to our primary formulation (introduced in Sect. 3.2). We refer to cINNs trained on the primary reformulation as a single LoS cINN (SLoS-cINN) in the following, whereas models for the alternative formulation outlined below shall be denoted as a neighbour LoS cINN (NLoS-cINN). To directly compare the SLoS and NLoS approaches, we tested both of them with all three introduced point estimators, namely: MAP, MeanShift, and MNPCP.

The NLoS reformulation aims at improving the spatial consistency perpendicular to the LoS by adding information of the neighbouring LoSs to the observables. Instead of taking only the vector of fluxes corresponding to the pixel of a given LoS, we go on to also consider the observed dust emission in the eight neighbouring pixels, so that the inverse problem becomes: 9K2N(y1,,y9)(n1,,nN,Tdust ,1,,Tdust ,N),$\eqalign{ & {^{9K}} \to {^{2N}} \cr & \left( {{{\bf{y}}_1}, \ldots ,{{\bf{y}}_9}} \right) \to \left( {{n_1}, \ldots ,{n_N},{T_{{\rm{dust }},1}}, \ldots ,{T_{{\rm{dust }},N}}} \right), \cr} $(11)

where yi=(ei,λ1,,ei,λK)${{\bf{y}}_i} = \left( {{e_{i,{\lambda _1}}}, \ldots ,{e_{i,{\lambda _K}}}} \right)$ denote the nine emission flux vectors corresponding to the pixel (y5) and its neighbourhood. This reformulation has one immediate drawback, however, in that we lose some of the available training data. As we now require every LoS to have eight neighbours, we can no longer consider the edge cases in our training cubes, reducing the total amount of LoSs available for training data to 2 × 11036 × 30 × 30 = 19 864 800. Another disadvantage is a notably increased memory requirement when storing the training data as a simple csv-table, since we increased the size of the observables vector by a factor of 9. In our case, the table size increases from 55 to 138 GB, even though the latter set contains 2 736 928 fewer LoSs. Nevertheless, this is the most straightforward approach to providing the cINN with information on the vicinity of a given query pixel.

thumbnail Fig. 1

Comparison of the predicted posterior distributions along vs. perpendicular to the LoS (into the plane). The left column shows an example slice with the MAP estimates for dust density (top) and dust temperature (bottom). The other three columns show the posterior distributions of dust density (top) and temperature (bottom) for the lines indicated in the left panels in black, blue and purple, respectively. Here, the black square denotes a LoS going into the plane of the image, whereas the blue and purple lines are perpendicular to the LoS along the x and y axis, respectively.

3.6 Performance evaluation

To quantify the overall performance on the held-out test set of 100 coherent cubes, we computed two metrics for the three different point estimation approaches as an average over all Ntest = 100 × 32 × 32 × 32 = 3 276 800 test pixels. The first one is the normalised root mean squared error (NRMSE), defined as:  NRMSE =1ΔxTS1Ntest i=1Ntest (xi, pred xi, true )2${\rm{ NRMSE }} = {1 \over {\Delta {x_{{\rm{TS}}}}}}\sqrt {{1 \over {{N_{{\rm{test }}}}}}\sum\limits_{i = 1}^{{N_{{\rm{test }}}}} {{{\left( {{x_{{\rm{i}},{\rm{ pred }}}} - {x_{{\rm{i}},{\rm{ true }}}}} \right)}^2}} } {\rm{, }}$(12)

where xi,true and xi,pred refer to the ground truth and point estimate prediction of target parameter x for pixel i, and ∆xTS = max(xTS) − min(xTS) denotes the range of target parameter, x, in the training data (6.82 and 1.58 for log(n/m−3) and log(Tdust/K), respectively). The second metric that we computed is the median |ērel| (and 25% and 75% quantiles) of the absolute relative error |ei,rel|, defined as: | ei, rel  |=| xi, pred xi, true xi, true  |,$\left| {{e_{{\rm{i}},{\rm{ rel }}}}} \right| = \left| {{{{x_{{\rm{i}},{\rm{ pred }}}} - {x_{{\rm{i}},{\rm{ true }}}}} \over {{x_{{\rm{i}},{\rm{ true }}}}}}} \right|,$(13)

for pixel i.

4 Results

In this section, we outline the evaluation results regarding the predictive performance of our trained cINN models on the held-out 100 test cubes. Table 1 provides a summary of the NRMSE and absolute relative errors achieved by our three different cINN setups with the three different point estimation approaches. In addition, it also shows a breakdown of the results between the two radiative transfer configurations, that is, ISRF-only and ISRF + star. In the following, we first discuss the influence of the point estimator choice on the prediction results on the example of the SLoS-cINN that accounts for all 23 wavelengths (Sect. 4.1). We then compare the outcomes of the NLoS approach to the SLoS setup (Sect. 4.2) and present an analysis of the SLoS performance for the more realistically limited wavelength coverage experiment (Sect. 4.3). We conclude with a comparison of our approach with a classical SED fit to determine column densities (Sect. 4.4), followed by discussions on the physical feasibility of the approach (Sect. 4.5) and on the application of our setup to real observational data (Sect. 4.6).

4.1 Choice of the point estimator and influence of the radiation configuration

Figure 3 shows a qualitative comparison of the point estimates for dust density and temperature to the ground truth for the MAP, MeanShift and MNPCP estimators. In particular we show the outcome for a single slice (perpendicular to the LoS) of one example cube of the test set. Here, the first four rows show the prediction results based on the SLoS-cINN, distinguishing the ISRF-only scenario (rows 1 and 2) and the ISRF + star radiation configuration (rows 3 and 4). In the ISRF scenario, we can see that all three point estimators provide a very decent reconstruction result for both dust density and temperature. However, the discontinuities in the MAP prediction (as previously discussed in Sect. 3) are quite notable and give the prediction outcome a noisy character. The MeanShift result in the third column also suffers from this effect, albeit to a slightly lesser degree. Given that both of these estimators have no spatial consistency guarantee perpendicular to the LoS, this is of course an expected result. Contrary to that, we can see that our MNPCP approach (fourth column) provides a much more consistent and smoothed prediction result than the other two estimators. Nevertheless, this can come at the expense of losing some of the finer, high-density features of the dust distribution. A full comparison of the prediction with the SLoS-cINN for all slices of this example cube in the ISRF-only configuration is given in Fig. B.1. We also refer to Fig. B.2 for a 3D visualisation of the prediction results in terms of isodensity surfaces.

The predictions for the same cube with the alternative radiation setup (rows 3 and 4, Fig. 3) indicate that the inclusion of a star affects not only the prediction of the dust temperature (as expected given the additional heating from the star), but the dust density estimates as well. Although the overall reconstruction of both dust density and temperature remain quite good in this example, it is obvious that the reconstructed dust density has lost accuracy in comparison to the cINN prediction for the same cube subject to only the ISRF. While the overall larger scale structures are still recovered well, a lot of the finer details of the dust distribution are lost compared to the prediction in the ISRF-only scenario. The inclusion of a star inside of the cube thus appears to add not only complexity to the prediction of the dust temperature, but also renders the recovery of the density more difficult. For a complete comparison of all slices of the example cube in the ISRF + star radiation configuration, we refer to Fig. B.4. The corresponding 3D isodensity surface visualisation is provided in Fig. B.5. To provide additional insights into the difference between the predictions on the ISRF-only and ISRF+star cube, Fig. B.3 presents an extension to Fig. 3, where some examples of the predicted posterior distributions are shown. Here, we find that the density posterior distributions become notably wider in the RT scenario that includes the star, which results in flat, plateau-like or even multi-peaked distributions for some pixels. In the latter cases, the peaks of the distributions may then no longer coincide with the ground truth (although the ground truth is always part of the distribution); for instance, the MAP estimator returns a suboptimal result in comparison to the prediction for the same cube in the ISRF-only scenario. There are two possible explanations for the broadening of the density posterior distributions in the second RT scenario. The first is that this an intrinsic degeneracy of the problem. While the cubes share the same density distribution between the two RT setups, the resulting dust temperatures naturally differ. It is not unreasonable to believe that recovering the density can become more or less ambiguous depending on the temperature given that opacity depends on temperature. It is also worth noting that the temperature posterior distributions do not suffer the same broadening effect and appear similarly well constrained in both RT configurations. The second possible explanation is a suboptimal convergence of the network related to the sampling of the training data, which is discussed in more detail further below.

To quantify the overall performance of the SLoS-cINN in combination with the three different point estimators, we present a direct one-to-one comparison of the predicted dust densities and temperatures to the respective ground truth values, and the corresponding relative residuals for all 100 × 323 pixels in the test set in Fig. 4. As indicated by Table 1 and Fig. 4, the overall predictive performance of the SLoS-cINN in the 23-wavelength configuration is quite excellent across all three point estimation approaches. Here, we achieved NRMSEs between 0.06 to 0.07 in log(n/m−3) and between 0.025 to 0.03 in log(Tdust/K), corresponding to median absolute relative residuals, |ērel|, in the ranges of 1.8 to 2.0% and 0.9 to 1.0%, respectively. Although there is a notable dispersion around a perfect one-to-one correlation between the estimated densities and temperatures and the corresponding ground truth, the binned median curve (and binned 25 and 75% quantile curves) in the relative residual diagrams indicates that a majority of pixels is indeed close to a perfect recovery for most of the range covered in density and temperature. Nevertheless, the relative residuals can reach up to 40 to 50% for a small number of individual pixels in terms of both density and temperature.

The binned median relative residual curves also highlight some systematic trends in the point estimation outcome. For the dust density, we find a notable trend towards overestimation of the density for pixels with a true density below log(n/m−3) = 8. There is also a tendency (although to a lesser extent) for under-estimations at the high density end, starting at log(n/m−3) = 11. For the dust temperature, we also observe a systematic underestimation at temperatures higher than 100 K, and for the MeanShift and MNPCP point estimates a slight tendency for overestimation below 10 K. It appears that the recovery of dust density and temperature in terms of the point estimation tends to struggle more overall towards the extreme ends of the respective parameter range. Part of this difficulty can likely be attributed to the relative complexity of these more extreme environments, but there might be a more direct issue with our training data that could explain this decrease in the predictive performance at the edges of the parameter space. Figure A.2 shows the prior distributions of dust density and temperature across all pixels in our training set. Comparing the thresholds at which the binned median relative error starts to show systematic offsets in Fig. 4, that is, log(n/m−3) < 8, log(n/m−3) > 11 and log(Tdust/K) > 2, to the training set priors, we can see that there are comparatively a lot fewer pixels in these parameter ranges in the training data. This (relative) lack of training examples within these parameter ranges could lead to a suboptimal convergence of the cINN, so that it does not achieve the same robustness at the edges of the parameter space as it does within the intervals where a lot of training data is available. Given that the prior distributions of the dust densities and temperatures in our training data are (in part) dictated by the underlying dust cloud simulations, achieving a more even sampling across the parameter spaces is not a trivial matter and at this stage, this is beyond the scope of this proof of concept. It is also worth noting that very high temperature regions (Tdust > 100 K) are both rare in reality and likely affected by strong feedback, being either part of an HII region or related to strong outflow activity. This adds further complexity to these extreme environments, which is also currently not accounted for in the Cloud Factory as noted in Sect. 2. We plan to investigate these effects and the sampling strategy further in future optimisation of our training set generation.

In comparing the three point estimators more in detail, we find that both MeanShift and MNPCP are less prone to large outlier values, as evident by the smaller dispersion around the one-to-one correlation in Fig. 4 and the lower NRMSE in Table 1. At the same time, it is the MAP estimator that returns the overall best |ērel| with 1.85% for log(n/m−3) and 0.87% for log(Tdust/K). Although the MNPCP estimator has a nominally better result for |ērel| with 1.84% for log(n/m−3), it incurs a notably larger error in log(Tdust/K) with 0.96%. The latter small performance decrease of the MNPCP approach in terms of |ērel| is likely a result of the effective smoothing that this estimator performs. As Fig. 4 shows, this enhances the underestimation tendencies at the high temperature end (also for high density but to a lesser degree) and, thus, the average error. For the MeanShift, which performs the worst in terms of |ērel|, this might be a consequence of a suboptimally chosen bandwidth from our simplified bandwidth selection procedure (as described in Sect. 3). If, for instance, the selected bandwidth is too large, the MeanShift kernel will overly smooth the density distribution and likely miss narrow peaks. This results in a comparable (over-) smoothing effect to the MNPCP, as evidenced by the similar behaviour of the two methods in Fig. 4.

Figure 5 provides a breakdown of Fig. 4 for the best and worst case prediction outcomes, that is the five cubes with the best and five cubes with the worst NRMSE in the MAP point estimate. Averaged over the five best cubes |ērel| goes down to about 1% and 0.5% in log(n/m−3) and log(Tdust/K), respectively. In the worst cases on the other hand, |ērel| reaches up to 4.4% and 1.7% in log(n/m−3) and log(Tdust/K), respectively. What is interesting to note here is that the five best cubes are all only subject to the ISRF, whereas the five worst ones are all in the ISRF + star radiation configuration. This reaffirms our earlier assessment that the presence of a star in the cube notably complicates the problem. We further quantified this by breaking down the overall performance on the test set between the two radiation setups in Table 1. As we can see, |ērel| increases by almost a factor of two for the cubes with ISRF + star setup in comparison to the ISRF-only configuration cubes regardless of the choice of the point estimator. We also want to emphasise that the observed performance is not dependent on the selected viewing angle of the cubes. We have confirmed in a test limited to the 50 ISRF-only cubes that the cINN returns a similarly excellent reconstructive performance, when the cubes are observed from different directions. Thus, our choice to only generate synthetic observations from one direction in the training data has not introduced a bias in the form of a preferred viewing angle (for more details, see Appendix B.1).

In Sect. 3, we specifically introduce the MNPCP approach, because the MAP and MeanShift estimators per construction of our inverse problem do not have a spatial consistency guarantee perpendicular to the LoS (as demonstrated in Fig. 3). To quantify whether the MNPCP approach improves upon this situation (beyond the qualitative comparison in Fig. 3), we computed the median difference in density and temperature for neighbouring pixels following: ΔLoS =median{ xi+1,j,kxi,j,k }{ i{1,,N1}j,k{1,,N} ${\Delta _{\mid {\rm{LoS }}}} = {\mathop{\rm median}\nolimits} \left\{ {{x_{i + 1,j,k}} - {x_{i,j,k}}} \right\}\left\{ {\matrix{ {\forall i \in \{ 1, \ldots ,N - 1\} } \hfill \cr {\forall j,k \in \{ 1, \ldots ,N\} } \hfill \cr } } \right.$(14)

and perpendicular to the LoS: ΔLoS=median{ Δj,Δk },${\Delta _{ \bot {\mathop{\rm LoS}\nolimits} }} = {\mathop{\rm median}\nolimits} \left\{ {{\Delta _j},{\Delta _k}} \right\},$(15)

where Δj={ xi,j+1,kxi,j,k }{ j{1,,N1}i,k{1,,N} Δk={ xi,j,k+1xi,j,k }{ k{1,,N1}i,j{1,,N}, $\eqalign{ & {\Delta _j} = \left\{ {{x_{i,j + 1,k}} - {x_{i,j,k}}} \right\}\left\{ {\matrix{ {\forall j \in \{ 1, \ldots ,N - 1\} } \hfill \cr {\forall i,k \in \{ 1, \ldots ,N\} } \hfill \cr } } \right. \cr & {\Delta _k} = \left\{ {{x_{i,j,k + 1}} - {x_{i,j,k}}} \right\}\left\{ {\matrix{ {\forall k \in \{ 1, \ldots ,N - 1\} } \hfill \cr {\forall i,j \in \{ 1, \ldots ,N\} ,} \hfill \cr } } \right. \cr} $

for our three different point estimators. We then compared these results to the respective values obtained from the ground truth. The results of this analysis on the test set are summarised in Table 2. As expected, there is no preferred direction in the ground truth, with values of about 0.075 in log(n/m−3) and 0.01 in log(Tdust/K) for both ∆||LoS and ∆⊥LoS. In the MAP and Mean-Shift prediction results, on the other hand, we find ∆⊥LoS to be about twice as large as ∆||LoS on average, confirming again the spatial consistency issue. In contrast, the MNPCP estimator offers a much more balanced result, achieving about even ∆⊥LoS and ∆||LoS for log(Tdust/K), and at least reducing ∆⊥LoS to about 1.5∆||LoS for log(n/m−3). Yet even with that outcome, the MNPCP estimate does not quite achieve the balance of the ground truth results. It is also interesting to note that all three point estimators return solutions where ∆||LoS is notably smaller than in the ground truth. This indicates that the cINN prediction tends to return a smoother transition along the LoS than the ground truth. This is likely a result of the fact that the cINN returns a smooth continuous output, whereas the ground truth is limited by the coarseness of the simulation resolution.

In summary, we find an overall very satisfactory performance of the SLoS-cINN in the 23-wavelength configuration, providing a fairly robust recovery of dust density and temperature for most of the tested parameter range. We do note, however, a systematic decrease in performance towards the lower and upper limits of the trained range in terms of density and temperature, which can potentially be traced back to a relative lack of examples in these regimes in the training data. We also identified a dependence of the performance on the radiation setup of the test cubes, where the ones also hosting a star in addition to the ISRF appear more difficult to reconstruct. Regarding the choice of the point estimator, there is no clear winner in terms of the NRMSE and |ērel| performance indicators. Nevertheless, we believe that the MNPCP approach appears as the most reasonable solution because it can provide smooth reconstruction solutions both along and perpendicular to the LoS. One caveat to keep in mind is the fact that the MNPCP point estimator does amplify the systematic error tendencies at the lower and upper limits of the density and temperature ranges due to its inherent smoothing effect.

thumbnail Fig. 2

Schematic outline of the median neighbour pixel combined posterior approach for point estimations based on the example of dust density. The procedure follows the panels from top left to bottom left in clockwise order. The top-left panel shows the MAP density estimates for a single slice of a dust cube (perpendicular to the LoS), highlighting the discontinuities that can occur in the MAP estimate. The query pixel, for which the MNPCP estimate will be computed, and its neighbourhood are indicated in orange and purple, respectively. The top-right panel shows the predicted posterior distributions for the dust density for the query pixel with index 7/11/8 (center subpanel indicated by the orange outline) and all 26 neighbouring pixels. The colours of the curves indicate the slice, i, that they come from, whereas the indices in the top right corner denote the pixel index perpendicular to the LoS. The bottom-right panel shows a histogram of the all posterior samples accumulated from the query pixel and its neighbours. Here, the orange dotted curve indicates the original (marginalised 1D) posterior distribution of the query pixel and the purple line marks the MNPCP estimate, which is the city-block distance weighted median of all posterior samples (the distance in pixels when allowing only right angle moves, no diagonals). Finally, the bottom-left panel presents the MNPCP dust density estimates of the slice shown in the top left with the orange and purple boxes indicating again the query pixel and its neighbourhood.

Table 1

Summary of the predictive performance for our three different cINN setups and the three different point estimation methods.

thumbnail Fig. 3

Predicted dust densities and temperatures for a single example cube slice perpendicular to the LoS. A comparison between the three point estimation methods to the ground truth is shown in the left column. The top two rows give the 23-wavelength SLoS-cINN result for the ISRF-only scenario, while rows 3 and 4 display the counterpart for the ISRF + star case. Rows 5 and 6 present the NLoS-based outcome for the ISRF-only case, and the last two rows show the corresponding seven wavelengths SLoS-cINN prediction. The listed NRMSE and median absolute relative errors are averages over this slice only and not the entire cube.

thumbnail Fig. 4

Performance breakdown of the SLoS-cINN using 23 wavelengths. 2D histograms comparing the cINN predictions for dust density (top two rows) and temperature (bottom two rows) to the ground truth across all pixels of the test set data are given, distinguishing the results of the three point estimation procedures: MAP, MeanShift, and MNPCP. Rows 1 and 3 present the direct one-to-one correlation of the predicted parameters to the ground truth, whereas rows 2 and 4 provide the corresponding relative residuals. In the latter panels, the black curve and grey shaded area indicates a binned median relative residual along with the interquantile range between the 25% and 75% quantile of these bins.

thumbnail Fig. 5

Breakdown of the predictive performance of the SLoS-cINN for the best and worst cases. Analogously to Fig. 4, we show the 2D histograms for the one-to-one comparison of the prediction results (rows one and three) and their respective residuals (rows 2 and 4). The left three columns present the five best reconstructed cubes, whereas the five worst reconstructed ones are shown in the three right columns, respectively.

4.2 Taking the neighbouring LoSs into account

Rows 5 and 6 of Fig. 3 provide the qualitative comparison between the prediction outcomes of the three point estimation approaches for the NLoS-cINN, which is the model trained for the alternative formulation of the inverse problem described in Sect. 3.5. For direct compatibility the shown slice and example cube are the same as for the SLoS-cINN outcomes in Fig. 3, except that the 124 border pixels, which lack the required number of neighbouring LoSs for the prediction with the NLoS-cINN, are missing. At first glance the MAP and MeanShift results appear less noisy, which is subject to fewer strong discontinuities, with the NLoS-cINN in comparison to the SLoS-cINN outcome (first two rows of Fig. 3). In addition, the NLoS-cΓNN based reconstructions seem overall to be slightly more faithful to the ground truth in terms of the recovered details.

Looking at the spatial discontinuities perpendicular to the LoSs in the MAP and MeanShift estimates (see Table 2), it appears that the NLoS-cINN suffers on average from the same issue as the SLoS-based prediction results. ∆||LoS is still twice as large as ∆||LoS for both density and temperature. Again, only the MNPCP approach achieves a more balanced result, but not quite at the level of the ground truth. Thus, accounting for the fluxes in the neighbouring LoSs does not appear to lead to a significant improvement of the spatial consistency perpendicular to the LoS in the prediction of dust density and temperature. It is worth noting, however, that this observation may only hold in the fully resolved dust emission map scenario that we have posed in this study, which renders neighbouring LoSs effectively independent. In real observations, however, where the PSF of a given instrument is larger than a single pixel, neighbouring pixels in the dust emission maps may become correlated. In the latter case, it is possible that the NLoS-cINN may perform better with regards to the spatial consistency. We will conduct a corresponding test in our subsequent work, once we have established a proper treatment of the instrument-related resolution effects during training.

Looking at the overall performance of the NLoS-cINN in comparison to the SLoS-cINN (see Table 1 and Fig. 6), we do find a general improvement. In particular, the NRMSE goes down to values as low as 0.054 and 0.0213 for log(n/m−3) and log(Tdust/K) with the MNPCP estimator, compared to the SLoS-cINN results of 0.062 and 0.0245. |ērel| with, for instance, the MNPCP estimator improves to 1 .54% and 0.82% opposed to the SLoS-based performance of 1.84% and 0.96%, respectively. It is possible that this performance improvement is only a data selection effect, since the NLoS and SLoS test sets are not fully identical, with the former missing the edge LoSs of every cube. To test this hypothesis, we recomputed the SLoS-cINN performance, limited to the LoSs of the NLoS test set. This experiment reveals that the observed average performance improvement of the NLoS-cINN is real, as the SLoS-cINN performs even slightly worse on this limited test set, returning for instance |ērel| values of 1.86% and 0.98% for log(n/m−3) and log(Tdust/K) with the MNPCP estimator, respectively.

In summary, accounting for the fluxes of the neighbouring LoSs in the input has not achieved its initial goal of improving the spatial consistency of the predicted dust densities and temperatures perpendicular to the LoS. It has, however, demonstrated a slight improvement of the predictive performance of the model, indicating that knowledge of the fluxes in the neighbouring LoSs provides additional constraints for the prediction of the dust properties. However, this comes at a cost of flexibility, as all query LoSs now also require observations of the eight adjacent LoSs in this approach. All in all, the NLoS-cINN does not increase the spatial consistency and despite a slight performance improvement does not appear to be a markedly superior approach.

Table 2

Overview of the median difference, ∆, in dust density and temperature between neighbouring pixels along and perpendicular to the LoS for the three different point estimators and two inverse problem setups.

4.3 Realistic wavelength coverage test

As our final experiment, we trained and tested an SLoS-cINN for a more realistically limited wavelength coverage, corresponding to the central wavelengths of the following seven bands: WISE 22 µm, SOFIA 89 µm and 154 µm, Herschel PACS 100 µm and 160 µm, Herschel SPIRE 350 µm, and LABOCA 870 µm. The last two rows in Fig. 3 provide the qualitative example of the dust density and temperature prediction results for a single cube slice in comparison to the ground truth (see also Fig. A.3 for the corresponding input emission maps at the seven selected wavelengths, as indicated by the highlighted panels). It is immediately evident that the large reduction in wavelength coverage leads to a notably decreased quality in the reconstruction. While larger scale and more diffuse features of the density and temperature distributions are still being recovered very well, this is no longer true for narrow details at high density and low temperature, in particular, which tend to be only partially reconstructed. This is even more apparent in the full prediction summary of this cube in Fig. B.8 (and the corresponding isodensity surface diagram in Fig. B.9). In addition, the MAP point estimator appears to produce a lot more and larger spatial discrepancies in the predicted dust densities and temperatures perpendicular to the LoS. Interestingly, where the MeanShift algorithm appeared to show similar behaviour to the MAP estimator in the previous SLoS-cINN setup in terms of spatial inconsistencies, it seems to provide more consistent prediction results here. Looking at ∆⊥LoS and ∆||LoS in Table 2, this can be quantitatively confirmed at least for the dust density as well, with ∆j LoS being only about two times greater than ∆||LoS in the MeanShift result, compared to ∆⊥LoS ≈ 3∆||LoS in the MAP outcome. With this outcome, it seems that a determination of the most likely solution in the joint target parameter space in this setup is more robust in terms of the spatial consistency of the dust density than the marginalisation approach in the MAP estimator.

Looking at the overall performance on the test set in Table 1, the limitation to only seven wavelengths in this setup increases the NRMSE to 0.084–0.098 and 0.036–0.046 for log(n/m−3) and log(Tdust/K), respectively. The corresponding |ērel| values rise to 2.6–3.1% and 1.5–2.0%. Although this is a notable performance decrease with |ērel| increasing by a factor of 1.5–2 in comparison to the 23-wavelength setup, considering that this cINN has 16 wavelengths fewer to work with, this is still a quite satisfactory performance. This indicates that the 3D reconstruction approach can be feasible even for more limited observational coverage.

Investigating the prediction performance in more detail in Fig. 7 (which provides a breakdown, analogously to Figs. 4 and 6), we find that the change in wavelength coverage also influences the systematic tendencies of the prediction results. Where for instance the binned median relative residual curve for the dust density in the 23-wavelength SLoS-cINN outcome exhibits aplateau between 108 and 1011 m−3 and only really leans into systematic behaviour below and above this range, the systematic offsets are amplified in the seven wavelength prediction outcomes. In particular for the MeanShift and MNPCP point estimates we now find a pivot point at around 1010 m−3, below which the density tends to be overestimated and above which it is systematically underestimated. We find a similar amplification of the earlier observed systematic behaviour in the MeanShift and MNPCP estimates of the dust temperature, where there is now a pivot point between over- and underestimation at about 19 K. Nevertheless, for most pixels, the error remains comparatively small, as indicated by |ērel|, and only few individual pixels will manage to reach the maximum relative residual of about 60%. Part of this systematic behaviour can likely be attributed to the relative lack of examples in the training data towards the lower and upper limits of density and temperature, as we discuss in Sect. 4.1. However, given that this cINN also exhibits systematic offsets within the parameter ranges, where a lot of training data is available, this also points towards the increased complexity of the recovery task with much less observational information; in particular in the intermediate density and temperature ranges. Evidence for the intrinsic increase in complexity can be found in the shape of the predicted posterior distributions, a few examples of which are shown in Fig. B.3. Compared to the full 23-wavelength SLoS-cINN, the predicted posteriors of the seven wavelength limited model appear notably broader and often exhibit double or multi-peaked distributions for both density and dust temperature even in the ISRF-only RT configuration. This shows that the cINN has both a harder time to constrain the prediction and finds more degeneracy in the more limited problem.

In summary, although the 7-wavelength SLoS-cINN exhibits an (expected) reduction in predictive performance compared to the 23-wavelength counterpart, this experiment proves that the 3D reconstruction of the dust distributions is quite feasible even when subject to more realistic observational constraints. We also want to emphasise again that our wavelength selection does not necessarily preserve the most information for the given inverse problem. A different, more optimal selection of wavelengths might retain more of the predictive performance of the full 23-wavelength cINN. We plan to look further into both the optimal choice of wavelength combination and relative importance of each wavelength (for the reconstruction) for realistic applications in our continued development of this 3D reconstruction approach.

thumbnail Fig. 6

Prediction performance summary for the NLoS-cINN on the held-out test data. 2D histograms of the direct one-to-one comparison of prediction results to the ground truth (rows one and three) and the respective residuals (rows 2 and 4) are shown for the three different MAP estimators (analogously to Fig. 4).

4.4 Comparison with classical SED fit

The dust emission is often modelled by a modified black body (MBB) in order to construct a column density map and the temperature distribution from multi-wavelength observations. As a final test, we compared the predictive performance of our cINN method to the results of a classical MBB fit. For the purposes of this test, we employed the full 23 wavelengths of coverage and focussed on the simpler RT scenario, comparing SED fit and cINN approach on the 50 cubes that are only subject to the ISRF. The dust emission, Iν, may be approximated as an MBB by: Iv=Bv(Tdust)(1eτv)Bv(Tdust)τv=μgmHNHδgdκvBv(Tdust),${I_v} = {B_v}\left( {{T_{{\rm{dust }}}}} \right)\left( {1 - {{\rm{e}}^{ - {\tau _v}}}} \right) \approx {B_v}\left( {{T_{{\rm{dust }}}}} \right){\tau _v} = {\mu _{\rm{g}}}{m_{\rm{H}}}{N_{\rm{H}}}{\delta _{{\rm{gd}}}}{\kappa _v}{B_v}\left( {{T_{{\rm{dust }}}}} \right),$(16)

where τv is the optical depth, mH is the hydrogen mass, and the black body Bv (Tdust) spectrum is modified by the dust opacity: κv=2(vv0)βcm2 g1.${\kappa _v} = 2{\left( {{v \over {{v_0}}}} \right)^\beta }{\rm{c}}{{\rm{m}}^2}{{\rm{g}}^{ - 1}}$(17)

For the opacity, we take a characteristic frequency of v0 = 600 GHz and a spectral index of β = 2. Here, the dust temperature Tdust and the gas column density NH are the fit parameters to be determined by a least-squares fit.

To compare the cINN outcome with the SED fit results we compute the column number density, N, by integrating the density along each LoS for both the ground truth and SLoS-cINN prediction outcome, that is: N=0Ln dl=i=132niΔl$N = \int_0^L n {\rm{d}}l = \sum\limits_{i = 1}^{32} {{n_i}} \Delta l{\rm{, }}$(18)

where L indicates the depth of our test cubes of 0.2pc and ∆l = L/32 is our cube spatial resolution. As a proxy for the temperature determined by the SED fit, we can compute a density-weighted average temperature T¯dust${\bar T_{{\rm{dust }}}}$ following: T¯dust =i=132niTdust ,ii=132ni.${\bar T_{{\rm{dust }}}} = {{\sum\limits_{i = 1}^{32} {{n_i}} {T_{{\rm{dust }},i}}} \over {\sum\limits_{i = 1}^{32} {{n_i}} }}.$(19)

Figure 8 provides an example comparison of the column number densities and density weighted average temperatures between the outcome of the SED fit, the cINN prediction and the ground truth for one example cube that is only subject to the ISRF. As this example shows, both the SED fit and cINN approach manage to reproduce the column number density map for this cube quite accurately with overall absolute relative errors below 1%. However, the result based on the MAP cINN estimates returns notably smaller residual errors than the SED fit. The maps based on the MeanShift and MNPCP point estimators on the other hand are about on par with the SED fit, if not slightly better. For Mean-Shift this is likely explained by the on average higher absolute relative error of the method itself in comparison to the MAP estimator (as we describe in Sect. 4.1). The MNPCP estimator on the other hand performs worse here because of its inherent smoothing action perpendicular to the LoS. As such small amounts of material may get mixed between adjacent LoSs, increasing the error on the column density. Looking now at the density averaged temperature T¯dust${\bar T_{{\rm{dust }}}}$, we find excellent results for the cINN based estimates, whereas the SED fit outcome notably underestimates the temperature by up to 10% or more. However, it is important to note that the density-weighted average temperature is only an approximation of the temperature that an SED fit would return, so a discrepancy is to be expected. Lastly, Fig. 9 provides the performance comparison between the SED fit and cINN approach across all pixels of the 50 considered test cubes. This figure confirms that the cINN based estimates of both column number density and density weighted average temperature are overall quite excellent. In addition, the cINN approach outperforms the SED fit in all cases, although the margin is comparatively small in the column number density for the MeanShift-based outcome. We also note a slight tendency of the MeanShift and MNPCP derived results towards overestimation of the density averaged temperature. We emphasise that the MBB SED fit may only provide a 2D estimate of the underlying density and temperature distributions, whereas the advantage of the cINN approach is the reconstruction of the full 3D information along the LoS. We also note that while this test indicates the MAP point estimate as the best suited choice to recover the column density, for the full 3D reconstruction, we still recommend the MNPCP estimate because of its higher degree of spatial consistency in the 3D structure.

thumbnail Fig. 7

Summary of the predictive performance on the held-out test data for the SLoS-cINN using only seven wavelenghts as input. Shown are 2D histograms of the direct one-to-one comparison of prediction results to the ground truth (rows one and three) and the respective residuals (rows 2 and 4) for the three different MAP estimators (analogously to Fig. 4).

thumbnail Fig. 8

Comparison of estimated column density and average temperature maps for one example cube in the ISRF-only RT configuration between a classical SED fit and our cINN approach. The large panels on the top show the ground truth for the column density (left) and density weighted average temperature (right), respectively. The smaller panels below each ground truth panel present a map of the estimates from the respective method on the left and a map of the absolute relative error |erel| on the right. The example cube as in Figs. 3, B.1, and B.2 is shown.

thumbnail Fig. 9

Comparison of the predictive performance for column density and density averaged temperature between a classical SED fitting technique and our cINN approach across all pixels of the 50 test cubes that are only subject to the ISRF. The top two rows show the results for the estimated column density, whereas the bottom two present the density averaged dust temperature, T¯dust${\bar T_{{\rm{dust }}}}$. In each group, the upper panels provide a direct one-to-one comparison of the predicted values to the respective ground truth, whereas the lower panels show the corresponding relative residuals.

4.5 On the physical feasibility of the 3D reconstruction

The physical reason why our approach of reconstructing the underlying 3D density and temperature structure works so well is that the dust opacity strongly depends on the energy of the incident electromagnetic wave. In addition, there is a dependence on the chemical composition and the grain size distribution, but we keep those fixed here to the standard Milky Way values (Sect. 2.2). For further details we refer to Tielens (2010) and Draine (2011). Overall, this means that different wavelengths trace different optical depths within the cloud. As a consequence, the observed radiation is accumulated in different ways along the LoS and the resulting SED encodes critical information about the density and temperature structure of the cloud, as also discussed in the Sect. 4.4. This process is highly degenerate, as varying cloud configurations can lead to very similar SEDs; thus, solving the inverse problem of reconstructing the cloud properties from the SED is a challenging task. This is where the INN architecture can play out its full potential since it belongs to the group of normalising flow methods. It offers direct access to the full posterior distribution function and enables us to approach this challenge in a statistically reliable way.

4.6 Application to real data

An application to real observational data of cloud cores is at this stage not currently feasible with our setup. The main reason for this is our treatment of the synthetic dust emission observations. The choice to only use monochromatic dust emission and to not include instrument-related effects, such as a consideration of the PSF or noise renders our synthetic dust emission observations notably different to what the actual observatories would measure. This results in a gap between the synthetic observable space learned by our cINN models and the actual observable space spanned by real measurements, so that prediction attempts are bound to fail (for an example see Appendix B.2). For an application to real data, the generation of the synthetic dust observations thus needs to be further refined by accounting for instance for the actual band width of real instruments and ensuring a proper sampling and modelling of their respective PSFs. We plan to address these challenges in our subsequent development of our method.

Apart from the current limitations of our synthetic dust emission observations, we also identified another area that may require improvements towards an application to real data. As we describe in Sect. 2, our training data generation is aimed at creating simplified, synthetic analogues to objects such as ρ Oph A. However, we suspect that our simplified prescription for adding a star to the cubes, that is, randomly placing it in a region of low dust density to emulate stellar feedback, is not likely to produce truly realistic results. In this approach, the dust clouds are merely illuminated by the star, rather than being mechanically affected by the stellar feedback. Consequently, we do not actually model clouds that are actively shaped by their nearby stars, as is (in particular) the case for cores such as that of ρ Oph A. For that reason, we plan to move to more sophisticated MHD simulations, where stellar feedback is properly accounted for before we apply our method to real observations.

5 Summary and conclusions

In this paper, we present a proof-of-concept deep-learning approach to reconstructing interstellar dust distributions in 3D from observed dust emission maps, focussing on the sub-parsec length scales of individual star-forming clumps. To train this approach, we modelled simplified analogues of the ρ Oph A star-forming clump, employing the Cloud Factory dust cloud simulations by Smith et al. (2020), Izquierdo et al. (2021). For the basis of our training database, we selected 11.065 cubes from the Cloud Factory, centered on clump-like dust aggregations, with a side length of 0.2 pc and 32 × 32 × 32 pixel resolution. We simulated the corresponding dust emission observations with the POLARIS (Reissl et al. 2016, 2019) radiative transfer code at 23 different wavelengths between 12 µm and 1300 µm, matching the central wavelengths of the observational instruments of WISE, MSX, Spitzer, SOFIA, Herschel, ALMA, APEX, and CSO. For the radiative transfer, we considered two irradiation scenarios. In the first, the dust is only subject to the interstellar radiation field (ISRF) with an amplitude that matches conditions in nearby star-forming cores. In the second, we randomly inserted one B4-type star in a low density area inside the cube in addition to the ISRF in order to emulate cloud cores that are subject to the radiation of nearby stars. This procedure yields a final training dataset of 22.130 mock dust clumps, including their dust density and temperature distributions on a 3D grid and the corresponding simulated dust emission fluxes in 23 wavelengths.

To reduce the complexity and dimensionality of the inverse problem posed by the 3D reconstruction task, we broke the problem down to individual LoS under the assumption that (to first order) the emission measured in a given pixel only depends on the material along the corresponding LoS. Specifically, we aimed to recover the 32 dust densities and temperatures along a given LoS from the measured fluxes at different wavelengths in the corresponding pixel of the dust emission map. For this purpose, we trained a conditional invertible neural network (cINN), which is a deep-learning approach that can efficiently estimate full posterior distributions for the target parameters conditioned on the observations. To recover point estimates from the posterior distributions predicted by the cINN (and compare these to the ground truth), we tested three different methods. The first uses a 1D kernel density estimate to determine the maximum a posteriori (MAP) prediction (that is the most likely value) for the dust density and temperature from the marginalised 1D posterior distributions of the individual pixels. The second method employs the MeanShift algorithm to determine the most likely solutions as the point with the highest (probability) density in the full 64-dimensional space of the predicted posterior distributions. As both the MAP and MeanShift estimators have no spatial consistency guarantee perpendicular to the LoS, as per our reduction of the reconstruction task, we introduced a third estimator to rectify this circumstance. This method, dubbed median neighbour pixel combined posterior (MNPCP), determines the point estimates for a given pixel by accumulating the posterior samples of all neighbouring pixels and computing the median of dust density and temperature on this collection.

In total, we trained and tested cINNs for three different formulations of the reconstruction task. The first consists of a perfect information scenario, where we have access to the dust emission maps at all 23 wavelengths. In the second, we extended the network input to also account for the observed fluxes in the neighbouring pixels of the query LoS to evaluate whether this improves the spatial consistency of the prediction outcome. Lastly, we considered a more realistically limited observational scenario, where observations are only available in a combination of seven wavelengths (matching the coverage of real observational data that is for instance available for the star-forming core ρ Oph A). We then evaluated the performance of our trained cINN models on a synthetic test set that consists of 50 mock dust clouds in both irradiation scenarios. Our main findings are summarised in the following.

The model trained for the 23-wavelength scenario achieves an excellent overall predictive performance, returning median absolute relative errors |ērel| of 1.84 to 2.01% in log(n/m−3) and 0.87 to 1.01% in log(Tdust/K), depending on the choice of the point estimator. Averaged over the five best reconstructed cloud cores, |ērel| even goes as low as 1% and 0.5% in log(n/m−3) and log(Tdust /K), respectively. In general, we find that the reconstructive performance is better for clouds that are only subject to the ISRF, indicating that the presence of a star in the vicinity of a mock dust cloud adds a notable level of complexity to the reconstruction task. Breaking the predictive performance down to the individual pixels, we also identified some systematic behaviours. Although the dust densities and temperatures were recovered well overall, we found trends for overestimating the density in the low density regime (n < 108 m−3) and of slight underesti-mations for very-high-density regions (n > 1011 m−3). A similar tendency for underestimation also appears in the dust temperature point estimates for Tdust > 100 K. We identified a bias in the training data as a potential explanation for this behaviour, as the training database contains significantly fewer pixels in these parameter ranges. It should be noted, however, that (in particular) the Tdust > 100 K regime represents more extreme conditions that are more typically found in HII regions or in regions of strong outflow activity; however, these are neither accounted for in the Cloud Factory simulations nor the focus of our analysis. This incomplete modelling of the high temperature regime may also tie into the comparatively worse reconstructive performance.

The comparison of the three point estimators reveals no clear favourite in terms of the quantitative performance measures. Although the MeanShift and MNPCP estimators achieve a slightly better NRMSE than the MAP and the MAP estimator provides the smallest |ērel|, the difference between the three methods is only on the order of 10−3 in NRMSE and 0.1% in |ērel|. The MAP and MeanShift estimates do suffer from the aforementioned spatial inconsistencies perpendicular to the LoS (as per construction). Although the MeanShift results appear to be slightly less affected by this, we find that determining an optimal bandwidth for MeanShift in the 64-dimensional target parameter space can be tricky and may lead to oversmoothing. The MNPCP approach provides much better results in term of spatial consistency, but also exhibits oversmoothing behaviour, which amplifies the systematic trends of the method at the edges of the parameter ranges. Ultimately, we would generally recommend the MNPCP solution (despite its flaws), as it quite effectively avoids the more severe and unphysical spatial inconsistencies perpendicular to the LoS that the MAP and MeanShift approaches can hardly avoid (by construction).

For the cINN model that also accounts for the emission in the neighbouring LoSs, we found an overall slight increase in the predictive performance, but no significant improvement in the spatial consistency of the prediction outcome. This experiment indicates that accounting for the measured fluxes in the neighbouring pixels may help with constraining the dust density and temperatures overall, but cannot counteract the inherent spatial consistency bias of the LoS approach. In addition, this setup is less flexible as it requires a query LoS to have measurements for all eight neighbouring pixels, which excludes (for instance) edge pixels. Although the slight performance improvement of this approach is certainly desirable, it is overall not large enough to clearly distinguish this setup as a superior method in comparison to the single LoS approach.

Limiting the input to a more realistic coverage of seven wavelengths – in our case corresponding to the central wavelengths of the WISE 22 µm, SOFIA 89 µm and 154 µm, Herschel PACS 100 µm and 160 µm, Herschel SPIRE 350 µm, and LABOCA 870 µm bands – naturally leads to a decrease in predictive performance. Nevertheless, the cINN overall still achieves a satisfactory performance with |ērel| on the order of 2.6 to 3.1% for log(n/m−3) and 1.5 to 2.0% in log(Tdust/K). The most notable change in the behaviour of the seven wavelength cINN is an amplification of the average systematic errors in the predicted point estimates, leading to more difficulties overall in reconstructing the dust distributions in very dense structures or extremely diffuse regions. It is worth emphasising, however, that the tested selection of wavelengths was inspired by available data for ρ Oph A and not curated to maximise the network performance. A more optimal wavelength configuration is certainly likely to achieve even better reconstructive power. Lastly, we find that an application of our approach to real observational data is not yet feasible at this stage, as our simplified simulation setup is not able to correctly model all the nuances of real dust emission observations. We conclude that a more in-depth treatment of the synthetic dust emission observations and improvements to our simulation basis (to e.g. properly model the interaction between stars and the dust through stellar feedback) may be required.

In summary, we have shown that a cINN-based approach for the 3D reconstruction of dust distributions produces quite excellent results in a perfect information scenario on synthetic data and still retains a satisfactory performance, even for more realistly limited observational coverage. For future applications to real data, however, the method still requires a further refinement of the simulation setup for the training data to resolve the mismatch between real and synthetic observations, with some adjustment to the training set sampling to reduce the potential bias, and an analysis of the most informative band combinations to maximise performance in realistic coverage scenarios. Once these points have been resolved over the course of our follow-up studies and we can demonstrate a successful application to real observational data, we plan to make this approach available to the community in the form of an open-source tool. Access to a work-in-progress code may be provided upon reasonable request beforehand.

Acknowledgements

The team in Heidelberg acknowledges funding from the European Research Council via the ERC Synergy Grant “ECOGAL” (project ID 855130), from the German Excellence Strategy via the Heidelberg Cluster of Excellence (EXC 2181 – 390900948) “STRUCTURES”, and from the German Ministry for Economic Affairs and Climate Action in project “MAINN” (funding ID 50OO2206). They also thank for computing resources provided by The Länd and DFG through grant INST 35/1134-1 FUGG and for data storage at SDS@hd through grant INST 35/1314-1 FUGG. RJS gratefully acknowledges an STFC Ernest Rutherford fellowship (grant ST/N00485X/1).

Appendix A Training set generation

Table A.1

Overview of filter bands and instruments considered in this study.

This appendix provides additional material with regard to the construction of our training data. Table A.1 provides a summary of the different bands we consider in our generation of the synthetic dust emission maps in Section 3. Listed are the instrument name, band ID, the central wavelength we assume for the band, and the reference indicating from where we have extracted the respective information. Figures A.1 and A.2 provide histograms for the effective prior distributions for the dust density and temperature, and the simulated fluxes in all wavelengths that the cINN sees during training, summarised over all the pixels in our training data set. We emphasise that these effective priors are an outcome of our training data selection, depicting the actual distributions of the respective parameters in the final training data set, and not a prescription after which the training data is selected. Lastly, Figure A.3 shows an example of the emission maps generated by POLARIS at the 23 different wavelengths that we consider for the example cube used in Figures 3 (top row), B.1, B.2, and B.3 (top row).

thumbnail Fig. A.1

Histograms of the prior distributions and correlation of the target parameters in the training data over all pixels. The top-left and bottom-right panels show the 1D distributions of the number density and dust temperature, respectively. In both panels, the boxes at the top left provide the minimum and maximum of the respective parameter. The bottom left panel presents a 2D histogram of the effective prior distribution in the combined density-temperature space.

thumbnail Fig. A.2

Histograms of the prior distributions for the observables in the training data over all cubes and pixels

thumbnail Fig. A.3

Example of the synthetic dust emission maps at all 23 considered wavelengths that serve as the input to our cINN approach. These correspond to bands of specific instruments (as labelled on the top of each panel), but do not account for PSF-related resolution effects of the respective telescopes. The shown example corresponds to the example cube in the ISRF-only configuration that is also the subject of Figures 3, B.1, and B.2. The panels outlined with the black dashed lines correspond to the seven wavelengths considered in our more limited experiment in Section 4.3. We emphasise that the presented flux maps are corrected for distance.

Appendix B Additional material for the performance analysis

This appendix provides complementary diagrams for the performance analysis of our cINN approach outlined in Section 4. Figure B.1 provides an extended comparison of the point estimation results to the ground truth for the example cube shown in the first to rows of Figure 3, depicting all 32 slices of the cube. As an alternative visualisation of this comparison, Figure B.2 shows 3D isodensity surface diagrams for densities of 1010 (grey) and 1011 m−3 (red) for two different rotation angles of the example cube, comparing again the results of our three point estimators to the ground truth. As discussed in Section 4.1, this diagram illustrates the very good recovery of the 3D dust structure at intermediate densities, whereas some finer, high-density structures may be subject to underestimation and are, thus, lost in this visualisation. As an extension to Figure 3, Figure B.3 provides examples for the predicted posterior distributions in the different network and cube configurations analysed in Figure 3.

thumbnail Fig. B.1

Comparison of the point estimate prediction results to the ground truth for all slices of one example cube that is only subject to the ISRF. In each panel, the subpanels show from top left to bottom right the cube slices going along the LoS from front to back. The left and right columns show the dust density and dust temperature respectively. From top to bottom, the rows indicate the ground truth and the MAP, MeanShift, and MNPCP estimates based on the outcome of the single LoS cINN using all 23 wavelengths. This diagram shows the full cube of the single slices shown in the first two rows of Figure 3.

thumbnail Fig. B.2

3D isodensity surface diagrams for one example test cube in the ISRF-only radiation configuration, comparing the SLoS-cINN prediction results to the ground truth. Here the two rows show different rotation angles of the given cube. The grey surfaces indicate a density of 1010 m−3, whereas red surfaces mark the 1011 m−3 density level.

Table B.1

Performance summary for the viewing angle comparison experiment.

Analogously to Figures B.1 and B.2, Figures B.4 and B.5 show the respective results for the cube in the ISRF + star radiation configuration (rows 2&3 of Figure 3), Figures B.6 and B.7 present the outcome as derived from the NLoS-cINN (Section 4.2), and Figures B.8 and B.9 illustrate the predictions from the SLoS-cINN that is limited to the seven wavelengths configuration (Section 4.3).

Appendix B.1 Influence of the observation angle

In Section 2.2, we outline that during our training set generation we only simulate synthetic observations for the simulation cubes from one viewing angle. In particular, it our simulated observations are performed from the same direction for all training cubes. Although somewhat unlikely, this circumstance begs the question, whether the chosen viewing angle may have introduced a bias in our trained method, so the reconstruction from other viewing angles may differ. Naturally, the simulated synthetic observations depend on the viewing angle of the cloud as the contribution of different parcels of gas in the cloud to the observed dust emission depends on the 3D position after all. Therefore, it is to be expected that a reconstruction of the same cloud from two different viewing angles is not 100% identical. To test how the viewing angle may affect the reconstructive performance and rule out a direction bias in our method, we conducted an experiment where we rotated the 50 cubes that are only subject to the ISRF, resimulating the corresponding synthetic dust emission observations and then reconstructing again the 3D distribution of the dust with the 23-wavelength SLoS-cINN. Compared to the main viewing angle, for which we present the performance in Section 4.1, we tested two additional observation directions. In the first, the cubes are rotated by 90° clockwise around the vertical axis in the plane of the primary viewing angle; namely, we are now observing the right face of the cube instead of the front face. For the second test, we rotated the cube around the horizontal axis by 90° clockwise, so that we are now observing the bottom face of the cube.

thumbnail Fig. B.3

Comparison of the predicted posterior distributions for a few example pixels of the cubes shown in Figure 3. The panels in the leftmost and right-most column show the MAP prediction results for density and dust temperature for the same slice as in Figure 3, respectively. The middle two columns provide histograms of the predicted posterior distributions for the nine pixels indicated by the white square in the left-most and right-most columns. The orange curve represents the kernel density estimate used to determine the MAP values, whereas the red dashed line mark the respective ground truth value of each example pixel. We note that within each of the posterior histogram panels, the subpanels share the same x axis column-wise and the same y axis row-wise.

Figure B.10 provides an example comparison of the reconstruction result for one slice (along the original LoS) of one example cube (the same as in Figure 3) from the three different viewing angles that we have tested. As we can see, the cINN recovers the bulk of the structure in this slice quite well, independently of the observation direction. Naturally, smaller details do differ a bit, but as mentioned before this is to be expected, both because the information encoded in the synthetic dust emission maps may not be identical and because the spatial consistency of the predicted posterior distributions depends on the viewing angles. Where the original viewing angles produces posteriors that are consistent along the LoS going into the plane of the shown slice, the other observation directions provide consistency along the perpendicular LoSs now. As a consequence, features in the reconstruction may appear a little more elongated along the given LoS compared to the other viewing angles, as becomes apparent when comparing the two right columns in Figure B.10. Table B.1 provides a summary of the predictive performance of the 23-wavelength SLoS-cINN for the three different viewing angles across all 50 test cubes. As we can see, the performance is very similar, independently of the direction from where the cubes were observed. As such, we can conclude that the cINN has not been biased towards a prefered observational direction and can reconstruct the synthetic dust clouds quite reliably even from different viewing angles.

thumbnail Fig. B.4

Comparison of the point estimate prediction results to the ground truth for all slices of one example cube that is subject to the ISRF and one B4 star. In each panel, the subpanels show (from top-left to bottom-right) the cube slices going along the LoS from front to back. The left and right column show the dust density and dust temperature respectively. From top to bottom, the rows indicate the ground truth and the MAP, MeanShift and MNPCP estimates, based on the outcome of the SLoS-cINN using all 23 wavelengths. This diagram shows the full cube of the single slice shown in rows 3 and 4 of Figure 3. The white star symbol in the right column indicates the approximate position of the star, defined as the hottest pixel in the cube for both the ground truth and the prediction results.

thumbnail Fig. B.5

3D isodensity surface diagrams for one example test cube in the ISRF + star radiation configuration, comparing the SLoS-cINN prediction results to the ground truth. Here the two rows show different rotation angles of the given cube. The grey surfaces indicate a density of 1010 m−3, whereas red surfaces mark the 1011 m−3 density level. The yellow star indicates the position of the B4 anaologue placed inside the cube.

Appendix B.2 Influence of the instrument PSF

In Section 2.2, we are ignoring any instrument related effects in our generation of the synthetic dust emission observations used throughout the rest of this study. Our reasoning for this is that proper modelling of these effects is not trivial. Nevertheless, the latter will be a necessary step in the ongoing development of our approach in order to ultimately apply it to real observational data. In this appendix, we perform a simplified test to demonstrate how a cINN that is trained on fully resolved data behaves on data that is subject to resolution effects. For this purpose, we took the 50 test cubes subject only to the ISRF and modified their corresponding synthetic dust emission maps to simulate the influence of the PSFs of the respective telescope instruments. For simplicity, we approximated this effect by convolving the fully resolved synthetic dust emission maps with a Gaussian, where the standard deviations are matched to the full width half maxima (FWHM) of the PSFs. Notably, this includes the implicit assumption that all considered telescopes would have the same pixel size; in this case, matching that of our input simulated data.

For the convolution, we assumed a distance to the observed cubes, at which the PSFs of all considered instruments are sampled by at least two pixels in the dust emission maps (i.e. to fulfil the Nyquist sampling criterion). Given the excellent spatial resolution of ALMA, this distance would be quite large if we were to consider all 23 simulated wavelengths (namely larger than 8500 pc). Therefore, we conduct this test only on the seven wavelength subset used in Section 4.3, for which the PSF sampling criterion is fulfilled at a distance of d = 397 pc. Figure B.11 shows an example for how this PSF consideration affects the input dust emission maps (analogously to Figure A.3, we select the same example cube here). As we can see, if we require that all seven considered filters are Nyquist sampled, meaning that the selected distance is naturally dictated by the instrument with the best resolution out of the seven (in this case Herschel PACS at 100 µm), then there is already a notable loss in detail in the input dust emission maps for instruments with a worse resolution.

We went on to test how a cINN (namely the seven wavelength SLoS-cINN discussed in Section 4.3) that is trained on perfectly resolved data performs on input observations that are subject to these varying resolution effects at the 397 pc distance on the 50 ISRF-only test cubes. Figure B.12 presents an example MAP prediction result in comparison to the ground truth for the same cube shown in Figure B.1. As we can see, the cINN can no longer recover the dust densities and temperatures at all, a result that we find for all 50 tested cubes. This outcome is not surprising, however, as this resolution effect is not small, rendering data affected by it likely quite far outside of the observable domain that this cINN has learned, where it is bound to fail. While this particularly strong failure here might be exacerbated by the limited wavelength coverage or filter choice, this still confirms the necessity to account for instrumental effects already during training.

To identify at which point the prediction breaks down, we followed up with an additional test that further simplifies the consideration of the PSF effects. Instead of convolving the emission maps in each wavelength with the (approximate) PSF of the corresponding instrument filter, we applied the same PSF resolution to all wavelengths and investigated the beam size at which the prediction starts to deteriorate. For simplicity, we directly specified the FWHM of the Gaussian convolution kernel in pixels for this experiment. For reference, the tested FWHM values of 1.5, 2, 3, 5, and 10 pixels correspond to distances of 79.9, 106.5, 159.8, 266.4, and 532.7 pc, respectively, when using (for instance) the Herschel SPIRE 350 µm filter. Figure B.13 provides a summary of this test for the same example cube (in the ISRF-only configuration) used in Figure B.12, comparing the prediction results on the smoothed emission maps for varying FWHMs to the reference outcome on the original synthetic observations. As we can see, the cINN predictions are somewhat robust with respect to the smoothing of the input emission maps up to a FWHM of the Gaussian beam of 3 pixels, exhibiting only small changes in the NRMSEs for this example. However, the prediction outcome rapidly deteriorates for the larger tested FWHMs of 5 and 10 pixels. Given this outcome, the failure in the previous test might be explained by a combination of the complexity of varying resolution at each wavelength and some rather large PSFs at the tested distance for the central wavelengths of Herschel SPIRE 350µm and APEX LABOCA. In conclusion, this test shows that our approach can compensate a minor degree of unaccounted for resolution effects if they are the same at all wavelengths. Nevertheless, for realistic applications with variations between observational instruments, proper modelling of the instrumental effects within the training data will be necessary to build a truly robust reconstruction model.

thumbnail Fig. B.6

Comparison of the point estimate prediction results to the ground truth for all slices of one example cube that is only subject to the ISRF. In each panel, the subpanels show from top left to bottom right the cube slices going along the LoS from front to back. The left and right columns show the dust density and dust temperature, respectively. From top to bottom, the rows indicate the ground truth and the MAP, MeanShift, and MNPCP estimates based on the outcome of the NLoS-cINN using all 23 wavelengths (as opposed to the SLoS-cINN outcome shown in Fig. B.1). This diagram shows the full cube of the single slice shown in rows 5&6 of Figure 3.

thumbnail Fig. B.7

3D isodensity surface diagrams for one example test cube in the ISRF-only radiation configuration, comparing the NLoS-cINN prediction results to the ground truth in the left column. Here, the two rows show different rotation angles of the given cube. The grey surfaces indicate a density of 1010 m−3, whereas red marks the 1011 m−3 density level. These data are analogous to the SLoS-cINN results in Figure B.2.

thumbnail Fig. B.8

Comparison of the point estimate prediction results to the ground truth for all slices of one example cube that is only subject to the ISRF. In each panel, the subpanels show (from top-left to bottom-right) the cube slices going along the LoS from front to back. The left and right columns show the dust density and dust temperature respectively. From top to bottom the rows indicate the ground truth and the MAP, MeanShift, and MNPCP estimates based on the SLoS-cINN that uses only seven wavelengths. This diagram shows the full cube of the single slice shown in the last two rows of Figure 3.

thumbnail Fig. B.9

3D isodensity surface diagrams for one example test cube in the ISRF-only radiation configuration, comparing the SLoS-cINN prediction results, based on only seven wavelengths, to the ground truth. Here, the two rows show different rotation angles of the given cube. The grey surfaces indicate a density of 1010 m−3, whereas red surfaces mark the 1011 m−3 density level. These data are analogous to the 23-wavelength SLoS results shown in Figure B.2.

thumbnail Fig. B.10

Comparison of MNPCP prediction results from different viewing angles to the ground truth for a single slice of an example cube subject only to the ISRF. Shown is the same example cube as in Figure 3 and all predictions are determined with the 23-wavelength SLoS-cINN. The presented slice is taken along the main viewing angle analysed in this work (Section 4). The results in the second column correspond to the nominal prediction outcome derived from the main viewing angle (i.e. the cube is observed from the front). The arrow and eye symbol in the third and fourth columns indicate the direction from which the cube is observed instead to generate the corresponding prediction result. The NRMSEs and median absolute relative errors listed in the three right-most panels are determined over the depicted slice only.

thumbnail Fig. B.11

Comparison of dust emission maps between the perfect resolution scenario and a case that accounts for the PSFs of the respective instruments for the limited filter configuration used in Section 4.3. The red circle in the bottom row panels indicates the FWHM of the respective PSFs. We note that this example is based on the assumption that all considered telescopes share the same pixel size, matching our simulation resolution at a query distance of d = 397 pc.

thumbnail Fig. B.12

Comparison of the seven wavelength SLoS-cINN MAP prediction result to the ground truth for one example cube that is only subject to the ISRF. Same cube as in Figure B.8 is shown, with the difference being that here the input dust emission maps were first convolved with the PSFs of the respective instruments, corresponding to the input shown in the bottom row in Figure B.11.

thumbnail Fig. B.13

Comparison of prediction results with the seven wavelength SLoS-cINN fordifferent amounts of smoothing applied to the inputobservations at all wavelengths equally. The top row shows the emission maps at the central wavelength of the Herschel SPIRE 350 µm filter as a an example to illustrate the effect of the convolution with a Gaussian beam. The red circle in each of these panels indicates the FWHM of the respective Gaussian kernel. The second and third row show the corresponding MAP estimates for dust density and temperature, respectively, for one example slice of the same cube analysed throughout the paper (i.e. the same slice as in Figures 3, B.3, B.10). The fourth and the fifth rows present a 2D histogram that directly compares the ground truth to the predicted density and temperature MAP estimates, respectively, for all pixels of this test cube, summarising the predictive performance.

References

  1. André, P., Men’shchikov, A., Bontemps, S., et al. 2010, A&A, 518, A102 [Google Scholar]
  2. Ardizzone, L., Kruse, J., Rother, C., & Köthe, U. 2019a, Analyzing Inverse Problems with Invertible Neural Networks, in International Conference on Learning Representations [Google Scholar]
  3. Ardizzone, L., Lüth, C., Kruse, J., Rother, C., & Köthe, U. 2019b, CoRR, 1907.02392 [Google Scholar]
  4. Ballesteros-Paredes, J., & Mac Low, M.-M. 2002, ApJ, 570, 734 [Google Scholar]
  5. Beaumont, C. N., Offner, S. S. R., Shetty, R., Glover, S. C. O., & Goodman, A. A. 2013, ApJ, 777, 173 [NASA ADS] [CrossRef] [Google Scholar]
  6. Bister, T., Erdmann, M., Köthe, U., & Schulte, J. 2022, Eur. Phys. J. C, 82, 171 [NASA ADS] [CrossRef] [Google Scholar]
  7. Bjorkman, J. E., & Wood, K. 2001, ApJ, 554, 615 [NASA ADS] [CrossRef] [Google Scholar]
  8. CASATeam, Bean, B., Bhatnagar, S., et al. 2022, PASP, 134, 114501 [NASA ADS] [CrossRef] [Google Scholar]
  9. Comaniciu, D., & Meer, P. 2002, IEEE Trans. Pattern Anal. Mach. Intell., 24, 603 [CrossRef] [Google Scholar]
  10. Cortes, P. C., Remijan, A., Hales, A., et al. 2022, ALMA Technical Handbook, ALMA Doc. 9.3, ver. 1.0 [Google Scholar]
  11. Dinh, L., Krueger, D., & Bengio, Y. 2015, in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Workshop Track Proceedings, eds. Y. Bengio, & Y. LeCun [Google Scholar]
  12. Dinh, L., Sohl-Dickstein, J., & Bengio, S. 2017, Density estimation using Real NVP, in International Conference on Learning Representations [Google Scholar]
  13. Dole, H., Lagache, G., & Puget, J.-L. 2003, ApJ, 585, 617 [NASA ADS] [CrossRef] [Google Scholar]
  14. Draine, B. 2003, ARA&A, 41, 241 [NASA ADS] [CrossRef] [Google Scholar]
  15. Draine, B. T. 2011, Physics of the Interstellar and Intergalactic Medium (Princeton: Princeton University Press) [Google Scholar]
  16. Egan, M. P., Price, S. D., Moshir, M. M., Cohen, M., & Tedesco, E. 1999, The Midcourse Space Experiment Point Source Catalog Version 1.2 Explanatory Guide, Technical Report, AD-A381933; AFRL-VS-TR-1999-1522 [Google Scholar]
  17. Eisert, L., Pillepich, A., Nelson, D., et al. 2023, MNRAS, 519, 2199 [Google Scholar]
  18. Elia, D., Merello, M., Molinari, S., et al. 2021, MNRAS, 504, 2742 [NASA ADS] [CrossRef] [Google Scholar]
  19. Exter, K. 2017, Quick-Start Guide to HERSCHEL–PACS The Photometer, HERSCHEL-HSC-DOC-2151, version 1.0 [Google Scholar]
  20. Fukunaga, K., & Hostetler, L. 1975, IEEE Trans. Information Theory, 21, 32 [CrossRef] [Google Scholar]
  21. Galliano, F., Galametz, M., & Jones, A. P. 2018, ARA&A, 56, 673 [Google Scholar]
  22. Garcia-Cuesta, E., de la Torre, F., & de Castro, A. J. 2009, Machine Learning Approaches for the Inversion of the Radiative Transfer Equation, eds. S.-I. Ao, B. Rieger, & S.-S. Chen (Dordrecht: Springer Netherlands), 319 [Google Scholar]
  23. Glover, S. C. O., & Mac Low, M. 2007, ApJS, 169, 239 [NASA ADS] [CrossRef] [Google Scholar]
  24. Glover, S. C. O., & Mac Low, M. M. 2011, MNRAS, 412, 337 [NASA ADS] [CrossRef] [Google Scholar]
  25. Gould, R. J., & Salpeter, E. E. 1963, ApJ, 138, 393 [NASA ADS] [CrossRef] [Google Scholar]
  26. Haldemann, J., Ksoll, V., Walter, D., et al. 2023, A&A, 672, A180 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  27. Harper, D. A., Runyan, M. C., Dowell, C. D., et al. 2018, J. Astron. Instrum., 7, 1840008 [Google Scholar]
  28. Hauser, M. G., & Dwek, E. 2001, ARA&A, 39, 249 [Google Scholar]
  29. Hill, R., Masui, K. W., & Scott, D. 2018, Appl. Spectrosc., 72, 663 [NASA ADS] [CrossRef] [Google Scholar]
  30. Hyvärinen, A., & Oja, E. 2000, Neural Networks, 13, 411 [CrossRef] [Google Scholar]
  31. Izquierdo, A. F., Smith, R. J., Glover, S. C. O., et al. 2021, MNRAS, 500, 5268 [Google Scholar]
  32. Jones, A. P., & Ysard, N. 2019, A&A, 627, A38 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  33. Kang, D. E., Pellegrini, E. W., Ardizzone, L., et al. 2022, MNRAS, 512, 617 [CrossRef] [Google Scholar]
  34. Kang, D. E., Klessen, R. S., Ksoll, V. F., et al. 2023a, MNRAS, 520, 4981 [NASA ADS] [CrossRef] [Google Scholar]
  35. Kang, D. E., Ksoll, V. F., Itrich, D., et al. 2023b, A&A, 674, A175 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  36. Kingma, D. P., & Dhariwal, P. 2018, in Advances in Neural Information Processing Systems, eds. S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett, (Curran Associates, Inc.), 31 [Google Scholar]
  37. Klessen, R. S., & Glover, S. C. O. 2016, in Saas-Fee Advanced Course, eds. Y. Revaz, P. Jablonka, R. Teyssier, & L. Mayer, 43, 85 [NASA ADS] [CrossRef] [Google Scholar]
  38. Kobyzev, I., Prince, S. J., & Brubaker, M. A. 2021, IEEE Trans. Pattern Anal. Mach. Intell., 43, 3964 [NASA ADS] [CrossRef] [Google Scholar]
  39. Ksoll, V. F., Ardizzone, L., Klessen, R., et al. 2020, MNRAS, 499, 5447 [NASA ADS] [CrossRef] [Google Scholar]
  40. Lallement, R., Capitanio, L., Ruiz-Dern, L., et al. 2018, A&A, 616, A132 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  41. Lallement, R., Babusiaux, C., Vergely, J. L., et al. 2019, A&A, 625, A135 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  42. Lallement, R., Vergely, J. L., Babusiaux, C., & Cox, N. L. J. 2022, A&A, 661, A147 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  43. Leike, R. H., Glatzle, M., & Enßlin, T. A. 2020, A&A, 639, A138 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  44. Leike, R. H., Edenhofer, G., Knollmüller, J., et al. 2022, ArXiv e-prints [arXiv: 2204.11715] [Google Scholar]
  45. Li, A., & Draine, B. T. 2001, ApJ, 554, 778 [Google Scholar]
  46. Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. 2018, J. Mach. Learn. Res., 18, 1 [Google Scholar]
  47. Liseau, R., Larsson, B., Lunttila, T., et al. 2015, A&A, 578, A131 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  48. Loinard, L., Torres, R. M., Mioduszewski, A. J., & Rodriguez, L. F. 2008, ApJ, 675, L29 [NASA ADS] [CrossRef] [Google Scholar]
  49. Loren, R. B., Wootten, A., & Wilking, B. A. 1990, ApJ, 365, 269 [NASA ADS] [CrossRef] [Google Scholar]
  50. Lucy, L. B. 1999, A&A, 344, 282 [NASA ADS] [Google Scholar]
  51. Mathis, J. S., Rumpl, W., & Nordsieck, K. H. 1977, ApJ, 217, 425 [Google Scholar]
  52. Mathis, J. S., Mezger, P. G., & Panagia, N. 1983, A&A, 128, 212 [NASA ADS] [Google Scholar]
  53. Molinari, S., Swinyard, B., Bally, J., et al. 2010, A&A, 518, A100 [Google Scholar]
  54. Molinari, S., Schisano, E., Elia, D., et al. 2016, A&A, 591, A149 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  55. Nelson, R. P., & Langer, W. D. 1997, ApJ, 482, 796 [NASA ADS] [CrossRef] [Google Scholar]
  56. Paszke, A., Gross, S., Chintala, S., et al. 2017, Automatic Differentiation in PyTorch, in NIPS Autodiff Workshop [Google Scholar]
  57. Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825 [Google Scholar]
  58. Planck Collaboration XIX. 2011, A&A, 536, A19 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  59. Planck Collaboration XI. 2014, A&A, 571, A11 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  60. Planck Collaboration XXXV. 2016, A&A, 586, A138 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  61. Reissl, S., Wolf, S., & Brauer, R. 2016, A&A, 593, A87 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  62. Reissl, S., Klessen, R. S., Mac Low, M.-M., & Pellegrini, E. W. 2018, A&A, 611, A70 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  63. Reissl, S., Brauer, R., Klessen, R. S., & Pellegrini, E. W. 2019, ApJ, 885, 15 [NASA ADS] [CrossRef] [Google Scholar]
  64. Reissl, S., Meehan, P., & Klessen, R. S. 2023, A&A, 674, A47 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  65. Rezaei Kh., S., & Kainulainen, J. 2022, ApJ, 930, L22 [NASA ADS] [CrossRef] [Google Scholar]
  66. Rezaei Kh., S., Bailer-Jones, C. A. L., Hanson, R. J., & Fouesneau, M. 2017, A&A, 598, A125 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  67. Rezaei Kh., S., Bailer-Jones, C. A. L., Soler, J. D., & Zari, E. 2020, A&A, 643, A151 [EDP Sciences] [Google Scholar]
  68. Rezende, D., & Mohamed, S. 2015, in Proceedings of Machine Learning Research, Proceedings of the 32nd International Conference on Machine Learning, eds. F. Bach, & D. Blei (Lille, France: PMLR), 37, 1530 [Google Scholar]
  69. Santos, F. P., Chuss, D. T., Dowell, C. D., et al. 2019, ApJ, 882, 113 [NASA ADS] [CrossRef] [Google Scholar]
  70. Silverman, B. W. 1986, Density Estimation for Statistics and Data Analysis (Chapman and Hall) [Google Scholar]
  71. Smith, R. J., Glover, S. C. O., Clark, P. C., Klessen, R. S., & Springel, V. 2014, MNRAS, 441, 1628 [NASA ADS] [CrossRef] [Google Scholar]
  72. Smith, R. J., Treß, R. G., Sormani, M. C., et al. 2020, MNRAS, 492, 1594 [NASA ADS] [CrossRef] [Google Scholar]
  73. Springel, V. 2010, MNRAS, 401, 791 [Google Scholar]
  74. Steinacker, J., Bacmann, A., Henning, T., Klessen, R., & Stickel, M. 2005, A&A, 434, 167 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  75. Tabak, E., & Vanden-Eijnden, E. 2010, Commun. Math. Sci., 8, 217 [CrossRef] [Google Scholar]
  76. Tabak, E., & Turner, C. 2013, Commun. Pure Appl. Math., 66, 145 [CrossRef] [Google Scholar]
  77. Tielens, A. G. G. M. 2010, The Physics and Chemistry of the Interstellar Medium [Google Scholar]
  78. Tress, R. G., Smith, R. J., Sormani, M. C., et al. 2020, MNRAS, 492, 2973 [Google Scholar]
  79. Valtchanov, I. 2018, The Spectral And Photometric Imaging Receiver (SPIRE) Handbook, HERSCHEL-HSC-DOC-0798, version 3.2 [Google Scholar]
  80. Wolfire, M. G., McKee, C. F., Hollenbach, D., & Tielens, A. G. G. M. 2003, ApJ, 587, 278 [Google Scholar]
  81. Wright, E. L., Eisenhardt, P. R. M., Mainzer, A. K., et al. 2010, AJ, 140, 1868 [Google Scholar]
  82. Zhang, X., Green, G. M., & Rix, H.-W. 2023, MNRAS, 524, 1855 [NASA ADS] [CrossRef] [Google Scholar]
  83. Zucker, C., Goodman, A., Alves, J., et al. 2021, ApJ, 919, 35 [NASA ADS] [CrossRef] [Google Scholar]
  84. Zucker, C., Goodman, A. A., Alves, J., et al. 2022, Nature, 601, 334 [NASA ADS] [CrossRef] [Google Scholar]
  85. Zucker, C., Alves, J., Goodman, A., Meingast, S., & Galli, P. 2023, in Protostars and Planets VII, eds. S. Inutsuka, Y. Aikawa, T. Muto, K. Tomida, & M. Tamura, ASP Conf. Ser., 534, 43 [NASA ADS] [Google Scholar]

2

CASA website: https://casa.nrao.edu/

All Tables

Table 1

Summary of the predictive performance for our three different cINN setups and the three different point estimation methods.

Table 2

Overview of the median difference, ∆, in dust density and temperature between neighbouring pixels along and perpendicular to the LoS for the three different point estimators and two inverse problem setups.

Table A.1

Overview of filter bands and instruments considered in this study.

Table B.1

Performance summary for the viewing angle comparison experiment.

All Figures

thumbnail Fig. 1

Comparison of the predicted posterior distributions along vs. perpendicular to the LoS (into the plane). The left column shows an example slice with the MAP estimates for dust density (top) and dust temperature (bottom). The other three columns show the posterior distributions of dust density (top) and temperature (bottom) for the lines indicated in the left panels in black, blue and purple, respectively. Here, the black square denotes a LoS going into the plane of the image, whereas the blue and purple lines are perpendicular to the LoS along the x and y axis, respectively.

In the text
thumbnail Fig. 2

Schematic outline of the median neighbour pixel combined posterior approach for point estimations based on the example of dust density. The procedure follows the panels from top left to bottom left in clockwise order. The top-left panel shows the MAP density estimates for a single slice of a dust cube (perpendicular to the LoS), highlighting the discontinuities that can occur in the MAP estimate. The query pixel, for which the MNPCP estimate will be computed, and its neighbourhood are indicated in orange and purple, respectively. The top-right panel shows the predicted posterior distributions for the dust density for the query pixel with index 7/11/8 (center subpanel indicated by the orange outline) and all 26 neighbouring pixels. The colours of the curves indicate the slice, i, that they come from, whereas the indices in the top right corner denote the pixel index perpendicular to the LoS. The bottom-right panel shows a histogram of the all posterior samples accumulated from the query pixel and its neighbours. Here, the orange dotted curve indicates the original (marginalised 1D) posterior distribution of the query pixel and the purple line marks the MNPCP estimate, which is the city-block distance weighted median of all posterior samples (the distance in pixels when allowing only right angle moves, no diagonals). Finally, the bottom-left panel presents the MNPCP dust density estimates of the slice shown in the top left with the orange and purple boxes indicating again the query pixel and its neighbourhood.

In the text
thumbnail Fig. 3

Predicted dust densities and temperatures for a single example cube slice perpendicular to the LoS. A comparison between the three point estimation methods to the ground truth is shown in the left column. The top two rows give the 23-wavelength SLoS-cINN result for the ISRF-only scenario, while rows 3 and 4 display the counterpart for the ISRF + star case. Rows 5 and 6 present the NLoS-based outcome for the ISRF-only case, and the last two rows show the corresponding seven wavelengths SLoS-cINN prediction. The listed NRMSE and median absolute relative errors are averages over this slice only and not the entire cube.

In the text
thumbnail Fig. 4

Performance breakdown of the SLoS-cINN using 23 wavelengths. 2D histograms comparing the cINN predictions for dust density (top two rows) and temperature (bottom two rows) to the ground truth across all pixels of the test set data are given, distinguishing the results of the three point estimation procedures: MAP, MeanShift, and MNPCP. Rows 1 and 3 present the direct one-to-one correlation of the predicted parameters to the ground truth, whereas rows 2 and 4 provide the corresponding relative residuals. In the latter panels, the black curve and grey shaded area indicates a binned median relative residual along with the interquantile range between the 25% and 75% quantile of these bins.

In the text
thumbnail Fig. 5

Breakdown of the predictive performance of the SLoS-cINN for the best and worst cases. Analogously to Fig. 4, we show the 2D histograms for the one-to-one comparison of the prediction results (rows one and three) and their respective residuals (rows 2 and 4). The left three columns present the five best reconstructed cubes, whereas the five worst reconstructed ones are shown in the three right columns, respectively.

In the text
thumbnail Fig. 6

Prediction performance summary for the NLoS-cINN on the held-out test data. 2D histograms of the direct one-to-one comparison of prediction results to the ground truth (rows one and three) and the respective residuals (rows 2 and 4) are shown for the three different MAP estimators (analogously to Fig. 4).

In the text
thumbnail Fig. 7

Summary of the predictive performance on the held-out test data for the SLoS-cINN using only seven wavelenghts as input. Shown are 2D histograms of the direct one-to-one comparison of prediction results to the ground truth (rows one and three) and the respective residuals (rows 2 and 4) for the three different MAP estimators (analogously to Fig. 4).

In the text
thumbnail Fig. 8

Comparison of estimated column density and average temperature maps for one example cube in the ISRF-only RT configuration between a classical SED fit and our cINN approach. The large panels on the top show the ground truth for the column density (left) and density weighted average temperature (right), respectively. The smaller panels below each ground truth panel present a map of the estimates from the respective method on the left and a map of the absolute relative error |erel| on the right. The example cube as in Figs. 3, B.1, and B.2 is shown.

In the text
thumbnail Fig. 9

Comparison of the predictive performance for column density and density averaged temperature between a classical SED fitting technique and our cINN approach across all pixels of the 50 test cubes that are only subject to the ISRF. The top two rows show the results for the estimated column density, whereas the bottom two present the density averaged dust temperature, T¯dust${\bar T_{{\rm{dust }}}}$. In each group, the upper panels provide a direct one-to-one comparison of the predicted values to the respective ground truth, whereas the lower panels show the corresponding relative residuals.

In the text
thumbnail Fig. A.1

Histograms of the prior distributions and correlation of the target parameters in the training data over all pixels. The top-left and bottom-right panels show the 1D distributions of the number density and dust temperature, respectively. In both panels, the boxes at the top left provide the minimum and maximum of the respective parameter. The bottom left panel presents a 2D histogram of the effective prior distribution in the combined density-temperature space.

In the text
thumbnail Fig. A.2

Histograms of the prior distributions for the observables in the training data over all cubes and pixels

In the text
thumbnail Fig. A.3

Example of the synthetic dust emission maps at all 23 considered wavelengths that serve as the input to our cINN approach. These correspond to bands of specific instruments (as labelled on the top of each panel), but do not account for PSF-related resolution effects of the respective telescopes. The shown example corresponds to the example cube in the ISRF-only configuration that is also the subject of Figures 3, B.1, and B.2. The panels outlined with the black dashed lines correspond to the seven wavelengths considered in our more limited experiment in Section 4.3. We emphasise that the presented flux maps are corrected for distance.

In the text
thumbnail Fig. B.1

Comparison of the point estimate prediction results to the ground truth for all slices of one example cube that is only subject to the ISRF. In each panel, the subpanels show from top left to bottom right the cube slices going along the LoS from front to back. The left and right columns show the dust density and dust temperature respectively. From top to bottom, the rows indicate the ground truth and the MAP, MeanShift, and MNPCP estimates based on the outcome of the single LoS cINN using all 23 wavelengths. This diagram shows the full cube of the single slices shown in the first two rows of Figure 3.

In the text
thumbnail Fig. B.2

3D isodensity surface diagrams for one example test cube in the ISRF-only radiation configuration, comparing the SLoS-cINN prediction results to the ground truth. Here the two rows show different rotation angles of the given cube. The grey surfaces indicate a density of 1010 m−3, whereas red surfaces mark the 1011 m−3 density level.

In the text
thumbnail Fig. B.3

Comparison of the predicted posterior distributions for a few example pixels of the cubes shown in Figure 3. The panels in the leftmost and right-most column show the MAP prediction results for density and dust temperature for the same slice as in Figure 3, respectively. The middle two columns provide histograms of the predicted posterior distributions for the nine pixels indicated by the white square in the left-most and right-most columns. The orange curve represents the kernel density estimate used to determine the MAP values, whereas the red dashed line mark the respective ground truth value of each example pixel. We note that within each of the posterior histogram panels, the subpanels share the same x axis column-wise and the same y axis row-wise.

In the text
thumbnail Fig. B.4

Comparison of the point estimate prediction results to the ground truth for all slices of one example cube that is subject to the ISRF and one B4 star. In each panel, the subpanels show (from top-left to bottom-right) the cube slices going along the LoS from front to back. The left and right column show the dust density and dust temperature respectively. From top to bottom, the rows indicate the ground truth and the MAP, MeanShift and MNPCP estimates, based on the outcome of the SLoS-cINN using all 23 wavelengths. This diagram shows the full cube of the single slice shown in rows 3 and 4 of Figure 3. The white star symbol in the right column indicates the approximate position of the star, defined as the hottest pixel in the cube for both the ground truth and the prediction results.

In the text
thumbnail Fig. B.5

3D isodensity surface diagrams for one example test cube in the ISRF + star radiation configuration, comparing the SLoS-cINN prediction results to the ground truth. Here the two rows show different rotation angles of the given cube. The grey surfaces indicate a density of 1010 m−3, whereas red surfaces mark the 1011 m−3 density level. The yellow star indicates the position of the B4 anaologue placed inside the cube.

In the text
thumbnail Fig. B.6

Comparison of the point estimate prediction results to the ground truth for all slices of one example cube that is only subject to the ISRF. In each panel, the subpanels show from top left to bottom right the cube slices going along the LoS from front to back. The left and right columns show the dust density and dust temperature, respectively. From top to bottom, the rows indicate the ground truth and the MAP, MeanShift, and MNPCP estimates based on the outcome of the NLoS-cINN using all 23 wavelengths (as opposed to the SLoS-cINN outcome shown in Fig. B.1). This diagram shows the full cube of the single slice shown in rows 5&6 of Figure 3.

In the text
thumbnail Fig. B.7

3D isodensity surface diagrams for one example test cube in the ISRF-only radiation configuration, comparing the NLoS-cINN prediction results to the ground truth in the left column. Here, the two rows show different rotation angles of the given cube. The grey surfaces indicate a density of 1010 m−3, whereas red marks the 1011 m−3 density level. These data are analogous to the SLoS-cINN results in Figure B.2.

In the text
thumbnail Fig. B.8

Comparison of the point estimate prediction results to the ground truth for all slices of one example cube that is only subject to the ISRF. In each panel, the subpanels show (from top-left to bottom-right) the cube slices going along the LoS from front to back. The left and right columns show the dust density and dust temperature respectively. From top to bottom the rows indicate the ground truth and the MAP, MeanShift, and MNPCP estimates based on the SLoS-cINN that uses only seven wavelengths. This diagram shows the full cube of the single slice shown in the last two rows of Figure 3.

In the text
thumbnail Fig. B.9

3D isodensity surface diagrams for one example test cube in the ISRF-only radiation configuration, comparing the SLoS-cINN prediction results, based on only seven wavelengths, to the ground truth. Here, the two rows show different rotation angles of the given cube. The grey surfaces indicate a density of 1010 m−3, whereas red surfaces mark the 1011 m−3 density level. These data are analogous to the 23-wavelength SLoS results shown in Figure B.2.

In the text
thumbnail Fig. B.10

Comparison of MNPCP prediction results from different viewing angles to the ground truth for a single slice of an example cube subject only to the ISRF. Shown is the same example cube as in Figure 3 and all predictions are determined with the 23-wavelength SLoS-cINN. The presented slice is taken along the main viewing angle analysed in this work (Section 4). The results in the second column correspond to the nominal prediction outcome derived from the main viewing angle (i.e. the cube is observed from the front). The arrow and eye symbol in the third and fourth columns indicate the direction from which the cube is observed instead to generate the corresponding prediction result. The NRMSEs and median absolute relative errors listed in the three right-most panels are determined over the depicted slice only.

In the text
thumbnail Fig. B.11

Comparison of dust emission maps between the perfect resolution scenario and a case that accounts for the PSFs of the respective instruments for the limited filter configuration used in Section 4.3. The red circle in the bottom row panels indicates the FWHM of the respective PSFs. We note that this example is based on the assumption that all considered telescopes share the same pixel size, matching our simulation resolution at a query distance of d = 397 pc.

In the text
thumbnail Fig. B.12

Comparison of the seven wavelength SLoS-cINN MAP prediction result to the ground truth for one example cube that is only subject to the ISRF. Same cube as in Figure B.8 is shown, with the difference being that here the input dust emission maps were first convolved with the PSFs of the respective instruments, corresponding to the input shown in the bottom row in Figure B.11.

In the text
thumbnail Fig. B.13

Comparison of prediction results with the seven wavelength SLoS-cINN fordifferent amounts of smoothing applied to the inputobservations at all wavelengths equally. The top row shows the emission maps at the central wavelength of the Herschel SPIRE 350 µm filter as a an example to illustrate the effect of the convolution with a Gaussian beam. The red circle in each of these panels indicates the FWHM of the respective Gaussian kernel. The second and third row show the corresponding MAP estimates for dust density and temperature, respectively, for one example slice of the same cube analysed throughout the paper (i.e. the same slice as in Figures 3, B.3, B.10). The fourth and the fifth rows present a 2D histogram that directly compares the ground truth to the predicted density and temperature MAP estimates, respectively, for all pixels of this test cube, summarising the predictive performance.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.