Charting nearby dust clouds using Gaia data only

Aims: Highly resolved maps of the local Galactic dust are an important ingredient for sky emission models. In nearly the whole electromagnetic spectrum one can see imprints of dust, many of which originate from dust clouds within 300pc. Having a detailed 3D reconstruction of these local dust clouds enables detailed studies, helps to quantify the impact on other observables and is a milestone necessary to enable larger reconstructions, as every sightline for more distant objects will pass through the local dust. Methods: To infer the dust density we use parallax and absorption estimates published by the Gaia collaboration in their second data release. We model the dust as a log-normal process using a hierarchical Bayesian model. We also infer non-parametrically the kernel of the log-normal process, which corresponds to the physical spatial correlation power spectrum of the log-density. Results: Using only Gaia data of the second Gaia data release, we reconstruct the 3D dust density and its spatial correlation spectrum in a 600pc cube centered on the Sun. We publish the resulting 3D dust map as well as the power spectrum and posterior samples that permit to explore the uncertainties of these results. We report a spectral index of the logarithmic dust density of 3.1 on Fourier scales with wavelengths between 2pc and 125pc.


Introduction
Emission and absorption by Galactic dust is a prominent astronomical foreground at many wavelengths. Therefore, knowing its distribution on the 2D sky and in 3D is essential for many astronomical observations. However, dust is also interesting to be studied on its own, as it provides information about the physical conditions in the interstellar medium and informs us about star forming regions. Dust has been mapped out by surveys for a long time, the first notable contribution being Burstein & Heiles (1978). Their dust reconstruction, as most reconstructions of the dust distribution so far, was focused on mapping the dust in 2D on the sky. This can be done by looking at the sky in infra-red wavelengths, where it is dominated by dust emission. However, when mapping out dust using infra-red emission one is biased by the radiation field, as dust emission is not only proportional to the dust density but also to the amount of starlight absorbed by the dust. Furthermore, dust maps that were produced by mapping infrared light might contain extended infrared sources that are not within our galaxy, as was shown by Chiang & Ménard (2018). On the other hand, a hypothetical cold dust cloud cannot be seen in infra-red, leading to systematic errors in the analysis of distant targets, for example quasars or the Cosmic Microwave Background (CMB).
For accurate analyses of objects in our galactic vicinity, it is vital to have a 3D dust map as a foreground model, which informs us about regions that cannot be observed, or only be observed with less fidelity, due to dust obscuration. The first reconstruction of galactic dust in 3D published is Vergely et al. (2001). Since then there have been many attempts to chart the dust density in 3D in increasing resolution, accuracy and for an ever greater part of our galaxy. A large driving force for 3D dust reconstruction and astronomy in general are the large surveys like 2MASS, Pan-STARRS and SDSS/APOGEE. These surveys provide parallax measurements and accurate spectra for thousands of stars. There are two 3D dust reconstructions based on these data sets that are closely related to the approach taken in this paper.
In Lallement et al. (2018) reddening data from 71 000 sources has been used to perform a 3D reconstruction using Gaussian process regression. The resulting dust map covers a 4kpc square of the galactic plane and 600pc in perpendicular direction with a voxel size of (5pc) 3 .
In Green et al. (2018) a 3D dust map is produced by combining the star data of Pan-STARRS and 2MASS, binning it in angular and distance bins, and performing independent Bayesian reconstructions per angular bin. The result is a dust map that covers three quarters of the sky to a distance up to 2kpc. This reconstruction shows artificial radial structures called the "fingers of God effect" in analogy to the well known phenomenon in cosmology. One way to mitigate this effect is to use more accurate parallax information.
A prominent new survey is performed by the Gaia collaboration (Brown et al. 2016). In its second data release (DR2, Brown et al. (2018)) accurate parallaxes for roughly 2 billion stars are published. The provided catalogue also contains estimates of absorption coefficients for a subset of about 88 million stars, using spectral information of the Gaia satellite's three energy bands. Due to the limited spectral information, the accuracy of the absorption coefficients estimated for individual sightlines is quite low. For this reason it is recommended by Andrae et al. (2018) to not use the information of individual sightlines but only the joint information of several sightlines. Even though the data quality of individual sightlines is rather low, the sheer amount of data points and the accuracy of the parallaxes outweigh this limitation as our work shows.
Article number, page 1 of 12 arXiv:1901.05971v1 [astro-ph.GA] 17 Jan 2019 So far, 3D dust reconstruction has never been performed using solely Gaia data, instead Gaia data has been used for its accurate parallax measurements only and the more accurate spectral information of other surveys was used (e.g. Lallement et al. (2018)).
In this paper we present a 3D dust reconstruction using Gaia DR2 data only. The inference of the unknown dust density from the absorption data is performed by a critical filter, a method for Gaussian process regression with non-parametric kernel learning, first published in Enßlin & Frommert (2011). Since then there were several improvements of this method allowing for a stable and fast inference. The relevant numerical method used in this paper is outlined in Knollmüller & Enßlin (2019). For a theoretical discussion of the underlying inference framework of information field theory we refer to Enßlin (2018). The algorithm was implemented using the Python package NIFTy5, which is the newest version of the software package NIFTy (Selig 2013;Steininger et al. 2017) In Sec. 2 we discuss which part of the Gaia data we used. We introduce our statistical model of the interstellar dust density as well as of the measurement in Sec. 3. The results of our dust density reconstruction are presented in Sec. 4. This section also contains a brief recommendation on how to use our results. Our dust reconstruction is compared to other 3D dust density reconstructions in Sec. 5. In Sec. 6 we summarize the findings of this paper. The appendix contains further technical details on our statistical model.

Data
We used the data from the Gaia DR2 catalogue by Brown et al. (2018), to reconstruct the galactic dust in the nearby interstellar medium. From the Gaia data archive we extract the parallaxes, the G-band absorption, the latitude and longitude as well as their respective uncertainties. A plot of the full Gaia absorption data set can be seen in Fig. 1.
We select sources according to the following criteria: 1. the above mentioned data are available for the source 2. the parallax ω is inside a 600pc cube around the Sun 3. the relative parallax error is sufficiently low, ω/σ ω > 5 4. Priam flag 0100001 or 0100002 The last two criteria are suggested by Andrae et al. (2018). There are about 3.7 million stars selected by these criteria. Fig. 1 shows a sky average of the data points used in the reconstruction. In this data plot one can observe structures present also in other dust maps, for example the Planck dust map (Fig 1).

Algorithm
The algorithm is derived from Bayesian reasoning. In Bayesian reasoning information some data d provides about a quantity of interest s is calculated according to Bayes theorem: In our case, the main quantity of interest is the logarithmic Gband dust absorption cross-section density s = ln(αρ/pc), henceforth called the logarithmic dust density. Hereby ρ denotes the 1 The version of NIFTy used for this reconstruction will be released in February 2019 on https://gitlab.mpcdf.mpg.de/ift/NIFTy . . The color-scheme was saturated to visually match that of the average Gaia dust absorption data seen in the second plot.
The second plot shows all Gaia data. The third plot shows data within a 600pc cube centered on the Sun; the data used in this paper. The scale is natural logarithm of the absorption data in magnitudes. Data points in the same direction were averaged.
actual dust mass density and α the average G-band dust cross section. The value of α is uncertain, which is why we report absorption densities instead.
Article number, page 2 of 12 R. H. Leike and T. A. Enßlin : Charting nearby dust clouds using Gaia data only Calculating the evidence term P(d) = ds P(d|s)P(s) in Eq. (1) is generally not tractable for high-dimensional s, we approximate the posterior with a Gaussian The approximation is obtained by minimizing the Kullback-Leibler divergence (Kullback & Leibler 1951) This approach is known as variational Bayes (Nasrabadi 2007) or Gibbs free energy approach (Enßlin & Weig 2010). The approach followed in this paper is described by Knollmüller & Enßlin (2019). More algorithmic details are described in appendix A, but the general approach can be summarized as follows: 1. Calculate the negative log-probability for the problem, disregarding normalization terms like the evidence 2. Perform coordinate transformations of the unknown quantities until those are a-priori Gaussian distributed with unit covariance. (Kucukelbir et al. 2017;Knollmüller & Enßlin 2018) 3. Choose the class of approximating distributions to be Gaussian with variable mean m and covariance where M m is the Fisher information metric at the current m.
Here 1 is the contribution of the prior which was transformed in step 2 to have unit covariance. (Knollmüller & Enßlin 2019) 4. Minimize Eq. (4) with respect to m using a second order scheme with the covariance D of Q as curvature. The expectation value with respect to Q is hereby approximated through sampling. This is also equivalent to the natural gradient descent for the parameters of the Gaussian Q.
As there is some ambiguity how to perform steps 1. and 2. of this recipe, a detailed description of our log posterior probability, prior choices and the coordinate transformations that were made can be found in appendix A

Likelihood
The likelihood P(d|s) can be split into two parts, one part states how the true absorption depends on the dust density and one part that states how the actual data is distributed given the true absorption on that line of sight. The first part, which we call the response R, states how the unknown dust absorption density ρ imprints itself on the data. The absorption of light on the i-th line of sight L i is given by the line integral Here α is the average dust cross section per unit of mass and the line of sight L i = L i (ω) is dependent on the true parallax ω. As noted in subsection 3.1, the value of α is uncertain and we reconstruct the dust absorption density s = ln αρ/pc instead. The absorption is additive because the absorption data are given in the magnitudes scale, which is logarithmic. The true parallax ω of the star is uncertain. We assume the true parallax ω to be Gaussian distributed around the published parallax ω with a standard deviation equal to the published parallax error σ ω : The parallaxes of Gaia DR2 were shown to be Gaussian distributed with incredible reliability by Luri et al. (2018). However, it was also noted by the same authors that there can be outliers. By restricting ourselves to close-by sources for which G-band absorption values are published, we expect to have cut out most of the outliers. We do not reconstruct the actual positions of the stars in our reconstruction, thus we have to marginalize them out to obtain the response: Here denotes the survival function of a standard normal distribution and denotes the indicator function for the closed interval [a, b]. Note that we did an approximation where we replace the true absorption by the expected absorption. As a consequence the lines of sight are smeared out by the parallax uncertainty in our approximation. A fully Bayesian analysis would treat the true parallax as unknown and infer these along the other unknowns, but this is beyond the scope of this work. For the algorithm, the integral in Eq. (7) is discretized into a weighted sum, such that each voxel contributes to the line integral over L i exactly equal to the length of the line segment of L i within that voxel while being discounted by the probability P(l| ω, σ ω ) of that voxel being on the line of sight. Applying the response R thus takes O(N data N side ) operations, where N data = 3661286 is the number of data points used in the reconstruction and N side = 256 is the number of voxels per axis. Due to the large number of data points, evaluating the response on a computer turns out to be numerically expensive. The inference algorithm (see Sec. 3.1) is a minimization for which this response has to be evaluated many times. To make the dust inference feasible in a reasonable amount of time we restricted our reconstruction to a 600pc cube centered on the Sun. Article number, page 3 of 12 A&A proofs: manuscript no. paper The second part of the likelihood states how the published data A G is distributed given the true absorptions (A G ) i . We use the data likelihood recommended by the Gaia collaboration in Andrae et al. (2018). This likelihood assumes the data A G to be distributed according to a truncated Gaussian with a global variance N = 0.46 mag 2 1. This leads to the likelihood Here cdf G (R(ρ) i ,N ii ) denotes the cumulative density function of a normal distribution with mean R(s) i and variance N ii . We took the boundaries of the truncated Gaussian to be A min G = 0 and A min G = 3.609 mag as recommended in Andrae et al. (2018).

Prior
We assume the dust density to be a positive quantity that can vary over orders of magnitude. The dust is assumed to be spatially correlated and statistically homogeneous and isotropic. The statistical model is constructed to be as general as possible with these two properties in mind. For more algorithmic details on the prior we refer to appendix A.

Results
We reconstruct the dust density in a 600pc cube using 256 3 voxels, resulting in a resolution of (2.34pc) 3 per voxel. For our reconstructed volume we also infer the spatial correlation power spectrum of the log-density, see Fig. 2. The unit of the y-axis is pc 3 . The functions can be interpreted as the a-priori expected value of |Fρ| 2 /V, where V is the volume the density ρ is defined on and F is the Fourier transform. The region between 0.0008/pc and 0.426/pc is almost power-law like with a slope of 3.1, the spectral index of the power law.
In Fig. 3b one can see a projection of the reconstructed dust onto the sky in galactic coordinates. Fig. 4b shows the corresponding expected logarithmic dust density.
In Fig. 3a the projection of the dust reconstruction on the galactic plane is shown. This view is especially interesting to study the dust morphology as this projection introduces no perspective-based distortion. It is especially suited to spot underdense regions such as the local bubble in high resolution. A logarithmic plot of the projection on the galactic plane can be seen in Fig. 4a.
We provide posterior uncertainty estimation via samples. One should note that these uncertainties might be underestimated due to the variational approach taken in this paper. One can see a map of the expected posterior variance of the sky projection in Fig. 5a and in the plane projection in Fig. 6a.

Using the reconstruction
The results of the reconstruction are provided online on https://wwwmpa.mpa-garching.mpg.de/~ensslin/ research/data/dust.html and can be used under the terms of the ODC-By 1.0 license. Proper attribution should be given to this paper as well as to the Gaia collaboration (Gai 2019).
We give an overview of known systematic effects and advice on how to use the provided dust map.
-We do not recommend to use the outer 15pc of the reconstruction. Periodic boundary conditions were assumed for algorithmic reasons, which leads to correlations leaking around the border of the cube. The inferred prior correlation kernel (Fig. A.3) suggests that correlations are vanishing after 30pc. -We provide posterior samples. When doing further analysis of our reconstruction we recommend doing so for every sample in order to propagate errors. -We anticipate a perception threshold that leads to the absence of extremely low density dust clouds. The two main reasons for this are that the truncated Gaussian likelihood provides less evidence in the regime where the absorption is 0 and that variational Bayesian schemes are known to underestimate errors. Studying a larger volume will shed further light on this subject, as sightlines for more distant stars still provide information about nearby dust clouds. One should note that the overall Gaia absorption data provides 20 times more sightlines that were used in this reconstruction.

Discussion
We compare our result to other dust maps. Unfortunately, comparisons to 2D dust maps are only possible on a qualitative level, since it is not clear whether structures visible in the 2D maps that are not present in the 3D map are simply further away or are too noisy in the data for the algorithm to pick them up. On a qualitative level it is possible to see several morphological similarities of our reconstruction in Fig. 3b to the Planck dust map (et al 2018) in Fig. 1. These figures also show that many dust structures that are not inside the galactic plane are local features. The two 3D dust maps mentioned in Sec. 1 permit a more thorough analysis. A compilation of projected dust densities for our reconstruction as well as the reconstruction of Lallement et al. (2018) and Green et al. (2018) can be seen in Fig. 3. A logarithmic version of this figure is provided by Fig. 4.
While our map seems to agree on large scales with the other maps, there seems to be a pronounced tension in the predictions The Sun is at coordinate (0, 0), the galactic center is located towards the bottom of the plot, and the galactic west is on the left. The right column shows all-sky dust maps of the same region, projected towards the location of the Sun. The first row is the result of the reconstruction discussed in this paper, the second row is the reconstruction performed by Lallement et al. (2018), the last row shows the reconstruction by Green et al. (2018).
Article number, page 5 of 12 A&A proofs: manuscript no. paper of the position of some dust clouds compared to the reconstruction of Lallement et al. (2018). Compared to the map of Green et al. (2018) we recover the small scales significantly better and suffer far less from radial smearing. This is probably due to the high quality parallax data from Gaia DR2. Additionally our reconstruction takes the full 3D correlation structure into account. Our reconstruction as well as the reconstruction of Green et al. (2018) permit quantifying uncertainties using samples. A plot of uncertainties of the dust density reconstructions projected into the galactic plane can be seen in Fig. 6. Uncertainties of the dust density reconstructions in the sky projection can be seen in Fig. 5.
To quantify the dynamic range of the reconstruction and as a prediction on the variability of the logarithmic dust density we calculated histograms of dust density which show how many voxels have which dust density. These histograms can be seen in Fig. 7a. One can see that the histogram of our reconstruction extends slightly more towards high dust densities and substantially towards low dust densities. This is possibly because our reconstruction is more sharply resolved, thus regions of high dust density get captured better and bleed less into the regions where dust is absent.
We characterize how much pairs of those reconstructions agree by the heatmaps of their voxel-wise value pairs. These heatmaps can be seen in Fig. 7. For two perfectly agreeing reconstructions the heatmap would show a line with slope 1. Again it can be seen that the dust density in our reconstruction varies significantly more than in the two other reconstruction. While all maps agree more or less for high dust densities, our dust map predicts vastly more low dust densities.
The reconstruction of Lallement et al. (2018) is performed using Gaussian process regression, as is ours. Thus one can compute the prior Gaussian process correlation power spectrum used in their reconstruction. Fig. 2 shows both our inferred power spectrum as well as their assumed power spectrum. These two power spectra disagree strongly, even when taking into account the uncertainties. Especially in the higher Fourier modes, the spectrum assumed by Lallement et al. (2018) is below ours by orders of magnitude and should have let to a suppression of small scales in their reconstruction. On the other hand, the power spectrum assumed by Lallement et al. (2018) on large scales is orders of magnitudes above the one we infer. This might have caused the reconstruction of Lallement et al. (2018) to be overconfident on intermediate and large scales, possibly explaining the presence of dust structures where our reconstruction seems to exclude the presence of substantial amounts of dust.
One can empirically compute power spectra of the dust density using a Fourier transformation. A comparison plot with all the three mentioned reconstruction can be found in Fig. 8. This shows a white noise floor in the reconstruction of Green et al. (2018), which can visually also be seen as small scale structures in the plane projections shown in Figs. 3 and 4.

Conclusions
1. We provide a highly resolved map of the local dust density using only Gaia data. This map agrees on large scales with previously published maps of Lallement et al. (2018) and Green et al. (2018), but also shows significant differences on small scales. It shows many structures visible in the Planck dust map (et al 2018). 2. In comparison to previous maps, we were able to improve on 3D resolution while still being mostly consistent on the large scales. A comparison to 2D maps like the Planck dust map seems to confirm the features present in our map. 3. We find that the logarithmic density of dust exhibits a 3D spectral index of 3.1, corresponding to a 1D index of 1.1. This is a significantly harder spectrum as that of a passive tracer expected in Kolmogorov turbulence, which would be 5/3. The harder spectrum is probably caused by the sharp edges of the local bubble and other ionization or dust evaporation fronts. 4. In contrast to other dust reconstructions, we predict very low dust densities inside the local bubble. It is not yet clear whether this discrepancy is an artifact of our reconstruction or of the other existing reconstructions. 5. We hope that by providing accurate reconstructions of the nearby dust clouds, further studies of dust morphology will be possible as well as the construction of more accurate absorption models for photon observations in a large range of frequency bands.  Fig. 3, or dimensionless for logarithmic uncertainties. The first column shows the variance for the dust absorption and the second column shows the variance of the logarithmic projected dust density on natural log-scale which can be interpreted as a relative error.
Article number, page 9 of 12 A&A proofs: manuscript no. paper Article number, page 11 of 12 A&A proofs: manuscript no. paper

Appendix A: algorithmic details
The algorithm that was used for the reconstruction is based on work by Knollmüller & Enßlin (2019), where imaging of a positive quantity by simultaneous inference of the quantity itself and its log-correlation kernel is discussed. Many aspects of the algorithm are already discussed by Knollmüller et al. (2017).
We take the dust density ρ to be log-normal distributed with is assumed to be Gaussian distributed with Gaussian process kernel S . By using an exponentiated Gaussian process we allow the dust density to vary by orders of magnitude while simultaneously ensuring that the dust density is a positive quantity. Note that taking a Gaussian distribution instead of a log-normal introduces spurious regions of negative dust and artificially smooths out high density regions, see e.g. Kh et al. (2018). In Eq. (A.2) S is the prior covariance. If we assume no point or direction to be special a-priori, then according to the Wiener-Khinchin theorem S can be fully characterized by its power spectrum. We non-parametrically infer this power spectrum as well. Fig. A.2 depicts the dependency of the known absorption data A G in the inferred logarithmic dust density ln(ρ) and the inferred power spectrum S kk . Our statistical model for the power spectrum S kk is a falling power law with Gaussian distributed slope and offset as well as differentiable non-parametric deviations from it. Fig. A.1 shows a few examples of prior samples of power spectra using our statistical model. Reconstructing the power spectrum is equiv- alent to reconstructing the correlation kernel. We show our reconstructed normalized kernel as well as the one assumed by Lallement et al. (2018) in Fig. A.3. Certain biases can appear when using a fixed kernel, for example introducing a characteristic length scale of the order of the FWHM of the kernel.
A G |A G A G |ρ log(ρ)|S kk S kk Fig. A.2: Graphical representation of the data model for our reconstruction. The logarithmic dust density log(ρ) is a Gaussian process with a smooth Gaussian process power spectrum S kk . The true absorptions A G are directly dependent on the dust density ρ on each line of sight. The measured absorptions A G are assumed to be distributed around the true absorptions A G following a truncated Gaussian distribution as described in section 3.2. One can see that the dust is assumed to be strongly correlated at a distance scale of up to about 30pc. This plot shows normalized one dimensional cuts through the three dimensional Fourier transform of the log-normal spatial correlation power spectrum shown in Fig. 2.