Open Access
Issue
A&A
Volume 659, March 2022
Article Number A115
Number of page(s) 21
Section Cosmology (including clusters of galaxies)
DOI https://doi.org/10.1051/0004-6361/202140291
Published online 16 March 2022

© P. Pranav 2022

Licence Creative CommonsOpen Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

At the epoch of recombination, matter and radiation separate, allowing radiation to stream freely in the Universe. This free-streaming radiation permeating the Universe, which we observe as the cosmic microwave background (CMB) radiation, encodes a treasure trove of information about the initial conditions in the Universe (Ryden 2003; Jones 2017). Although it has a remarkably consistent average temperature, the CMB still exhibits tiny deviations of about 10−5 from the background average. The temperature fluctuations in the CMB trace the fluctuations in the underlying matter distribution in the early Universe that are linked to the spontaneous quantum fluctuations generated in an otherwise homogeneous medium (Harrison 1970; Peebles & Yu 1970). Studying the properties of the temperature fluctuations in the CMB is therefore essential for understanding the properties of the primordial matter field.

The lambda cold dark matter (LCDM) paradigm is the standard paradigm of cosmology. Together with the inflationary models in their simplest form (Starobinsky 1982; Guth & Pi 1982), the standard model of cosmology predicts the nature of the primordial stochastic matter distribution field to be that of an isotropic and homogeneous Gaussian random field (Harrison 1970; Guth 1981). This prediction finds allies theoretically in the central limit theorem, and observationally in the various measurements of the CMB temperature anisotropy field via ground- and space-based probes such as the BOOMERanG balloon-based experiment (Masi 2002) and the Wilkinson Microwave Anisotropy Probe (WMAP) satellite (Bennett et al. 2013). The latest endeavor of measuring the CMB temperature anisotropies was the launch of the Planck satellite, which boasts of the highest resolution in measurements to date. The resolution is at scales of a few arcminutes (Planck Collaboration I 2020). Despite the general consensus that the CMB exhibits the characteristics of an isotropic and homogeneous Gaussian random field, a growing body of evidence shows anomalies in the observed CMB field with respect to the base model. They include in particular the observed hemispherical asymmetry in the CMB power spectrum (Eriksen et al. 2004a) as well as the alignment of low multipoles (Copi et al. 2015); see Schwarz et al. (2016) for a review. These observed anomalies raise doubts about the assumption of statistical isotropy and homogeneity, respectively.

Testing the assumption of Gaussianity requires tools that encode information about higher orders. Traditional endeavor in this direction has focused on higher-order correlation functions (Durrer et al. 1996), which are generally extremely resource-intensive computationally (Planck Collaboration XVII 2016). Recently, attention has turned toward developing alternative tools beyond the correlation functions and multispectra, which may potentially encode information of all orders. The principal tools in this regard have arisen from integral geometry and involve computing the Minkowski functionals or the Lifshitz-Killing curvatures (Adler 1981; Mecke et al. 1994; Kerscher et al. 1996; Schmalzing & Gorski 1998; Sahni et al. 1998; Codis et al. 2013; Ducout et al. 2013; Matsubara 2010; Chingangbam et al. 2017; Pranav et al. 2019b; Appleby et al. 2021). The jth Minkowski functional and (D − j)th Lifshitz-Killing curvature of a D-dimensional manifold 𝕄 are related by , where j = 0, …, D, and ωj is the volume of the j-dimensional unit ball. There are d such quantifiers for a D-dimensional set, where d = 0, …, D. All but one are purely geometrical quantities that are related to the d-dimensional volume of the manifold. The exception is the 0th Lifshitz-Killing curvature, or equivalently, the Dth Minkowski functional, which is related to a purely topological quantity, the Euler characteristic (Euler 1758; Adler 1981; Pranav et al. 2019b; Telschow et al. 2019), via Gauss’s Theorema Egrerium (Gauss 1900; Adler 1981; Pranav et al. 2019b). The Minkowski functional computations of the CMB have consistently shown the observations to be congruent with the standard model (Planck Collaboration XVI 2016).

More recently, developments in computational topology have paved the way for extracting topological information from datasets at the level of homology (Munkres 1984; Edelsbrunner & Harer 2010; van de Weygaert et al. 2011; Pranav 2015, 2021a; Pranav et al. 2017; Moraleda et al. 2019) and its hierarchical extension, persistent homology (Edelsbrunner & Harer 2010; Pranav et al. 2017; Shivashankar et al. 2016; Adler et al. 2017; Pranav 2021b; Heydenreich et al. 2021). Topological data analysis (TDA) involving homology and persistent homology has recently started finding application in astrophysical disciplines, for example, in the context of structure identification (Shivashankar et al. 2016; Xu 2019) and quantification of large-scale structures (Kono et al. 2020; Wilding et al. 2021), including detection and quantification of non-Gaussianities (Feldbrugge et al. 2019; Biagetti et al. 2021). Homology describes the topology of a space by identifying the holes and the topological cycles that bound them. A d-dimensional space may contain topological cycles of 0 up to d dimensions. The cycles and holes are associated with the homology groups of the space. The pth Betti number, βp, is the rank of the pth homology group, ℍp, p = 0…d. While itself a purely topological quantity, the Euler characteristic is also the alternating sum of the Betti numbers of all ambient dimensions of a manifold, as denoted by the Euler-Poincaré formula (Adler 1981; Pranav et al. 2017, 2019b). The Euler characteristic has a long history in the analysis of cosmological fields (Gott et al. 1986; Pogosyan et al. 2009; Park et al. 2013; Appleby et al. 2020).

Building on and refining existing tools from computational topology in the context of analyzing the CMB field, this paper presents the homology characteristics of the temperature fluctuation maps of the cosmic microwave background obtained by the Planck satellite (Planck Collaboration I 2020). We perform our experiments on the fourth and final data release, Planck 2020 Data Release 4 (DR4), which is based on the NPIPE data-processing pipeline (Planck Collaboration Int. LVII 2020). The NPIPE dataset represents a natural evolution of the Planck data-processing pipeline, integrating the best practices from the LFI and HFI pipelines separately. The result is an overall amplification of signal and reduction in the associated systematic, noise, and residuals at almost all angular scales (Planck Collaboration Int. LVII 2020). For comparison, we also present results for the Planck 2018 (DR3; Planck Collaboration I 2020), which is based on the Full Focal Plane (FFP) data processing pipeline (Planck Collaboration XII 2016), resulting in the FFP10 simulations (Planck Collaboration IV 2020). The paper follows the spirit of Pranav et al. (2019a) in methods and analysis, where we present results for the intermediate Planck 2015 (DR2; Planck Collaboration XVI 2016); also see Adler et al. (2017). The novel aspect of the methods presented here and in Pranav et al. (2019a) is an analysis pipeline that takes regions with unreliable data on 𝕊2 into account. In the case of CMB, this is reflected in the obfuscation effects of the measurements that are due to foreground objects such as our own galaxy, as well as other extra- and intragalactic foreground sources. We masked these regions and computed the homology of the excursion sets relative to the mask.

We present a brief description of the topological background in Sect. 2, followed by the results in Sect. 3. We discuss the ramifications of the results and conclude the main body of the paper in Sect. 4. The appendices present a brief description of the datasets and the computational pipeline as well as a validation for the statistical tests.

2. Topological background

Commensurate with our intention of analyzing the topology of the CMB temperature fluctuations, we restricted ourselves to topological definitions on 𝕊2 (Pranav et al. 2019a) and invoked relative homology to account for analysis in the presence of masked regions. The standard reference for this section is Edelsbrunner & Harer (2010); also see Pranav et al. (2019a) for discussion in the context of CMB, as well as Heydenreich et al. (2021) in the context of cosmic shear fields.

2.1. Homology characteristics of excursion sets of 𝕊2

Denoting the CMB temperature fluctuations on 𝕊2 as f: 𝕊2 → ℝ, we define the excursion set1 at a temperature ν as the subset of 𝕊2 where the temperature is higher than or equal to ν,

(1)

In the cosmological setting, the usual practice is to examine the dimensionless threshold, which is the mean-subtracted and variance-scaled field derived from the original field, in which case, ν = (f − μf)/σf, and μf and σf are the mean and the standard deviation of the field f. If 𝔼(ν) does not cover the entire 𝕊2, it may be composed of isolated components and holes. Figure 1 presents a visualisation of the temperature fluctuations in the CMB sky smoothed at 5 degrees, while Fig. 2 presents excursion sets corresponding to two different thresholds. For high thresholds, presented in the left panel, the excursion set is dominated by components, while for low thresholds, presented in the right panel, the excursion set is dominated by a few large connected objects indented with holes, which are bounded by loops. The Betti numbers β0 and β1 count the number of independent components and loops of the excursion set, respectively. In general, for a d-dimensional topological space, βp is the rank of the pth homology group, Hp; p = 0, …, d, and counts the number of independent p-dimensional cycles (Munkres 1984; Edelsbrunner & Harer 2010; Pranav et al. 2017). If 𝔼(ν) does not cover the entire 𝕊2, the number of independent loops is one less than the total number of loops. If 𝔼(ν) covers the entire 𝕊2, there are no loops, and β2 = 1, because of the void that is enclosed by the boundary-less surface of the sphere. A related quantity that has a long history of usage in cosmological analyses is the Euler characteristic, or alternatively, the genus (Gott et al. 1986; Park et al. 2013), which is the alternating sum of the Betti numbers of the excursion set,

(2)

thumbnail Fig. 1.

Visualization of the temperature fluctuations in the CMB sky. The survey surface 𝕊2 is distorted at each point in the direction of the surface normal. The distortion is proportional to the fluctuation in direction and magnitude. The visualization is based on the observed CMB sky cleaned by the NPIPE pipeline and smoothed at 5 degrees.

thumbnail Fig. 2.

Cosmic microwave background sky thresholded at moderately positive (left) and negative (right) levels. For high thresholds, the excursion set is dominated by isolated components, while at low thresholds, it gives the appearance of a single connected surface indented by numerous holes. For sufficiently low thresholds, the holes fill up, and the excursion set covers the entire sphere, which is composed of a connected surface without a boundary that encloses a single void (cf. Fig. 1).

The Euler characteristic also has a geometric interpretation as one of the Lifshitz-Killing curvatures of the manifold (Adler 1981; Pranav et al. 2019a).

2.2. Masks and relative homology

The measurement of the CMB signal is unreliable in certain parts of the sky due to interference from bright foreground objects. These include extended objects such as our galaxy, as well as bright point sources. We masked these regions and computed the homology characteristics of the excursion set relative to the mask. Figure 3 presents a visualization of the masked CMB sky. Letting M ⊆ 𝕊2 be the mask and 𝔼(ν) the excursion set, we considered the relative homology of the pair of closed spaces, (E, M), where E = 𝔼(ν) and M = M ∩ 𝔼(ν). We note that M is contained in E and is an open set by definition in this context. We denote the rank of the relative homology groups of the pair (E, M) by bp = rank Hp(E, M); p = 0, 1, 2. The Betti numbers computed considering the pair (E, M) are different from the Betti numbers of the excursion set without a mask. For a more detailed discussion about relative homology in the context of masked CMB sky, see Pranav et al. (2019a). The relative Euler characteristic, as in the case of absolute homology, is the alternating sum of the rank of relative homology groups,

(3)

thumbnail Fig. 3.

Visualization of the temperature fluctuations in the CMB sky in the presence of masks (plotted in gray). It consists of an equatorial belt and numerous patches on the northern and southern cap, which correspond to our galaxy and other bright foreground objects. The visualization is based on the observed CMB sky cleaned by the NPIPE pipeline and smoothed at 5 degrees.

3. Results

In this section, we present our results in terms of the ranks of homology groups, bp, p = 0, 1, as well as the Euler characteristic, ECrel, relative to the mask. Section 3.1 presents results from the DR4 NPIPE dataset based on 600 simulations, obtained using the SEVEM component separation pipeline. Similar results for the DR3 FFP10 dataset based on 300 simulations, obtained using the SMICA component separation pipeline, are presented in Sect. 3.22. The latter dataset facilitates a direct comparison with the topo-geometrical studies of the CMB performed by the Planck team in Planck Collaboration VII (2020). For both datasets, we begin by examining the graphs of b0, b1, and of the (relative) Euler characteristic, ECrel. This is followed by statistical tests that estimate the significance of results, taking into account all of the levels together for a given resolution and scale. We employed the standard χ2-test in the empirical and theoretical settings, as well as the model-independent Tukey depth test for this purpose. We chose a-priori levels k, where k = k/2, setting kb0[0 : 6],kb1 = [ − 6 : 0],kECrel = [ − 6 : 6]. We did this to restrict ourselves to analyzing b0 for positive thresholds, b1 for negative thresholds, and ECrel across the full threshold range. The choice of regions is determined by the fact that b0(ν) tends to be small and carries little information for ν < 0, b1(ν) tends to be small for ν > 0, while the Euler characteristic is informative over the full range of levels. These levels are consistent with the levels investigated in Planck Collaboration XVI (2016) and Planck Collaboration VII (2020).

3.1. Planck 2020 Data Release 4: NPIPE dataset

In this section, we analyze the graphs of the topological descriptors for the NPIPE dataset. Treating the topological descriptors as discrete random variables, we compute the mean, μsim, and the standard deviation, σsim, from the simulations in the usual sense for each of the levels. We use them to assess the model-independent significance of the difference between the simulations and observations.

3.1.1. Graph of b0

Figure 4 presents the graphs of b0 for the various degraded resolutions and the associated smoothing scales. The mean and the standard deviation are computed from the simulations. In general, we find that the significance across resolutions and levels is approximately 2σ or lower.

thumbnail Fig. 4.

Graphs of b0 for the NPIPE dataset for various degraded and smoothing scales. Panel a: observational curve in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: curve for the significance of the differences, where the observed curves are presented in red, and the values for the simulations are presented as dotted gray lines.

3.1.2. Graph of b1

Figure 5 presents the graphs of b1 for the various resolutions and corresponding scales. For resolutions between N = 2048, …, 64, corresponding to FWHM = 5′,…,160′, we observe the significance to be approximately 2σ or lower. However, for the next higher smoothing scales corresponding to FWHM = 320′ (approximately 5 degrees) and FWHM = 640′ (approximately 10 degrees), we note interesting deviations that we examine next. These cases are also presented in Fig. 6 in an enlarged view for clarity.

thumbnail Fig. 5.

Graphs of b1 for the NPIPE dataset for various resolutions and smoothing scales. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray, while panel b presents the significance of the differences. Error bands corresponding (1σ : 4σ) are also presented in various colors.

Concentrating on the top left panel of Fig. 6, at a Gaussian smoothing scale of FWHM = 320′, approximately 5 degrees, and at a moderately low dimensionless density threshold ν = −2.5, we find a 3.91σ deviation between the simulations and the observation. We examine the behavior of b1 at this level further in Fig. 7, where we present the histogram of the distribution of b1 from the simulations in gray boxes. The observed value is indicated by a red vertical line. Evidently, the distribution is nonsymmetric and hence non-Gaussian. Our tests also indicate that the Poisson distribution is a poor fit to the data as well. We detect 28 loops in the observational curves, compared to the average of ∼21 loops in the simulations, with a standard deviation of ∼1.78. Within the Gaussian context, this would yield the significance of the difference at approximately 4σ. However, since the distribution in this bin is distinctly non-Gaussian and does not obey Poisson statistics, ascribing a σ significance in the usual sense is not viable in this case. As a result, we simply note that the significance is higher than what may be resolvable by 600 simulations, yielding an empirical p-value of at most 0.0016. While the generally low number of loops in both simulations and observations, owing to the large scale of probing, push us to the regime of small numbers, the behavior of the statistics indicates a stable regime.

thumbnail Fig. 6.

Graph of b1, selectively for N = 32 and N = 16, corresponding to FWHM = 320′ and FWHM = 640′, presenting an enlarged view of the concerned resolutions. The observational curve is presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 4σ) are also drawn. The bottom two panels present the visualization of the scalar temperature field in order to facilitate an appreciation of the structure of the field.

thumbnail Fig. 7.

Histogram of b1 at ν = −2.5, where the ∼4σ deviation occurs. The minimum value attained across 600 simulations is 18, with a mean at ∼21 and a standard deviation of ∼1.7. The observed map exhibits b1 = 28 and is well outside the distribution.

Further evidence in support of this argument is presented in Appendix C, where we examine the distribution of χ2 and Tukey depth from the simulations, and compare them with the observations. In general, for the χ2-test, we find good agreement between the histogram of simulations and the theoretical curve. Therefore, the highly significant deviation merits consideration. The deviant behavior of loops in the earlier analysis of Planck 2015 Data release 2 (DR2) presented in Pranav et al. (2019a) at similar scales is also noteworthy in this context, as is the deviant Euler characteristic from WMAP observational data reported by Eriksen et al. (2004b) and Park (2004). Figure 8 presents a visualization of some of these loops at moderately negative thresholds for observational maps smoothed at FWHM = 320′.

thumbnail Fig. 8.

Visualization of the loops surrounding the low-density regions at a moderately negative threshold. The gaps in the manifold correspond to the mask that has been removed. We clearly note the equatorial belt corresponding to our galaxy, and more patches in the visible cap. Some loops live fully in the excursion region that is not influenced by the mask. We show them in red. We also depict some representative relative loops whose end points are masked. We draw closed loops for this category as well and show the portions in which they overlap the masked regions in white. The visualization is based on the observed CMB map from the NPIPE data set smoothed at 5 degrees.

At the next higher smoothing scale of FWHM = 640′, approximately 10 degrees, presented in the right panel of Fig. 6, the shapes of the observational curve and the mean of the simulated curves for the negative thresholds, where topological loops are the dominant entities, are widely different. As differences in shapes may be a stronger indicator of inherent differences in the models than simply differences in the amplitudes, our assessment is that this case merits scrutiny as well. However, since the numbers involved are small, about 5 and fewer in some bins3, we consider the statistics emerging from this resolution as not stable and reject them as possible statistical fluke.

3.1.3. Graph of ECrel

Figure 9 presents the graphs of ECrel for the various resolutions and corresponding scales. The relative Euler characteristic also deviates for the resolutions where b0/b1, or both, deviate. However, the significance of the difference shows milder characteristics owing to cancellation effects (Pranav et al. 2019a). This is because the Euler characteristic is not strictly an independent quantity because it is an alternating sum of the ranks of the relative homology groups. It therefore merely reflects the deviations in the contributing Betti numbers.

thumbnail Fig. 9.

Graphs of ECrel for the NPIPE dataset. Panels a,b: similar information as Figs. 4 and 5. ECrel exhibits deviations commensurate with those in b0 and b1.

3.2. Planck 2018 Data Release 3: FFP10 dataset

In this section, we analyze the topological characteristics of the FFP10 dataset. We also compare our results for the Euler characteristic with those presented in Planck Collaboration VII (2020).

3.2.1. Graph of b0

Figure 10 presents the graphs for the topological components in panels a and b, similar to Fig. 4. In general, the observational values are within the 2σ band when compared with the simulations for almost all resolutions and levels. The exception is the number of components at N = 128, FWHM = 80′, where the observations deviate from the simulations at 2.96σ at the dimensionless density threshold ν = 0.5. The smoothing scale, approximately a degree, and the moderate threshold at which the deviation occurs ensure that the statistics are away from the low-number regime.

thumbnail Fig. 10.

Graph of b0 for the FFP10 dataset. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: significance of the differences. The dataset exhibits milder deviations than the NPIPE dataset in general. However, we note a 2.96σ deviation in the number of components between the simulations and observation at N = 128, FWHM = 80′.

3.2.2. Graph of b1

Figure 11 presents the graphs for the topological loops. The deviations between the observations and simulations are within approximately 2σ for most of the smoothing scales and thresholds. However, for the dimensionless threshold, ν = −2.5, we note a more than 2σ deviation for N = 64, FWHM = 160′ onward toward higher smoothing scales. For N = 32, FWHM = 320′, the deviation is 2.77σ, and similar for N = 16, FWHM = 640′. These deviations, while not as significant as exhibited in the NPIPE dataset, occur at similar scales and thresholds. In addition, we also note a more than 4σ deviation for N = 16, FWHM = 640′ at ν = −3. However, we reject this as a possible statistical fluke because of the extremely low numbers involved, which are about 5 or fewer.

thumbnail Fig. 11.

Graph of b1 for the FFP10 dataset. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 4σ) are also drawn. Panel b: significance of the differences. The dataset exhibits milder deviations than the NPIPE dataset in general, but there are mildly significant deviations at 2.77σ, at scales and thresholds where the NPIPE dataset shows strong deviations as well.

3.2.3. Graph of ECrel

Figure 12 presents the graphs of ECrel for the various resolutions and corresponding scales. The Euler characteristic shows commensurate deviations with respect to b0(b1) because of their contributions in the alternating sum.

thumbnail Fig. 12.

Graph of ECrel for the FFP10 dataset. In panel a, the observational curve is presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: significance of the differences. The Euler characteristic exhibits deviations commensurate with b0 and b1.

3.3. Statistical significance from combined thresholds

We considered the two methods detailed in Appendix A.2 to compute the statistics of the combined thresholds and present p-values of the observed maps for them. The first method is the Mahalanobis distance (Mahalanobis 1936), also known as the χ2-test, which works in model-dependent and in empirical settings. The second method is the nonparametric Tukey depth test. In Appendix C we examine the distribution characteristics of these statistics from the simulations. The simulation and theoretical curve agree well for the χ2 statistic. Despite this, we computed the p-values both theoretically and from the empirical distribution, and found that the empirical distributions yield a milder significance. We also find that the depth histograms and distributions are well behaved and represent a meaningful quantification. Considering all the three topological quantities b0, b1 and ECrel, we computed the statistics for all resolutions separately to ascribe a scale dependence to the signals.

3.3.1. NPIPE dataset

All entries before the last in Table 1, panel a, present p-values for the variables, computed from maps degraded at specific resolutions, with additional Gaussian smoothing applied. The χ2 results are presented for the theoretical and empirical distributions in the left and middle blocks, respectively. The results for the Tukey depth are presented in the rightmost block. When we consider all the three tests, the p-values computed from the empirical distribution using the χ2 statistic yield the most conservative estimate of the significance, and we favor it in order to be conservative in our interpretation. b1 shows a significant difference between the observational maps and the simulations for N = 32 and N = 16. Additionally, b0 also shows a significant difference at N = 32. However, we note in this context that the distributions in this case are manifestly non-Gaussian, and their form is poorly understood theoretically. As a result, the χ2 values may be regarded with caution. This also presents the case for admitting the nonparametric Tukey depth test, which detects an outlier event in this case, yielding a p-value of 0.0. The Euler characteristic shows highly significant difference at N = 32, which is an order of magnitude larger than either b0 or b1. This is due to the significant difference shown by the contributing b0 and b1 at this resolution, in combination with the fact that these deviations occur near the tail of the distribution. In general, because the Euler characteristic is an alternating sum of the Betti numbers, its behavior is influenced by both the Betti numbers. Cancellation effects are dominant if the deviations in the contributing Betti numbers occur in the zone of overlap, which is more toward median thresholds. If the deviations occur toward the tail, where either one of the Betti numbers is dominant, the signals of the contributing Betti numbers are amplified in the Euler characteristic signal. As an example, for the next lower resolution, N = 16, the Euler characteristic shows no significant difference even though b1 exhibits significant difference. This is because of the highly nonsignificant behavior of b0, whose contribution cancels the contribution of b1 toward the Euler characteristic. In many instances, the Tukey depth test yields p-values of 0.0 for all the descriptors, indicating that the observation is a true outlier compared to the simulations. We note the trend that these instances occur when the p-values computed from the Mahalanobis distance also exhibit generally low values. We help interpret the Tukey depth values in Appendix C, where we examine the trends in their distribution.

Table 1.

Two-tailed p-values for relative homology for the datasets.

3.3.2. FFP10 dataset

Results for the FFP10 dataset are presented in panel b of Table 1. We concentrate our analysis on the middle block, which presents the p-values from the χ2 statistic using the empirical distribution. For N = 32, FWHM = 320′, we note that all the topological descriptors show a nonsignificant deviation between the simulations and the observation, which is in contrast with the behavior at the same resolution in the NPIPE dataset, where the values are an order of magnitude lower. For the next larger scale of probing, N = 16, FWHM = 640′, we note a per mil significance of the difference between simulations and observations for the number of loops, which is an order of magnitude smaller than in the NPIPE dataset. Additionally, we also note the low p-value for b0 at N = 128, FWHM = 80′, which is the scale at which an approximately 3σ deviation occurs for the number of components between the simulations and the observation. When examining the Euler characteristic at this scale, we note that the values are mildly significant, in contrast with the NPIPE dataset, which exhibits nonsignificant behavior. The Tukey depth test shows the observations to be true outliers in instances at different resolutions for all the three variables, yielding p-values of 0.0. As in the case with the NPIPE dataset, these instances occur when the χ2 test also exhibits small p-values.

3.4. Comparison with earlier topo-geometrical results in the literature

Topo-geometrical studies aimed at testing isotropy, homogeneity, and Gaussianity of the cosmic fields, such as the CMB and the 3D density distribution, are standard in the cosmological literature. The principle tools employed for this purpose are the geometrical Minkowski functionals and the topological Euler characteristic. The bridge between topology and geometry is provided by the Theorema Egrerium of Gauss, which establishes the equivalence between the geometric 0th Lipschitz-Killing curvature (equivalently the Dth Minkowski functional), and the purely topological notion of Euler characteristic of a D-dimensional space. Our results, on the other hand, involve novel and purely topological notions arising from homology theory quantified by the Betti numbers. The Euler characteristic has connections to these topological measures due to the Euler-Poincaré formula, which expresses the Euler characteristic as the alternating sum of Betti numbers.

The pioneering works that examined the Minkowski functionals and Euler characteristic of the CMB are by Eriksen et al. (2004b) and Park (2004), performed on the WMAP data (Bennett et al. 2013). While Eriksen et al. (2004b) examined the whole set of Minkowski functionals as well as the Morse-theoretical concept of a skeleton length, Park (2004) restricted themselves to genus measurement, which is linearly related to the Euler characteristic. The purely geometric Minkowski functionals, namely the area functional and the contour length functional, show consistency with the standard model. In contrast, both works reported anomalous measurements of the Euler characteristic or genus. Eriksen et al. (2004b) reported that the Euler characteristic at negative density thresholds is anomalous with the base model simulations at more than 3σ for the northern hemisphere at smoothing scales of FWHM = 3.40°. The corresponding χ2 value reported for this scale is at the 95% confidence level, which translates into a p-value of 0.05. Park (2004) reported an anomalous genus at more than 2.5σ. The anomalous behavior of genus or the Euler characteristic, specifically at negative levels, is linked to and generated by the anomalous behavior of the first homology group, represented by topological loops, and establishes a connection between our results and previous results, showing a consistency across datasets.

The analysis of the Minkowski functionals and Euler characteristic on the Planck datasets was performed by the Planck collaboration itself and was reported in Planck Collaboration XVI (2016) for DR2 (FFP8) and in Planck Collaboration VII (2020) for DR3 (FFP10). Planck Collaboration VII (2020) examined the Minkowski functionals, and the graphs for the Euler characteristic were presented for N = 1024, 256, 32. The general indication is that the observational values are within the 2.5σ band computed from the simulations. Our results are largely consistent with this observation from the Planck analysis for the given resolutions, which is also exhibited by the nonanomalous p-values obtained from the empirical χ2 test. However, we also note that our results show stronger differences than the Planck results. In this context, it is important to note that in the Planck analysis pipeline, it has been the practice to combine all the Minkowski functionals to perform the statistical tests and subsequently present the combined p-values. Beyond the fact that this combination mixes signals from topo-geometrical descriptors that are a priori known to represent independent properties, it also prevents a direct comparison with our results of Euler characteristic computations.

In this context, another subtle but important point must be considered. In a detailed treatment, Pranav et al. (2019b) showed that the standard equations for Minkowski functionals used in cosmology are volume-normalized representations and represent exact computations only under specific conditions. The simplest case is a compact manifold without a boundary, for example, the complete 2-sphere in 2D, or a periodic 3D Euclidean grid. When the manifold has boundaries, for example, the masked CMB sphere, additional boundary terms are involved in the exact computation of the Minkowski functionals. For purely geometric quantities, due to their localized nature, these boundary terms may be ignored when the manifold is large compared to the boundary. For topological quantities, which are nonlocal by nature, there is an interplay between the size of the topological objects, represented by the smoothing scale, and the size of the manifold. As a result, in case of topological descriptors such as the Euler characteristic and the Betti numbers, for large scales, boundary effects become increasingly more important. A full exact computation that takes the boundary effect generated by the complicated mask into account may therefore be a more accurate reflection of reality. Our computational methods for the topological descriptors perform exact computations in the presence of arbitrary masks. For the purely geometric components of the Minkowski functionals, this will involve a computation via the full Gaussian Kinematic Formula (Adler & Taylor 2010; Pranav et al. 2019a), which takes the boundary terms into account that were, for example, developed in Fantaye et al. (2015).

4. Discussions and conclusion

We presented a topological analysis of the temperature fluctuations in the CMB in terms of homology. To account for regions with unreliable data, we computed the homology of the excursion sets relative to the mask covering these areas. We performed a multiscale analysis by degrading and smoothing the maps for a range of pixel resolutions, and a subsequent convolution with a Gaussian filter for a range of scales. We performed our analysis on the fourth and final NPIPE data release from the Planck team. The pipeline represents a natural evolution of the data-processing pipeline, commensurate with better understanding of systematics, residuals, and noise over a period of time in successive data releases. It incorporates the best strategies from the LFI and HFI processing pipeline, so that the level of noise and residuals at all scales is reduced (Planck Collaboration Int. LVII 2020). We also investigated the Planck 2018 Data release 3 (DR3), accompanied by the FFP10 simulations (Planck Collaboration IV 2020), for comparison and completeness. The present paper is a successor to Pranav et al. (2019a), where we investigated the topology of the temperature fluctuation maps based on the intermediate Planck 2015 Data Release 2 (DR2), accompanied by FFP8 simulations. These two investigated topological characteristics of the CMB temperature fluctuation maps for the latest three data releases by the Planck team. The overall trends in the results in the datasets show the consistency of the data processing pipeline and our own methods.

Examining the behavior of topological components or isolated objects represented by the 0D homology group, we find that the observations deviate by approximately 2σ or less from the simulations for the NPIPE dataset. For the FFP10 dataset, b0 exhibits a deviation of 2.96σ at N = 128, FWHM = 80′. The 1D homology group, representing and quantifying the topological loops, is also consistent with the base model at 2σ for most of the resolutions and scales. However, for FWHM = 320′ and at moderately low negative thresholds, we record a high deviation between the simulations and observations for the NPIPE dataset. In a Gaussian setting, this would amount to an approximately 4σ significance. However, the distribution characteristics at this threshold are manifestly non-Gaussian, rendering the usual σ significance nonviable. The distribution characteristics do not obey the Poisson statistics either. In view of this, and because we lack a true theoretical understanding of the distribution characteristics, we simply note that the significance is larger than what may be resolved by 600 simulations, yielding an empirical p-value of at most 0.0016. The FFP10 dataset exhibits a deviant behavior at ∼2.8σ at the same threshold and resolution for the loops. Although the deviation occurs in a regime in which the low threshold and large smoothing scale may indicate statistics with low numbers, we find the statistics to be well behaved and stable. The high significance value, combined with the fact that the deviation is at a scale at which anomalies have been reported in several methods and datasets (Eriksen et al. 2004b; Park 2004; Schwarz et al. 2016), the case merits consideration. However, in this context, we also note that the statistical analysis is based on merely 600 and 300 simulations for the NPIPE and FFP10 datasets, respectively. For the Euler characteristic, the NPIPE dataset exhibits a high significance due to the high significance exhibited by the loops. However, for the FFP10 data set, the Euler characteristic is within approximately the 2.5σ band computed from simulations, which is largely consistent with the observations in Planck Collaboration VII (2020). However, we note that our results generally exhibit stronger differences than the Planck analysis pipeline. While ascertaining the source of this discrepancy is beyond the scope of this paper, a plausible source of the difference may be the different methods adopted in estimating the Euler characteristic and Minkowski functionals in general in Planck Collaboration VII (2020), which are based on theoretical equations for boundary-less manifolds. At the next higher smoothing scale, N = 16, FWHM = 640′, both datasets exhibit a significantly anomalous behavior with respect to the number of loops, but not the number of components. However, we disregard this resolution because of the low numbers involved in computing the statistics, owing to the large smoothing scale and low thresholds.

In order to test for nonrandom discrepancies, we computed the p-values using the theoretical model-dependent χ2 test, as well as its nonparametric version computed from empirical distributions, by combining all relevant thresholds at a given resolution. We also presented the nonparametric and model-independent Tukey depth test, which indicated that the observations are true outliers with respect to the simulations in numerous instances. However, in the final interpretation, we focused on the most conservative p-values, estimated using the χ2 test from the empirical distributions. For the NPIPE dataset, the observations are consistent with the simulations, with p-values higher than 0.05 for most resolutions. However, the number of components and loops shows mildly significant deviations at N = 32, FWHM = 320′. The corresponding deviation for the Euler characteristic exhibits an order-of-magnitude lower value at per mil levels. For the next larger scale, we find that the components and the Euler characteristic are consistent with the base model, while the loops exhibit a mildly anomalous p-value of 0.03. While emphasizing that the statistics are based on low numbers, we note the contrasting behavior of the different topological descriptors. For the FFP10 dataset, the components and the Euler characteristic exhibit p-values higher than 0.01 for all resolutions. This is consistent with the observation in Planck Collaboration VII (2020) that the p-values obtained from the combined Minkowski functionals are consistent with the base model within the 99% confidence level. The number of loops exhibits a significantly anomalous behavior, yielding per mil p-values, in contrast with the number of components and the Euler characteristic, at N = 16, FWHM = 640. However, we note that this deviation occurs at large smoothing scales, where the numbers involved in computing the statistics are small. Disregarding this anomalous behavior of loops at large scales, which might be affected by low-number statistics, we find that the FFP10 dataset is consistent with the standard model simulations at the 99% confidence level.

In summary, while both datasets are largely consistent with the simulations based on the standard model, we find instances of interesting discrepancies in both datasets, which may be difficult to reject summarily as statistical flukes. Although most but not all of the anomalies occur on large scales, which inherently indicates statistics based on small numbers, we find them to be in statistically stable regimes in most instances. Based on the evidence presented in this paper, our assessment is that it may be a difficult task to accept or reject the null hypothesis summarily. A primary but crucial requirement may be a significantly larger number of simulations in order to achieve a clear verdict. In the absence of a true understanding of the origin of the possible anomalous behavior, any attempt to classify it will merely be speculative in nature, and we refrain from it. However, to facilitate a deeper understanding, we envision future research directed toward examining the fiducial frequency maps, as well as the polarization maps, among others, possibly with a larger suite of simulations. This will also involve testing smaller patches of the sky in order to determine whether the effect is global in nature or restricted to a specific part of the sky.


1

Excursion sets are also known as superlevel sets and their boundary thresholds as levelsets, or simply levels.

2

The Planck CMB maps are hosted at https://pla.esac.esa.int/. The secondary datasets encapsulating topological information of CMB maps are hosted at https://www.pratyushpranav.org/cmb_data/cmb_data_archive.tar.gz. The analysis codes are available at https://www.pratyushpranav.org/codes/topos2.tar.gz

3

The standard requirement for the validity of χ2 test is a minimum of five samples in each bin considered for computing the statistic.

Acknowledgments

I am indebted to the anonymous referee, whose incisive, yet extremely insightful comments have helped bring this draft to a balanced place. I am greatly indebted to Robert Adler, Thomas Buchert, Herbert Edelsbrunner, Bernard Jones, Armin Schwarzman, Gert Vegter, and Rien van de Weygaert for encouraging this solo venture, and for extremely helpful discussions. My gratitude also to Julian Borrill and Reijo Keskitalo for their patience in clarifying doubts, and their constructive comments on the draft. I also thank Tal Eliezri for insightful comments on the artwork. This work is supported by the ERC advanced grant ARThUs (grant no: 740021; PI: Thomas Buchert), with contributing influence from ERC advanced grant URSAT (grant no: 320422; PI: Robert Adler). I gratefully acknowledge the support of PSMN (Pôle Scientifique de Modélisation Numérique) of the ENS de Lyon, and the Department of Energy’s National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231, for the use of computing resources.

References

  1. Adler, R. 1981, in The Geometry of Random Fields, Classics in applied mathematics, (3600 Market Street, Floor 6, Philadelphia, PA 19104: SIAM), Society for Industrial and Applied Mathematics [Google Scholar]
  2. Adler, R. J., & Taylor, J. E. 2010, Random Fields and Geometry, Springer Monographs in Mathematics (Springer) [CrossRef] [Google Scholar]
  3. Adler, R. J., Agami, S., & Pranav, P. 2017, Proc. Nat. Acad. Sci., 114, 11878 [CrossRef] [Google Scholar]
  4. Appleby, S., Park, C., Hong, S. E., Hwang, H. S., & Kim, J. 2020, ApJ, 896, 145 [NASA ADS] [CrossRef] [Google Scholar]
  5. Appleby, S., Park, C., Pranav, P., et al. 2021, ArXiv e-prints [arXiv:2110.06109] [Google Scholar]
  6. Bauer, U., Kerber, M., Reininghaus, J., & Wagner, H. 2014, in Mathematical Software – ICMS 2014 (Berlin Heidelberg: Springer), 137 [CrossRef] [Google Scholar]
  7. Bennett, C. L., Larson, D., Weiland, J. L., et al. 2013, ApJS, 208, 20 [Google Scholar]
  8. Biagetti, M., Cole, A., & Shiu, G. 2021, JCAP, 2021, 061 [CrossRef] [Google Scholar]
  9. Chingangbam, P., Yogendran, K. P., Joby, P. K., et al. 2017, JCAP, 12, 023 [CrossRef] [Google Scholar]
  10. Codis, S., Pichon, C., Pogosyan, D., Bernardeau, F., & Matsubara, T. 2013, MNRAS, 435, 531 [NASA ADS] [CrossRef] [Google Scholar]
  11. Copi, C., Huterer, D., Schwarz, D., & Starkman, G. 2015, MNRAS, 449, 3458 [NASA ADS] [CrossRef] [Google Scholar]
  12. Ducout, A., Bouchet, F. R., Colombi, S., Pogosyan, D., & Prunet, S. 2013, MNRAS, 429, 2104 [NASA ADS] [CrossRef] [Google Scholar]
  13. Durrer, R., Gangui, A., & Sakellariadou, M. 1996, Phys. Rev. Lett., 76, 579 [CrossRef] [Google Scholar]
  14. Edelsbrunner, H., & Harer, J. 2010, Computational Topology - an Introduction (American Mathematical Society), 1 [Google Scholar]
  15. Eriksen, H. K., Hansen, F. K., Banday, A. J., Górski, K. M., & Lilje, P. B. 2004a, ApJ, 605, 14 [NASA ADS] [CrossRef] [Google Scholar]
  16. Eriksen, H. K., Novikov, D. I., Lilje, P. B., Banday, A. J., & Górski, K. M. 2004b, ApJ, 612, 64 [NASA ADS] [CrossRef] [Google Scholar]
  17. Euler, L. 1758, Novi Commentarii academiae scientiarum Petropolitanae, 4, 140 [Google Scholar]
  18. Fantaye, Y., Marinucci, D., Hansen, F., & Maino, D. 2015, Phys. Rev. D, 91, 063501 [NASA ADS] [CrossRef] [Google Scholar]
  19. Feldbrugge, J., van Engelen, M., van de Weygaert, R., Pranav, P., & Vegter, G. 2019, JCAP, 2019, 052 [CrossRef] [Google Scholar]
  20. Gauss, C. F. 1900, K. Gesellschaft Wissenschaft, 8 [Google Scholar]
  21. Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759 [Google Scholar]
  22. Gott, J. R., III, Dickinson, M., & Melott, A. L. 1986, ApJ, 306, 341 [NASA ADS] [CrossRef] [Google Scholar]
  23. Guth, A. H. 1981, Phys. Rev. D, 23, 347 [Google Scholar]
  24. Guth, A. H., & Pi, S.-Y. 1982, Phys. Rev. Lett., 49, 1110 [CrossRef] [Google Scholar]
  25. Harrison, E. R. 1970, Phys. Rev. D, 1, 2726 [CrossRef] [Google Scholar]
  26. Heydenreich, S., Brück, B., & Harnois-Déraps, J. 2021, A&A, 648, A74 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  27. Jones, B. J. T. 2017, Precision Cosmology: The First Half Million Years (Cambridge University Press) [CrossRef] [Google Scholar]
  28. Kerscher, M., Schmalzing, J., & Buchert, T. 1996, in Mapping, measuring and modelling the universe, eds. P. Coles, V. Martinez, & M. J. P. Borderia, 247 [Google Scholar]
  29. Kono, K. T., Takeuchi, T. T., Cooray, S., Nishizawa, A. J., & Murakami, K. 2020, ArXiv e-prints [arXiv:2006.02905] [Google Scholar]
  30. Mahalanobis, P. C. 1936, in Proceedings National Institute of Science, India, 2, 49 [Google Scholar]
  31. Masi, S. 2002, Prog. Part. Nucl. Phys., 48, 243 [NASA ADS] [CrossRef] [Google Scholar]
  32. Matsubara, T. 2010, Phys. Rev. D, 81, 083505 [NASA ADS] [CrossRef] [Google Scholar]
  33. Mecke, K. R., Buchert, T., & Wagner, H. 1994, A&A, 288, 697 [NASA ADS] [Google Scholar]
  34. Moraleda, R., Valous, N., Xiong, W., & Halama, N. 2019, Computational Topology for Biomedical Image and Data Analysis: Theory and Applications, Focus Series in Medical Physics and Biomedical Engineering, (CRC Press) [Google Scholar]
  35. Morozov, D. 2005, BioGeometry News, Dept. Comput. Sci., Duke Univ., USA [Google Scholar]
  36. Munkres, J. 1984, Elements of Algebraic Topology, Advanced book classics (Perseus Books) [Google Scholar]
  37. Park, C.-G. 2004, MNRAS, 349, 313 [NASA ADS] [CrossRef] [Google Scholar]
  38. Park, C., Pranav, P., Chingangbam, P., et al. 2013, J. Korean Astron. Soc., 46, 125 [CrossRef] [Google Scholar]
  39. Peebles, P. J. E., & Yu, J. T. 1970, ApJ, 162, 815 [NASA ADS] [CrossRef] [Google Scholar]
  40. Planck Collaboration I. 2020, A&A, 641, A1 [Google Scholar]
  41. Planck Collaboration IV. 2020, A&A, 641, A4 [Google Scholar]
  42. Planck Collaboration VII. 2020, A&A, 641, A7 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  43. Planck Collaboration XII. 2016, A&A, 594, A12 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  44. Planck Collaboration XVII. 2016, A&A, 594, A17 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  45. Planck Collaboration XVI. 2016, A&A, 594, A16 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  46. Planck Collaboration Int. LVII. 2020, A&A, 643, A42 [Google Scholar]
  47. Pogosyan, D., Gay, C., & Pichon, C. 2009, Phys. Rev. D, 80, 081301 [NASA ADS] [CrossRef] [Google Scholar]
  48. Pranav, P. 2015, PhD Thesis, University of Groningen, The Netherlands [Google Scholar]
  49. Pranav, P. 2021a, IEEE Signal Proc. Mag., 38, 130 [NASA ADS] [CrossRef] [Google Scholar]
  50. Pranav, P. 2021b, ArXiv e-prints [arXiv:2109.08721] [Google Scholar]
  51. Pranav, P., Edelsbrunner, H., van de Weygaert, R., et al. 2017, MNRAS, 465, 4281 [Google Scholar]
  52. Pranav, P., van de Weygaert, R., Vegter, G., et al. 2019a, MNRAS, 485, 4167 [Google Scholar]
  53. Pranav, P., Adler, R. J., Buchert, T., et al. 2019b, A&A, 627, A163 [EDP Sciences] [Google Scholar]
  54. Ryden, B. 2003, Introduction to Cosmology (Addison-Wesley) [Google Scholar]
  55. Sahni, V., Sathyprakash, B., & Shandarin, S. 1998, ApJ, 507, L109 [NASA ADS] [CrossRef] [Google Scholar]
  56. Schmalzing, J., & Gorski, K. M. 1998, MNRAS, 297, 355 [CrossRef] [Google Scholar]
  57. Schwarz, D. J., Copi, C. J., Huterer, D., & Starkman, G. D. 2016, CQG, 33, 184001 [NASA ADS] [CrossRef] [Google Scholar]
  58. Shivashankar, N., Pranav, P., Natarajan, V., et al. 2016, IEEE Trans. Vis. Comput. Graph., 22, 1745 [CrossRef] [Google Scholar]
  59. Starobinsky, A. A. 1982, Phys. Lett. B, 117, 175 [Google Scholar]
  60. Telschow, F., Schwartzman, A., Cheng, D., & Pranav, P. 2019, ArXiv e-prints [arXiv:1908.02493] [Google Scholar]
  61. The CGAL Project 2021, CGAL User and Reference Manual, 5.3 edn., (CGAL Editorial Board) [Google Scholar]
  62. Tukey, J. W. 1975, in Proceedings of the 1974 international congress of mathematicians, 2, 523 [Google Scholar]
  63. van de Weygaert, R., Vegter, G., Edelsbrunner, H., et al. 2011, Trans. Comput. Sci., 14, 60 [NASA ADS] [CrossRef] [Google Scholar]
  64. Wilding, G., Nevenzeel, K., van de Weygaert, R., et al. 2021, MNRAS, 507, 2968 [NASA ADS] [CrossRef] [Google Scholar]
  65. Xu, X., Cisewski-Kehe, J., Green, S., & Nagai, D. 2019, Astron. Comput., 27, 34 [NASA ADS] [CrossRef] [Google Scholar]

Appendix A: Data and methods

In this section, we briefly describe the dataset we used in the experiments. We also present a short account of the computational pipeline, referring to Pranav et al. (2019a) for details.

A.1. Datasets

Over a period of time, the Planck team has invested significant effort in understanding and calibrating the source of noise as well as systematics that affect observations. This has led to the release of a series of datasets over a period of time, with successive data releases achieving better calibration and accuracy. In Pranav et al. (2019a), we performed our experiments on DR2, which is one of the intermediate data releases. In this paper, we concentrate on two data releases simultaneously, which we briefly describe below.

Planck 2020 Data Release 4 NPIPEdataset. The primary results are based on the observational maps and simulations based on the NPIPE (Planck Collaboration Int. LVII 2020) analysis pipeline (hereafter just NPIPE), which employs the SEVEM component separation technique. The NPIPE dataset is the final data release from the Planck team, and the pipeline is the most evolved and sophisticated of all data-generation pipelines. It combines some of the most powerful features of the separate LFI and HFI analysis piplelines, resulting in frequency and component maps that have lower levels of noise and systematics at essentially all angular scales (cf. Planck Collaboration Int. LVII 2020). The observational maps are accompanied by a suite of 600 simulations modeled as isotropic, homogeneous Gaussian random fields.

Planck 2018 Data Release 3 FFP10 dataset. For comparison, we also analyzed the temperature maps from Planck 2018 full-mission data release (DR3 hereafter), which is the third data release by the Planck team, and employs the SMICA component separation technique. The noise and systematics in this release are better constrained than in the previous data releases. The component-separated observational maps are accompanied by a suite of 300 simulations generated using the Full Focal Plane Plane pipeline, hereafter referred to as the FFP10 simulations (Planck Collaboration IV 2020), also based on the standard model.

A.2. Computational pipeline

The computational pipeline is composed of the steps that we briefly describe below. See Pranav et al. (2019a) for a substantially detailed description.

Preprocessing. We performed a range of preprocessing steps to the maps using the HealPix package (Górski et al. 2005), which is also the format of the initial data. First, the CMB and the noise maps were added pixel-wise for each realization of the simulations. Subsequently, the observational and noise-added simulation maps were degraded and smoothed at a range of scales for the values presented in Table A.1. The mask presented in Figure A.1, which is a binary map, was also degraded and smoothed at the same resolution as the CMB maps, and subsequently thresholded at a value of 0.9 to make it binary again. Table A.1 presents the percentage of sky covered by the unmasked region for the degraded resolution and associated smoothing scales analyzed in this paper. Figure A.2 presents a visualization of the degraded and smoothed maps after applying the mask. Next, we computed the mean and standard deviation from the unmaksed pixels for each realization and resolution and rescaled the maps pixel-wise by the standard deviation after subtracting the mean. As a final step, we assigned the value infinity to the masked pixels (numerically simulated by a very large number).

thumbnail Fig. A.1.

Visualization of the mask employed for the analyses in this paper. It is the common mask released in Data Release 3 (2018) and masks our galaxy as well as other bright foreground sources, including point sources.

thumbnail Fig. A.2.

Visualization of the degraded, smoothed, and masked observational maps employed in this paper. The maps are degraded at N = 2048, 1024, 512, 256, 128, 64, 32, and 16, and smoothed at FWHM = 5′,10′,20′,40′,80′,160′,320′,and 640′, respectively. The mask is degraded and smoothed at the same scales as the map, and subsequently thresholded at 0.9 for rebinarization.

Table A.1.

Percentage of sky area covered by the unmasked regions in the sky.

Triangulation. As a first step, we projected pixels of the maps projected onto 𝕊2, and we triangulated this set of points in ℝ3. Taking the convex hull of this triangulation produces a triangulation of the point-set on 𝕊2. This triangulation consists of V = 12N2 vertices, 3V − 6 edges, and 2V − 4 triangles, where N is the resolution parameter in HealPix format. This was the input to all downstream computations and represents the temperature field, f: 𝕊2 → ℝ, by storing the temperature value at each vertex; see Pranav et al. (2019a). We assumed a piece-wise linear interpolation along higher dimensional simplices.

Upper-star filtration. Given the triangulation K constructed in the previous step, we ordered its simplices such that σ precedes τ if (i) f(σ) > f(τ) or (ii) f(σ) = f(τ) and dim σ < dim τ, in which f(σ) is the minimum temperature value of the vertices of σ. Any ordering that satisfies (i) and (ii) is called an upper-star filter of K and f. The corresponding upper-star filtration consists of all prefixes of the filter, each representing an excursion set of f.

Persistence computation. We constructed a boundary matrix from the upper-star filtration of K. Writing σ1, σ2, …, σn for the sorted simplices of the upper-star filtration, the boundary matrix ∂[1..n, 1..n] is defined by ∂[i, j] = 1, if σi is a face of σj and dim σi = dim σj − 1, and ∂[i, j] = 0, otherwise. Computing the persistence birth death pair from this ordered boundary matrix involves reduction of the columns to the lowest-j form. We resorted to reduction from left to right, and a column of the matrix was reduced if it was zero or its lowest 1 had only zeros in the same row to its left. Each column with a unique lowest-j contributes to the birth death pair of the persistence diagram, where the index of the birth and death simplices are precisely the row and column indices of the lowest-j. Ranks of homology groups relative to the mask were inferred from the persistence diagrams by setting the vertices belonging to the mask at +∞; see Pranav et al. (2019a):

(A.1)

Tests for determining statistical significance. Our main aim was comparing the observational CMB maps with the null-hypothesis simulation maps, which assume isotropy, homogeneity, and Gaussianity. Let xi ∈ ℝm, i = 1, …, n, be a sample of i.i.d. m-dimensional vectors, drawn from a distribution G. Let y ∈ ℝm be another sample point, assumed to be drawn from a distribution F. We wish to test the (null) hypothesis that F = G, and give the test results in terms of p-values, which compute the probability that y is ‘consistent’ with this hypothesis. We employed two different statistical tests that we briefly describe below.

Mahalanobis distance. The first is the parametric Mahalanobis distance, or the familiar χ2 test (Mahalanobis 1936). If G is assumed to be Gaussian and n is large, then under the hypothesis that G = F, the squared Mahalanobis distance is approximately distributed as a χ2 distribution with m degrees of freedom. Thus the corresponding p-value is

(A.2)

Tukey depth. The second method is a nonparametric test based on the Tukey depth (Tukey 1975). It is a general metric for identifying outliers in a flexible manner and in a nonparametric setting, making no assumptions on the structure of F and G. Let z be any point in ℝm. Then the half-space depth ddep(z; x1, …, xn) of z within the sample of the xi is the smallest fraction of the n points x1, …, xn to either side of any hyperplane passing through z. Points that have the same depth constitute a nonparametric estimate of the isolevel contour of the distribution F. We first compute ddj = ddep(xj; x1, …, xn) for every point xj, j = 1, …, n, yielding an empirical distribution of depth. The p-value of y was computed as the proportion of points whose depth is lower than that of y,

(A.3)

Appendix B: Code description

All the analyses in this paper have been performed by deploying TopoS2. TopoS2 is a specialized software written for computing the topology of scalar functions on S2, especially adapted to the HealPix data format. It is written in the context of the topological analysis of Planck CMB maps, however useful for other scalar functions on 𝕊2. TopoS2 is tested on Debian Linux 9, with the following dependencies:

* [boost] — general purpose C++ libraries

* [CGAL]-4.7 — triangulation and filtration building

* [CMake] — build process

* [Dionysus] — data structure and topology computation

* [HealPiX] — data-processing operations on the sphere

* [PHAT] — data structure and topology computation

* [boost]: http://www.boost.org

* [CGAL]: http://www.cgal.org

* [CMake]: http://www.cmake.org

* [Dionysus]: https://www.mrzv.org/software/dionysus/

* [HealPix]: https://healpix.jpl.nasa.gov/

* [PHAT]: https://github.com/blazs/phat

B.1. Description

TopoS2 has three executables that work in tandem to produce the persistence diagrams of scalar fields on 𝕊2. For Planck CMB files supplied in HealPix format the code helper_func/fits_RDS_maskRDS_nrm.cpp reads the CMB and mask fits files, degrades and smooths them to specified values, and outputs an ascii masked file that serves as input to the next code in sequence filtration/streaming_filtration_builder_superlevelset.cpp. The output of this operation is a filtration of superlevel sets to the standard output stream. This stream is fed to the final part of the code persistent_homology/diagram_generator.cpp, which outputs the persistence diagram to the standard output stream. The code can be run in isolation for each data file, or batch-processed on large clusters. Higher resolutions are memory intensive:

  • 2048 ∼120 GB

  • 1024 ∼ 35 GB

  • 512 ∼ 8 GB

B.2. Building

We assume we are in TopoS2 root directory.

1.) For the fits operation handling, the code is compiled against healpix C++ library. This amounts to modifying the helper_func/compile file to point to the healpix directories, and executing it.

cd helper_func/
./compile
cd ..

2.) Building the triangulation and filtration invokes the CGAL library (The CGAL Project 2021). It is best compiled by invoking the cgal_create_cmake_script scriptfile to generate the cmake components. Dependency problems, if arising, are resolved by interactively editing the cmake configuration file, and setting the right paths. If the CGAL root directory is $CGAL_DIR, this amounts to executing the following commands in sequence:

cd filtration/
$CGAL_DIR/scripts/cgal_create_cmake_script
cmake . (or ccmake . for interactive)
make
cd ..

3.) Computing persistence diagrams is based on Dionysus (Morozov 2005) and PHAT (Bauer et al. 2014) library. The code is compiled by executing the following commands in sequence:

cd persistent_homology
mkdir build
cd build
cmake ..
make
cd ../..

B.3. Analyzing planck dataset

The core code and associated helper scripts have evolved over a period of time keeping in mind the central requirement to analyze the full Planck dataset as fast as possible. Given the memory requirements as stated above, the best strategy is to engage full nodes with multiple cores on large clusters for each run, especially at higher resolutions. The codes are written such that they default on memory overflow. As memory consumption on the higher resolutions is the bottleneck, the codes are written with a capability to run on a single processor, as well as multiple processors, and crash if memory overflows.

B.3.1. Generating persistence diagrams

Given all of this, and a successful compilation of the codes, and an assumption that we are in the directory containing the NPIPE observation file npipe6v20_sevem_cmb_005a_2048.fits, and simulations with CMB and Noise added (numbering from 200 to 799, SEVEM kind), we provide two scripts that run on SunGrid Engine cluster environment and produce a multi-scale analysis in an automated fashion. The helper script sge_job_script_npipe_DS_obs.py in the “example” directory serves as an example to process multiple resolutions for the observation files on a cluster running the SGE grid engine. The script file sge_job_script_npipe_DS.py processes the simulations. In the script, the variables “binary1”, “binary2” and “binary3” set the paths to fits processing script, filtration builder code, and persistent homology calculator code. At this moment we have the persistence diagrams from the observations and simulations, from which the Betti numbers may be calculated.

B.3.2. Computing Betti numbers

To compute the Betti numbers from the persistence diagrams, we need to insert a heading in each .dgm file which is the number of rows in it. The script heading_fixer.py does this for the simulations, similarly heading_fixer_obs.py for observations. Thereafter the Betti numbers are calculated by the C++ code betti_superlevel_relative (to be compiled like a normal c++ code). For batch processing two script files betti_calculator.py and betti_calculator_obs.py do this. The scripts in this paragraph need to be placed in the outdir directory of sge_job_script_npipe_DS.py

Appendix C: Validation of the statistical tests

In this section, we examine the validity of the regime of our statistical tests. To this end, we examined the empirical distribution of the χ2 and Tukey depth values from the simulations, and compared it with the observations.

Figure C.1 presents the case for the χ2 distribution. The top two rows present the values for b0, the middle two rows for b1, and the bottom two rows for the Euler characteristic. The distribution of the values from the 600 NPIPE simulations is denoted by the gray box histograms. The blue curve denotes the theoretical curve for the given degrees of freedom, and the red vertical line shows the value of the observational map with respect to the distribution for the simulations. In general, we find a good agreement between the theoretical curves and the histograms from the simulations, which lends additional support to the validity of the χ2 test in our regimes of testing.

thumbnail Fig. C.1.

Distribution of the Mahalanobis distance. The box histograms present the distribution of the Mahalanobis distance of the simulations with respect to the mean. The blue curve plots the theoretical χ2 distribution for the given degrees of freedom, and the red vertical line presents the Mahalanobis distance of the observation with respect to mean. The generally good fit of the theoretical curve with the histogram of the simulations for all resolutions lends credibility to the validity of the statistical tests in the chosen regimes.

Figure C.2 presents the case for the Tukey depth, with the values from the simulations presented in the gray box, and the observational value presented by the red vertical line. The top two rows present the values for b0, the middle two rows for b1, and the bottom two rows for the Euler characteristic. We note the well-behaved nature of the distribution from the histograms. To help interpret these diagrams, we note that depth values increase from left to right on the horizontal axis, and the p-values are computed as the fraction of points that have lower depth than the candidate. The extreme p-values of 0.0 in the main paper then simply means that the observational points are true outliers, and no simulated points are shallower than the observation. The p-values are consistent with the distribution characteristics, where in these cases, the observed red vertical line is to the left of all simulated box histograms.

thumbnail Fig. C.2.

Distribution of the Tukey depth. The box histograms present the distribution of the depth of the simulations, and the red vertical line presents the depth of the observation.

All Tables

Table 1.

Two-tailed p-values for relative homology for the datasets.

Table A.1.

Percentage of sky area covered by the unmasked regions in the sky.

All Figures

thumbnail Fig. 1.

Visualization of the temperature fluctuations in the CMB sky. The survey surface 𝕊2 is distorted at each point in the direction of the surface normal. The distortion is proportional to the fluctuation in direction and magnitude. The visualization is based on the observed CMB sky cleaned by the NPIPE pipeline and smoothed at 5 degrees.

In the text
thumbnail Fig. 2.

Cosmic microwave background sky thresholded at moderately positive (left) and negative (right) levels. For high thresholds, the excursion set is dominated by isolated components, while at low thresholds, it gives the appearance of a single connected surface indented by numerous holes. For sufficiently low thresholds, the holes fill up, and the excursion set covers the entire sphere, which is composed of a connected surface without a boundary that encloses a single void (cf. Fig. 1).

In the text
thumbnail Fig. 3.

Visualization of the temperature fluctuations in the CMB sky in the presence of masks (plotted in gray). It consists of an equatorial belt and numerous patches on the northern and southern cap, which correspond to our galaxy and other bright foreground objects. The visualization is based on the observed CMB sky cleaned by the NPIPE pipeline and smoothed at 5 degrees.

In the text
thumbnail Fig. 4.

Graphs of b0 for the NPIPE dataset for various degraded and smoothing scales. Panel a: observational curve in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: curve for the significance of the differences, where the observed curves are presented in red, and the values for the simulations are presented as dotted gray lines.

In the text
thumbnail Fig. 5.

Graphs of b1 for the NPIPE dataset for various resolutions and smoothing scales. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray, while panel b presents the significance of the differences. Error bands corresponding (1σ : 4σ) are also presented in various colors.

In the text
thumbnail Fig. 6.

Graph of b1, selectively for N = 32 and N = 16, corresponding to FWHM = 320′ and FWHM = 640′, presenting an enlarged view of the concerned resolutions. The observational curve is presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 4σ) are also drawn. The bottom two panels present the visualization of the scalar temperature field in order to facilitate an appreciation of the structure of the field.

In the text
thumbnail Fig. 7.

Histogram of b1 at ν = −2.5, where the ∼4σ deviation occurs. The minimum value attained across 600 simulations is 18, with a mean at ∼21 and a standard deviation of ∼1.7. The observed map exhibits b1 = 28 and is well outside the distribution.

In the text
thumbnail Fig. 8.

Visualization of the loops surrounding the low-density regions at a moderately negative threshold. The gaps in the manifold correspond to the mask that has been removed. We clearly note the equatorial belt corresponding to our galaxy, and more patches in the visible cap. Some loops live fully in the excursion region that is not influenced by the mask. We show them in red. We also depict some representative relative loops whose end points are masked. We draw closed loops for this category as well and show the portions in which they overlap the masked regions in white. The visualization is based on the observed CMB map from the NPIPE data set smoothed at 5 degrees.

In the text
thumbnail Fig. 9.

Graphs of ECrel for the NPIPE dataset. Panels a,b: similar information as Figs. 4 and 5. ECrel exhibits deviations commensurate with those in b0 and b1.

In the text
thumbnail Fig. 10.

Graph of b0 for the FFP10 dataset. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: significance of the differences. The dataset exhibits milder deviations than the NPIPE dataset in general. However, we note a 2.96σ deviation in the number of components between the simulations and observation at N = 128, FWHM = 80′.

In the text
thumbnail Fig. 11.

Graph of b1 for the FFP10 dataset. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 4σ) are also drawn. Panel b: significance of the differences. The dataset exhibits milder deviations than the NPIPE dataset in general, but there are mildly significant deviations at 2.77σ, at scales and thresholds where the NPIPE dataset shows strong deviations as well.

In the text
thumbnail Fig. 12.

Graph of ECrel for the FFP10 dataset. In panel a, the observational curve is presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: significance of the differences. The Euler characteristic exhibits deviations commensurate with b0 and b1.

In the text
thumbnail Fig. A.1.

Visualization of the mask employed for the analyses in this paper. It is the common mask released in Data Release 3 (2018) and masks our galaxy as well as other bright foreground sources, including point sources.

In the text
thumbnail Fig. A.2.

Visualization of the degraded, smoothed, and masked observational maps employed in this paper. The maps are degraded at N = 2048, 1024, 512, 256, 128, 64, 32, and 16, and smoothed at FWHM = 5′,10′,20′,40′,80′,160′,320′,and 640′, respectively. The mask is degraded and smoothed at the same scales as the map, and subsequently thresholded at 0.9 for rebinarization.

In the text
thumbnail Fig. C.1.

Distribution of the Mahalanobis distance. The box histograms present the distribution of the Mahalanobis distance of the simulations with respect to the mean. The blue curve plots the theoretical χ2 distribution for the given degrees of freedom, and the red vertical line presents the Mahalanobis distance of the observation with respect to mean. The generally good fit of the theoretical curve with the histogram of the simulations for all resolutions lends credibility to the validity of the statistical tests in the chosen regimes.

In the text
thumbnail Fig. C.2.

Distribution of the Tukey depth. The box histograms present the distribution of the depth of the simulations, and the red vertical line presents the depth of the observation.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.