Issue 
A&A
Volume 659, March 2022



Article Number  A115  
Number of page(s)  21  
Section  Cosmology (including clusters of galaxies)  
DOI  https://doi.org/10.1051/00046361/202140291  
Published online  16 March 2022 
Anomalies in the topology of the temperature fluctuations in the cosmic microwave background: An analysis of the NPIPE and FFP10 data releases
Univ. Lyon, ENS de Lyon, Univ. Lyon1, CNRS, Centre de Recherche Astrophysique de Lyon UMR5574, 69007 Lyon, France
email: pratyuze@gmail.com
Received:
6
January
2021
Accepted:
16
November
2021
We present a topological analysis of the temperature fluctuation maps from the Planck 2020 Data Release 4 NPIPE dataset and the Planck 2018 Data Release 3 FFP10 dataset. We performed a multiscale analysis in terms of the homology characteristics of the maps, invoking relative homology to account for the analysis in the presence of masks. We performed our analysis for a range of smoothing scales spanning sub and superhorizon scales corresponding to a full width at half maximum (FWHM) of 5′,10′,20′,40′,80′,160′,320′, and 640′, and employed simulations based on the standard model for comparison, which assumes the initial fluctuation field to be an isotropic and homogeneous Gaussian random field. Examining the behavior of topological components, represented by the 0D homology group, we find the observations to be approximately 2σ or less deviant from the simulations for all resolutions and scales for the NPIPE dataset. For the FFP10 dataset, we detect a 2.96σ deviation between the observations and simulations at N = 128, FWHM = 80′. For the topological loops, represented by the first homology group, the simulations and observations are consistent within 2σ for most resolutions and scales for both the datasets. However, for the NPIPE dataset, we observe a high deviation between the observation and simulations in the number of loops at FWHM = 320′, but at a low dimensionless threshold ν = −2.5. Under a Gaussian assumption, this would amount to a deviation of ∼4σ. However, the distribution in this bin is manifestly nonGaussian and does not obey Poisson statistics either. In the absence of a true theoretical understanding, we simply note that the significance is higher than what may be resolved by 600 simulations, yielding an empirical pvalue of at most 0.0016. Specifically in this case, our tests indicate that the numbers arise from a statistically stable regime, despite being based on small numbers. For the FFP10 dataset, the differences are not as strong as for the NPIPE dataset, indicating a 2.77σ deviation at this resolution and threshold. The Euler characteristic, which is the alternating sum of the ranks of relative homology groups, reflects the deviations in the components and loops. To assess the significance of combined levels for a given scale, we employed the empirical and theoretical versions of the χ^{2} test as well as the nonparametric Tukey depth test. Although all statistics exhibit a stable distribution, we favor the empirical version of the χ^{2} test in the final interpretation, as it indicates the most conservative differences. For the NPIPE dataset, we find that the components and loops differ at more than 95%, but agree within the 99% confidence level with respect to the base model at N = 32, FWHM = 320′. The Euler characteristic at this resolution displays a per mil deviation. In contrast, the FFP10 dataset shows that the observations are consistent with the base model within the 95% confidence level, at this and smaller scales. This is consistent with the observations of the Planck analysis pipeline via Minkowski functionals. For the largest smoothing scale, N = 16, FWHM = 640′, both datasets exhibit an anomalous behavior of the loops, where FFP10 data exhibit a deviation that is larger by an order of magnitude than that of the NPIPE dataset. In contrast, the values for the topological components and the Euler characteristic agree between observations and model to within a confidence level of 99%. However, for the largest scales, the statistics are based on low numbers and may have to be regarded with caution. Even though both datasets exhibit mild to significant discrepancies, they also exhibit contrasting behaviors at various instances. Therefore, we do not find it feasible to convincingly accept or reject the null hypothesis. Disregarding the largescale anomalies that persist at similar scales in WMAP and Planck, observations of the cosmic microwave background are largely consistent with the standard cosmological model within 2σ.
Key words: methods: data analysis / methods: numerical / methods: statistical / cosmic background radiation / cosmology: observations / early Universe
© P. Pranav 2022
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
At the epoch of recombination, matter and radiation separate, allowing radiation to stream freely in the Universe. This freestreaming radiation permeating the Universe, which we observe as the cosmic microwave background (CMB) radiation, encodes a treasure trove of information about the initial conditions in the Universe (Ryden 2003; Jones 2017). Although it has a remarkably consistent average temperature, the CMB still exhibits tiny deviations of about 10^{−5} from the background average. The temperature fluctuations in the CMB trace the fluctuations in the underlying matter distribution in the early Universe that are linked to the spontaneous quantum fluctuations generated in an otherwise homogeneous medium (Harrison 1970; Peebles & Yu 1970). Studying the properties of the temperature fluctuations in the CMB is therefore essential for understanding the properties of the primordial matter field.
The lambda cold dark matter (LCDM) paradigm is the standard paradigm of cosmology. Together with the inflationary models in their simplest form (Starobinsky 1982; Guth & Pi 1982), the standard model of cosmology predicts the nature of the primordial stochastic matter distribution field to be that of an isotropic and homogeneous Gaussian random field (Harrison 1970; Guth 1981). This prediction finds allies theoretically in the central limit theorem, and observationally in the various measurements of the CMB temperature anisotropy field via ground and spacebased probes such as the BOOMERanG balloonbased experiment (Masi 2002) and the Wilkinson Microwave Anisotropy Probe (WMAP) satellite (Bennett et al. 2013). The latest endeavor of measuring the CMB temperature anisotropies was the launch of the Planck satellite, which boasts of the highest resolution in measurements to date. The resolution is at scales of a few arcminutes (Planck Collaboration I 2020). Despite the general consensus that the CMB exhibits the characteristics of an isotropic and homogeneous Gaussian random field, a growing body of evidence shows anomalies in the observed CMB field with respect to the base model. They include in particular the observed hemispherical asymmetry in the CMB power spectrum (Eriksen et al. 2004a) as well as the alignment of low multipoles (Copi et al. 2015); see Schwarz et al. (2016) for a review. These observed anomalies raise doubts about the assumption of statistical isotropy and homogeneity, respectively.
Testing the assumption of Gaussianity requires tools that encode information about higher orders. Traditional endeavor in this direction has focused on higherorder correlation functions (Durrer et al. 1996), which are generally extremely resourceintensive computationally (Planck Collaboration XVII 2016). Recently, attention has turned toward developing alternative tools beyond the correlation functions and multispectra, which may potentially encode information of all orders. The principal tools in this regard have arisen from integral geometry and involve computing the Minkowski functionals or the LifshitzKilling curvatures (Adler 1981; Mecke et al. 1994; Kerscher et al. 1996; Schmalzing & Gorski 1998; Sahni et al. 1998; Codis et al. 2013; Ducout et al. 2013; Matsubara 2010; Chingangbam et al. 2017; Pranav et al. 2019b; Appleby et al. 2021). The jth Minkowski functional and (D − j)th LifshitzKilling curvature of a Ddimensional manifold 𝕄 are related by , where j = 0, …, D, and ω_{j} is the volume of the jdimensional unit ball. There are d such quantifiers for a Ddimensional set, where d = 0, …, D. All but one are purely geometrical quantities that are related to the ddimensional volume of the manifold. The exception is the 0th LifshitzKilling curvature, or equivalently, the Dth Minkowski functional, which is related to a purely topological quantity, the Euler characteristic (Euler 1758; Adler 1981; Pranav et al. 2019b; Telschow et al. 2019), via Gauss’s Theorema Egrerium (Gauss 1900; Adler 1981; Pranav et al. 2019b). The Minkowski functional computations of the CMB have consistently shown the observations to be congruent with the standard model (Planck Collaboration XVI 2016).
More recently, developments in computational topology have paved the way for extracting topological information from datasets at the level of homology (Munkres 1984; Edelsbrunner & Harer 2010; van de Weygaert et al. 2011; Pranav 2015, 2021a; Pranav et al. 2017; Moraleda et al. 2019) and its hierarchical extension, persistent homology (Edelsbrunner & Harer 2010; Pranav et al. 2017; Shivashankar et al. 2016; Adler et al. 2017; Pranav 2021b; Heydenreich et al. 2021). Topological data analysis (TDA) involving homology and persistent homology has recently started finding application in astrophysical disciplines, for example, in the context of structure identification (Shivashankar et al. 2016; Xu 2019) and quantification of largescale structures (Kono et al. 2020; Wilding et al. 2021), including detection and quantification of nonGaussianities (Feldbrugge et al. 2019; Biagetti et al. 2021). Homology describes the topology of a space by identifying the holes and the topological cycles that bound them. A ddimensional space may contain topological cycles of 0 up to d dimensions. The cycles and holes are associated with the homology groups of the space. The pth Betti number, β_{p}, is the rank of the pth homology group, ℍ_{p, p = 0…d}. While itself a purely topological quantity, the Euler characteristic is also the alternating sum of the Betti numbers of all ambient dimensions of a manifold, as denoted by the EulerPoincaré formula (Adler 1981; Pranav et al. 2017, 2019b). The Euler characteristic has a long history in the analysis of cosmological fields (Gott et al. 1986; Pogosyan et al. 2009; Park et al. 2013; Appleby et al. 2020).
Building on and refining existing tools from computational topology in the context of analyzing the CMB field, this paper presents the homology characteristics of the temperature fluctuation maps of the cosmic microwave background obtained by the Planck satellite (Planck Collaboration I 2020). We perform our experiments on the fourth and final data release, Planck 2020 Data Release 4 (DR4), which is based on the NPIPE dataprocessing pipeline (Planck Collaboration Int. LVII 2020). The NPIPE dataset represents a natural evolution of the Planck dataprocessing pipeline, integrating the best practices from the LFI and HFI pipelines separately. The result is an overall amplification of signal and reduction in the associated systematic, noise, and residuals at almost all angular scales (Planck Collaboration Int. LVII 2020). For comparison, we also present results for the Planck 2018 (DR3; Planck Collaboration I 2020), which is based on the Full Focal Plane (FFP) data processing pipeline (Planck Collaboration XII 2016), resulting in the FFP10 simulations (Planck Collaboration IV 2020). The paper follows the spirit of Pranav et al. (2019a) in methods and analysis, where we present results for the intermediate Planck 2015 (DR2; Planck Collaboration XVI 2016); also see Adler et al. (2017). The novel aspect of the methods presented here and in Pranav et al. (2019a) is an analysis pipeline that takes regions with unreliable data on 𝕊^{2} into account. In the case of CMB, this is reflected in the obfuscation effects of the measurements that are due to foreground objects such as our own galaxy, as well as other extra and intragalactic foreground sources. We masked these regions and computed the homology of the excursion sets relative to the mask.
We present a brief description of the topological background in Sect. 2, followed by the results in Sect. 3. We discuss the ramifications of the results and conclude the main body of the paper in Sect. 4. The appendices present a brief description of the datasets and the computational pipeline as well as a validation for the statistical tests.
2. Topological background
Commensurate with our intention of analyzing the topology of the CMB temperature fluctuations, we restricted ourselves to topological definitions on 𝕊^{2} (Pranav et al. 2019a) and invoked relative homology to account for analysis in the presence of masked regions. The standard reference for this section is Edelsbrunner & Harer (2010); also see Pranav et al. (2019a) for discussion in the context of CMB, as well as Heydenreich et al. (2021) in the context of cosmic shear fields.
2.1. Homology characteristics of excursion sets of 𝕊^{2}
Denoting the CMB temperature fluctuations on 𝕊^{2} as f: 𝕊^{2} → ℝ, we define the excursion set^{1} at a temperature ν as the subset of 𝕊^{2} where the temperature is higher than or equal to ν,
In the cosmological setting, the usual practice is to examine the dimensionless threshold, which is the meansubtracted and variancescaled field derived from the original field, in which case, ν = (f − μ_{f})/σ_{f}, and μ_{f} and σ_{f} are the mean and the standard deviation of the field f. If 𝔼(ν) does not cover the entire 𝕊^{2}, it may be composed of isolated components and holes. Figure 1 presents a visualisation of the temperature fluctuations in the CMB sky smoothed at 5 degrees, while Fig. 2 presents excursion sets corresponding to two different thresholds. For high thresholds, presented in the left panel, the excursion set is dominated by components, while for low thresholds, presented in the right panel, the excursion set is dominated by a few large connected objects indented with holes, which are bounded by loops. The Betti numbers β_{0} and β_{1} count the number of independent components and loops of the excursion set, respectively. In general, for a ddimensional topological space, β_{p} is the rank of the pth homology group, H_{p}; p = 0, …, d, and counts the number of independent pdimensional cycles (Munkres 1984; Edelsbrunner & Harer 2010; Pranav et al. 2017). If 𝔼(ν) does not cover the entire 𝕊^{2}, the number of independent loops is one less than the total number of loops. If 𝔼(ν) covers the entire 𝕊^{2}, there are no loops, and β_{2} = 1, because of the void that is enclosed by the boundaryless surface of the sphere. A related quantity that has a long history of usage in cosmological analyses is the Euler characteristic, or alternatively, the genus (Gott et al. 1986; Park et al. 2013), which is the alternating sum of the Betti numbers of the excursion set,
Fig. 1.
Visualization of the temperature fluctuations in the CMB sky. The survey surface 𝕊^{2} is distorted at each point in the direction of the surface normal. The distortion is proportional to the fluctuation in direction and magnitude. The visualization is based on the observed CMB sky cleaned by the NPIPE pipeline and smoothed at 5 degrees. 
Fig. 2.
Cosmic microwave background sky thresholded at moderately positive (left) and negative (right) levels. For high thresholds, the excursion set is dominated by isolated components, while at low thresholds, it gives the appearance of a single connected surface indented by numerous holes. For sufficiently low thresholds, the holes fill up, and the excursion set covers the entire sphere, which is composed of a connected surface without a boundary that encloses a single void (cf. Fig. 1). 
The Euler characteristic also has a geometric interpretation as one of the LifshitzKilling curvatures of the manifold (Adler 1981; Pranav et al. 2019a).
2.2. Masks and relative homology
The measurement of the CMB signal is unreliable in certain parts of the sky due to interference from bright foreground objects. These include extended objects such as our galaxy, as well as bright point sources. We masked these regions and computed the homology characteristics of the excursion set relative to the mask. Figure 3 presents a visualization of the masked CMB sky. Letting M ⊆ 𝕊^{2} be the mask and 𝔼(ν) the excursion set, we considered the relative homology of the pair of closed spaces, (E, M), where E = 𝔼(ν) and M = M ∩ 𝔼(ν). We note that M is contained in E and is an open set by definition in this context. We denote the rank of the relative homology groups of the pair (E, M) by b_{p} = rank H_{p}(E, M); p = 0, 1, 2. The Betti numbers computed considering the pair (E, M) are different from the Betti numbers of the excursion set without a mask. For a more detailed discussion about relative homology in the context of masked CMB sky, see Pranav et al. (2019a). The relative Euler characteristic, as in the case of absolute homology, is the alternating sum of the rank of relative homology groups,
Fig. 3.
Visualization of the temperature fluctuations in the CMB sky in the presence of masks (plotted in gray). It consists of an equatorial belt and numerous patches on the northern and southern cap, which correspond to our galaxy and other bright foreground objects. The visualization is based on the observed CMB sky cleaned by the NPIPE pipeline and smoothed at 5 degrees. 
3. Results
In this section, we present our results in terms of the ranks of homology groups, b_{p}, p = 0, 1, as well as the Euler characteristic, EC_{rel}, relative to the mask. Section 3.1 presents results from the DR4 NPIPE dataset based on 600 simulations, obtained using the SEVEM component separation pipeline. Similar results for the DR3 FFP10 dataset based on 300 simulations, obtained using the SMICA component separation pipeline, are presented in Sect. 3.2^{2}. The latter dataset facilitates a direct comparison with the topogeometrical studies of the CMB performed by the Planck team in Planck Collaboration VII (2020). For both datasets, we begin by examining the graphs of b_{0}, b_{1}, and of the (relative) Euler characteristic, EC_{rel}. This is followed by statistical tests that estimate the significance of results, taking into account all of the levels together for a given resolution and scale. We employed the standard χ^{2}test in the empirical and theoretical settings, as well as the modelindependent Tukey depth test for this purpose. We chose apriori levels ℓ_{k}, where ℓ_{k} = k/2, setting k_{b0}[0 : 6],k_{b1} = [ − 6 : 0],k_{ECrel} = [ − 6 : 6]. We did this to restrict ourselves to analyzing b_{0} for positive thresholds, b_{1} for negative thresholds, and EC_{rel} across the full threshold range. The choice of regions is determined by the fact that b_{0}(ν) tends to be small and carries little information for ν < 0, b_{1}(ν) tends to be small for ν > 0, while the Euler characteristic is informative over the full range of levels. These levels are consistent with the levels investigated in Planck Collaboration XVI (2016) and Planck Collaboration VII (2020).
3.1. Planck 2020 Data Release 4: NPIPE dataset
In this section, we analyze the graphs of the topological descriptors for the NPIPE dataset. Treating the topological descriptors as discrete random variables, we compute the mean, μ_{sim}, and the standard deviation, σ_{sim}, from the simulations in the usual sense for each of the levels. We use them to assess the modelindependent significance of the difference between the simulations and observations.
3.1.1. Graph of b_{0}
Figure 4 presents the graphs of b_{0} for the various degraded resolutions and the associated smoothing scales. The mean and the standard deviation are computed from the simulations. In general, we find that the significance across resolutions and levels is approximately 2σ or lower.
Fig. 4.
Graphs of b_{0} for the NPIPE dataset for various degraded and smoothing scales. Panel a: observational curve in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: curve for the significance of the differences, where the observed curves are presented in red, and the values for the simulations are presented as dotted gray lines. 
3.1.2. Graph of b_{1}
Figure 5 presents the graphs of b_{1} for the various resolutions and corresponding scales. For resolutions between N = 2048, …, 64, corresponding to FWHM = 5′,…,160′, we observe the significance to be approximately 2σ or lower. However, for the next higher smoothing scales corresponding to FWHM = 320′ (approximately 5 degrees) and FWHM = 640′ (approximately 10 degrees), we note interesting deviations that we examine next. These cases are also presented in Fig. 6 in an enlarged view for clarity.
Fig. 5.
Graphs of b_{1} for the NPIPE dataset for various resolutions and smoothing scales. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray, while panel b presents the significance of the differences. Error bands corresponding (1σ : 4σ) are also presented in various colors. 
Concentrating on the top left panel of Fig. 6, at a Gaussian smoothing scale of FWHM = 320′, approximately 5 degrees, and at a moderately low dimensionless density threshold ν = −2.5, we find a 3.91σ deviation between the simulations and the observation. We examine the behavior of b_{1} at this level further in Fig. 7, where we present the histogram of the distribution of b_{1} from the simulations in gray boxes. The observed value is indicated by a red vertical line. Evidently, the distribution is nonsymmetric and hence nonGaussian. Our tests also indicate that the Poisson distribution is a poor fit to the data as well. We detect 28 loops in the observational curves, compared to the average of ∼21 loops in the simulations, with a standard deviation of ∼1.78. Within the Gaussian context, this would yield the significance of the difference at approximately 4σ. However, since the distribution in this bin is distinctly nonGaussian and does not obey Poisson statistics, ascribing a σ significance in the usual sense is not viable in this case. As a result, we simply note that the significance is higher than what may be resolvable by 600 simulations, yielding an empirical pvalue of at most 0.0016. While the generally low number of loops in both simulations and observations, owing to the large scale of probing, push us to the regime of small numbers, the behavior of the statistics indicates a stable regime.
Fig. 6.
Graph of b_{1}, selectively for N = 32 and N = 16, corresponding to FWHM = 320′ and FWHM = 640′, presenting an enlarged view of the concerned resolutions. The observational curve is presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 4σ) are also drawn. The bottom two panels present the visualization of the scalar temperature field in order to facilitate an appreciation of the structure of the field. 
Fig. 7.
Histogram of b_{1} at ν = −2.5, where the ∼4σ deviation occurs. The minimum value attained across 600 simulations is 18, with a mean at ∼21 and a standard deviation of ∼1.7. The observed map exhibits b_{1} = 28 and is well outside the distribution. 
Further evidence in support of this argument is presented in Appendix C, where we examine the distribution of χ^{2} and Tukey depth from the simulations, and compare them with the observations. In general, for the χ^{2}test, we find good agreement between the histogram of simulations and the theoretical curve. Therefore, the highly significant deviation merits consideration. The deviant behavior of loops in the earlier analysis of Planck 2015 Data release 2 (DR2) presented in Pranav et al. (2019a) at similar scales is also noteworthy in this context, as is the deviant Euler characteristic from WMAP observational data reported by Eriksen et al. (2004b) and Park (2004). Figure 8 presents a visualization of some of these loops at moderately negative thresholds for observational maps smoothed at FWHM = 320′.
Fig. 8.
Visualization of the loops surrounding the lowdensity regions at a moderately negative threshold. The gaps in the manifold correspond to the mask that has been removed. We clearly note the equatorial belt corresponding to our galaxy, and more patches in the visible cap. Some loops live fully in the excursion region that is not influenced by the mask. We show them in red. We also depict some representative relative loops whose end points are masked. We draw closed loops for this category as well and show the portions in which they overlap the masked regions in white. The visualization is based on the observed CMB map from the NPIPE data set smoothed at 5 degrees. 
At the next higher smoothing scale of FWHM = 640′, approximately 10 degrees, presented in the right panel of Fig. 6, the shapes of the observational curve and the mean of the simulated curves for the negative thresholds, where topological loops are the dominant entities, are widely different. As differences in shapes may be a stronger indicator of inherent differences in the models than simply differences in the amplitudes, our assessment is that this case merits scrutiny as well. However, since the numbers involved are small, about 5 and fewer in some bins^{3}, we consider the statistics emerging from this resolution as not stable and reject them as possible statistical fluke.
3.1.3. Graph of EC_{rel}
Figure 9 presents the graphs of EC_{rel} for the various resolutions and corresponding scales. The relative Euler characteristic also deviates for the resolutions where b_{0}/b_{1}, or both, deviate. However, the significance of the difference shows milder characteristics owing to cancellation effects (Pranav et al. 2019a). This is because the Euler characteristic is not strictly an independent quantity because it is an alternating sum of the ranks of the relative homology groups. It therefore merely reflects the deviations in the contributing Betti numbers.
Fig. 9.
Graphs of EC_{rel} for the NPIPE dataset. Panels a,b: similar information as Figs. 4 and 5. EC_{rel} exhibits deviations commensurate with those in b_{0} and b_{1}. 
3.2. Planck 2018 Data Release 3: FFP10 dataset
In this section, we analyze the topological characteristics of the FFP10 dataset. We also compare our results for the Euler characteristic with those presented in Planck Collaboration VII (2020).
3.2.1. Graph of b_{0}
Figure 10 presents the graphs for the topological components in panels a and b, similar to Fig. 4. In general, the observational values are within the 2σ band when compared with the simulations for almost all resolutions and levels. The exception is the number of components at N = 128, FWHM = 80′, where the observations deviate from the simulations at 2.96σ at the dimensionless density threshold ν = 0.5. The smoothing scale, approximately a degree, and the moderate threshold at which the deviation occurs ensure that the statistics are away from the lownumber regime.
Fig. 10.
Graph of b_{0} for the FFP10 dataset. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: significance of the differences. The dataset exhibits milder deviations than the NPIPE dataset in general. However, we note a 2.96σ deviation in the number of components between the simulations and observation at N = 128, FWHM = 80′. 
3.2.2. Graph of b_{1}
Figure 11 presents the graphs for the topological loops. The deviations between the observations and simulations are within approximately 2σ for most of the smoothing scales and thresholds. However, for the dimensionless threshold, ν = −2.5, we note a more than 2σ deviation for N = 64, FWHM = 160′ onward toward higher smoothing scales. For N = 32, FWHM = 320′, the deviation is 2.77σ, and similar for N = 16, FWHM = 640′. These deviations, while not as significant as exhibited in the NPIPE dataset, occur at similar scales and thresholds. In addition, we also note a more than 4σ deviation for N = 16, FWHM = 640′ at ν = −3. However, we reject this as a possible statistical fluke because of the extremely low numbers involved, which are about 5 or fewer.
Fig. 11.
Graph of b_{1} for the FFP10 dataset. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 4σ) are also drawn. Panel b: significance of the differences. The dataset exhibits milder deviations than the NPIPE dataset in general, but there are mildly significant deviations at 2.77σ, at scales and thresholds where the NPIPE dataset shows strong deviations as well. 
3.2.3. Graph of EC_{rel}
Figure 12 presents the graphs of EC_{rel} for the various resolutions and corresponding scales. The Euler characteristic shows commensurate deviations with respect to b_{0}(b_{1}) because of their contributions in the alternating sum.
Fig. 12.
Graph of EC_{rel} for the FFP10 dataset. In panel a, the observational curve is presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: significance of the differences. The Euler characteristic exhibits deviations commensurate with b_{0} and b_{1}. 
3.3. Statistical significance from combined thresholds
We considered the two methods detailed in Appendix A.2 to compute the statistics of the combined thresholds and present pvalues of the observed maps for them. The first method is the Mahalanobis distance (Mahalanobis 1936), also known as the χ^{2}test, which works in modeldependent and in empirical settings. The second method is the nonparametric Tukey depth test. In Appendix C we examine the distribution characteristics of these statistics from the simulations. The simulation and theoretical curve agree well for the χ^{2} statistic. Despite this, we computed the pvalues both theoretically and from the empirical distribution, and found that the empirical distributions yield a milder significance. We also find that the depth histograms and distributions are well behaved and represent a meaningful quantification. Considering all the three topological quantities b_{0}, b_{1} and EC_{rel}, we computed the statistics for all resolutions separately to ascribe a scale dependence to the signals.
3.3.1. NPIPE dataset
All entries before the last in Table 1, panel a, present pvalues for the variables, computed from maps degraded at specific resolutions, with additional Gaussian smoothing applied. The χ^{2} results are presented for the theoretical and empirical distributions in the left and middle blocks, respectively. The results for the Tukey depth are presented in the rightmost block. When we consider all the three tests, the pvalues computed from the empirical distribution using the χ^{2} statistic yield the most conservative estimate of the significance, and we favor it in order to be conservative in our interpretation. b_{1} shows a significant difference between the observational maps and the simulations for N = 32 and N = 16. Additionally, b_{0} also shows a significant difference at N = 32. However, we note in this context that the distributions in this case are manifestly nonGaussian, and their form is poorly understood theoretically. As a result, the χ^{2} values may be regarded with caution. This also presents the case for admitting the nonparametric Tukey depth test, which detects an outlier event in this case, yielding a pvalue of 0.0. The Euler characteristic shows highly significant difference at N = 32, which is an order of magnitude larger than either b_{0} or b_{1}. This is due to the significant difference shown by the contributing b_{0} and b_{1} at this resolution, in combination with the fact that these deviations occur near the tail of the distribution. In general, because the Euler characteristic is an alternating sum of the Betti numbers, its behavior is influenced by both the Betti numbers. Cancellation effects are dominant if the deviations in the contributing Betti numbers occur in the zone of overlap, which is more toward median thresholds. If the deviations occur toward the tail, where either one of the Betti numbers is dominant, the signals of the contributing Betti numbers are amplified in the Euler characteristic signal. As an example, for the next lower resolution, N = 16, the Euler characteristic shows no significant difference even though b_{1} exhibits significant difference. This is because of the highly nonsignificant behavior of b_{0}, whose contribution cancels the contribution of b_{1} toward the Euler characteristic. In many instances, the Tukey depth test yields pvalues of 0.0 for all the descriptors, indicating that the observation is a true outlier compared to the simulations. We note the trend that these instances occur when the pvalues computed from the Mahalanobis distance also exhibit generally low values. We help interpret the Tukey depth values in Appendix C, where we examine the trends in their distribution.
Twotailed pvalues for relative homology for the datasets.
3.3.2. FFP10 dataset
Results for the FFP10 dataset are presented in panel b of Table 1. We concentrate our analysis on the middle block, which presents the pvalues from the χ^{2} statistic using the empirical distribution. For N = 32, FWHM = 320′, we note that all the topological descriptors show a nonsignificant deviation between the simulations and the observation, which is in contrast with the behavior at the same resolution in the NPIPE dataset, where the values are an order of magnitude lower. For the next larger scale of probing, N = 16, FWHM = 640′, we note a per mil significance of the difference between simulations and observations for the number of loops, which is an order of magnitude smaller than in the NPIPE dataset. Additionally, we also note the low pvalue for b_{0} at N = 128, FWHM = 80′, which is the scale at which an approximately 3σ deviation occurs for the number of components between the simulations and the observation. When examining the Euler characteristic at this scale, we note that the values are mildly significant, in contrast with the NPIPE dataset, which exhibits nonsignificant behavior. The Tukey depth test shows the observations to be true outliers in instances at different resolutions for all the three variables, yielding pvalues of 0.0. As in the case with the NPIPE dataset, these instances occur when the χ^{2} test also exhibits small pvalues.
3.4. Comparison with earlier topogeometrical results in the literature
Topogeometrical studies aimed at testing isotropy, homogeneity, and Gaussianity of the cosmic fields, such as the CMB and the 3D density distribution, are standard in the cosmological literature. The principle tools employed for this purpose are the geometrical Minkowski functionals and the topological Euler characteristic. The bridge between topology and geometry is provided by the Theorema Egrerium of Gauss, which establishes the equivalence between the geometric 0th LipschitzKilling curvature (equivalently the Dth Minkowski functional), and the purely topological notion of Euler characteristic of a Ddimensional space. Our results, on the other hand, involve novel and purely topological notions arising from homology theory quantified by the Betti numbers. The Euler characteristic has connections to these topological measures due to the EulerPoincaré formula, which expresses the Euler characteristic as the alternating sum of Betti numbers.
The pioneering works that examined the Minkowski functionals and Euler characteristic of the CMB are by Eriksen et al. (2004b) and Park (2004), performed on the WMAP data (Bennett et al. 2013). While Eriksen et al. (2004b) examined the whole set of Minkowski functionals as well as the Morsetheoretical concept of a skeleton length, Park (2004) restricted themselves to genus measurement, which is linearly related to the Euler characteristic. The purely geometric Minkowski functionals, namely the area functional and the contour length functional, show consistency with the standard model. In contrast, both works reported anomalous measurements of the Euler characteristic or genus. Eriksen et al. (2004b) reported that the Euler characteristic at negative density thresholds is anomalous with the base model simulations at more than 3σ for the northern hemisphere at smoothing scales of FWHM = 3.40°. The corresponding χ^{2} value reported for this scale is at the 95% confidence level, which translates into a pvalue of 0.05. Park (2004) reported an anomalous genus at more than 2.5σ. The anomalous behavior of genus or the Euler characteristic, specifically at negative levels, is linked to and generated by the anomalous behavior of the first homology group, represented by topological loops, and establishes a connection between our results and previous results, showing a consistency across datasets.
The analysis of the Minkowski functionals and Euler characteristic on the Planck datasets was performed by the Planck collaboration itself and was reported in Planck Collaboration XVI (2016) for DR2 (FFP8) and in Planck Collaboration VII (2020) for DR3 (FFP10). Planck Collaboration VII (2020) examined the Minkowski functionals, and the graphs for the Euler characteristic were presented for N = 1024, 256, 32. The general indication is that the observational values are within the 2.5σ band computed from the simulations. Our results are largely consistent with this observation from the Planck analysis for the given resolutions, which is also exhibited by the nonanomalous pvalues obtained from the empirical χ^{2} test. However, we also note that our results show stronger differences than the Planck results. In this context, it is important to note that in the Planck analysis pipeline, it has been the practice to combine all the Minkowski functionals to perform the statistical tests and subsequently present the combined pvalues. Beyond the fact that this combination mixes signals from topogeometrical descriptors that are a priori known to represent independent properties, it also prevents a direct comparison with our results of Euler characteristic computations.
In this context, another subtle but important point must be considered. In a detailed treatment, Pranav et al. (2019b) showed that the standard equations for Minkowski functionals used in cosmology are volumenormalized representations and represent exact computations only under specific conditions. The simplest case is a compact manifold without a boundary, for example, the complete 2sphere in 2D, or a periodic 3D Euclidean grid. When the manifold has boundaries, for example, the masked CMB sphere, additional boundary terms are involved in the exact computation of the Minkowski functionals. For purely geometric quantities, due to their localized nature, these boundary terms may be ignored when the manifold is large compared to the boundary. For topological quantities, which are nonlocal by nature, there is an interplay between the size of the topological objects, represented by the smoothing scale, and the size of the manifold. As a result, in case of topological descriptors such as the Euler characteristic and the Betti numbers, for large scales, boundary effects become increasingly more important. A full exact computation that takes the boundary effect generated by the complicated mask into account may therefore be a more accurate reflection of reality. Our computational methods for the topological descriptors perform exact computations in the presence of arbitrary masks. For the purely geometric components of the Minkowski functionals, this will involve a computation via the full Gaussian Kinematic Formula (Adler & Taylor 2010; Pranav et al. 2019a), which takes the boundary terms into account that were, for example, developed in Fantaye et al. (2015).
4. Discussions and conclusion
We presented a topological analysis of the temperature fluctuations in the CMB in terms of homology. To account for regions with unreliable data, we computed the homology of the excursion sets relative to the mask covering these areas. We performed a multiscale analysis by degrading and smoothing the maps for a range of pixel resolutions, and a subsequent convolution with a Gaussian filter for a range of scales. We performed our analysis on the fourth and final NPIPE data release from the Planck team. The pipeline represents a natural evolution of the dataprocessing pipeline, commensurate with better understanding of systematics, residuals, and noise over a period of time in successive data releases. It incorporates the best strategies from the LFI and HFI processing pipeline, so that the level of noise and residuals at all scales is reduced (Planck Collaboration Int. LVII 2020). We also investigated the Planck 2018 Data release 3 (DR3), accompanied by the FFP10 simulations (Planck Collaboration IV 2020), for comparison and completeness. The present paper is a successor to Pranav et al. (2019a), where we investigated the topology of the temperature fluctuation maps based on the intermediate Planck 2015 Data Release 2 (DR2), accompanied by FFP8 simulations. These two investigated topological characteristics of the CMB temperature fluctuation maps for the latest three data releases by the Planck team. The overall trends in the results in the datasets show the consistency of the data processing pipeline and our own methods.
Examining the behavior of topological components or isolated objects represented by the 0D homology group, we find that the observations deviate by approximately 2σ or less from the simulations for the NPIPE dataset. For the FFP10 dataset, b_{0} exhibits a deviation of 2.96σ at N = 128, FWHM = 80′. The 1D homology group, representing and quantifying the topological loops, is also consistent with the base model at 2σ for most of the resolutions and scales. However, for FWHM = 320′ and at moderately low negative thresholds, we record a high deviation between the simulations and observations for the NPIPE dataset. In a Gaussian setting, this would amount to an approximately 4σ significance. However, the distribution characteristics at this threshold are manifestly nonGaussian, rendering the usual σ significance nonviable. The distribution characteristics do not obey the Poisson statistics either. In view of this, and because we lack a true theoretical understanding of the distribution characteristics, we simply note that the significance is larger than what may be resolved by 600 simulations, yielding an empirical pvalue of at most 0.0016. The FFP10 dataset exhibits a deviant behavior at ∼2.8σ at the same threshold and resolution for the loops. Although the deviation occurs in a regime in which the low threshold and large smoothing scale may indicate statistics with low numbers, we find the statistics to be well behaved and stable. The high significance value, combined with the fact that the deviation is at a scale at which anomalies have been reported in several methods and datasets (Eriksen et al. 2004b; Park 2004; Schwarz et al. 2016), the case merits consideration. However, in this context, we also note that the statistical analysis is based on merely 600 and 300 simulations for the NPIPE and FFP10 datasets, respectively. For the Euler characteristic, the NPIPE dataset exhibits a high significance due to the high significance exhibited by the loops. However, for the FFP10 data set, the Euler characteristic is within approximately the 2.5σ band computed from simulations, which is largely consistent with the observations in Planck Collaboration VII (2020). However, we note that our results generally exhibit stronger differences than the Planck analysis pipeline. While ascertaining the source of this discrepancy is beyond the scope of this paper, a plausible source of the difference may be the different methods adopted in estimating the Euler characteristic and Minkowski functionals in general in Planck Collaboration VII (2020), which are based on theoretical equations for boundaryless manifolds. At the next higher smoothing scale, N = 16, FWHM = 640′, both datasets exhibit a significantly anomalous behavior with respect to the number of loops, but not the number of components. However, we disregard this resolution because of the low numbers involved in computing the statistics, owing to the large smoothing scale and low thresholds.
In order to test for nonrandom discrepancies, we computed the pvalues using the theoretical modeldependent χ^{2} test, as well as its nonparametric version computed from empirical distributions, by combining all relevant thresholds at a given resolution. We also presented the nonparametric and modelindependent Tukey depth test, which indicated that the observations are true outliers with respect to the simulations in numerous instances. However, in the final interpretation, we focused on the most conservative pvalues, estimated using the χ^{2} test from the empirical distributions. For the NPIPE dataset, the observations are consistent with the simulations, with pvalues higher than 0.05 for most resolutions. However, the number of components and loops shows mildly significant deviations at N = 32, FWHM = 320′. The corresponding deviation for the Euler characteristic exhibits an orderofmagnitude lower value at per mil levels. For the next larger scale, we find that the components and the Euler characteristic are consistent with the base model, while the loops exhibit a mildly anomalous pvalue of 0.03. While emphasizing that the statistics are based on low numbers, we note the contrasting behavior of the different topological descriptors. For the FFP10 dataset, the components and the Euler characteristic exhibit pvalues higher than 0.01 for all resolutions. This is consistent with the observation in Planck Collaboration VII (2020) that the pvalues obtained from the combined Minkowski functionals are consistent with the base model within the 99% confidence level. The number of loops exhibits a significantly anomalous behavior, yielding per mil pvalues, in contrast with the number of components and the Euler characteristic, at N = 16, FWHM = 640. However, we note that this deviation occurs at large smoothing scales, where the numbers involved in computing the statistics are small. Disregarding this anomalous behavior of loops at large scales, which might be affected by lownumber statistics, we find that the FFP10 dataset is consistent with the standard model simulations at the 99% confidence level.
In summary, while both datasets are largely consistent with the simulations based on the standard model, we find instances of interesting discrepancies in both datasets, which may be difficult to reject summarily as statistical flukes. Although most but not all of the anomalies occur on large scales, which inherently indicates statistics based on small numbers, we find them to be in statistically stable regimes in most instances. Based on the evidence presented in this paper, our assessment is that it may be a difficult task to accept or reject the null hypothesis summarily. A primary but crucial requirement may be a significantly larger number of simulations in order to achieve a clear verdict. In the absence of a true understanding of the origin of the possible anomalous behavior, any attempt to classify it will merely be speculative in nature, and we refrain from it. However, to facilitate a deeper understanding, we envision future research directed toward examining the fiducial frequency maps, as well as the polarization maps, among others, possibly with a larger suite of simulations. This will also involve testing smaller patches of the sky in order to determine whether the effect is global in nature or restricted to a specific part of the sky.
The Planck CMB maps are hosted at https://pla.esac.esa.int/. The secondary datasets encapsulating topological information of CMB maps are hosted at https://www.pratyushpranav.org/cmb_data/cmb_data_archive.tar.gz. The analysis codes are available at https://www.pratyushpranav.org/codes/topos2.tar.gz
Acknowledgments
I am indebted to the anonymous referee, whose incisive, yet extremely insightful comments have helped bring this draft to a balanced place. I am greatly indebted to Robert Adler, Thomas Buchert, Herbert Edelsbrunner, Bernard Jones, Armin Schwarzman, Gert Vegter, and Rien van de Weygaert for encouraging this solo venture, and for extremely helpful discussions. My gratitude also to Julian Borrill and Reijo Keskitalo for their patience in clarifying doubts, and their constructive comments on the draft. I also thank Tal Eliezri for insightful comments on the artwork. This work is supported by the ERC advanced grant ARThUs (grant no: 740021; PI: Thomas Buchert), with contributing influence from ERC advanced grant URSAT (grant no: 320422; PI: Robert Adler). I gratefully acknowledge the support of PSMN (Pôle Scientifique de Modélisation Numérique) of the ENS de Lyon, and the Department of Energy’s National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, operated under Contract No. DEAC0205CH11231, for the use of computing resources.
References
 Adler, R. 1981, in The Geometry of Random Fields, Classics in applied mathematics, (3600 Market Street, Floor 6, Philadelphia, PA 19104: SIAM), Society for Industrial and Applied Mathematics [Google Scholar]
 Adler, R. J., & Taylor, J. E. 2010, Random Fields and Geometry, Springer Monographs in Mathematics (Springer) [CrossRef] [Google Scholar]
 Adler, R. J., Agami, S., & Pranav, P. 2017, Proc. Nat. Acad. Sci., 114, 11878 [CrossRef] [Google Scholar]
 Appleby, S., Park, C., Hong, S. E., Hwang, H. S., & Kim, J. 2020, ApJ, 896, 145 [NASA ADS] [CrossRef] [Google Scholar]
 Appleby, S., Park, C., Pranav, P., et al. 2021, ArXiv eprints [arXiv:2110.06109] [Google Scholar]
 Bauer, U., Kerber, M., Reininghaus, J., & Wagner, H. 2014, in Mathematical Software – ICMS 2014 (Berlin Heidelberg: Springer), 137 [CrossRef] [Google Scholar]
 Bennett, C. L., Larson, D., Weiland, J. L., et al. 2013, ApJS, 208, 20 [Google Scholar]
 Biagetti, M., Cole, A., & Shiu, G. 2021, JCAP, 2021, 061 [CrossRef] [Google Scholar]
 Chingangbam, P., Yogendran, K. P., Joby, P. K., et al. 2017, JCAP, 12, 023 [CrossRef] [Google Scholar]
 Codis, S., Pichon, C., Pogosyan, D., Bernardeau, F., & Matsubara, T. 2013, MNRAS, 435, 531 [NASA ADS] [CrossRef] [Google Scholar]
 Copi, C., Huterer, D., Schwarz, D., & Starkman, G. 2015, MNRAS, 449, 3458 [NASA ADS] [CrossRef] [Google Scholar]
 Ducout, A., Bouchet, F. R., Colombi, S., Pogosyan, D., & Prunet, S. 2013, MNRAS, 429, 2104 [NASA ADS] [CrossRef] [Google Scholar]
 Durrer, R., Gangui, A., & Sakellariadou, M. 1996, Phys. Rev. Lett., 76, 579 [CrossRef] [Google Scholar]
 Edelsbrunner, H., & Harer, J. 2010, Computational Topology  an Introduction (American Mathematical Society), 1 [Google Scholar]
 Eriksen, H. K., Hansen, F. K., Banday, A. J., Górski, K. M., & Lilje, P. B. 2004a, ApJ, 605, 14 [NASA ADS] [CrossRef] [Google Scholar]
 Eriksen, H. K., Novikov, D. I., Lilje, P. B., Banday, A. J., & Górski, K. M. 2004b, ApJ, 612, 64 [NASA ADS] [CrossRef] [Google Scholar]
 Euler, L. 1758, Novi Commentarii academiae scientiarum Petropolitanae, 4, 140 [Google Scholar]
 Fantaye, Y., Marinucci, D., Hansen, F., & Maino, D. 2015, Phys. Rev. D, 91, 063501 [NASA ADS] [CrossRef] [Google Scholar]
 Feldbrugge, J., van Engelen, M., van de Weygaert, R., Pranav, P., & Vegter, G. 2019, JCAP, 2019, 052 [CrossRef] [Google Scholar]
 Gauss, C. F. 1900, K. Gesellschaft Wissenschaft, 8 [Google Scholar]
 Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759 [Google Scholar]
 Gott, J. R., III, Dickinson, M., & Melott, A. L. 1986, ApJ, 306, 341 [NASA ADS] [CrossRef] [Google Scholar]
 Guth, A. H. 1981, Phys. Rev. D, 23, 347 [Google Scholar]
 Guth, A. H., & Pi, S.Y. 1982, Phys. Rev. Lett., 49, 1110 [CrossRef] [Google Scholar]
 Harrison, E. R. 1970, Phys. Rev. D, 1, 2726 [CrossRef] [Google Scholar]
 Heydenreich, S., Brück, B., & HarnoisDéraps, J. 2021, A&A, 648, A74 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Jones, B. J. T. 2017, Precision Cosmology: The First Half Million Years (Cambridge University Press) [CrossRef] [Google Scholar]
 Kerscher, M., Schmalzing, J., & Buchert, T. 1996, in Mapping, measuring and modelling the universe, eds. P. Coles, V. Martinez, & M. J. P. Borderia, 247 [Google Scholar]
 Kono, K. T., Takeuchi, T. T., Cooray, S., Nishizawa, A. J., & Murakami, K. 2020, ArXiv eprints [arXiv:2006.02905] [Google Scholar]
 Mahalanobis, P. C. 1936, in Proceedings National Institute of Science, India, 2, 49 [Google Scholar]
 Masi, S. 2002, Prog. Part. Nucl. Phys., 48, 243 [NASA ADS] [CrossRef] [Google Scholar]
 Matsubara, T. 2010, Phys. Rev. D, 81, 083505 [NASA ADS] [CrossRef] [Google Scholar]
 Mecke, K. R., Buchert, T., & Wagner, H. 1994, A&A, 288, 697 [NASA ADS] [Google Scholar]
 Moraleda, R., Valous, N., Xiong, W., & Halama, N. 2019, Computational Topology for Biomedical Image and Data Analysis: Theory and Applications, Focus Series in Medical Physics and Biomedical Engineering, (CRC Press) [Google Scholar]
 Morozov, D. 2005, BioGeometry News, Dept. Comput. Sci., Duke Univ., USA [Google Scholar]
 Munkres, J. 1984, Elements of Algebraic Topology, Advanced book classics (Perseus Books) [Google Scholar]
 Park, C.G. 2004, MNRAS, 349, 313 [NASA ADS] [CrossRef] [Google Scholar]
 Park, C., Pranav, P., Chingangbam, P., et al. 2013, J. Korean Astron. Soc., 46, 125 [CrossRef] [Google Scholar]
 Peebles, P. J. E., & Yu, J. T. 1970, ApJ, 162, 815 [NASA ADS] [CrossRef] [Google Scholar]
 Planck Collaboration I. 2020, A&A, 641, A1 [Google Scholar]
 Planck Collaboration IV. 2020, A&A, 641, A4 [Google Scholar]
 Planck Collaboration VII. 2020, A&A, 641, A7 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Planck Collaboration XII. 2016, A&A, 594, A12 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Planck Collaboration XVII. 2016, A&A, 594, A17 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Planck Collaboration XVI. 2016, A&A, 594, A16 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Planck Collaboration Int. LVII. 2020, A&A, 643, A42 [Google Scholar]
 Pogosyan, D., Gay, C., & Pichon, C. 2009, Phys. Rev. D, 80, 081301 [NASA ADS] [CrossRef] [Google Scholar]
 Pranav, P. 2015, PhD Thesis, University of Groningen, The Netherlands [Google Scholar]
 Pranav, P. 2021a, IEEE Signal Proc. Mag., 38, 130 [NASA ADS] [CrossRef] [Google Scholar]
 Pranav, P. 2021b, ArXiv eprints [arXiv:2109.08721] [Google Scholar]
 Pranav, P., Edelsbrunner, H., van de Weygaert, R., et al. 2017, MNRAS, 465, 4281 [Google Scholar]
 Pranav, P., van de Weygaert, R., Vegter, G., et al. 2019a, MNRAS, 485, 4167 [Google Scholar]
 Pranav, P., Adler, R. J., Buchert, T., et al. 2019b, A&A, 627, A163 [EDP Sciences] [Google Scholar]
 Ryden, B. 2003, Introduction to Cosmology (AddisonWesley) [Google Scholar]
 Sahni, V., Sathyprakash, B., & Shandarin, S. 1998, ApJ, 507, L109 [NASA ADS] [CrossRef] [Google Scholar]
 Schmalzing, J., & Gorski, K. M. 1998, MNRAS, 297, 355 [CrossRef] [Google Scholar]
 Schwarz, D. J., Copi, C. J., Huterer, D., & Starkman, G. D. 2016, CQG, 33, 184001 [NASA ADS] [CrossRef] [Google Scholar]
 Shivashankar, N., Pranav, P., Natarajan, V., et al. 2016, IEEE Trans. Vis. Comput. Graph., 22, 1745 [CrossRef] [Google Scholar]
 Starobinsky, A. A. 1982, Phys. Lett. B, 117, 175 [Google Scholar]
 Telschow, F., Schwartzman, A., Cheng, D., & Pranav, P. 2019, ArXiv eprints [arXiv:1908.02493] [Google Scholar]
 The CGAL Project 2021, CGAL User and Reference Manual, 5.3 edn., (CGAL Editorial Board) [Google Scholar]
 Tukey, J. W. 1975, in Proceedings of the 1974 international congress of mathematicians, 2, 523 [Google Scholar]
 van de Weygaert, R., Vegter, G., Edelsbrunner, H., et al. 2011, Trans. Comput. Sci., 14, 60 [NASA ADS] [CrossRef] [Google Scholar]
 Wilding, G., Nevenzeel, K., van de Weygaert, R., et al. 2021, MNRAS, 507, 2968 [NASA ADS] [CrossRef] [Google Scholar]
 Xu, X., CisewskiKehe, J., Green, S., & Nagai, D. 2019, Astron. Comput., 27, 34 [NASA ADS] [CrossRef] [Google Scholar]
Appendix A: Data and methods
In this section, we briefly describe the dataset we used in the experiments. We also present a short account of the computational pipeline, referring to Pranav et al. (2019a) for details.
A.1. Datasets
Over a period of time, the Planck team has invested significant effort in understanding and calibrating the source of noise as well as systematics that affect observations. This has led to the release of a series of datasets over a period of time, with successive data releases achieving better calibration and accuracy. In Pranav et al. (2019a), we performed our experiments on DR2, which is one of the intermediate data releases. In this paper, we concentrate on two data releases simultaneously, which we briefly describe below.
Planck 2020 Data Release 4 NPIPEdataset. The primary results are based on the observational maps and simulations based on the NPIPE (Planck Collaboration Int. LVII 2020) analysis pipeline (hereafter just NPIPE), which employs the SEVEM component separation technique. The NPIPE dataset is the final data release from the Planck team, and the pipeline is the most evolved and sophisticated of all datageneration pipelines. It combines some of the most powerful features of the separate LFI and HFI analysis piplelines, resulting in frequency and component maps that have lower levels of noise and systematics at essentially all angular scales (cf. Planck Collaboration Int. LVII 2020). The observational maps are accompanied by a suite of 600 simulations modeled as isotropic, homogeneous Gaussian random fields.
Planck 2018 Data Release 3 FFP10 dataset. For comparison, we also analyzed the temperature maps from Planck 2018 fullmission data release (DR3 hereafter), which is the third data release by the Planck team, and employs the SMICA component separation technique. The noise and systematics in this release are better constrained than in the previous data releases. The componentseparated observational maps are accompanied by a suite of 300 simulations generated using the Full Focal Plane Plane pipeline, hereafter referred to as the FFP10 simulations (Planck Collaboration IV 2020), also based on the standard model.
A.2. Computational pipeline
The computational pipeline is composed of the steps that we briefly describe below. See Pranav et al. (2019a) for a substantially detailed description.
Preprocessing. We performed a range of preprocessing steps to the maps using the HealPix package (Górski et al. 2005), which is also the format of the initial data. First, the CMB and the noise maps were added pixelwise for each realization of the simulations. Subsequently, the observational and noiseadded simulation maps were degraded and smoothed at a range of scales for the values presented in Table A.1. The mask presented in Figure A.1, which is a binary map, was also degraded and smoothed at the same resolution as the CMB maps, and subsequently thresholded at a value of 0.9 to make it binary again. Table A.1 presents the percentage of sky covered by the unmasked region for the degraded resolution and associated smoothing scales analyzed in this paper. Figure A.2 presents a visualization of the degraded and smoothed maps after applying the mask. Next, we computed the mean and standard deviation from the unmaksed pixels for each realization and resolution and rescaled the maps pixelwise by the standard deviation after subtracting the mean. As a final step, we assigned the value infinity to the masked pixels (numerically simulated by a very large number).
Fig. A.1.
Visualization of the mask employed for the analyses in this paper. It is the common mask released in Data Release 3 (2018) and masks our galaxy as well as other bright foreground sources, including point sources. 
Fig. A.2.
Visualization of the degraded, smoothed, and masked observational maps employed in this paper. The maps are degraded at N = 2048, 1024, 512, 256, 128, 64, 32, and 16, and smoothed at FWHM = 5′,10′,20′,40′,80′,160′,320′,and 640′, respectively. The mask is degraded and smoothed at the same scales as the map, and subsequently thresholded at 0.9 for rebinarization. 
Percentage of sky area covered by the unmasked regions in the sky.
Triangulation. As a first step, we projected pixels of the maps projected onto 𝕊^{2}, and we triangulated this set of points in ℝ^{3}. Taking the convex hull of this triangulation produces a triangulation of the pointset on 𝕊^{2}. This triangulation consists of V = 12N^{2} vertices, 3V − 6 edges, and 2V − 4 triangles, where N is the resolution parameter in HealPix format. This was the input to all downstream computations and represents the temperature field, f: 𝕊^{2} → ℝ, by storing the temperature value at each vertex; see Pranav et al. (2019a). We assumed a piecewise linear interpolation along higher dimensional simplices.
Upperstar filtration. Given the triangulation K constructed in the previous step, we ordered its simplices such that σ precedes τ if (i) f(σ) > f(τ) or (ii) f(σ) = f(τ) and dim σ < dim τ, in which f(σ) is the minimum temperature value of the vertices of σ. Any ordering that satisfies (i) and (ii) is called an upperstar filter of K and f. The corresponding upperstar filtration consists of all prefixes of the filter, each representing an excursion set of f.
Persistence computation. We constructed a boundary matrix from the upperstar filtration of K. Writing σ_{1}, σ_{2}, …, σ_{n} for the sorted simplices of the upperstar filtration, the boundary matrix ∂[1..n, 1..n] is defined by ∂[i, j] = 1, if σ_{i} is a face of σ_{j} and dim σ_{i} = dim σ_{j} − 1, and ∂[i, j] = 0, otherwise. Computing the persistence birth death pair from this ordered boundary matrix involves reduction of the columns to the lowestj form. We resorted to reduction from left to right, and a column of the matrix was reduced if it was zero or its lowest 1 had only zeros in the same row to its left. Each column with a unique lowestj contributes to the birth death pair of the persistence diagram, where the index of the birth and death simplices are precisely the row and column indices of the lowestj. Ranks of homology groups relative to the mask were inferred from the persistence diagrams by setting the vertices belonging to the mask at +∞; see Pranav et al. (2019a):
Tests for determining statistical significance. Our main aim was comparing the observational CMB maps with the nullhypothesis simulation maps, which assume isotropy, homogeneity, and Gaussianity. Let x_{i} ∈ ℝ^{m}, i = 1, …, n, be a sample of i.i.d. mdimensional vectors, drawn from a distribution G. Let y ∈ ℝ^{m} be another sample point, assumed to be drawn from a distribution F. We wish to test the (null) hypothesis that F = G, and give the test results in terms of pvalues, which compute the probability that y is ‘consistent’ with this hypothesis. We employed two different statistical tests that we briefly describe below.
Mahalanobis distance. The first is the parametric Mahalanobis distance, or the familiar χ^{2} test (Mahalanobis 1936). If G is assumed to be Gaussian and n is large, then under the hypothesis that G = F, the squared Mahalanobis distance is approximately distributed as a χ^{2} distribution with m degrees of freedom. Thus the corresponding pvalue is
Tukey depth. The second method is a nonparametric test based on the Tukey depth (Tukey 1975). It is a general metric for identifying outliers in a flexible manner and in a nonparametric setting, making no assumptions on the structure of F and G. Let z be any point in ℝ^{m}. Then the halfspace depth d_{dep}(z; x_{1}, …, x_{n}) of z within the sample of the x_{i} is the smallest fraction of the n points x_{1}, …, x_{n} to either side of any hyperplane passing through z. Points that have the same depth constitute a nonparametric estimate of the isolevel contour of the distribution F. We first compute dd_{j} = d_{dep}(x_{j}; x_{1}, …, x_{n}) for every point x_{j}, j = 1, …, n, yielding an empirical distribution of depth. The pvalue of y was computed as the proportion of points whose depth is lower than that of y,
Appendix B: Code description
All the analyses in this paper have been performed by deploying TopoS2. TopoS2 is a specialized software written for computing the topology of scalar functions on S2, especially adapted to the HealPix data format. It is written in the context of the topological analysis of Planck CMB maps, however useful for other scalar functions on 𝕊^{2}. TopoS2 is tested on Debian Linux 9, with the following dependencies:
* [boost] — general purpose C++ libraries
* [CGAL]4.7 — triangulation and filtration building
* [CMake] — build process
* [Dionysus] — data structure and topology computation
* [HealPiX] — dataprocessing operations on the sphere
* [PHAT] — data structure and topology computation
* [boost]: http://www.boost.org
* [CGAL]: http://www.cgal.org
* [CMake]: http://www.cmake.org
* [Dionysus]: https://www.mrzv.org/software/dionysus/
* [HealPix]: https://healpix.jpl.nasa.gov/
* [PHAT]: https://github.com/blazs/phat
B.1. Description
TopoS2 has three executables that work in tandem to produce the persistence diagrams of scalar fields on 𝕊^{2}. For Planck CMB files supplied in HealPix format the code helper_func/fits_RDS_maskRDS_nrm.cpp reads the CMB and mask fits files, degrades and smooths them to specified values, and outputs an ascii masked file that serves as input to the next code in sequence filtration/streaming_filtration_builder_superlevelset.cpp. The output of this operation is a filtration of superlevel sets to the standard output stream. This stream is fed to the final part of the code persistent_homology/diagram_generator.cpp, which outputs the persistence diagram to the standard output stream. The code can be run in isolation for each data file, or batchprocessed on large clusters. Higher resolutions are memory intensive:

2048 ∼120 GB

1024 ∼ 35 GB

512 ∼ 8 GB
B.2. Building
We assume we are in TopoS2 root directory.
1.) For the fits operation handling, the code is compiled against healpix C++ library. This amounts to modifying the helper_func/compile file to point to the healpix directories, and executing it.
cd helper_func/ ./compile cd ..
2.) Building the triangulation and filtration invokes the CGAL library (The CGAL Project 2021). It is best compiled by invoking the cgal_create_cmake_script scriptfile to generate the cmake components. Dependency problems, if arising, are resolved by interactively editing the cmake configuration file, and setting the right paths. If the CGAL root directory is $CGAL_DIR, this amounts to executing the following commands in sequence:
cd filtration/ $CGAL_DIR/scripts/cgal_create_cmake_script cmake . (or ccmake . for interactive) make cd ..
3.) Computing persistence diagrams is based on Dionysus (Morozov 2005) and PHAT (Bauer et al. 2014) library. The code is compiled by executing the following commands in sequence:
cd persistent_homology mkdir build cd build cmake .. make cd ../..
B.3. Analyzing planck dataset
The core code and associated helper scripts have evolved over a period of time keeping in mind the central requirement to analyze the full Planck dataset as fast as possible. Given the memory requirements as stated above, the best strategy is to engage full nodes with multiple cores on large clusters for each run, especially at higher resolutions. The codes are written such that they default on memory overflow. As memory consumption on the higher resolutions is the bottleneck, the codes are written with a capability to run on a single processor, as well as multiple processors, and crash if memory overflows.
B.3.1. Generating persistence diagrams
Given all of this, and a successful compilation of the codes, and an assumption that we are in the directory containing the NPIPE observation file npipe6v20_sevem_cmb_005a_2048.fits, and simulations with CMB and Noise added (numbering from 200 to 799, SEVEM kind), we provide two scripts that run on SunGrid Engine cluster environment and produce a multiscale analysis in an automated fashion. The helper script sge_job_script_npipe_DS_obs.py in the “example” directory serves as an example to process multiple resolutions for the observation files on a cluster running the SGE grid engine. The script file sge_job_script_npipe_DS.py processes the simulations. In the script, the variables “binary1”, “binary2” and “binary3” set the paths to fits processing script, filtration builder code, and persistent homology calculator code. At this moment we have the persistence diagrams from the observations and simulations, from which the Betti numbers may be calculated.
B.3.2. Computing Betti numbers
To compute the Betti numbers from the persistence diagrams, we need to insert a heading in each .dgm file which is the number of rows in it. The script heading_fixer.py does this for the simulations, similarly heading_fixer_obs.py for observations. Thereafter the Betti numbers are calculated by the C++ code betti_superlevel_relative (to be compiled like a normal c++ code). For batch processing two script files betti_calculator.py and betti_calculator_obs.py do this. The scripts in this paragraph need to be placed in the outdir directory of sge_job_script_npipe_DS.py
Appendix C: Validation of the statistical tests
In this section, we examine the validity of the regime of our statistical tests. To this end, we examined the empirical distribution of the χ^{2} and Tukey depth values from the simulations, and compared it with the observations.
Figure C.1 presents the case for the χ^{2} distribution. The top two rows present the values for b_{0}, the middle two rows for b_{1}, and the bottom two rows for the Euler characteristic. The distribution of the values from the 600 NPIPE simulations is denoted by the gray box histograms. The blue curve denotes the theoretical curve for the given degrees of freedom, and the red vertical line shows the value of the observational map with respect to the distribution for the simulations. In general, we find a good agreement between the theoretical curves and the histograms from the simulations, which lends additional support to the validity of the χ^{2} test in our regimes of testing.
Fig. C.1.
Distribution of the Mahalanobis distance. The box histograms present the distribution of the Mahalanobis distance of the simulations with respect to the mean. The blue curve plots the theoretical χ^{2} distribution for the given degrees of freedom, and the red vertical line presents the Mahalanobis distance of the observation with respect to mean. The generally good fit of the theoretical curve with the histogram of the simulations for all resolutions lends credibility to the validity of the statistical tests in the chosen regimes. 
Figure C.2 presents the case for the Tukey depth, with the values from the simulations presented in the gray box, and the observational value presented by the red vertical line. The top two rows present the values for b_{0}, the middle two rows for b_{1}, and the bottom two rows for the Euler characteristic. We note the wellbehaved nature of the distribution from the histograms. To help interpret these diagrams, we note that depth values increase from left to right on the horizontal axis, and the pvalues are computed as the fraction of points that have lower depth than the candidate. The extreme pvalues of 0.0 in the main paper then simply means that the observational points are true outliers, and no simulated points are shallower than the observation. The pvalues are consistent with the distribution characteristics, where in these cases, the observed red vertical line is to the left of all simulated box histograms.
Fig. C.2.
Distribution of the Tukey depth. The box histograms present the distribution of the depth of the simulations, and the red vertical line presents the depth of the observation. 
All Tables
All Figures
Fig. 1.
Visualization of the temperature fluctuations in the CMB sky. The survey surface 𝕊^{2} is distorted at each point in the direction of the surface normal. The distortion is proportional to the fluctuation in direction and magnitude. The visualization is based on the observed CMB sky cleaned by the NPIPE pipeline and smoothed at 5 degrees. 

In the text 
Fig. 2.
Cosmic microwave background sky thresholded at moderately positive (left) and negative (right) levels. For high thresholds, the excursion set is dominated by isolated components, while at low thresholds, it gives the appearance of a single connected surface indented by numerous holes. For sufficiently low thresholds, the holes fill up, and the excursion set covers the entire sphere, which is composed of a connected surface without a boundary that encloses a single void (cf. Fig. 1). 

In the text 
Fig. 3.
Visualization of the temperature fluctuations in the CMB sky in the presence of masks (plotted in gray). It consists of an equatorial belt and numerous patches on the northern and southern cap, which correspond to our galaxy and other bright foreground objects. The visualization is based on the observed CMB sky cleaned by the NPIPE pipeline and smoothed at 5 degrees. 

In the text 
Fig. 4.
Graphs of b_{0} for the NPIPE dataset for various degraded and smoothing scales. Panel a: observational curve in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: curve for the significance of the differences, where the observed curves are presented in red, and the values for the simulations are presented as dotted gray lines. 

In the text 
Fig. 5.
Graphs of b_{1} for the NPIPE dataset for various resolutions and smoothing scales. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray, while panel b presents the significance of the differences. Error bands corresponding (1σ : 4σ) are also presented in various colors. 

In the text 
Fig. 6.
Graph of b_{1}, selectively for N = 32 and N = 16, corresponding to FWHM = 320′ and FWHM = 640′, presenting an enlarged view of the concerned resolutions. The observational curve is presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 4σ) are also drawn. The bottom two panels present the visualization of the scalar temperature field in order to facilitate an appreciation of the structure of the field. 

In the text 
Fig. 7.
Histogram of b_{1} at ν = −2.5, where the ∼4σ deviation occurs. The minimum value attained across 600 simulations is 18, with a mean at ∼21 and a standard deviation of ∼1.7. The observed map exhibits b_{1} = 28 and is well outside the distribution. 

In the text 
Fig. 8.
Visualization of the loops surrounding the lowdensity regions at a moderately negative threshold. The gaps in the manifold correspond to the mask that has been removed. We clearly note the equatorial belt corresponding to our galaxy, and more patches in the visible cap. Some loops live fully in the excursion region that is not influenced by the mask. We show them in red. We also depict some representative relative loops whose end points are masked. We draw closed loops for this category as well and show the portions in which they overlap the masked regions in white. The visualization is based on the observed CMB map from the NPIPE data set smoothed at 5 degrees. 

In the text 
Fig. 9.
Graphs of EC_{rel} for the NPIPE dataset. Panels a,b: similar information as Figs. 4 and 5. EC_{rel} exhibits deviations commensurate with those in b_{0} and b_{1}. 

In the text 
Fig. 10.
Graph of b_{0} for the FFP10 dataset. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: significance of the differences. The dataset exhibits milder deviations than the NPIPE dataset in general. However, we note a 2.96σ deviation in the number of components between the simulations and observation at N = 128, FWHM = 80′. 

In the text 
Fig. 11.
Graph of b_{1} for the FFP10 dataset. In panel a, the observational curves are presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 4σ) are also drawn. Panel b: significance of the differences. The dataset exhibits milder deviations than the NPIPE dataset in general, but there are mildly significant deviations at 2.77σ, at scales and thresholds where the NPIPE dataset shows strong deviations as well. 

In the text 
Fig. 12.
Graph of EC_{rel} for the FFP10 dataset. In panel a, the observational curve is presented in yellow, and the curves corresponding to the average of simulations are presented in gray. Error bands corresponding to (1σ : 3σ) are also drawn. Panel b: significance of the differences. The Euler characteristic exhibits deviations commensurate with b_{0} and b_{1}. 

In the text 
Fig. A.1.
Visualization of the mask employed for the analyses in this paper. It is the common mask released in Data Release 3 (2018) and masks our galaxy as well as other bright foreground sources, including point sources. 

In the text 
Fig. A.2.
Visualization of the degraded, smoothed, and masked observational maps employed in this paper. The maps are degraded at N = 2048, 1024, 512, 256, 128, 64, 32, and 16, and smoothed at FWHM = 5′,10′,20′,40′,80′,160′,320′,and 640′, respectively. The mask is degraded and smoothed at the same scales as the map, and subsequently thresholded at 0.9 for rebinarization. 

In the text 
Fig. C.1.
Distribution of the Mahalanobis distance. The box histograms present the distribution of the Mahalanobis distance of the simulations with respect to the mean. The blue curve plots the theoretical χ^{2} distribution for the given degrees of freedom, and the red vertical line presents the Mahalanobis distance of the observation with respect to mean. The generally good fit of the theoretical curve with the histogram of the simulations for all resolutions lends credibility to the validity of the statistical tests in the chosen regimes. 

In the text 
Fig. C.2.
Distribution of the Tukey depth. The box histograms present the distribution of the depth of the simulations, and the red vertical line presents the depth of the observation. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.