Unexpected Topology of the Temperature Fluctuations in the Cosmic Microwave Background

We study the topology generated by the temperature fluctuations of the cosmic microwave background (CMB) radiation, as quantified by the number of components and holes, formally given by the Betti numbers, of the growing excursion sets. We compare CMB maps observed by the Planck satellite with a thousand simulated maps generated according to the ΛCDM paradigm with Gaussian-distributed fluctuations. The survey of the CMB over $\mathbb{S}^2$ is incomplete due to obfuscation by bright point sources and other extended foreground objects, such as our own galaxy. To deal with such situations, where analysis in the presence of "masks" is of importance, we introduce the concept of relative homology. The parametric $\chi^2$-test shows differences between observations and simulations, yielding $p$-values at per-cent to less than per-mil levels at scales of roughly 2 to 7 degrees. The highest observed deviations for $b_0$ and $b_1$ are approximately $3\sigma$-$4\sigma$ at scales of 3 to 7 degrees. There are reports in the literature of mildly unusual behaviour of the Euler characteristic at 3.66 degrees, computed from independent measurements of the CMB temperature fluctuations by Planck's predecessor, the WMAP satellite. This mildly anomalous behaviour of the Euler characteristic is related to the strongly anomalous behaviour of components and holes reported here. These are also the scales at which the observed maps exhibit low variance compared to the simulations. Non-parametric tests show even stronger differences at almost all scales. Regardless, beyond the trivial possibility that this may still be a manifestation of an extreme Gaussian case, these observations, along with the super-horizon scales involved, may motivate a closer look at primordial non-Gaussianity. Alternative scenarios worth exploring include models with non-trivial topology.


Introduction
The Λ cold dark matter (or ΛCDM) standard paradigm of cosmology postulates that the Universe consists primarily of cold non-relativistic dark matter, which reveals its presence only through gravitational interactions, and that the Universe is at present driven by dark energy, which causes accelerated expansion in this model. The cosmic microwave background (CMB) radiation, which originates at the epoch of recombination, is the most important observational probe of the validity of the standard paradigm today (Jones 2017). It is the earliest visible light and offers a glimpse into the processes of the nascent stage of the Universe. Fluctuations about the mean in the temperature field of the CMB correspond to fluctuations in the distribution of matter in the early Universe. Understanding the CMB is therefore crucial to understanding the primordial Universe.
The ΛCDM paradigm, together with the inflationary theories in their simplest forms, predicts the primordial perturbations to be realizations of a homogeneous and isotropic Gaussian random field (Guth & Pi 1982). This hypothesis is supported experimentally by CMB observations (Smoot et al. 1992; Bennett et al. 2003; Spergel et al. 2007; Komatsu et al. 2011; Planck Collaboration XIII 2016) and theoretically by the central limit theorem. While it has largely been agreed upon that the CMB exhibits characteristics of a homogeneous and isotropic Gaussian field, there are lingering doubts. The pioneering works of Eriksen et al. (2004a) and Park (2004) challenge the assumption of homogeneity, and the alignment of low multipoles (Copi et al. 2015) challenges the assumption of isotropy. Other noted anomalies include the vanishing correlation function at large scales, and the unusually low variance at approximately 3°; see Schwarz et al. (2016) for a review and possible interpretations. Planck Collaboration XXIII (2014) independently confirms these anomalies.
The presence of primordial non-Gaussianity remains a topic of ongoing debate. Deviations from Gaussianity, if found, will point to new physics driving the Universe in its nascent stages. The consensus is in relative favour of the absence of non-Gaussianity (Komatsu et al. 2011; Planck Collaboration XXIII 2014; Planck Collaboration XXIV 2014; Matsubara 2010; Bartolo et al. 2010); see also Buchert et al. (2017) for a review and a model-independent route of analysis. Despite mildly unusual behaviour of the Euler characteristic, pointed out in Eriksen et al. (2004b) and Park (2004), the methods employed to date have not provided compelling evidence of non-Gaussianity in the CMB. In contrast, the topological methods of this paper find the observed CMB maps (Planck Collaboration IX 2016) to be significantly different from the Full Focal Plane 8 (FFP8) simulations (Planck Collaboration XXIII 2014; Planck Collaboration XII 2016) that assume the initial perturbations to be Gaussian.
Topology is the branch of mathematics concerned with properties of shapes and spaces preserved under continuous deformations, such as stretching and bending, but not tearing and gluing. It is related to but different from geometry, which measures size and shape. Both geometry and topology have been used in the past to study the structure of the CMB radiation and other cosmic fields. Historically, the predominant tools in this endeavour were the Minkowski functionals, which for a 2-manifold embedded in the three-dimensional space are related to the enclosed volume, the area, the total mean curvature, and the total Gaussian curvature. By the Gauss-Bonnet Theorem, for 2-manifolds, the latter is 2π times the Euler characteristic (Euler 1758), thus providing a bridge between geometry and topology. Early topological studies of the cosmic mass distribution were based on the Euler characteristic of the iso-density surfaces, which generically are 2-manifolds (Doroshkevich 1970; Bardeen et al. 1986;Gott et al. 1986;Park et al. 2013). The full set of Minkowski functionals was later introduced to cosmology in Mecke et al. (1994), Schmalzing & Buchert (1997), Sahni et al. (1998), Schmalzing & Gorski (1998). For Gaussian, and Gaussian-related random fields, the expected values of the Minkowski functionals of excursion sets have known analytic expressions (Adler 1981;Adler & Taylor 2010), which is one of the main reasons they have played a key role in the study of real valued fields arising in cosmology and other disciplines. Analyses based on Minkowski functionals have been used for predicting and quantifying the presence of non-Gaussianity in the CMB maps obtained with the Planck satellite (Ducout et al. 2013;Buchert et al. 2017).
While the Minkowski functionals have been instructive, the topological information contained in them is limited and convolved with geometric information. Moreover, they are not equipped to address the hierarchical aspects of the matter distribution directly, although partial Minkowski functionals (Schmalzing et al. 1999) may be useful in certain settings. We therefore analyse CMB fluctuations in terms of the purely topological concepts of homology (Munkres 1984), as quantified by Betti numbers (Betti 1871) and persistence (Edelsbrunner et al. 2002; Edelsbrunner & Harer 2010). By the Euler-Poincaré formula, the Euler characteristic is the alternating sum of the Betti numbers, which implies that the latter provide a finer description of topology (Munkres 1984). A broad exposition of these concepts in a cosmological setting is given in Pranav et al. (2017), Pranav (2015), van de Weygaert et al. (2011); see also Park et al. (2013), Sousbie (2011), Shivashankar et al. (2016), Adler et al. (2017), Makarenko et al. (2018), Cole & Shiu (2018) for some applications. Related but slightly different methodologies used for the analysis of cosmological datasets, emanating from concepts in Morse theory, may be found in Colombi et al. (2000), Novikov et al. (2006), Sousbie et al. (2008).
Our main result is an anomaly of the observed CMB radiation when compared with simulations based on Gaussian prescriptions. The χ²-test yields a significant difference between the number of components and holes in the observed sky compared to the simulations, with p-values at per-cent to less than per-mil levels at scales of roughly 2°-7°. The differences peak sporadically at more than 3σ at these scales. Non-parametric tests reveal an even more significant difference between the observation and the simulations at almost all scales. The χ²-test shows the anomaly at roughly the same scales at which the power spectrum exhibits a dip. Eriksen et al. (2004b) report a mildly unusual Euler characteristic at approximately 3° in the earlier measurements of the CMB radiation by the Wilkinson Microwave Anisotropy Probe (WMAP) satellite, which is related to the anomalous behaviour of components and holes. The noted anomaly motivates a closer look at the standard paradigm. Possible scenarios include, but are not limited to, primordial non-Gaussianity, topological defect models, and models with non-trivial topology (Bouchet et al. 2002; Aurich & Steiner 2001; Aurich et al. 2007; Bernui et al. 2018).
The workflow in this paper is as follows. Topological descriptors are computed from cosmology data, and statistical tests based on these descriptors are used to compare the observations with simulations. Section 2 gives a summary of the topological concepts. Section 3 describes the data, the computational pipeline, and a brief account of the statistical tests employed. Section 4 presents the main results of the paper, followed by a summary and conclusions in Sect. 5.

Topological background
Since the CMB radiation is observed as a scalar field on the two-dimensional sphere, the topological concepts needed in this paper are elementary, namely the components and the holes of subsets of this sphere. To count them in the presence of regions with unreliable data, we compute the ranks of the homology groups relative to the mask that covers these regions.

Excursion sets and absolute homology
Writing S² for the two-dimensional sphere and f: S² → R for the temperature field of the CMB, the excursion set at a temperature ν is the subset of the sphere in which the temperature is ν or larger:

E(ν) = { x ∈ S² : f(x) ≥ ν }.

It is a closed set, and we write β0(ν) for its number of components. A hole is a component of the complement, S² \ E(ν). Assuming there is at least one hole, we write β1(ν) + 1 for the number of holes, and we set β2(ν) = 0, because E(ν) does not cover the entire sphere. On the other hand, if there is no hole, we set β1(ν) = 0 and β2(ν) = 1; see the left panel of Fig. 1 for an illustration. These definitions are motivated by the more general theory (Munkres 1984), in which the p-th Betti number is the rank of the p-th homology group: βp = rank Hp for p = 0, 1, 2. These are the basic objects of homology. The Euler characteristic of the excursion set is the alternating sum of the Betti numbers:

EC(ν) = β0(ν) − β1(ν) + β2(ν).

The Euler characteristic has a long history in the CMB literature, largely due to the fact that a simple analytic formula for its expected value is known when the CMB is modelled as a Gaussian random field (Adler 1981; Adler & Taylor 2010). While such formulas are not known for the Betti numbers, the information they carry is richer. Pranav et al. (2019) present a numerical study of the Betti numbers of Gaussian random fields, compare them to the Euler characteristic and Minkowski functionals, and find that Betti numbers present a more detailed account of the topological properties of the field than the Euler characteristic (see also Park et al. 2013). In general, near the mean level of ν, one expects the components and holes of the excursion set to be of similar size and number. Accordingly, one expects β0(ν) and β1(ν) to be of similar magnitude, combining to give an Euler characteristic close to zero. Such an Euler characteristic tells us nothing about the individual Betti numbers beyond the fact that they are similar.

Fig. 1. Left: blue excursion set on the sphere, consisting of an upper-left component with a hole, an upper-right component, and a lower component. Its Betti numbers are β0 = 3, β1 = 1, β2 = 0, and its Euler characteristic is EC = 3 − 1 + 0 = 2. Middle: pink mask in which the data are not reliable. The mask covers part of the upper-left component and its hole; its hole is fully contained in the upper-right component, and it overlaps the lower component in two disconnected pieces. Right: visualization of the relative homology groups obtained by shrinking the mask to a point and pulling the excursion set with it. We have b0 = 0 because all three components connect to the shrunken mask, b1 = 2 because the loop in the upper-left component is preserved and a new loop in the lower component is formed, and b2 = 1 because the upper-right component takes on the shape of a sphere. The (relative) Euler characteristic is therefore EC_rel = 0 − 2 + 1 = −1.
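As a toy illustration of these counts, the two Betti numbers of a discretized excursion set on a small flat patch (ignoring the spherical topology, so β2 plays no role) can be obtained by labelling connected components of the set and of its complement. This is a minimal sketch under those assumptions, not the pipeline used in the paper, and the function names are ours:

```python
import numpy as np
from scipy import ndimage

def betti_flat(excursion):
    """Betti numbers of a binary excursion set on a flat patch.

    b0: connected components of the set (8-connectivity).
    b1: holes, i.e. bounded components of the complement
        (4-connectivity, complementary to the foreground).
    """
    b0 = ndimage.label(excursion, structure=np.ones((3, 3)))[1]
    # pad with background so the unbounded component is counted exactly once
    complement = np.pad(~excursion, 1, constant_values=True)
    b1 = ndimage.label(complement)[1] - 1  # subtract the unbounded component
    return b0, b1

# a square annulus: one component with one hole, so EC = 1 - 1 = 0
ring = np.zeros((7, 7), dtype=bool)
ring[1:6, 1:6] = True
ring[2:5, 2:5] = False
```

On this example, betti_flat(ring) returns (1, 1), matching the count by eye; the complementary connectivities for foreground and background are the standard choice for consistent hole counting on a grid.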

Masks and relative homology
We define the mask to be the region in which the data are not reliable, and denote it by M ⊆ S². In our application, the mask includes a belt around the equator, corresponding to the thickened disc of the Milky Way, along with other galactic and extragalactic bright foreground objects that interfere with the observation of the CMB radiation. In an effort to exclude the mask from our computations, we consider the reduced excursion set:

E(ν) \ M.

Treating M as a closed set, this difference is not necessarily closed. An appropriate topological measure is the relative homology of a pair of closed spaces, (E, M), with the second contained in the first. In our setting, the pair is E = E(ν) and M = M ∩ E(ν). Just as in the absolute case, we get relative homology groups in dimensions 0, 1, and 2, and we use their ranks for quantification. It is tempting to refer to these ranks as relative Betti numbers, but this is not the traditional terminology, and we simply write bp = rank Hp(E, M) for p = 0, 1, 2. If M = ∅, then bp = βp for all three choices of p, but if the mask overlaps the excursion set, then there are differences. We explain some of these differences with reference to Fig. 1. A component that meets the mask connects to it and therefore does not contribute to b0, and a loop that crosses the mask survives as an open loop, because the part of its boundary curve outside the mask is open, with endpoints in M. If a hole of M is contained in the excursion set, we get a surface without a boundary. The relation between absolute and relative homology is compactly expressed by the exact sequence of the pair M ⊆ E (Munkres 1984):

0 → H2(M) → H2(E) → H2(E, M) → H1(M) → H1(E) → H1(E, M) → H0(M) → H0(E) → H0(E, M) → 0.

Briefly, this means that we can assign non-negative integers to the arrows such that the rank of each group is the sum of the integers assigned to its incoming and outgoing arrows. For example, in Fig. 1, the ranks read 0 → 0 → 0 → 1 → 1 → 1 → 2 → 4 → 3 → 0 → 0, and it is easy to find an assignment of integers that satisfies the stated property.

Variationally maximal loops
When we count β1 + 1 holes in absolute homology, we in fact count β1 loops needed to separate them. In relative homology, the connection is not as intuitive because we also have open loops, whose endpoints lie in the mask; see Fig. 2. Generally, there are uncountably many ways to draw a loop, and in homology they are all considered equivalent. The set of equivalent loops is called a homology class, and any one of the loops in the class is a representative. These classes are the elements of the one-dimensional homology group, which is a vector space. The rank of this group counts the classes that are needed to span the vector space. For visualization, it is desirable to have a unique representative for each class. Similar to the intuitive notion of the rim of a crater, we choose this representative to be as high as possible, alternating between peaks and saddles of f, which it connects via ridges within the reduced excursion set. We refer to this loop as the variationally maximal representative of its class; see Fig. 2 for an example. While constructing variational maxima for smooth scalar fields may be problematic, the persistence algorithm applied to a piecewise linear scalar field produces them as a byproduct of reducing the boundary matrix; see Sect. 3.2.

Fig. 3. Facets of the rhombic dodecahedron, which serve as patches in the HealPix representation of the sphere. In sequence, we show the 12 patches decomposed into 1, 4, 16, and 64 pixels. The final representation is obtained by central projection of the pixel centres and a distortion yielding an approximately equal-area decomposition of the sphere (not shown).

Data
Decades after the accidental discovery of the CMB, its first space-based observational probe was carried out by the Cosmic Background Explorer (COBE) satellite (Smoot et al. 1992), establishing that the CMB has a near-perfect black-body spectrum. Later, the WMAP satellite was launched to study the temperature fluctuations in greater detail (Spergel et al. 2007). Most recently, the high-precision Planck mission measured the temperature fluctuations to an accuracy of 10⁻⁵ K (Planck Collaboration I 2014), at a resolution of five arc-minutes, giving the most detailed and precise measurement of CMB temperature fluctuations currently available. We use the Planck maps for our analyses (Planck Collaboration IX 2016; Planck Collaboration XII 2016).
Format. The CMB sky maps are presented in the HealPix format (Górski et al. 2005), an equal-area pixelisation of the sphere, which we denote as S²; see Fig. 3. Using the faces of the rhombic dodecahedron, we start by decomposing the sphere into twelve patches. Fine resolution is achieved by dividing these patches into N² equal-area pixels each, meaning that the total number of pixels at this resolution is 12 × N². At maximum resolution, N = 2048, the maps have about 50 million pixels.
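The pixel counts and the corresponding angular scales follow directly from this construction. The sketch below (our own helper names, pure arithmetic, no HealPix library) reproduces the numbers used in this paper, taking the linear pixel scale to be the square root of the pixel solid angle:

```python
import numpy as np

def healpix_npix(nside):
    # 12 base patches, each subdivided into nside^2 equal-area pixels
    return 12 * nside**2

def pixel_scale_deg(nside):
    # each pixel covers a solid angle of 4*pi/npix steradians;
    # a linear scale is the square root of that solid angle
    omega = 4.0 * np.pi / healpix_npix(nside)
    return np.degrees(np.sqrt(omega))

print(healpix_npix(2048))            # about 50 million pixels at N = 2048
print(round(pixel_scale_deg(8), 2))  # coarsest resolution used: 7.33 degrees
```

The scale at N = 8 matches the 7.33° quoted in Sect. 4 for the coarsest maps analysed.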
Observed sky. The Planck satellite observes the sky in nine frequency bands, from which component-separated maps are derived using four different techniques: Commander-Ruler (C-R), NILC, SEVEM, and SMICA; cf. Planck Collaboration IX (2016). These are the publicly available maps from Planck data release 2 (PR2-2015). We use these component-separated CMB maps throughout. These maps are contaminated by noise from various sources, including inherent detector noise, and efforts by the Planck team to denoise the data have not been completely successful. Consequently, our analysis is performed on the maps produced by combining the CMB and noise maps for each realisation of the simulation. This is a fairly simple task, given that the map-making exercise is linear in nature:

f_total = f_CMB + f_noise. (2)

Simulations. In addition to the observed data, the Planck team released a set of Full Focal Plane 8 (or FFP8) simulations (Planck Collaboration XII 2016) of both the CMB and noise. We use 1000 NILC simulations for our computational experiments. These simulations assume that the CMB is a Gaussian random field, consistent with the null hypothesis of Gaussianity for the CMB, which is what we wish to check, and we use them to estimate the error bars for testing the significance of differences between observed and simulated maps. Important to note is that these simulations include the effects of realistic foreground models for gravitational lensing, Rayleigh scattering, and more (Planck Collaboration XII 2016, Sect. 3.3.1).
Degradation. In order to perform a scale-dependent analysis of the CMB maps, we degrade them to resolutions between N = 1024 and N = 8, dividing N by two from one level to the next. The process of degradation amounts to decomposing the maps into spherical harmonics on the full sky at the input resolution. The spherical harmonic coefficients a_lm are then convolved to the new resolution using the formula (Planck Collaboration XXIII 2014):

a_lm^out = ( b_l^out p_l^out / b_l^in p_l^in ) a_lm^in,

where b_l is the beam transfer function, p_l is the pixel window function, and "in" and "out" denote the input and the output resolutions, respectively. These coefficients are then synthesised into maps at the output resolution directly.
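In harmonic space, the degradation is thus a per-multipole rescaling of the a_lm. A minimal sketch of the rescaling factor, assuming Gaussian beam transfer functions and omitting the pixel window functions for brevity (function names are ours):

```python
import numpy as np

def gaussian_beam(lmax, fwhm_rad):
    # b_l = exp(-l(l+1) sigma^2 / 2), with sigma = fwhm / sqrt(8 ln 2)
    ell = np.arange(lmax + 1)
    sigma = fwhm_rad / np.sqrt(8.0 * np.log(2.0))
    return np.exp(-0.5 * ell * (ell + 1) * sigma**2)

def degradation_factor(lmax, fwhm_in, fwhm_out):
    # multiply a_lm^in by this l-dependent factor to obtain a_lm^out
    return gaussian_beam(lmax, fwhm_out) / gaussian_beam(lmax, fwhm_in)
```

Each input coefficient a_lm is multiplied by the factor at its l, which suppresses the high multipoles when degrading to a coarser (wider-beam) resolution; the rescaled coefficients are then synthesised into a map at the output resolution.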
Masks. The observation of the CMB by the Planck satellite is incomplete in some regions of the sky, typically as a result of interference from bright foreground objects such as our own galactic disc and bright point sources. In these regions, the CMB sky map is reconstructed as a constrained Gaussian field. In order to avoid these areas in the analysis, we use the most conservative UT78 mask released by the Planck team; see Fig. 4. This mask is a combination of all foreground objects with the least sky coverage and therefore leads to a conservative analysis. The mask is a binary map, in which reliable pixels of the CMB map are marked by the value 1, and the unreliable parts by 0. For the scale-dependent analysis, we also degrade the masks, so that the map and the mask have the same resolution. Degrading the original binary UT78 mask converts it into a non-binary mask in a thickened zone at the boundary separating the reliable from the unreliable part of the sky. Figure 5 presents these yet-to-be-binarized masks. To re-convert them to binary masks, we set a range of binarization thresholds for our experiments: 0.7, 0.8, 0.9, 0.95. Pixels with values at or above the binarization threshold are marked as 1, and the rest as 0. Figure 6 presents the binarized maps at various degraded resolutions, for binarization threshold 0.9. Table 1 presents the percentage of sky that is usable for analysis after masking at various resolutions for this threshold. The percentage of usable area drops with decreasing resolution, down to 66% at N = 16. Figure 7 presents a visualization of the degraded and masked maps for all the resolutions analysed in this paper in the Mollweide projection view.
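The degradation and re-binarization of the mask can be sketched as follows; averaging fixed-size blocks of child pixels is our stand-in for HealPix nested-scheme degradation, and the helper names are ours:

```python
import numpy as np

def degrade_mask(mask, children_per_pixel=4):
    # in the nested scheme each low-resolution pixel has 4 children at the
    # next-finer level; degrading averages the binary values of the children
    return mask.reshape(-1, children_per_pixel).mean(axis=1)

def binarize_mask(mask, threshold=0.9):
    # pixels at or above the threshold are marked reliable (1), the rest 0
    return (mask >= threshold).astype(np.uint8)

mask = np.array([1, 1, 1, 1,  1, 1, 1, 0,  0, 0, 0, 0], dtype=float)
deg = degrade_mask(mask)            # -> [1.0, 0.75, 0.0]
binarize_mask(deg, threshold=0.9)   # -> [1, 0, 0]: boundary pixel dropped
```

The example shows why a high threshold is conservative: a boundary pixel with 75% reliable children is excluded, shrinking the usable sky at coarse resolutions.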

Computational pipeline
The computational pipeline is tailored specifically to the Planck data. The preprocessing step converts the CMB maps given in absolute units to dimensionless units, corrected for the mean and scaled by the standard deviation (computed using the non-masked pixels only). We use the HealPix package for the preprocessing step. The output of this operation is the normalized temperature values at the 12N² pixels, along with their coordinates on the sphere. This is the input to the subsequent steps, which we discuss in five parts: (i) triangulating the surface of the sphere with the pixel centres as vertices, (ii) sorting the vertices, edges, and triangles to form an upper-star filtration, (iii) computing the persistence in terms of a reduced boundary matrix, (iv) computing the ranks of the relative homology groups, b_p for p = 0, 1, 2, and (v) computing the variationally maximal loops from the reduced matrix. The software is written in C++ and designed to handle very large data sets.
Triangulation. The HealPix format stores the data in twelve square arrays of N 2 pixels each, with N = 2048 at the finest resolution. The centres of these pixels are points on the faces of a rhombic dodecahedron. With central projection, these points are mapped to the 2-sphere and distorted to achieve an approximately equal-area decomposition. Taking the convex hull of these points in R 3 , we get a convex polytope whose boundary is homeomorphic to the sphere. Most of the faces will be triangles, and the occasional faces with k ≥ 4 sides can be subdivided into k − 2 triangles to obtain a triangulation of the sphere. This triangulation is the input to all the downstream computations; consisting of V = 12N 2 vertices, 3V − 6 edges, and 2V − 4 triangles, it represents the temperature field, f : S 2 → R, by storing the temperature value at every vertex. We implicitly assume a piecewise linear interpolation along the edges and the triangles. Figure 8 illustrates such a triangulation, using colours to visualize the temperature field. We use the CGAL library (The CGAL Project 2018) to implement the triangulation.
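The construction can be sketched with a generic convex-hull routine: for points in general position on the sphere, all hull facets are triangles, and the counts satisfy the Euler relation behind the V, 3V − 6, and 2V − 4 figures quoted above. This is a sketch under those assumptions, not the CGAL-based pipeline:

```python
import numpy as np
from scipy.spatial import ConvexHull

def sphere_triangulation(points):
    # points: (V, 3) array of unit vectors; their convex hull is a convex
    # polytope whose boundary triangulates the sphere
    return ConvexHull(points).simplices   # (F, 3) triangle vertex indices

rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
tri = sphere_triangulation(pts)
# for a triangulated sphere: F = 2V - 4 triangles and E = 3V - 6 edges
```

Since every point on a sphere is a vertex of the hull, no data are lost in this step; degenerate (non-triangular) facets would only arise for cospherical points in special position, which the subdivision mentioned above handles.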
Upper-star filtration. Given a triangulation K of S², let K(ν) ⊆ K contain all simplices (vertices, edges, and triangles) whose temperature values are ν or larger. We use K(ν) as a proxy for E(ν), the corresponding excursion set. Indeed, because of the linear interpolation, there is a deformation retraction from E(ν) to K(ν) (Edelsbrunner & Harer 2010), which implies that the two have corresponding components and holes. To process the sequence of excursion sets, it makes sense to sort the vertices of K in order of decreasing temperature value. More precisely, we order the simplices of K such that σ precedes τ if (i) f(σ) > f(τ), or (ii) f(σ) = f(τ) and dim σ < dim τ, where f(σ) is the minimum temperature value of the one, two, or three vertices of σ. The remaining ties are broken arbitrarily. Assuming any two vertices have different temperature values, the edges and triangles that immediately follow a vertex are exactly the ones in the upper star of that vertex. We therefore refer to any ordering that satisfies (i) and (ii) as an upper-star filter of K and f. The corresponding upper-star filtration consists of all prefixes of the filter, each representing an excursion set. This filtration is instrumental in computing the persistence of components and holes.
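The two ordering conditions translate into a simple sort key; a minimal sketch with simplices represented as vertex tuples (our own representation, not the C++ pipeline):

```python
def upper_star_filter(simplices, value):
    """Sort simplices so that sigma precedes tau if f(sigma) > f(tau), or
    f(sigma) = f(tau) and dim(sigma) < dim(tau), where f(sigma) is the
    minimum value over the vertices of sigma."""
    return sorted(simplices, key=lambda s: (-min(value[v] for v in s), len(s)))

value = {0: 3.0, 1: 2.0, 2: 1.0}
simplices = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
filt = upper_star_filter(simplices, value)
# each vertex is immediately followed by the simplices of its upper star,
# e.g. (1,) is followed by the edge (0, 1)
```

With distinct vertex values, every prefix of this order is a subcomplex representing one excursion set, which is exactly what the persistence computation consumes.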
Computing persistence. Given an upper-star filter of the piecewise linear temperature field, optimised software is available to compute its persistence (Bauer et al. 2014), and we base our persistence computation on an adaptation of this software. It is a sophisticated implementation of the basic algorithm, which we now describe and modify to obtain the variationally maximal loops. We write σ1, σ2, . . . , σn for the simplices in the triangulation of the sphere, sorted into an upper-star filter. Let ∂[1 . . n, 1 . . n] be the corresponding ordered boundary matrix, with ∂[i, j] = 1 if σi is a face of σj and dim σi = dim σj − 1, and ∂[i, j] = 0 otherwise. This matrix is sparse and stored as such. The standard persistence algorithm reduces the matrix from left to right. To reduce column j, we subtract columns to the left of j with the goal of moving the lowest 1 in column j higher, or eliminating it altogether. We use modulo-2 arithmetic, in which subtracting is the same as adding: 1 − 1 = 1 + 1 = 0. We call column j reduced if it is zero, or if its lowest 1 has only zeros to its left in the same row. We modify the standard algorithm by continuing the reduction even when the lowest 1 can no longer be changed, referring to the final result as fully reduced. To be unambiguous, we give this algorithm in pseudo-code, where pivot(j) denotes the row index of the lowest 1 in column j.
for j = 1 to n do
    while ∃ k < j with ∂[pivot(k), j] = 1 do
        add column k to column j
    endwhile
endfor.

The running time of this algorithm is cubic in the number of simplices in the worst case, but the available optimised software is typically much faster.

Ranks of relative homology groups. For computing the ranks of the homology groups relative to the mask, we set the vertices belonging to the mask to +∞, and consider the complex M induced by the union of these vertices. This mask is closed by definition. We then compute the filtration and persistence diagram corresponding to the absolute homology of E ∪ M. Writing Dgm_p(E ∪ M) for the p-dimensional persistence diagram, and recalling that each diagram consists of intervals with real birth and death values, b > d, we obtain the ranks of the homology groups relative to the mask by counting the intervals that contain the level in question:

b_p(ν) = #{ (b, d) ∈ Dgm_p(E ∪ M) : b ≥ ν > d }.

For computing absolute homology, we set the mask pixels to −∞ and consider the union of such vertices as the mask, which is open by definition.
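The column reduction described in this section can be sketched over Z/2 with sparse columns stored as sets of row indices. This is a toy version of the algorithm, not the optimised software, and the names are ours:

```python
def fully_reduce(columns):
    """Mod-2 column reduction, continued past the pivot ('fully reduced').
    columns[j] is the set of row indices holding a 1 in column j;
    the pivot of a column is its largest index (the lowest 1)."""
    cols = [set(c) for c in columns]
    for j in range(len(cols)):
        changed = True
        while changed:
            changed = False
            for k in range(j):
                if cols[k] and max(cols[k]) in cols[j]:
                    cols[j] ^= cols[k]   # mod-2 addition of column k
                    changed = True
    return cols

# boundary matrix of a single filled triangle: vertices 0-2, edges 3-5, triangle 6
boundary = [set(), set(), set(), {0, 1}, {0, 2}, {1, 2}, {3, 4, 5}]
reduced = fully_reduce(boundary)
# the third edge column reduces to empty (the edges form a 1-cycle),
# and the triangle column keeps pivot 5, killing that cycle
```

The pairs (pivot row, column) of the non-zero reduced columns are exactly the birth-death pairs read off as persistence intervals.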

Statistical tests
The data consist of topological summaries (b 0 , b 1 , EC rel ) obtained from 1000 simulations, as well as of the observed CMB field, processed according to the NILC scheme. The goal is to estimate the probability that the physical model that produced the simulations produces quantities consistent with those from the observed CMB field. Let x i ∈ R m , i = 1, . . . , n, be a sample of i.i.d. m-dimensional vectors drawn from a distribution F. Let y ∈ R m be another sample point, assumed to be drawn from a distribution G. We wish to test the (null) hypothesis that F = G, and give the test results in terms of p-values, which compute the probability that y is "consistent" with this hypothesis. We consider two methods of testing for statistical consistency. The first is a parametric test based on the Mahalanobis distance (Mahalanobis 1936), also known as the χ 2 -test. The second is a non-parametric test based on the Tukey depth (Tukey 1975).
The χ 2 -test is more standard but has the disadvantage of assuming that the compared quantities follow a Gaussian distribution, while the Tukey depth works without any assumption on the distribution.
Mahalanobis distance. Let x̄ and S denote the sample mean and covariance matrix of the sample x_1, . . . , x_n, respectively. The squared Mahalanobis distance of y to x̄ is then

d²(y) = (y − x̄)ᵀ S⁻¹ (y − x̄). (5)

If F is assumed to be Gaussian and n is large, then under the hypothesis that G = F, the squared Mahalanobis distance (5) approximately follows a χ² distribution with m degrees of freedom. The corresponding p-value is thus

p = P( χ²_m ≥ d²(y) ).

Tukey depth. As shown in the data analysis below, the distribution F does not always conform to elliptical contours and therefore may not be Gaussian. In such a setting, p-values computed using the Mahalanobis distance may not be reliable.
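The parametric test takes only a few lines with standard tools; a sketch under the assumptions stated above (the function name is ours):

```python
import numpy as np
from scipy import stats

def chi2_pvalue(X, y):
    """X: (n, m) array of simulated summary vectors; y: (m,) observed vector.
    Returns P(chi^2_m >= d^2) for the squared Mahalanobis distance of y
    to the sample mean."""
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    diff = y - xbar
    d2 = diff @ np.linalg.solve(S, diff)   # (y - xbar)^T S^{-1} (y - xbar)
    return stats.chi2.sf(d2, df=X.shape[1])

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
chi2_pvalue(X, X.mean(axis=0))   # -> 1.0: the sample mean is maximally typical
```

Solving the linear system instead of inverting S is the numerically preferred way to evaluate the quadratic form.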
The Tukey half-space depth provides a general, non-parametric way of identifying outliers. Take x_i, i = 1, . . . , n, and y as above, making no assumptions on the structure of F and G, and let z be any point in R^m. The half-space depth d_dep(z; x_1, . . . , x_n) of z within the sample of the x_i is then the smallest fraction of the n points x_1, . . . , x_n that lie on one side of a hyperplane passing through z, minimized over all such hyperplanes. By definition, the half-space depth is a number between 0 and 0.5. Points that have the same depth constitute a non-parametric estimate of an isolevel contour of the distribution F.
To evaluate a p-value for y, we first compute d_j = d_dep(x_j; x_1, . . . , x_n) for every point x_j, j = 1, . . . , n, yielding an empirical distribution of depths. The p-value is then computed as the proportion of points whose depth is at most that of y:

p = (1/n) #{ j : d_j ≤ d_dep(y; x_1, . . . , x_n) }.

We note that, by construction, the depth p-value increases in units of 1/n. For computing half-space depths below, we use the R package depth.
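A sketch of the procedure, approximating the half-space depth by minimizing over a finite set of random directions instead of all hyperplanes (our own approximation; the paper itself uses the R package depth):

```python
import numpy as np

def halfspace_depth(z, X, n_dirs=500, seed=0):
    # minimum, over sampled directions u, of the fraction of points on
    # either closed side of the hyperplane through z with normal u
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    proj = (X - z) @ U.T                        # (n, n_dirs)
    side = np.minimum((proj >= 0).mean(axis=0), (proj <= 0).mean(axis=0))
    return float(side.min())

def depth_pvalue(X, y, **kw):
    # fraction of sample points whose depth is at most the depth of y
    d = np.array([halfspace_depth(X[j], X, **kw) for j in range(len(X))])
    return float((d <= halfspace_depth(y, X, **kw)).mean())
```

For a far outlier the approximate depth is 0 (some sampled direction strictly separates it from the sample), while points near the centre of a unimodal sample have depth close to 0.5; the direction sampling only over-estimates depth, so outliers are never missed, merely ranked approximately.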

Results
We use the Planck maps for our analyses, which measure fluctuations about the mean in the CMB temperature to an accuracy of 10⁻⁵ K (Planck Collaboration I 2014). Our primary resources for the comparison between the observations and the simulations are the component-separated observed maps obtained using the NILC, C-R, SEVEM, and SMICA techniques, as well as 1000 FFP8 simulations based on the NILC technique (Planck Collaboration XII 2016). The simulations are based on the ΛCDM paradigm and assume that the temperature fluctuations have a Gaussian distribution. We perform our analyses for a range of scales between 0.05° and 7.33°, which correspond to resolutions between N = 1024 and N = 8 in the HealPix format (Górski et al. 2005). Further degradation of the maps destabilises the statistics due to the low number of data points in these cases. We do this for a range of mask binarization thresholds: 0.7, 0.8, 0.9, and 0.95; see Sect. 3 for the details of degradation and masking.
In addition, we also compare the observed maps cleaned using the NILC, C-R, SEVEM, and SMICA techniques with 100 simulations each based on the SEVEM and SMICA techniques. The graphs and the p-value tables for these two cases are presented in the appendix. The motivation for this comparison based on a smaller number of simulations is primarily to ascertain if the trends observed are generally consistent irrespective of the cleaning methods. We confirm that this is indeed the case.
We present our analyses in terms of the ranks of the relative homology groups, b_p for p = 0, 1. The relative components and loops are quantified by the relative component function, b0: R → R, and the relative loop function, b1: R → R. We present the graphs of b0, b1, and of the (relative) Euler characteristic, EC_rel, followed by statistical tests that estimate the significance of the results. If f(x): S² → R is the absolute temperature at a location x, and f0 is the mean temperature of the distribution, the dimensionless temperature is given by

ν(x) = ( f(x) − f0 ) / σ( f ),

where σ(f) is the standard deviation computed from the non-masked pixels. We then obtain the ranks of the relative homology groups as functions of the normalized temperature ν.
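The normalization uses only the non-masked pixels; a minimal numpy sketch (names ours):

```python
import numpy as np

def normalize_map(f, mask):
    """f: temperature per pixel; mask: 1 for reliable pixels, 0 for masked.
    Returns the dimensionless temperature (f - f0) / sigma(f), with the mean
    f0 and standard deviation sigma(f) computed from non-masked pixels only."""
    good = f[mask.astype(bool)]
    return (f - good.mean()) / good.std()

f = np.array([2.0, 4.0, 6.0, 100.0])
mask = np.array([1, 1, 1, 0])
nu = normalize_map(f, mask)
# the masked pixel is excluded from the statistics: f0 = 4, sigma = sqrt(8/3)
```

Excluding the masked pixels matters: the reconstructed (constrained) values under the mask would otherwise bias both the mean and the variance of the field.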

Ranks of relative homology groups
To carry out omnibus tests, we choose 13 a priori levels, $\nu_{-6}, \dots, \nu_{6}$, where $\nu_k = k/2$, meaning that the normalized temperature thresholds run from $-3$ to $+3$ in steps of $0.5$, and consider the collections of random variables $b_0(\nu_k)$, $b_1(\nu_k)$, and $\mathrm{EC}_{\mathrm{rel}}(\nu_k)$, for $-6 \le k \le 6$.
The top two rows of Figs. 9-11 present the curves of $b_0$, $b_1$, and $\mathrm{EC}_{\mathrm{rel}}$, respectively, for resolutions between N = 1024 and N = 8, for mask threshold 0.9. The graphs present the average curve (black) from the 1000 NILC simulations, with error bands drawn up to $3\sigma$. The individual curves from the simulations are drawn as dotted black lines, a few of which escape the $3\sigma$ band. Also plotted are the curves from the NILC, C-R, SEVEM, and SMICA observed maps. The bottom two rows present the difference between the observations and the simulations in terms of the number of standard deviations, for the various temperature thresholds. $b_0(\nu)$, $b_1(\nu)$, and $\mathrm{EC}_{\mathrm{rel}}(\nu)$ show a difference from the simulations peaking near $2\sigma$ for some temperature levels at all resolutions. Additionally, $b_0$ and $b_1$ show differences peaking between $3\sigma$ and $4\sigma$ sporadically between N = 32 and N = 8. Noteworthy is the $4.5\sigma$ deviation of $b_1$ between the observations and simulations at N = 8, at the normalized temperature threshold $\nu = -3$, in Fig. 10. For the same case, the numbers based on the SEVEM and the SMICA simulations are approximately $5.5\sigma$. However, the low temperature threshold and resolution at which this deviation occurs entail a small number of topological objects on which the statistics are based. As a result, these numbers should perhaps be regarded with a degree of scepticism.
At N = 1024, the SEVEM and SMICA observed maps deviate very significantly, in the range $4{-}6\sigma$, from the NILC simulations at the mean temperature threshold $\nu = 0$. This may perhaps be attributed to differences in the cleaning pipelines. However, the fact that the curves are broadly consistent with each other at lower resolutions points to the robustness of the underlying measurement, as well as to the mutual consistency of the cleaning methods. A similarly consistent trend is observed in the graphs based on the SEVEM and SMICA simulations in the appendix. Overall, the comparisons based on the NILC simulations (in the main paper) and on the SEVEM and SMICA simulations (in the appendix) show consistent trends.

Experimental evidence of Euler characteristic suppression
As noted above, the Euler-Poincaré formula states that the Euler characteristic is the alternating sum of the Betti numbers. As a consequence, signals in the Euler characteristic are suppressed by design, due to cancellations between the constituent Betti numbers. Our experiments provide evidence for such suppression of the topological signals in the Euler characteristic.
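Concretely, for excursion sets on the (masked) two-sphere only the ranks in degrees 0, 1, and 2 can be non-zero, so the Euler-Poincaré formula reads:

```latex
\mathrm{EC}_{\mathrm{rel}}(\nu)
  \;=\; \sum_{p=0}^{2} (-1)^{p}\, b_{p}(\nu)
  \;=\; b_{0}(\nu) - b_{1}(\nu) + b_{2}(\nu),
```

so a simultaneous excess of components and loops largely cancels in $\mathrm{EC}_{\mathrm{rel}}$, even when each Betti number individually deviates strongly from the simulations.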
As an example, consider the quantities at the degraded resolution N = 16 and temperature threshold value $\nu = 0.5$. At this resolution and threshold, there is a significant difference in $b_0$ between the observations and simulations, at $3.7\sigma$ (Fig. 9), but the corresponding value for the Euler characteristic is only $2.4\sigma$ (Fig. 11). This is because of the cancellation between $b_0$ and $b_1$ in determining the Euler characteristic. More instances of such cancellation effects can be seen in the graphs, particularly at N = 1024, where even though the $b_0$ and $b_1$ graphs from the SEVEM and SMICA observed maps deviate by $4{-}6\sigma$ from the NILC simulations at $\nu = 0$, the graph of $\mathrm{EC}_{\mathrm{rel}}$ shows no significant deviation at this threshold.
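A toy numerical illustration of this cancellation (all counts hypothetical, chosen only to mimic the qualitative pattern, not taken from the data):

```python
# Hypothetical Betti numbers at a single threshold (illustrative only).
b0_obs, b1_obs = 120, 45   # "observed" components and loops
b0_sim, b1_sim = 100, 30   # simulation means

# Both Betti numbers are in excess: +20 components and +15 loops.
ec_obs = b0_obs - b1_obs
ec_sim = b0_sim - b1_sim

# But the Euler characteristic differs by only +5:
# the correlated excesses largely cancel in the alternating sum.
print(ec_obs - ec_sim)     # prints 5
```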

Statistical significance of the results
We consider the two methods detailed in Sect. 3.3, and present p-values of the observed maps for both. We consider the variables $b_0(\nu_k)$ for $k = 0, \dots, 6$, $b_1(\nu_k)$ for $k = -6, \dots, 0$, and $\mathrm{EC}_{\mathrm{rel}}(\nu_k)$ for $k = -6, \dots, 6$ in estimating the statistical significance of the results. The choice of regions is determined by the fact that $b_0(\nu)$ tends to be small, and carries little information, for $\nu < 0$; that $b_1(\nu)$ tends to be small for $\nu > 0$; and that the Euler characteristic is informative over the full range of levels. We perform summary and specific tests for the mask binarization threshold values 0.7, 0.8, 0.9, and 0.95. Appendix A presents the table of p-values and the principal component graphs based on the 1000 NILC simulations.
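The parametric test can be sketched as follows: the observed vector of topological descriptors is compared against the simulation ensemble via its Mahalanobis distance, whose square, under the Gaussian hypothesis, follows a $\chi^2$ distribution with $m$ degrees of freedom. A minimal sketch (ignoring finite-ensemble corrections to the covariance estimate; the function name is ours):

```python
import numpy as np
from scipy import stats

def mahalanobis_chi2_pvalue(obs, sims):
    """p-value of an observed descriptor vector `obs` (length m) against an
    ensemble `sims` (n_sims x m), assuming the squared Mahalanobis distance
    is chi^2-distributed with m degrees of freedom."""
    mu = sims.mean(axis=0)
    cov = np.cov(sims, rowvar=False)
    diff = obs - mu
    d2 = diff @ np.linalg.solve(cov, diff)   # squared Mahalanobis distance
    return stats.chi2.sf(d2, df=len(obs))
```

The same machinery applies to the summary tests below, where the per-resolution vectors are concatenated into a single longer vector before computing the distance.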

Global or summary tests
As a global test for the evidence of a non-random discrepancy at any of the degraded resolutions analysed, we take the full set of normalised differences, for the eight degraded resolutions and the relevant thresholds, as a single vector for each of the three topological quantities separately. Thus, in terms of the notation of Sect. 3.3, for each of $b_0$ and $b_1$ the seven thresholds result in m = 56 (8 degraded resolutions with 7 temperature thresholds each); for the Euler characteristic, the 13 thresholds result in m = 104 (8 degraded resolutions with 13 temperature thresholds each). The last entry for each mask threshold in Table A.1 presents the summary $\chi^2$ and depth p-values. Overall, there is a strong indication that the observations differ from the simulations. While at any given degraded resolution the p-value is not always small, the fact that they all point in a consistent direction is captured by the summary statistics, which show a very high level of statistical significance.

Fig. 9. $b_0$ graphs for resolutions between N = 1024 and N = 8. Top two rows: observed curves obtained using the NILC, C-R, SEVEM, and SMICA methods, and the expected (black) curve computed from the 1000 NILC simulations, along with bands drawn up to $3\sigma$. Also plotted underneath are the curves from the individual simulations. Bottom two rows: the difference between the observations and simulations in terms of the number of standard deviations, for the various temperature thresholds. The maximum noted deviation is $3.7\sigma$, at N = 16. The threshold along the horizontal axis runs from positive to negative, because we analyse superlevel sets of the normalized temperature field.

Tests for specific degraded resolutions
This is followed by specific tests for each resolution. The remaining entries in Table A.1 present the Mahalanobis and the depth p-values for each resolution, for the different mask thresholds. The Mahalanobis p-values are particularly small for $b_1$ at N = 16, and very significant at N = 8, across all binarization thresholds. Although they are not stable across binarization thresholds, $b_0$ and $\mathrm{EC}_{\mathrm{rel}}$ also show significance. The depth p-values are very significant for $b_1$ at N = 16 and N = 8, while $b_0$ shows high significance at N = 32 consistently across the range of binarization thresholds. The depth p-values also show high significance at higher resolutions, more often for $b_1$ than for $b_0$, but not at all for $\mathrm{EC}_{\mathrm{rel}}$, presumably because of cancellation effects. When considering Tukey depth, $b_1$ shows significance more often than $b_0$ and $\mathrm{EC}_{\mathrm{rel}}$, and it is an order of magnitude more significant than $b_0$ and $\mathrm{EC}_{\mathrm{rel}}$ when considering the Mahalanobis distance. Regardless of the choice of test or cleaning pipeline, it is evident that the model and the observations disagree significantly, at least in the number of loops, on a range of scales between approximately 1° and 7°. For the Mahalanobis values, the general trend of significance increases up to N = 8, providing additional evidence that the deviations are not purely due to chance. For both tests, the p-values of the summary tests tend to be more significant than those of the individual resolutions. Another general trend is that the non-parametric test shows the difference between the observed and the simulated maps to be starker than the parametric test does.

Fig. 10. $b_1$ graphs for resolutions between N = 1024 and N = 8. Top two rows: observed curves obtained using the NILC, C-R, SEVEM, and SMICA methods, and the expected (black) curve computed from the 1000 NILC simulations, along with bands drawn up to $3\sigma$. Also plotted underneath are the 1000 curves from the individual simulations. Bottom two rows: the difference between the observations and simulations in terms of the number of standard deviations, for the various temperature thresholds. The maximum noted deviation is $4.5\sigma$, at N = 8. It is difficult to judge the validity of this number, as the low temperature threshold ($\nu = -3$) entails a low total number of objects on which the statistics are based. The next peak in the curve is located at a moderate threshold ($\nu = -0.5$), and indicates a deviation of $2.9\sigma$. The threshold along the horizontal axis runs from positive to negative, because we analyse superlevel sets of the normalized temperature field.

To help interpret this difference, we present plots that visualise to what extent the assumption of a Gaussian distribution for the compared quantities is justified. Observed CMB points are drawn in red. Examining the diagrams corresponding to the Mahalanobis distance, the hypothesis that the distribution conforms to elliptical contours is questionable. Figures A.2-A.4 present the projection onto the first two principal components for $b_0$, $b_1$, and $\mathrm{EC}_{\mathrm{rel}}$, respectively, for specific resolutions. Also drawn are the Mahalanobis (top two rows) and the Tukey depth (bottom two rows) contours. In general, the symmetric Mahalanobis contours do not always fit the data. However, as the resolution decreases, the Mahalanobis contours, which are Gaussian in nature, seem to fit the data well, and may be a reasonable approximation after all. The corresponding graphs based on the SEVEM and SMICA simulations are presented in Figs. B.7-B.10. It is evident that 100 simulations may not be enough to reliably resolve the p-values.
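The non-parametric test can be sketched via a random-projection approximation of Tukey (halfspace) depth, since exact depth computation is expensive in more than a few dimensions. This is a sketch of the general technique, not necessarily the exact pipeline used for Table A.1; function names are ours:

```python
import numpy as np

def tukey_depth_approx(point, cloud, n_dirs=512, seed=0):
    """Approximate halfspace (Tukey) depth of `point` with respect to
    `cloud` (n_points x dim) by minimising, over random projection
    directions, the smaller of the two halfspace fractions."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_dirs, cloud.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj_cloud = cloud @ dirs.T          # (n_points, n_dirs)
    proj_pt = point @ dirs.T             # (n_dirs,)
    below = (proj_cloud <= proj_pt).mean(axis=0)
    above = (proj_cloud >= proj_pt).mean(axis=0)
    return np.minimum(below, above).min()

def depth_pvalue(obs, sims):
    """Non-parametric p-value: fraction of simulations that are at least as
    outlying (i.e. have as low a depth) as the observation."""
    d_obs = tukey_depth_approx(obs, sims)
    d_sims = np.array([tukey_depth_approx(s, sims) for s in sims])
    return (d_sims <= d_obs).mean()
```

Unlike the Mahalanobis test, this construction makes no assumption of elliptical (Gaussian) contours, which is precisely why the two tests can disagree when the simulated distribution departs from Gaussianity.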

Summary and conclusions
We provide evidence for the deviation of the observed Planck CMB maps from the Gaussian predictions of the standard ΛCDM model. Specifically, we find an over-abundance of loops in the observed maps, deviating from the simulations at per-cent to less than per-mil levels, in terms of p-values computed using $\chi^2$ statistics, between the resolutions N = 32 and N = 8. The difference in the number of components and loops peaks sporadically at more than $3\sigma$ from the predictions between N = 32 and N = 8. Results based on smoothed maps corroborate those based on degraded maps in terms of the approximate scales at which the anomaly is observed. We also compute the absolute homology for the dataset, and confirm that the results are consistent with those from relative homology.

Fig. 11. Euler characteristic graphs for resolutions between N = 1024 and N = 8. Top two rows: observed curves obtained using the NILC, C-R, SEVEM, and SMICA methods, and the expected (black) curve computed from the 1000 NILC simulations, along with bands drawn up to $3\sigma$. Also plotted underneath are the 1000 curves from the individual simulations. Bottom two rows: the difference between the observations and simulations in terms of the number of standard deviations, for the various temperature thresholds. The maximum noted deviation is $2.9\sigma$, at N = 32. The threshold along the horizontal axis runs from positive to negative, because we analyse superlevel sets of the normalized temperature field.

External evidence that these deviations are not a result of over-analysing the data comes from the fact that the variance of the observed CMB is anomalous with respect to the standard model at N = 16 (Planck Collaboration XXIII 2014), and that the computed power spectrum exhibits a dip at roughly this range of scales. In addition, there are reports of a mildly significant Euler characteristic at 3.66° (N = 16) (Eriksen et al. 2004b), computed from independent measurements of the CMB by Planck's predecessor, the Wilkinson Microwave Anisotropy Probe (WMAP) satellite. This can be explained by the significantly high number of loops and components, together with the cancellation effects that the Euler characteristic suffers from. Similar observations by independent satellites suggest that the anomaly is unlikely to have its origin in instrumental noise or systematic effects. Moreover, the medium super-horizon scales at which we observe it could possibly point to a cosmological origin. The non-parametric Tukey depth test shows the observations to be different from the simulations at almost all resolutions. Regardless of the preferred test, the topological structure of the CMB appears to deviate from the simulations, at least on some scales. This trend is robust and consistent irrespective of the choice of the cleaning method, ruling out the possibility that the deviations we observe are merely an artifact of the cleaning method.
We can rule out this anomaly being the effect of the cold spot in the CMB sky, or of any previously detected directional anomalies. Our statistics are based on a large number of loops surrounding the low-density regions, to which the loop generated by the cold spot may contribute at most a few, and often only one. Moreover, to support this claim, we visually confirm that these loops are scattered all over the sky (see Fig. 12). We also test and confirm that simulations that are based on Gaussian prescriptions and match the characteristics of the observed "dipped" power spectrum cannot resolve this anomaly. Additionally, we present topological methods that are suitable in the presence of obfuscating masks. As such, the results presented in this paper are robust despite the lack of full-sky coverage, and are model-independent.

Fig. 12. Visualization of the loops for the largest excursion set, which consists of the entire sphere minus the mask. To improve the visualization, the temperature field has been smoothed by a small amount, and very short loops are not drawn. From left to right: the sphere viewed from the top, the bottom, the left, and the right.
In conclusion, we reiterate that we present clear evidence of a departure of the observed CMB maps from the simulations based on the ΛCDM paradigm, but make no attempt to address the physical mechanism behind this phenomenon, a question we leave to the wider cosmological community. Nevertheless, our analysis demonstrates the existence of unexpected topology in the CMB. Possible, but non-exhaustive, scenarios worth exploring are primordial non-Gaussianity, as well as models with non-trivial topology, including topological defect models.

Appendix B: Comparison with SEVEM and SMICA simulations
This appendix presents the results from the comparison of the observed maps with a hundred simulations from each of the SEVEM and SMICA methods. In summary, the results are consistent with those based on the NILC simulations presented in the main paper and Appendix A.

Notes. The tables display relative homology results obtained from the parametric (Mahalanobis distance) and non-parametric (Tukey depth) tests, for the four mask binarization thresholds. The last entry for each threshold is the p-value for the summary statistic computed across all resolutions. Marked in boldface are p-values of 0.05 or smaller.