T-ReX: a graph-based filament detection method

Numerical simulations and observations show that galaxies are not uniformly distributed in the universe but rather spread on a filamentary structure. In this large scale pattern, highly dense regions are linked together by bridges and walls, all of them surrounded by vast nearly-empty areas. While nodes of the network are widely studied in the literature, simulations indicate that half of the mass budget comes from a more diffuse part of the network made of filaments. In the context of recent and upcoming large galaxy surveys, it becomes essential to identify and classify features of the Cosmic Web in an automatic way to study their physical properties and the impact of the cosmic environment on galaxies and their evolution. In this work, we propose a new approach to automatically retrieve the underlying filamentary structure from a 2D or 3D galaxy distribution using graph theory and the assumption that paths linking galaxies together with the minimum total length highlight the underlying distribution. To obtain a smoothed version of this topological prior, we embed it in a Gaussian mixtures framework. In addition to a geometrical description of the pattern, a bootstrap-like estimate of these regularized minimum spanning trees allows to obtain a map characterising the frequency at which an area of the domain is crossed. Using distribution of halos derived from numerical simulations, we show that the proposed method is able to recover the filamentary pattern in a 2D or 3D distribution of points with noise and outliers robustness with few and comprehensible parameters.


Introduction
Large galaxy surveys like the Sloan Digital Sky Survey (SDSS, York et al. 2000) confirmed the pattern drawn by the matter at very large scales, initially depicted by analytical works and observed in the first N-body simulations (e.g. Zel 'dovich 1970;Doroshkevich & Shandarin 1978). In this pattern commonly called the Cosmic Web (Bond et al. 1996), filaments act like cosmic highways, linking together large overdensities of matter and playing a key role in the dynamics of the universe. Since then, the community considerably enhanced the quality and the resolution of simulations with, to name a few, Millenium (Springel et al. 2005), Illustris (Vogelsberger et al. 2014) and Horizon-AGN (Dubois et al. 2014). These high resolution simulations of the dark matter (DM) evolution, sometimes even including baryonic matter, hence led to a more accurate spatial distribution of matter and allowed to quantitatively characterise the different cosmic structures in terms of morphology, density, composition, etc. (see e.g. Colberg 2007;Aragon-Calvo et al. 2010;Cautun et al. 2014;Gheller et al. 2016;Gheller & Vazza 2019). Revealing the faint filamentary pattern of the Cosmic Web in data often relies on the use of galaxies as tracers of the dark matter distribution and allows to study the influence of cosmic environment on the formation and evolution of those tracers (e.g. Alpaslan et al. 2014a,b;Martinez et al. 2016;Kuutma et al. 2017;Malavasi et al. 2017;Laigle et al. 2018;Codis et al. 2018;Kraljic et al. E-mail: tony.bonnaire@ias.u-psud.fr 2019; Sarron et al. 2019;Malavasi et al. 2019). It usually involves either stacking or individual inspection of objects after their detection. The observation of the filamentary pattern is currently performed using different observables: X-ray emissions (see e.g. Dietrich et al. 2012;Eckert et al. 2015;Nicastro et al. 2018), weak lensing (e.g. Gouin et al. 2017;Epps & Hudson 2017) or through Sunyaev-Zel'dovich effect (see e.g. Bonjean et al. 2018;Tanimura et al. 2019b;de Graaf et al. 2019;Tanimura et al. 2019a).
To perform such statistical and physical analyses, it is essential to detect the filamentary pattern in an automatic way and this task is even more challenging when dealing with real observations. By visual inspection, one can easily identify the underlying structure, especially in mock datasets, might it be the filament-like or clustered parts of the pattern. Over the years, the key question quickly became "How can we extract automatically what is visually observed?". In 1985, Barrow et al. used for the first time a minimal spanning tree (MST, Borůvka 1926) approach in a cosmological context to exhibit the underlying filamentary pattern from a 2D or 3D galaxy distribution, arguing that usual statistical procedures as two-points correlation function are not sensitive to this specific feature. Since then, several methods have been developed to analyze and describe this gigantic network and still, filaments do not have a unique well-posed definition. In an intuitive way, filaments correspond to bridges of matter between two dense regions of the space. With this simple idea, many algorithms with their own mathematical definitions have emerged. With no Article number, page 1 of 15 arXiv:1912.00732v1 [astro-ph.CO] 2 Dec 2019 A&A proofs: manuscript no. aanda aim to be exhaustive (see Libeskind et al. 2017, for a detailed review about existing methods to classify cosmic web elements): -Methods using the minimum spanning tree, an object coming from graph theory. The resulting tree highlights a preferable path minimizing the total distance to link galaxies together (Barrow et al. 1985;Alpaslan et al. 2014a). After several processing of the graph proper to each method, filaments are extracted as branches of the tree. -The study of the topological properties of the continuous density field through the Discrete Morse Theory led Aragón-Calvo et al. (2010) and Sousbie (2011) to define filaments as the set of gradient lines linking maxima and saddle points, -The seminal work of Aragon-Calvo et al. (2007) allowed Cautun et al. (2013) to build Nexus, an algorithm that performs a scale-space representation of the field in which filaments are defined locally through the relative strength between eigenvalues of the Hessian matrix of a smoothed continuous density obtained from the Delaunay Tessellation Field Estimator (Schaap & Weygaert 2000), -Another class of approaches uses a statistical representation of stochastic point processes to model the geometry of the filamentary structure. In particular, Stoica et al. (2007) are modeling filaments as connected and aligned cylinders through the marked point-processes theory, - Genovese et al. (2014) and Chen et al. (2015) proposed to identify cosmic filaments as ridges in the distribution of galaxies using an automatic algorithm moving iteratively a set of points along the projected gradient, -Some indirect methods try to first recover the initial density field and then make it evolve forward in time using the Lagrangian perturbation theory. Kitaura (2013) and Jasche & Wandelt (2013) indeed respectively paved the way for Bos et al. (2014) and Leclercq et al. (2016) to develop such tools. Note that these methods are indirect reconstructions and are not directly related to our issue of detecting cosmic web elements even though Leclercq et al. (2016) used the inferred final density field in a game theory framework to classify structures in the reconstructed density field.
This large variety of approaches, all aiming at identifying filaments in a spatial distribution of matter tracers, reveals how this problem can be hard to handle and of great importance for observational cosmology. Also, some of the above methods are designed on simulations and using dark matter particles to detect those features but if we want algorithms to be able to handle real datasets, we need it to work specifically with galaxies (or halos in simulation) as inputs. With this in mind, we developed an algorithm using a set of 2D or 3D galaxy positions to build a smooth representation given by a graph structure and standing in the ridges of the distribution. The presented method does not rely on any density estimation but directly on the set of observed datapoints. It does not assume any shape for filaments but rather a global weak prior on the Cosmic Web connectivity and can be easily extended to any topological prior as long as given by a graph structure. Furthermore, it can be used as a denoised representation of the Cosmic Web for other applications than filament detection. In a first section, we present the datasets we use all along this article to illustrate steps and results of the algorithm, called T-ReX, with some of the previously mentioned methods. Section 3 provides the required mathematical formalism used to build the procedure. Section 4 develops the method step by step and illustrate the obtained results on a simple dataset while Section 5 discuss the effect of each parameter on the resulting estimate of the underlying structure. Finally, Section 6 shows and discuss outputs obtained on cosmological datasets and compare it with other existing methods, namely Bisous, DisPerSE and Nexus.

Data
In order to develop and test the main steps of the algorithm, we use a simple and non-cosmological dataset, hereafter called toy dataset, shown in Fig. 1. It is constructed so that it mimics a regularly curved structure, the filament, linking two clusters of points standing for overdense regions. The use of this toy dataset enables us to explore the impact of the parameters and test the reliability of the algorithm.
As a realistic cosmological dataset representing the Cosmic Web, we use the Illustris simulation outputs 1 (Vogelsberger et al. 2014). It is a set of large-scale hydrodynamical simulations with different resolutions in which an initial set of particles (dark matter or baryonic gas) distributed over a 75Mpc/h box is evolved forward in time from high redshift to z = 0. From the resulting distribution at z = 0, halos of dark matter are identified using a Friend-of-Friend algorithm (More et al. 2011). To assess the algorithm on cosmological cases and mimic its use for a galaxy survey, we consider structures inside halos, called subhalos, identified with the Subfind algorithm (Springel et al. 2008) and provided by the Illustris package, as already done in previous recent studies (Coutinho et al. 2016). For convenience, we sometimes refer to these subhalos as "galaxies". tion obtained from the Illustris-3 simulation in which subhalos have been extracted. We can see how these "galaxies" trace the underlying web, more precisely drawn by the dark matter particles. In the following, each time we use a dataset built from the Illustris simulation, it always concerns the box at redshift z = 0 and the Illustris-3 resolution obtained from 455 3 dark matter particles with a mass resolution of 4.0 × 10 8 M . When needed, we will explicitly specify the settings with which the subset of particles is obtained. Namely, we will specify the type of particles we are showing (subhalos or DM particles), the cut in the spatial distribution (over x e , y e or z e spatial axes) and the cut in total mass M over the considered particles in the spatial range.
Finally, to compare our results with other methods, we also apply T-ReX on FoF halos extracted from a 200Mpc/h box of a Gadget-2 N-body simulation with 512 3 particles (Springel et al. 2005). This particular simulation 2 is the one used in Libeskind et al. (2017) who proposed a unified comparison of the main existing procedures to classify elements of the cosmic web using either dark matter particles or dark matter halos as input.

General formalism
Relying on the simple and only assumption that observed points (i.e. galaxies) are tracing the underlying Cosmic Web, the main idea of T-ReX is to model the filamentary structure as the set of ridges (or principal curves) in the input point cloud. To extract these ridges, we use the minimum spanning tree and extend its previous application in cosmology by building a smooth version of it standing 'in the middle' of the cloud. The basic idea behind this approach is that the true filamentary structure is a continuous 2 http://data.aip.de/tracingthecosmicweb/doi: 10.17876/data/2017_1 manifold that can be described with a graph structure while the observed galaxies represent a sparse and noisy sampling of that manifold. More precisely, in this paper, we aim at finding the best 1D representation of that manifold using a tree topology. This section introduces the required formalism to highlight how clustering methods as Gaussian Mixture Models (GMM), combined with graph theory, can be used to build such a representation starting from a general set of N datapoints

Elements from graph theory
Let G = (V, E) be an undirected graph, with V the collection of vertices, E = {(i, j)|(i, j) ∈ V 2 } the set of edges linking nodes together and {w ij } (i,j)∈V 2 the set of edges weights such that ∀(i, j) ∈ V, w ij ≥ 0. In our case, we consider w ij = x i − x j 2 2 . Let us also define d i the degree of a node i ∈ V as the number of edges directly connected with it.
We call minimum spanning tree M the subgraph of G with |V| − 1 edges that is reaching all nodes of V with the minimum total weight. By construction, M has no loops and is unique if there are not two edges with the same weight in G, which, in our case, does not seem likely to happen since it would imply galaxies with the exact same distance between them. Still, it would only create very local modifications of the tree structure that would be erased by future operations. In a tree-like structure, we can define three exclusive typologies for a node i depending on its degree: extremity node (d i = 1), junction node (d i = 2) or bifurcation node (d i > 2).
Graphs can be represented by some computable quantities encoding the full graph information. A first representation is given by the adjacency matrix of G, noted A, which is a symmetric |V| × |V| matrix encoding whether two vertices are linked or not. Elements A ij of this matrix take values as follows: This matrix encodes all the knowledge about the connectivity of vertices in the graph G, and if we consider the matrix W such that W ij = w ij A ij , we end up with a matrix describing the full graph. Another useful representation of a graph is the Laplacian matrix from which spectral decomposition gives fundamental information about the graph structure (Lurie 1999). Let G be an undirected simple graph with an adjacency matrix A and D is a diagonal |V| × |V| matrix in which the element D ii corresponds to the degree of the node i. Then, the Laplacian matrix of G is the symmetric, positive semidefinite |V| × |V| matrix defined as (2) As the MST reaches all datapoints, the resulting graph is not smooth and therefore does not reveal properly the local geometry of the underlying distribution (see Fig. 3). In order to recover the shape of the distribution, we span the set of datapoints with a given number of centroids that will coarse grain the density distribution. This task is achieved by using Gaussian Mixture Models. The key idea of T-ReX is thus to learn a smooth representation of the d-dimensional dataset standing in its ridges by computing a set of centroids with an enforced topology given by a graph structure.

Expectation-Maximization for Gaussian Mixture Models
Gaussian Mixture Models (GMM) are part of parametric mixture models that can be used to map a cloud of points to a density distribution by using a restricted number K of kernels to model the distribution. Starting with random parameters for Gaussian kernels, their positions and variances are adjusted iteratively to fit best the observed data. GMM are also extensively used in unsupervised clustering approaches where the aim is to partition the datapoints into K clusters by defining a probability that a given datapoint is part of the k th cluster. Using GMM, each cluster is represented by a Gaussian distribution and the clustering is reduced to an estimation problem of the Gaussian's parameters. Here we extend this second approach so that the clusters pave the observed set of datapoints in its ridges.
In practice, we define K ≤ N centroids {f k } K k=1 with f k ∈ R d and assume that the dataset X is drawn in an independant and identically distributed way from an unknown density that we model as a weighted linear combination of K Gaussians where θ = {π 1 , . . . , π K , f 1 , . . . , f K , Σ 1 , . . . , Σ K } is the set of model parameters, π k is the weight of the k th component such that Note that this goal could also be achieved using the K-Means algorithm (Macqueen 1967) where we minimize the L 2 risk This kind of similarity-based clustering of the data, however, generates a hard partition of the input domain meaning that each point x i can only be member of one group f k and generally lacks of flexibility and robustness to noise and outliers. Mixture models can be used to face this difficulty by considering the conditional probability of a datapoint being part of a cluster given the assumed model.
From the assumption that the data are drawn from such a density, all we have to do is to estimate the values for θ fitting best the observed data. This is generally achieved by maximizing the log-likelihood function from which, in this case, it is impossible to get an analytic solution when maximizing with respect to θ.
To bypass this difficulty, we use an Expectation-Maximization (EM) approach (Dempster et al. 1977) by defining a set of latent variables Z = {z i } N i=1 encoding the partition of the dataset: z i ∈ 1, K denotes which of the K Gaussian components x i belongs to. The completed loglikelihood is then which can be maximized using EM approach. As we introduced a new unknown quantity through Z, the central idea of EM algorithm is to alternatively estimate Z by the expectation over p(z | x) (E-step) and then update the parameters of the mixture θ by maximizing the new likelihood on the basis of the current distribution for Z (M-step). This procedure provides an algorithm maximizing locally the true likelihood. Mathematically, the procedure can be understood more generally as follows; for any probability distribution over the latent variables, q(Z), it reads where D KL (q || p) ≥ 0 is the Kullback-Leibler divergence (Kullback & Leibler 1951), implying that L(q, θ) is a lower bound for the log-likelihood. The idea behind EM formalism is to maximize the lower bound L(q, θ) instead of the log-likelihood directly. The Estep consists of fixing θ and maximizing L(q, θ) with respect to q. By noting that L(θ; X) does not depend on q, we simply need the divergence to be cancelled out in order to maximize the lower bound and thus which can be computed using Bayes' theorem. In the Mstep, considering we are performing the t th iteration, we fix q(z) = p(z | x, θ (t) ) and update the optimal set of parameters such that θ (t+1) = argmax θ L(q, θ).
To summarize, EM is an iterative approach able to identify K clusters from the data itself with guaranteed convergence. In a first step (E), a probabilistic (soft) assignment of each datapoint to mixture components is computed and in a second one (M) an estimation of mixtures' parameters is performed given the distribution for the latent variables. The main advantage over the K-means method is that GMM allow a soft partitioning of the input dataset through this q(z) distribution.

Regularized GMM for ridge extraction
So far, we simply addressed the Gaussian mixture clustering with an Expectation-Maximization approach and have access to K separated clusters with their own means {f k } K k=1 and covariances {Σ k } K k=1 representing the data but with no smoothness constraints or topology enforced. From the observation that the MST naturally traces ridges and the underlying connectivity of datapoints without any free parameter, we can enforce a tree topology to our centroids to obtain a representation combining this idea of the MST and the local averaging naturally provided by GMM to impose smoothness. The question the full formalism tries to answer is then: what is the smooth minimal tree structure that is fitting best the set of observed data? In general, if we want the centroids to have a given shape, we need to incorporate a prior distribution p(θ) in previous equations. The presented framework is very close and inspired, in its form and spirit, to manifold learning methods for dimensionality reduction (see e.g. Roweis & Saul 2000;Gorban & Zinovyev 2005) and, in particular, principal curves (Hastie et al. 1989) field which already studied the application of mixture models to curve extraction from point distribution (Tibshirani 1992;Bishop & Svensén 1998).
With such a prior, we no longer aim at maximizing directly the likelihood but the posterior log p(θ | x) ∝ L(θ; X) + log p(θ). In this context of maximum a posteriori estimation, previous equations and results from EM algorithm remains unchanged for the E-step, the maximization over q being independant on p(θ). In the case of the M-step, the update is computed such that θ (t+1) = argmax θ L(q, θ) + log p(θ).
The log-prior can be considered as a regularization term on the log-likelihood and keeping in mind its role helps us choosing it correctly. In particular, we want to give centroids a smoothness constraint and to enforce a topology through a given graph structure G. Hence we use a Gaussian form for the prior with a variance ν 2 thus acting on the L 2 norm F 2 G to constrain the smoothness of centroids on the graph domain, as usually done in statistics (Smola et al. 2001) and inspired by previous studies on elastic topology regularization (Durbin & Willshaw 1987;Yuille 1990) and manifold learning (Gorban & Zinovyev 2005), where F ∈ R d×K such that column k of F contains f k and L is the Laplacian matrix as defined in eq. (2).
In the context of this paper and its application, we simplify this formalism by considering equidistributed Gaus-sian mixtures (∀k ∈ 1, K , π k = 1/K) with identical and isotropic covariances σ 2 I d , where I d denotes the d×d identity matrix. This reduces the problem to the estimate of θ = {f k } K k=1 during the M-step. By noting p ik = p(z i = k | x i , θ k ) the probability of a given datapoint x i being well represented by the cluster k, we find The first term of this optimization problem corresponds to a soft K-means clustering (Bezdek 1981) while the righthand side is an elastic regularization term constraining the topology of centroids. Under the previous simplifications and looking for specific topology given by the minimum spanning tree, the presented formalism is equivalent to the work of Mao et al. (2015). To simplify again the notation and to link the two variances σ 2 and ν 2 , we can introduce the parameter λ = σ 2 ν 2 as the relative strength of the two kernels. The final problem of the M-step can hence be written The first term of this equation tries to minimize the error when datapoints are approximated by centroids while the second term acts like an elastic constraint on centroids when they are linked together in the considered graph. λ can be seen as a regularization parameter acting like a soft constraint on the total length of the graph and thus as a trade-off parameter between the data fidelity term and the penalty term constraining the smoothness and simplicity of the graph representation.

T-ReX: Tree-based Ridge eXtractor
Given a set of N observed datapoints X = {x i } N i=1 , each living in a d-dimensional euclidean space R d , the first step of T-ReX is to build a graph with a tree structure. This is achieved by computing the MST over X resulting in a unique preferable path to link points together (see Fig. 3). This tree then goes through several processes to obtain a version that is robust to noise, outliers and to gain some smoothness properties.

Pruning of the tree
Considering that we obtained a graph with a tree structure, we adopt a simple denoising operation by cutting all the nodes standing in branches of the tree at a level l. In practice, branches are defined as the set of connected nodes linking an extremity node to a bifurcation node (defined in Sect. 3.1). Strictly speaking, we iteratively remove all nodes of degree one in the graph structure. By doing so, we remove the most spurious part of the structure corresponding to nodes that are more likely to be found in physically irrelevant regions for the underlying pattern (i.e. underdense regions). This approach is iterative, meaning that nodes which are initially bifurcations can become junctions or extremities (or even be removed if there are only branches with path length strictly lower than l connected to it). To give a representative image of this procedure, it acts like iterative peeling of an onion, attributing to each node a depth in terms of layers to peel before we reach it and starting from extremities (Hébert-Dufresne et al. 2016). This method is very close to the first step introduced by Barrow et al. (1985) where all branches with a path length inferior to l are removed (meaning that there are less than l nodes in the branch) except that our approach also cuts extremities of longer branches.
Previous MST methods usually perform, in addition to this pruning, a removal of all edges above a given physical length. In our case, we do not consider this operation not to introduce a new parameter that is not easy to tune but also because we argue that all connections, even 'long ones', can bring information about the underlying structure. Of course, it has the effect that if two unconnected parts of a network are given as an input to the presented method, they will end up connected. Figure 4 shows the pruned MST obtained with a given cut-off level on the toy dataset. We can clearly observe that removing extremity nodes iteratively acts like a denoising operation, deleting small branches and irrelevant ones while preserving the core of the pattern. The choice of the pruning level is essential for a single realization of a tree, especially when dealing with noisy data. Section 5 analyzes the impact of this parameter on the resulting tree.

The regularized minimum spanning tree
As discussed in Sect. 3, the MST does not have a smooth behaviour. To enforce this constraint in our representation, we solve Expectation-Maximization equations (8) and (11) following the work and notations of Mao et al. (2016) by applying Algorithm 1. It is worth noting that the inverse of 2λL+Λ always exists since L is a positive semi-definite matrix and Λ is a positive diagonal matrix. The convergence is guaranteed by the EM approach and characterized by a slow displacement of the projected points F t −F t−1 2 2 ≤ where t denotes the iteration index.
The computational complexity of Algorithm 1 can be divided into three components: i) The computation of the MST over the centroids, ii) The computation of the assignment matrix P to solve the E-step and iii) The matrix inversion to update centroids positions during the M-step. As already pointed out in Mao et al. (2015), the total complexity is O(K 3 + DN K + K 2 D).

Algorithm 1 Regularized minimum spanning tree
Input: Data: X ∈ R d×N , parameters: λ and σ. Output: F ∈ R d×K , the set of centroids and B, the associated adjacency matrix.
Initialize F = X or with K-Means clustering.
while convergence do Compute the minimum spanning tree B from F . Compute the Laplacian matrix L of B via eq. (2).

E-step:
Compute the assignment matrix P with (i, k) entry Solve eq. (11) to update the position of centroids, F = XP (2λL + Λ) −1 . 3 end while Figure 5 shows the difference between the MST directly built on datapoints and its regularized version obtained from Algorithm 1. The regularized minimum spanning tree (RMST) has smooth extensions (visible on the zooms of Fig. 5) while preserving the global shape of the tree-like structure. In inflexion regions of the filament, we observe that the tree is a creating bifurcations. This is due to the chosen MST topology for the centroids. In this precise case, with a single filament, the best topology for centroids would be a straight line described by an adjacency matrix such that A ij = 2δ i,i − δ i,j+1 − δ i,j−1 , where δ i,j denotes the Kronecker delta function. It should be noted that such a topology could be handled by the formalism presented in Sect. 3.3.

The probability map
As previously mentioned in Sect. 3, a graph with a tree structure has no loops, and hence can not represent holes but just connected components in the Cosmic Web topology. In addition to that, the MST highlights one particular path linking datapoints together but does not provide any idea of uncertainty or reliability of this latter. Both of these issues can be overcome by introducing a robust representation that takes into account the eventual variations in the input distribution. To do so, we build B different samples b=1 from the initial one X and compute the regularized MST for each of them, in a similar fashion that done in bootstrap approaches. The entire procedure is described through the Algorithm 2.
From the B realizations of RMST, one can build a map I characterizing the probability, in a frequentist meaning, of a position x to be crossed by a realization of a tree: where 1 A is the indicator function and H b is the binary histogram obtained from the projected points F b . The random nature of I thus comes from the uniformly at random resampling of X and not from Algorithm 1 that is a deterministic optimization step.

Algorithm 2 Bootstrap RMST
Input: Data X, parameters λ, σ, l, B, N B . Output: S, the set of points describing the skeleton.
Apply Algorithm 1 on Y b with parameters λ and σ to obtain the regularized MST B R b and optimal F b . end for Figure 6 shows a probability map obtained from the toy dataset in which the intensity of each pixel corresponds to the frequency that an edge of the MST crossed it. This way, we quantify the reliability of the various paths in the input domain. In practice, to build I(x), we use both the projected points F b and the set of edges linking vertices encoded in B R b that contains information on the paths used and consequently should be taken into account in the final distribution. Edges are thus sampled and counted in the computation of H b for eq. (12). In what follows, we may refer to a quantity called the superlevel set of those maps defined as Γ p (I) = {x | I(x) ≥ p}. Those sets are used to threshold the probability maps and keep only regions with a probability higher than p.

Choice of T-ReX parameters
We summarize in Table 1 the parameters of the algorithm together with their roles. We also give the baseline values further used in our study. As we are dealing with simulations of the Cosmic Web, we fix the cut-off level to a low value l = 4 and look for B = 100 regularized minimum spanning trees using uniformly at random 75% of the dataset for each sample. However, each of these parameters has a different and specific impact on the detection of the pattern that we discuss below.

Elastic constraint λ
As already mentioned in Sect. 3.3, λ is a regularization parameter acting like a trade-off between a set of centroids minimizing the data reconstruction error and the strength of the smooth tree topology we enforced. Hence, we understand that the larger λ, the more important the second part of eq. (11), leading to a shorter and smoother tree, as seen on Fig. 7. λ can be seen as a soft-constraint on the total length of the tree, a high value leading to a tree representation that has short extensions and projected points are more uniformly distributed over the tree. Given the definition of λ in Sect. 3.3, it is also the ratio between both variances of Gaussian kernels we used, one for the data fitting term and the other for the prior on centroids to introduce the elastic regularization term. Choosing λ = 1 thus induces that the two kernels have the same variance. When dealing with outliers and/or highly noisy datasets, λ also helps increasing the robustness and maintains the tree structure in the   Projected points are also represented, respectively by triangles, dots and crosses. We note that curves for λ = 1 and λ = 0.001 are almost superimposed. desired regions without extending in noisy and underdense regions. Mao et al. (2015) proposed to tune λ using the gap statistics, originally presented by Tibshirani et al. (2001) to choose the number of clusters in the K-means algorithm. This method requires several runs of the Algorithm 1 with a range of λ which can be very costly when dealing with large datasets. We hence choose to fix λ = 1 in our runs, leading to satisfactory results for a well chosen σ.

Spatial extension of Gaussian clusters σ 2
The parameter σ 2 corresponds to the variance of Gaussian clusters used to compute the assignment matrix P in Algorithm 1. It ensures the local smoothness of the graph by allowing a soft partitioning of the input datapoints into centroids. Thus, σ represents the spatial extension of each cluster and the higher it is, the more datapoints will be affiliated to a specific node of the resulting graph leading to a coarser representation. This trend is illustrated in Fig. 8 showing several regularized MST obtained by fixing λ = 1 and varying σ. As σ increases, centroids tend to be aligned and describe a coarser shape of the underlying structure, biasing the estimate. Intuitively, σ should represent the thickness of a typical filament so that centroids are fitting well the distribution.
To automatically tune this parameter from the data, we follow the recommendation of Chen et al. (2015) who already investigated the choice of such a parameter in the SCMS algorithm. We thus choose σ using a modified version of the Silverman's rule (Silverman 1986): where A 0 is a constant, N is the number of datapoints, d is the dimension of the data and σ min is the minimum standard deviation over all directions. Taking A 0 = 1 leads to the Silverman's rule and is the optimal estimate for an underlying Gaussian distribution. As argued by Chen et al. (2015), when the data are not Gaussian anymore, A 0 should be optimised as a free parameter. In our experiments, when the parameter is not explicitly defined, we adopt the baseline value of Table 1, namely A 0 = 0.1, a rather low value so that the estimated trees keep some small scales variations. When A 0 increases, the smoothing scale also increases and a coarser filamentary pattern is described.
Although we considered a fixed isotropic and identical covariance matrix for all clusters, it is noteworthy that the formalism initially presented in Sect. 3.3 is more general. We could consider a specific covariance for each cluster, initialize it with the rule of eq. (13) and adapt it automatically from the data. EM computation can indeed auto-adjust this estimate at each iteration by considering θ = {f 1 , . . . , f k , Σ 1 , . . . , Σ k } and then maximizing the lower bound of the log-likelihood not only over f k but also with respect to Σ k in the M-step. This solution has, however, an additional computational cost and can lead each Gaussian cluster to be housed in a specific datapoint when K is close to N . It did not sufficiently improved the results in our cosmological application to consider it but could be in future works. The current choice hence restricts the range of scales that can be described by the Gaussian clusters implying that broad structures in which the extension is way above σ will not collapse into a single ridge passing in the middle of the structure in the resulting graph.

Pruning level l
As explained in Sect. 5.3, the pruning acts like a denoising operation but it also helps reducing the number of kernels to span the point cloud. A high cut-off level removes a large number of nodes at the extremity of all branches revealing only the core of the tree structure while a lower value allows branches to have long extensions reaching even nodes in empty regions. The choice of l can hence lead to different tree representations of a noisy dataset.
To choose the value of l, we rely on the work of Hébert-Dufresne et al. (2016) who introduced the onion decomposition for graphs. The idea is to attribute to each node a layer in terms of depth in the network allowing to define a   (13)). Corresponding projected points are also represented, respectively by triangles, dots and crosses.
center and a periphery. Left panel of Fig. 9 shows the onion spectrum of the noisy toy dataset and illustrates the points of the noisy dataset colored by their layers. The power-law decay in the first part of the spectrum can be interpreted as the removal of all short branches (in number of nodes). A constant level in the onion spectrum means that we are iteratively removing the same amount of nodes in the network at each iteration and thus that the tree structure is 'stable' in terms of number of branches. Using as cut-off level the beginning value of the last constant level in the onion spectrum (l = 39 on left of Fig. 9) would lead to keep only the core of the tree structure with a single branch (the longest one in the initial tree). However, this is a very conservative solution and doing so in real datasets would lead to miss some ending parts of filaments or peripheral structures.
A threshold l that is too low can bring out spurious detections of the underlying pattern for a realization of a tree. However, this effect should be mitigated by: i) the λ parameter which also helps reducing the length of branches in noise and outliers, as discussed in Sect. 5.1 and ii) the bootstrap step where those detections will have a low occurrence as illustrated in the superlevel sets of Fig. 10. For this reason, in what follows, we will consider a rather low value for the pruning parameter, namely l = 4. In the case of simulated cosmological datasets, this parameter only helps to remove datapoints that are located in empty or low dense regions since, for a well chosen value of σ, Algorithm 1 is robust to noise encountered around the ridges.

Number and size of the bootstrap samples
Both number and size of the replicated samples, B and N B respectively, are related to the probability map. Above a minimum value, the parameter B has almost no effect on the estimate for a fixed N B . The main idea to explain this phenomenon is that, for a fixed size N B , there is only a limited number of different possible paths with high proba-bility. Even though a higher value for B can highlight some new paths, they will have very low occurrence.
N B affects the map in a more important way. A low size value induces more possible paths to cross and thus more variability in the resulting map while a size close to the initial one N (0.90N for instance) allows only local modifications of the highlighted paths and hence is more conservative. Choosing a low value can thus lead to more spurious paths detection.

Results: Application to cosmological datasets
In this section, we apply T-ReX with the baseline parameters of Table 1 on the 2D and 3D cosmological datasets described in Sect. 2. The slice of the 2D subhalo distribution corresponds to the datapoints on Fig. 2 representing a projected slice of 5Mpc/h depth. The 3D distribution of halos is built from a 200 3 Mpc/h Gadget-2 simulation used in Libeskind et al. (2017). Figure 11 shows two realizations of a regularized minimum spanning tree (see Algorithm 1) over 75% of the datapoints picked randomly and uniformly. First, it is interesting to see that each RMST is standing in regions that would naturally be called ridges or filaments in the distribution of galaxies, i.e. elongated structures connecting high density regions together. Second, we can see that according to the the distribution of picked datapoints, different paths are taken for the core of the tree structure. The complementarity of the two realizations is highlighted in the zoomed region. Since the tree topology cannot include loops, the effect of disconnection is observable in this particular region where the solid blue realization is not fully connecting the network. Note that such an effect is intensified by the pruning operation. Other realizations might not exhibit the disconnection in the same region as seen in the case of the dashed red line of Fig. 11. This highlights the necessity and the interest of stacking several RMST to obtain a full characterization of the cosmic network. Figure 12 shows a probability map obtained from 100 realizations of RMST. We can see that the highly probable part of the map is fitting what one would expect for the underlying distribution while the overlap of the superlevel set Γ 0.25 (I) with the DM distribution allows us to see that high probability paths (above 0.25 in this case) are tracing the most prominent part of the network. It is worth noting that the agreement is particularly interesting given that the input of the algorithm are subhalos and not DM particles. The zoomed-in region emphasises that small scales are also recovered where high probability paths follow the ridge in the DM distribution.

Comparison with DisPerSE skeletons
DisPerSE (Sousbie 2011) is a publicly available 4 and widely used algorithm able to detect filaments and walls in a density field tracer such as galaxy distribution. From this discrete set of particles, a continuous density field is estimated using the Delaunay tessellation field estimation. Based on 10 1 10 2 Layer the discrete Morse theory (Forman 1998), DisPerSE first aims at identifying singularities (or critical points) in the field defined as positions where the gradient cancels and then uses the local morphology to classify those points in maxima, minima and saddles using eigenvalues of the Hessian matrix. DisPerSE finally identifies filaments using the connectivity of critical points following the gradient lines in the density field. Persistent homology (Edelsbrunner et al. 2002) is then used to remove insignificant parts of the pattern. Figure 13 shows both T-ReX and DisPerSE results obtained on the Illustris slice with several probability thresholds for the former and several persistence levels for the latter. At fixed density smoothing (here 1), the DisPerSE skeletons show the best overlap with our un-tresholded probability map for a persistence σ p = 0, where the two methods agree for most of the filamentary structure. The  boundary effects observed in the DisPerSE skeleton at low σ p disappear with increasing persistence. The good agreement between high probability paths provided by T-ReX and the DisPerSE skeleton remains with increasing persistence levels and probability thresholds as shown by the overlap of DisPerSE and T-ReX skeletons (right column of Fig. 13). It should be emphasized that there is no direct transposition of the persistence threshold in DisPerSE into the probability threshold in T-ReX. The present choice of threshold parameters is hence arbitrary and only serves illustration purposes.
Although the two algorithms have very different definitions for what they both call filamentary pattern, it is reassuring to see that they are recovering similar structures.   However, it is not surprising to observe some disagreement on specific filaments (see orange shaded regions in Fig. 13). Since the pattern identified by T-ReX is obtained by minimizing a global criterion, some paths identified by Dis-PerSE are not relevant to minimize the total distance and thus do not appear as possible paths in any of the realizations. When comparing two conservative cases, Γ 0.25 (I) and the 5σ DisPerSE persistence skeleton (lowest right panel of Fig. 13) we see that the T-ReX pattern preserves more small-scale structures and provides some paths that seems coherent with the subhalo distribution but which are not identified with the chosen parameters for the DisPerSE output (see blue shaded regions in Fig. 13).

Sparse datapoint distribution
In order to explore the robustness of the method against the datapoint density used for ridge detection, we reduce the number of subhalos in the initial dataset by keeping only those with a mass M ≥ M cut . In practice, we investigate how the original filamentary map is spatially close to the recovered ones when M cut varies. Figure 14 shows probability maps obtained for increasing values of M cut leading to sparser and sparser input (100%, 83%, 60%, 31% and 10% of the initial subhalos in the slice respectively corresponding to M cut = {0, 0.85, 1.35, 3.22, 11} × 10 10 M /h). Visually, probability maps show a nice stability, even when the sparsity is high: patterns are pretty much the same when we keep at least 60% of the most massive objects hence recovering the essential part of the structure. Figure 15 emphasizes the spatial proximity between the different maps by representing, for each I J , where J denotes the fraction of galaxies we kept to compute the map, the cumulative distribution of {d J x } x∈Γ0.25(I100) defined, for a position x in the set Γ 0.25 (I 100 ), as Hence d J x corresponds to the closest distance from a position x in the original skeleton obtained by keeping all subhalos, namely Γ 0.25 (I 100 ), to a given thresholded map Γ 0.25 (I J ). This way, the distribution of d J x measures how far the original pattern is from the one obtained with J% of the datapoints.
In more than 95% of the cases, the original pattern finds a closest point in the 83% and the 60% maps at less than 1.8Mpc/h, showing that structures found in the three maps are spatially close and about the thickness of typical filaments . When M cut increases, the filamentary pattern traces the most prominent parts of the structure with a loss of some small scales and hence highlights coarser and coarser structures. Even though the pattern is rough with only 31% of the datapoints used, we still observe a nice correlation with previous maps highlighting coherent structures with 90% of the original pattern being retrieved at less than 3Mpc/h. As expected, an unrealistic scenario where we use only 10% of the datapoints associated with the most massive subhalos degrades the reconstruction of the filamentary pattern. Yet, the recovered structures show a coarse but coherent connectivity between regions. This illustrates the ability of T-ReX to recover the underlying structure with high stability with respect to deformation of the input distribution of datapoints.

Application to 3D data
In this section, we apply T-ReX on the 3D distribution of halos obtained from a Gadget-2 simulation (see Sect. 2) and compare our results with some other existing procedures that have also been run on the same dataset. Although the original review (Libeskind et al. 2017) is considering a dozen of different methods, we focus the comparison on 3 procedures, namely Nexus+, DisPerSE and Bisous so that we have a broad set of different methods using respectively scale-space representation, topological considerations or stochastic approach to recover the filamentary pattern: Resolution of the maps provided by T-ReX is 250Kpc/h. Shaded blue and orange areas highlight some differences between results discussed in Sect. 6.1.1.
niques leading to state-of-the-art environment classification able to identify clusters, filaments and walls. The main idea is to assume that the local morphology of the density field fully encodes the environmental information. Eigenvalues of the Hessian of the density field are thus used to compute an environmental signature in each voxel of the smoothed field. The key ingredient is to compute this signature for a set of smoothed fields with a log-Gaussian filter over a range of different scales to  identifying the filamentary structure using a set of random parametric cylinders. Filaments are modelled as aligned and contiguous small cylinders of a given size in the galaxy distribution. The Bisous model generates two maps allowing to extract filaments spine; one characterising the probability to find a filament at a given position called the visit map and an other one corresponding to the filament orientation field. This way, spines are defined as dense regions and are aligned with the axis of the different cylinders.
Note that, not only these methods have very different mathematical definitions for what they all call clusters, filaments and walls, but they also have been run with different input using either DM particles or halos.
We applied T-ReX to the full halo distribution of the 3D simulated box (281465 halos in total) and built a 100 × 100 × 100 grid map like other methods. For T-ReX, this means that the final probability map is computed over a 100 3 grid in which all visited voxels are considered as part of the filamentary structure. As T-ReX is using 1D objects (segments of the RMST) sampled over the input space, it is preferable, for illustration and comparison, to give its filamentary pattern a 'thickness' by smoothing the obtained probability map. Whenever a voxel is classified as part of the filamentary structure, a smoothing is thus performed over its 26 direct neighbors. In what follows, we call this version T-ReX s while the original result is referred to as T-ReX us .
For illustration and following Libeskind et al. (2017), we show in Fig. 16 the results of the classification provided by each method for a 2Mpc/h depth slice from which FoF halos were extracted (top left panel of Fig. 16). Note that all methods have been run over the full 3D cube and this is a projected slice of the detection. It is also worth noting that T-ReX identifies the filamentary pattern as a whole and does not classify the environment into clusters, filaments and walls, as Nexus and DisPerSE do. To perform the comparison, we hence look at the full pattern provided by each method, and compare it with our extracted skeleton. We observe that T-ReX: 1. Provides a satisfactory connectivity of the halos, 2. Leads, in its smoothed version, to thicker filaments than Nexus+ and Bisous but thinner than Disperse, 3. Retrieves most of the structures (filaments + walls + clusters) obtained by the Nexus+ algorithm.
Even though these methods have been developed with different approaches, it is interesting to see whether they agree or not in the detection of the filamentary pattern. To do so in a quantitative way, we could use the proximity measurement of eq. (14) but as the resulting patterns are presented on a 2Mpc/h grid, distance between them would not be accurate. Hence, we introduce a similarity measurement as follows; Considering the answers provided by two detection methods, H 1 and H 2 , such that H • (x) = 1 if the position x is part of the filamentary structure and 0 otherwise, the similarity measurement is defined as: where |H i | denotes the cardinal of H i (defined as x 1 Hi(x)=1 ) and |H 1 ∩ H 2 | is the cardinal of the intersection between H 1 and H 2 detections defined as x 1 H1(x)=1 1 H2(x)=1 . Hence, S(H 1 , H 2 ) measures the proportion of H 1 detections that are contained in H 2 and is thus asymmetric. In other words, if we consider H 2 as a reference, S(H 1 , H 2 ) represents the proportion of true detections provided by H 1 . Of course, such a simple metric does not provide the full information on the similarity between the considered patterns. This measure is thus to be completed with others, or with visual inspection, as we do here. Table 2 shows the similarity indices between all considered methods for the entire 3D cube. We observe that Table 2: Index of similarity S(H1, H2) as defined in eq. (15) between the considered methods applied on the entire 3D cube. T-ReXus refers to the unsmoothed version of the detection while T-ReXs refers to the smoothed one over the 26 neighboring voxels. The same tendency is observed concerning Bisous for which the detections are mostly contained in other skeletons (last row of Table 2) but not reciprocally (last column of Table 2). This is due to the sparse and unconnected detection provided by the Bisous method. The thick skeleton of Dis-PerSE also tends to contain a large fraction of other skeletons (fourth column of Table 2) but fills so much volume that is not contained in the latters (fourth line of Table 2).

Conclusion
In this paper, we present T-ReX, a graph-based algorithm aiming at drawing automatically the underlying density from a discrete set of points. We show that it can be used to uncover the natural filamentary pattern of the Cosmic Web from a 2D or 3D galaxy distribution. The key idea of T-ReX is to find a set of centroids paving a given set of datapoints in its ridges by enforcing a predefined topology. To do so, the minimum spanning tree is computed over those centroids which are iteratively moved to obtain a smoothed version of the MST. To characterize the reliability of the underlying filamentary structure, a without replacement bootstrap approach is used where several regularized MST are computed over a subset of datapoints chosen randomly and uniformly. This way, we can build a probability map of those realizations to get the most frequent paths and highlight some regions as being part of the underlying filamentary pattern with high reliability. For the sake of simplicity and because this topology is, at first, well representing the filamentary structure, we chose the tree topology for the centroids to highlight ridges of the point cloud distribution of galaxies. In addition to that, the MST provides a natural way to connect observed datapoints and can infer the underlying filamentary pattern by minimizing the total distance to link them. However, the presented framework (see Sect. 3.3) is more general and can use any kind of graph construction. Hence, it could be interesting to investigate other topologies and in other contexts that detecting ridges. In particular, nearest neighbors have been recently applied in several cosmological studies such as Coutinho et al. (2016) to find new metrics characterising the Cosmic Web using graphs. Also, studying the properties of the regularized tree representation in the same way as it can be done for the usual MST (see e.g. Colberg 2007;Naidoo et al. 2019) could be of interest.
In this paper, we mainly focus the application of the procedure on simulated datasets. When dealing with real data, in addition to the mathematical considerations of defining and extracting the filamentary pattern, come the usual technical issues of observed data: noise, outliers, uneven distribution of the samples, sparsity of the representation, selection and observational effects. Even though we showed some robustness of the estimate to noise and outliers, the ability of minimum spanning tree methods to get rid of observational (redshift-space distortions) and selection effects (missing parts of the sky) in real cosmological surveys could be considered in further studies.