Characterising filaments in the SDSS volume from the galaxy distribution

Detecting the large-scale structure of the Universe from the galaxy distribution and characterising its components is of fundamental importance in astrophysics, but also a difficult task. This is due both to the need for spectroscopic redshift surveys that accurately measure galaxy positions in space while also covering large areas of the sky, and to the difficulty of creating algorithms to extract cosmic web structures (e.g. filaments). Moreover, these detections are affected by systematic uncertainties stemming both from the characteristics of the survey used (e.g. its completeness and coverage) and from the specific properties of the method adopted to detect the cosmic web (i.e. the assumptions it relies on and the free parameters it may employ). For these reasons, the creation of new catalogues of cosmic web features over wide sky areas is important, as it provides users with a well-understood sample of structures whose systematic uncertainties have been thoroughly investigated. In this paper we present the filament catalogues created using the Discrete Persistent Structure Extractor (DisPerSE) tool in the Sloan Digital Sky Survey (SDSS), and we fully characterise them in terms of their dependence on the choice of the algorithm's parameters, as well as with respect to several systematic issues which may arise in the skeleton due to the properties of the galaxy distribution (such as Finger-of-God redshift distortions and defects of the density field due to the survey boundaries). We create several fully characterised filament catalogues, which we make available to the scientific community upon request.


Introduction
The cosmic web (de Lapparent et al. 1986; Bond et al. 1996) is a deeply interconnected network of structures which extends throughout the visible Universe. It is formed of galaxy clusters, which occupy the nodes, linked to one another by essentially one-dimensional structures called filaments (low-mass groups possibly residing within filaments are known as knots). Clusters are often grouped together into more massive structures (i.e. superclusters, see e.g. Liivamägi et al. 2012), which can also be found at the intersections of filamentary structures, although their definition is less unambiguous. Filaments are located at the intersections of planar structures called walls, which surround vast, empty regions called voids.
While this Large-Scale Structure of the Universe (LSS) is also composed of dark matter and gas, it was first detected through the galaxy distribution. Indeed, galaxy clusters were the first cosmic web features to be identified and studied, as they are easily detectable through various techniques. Only with the advent of wide-area spectroscopic redshift surveys have other structures, such as filaments, started to be systematically identified. Surveys such as the Two-Degree Field Galaxy Redshift Survey (2dFGRS, Colless et al. 2001), the Sloan Digital Sky Survey (SDSS, York et al. 2000), the Galaxy And Mass Assembly survey (GAMA, Driver et al. 2009), the Vimos Public Extragalactic Redshift Survey (VIPERS, Scodeggio et al. 2018), or the COSMOS survey (Scoville et al. 2007) have allowed us to obtain statistical samples of filaments and other LSS features. For example, Chen et al. (2016) and Tempel et al. (2014a) have produced filament catalogues in the SDSS (but see also Aragón Calvo 2007; Sousbie et al. 2011; Rost et al. 2019). Other works, such as Kraljic et al. (2018) and Alpaslan et al. (2014), detected filaments in GAMA, while Malavasi et al. (2017) detected filaments in VIPERS. Additionally, Gott et al. (2005), Iovino et al. (2016), and Kraljic et al. (2018) identified walls in the SDSS, COSMOS, and GAMA surveys, respectively, while several projects are devoted to the analysis of voids (see e.g. Colberg et al. 2008, for a summary). More recently, in addition to spectroscopic surveys, the increased precision of photometric redshifts (e.g. in COSMOS and in the Canada-France-Hawaii Telescope Legacy Survey, CFHTLS; Laigle et al. 2016; Coupon et al. 2009) has allowed filaments to be detected in volumes of the Universe up to z ∼ 1 (Laigle et al. 2018; Sarron et al. 2019; but see also Darvish et al. 2014).
The common element in all these works is the application of a cosmic web detection algorithm to a sample of galaxies. Detecting cosmic web features from the galaxy distribution essentially requires extracting topological information from the geometrical properties of the density field, which is sampled using galaxies as tracers. A wide variety of methods exist to perform this task (see e.g. Libeskind et al. 2018, for a review and a comparison of some of them). Cosmic web detection algorithms implement different techniques to face common problems. In particular, the ultimate goal is to detect a set of structures which are anisotropic in shape (e.g. elongated filaments), multi-scale (filaments range from a few to about 100 Mpc in length), and intricately connected to one another (see e.g. Cautun et al. 2014). Libeskind et al. (2018) classified the various methods into six groups, based on their main characteristics: graph and percolation techniques (e.g. the Minimal Spanning Tree, MST, Alpaslan et al. 2014; T-ReX, Bonnaire et al. 2019); stochastic methods (e.g. BISOUS, Tempel et al. 2014a); geometric Hessian-based methods (e.g. the Multiscale Morphology Filter, MMF, Aragón-Calvo et al. 2007); scale-space Hessian-based methods (e.g. the already mentioned MMF and NEXUS+, Cautun et al. 2013); topological methods (e.g. the Discrete Persistent Structure Extractor, DisPerSE, Sousbie 2011); and phase-space methods (e.g. ORIGAMI, Falck et al. 2012).
In this paper we apply the DisPerSE cosmic web detection algorithm to the SDSS to obtain a catalogue of filaments in a representative volume of the Universe. This method was chosen for its ability to detect structures in a topologically correct way (i.e. the detected filaments are actual topological entities which obey mathematical rules) directly from the galaxy distribution, for being (almost) parameter free, and for its adaptive nature, which responds well to the observational characteristics of the survey used, in terms of the sampling rate of tracer galaxies and the uniformity of their distribution both in redshift and on the plane of the sky (i.e. DisPerSE can work with sparse datasets or with galaxy distributions where the density of tracers changes dramatically across the sampled volume). The filament catalogues described here have been used in Malavasi et al. (2020), Tanimura et al. (2019), and Bonjean et al. (2019). We study their systematic uncertainties arising from the DisPerSE method and from the SDSS survey characteristics. We systematically characterise the impact of the DisPerSE parameters on the cosmic web reconstruction and fully describe the issues present in the catalogues and how to address them. The final filament catalogues will be available to the community upon request, together with several Python codes to handle the DisPerSE output and ease the analysis of the catalogue. This paper is structured as follows: in Sect. 2 we describe the SDSS survey and the galaxy samples used to detect the filaments, while in Sect. 3 we summarise the DisPerSE algorithm and its application. In Sect. 4 we describe the critical point and filament catalogues and check for systematic effects due to the DisPerSE method. We validate the catalogue in Sect. 5 and present our conclusions and a summary of the work in Sect. 6. Throughout this paper we use a Planck Collaboration et al. (2016b) cosmology, with $H_0 = 67.74$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.3075$, and $\Omega_\Lambda = 0.6925$. Equatorial coordinates are given in the J2000 reference frame.
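Placing galaxies in 3D requires converting redshifts to comoving distances in this cosmology. The following is a minimal Python sketch, assuming a flat ΛCDM model with radiation neglected and a simple Simpson integration of c/H(z); the paper does not state how its distances were computed, and in practice a library such as astropy.cosmology would typically be used instead.

```python
import math

# Cosmological parameters quoted in the text (flat LCDM, radiation neglected).
H0 = 67.74          # km s^-1 Mpc^-1
OMEGA_M = 0.3075
OMEGA_L = 0.6925
C_KMS = 299792.458  # speed of light in km s^-1


def e_of_z(z):
    """Dimensionless Hubble parameter E(z) = H(z) / H0 for flat LCDM."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance(z, n_steps=1000):
    """Line-of-sight comoving distance in Mpc, via composite Simpson
    integration of (c / H0) * dz' / E(z') from 0 to z."""
    if z <= 0.0:
        return 0.0
    if n_steps % 2:
        n_steps += 1  # Simpson's rule needs an even number of intervals
    h = z / n_steps
    total = 1.0 / e_of_z(0.0) + 1.0 / e_of_z(z)
    for i in range(1, n_steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight / e_of_z(i * h)
    return (C_KMS / H0) * total * h / 3.0
```

With these parameters, comoving_distance(0.1) is roughly 432 Mpc.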

Data
When choosing the best galaxy survey to run DisPerSE to create a filament sample, several factors come into play. In the case of this work, the main characteristics that were sought after in the choice of the sample were essentially the area coverage and the uniformity of the selection function. Among the available surveys, the Sloan Digital Sky Survey (SDSS, York et al. 2000) offered a wide area coverage and a large number of galaxies to exploit.
Although DisPerSE can effectively deal with inhomogeneous datasets and with a non-contiguous coverage of the sampled 3D volume, it is still recommended to work with galaxy samples that are as uniform as possible, so as to provide an adequate sampling of the density field (see e.g. the discussion on the treatment of gaps in the VIPERS survey region when applying DisPerSE in Malavasi et al. 2017). For this reason, we focused on two galaxy samples extracted from the SDSS with well-understood selection functions, namely the Main Galaxy Sample (MGS, Strauss et al. 2002) from the SDSS Data Release 7 (DR7) Legacy survey (Abazajian et al. 2009) and the LOWZ+CMASS sample (Reid et al. 2016) from the SDSS Data Release 12 (DR12, Alam et al. 2015).
Both samples offer a rather uniform angular coverage of the sky and a good redshift sampling for the study of the 3D density distribution. Moreover, although the number density of sources as a function of redshift (hereafter n(z)) is not flat at all redshifts, it is quite stable in certain redshift ranges. We present the samples in more detail in the following sections.

The SDSS DR7 Legacy survey Main Galaxy Sample
The SDSS DR7 MGS (Strauss et al. 2002) is a sample of 697 920 galaxies located mainly in the northern hemisphere, with a few stripes sampling the southern one. Galaxies in this sample have both spectroscopic and photometric information. The sample selection described in Strauss et al. (2002) ensures that sources in the MGS have a Petrosian r-band magnitude $r_P \le 17.77$ and an r-band Petrosian half-light surface brightness $\mu_{50} \le 24.5$ mag arcsec$^{-2}$. The completeness of the sample is stated to be 99%, while the error on the redshift measurement is lower than 30 km s$^{-1}$. The sample is publicly available on the SDSS website.
From this sample, we visually isolate the galaxies belonging to a contiguous region located in the northern hemisphere, so as to eliminate the isolated stripes in the southern one. We also select sources with reliable redshifts, i.e. with zwarning = 0, zconffinal > 0.35, and zfinal > 0, following Strauss et al. (2002).
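The reliable-redshift selection can be sketched as a simple filter. This is a minimal illustration only: it assumes each galaxy is a Python dict keyed by the column names quoted in the text (zwarning, zconffinal, zfinal), which is a hypothetical record layout; the visual selection of the contiguous northern region is not reproduced here.

```python
def passes_redshift_quality(galaxy):
    """Reliable-redshift cuts quoted in the text:
    zwarning = 0, zconffinal > 0.35, zfinal > 0."""
    return (
        galaxy["zwarning"] == 0
        and galaxy["zconffinal"] > 0.35
        and galaxy["zfinal"] > 0.0
    )


def select_reliable(galaxies):
    """Keep only galaxies with reliable redshifts (hypothetical dict records)."""
    return [g for g in galaxies if passes_redshift_quality(g)]
```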
The resulting angular distribution of the 566 452 sources constituting the sample input to the DisPerSE algorithm is shown in Figure 1. The distribution of the galaxies on the plane of the sky is rather uniform, with no clear holes or gaps. The imprint of the cosmic web on the galaxy distribution is already clearly visible, while the average surface density of 78 gal deg$^{-2}$ is in line with the value of 92 gal deg$^{-2}$ reported by Strauss et al. (2002). Figure 2 shows the redshift distribution of the same galaxies (bottom panel) and the corresponding number density in the survey volume (top panel). To compute the latter, we used the area of 7221 deg$^2$ of the north galactic hemisphere elliptical footprint given in Tempel et al. (2014b). Although this value may not correspond exactly to the effective area used in this work, this only affects the normalisation of n(z), leaving its shape unchanged. This allows us to check that the number density of galaxies in the survey volume does not change significantly in the redshift range where we carry out our analysis of the LSS around the Coma cluster (z ∈ [0.006, 0.040], highlighted in the figure by the orange shaded region). The redshift distribution, on the other hand, confirms that the Coma analysis exploits a redshift range well covered by the MGS.
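The surface density and the n(z) normalisation can be checked with a few lines of arithmetic. The sketch below assumes that the survey volume in a redshift bin is the fraction of the full sky covered by the footprint times the spherical shell between the comoving distances of the bin edges (distances must be supplied, e.g. from a cosmology calculation); it also makes explicit that the assumed area enters n(z) only as an overall factor.

```python
import math

# Full sky in square degrees: 4*pi steradians * (180/pi)^2, about 41253 deg^2.
FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2


def surface_density(n_galaxies, area_deg2):
    """Average number of galaxies per square degree."""
    return n_galaxies / area_deg2


def shell_volume(d_near, d_far, area_deg2):
    """Comoving volume (Mpc^3) between two comoving distances (Mpc)
    for a footprint of the given area."""
    f_sky = area_deg2 / FULL_SKY_DEG2
    return f_sky * 4.0 / 3.0 * math.pi * (d_far ** 3 - d_near ** 3)


def number_density(counts, shell_edges_mpc, area_deg2):
    """n(z) per bin: galaxy counts divided by the comoving volume of each shell."""
    return [
        n / shell_volume(d0, d1, area_deg2)
        for n, d0, d1 in zip(counts, shell_edges_mpc[:-1], shell_edges_mpc[1:])
    ]
```

For example, surface_density(566452, 7221) gives about 78.4 gal deg$^{-2}$ and surface_density(953193, 6851) about 139.1 gal deg$^{-2}$, matching the values quoted for the two samples. Doubling the assumed area halves every n(z) bin, which is why a different area choice changes only the normalisation.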

The SDSS DR12 LOWZ+CMASS sample
We rely on the LOWZ+CMASS total sample described in Reid et al. (2016), composed of 953 193 galaxies in the northern galactic hemisphere region and of 372 542 in the southern galactic hemisphere. These galaxies are selected through a series of cuts in the (r − i) versus (g − r) plane and end up having model magnitudes, corrected for Milky Way extinction, in the range $16 \le r_{\rm cmod} \le 19.6$ (LOWZ) and $17.5 \le i_{\rm cmod} \le 19.9$ (CMASS). In this work, only the galaxies in the northern galactic hemisphere are used.

Table 1. Useful quantities for the galaxy samples drawn from the SDSS. The number of galaxies refers to the north galactic hemisphere only for both surveys. The survey area was taken from the value for the north hemisphere quoted in Tempel et al. (2014b) (MGS) and from Table 2 of Reid et al. (2016, effective area, NGC, CMASS survey). As there is no value for the combined LOWZ+CMASS area for the NGC, but only the two separate values for LOWZ and CMASS, we chose the CMASS area, as it is the larger of the two, assuming that the LOWZ footprint lies within the CMASS one. The coordinates of the centre of the field (R.A.$_{fc}$, Dec.$_{fc}$) are used in the rest of the paper to align the x-axis with the LoS direction when converting from Equatorial to Cartesian coordinates or vice versa.
The angular distribution of the galaxies in the northern galactic hemisphere is shown in Figure 3. This sample provides a slightly less homogeneous coverage of the plane of the sky than the Legacy MGS, with a few holes and empty stripes visible. Still, the galaxy distribution provides a good sampling of the survey area, with the cosmic web imprint again visible. The average surface density of galaxies is 139 gal deg$^{-2}$. This is in line with the value of roughly 155 gal deg$^{-2}$ estimated from the last line of Table 2 of Reid et al. (2016), summing the number of targets per deg$^2$ in the NGC for LOWZ and CMASS, and considering that the two footprints occupy a similar area.
Spectroscopic information is available for all the objects in the sample, and the redshift distribution and number density of sources in the survey volume are shown in Figure 4. The redshift distribution of the LOWZ+CMASS sample extends to higher redshift than the MGS, reaching z = 0.8. The completeness is stated to be 99% for CMASS and 97% for LOWZ (see Figure 8 of Reid et al. 2016). The number density in the survey volume (top panel of Figure 4) was computed using an area of 6851 deg$^2$ (Table 2 of Reid et al. 2016), considering that the LOWZ survey is contained within the CMASS one. Again, choosing a different value for the area would only change the normalisation of n(z) in Figure 4, leaving its shape unaltered. After an initial peak, the number density in the survey volume is rather flat, especially in the redshift range covered by the works of Bonjean et al.

The samples were converted from Equatorial to Cartesian coordinates prior to applying the DisPerSE algorithm. The conversion was done while also applying a rotation of the system so that the x-axis points along the line of sight (LoS) towards the centre of the field. The coordinates of the centre of the field were computed as the average of the galaxy coordinates and are listed in Table 1, together with other quantities for the samples. This transformation has been applied to all figures showing galaxy or filament distributions in the rest of the paper.
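The coordinate conversion can be sketched as a standard spherical-to-Cartesian transformation followed by two rotations. This is one possible implementation under stated assumptions: the paper does not give its rotation matrices, so here we first rotate about the z-axis by −R.A.$_{fc}$ and then about the y-axis by Dec.$_{fc}$, which maps the field centre onto the +x axis and leaves y growing with R.A. and z with Dec. near the centre. The field centre is a plain average of the coordinates, without handling of R.A. wrap-around at 0/360 degrees.

```python
import math


def field_centre(ra_deg, dec_deg):
    """Field centre as the plain average of the galaxy coordinates
    (no R.A. wrap-around handling)."""
    return sum(ra_deg) / len(ra_deg), sum(dec_deg) / len(dec_deg)


def equatorial_to_los_cartesian(ra_deg, dec_deg, dist, ra_c_deg, dec_c_deg):
    """Convert (R.A., Dec., comoving distance) to Cartesian coordinates rotated
    so that the +x axis points along the line of sight towards (ra_c, dec_c)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra_c, dec_c = math.radians(ra_c_deg), math.radians(dec_c_deg)
    # Standard equatorial Cartesian frame
    x = dist * math.cos(dec) * math.cos(ra)
    y = dist * math.cos(dec) * math.sin(ra)
    z = dist * math.sin(dec)
    # Rotate about z by -ra_c: the field centre moves into the x-z plane
    x1 = x * math.cos(ra_c) + y * math.sin(ra_c)
    y1 = -x * math.sin(ra_c) + y * math.cos(ra_c)
    # Rotate about y by dec_c: the field centre moves onto the +x axis
    x2 = x1 * math.cos(dec_c) + z * math.sin(dec_c)
    z2 = -x1 * math.sin(dec_c) + z * math.cos(dec_c)
    return x2, y1, z2
```

Because both steps are proper rotations, distances from the observer are preserved, so filament lengths measured in this frame are unaffected by the choice of field centre.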

Detecting the filaments: DisPerSE
To construct our filament catalogues we make use of the Discrete Persistent Structure Extractor method (DisPerSE, Sousbie 2011; Sousbie et al. 2011). The DisPerSE algorithm is capable of operating in three dimensions on discrete, non-smoothed datasets. The detection of filaments and other elements of the cosmic web relies on a two-step process: first the galaxy density field is measured, then discrete Morse theory is applied and filaments are detected.

Article number, page 3 of 21 A&A proofs: manuscript no. main
We measure the density field of the galaxy distribution using the Delaunay Tessellation Field Estimator (DTFE, Schaap & van de Weygaert 2000; van de Weygaert & Schaap 2009). This method uses a Delaunay tessellation to fill space with tetrahedra, using the galaxy positions as vertices. The density field can be smoothed by averaging the value at each galaxy position with the density values of the galaxies directly connected to it through an edge of the tessellation. We will refer to a single iteration of this process as a "smoothing cycle". The smoothing can be iterated several times, by averaging the already-averaged density values. The DisPerSE method then proceeds with the application of discrete Morse theory to the measured density field. While details of the algorithm can be found in Sousbie (2011) and Sousbie et al. (2011), we summarise here its main features.
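The DTFE estimate and the smoothing cycles can be sketched as follows. In the standard DTFE (Schaap & van de Weygaert 2000), the density at a vertex in three dimensions is (D + 1) = 4 divided by the total volume of the tetrahedra sharing that vertex, for unit mass per galaxy. The smoothing below gives equal weight to the vertex itself and to each of its Delaunay neighbours; this weighting is an assumption of the sketch and not necessarily what DisPerSE implements. The tetrahedra (vertex index quadruples) could be obtained, for example, from scipy.spatial.Delaunay(points).simplices.

```python
import numpy as np


def dtfe_density(points, simplices):
    """DTFE density at each vertex (unit mass per galaxy, 3D).

    rho_i = (D + 1) / W_i with D = 3, where W_i is the total volume of the
    tetrahedra that share vertex i (its contiguous Voronoi cell).
    """
    points = np.asarray(points, dtype=float)
    simplices = np.asarray(simplices, dtype=int)
    a, b, c, d = (points[simplices[:, k]] for k in range(4))
    # Tetrahedron volume = |(b - a) . ((c - a) x (d - a))| / 6
    volumes = np.abs(np.einsum("ij,ij->i", b - a, np.cross(c - a, d - a))) / 6.0
    contiguous = np.zeros(len(points))
    for k in range(4):
        np.add.at(contiguous, simplices[:, k], volumes)
    return 4.0 / contiguous


def delaunay_neighbours(n_points, simplices):
    """Set of vertices joined to each vertex by a tessellation edge."""
    neighbours = [set() for _ in range(n_points)]
    for simplex in simplices:
        for i in simplex:
            for j in simplex:
                if i != j:
                    neighbours[int(i)].add(int(j))
    return neighbours


def smooth_density(density, neighbours, n_cycles=1):
    """Apply n_cycles smoothing cycles: each vertex value becomes the mean of
    itself and its neighbours (equal weights, an assumption of this sketch).
    Each cycle averages the values produced by the previous one."""
    rho = np.asarray(density, dtype=float)
    for _ in range(n_cycles):
        rho = np.array([
            (rho[i] + sum(rho[j] for j in nbrs)) / (1 + len(nbrs))
            for i, nbrs in enumerate(neighbours)
        ])
    return rho
```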
The first step consists of computing the gradient of the density field and identifying the points where the gradient vanishes (critical points). These are classified into maxima of the density field, minima, and 1- and 2-saddles (points that are local density minima within structures, namely walls and filaments, respectively). Filaments are defined as integral lines of the gradient field connecting critical points (maxima and saddles). Special critical points called "bifurcation" points are inserted in post-processing at the positions where filaments intersect. The final result is a set of critical points connected by filaments, each composed of short segments whose endpoint positions are given. The length of the segments is related to the typical length of the edges of the tessellation.
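The four types of critical points correspond to the Morse index, i.e. the number of directions along which the density decreases. For intuition only, the sketch below uses the smooth-field analogue, counting the negative eigenvalues of the density Hessian at a critical point; DisPerSE itself works with discrete Morse theory on the tessellation and does not compute Hessians, so this is an illustration of the classification, not of the algorithm.

```python
import numpy as np

# Morse index (number of negative Hessian eigenvalues) -> critical point type.
CRITICAL_POINT_TYPES = {0: "minimum", 1: "1-saddle", 2: "2-saddle", 3: "maximum"}


def morse_index(hessian, tol=0.0):
    """Number of eigenvalues of the symmetric 3x3 Hessian below -tol."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    return int(np.sum(eigenvalues < -tol))


def classify_critical_point(hessian):
    """Smooth-field analogue of the critical point types used in the text."""
    return CRITICAL_POINT_TYPES[morse_index(hessian)]
```

A 2-saddle, for instance, has density decreasing in the two directions perpendicular to a filament and increasing along it, so it is a local minimum along the filament, as described above.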
Another strength of the DisPerSE method is its application of persistence theory to eliminate spurious filaments which are likely due to the Poisson noise of the discrete galaxy distribution. Critical points are coupled into topological constructs called persistence pairs, based on the ratio of their densities as measured by the DTFE. Persistence pairs are eliminated from the real data set if they lie closer to the noise persistence distribution than a given number of σ. This procedure is equivalent to imposing a signal-to-noise ratio threshold on the filaments. In the following we compare two different persistence cuts for our filaments, a low-persistence (3σ) and a high-persistence (5σ) one (see footnote 3). We point out that our low-persistence cut (3σ) is still high enough to eliminate most of the spurious filaments (see e.g. the discussion in Sousbie 2011 and Malavasi et al. 2017, where it is stated that already at the 2σ level the probability that a feature is spurious is ∼ 5%, dropping to 0.006% at 4σ). We apply the high-persistence cut to ensure that our results also hold when only the most secure filaments are selected.
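The probabilities quoted for the σ thresholds match two-sided Gaussian tail probabilities. The sketch below assumes this Gaussian-equivalent reading of the persistence threshold (an interpretation consistent with the numbers in the text, not a description of DisPerSE internals) and converts a threshold in σ into the probability that a feature is spurious.

```python
import math


def spurious_probability(n_sigma):
    """Two-sided Gaussian tail probability beyond n_sigma:
    P(|X| > n_sigma) = erfc(n_sigma / sqrt(2))."""
    return math.erfc(n_sigma / math.sqrt(2.0))


for n in (2, 3, 4, 4.5, 5):
    print(f"{n} sigma -> {100 * spurious_probability(n):.6f} %")
```

This gives about 4.55% at 2σ and about 0.0063% at 4σ, in agreement with the values above, and shows how quickly the contamination falls at the 3σ, 4.5σ, and 5σ thresholds used in this work.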
The DisPerSE algorithm also allows the skeleton to be spatially smoothed after its detection. This is done similarly to the smoothing of the density field, by averaging the spatial positions of the endpoints of the segments composing the filaments: each segment position is averaged with those of the segments directly attached to it. As for the density field, the procedure can be iterated by averaging the already-averaged positions.
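Skeleton smoothing can be sketched for a single filament stored as an ordered list of segment endpoints. This is a minimal illustration under two assumptions not stated in the text: each interior point is averaged with its two neighbours using equal weights, and the end points (the critical points) are kept fixed.

```python
def smooth_filament(points, n_cycles=1):
    """Smooth a filament polyline by replacing each interior vertex with the
    mean of itself and its two neighbours; end points are kept fixed
    (an assumption of this sketch). Each cycle uses the previous positions."""
    pts = [tuple(float(c) for c in p) for p in points]
    if len(pts) < 3:
        return pts
    for _ in range(n_cycles):
        new = [pts[0]]
        for i in range(1, len(pts) - 1):
            new.append(tuple(
                (a + b + c) / 3.0 for a, b, c in zip(pts[i - 1], pts[i], pts[i + 1])
            ))
        new.append(pts[-1])
        pts = new
    return pts
```

A straight filament is left unchanged, while a sharp kink is softened; this is why smoothing removes small-scale zigzags without greatly changing filament lengths.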

Edge effects
The edges of a survey footprint present a problem whenever an environmental estimate has to be made. Due to the abrupt lack of tracers beyond a certain limit in space, the density field is underestimated close to the borders, and any analysis using this quantity will suffer from it, including topological reconstructions of the LSS. The DisPerSE algorithm has several ways of dealing with boundary conditions, depending on the sample it is applied to. In particular, when running the algorithm it is possible to choose smooth boundary conditions: extra particles are added outside of the initial domain, with a random distribution that mimics the average density inside. These so-called "guard particles" are used to compute the tessellation outside the initial bounding box. DisPerSE then tags as belonging to the boundary all the features (both critical points and the filaments connected to them) which involve an element of the tessellation belonging to the guard particles, or whose DTFE density may be influenced by tessellation links to particles outside of the domain (see Section 7 of Sousbie 2011). As bifurcation points are added in post-processing, after the skeleton has been extracted from the data, they cannot be tagged by DisPerSE even if they lie on the boundary.

3 Except for the LOWZ+CMASS filaments when one smoothing cycle is applied to the density field before the application of DisPerSE. In this case, applying a 5σ threshold results in a crash of the algorithm, as it tries to cancel too many low-persistence pairs, and the high-persistence cut is therefore set to 4.5σ.
We eliminate from the sample of critical points all those tagged as boundary by DisPerSE, as well as all the bifurcation points lying within 100 Mpc (200 Mpc) for the Legacy MGS (LOWZ+CMASS) of the surface defining the convex hull of all the critical points tagged as boundary. This is a very conservative cut, but it allows us to determine the impact of boundaries on our skeleton reconstruction. Figure 5 shows the critical points removed from the sample as belonging to the boundaries in the full area of the Legacy MGS and of the LOWZ+CMASS sample. Filaments connected to these critical points have been removed consistently. In the bottom panel of this figure, critical points do not appear to be tagged as boundary more frequently close to the large holes visible within the footprint of the galaxy distribution (see, for example, Figures 3 and 11). This is due to the DTFE method used to measure the density field: as also stated in Section 4.2, the tetrahedra of the tessellation are able to cross gaps and holes, essentially interpolating the density field over regions devoid of tracers. For this reason, these regions are not considered as boundary.
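The distance cut for bifurcation points can be sketched using the facet equations of the convex hull of the boundary critical points. The sketch assumes facets stored as rows [n_x, n_y, n_z, d] with unit outward normals, so that n · x + d ≤ 0 inside the hull; in SciPy, scipy.spatial.ConvexHull(boundary_xyz).equations provides facets in this form. For a point inside a convex hull, the distance to its surface is the smallest distance to any facet plane; points outside receive negative values and are always rejected.

```python
import numpy as np


def distance_inside_hull(points, equations):
    """Distance of each point from the surface of a convex hull described by
    facet equations n . x + d <= 0 (unit outward normals n). Points outside
    the hull get negative values."""
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    eq = np.asarray(equations, dtype=float)
    signed = pts @ eq[:, :-1].T + eq[:, -1]  # (n_points, n_facets), <= 0 inside
    return -signed.max(axis=1)


def keep_far_from_boundary(points, equations, min_distance):
    """Boolean mask of points farther than min_distance (e.g. 100 or 200 Mpc)
    from the hull surface."""
    return distance_inside_hull(points, equations) > min_distance
```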

Minor issues
We perform other checks for minor systematic problems that may affect the skeleton. These include minor issues that may not be important when performing analyses of statistical properties of the cosmic web, but which may nevertheless affect the general quality of our reconstruction.
We begin by eliminating from the skeleton saddles and maxima that perfectly overlap and are connected by a filament of length exactly 0 Mpc. These topological defects are likely caused by the Poisson noise of the galaxy distribution, which creates sharp changes in the density field that the DisPerSE algorithm has trouble interpreting. Zero-length filaments between overlapping critical points are clearly unphysical, and we remove them from the skeleton. When performing this cleaning, we eliminate the saddle and the zero-length filament, leaving only the maximum, which is reconnected to the other existing filaments.
We also eliminate isolated maxima connected to only one filament. These points are less problematic than overlapping saddles and maxima, as they most likely represent local maxima connected to the rest of the skeleton. Still, these isolated nodes do not fully match our expectations for the cosmic web, where clusters are not the end point of a single filament along which matter accumulates, but rather the intersection points of two or more filaments that channel matter accretion onto the structure. As DisPerSE is expected to reconstruct a fully connected skeleton, these isolated maxima could present a (very minor) problem, and it is worth investigating how they affect our reconstruction of the cosmic web.
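The two cleaning steps can be sketched on a simple graph representation of the skeleton. The data layout is hypothetical: critical points are a dict mapping an id to a type label ("max", "saddle2", "bifurcation", ...), and filaments are (id_a, id_b, length) tuples. The sketch performs a single pass, so a maximum that drops to a single filament only after the first removal is not revisited.

```python
from collections import Counter


def clean_skeleton(crit_types, filaments):
    """Remove the two minor issues described in the text (single pass).

    crit_types: dict mapping critical-point id -> type label
                ("max", "saddle2", "bifurcation", ...; labels are hypothetical).
    filaments:  list of (id_a, id_b, length_mpc) tuples.
    Returns cleaned copies of both.
    """
    types = dict(crit_types)
    # 1) A 2-saddle overlapping a maximum through a 0-Mpc filament is merged
    #    into the maximum: the saddle's other filaments are re-attached to it.
    merge = {}
    for a, b, length in filaments:
        if length == 0.0 and {types.get(a), types.get(b)} == {"max", "saddle2"}:
            saddle, maximum = (a, b) if types[a] == "saddle2" else (b, a)
            merge[saddle] = maximum
    for saddle in merge:
        del types[saddle]
    remapped = [(merge.get(a, a), merge.get(b, b), length) for a, b, length in filaments]
    # The 0-length filament has become a self-loop on the maximum: drop it.
    fils = [(a, b, length) for a, b, length in remapped if a != b]

    # 2) Maxima attached to exactly one filament are removed with that filament.
    degree = Counter()
    for a, b, _ in fils:
        degree[a] += 1
        degree[b] += 1
    isolated = {cp for cp, t in types.items() if t == "max" and degree[cp] == 1}
    for cp in isolated:
        del types[cp]
    fils = [(a, b, length) for a, b, length in fils
            if a not in isolated and b not in isolated]
    return types, fils
```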
Applying the above cleaning results in 206 critical points and filaments being removed from the Legacy MGS skeleton and 450 from the LOWZ+CMASS skeleton. Figure 6 shows an example of the cleanup procedure on the Legacy MGS and LOWZ+CMASS skeletons. The critical points we eliminate are randomly distributed on the plane of the sky, indicating that these issues are not caused exclusively by the survey edges.

Description of the catalogue
The LSS catalogue we constructed is composed of a few thousand to a few tens of thousands of critical points, depending on the number of smoothing cycles and on the persistence threshold (see Table 2), as well as of the filaments connecting them. Filaments connect maxima of the density field (which can be identified with galaxy clusters) to 2-saddle points (i.e. local density minima located on filaments) or to bifurcations (i.e. points where several filaments intersect, which can be identified with low-mass clusters and groups, often unresolved). 1-saddle points are local density minima located on walls, which surround low-density regions centred on minima of the density field (which can be identified with the centres of voids). In the analyses described in Malavasi et al. (2020), Tanimura et al. (2019), and Bonjean et al. (2019) we only focus on maxima, 2-saddle points, and bifurcations, and in the present catalogue we only distribute filaments, not considering for the moment walls, 1-saddle points, and minima (voids). Still, these kinds of critical points are included in the numbers given in Table 2 and in some of the distributions presented below, to provide a complete overview of the properties of the catalogue.

Critical points
As expected, the number of critical points decreases with increasing persistence threshold, as less significant ones are removed from the sample, as well as with increasing smoothing. Smoothing the density field reduces the amount of noise and removes small-scale fluctuations, together with their corresponding critical points. The redshift distribution of the critical points (Figure 7) closely follows that of the galaxies, as expected, since critical points provide a topological description of the density field traced by the galaxies. Moreover, changing the smoothing or the persistence threshold does not alter the redshift distribution of the critical points, indicating that critical points are removed uniformly in space. Figure 8 shows the density distribution of critical points divided by type. This figure shows the large diversity of the features of the density field identified by DisPerSE, as well as the broad range of densities explored. The density distributions of the different types of critical points overlap considerably, further confirming the need for a topological analysis, rather than a simple density criterion, to correctly identify cosmic structures such as filaments (see also e.g. Cautun et al. 2014). Particularly interesting is the density distribution of bifurcation points. These points are artificially inserted at the intersections of two or more filaments, and their densities are intermediate between those of maxima of the density field and those of type-2 saddles, located in filaments. This makes them suitable candidates for association with low-mass groups or unresolved clusters.
Changing the level of smoothing reduces the range of densities explored, as expected, but does not significantly alter the shape of the distributions. Changing the persistence threshold, on the other hand, significantly affects the density distribution of maxima, eliminating the low-density tail of their distribution (red dashed line in the figures). The density distributions of the other types of critical points are left almost unaltered (not shown here for clarity).

Fig. 6. In each panel all the critical points (either of type maximum, 2-saddle, or bifurcation) detected by DisPerSE are shown with filled circles, colour-coded according to their type (green: 2-saddles, red: maxima, brown: bifurcations). Cyan stars highlight critical points which are affected by minor issues (i.e. overlapping maxima and saddles connected by a filament of length 0 Mpc, or maxima connected to only one filament, see text). As an example, the 3σ persistence skeletons have been chosen, with one smoothing cycle of the density field for both the Legacy MGS and LOWZ+CMASS.

Figure 9 shows the length distribution of the filaments connecting maxima, bifurcations, and type-2 saddle points. Depending on the redshift distribution of the sample and on the density of tracers available, the distribution may be more peaked towards short filaments (as in the Legacy MGS case) or towards longer ones (e.g. LOWZ+CMASS). Increasing the persistence threshold essentially eliminates a large fraction of short, less significant filaments from the sample. The figure also shows that smoothing the skeleton after its detection does not significantly alter the length distribution of the filaments. Skeleton smoothing removes unphysical edges and sharp changes of direction within the filaments, which are likely due to the noise of the galaxy distribution. Still, the effect is very light and the filament properties are not significantly changed.
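Since each filament is delivered as a chain of short segments, its length is the sum of the segment lengths, and a length distribution is a histogram of these sums. A minimal sketch, assuming each filament is an ordered list of (x, y, z) segment endpoints in Mpc (a hypothetical layout) and simple fixed-width bins:

```python
import math


def filament_length(points):
    """Length of a filament given as an ordered list of (x, y, z) segment
    endpoints in Mpc: the sum of the Euclidean segment lengths."""
    return sum(math.dist(p, q) for p, q in zip(points[:-1], points[1:]))


def length_histogram(lengths, bin_width):
    """Counts per length bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = {}
    for length in lengths:
        k = int(length // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))
```

Comparing such histograms for different persistence thresholds or skeleton smoothing levels reproduces the kind of comparison shown in Figure 9.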
Fig. 8. Density distribution of the critical points as derived from the DTFE. Left and right columns refer to the Legacy MGS and LOWZ+CMASS samples, respectively. Rows are different smoothing cycles of the density field prior to filament detection. In each panel, solid lines refer to maxima (red), bifurcations (brown), type-2 saddles (green), type-1 saddles (yellow), and minima (blue). Only the 3σ persistence threshold case is shown, except for maxima, where the 5σ persistence sample is also reported (red dashed lines). Although indicated as 5σ, the high-persistence cut for LOWZ+CMASS in the one-smoothing case has been limited to 4.5σ.

Malavasi et al.: Filaments in the SDSS

Fig. 9. Length distribution of the filaments. Left and right columns refer to the Legacy MGS and LOWZ+CMASS samples, respectively. Rows are different smoothing cycles of the density field prior to filament detection. Red lines refer to a 3σ persistence threshold, while blue lines refer to a 5σ persistence threshold. Dashed and solid lines represent filament samples when no smoothing of the skeleton is applied after filament identification (marked "K0" in the legend) and when one cycle of smoothing is applied (marked "K1" in the legend), respectively. Although indicated as 5σ, the high-persistence cut for LOWZ+CMASS in the one-smoothing case has been limited to 4.5σ.

Filaments

Figures 10 and 11 show the filament distribution on the plane of the sky (in three redshift bins for clarity, from top to bottom) for the Legacy MGS and LOWZ+CMASS skeletons, respectively. These figures give a visual representation of how the filaments are distributed with respect to the galaxies. Filaments clearly follow the galaxy density field, with dense nodes at their intersections and large low-density regions between them. These maps provide a full view of the LSS as detected in the volume of the Universe explored by the SDSS. Several defects are also visible in these maps: filaments are present where no galaxies are available, especially in the lowest redshift bin of the LOWZ+CMASS survey. This highlights at the same time the power and the limitations of the DisPerSE method: the DTFE tetrahedra are able to cross gaps, holes, and empty regions, essentially providing an interpolation of the density field across them. Of course, if the holes are too large (e.g. in the lowest redshift bin of the LOWZ+CMASS survey, top panel of Figure 11), the interpolated density field will not be reliable and the detected filaments will be spurious. In Tanimura et al. (2019), the analysis is performed at z ≥ 0.2, with all the filaments at lower redshifts and outside of the footprint of the SDSS DR12 being discarded. In Bonjean et al. (2019), the SDSS DR12 mask is also applied, which nevertheless does not account for the large holes present in the footprint at low redshift (z ≲ 0.15). Still, given their size, in these regions the risk is rather that of not detecting any filaments than of detecting spurious ones.
Missing filaments in the large low-redshift holes would reduce the number of filaments available and therefore lower the significance of the signal for the filament profiles detected in the galaxy distribution, rather than contaminate it with spurious filaments. As the SDSS footprint covers a large area and the number of filaments available for the analysis is high, the conclusions of Bonjean et al. (2019) remain valid. Moreover, splitting the sample into two redshift bins (0.1 < z < 0.2 and 0.2 < z < 0.3) yielded consistent results in the two bins, albeit with lower significance.
Figures 12 and 13 show the filament distribution in transverse cuts through a Cartesian reference system, with the x-axis aligned with the redshift direction and the y-axis with Right Ascension, in three declination bins for clarity (from left to right). These maps show that the filaments we detect fill the whole survey volume, follow the galaxy density field, and have neither a preferential angular direction nor a strong trend with redshift. Several defects are also visible in these maps: long filaments, several hundred Mpc in length, extend from the edge of the survey into regions almost devoid of galaxies. These filaments are probably due to spurious critical points created at the edge of the survey volume. The cuts in the length of the filaments applied to the samples in Tan
Figure 14 shows the distributions of the number of filaments connected to maxima. This quantity is called connectivity, and it is an important observable for the study of the cosmic web, of structure formation, and of cosmology (see e.g. Codis et al. 2018; Sarron et al. 2019; Darragh Ford et al. 2019; Malavasi et al. 2020; Kraljic et al. 2019). The connectivity of maxima scales with their density, i.e. with the mass of the clusters associated with the peaks of the density field, and it usually takes values between 0 and 10.
Figure 14 shows that the connectivity values are distributed between 0 and 3, consistently for the different smoothing levels and the different samples. Increasing the persistence threshold decreases the average connectivity, as more filaments are eliminated from the sample.
We note that a maximum connected to only one filament may be considered unphysical, as a fully connected skeleton should not contain many isolated maxima. In all the panels of Figure 14 we therefore also report, for a 3σ persistence threshold, the connectivity distributions of the maxima after removing the filaments of zero length and the maxima connected to only one filament. As expected, the connectivity distributions now peak at a value of 2 filaments per maximum. Of particular interest is the Legacy MGS skeleton with a 5σ persistence threshold and two smoothing cycles, in which all maxima are connected to a single filament, so that no maxima are left in the sample after the cleaning. This helps to identify the best combination of DisPerSE parameters for detecting the filaments in this galaxy sample, and further discourages combining such a high persistence threshold with a high degree of smoothing in this particular case.
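The connectivity count and the cleaning described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes a hypothetical representation in which each filament is a (start_id, end_id, length) tuple linking two critical points, and it counts, for every maximum, the filaments ending on it. Following the literal description in the text, the cleaning first drops zero-length filaments and then the maxima left with exactly one filament.

```python
from collections import Counter


def connectivity(filaments, maxima_ids):
    """Number of filaments ending on each maximum.

    filaments: iterable of (start_id, end_id, length_mpc) tuples (hypothetical format).
    maxima_ids: set of critical-point IDs that are maxima.
    """
    counts = Counter()
    for start, end, _length in filaments:
        for cp in (start, end):
            if cp in maxima_ids:
                counts[cp] += 1
    return {m: counts[m] for m in maxima_ids}  # Counter returns 0 for unseen maxima


def cleaned_connectivity(filaments, maxima_ids):
    """Drop zero-length filaments, then drop maxima connected to exactly one filament."""
    nonzero = [f for f in filaments if f[2] > 0.0]
    return {m: k for m, k in connectivity(nonzero, maxima_ids).items() if k != 1}


maxima = {"M1", "M2", "M3"}
filaments = [
    ("M1", "S1", 10.0), ("M1", "S2", 5.0), ("M1", "S3", 0.0),  # last one has zero length
    ("M2", "S1", 7.0),
    ("M3", "S4", 3.0), ("M3", "S5", 4.0),
]
print(connectivity(filaments, maxima))          # M1: 3, M2: 1, M3: 2 (key order may vary)
print(cleaned_connectivity(filaments, maxima))  # M1: 2, M3: 2
```

After cleaning, the surviving maxima in this toy skeleton all have connectivity 2, mirroring the shift of the peak described above.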

Validation of the catalogue
In this section we analyse some of the possible systematic effects that can impact the reconstruction of the cosmic web, as well as possible ways to deal with them. In particular, we focus on two main items: the effect of the so-called Finger-of-God (FoG) redshift distortions (Jackson 1972; Sargent & Turner 1977; Kaiser 1987) and the edge effects due to the border of the survey.

Finger of God redshift space distortions
The FoG effect is a distortion in the position of galaxies along the LoS due to their peculiar motions inside clusters. The observed redshift of a source is, to first order, the sum of its cosmological redshift and of a component due to its peculiar velocity inside structures. As redshift measures only the velocity component along the LoS, structures are not distorted in the directions lying on the plane of the sky. As a result, clusters appear elongated along the LoS, with a cylindrical rather than spherical symmetry. This may present a problem, as an algorithm like DisPerSE may mistake clusters distorted by the FoG effect for filaments. This would affect subsequent analyses, as the skeleton would contain a large number of spurious straight filaments aligned with the LoS. We address this problem by eliminating the galaxies that could be affected by FoG distortions, modifying the tessellation accordingly, and re-running DisPerSE. Figure 15 shows the steps of our procedure, which we summarise below. The figure shows a cubic region 50 Mpc on a side, centred on the Coma cluster, chosen as an example both because it was the subject of the analysis of Malavasi et al. (2020) and because it is a massive cluster with a very prominent distortion. The top left panel shows the galaxies from the Legacy MGS colour-coded according to their density as computed from the DTFE. The FoG distortion due to Coma is clearly visible as an elongated structure (in all four panels, a black arrow indicates the direction of the LoS). In the same figure, coloured squares mark the positions of clusters from different samples, the same described in Section 2 of Malavasi et al. (2020): MCXC X-ray clusters (Piffaretti et al. 2011); SZ clusters (Planck Collaboration et al. 2011, 2016a; Williamson et al. 2011; Reichardt et al. 2013; Ruel et al. 2014; Bleem et al. 2015; Hasselfield et al. 2013; AMI Consortium et al. 2012, 2013a; Schammel et al. 2013; Brodwin et al. 2015; Buddendiek et al. 2015); and optical clusters (Tempel et al. 2017; Rykoff et al. 2014; Wen et al. 2012; Wen & Han 2015).
Fig. 10. Angular distribution of the filaments on the plane of the sky. The galaxy distribution is colour-coded according to the local density as measured by the DTFE, while filaments are over-plotted in green. Three ranges of redshift are considered for clarity, as marked on top of each panel. Only the Legacy MGS is considered. As an example, filaments detected with a 3σ persistence threshold and one smoothing cycle of the density field have been reported.
Fig. 11. Angular distribution of the filaments on the plane of the sky. The galaxy distribution is colour-coded according to the local density as measured by the DTFE, while filaments are over-plotted in green. Three ranges of redshift are considered for clarity, as marked on top of each panel. Only the LOWZ+CMASS sample is considered. As an example, filaments detected with a 3σ persistence threshold and no smoothing of the density field have been reported.
Fig. 12. Maps of filaments on a Cartesian x − y plane. The x-axis is aligned with increasing redshift, while the y-axis is aligned with increasing Right Ascension. The galaxy distribution is colour-coded according to local density as measured by the DTFE, while filaments are over-plotted in green. Three ranges of Declination are considered for clarity, as marked on top of each panel. Only the Legacy MGS is considered. As an example, filaments detected with a 3σ persistence threshold and one smoothing cycle of the density field have been reported.
Rows are different smoothing cycles of the density field prior to filament detection. Cyan lines refer to a 3σ persistence threshold, while orange lines refer to a 5σ persistence threshold. Red dashed lines are connectivity distributions for a 3σ persistence threshold after eliminating maxima connected to only one filament. Although indicated as 5σ, the high-persistence cut for LOWZ+CMASS in the one-smoothing case has been limited to 4.5σ.
To address the problem of FoG in the skeleton reconstruction, we adopt the following steps:
1. We centre a cylinder on each cluster and identify all the galaxies inside it. The cylinder has radius $R = 3\,R_{\rm vir}$ and height $\mathrm{d}z = \pm 3\,(\sigma_v/c)\,(1+z)$, where $\sigma_v$ is the velocity dispersion computed from the virial mass as $\sigma_v = \sqrt{G M_{\rm vir}/(5 R_{\rm vir})}$.
2. The galaxies inside the cylinders centred on the clusters from the samples mentioned above are removed from the galaxy catalogue. At the same time, a new entry is added to the galaxy catalogue at the position of each cluster. Each of these new particles is assigned a density equal to the average of the DTFE densities of the galaxies inside the cylinder. The densities of all other galaxies are left unaltered.
3. The tessellation is recomputed to rebuild the tetrahedrons on which the density field is defined, so that DisPerSE can run properly. The DTFE density is not recomputed: the previous values are used for the galaxies that have not been removed, and the new average values computed inside the cylinders are used for the particles added at the positions of the clusters. This avoids the formation of spurious minima at the positions of clusters, where the removal of the galaxies in the cylinders would otherwise leave holes.
4. DisPerSE is run as normal on the new tessellation with the modified density field.
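Steps 1 and 2 can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the code used for the catalogues: positions are taken as comoving Cartesian coordinates in Mpc with the observer at the origin, the cylinder radius is measured perpendicular to the line of sight towards the cluster, the LoS extent is applied directly in redshift, and $G \simeq 4.30091\times10^{-9}$ Mpc (km/s)² M_⊙⁻¹. The dictionary layout and all function names are hypothetical. Steps 3 and 4 (re-tessellation and the DisPerSE run) depend on external tools and are not reproduced.

```python
import math

G_MPC = 4.30091e-9    # gravitational constant in Mpc (km/s)^2 / Msun
C_KMS = 299792.458    # speed of light in km/s


def velocity_dispersion(m_vir, r_vir):
    """sigma_v = sqrt(G M_vir / (5 R_vir)) in km/s, with M_vir in Msun and R_vir in Mpc."""
    return math.sqrt(G_MPC * m_vir / (5.0 * r_vir))


def in_fog_cylinder(gal, cl):
    """True if a galaxy lies in the FoG cylinder of a cluster.

    Transverse condition: distance from the cluster's LoS axis <= 3 R_vir,
    computed from comoving Cartesian positions (observer at the origin).
    LoS condition: |z_gal - z_cl| <= 3 (sigma_v / c) (1 + z_cl).
    """
    norm = math.hypot(*cl["xyz"])
    los = [c / norm for c in cl["xyz"]]                   # unit vector towards the cluster
    sep = [g - c for g, c in zip(gal["xyz"], cl["xyz"])]
    d_par = sum(s * u for s, u in zip(sep, los))
    d_perp = math.sqrt(max(sum(s * s for s in sep) - d_par ** 2, 0.0))
    dz_max = 3.0 * velocity_dispersion(cl["m_vir"], cl["r_vir"]) / C_KMS * (1.0 + cl["z"])
    return d_perp <= 3.0 * cl["r_vir"] and abs(gal["z"] - cl["z"]) <= dz_max


def remove_fog(galaxies, clusters):
    """Steps 1-2: drop galaxies inside any cylinder and add one particle per cluster
    carrying the mean DTFE density ('rho') of the galaxies found in its cylinder.

    Densities of the remaining galaxies are not recomputed. A galaxy falling in
    overlapping cylinders contributes to each cluster's mean (a simplification).
    """
    removed = set()
    new_particles = []
    for cl in clusters:
        inside = [i for i, g in enumerate(galaxies) if in_fog_cylinder(g, cl)]
        if not inside:
            continue  # assumption: no particle is added for an empty cylinder
        removed.update(inside)
        mean_rho = sum(galaxies[i]["rho"] for i in inside) / len(inside)
        new_particles.append({"xyz": cl["xyz"], "z": cl["z"], "rho": mean_rho})
    kept = [g for i, g in enumerate(galaxies) if i not in removed]
    return kept + new_particles


# Illustrative cluster and galaxies (values chosen for the example, not measured ones).
cluster = {"xyz": (100.0, 0.0, 0.0), "z": 0.023, "m_vir": 1e15, "r_vir": 2.0}
galaxies = [
    {"xyz": (100.0, 1.0, 0.0), "z": 0.025, "rho": 10.0},   # inside: near the axis, small dz
    {"xyz": (101.0, 0.0, 0.0), "z": 0.020, "rho": 20.0},   # inside
    {"xyz": (100.0, 10.0, 0.0), "z": 0.023, "rho": 1.0},   # outside: 10 Mpc > 3 R_vir
    {"xyz": (100.0, 0.0, 0.0), "z": 0.040, "rho": 2.0},    # outside: dz too large
]
particles = remove_fog(galaxies, [cluster])
print(len(particles), particles[-1]["rho"])   # 3 15.0
```

For this illustrative cluster, $\sigma_v \approx 656$ km/s, so the cylinder extends over roughly ±0.0067 in redshift around the cluster.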
This method changes the shape of the density field locally, while leaving the density values largely unaltered. The bottom left panel of Figure 15 shows the galaxy distribution after the removal of the FoG cylinder, with the same colour-coding as the top left panel. The elongated feature at the position of Coma has disappeared. The top right panel shows the skeleton detected on the galaxy distribution with the FoG at the position of Coma, while the bottom right panel compares the skeletons around Coma computed with the FoG present and with the FoG distortions eliminated from the galaxy distribution using the above procedure. When we compute the skeleton with the FoG distortion, a filament is clearly visible along the LoS at the position of the Coma cluster. When the FoG distortions are removed, the filament along the LoS at the position of Coma is still present, although its shape has changed. This confirms that, although FoG distortions have an impact on the skeleton, the detection of filaments at the positions of clusters, even massive ones such as Coma, can be considered reliable. The bottom right panel of Figure 15 shows that the skeleton has also changed in several other positions, due to the presence of many other clusters in the region, but the skeleton re-computed after the removal of the FoG distortions remains visually close to the original. Figures 16 and 17 show examples of how the filaments change when FoG distortions are removed around nine randomly selected clusters, in three mass bins in the range $M_{\rm vir} \in [10^{13}, 10^{15}]\,M_\odot$, for the Legacy MGS and the LOWZ+CMASS sample, respectively. We note that when performing the FoG analysis on the LOWZ+CMASS sample, we replaced the Tempel et al. (2017) cluster sample with that of Wen & Han (2015), as the latter has a larger overlap in redshift with the galaxy distribution.
We investigated how the removal of the FoG distortions affects the statistical properties of the skeleton. We produced the distributions of filament lengths, redshifts of critical points, densities of critical points divided by type, and connectivity of the maxima in the cases with and without FoG distortions (shown in the upper four panels of Figures 18 and 19 for the Legacy MGS and LOWZ+CMASS samples, respectively). These distributions are very similar in the two cases, showing that the two sets of filaments do not differ statistically. To further test the effect of FoG distortions, we computed the curvature of the filaments. This quantity is defined as the ratio $L_S/L_C$ between the length of the straight line connecting the critical points at the filament extremities ($L_S$) and the length obtained by summing the lengths of the individual segments composing each filament, i.e. actually following the filament path ($L_C$). Filaments with $L_S/L_C$ close to one are straight, meaning that the filament path does not differ much from the straight line connecting the filament extremities. If $L_S/L_C$ is close to zero, then $L_C \gg L_S$ and the filament is curved. While straight filaments are perfectly possible even in the ideal case of a galaxy distribution without FoG distortions, their incidence should be higher in a distorted density field. Figures 18 and 19 show that the distributions of filament curvature are very similar regardless of whether FoG distortions are removed or not. The case where FoG distortions are included shows a slightly smaller incidence of straight filaments, so our FoG removal procedure actually increases the number of straight filaments. This is likely because removing FoG distortions also slightly reduces the lengths of the filaments (as visible from the small change in the filament length distribution), and shorter filaments have a higher chance of being straight. The small decrease in filament length with the removal of the FoG effect is likely due to the removal from the sample of the filaments passing through the distorted positions of clusters, whose length is probably comparable to the extent of the distortion in the galaxy distribution and therefore larger than average. We further analysed the statistical properties of the skeleton by selecting only the filaments with $L_S/L_C > 0.99$, i.e. those for which the length measured along the filament is less than 1% longer than the straight line connecting the extremities.
Fig. 15. Effect of FoG on the skeleton reconstruction around Coma. The four panels show a 50 × 50 × 50 Mpc³ box centred on the Coma cluster, projected on the Cartesian x − y plane (with the x-axis aligned with redshift and the y-axis with right ascension). In all panels small points correspond to galaxies from the Legacy MGS; in three out of four panels they are colour-coded according to local density. In the top panels, the FoG effect is present in the galaxy distribution, while in the bottom panels it has been removed. In the right panels, the blue skeleton has been derived with the FoG effect, while the orange dashed skeleton has been derived after the elimination of the FoG. Both skeletons have been derived with no smoothing of the density field and a 3σ persistence threshold. In each panel, squares correspond to the clusters used to eliminate the FoG, colour-coded according to the sample to which they belong (not all samples have clusters inside the box; represented are MCXC X-ray clusters in aqua, Planck SZ clusters in orange, and Tempel et al. 2017 optical clusters in yellow). The size of the squares scales with cluster virial mass. In each panel the black arrow shows the direction of the LoS.
Fig. 16. Effect of FoG on the skeleton reconstruction around clusters of different mass. The nine panels show a 50 × 50 × 50 Mpc³ box centred on random clusters (mass increases from bottom to top and from left to right), projected on the Cartesian x − y plane (with the x-axis aligned with redshift and the y-axis with right ascension). In all panels small grey points correspond to galaxies from the Legacy MGS; the FoG effect has been removed from the galaxy distribution. The blue skeleton has been derived with the FoG effect, while the orange dashed skeleton has been derived after the elimination of the FoG. Both skeletons have been derived with no smoothing of the density field and a 3σ persistence threshold. In each panel, squares correspond to the clusters used to eliminate the FoG, colour-coded according to the sample to which they belong (not all samples have clusters inside each box; represented are MCXC X-ray clusters in aqua, Planck SZ clusters in orange, Tempel et al. 2017 optical clusters in yellow, RedMaPPer optical clusters in purple, and Wen et al. 2012 optical clusters in green). The size of the squares scales with cluster virial mass. In each panel the black arrow shows the direction of the LoS.
Fig. 17. Effect of FoG on the skeleton reconstruction around clusters of different mass. The nine panels show a 50 × 50 × 50 Mpc³ box centred on random clusters (mass increases from bottom to top and from left to right), projected on the Cartesian x − y plane (with the x-axis aligned with redshift and the y-axis with right ascension). In all panels small grey points correspond to galaxies from the LOWZ+CMASS sample; the FoG effect has been removed from the galaxy distribution. The blue skeleton has been derived with the FoG effect, while the orange skeleton has been derived after the elimination of the FoG. Both skeletons have been derived with no smoothing of the density field and a 3σ persistence threshold. In each panel, squares correspond to the clusters used to eliminate the FoG, colour-coded according to the sample to which they belong (not all samples have clusters inside each box; represented are Wen & Han 2015 optical clusters in aqua, RedMaPPer optical clusters in orange, Planck SZ clusters in grey, MCXC X-ray clusters in purple, and Wen et al. 2012 optical clusters in green). The size of the squares scales with cluster virial mass. In each panel the black arrow shows the direction of the LoS.
In the top right panel, distributions are colour-coded according to the critical point type they refer to (blue: minima, yellow: 1-saddles, green: 2-saddles, red: maxima, brown: bifurcations). Solid lines refer to the distributions obtained for the case where the skeleton has been derived without the FoG effect and dashed lines for the case where the FoG effect is included. In all the other panels, the red line refers to the case where the FoG is included in the derivation of the skeleton and the different shades of cyan to the case where the FoG effect is removed using different samples of clusters or clusters of different mass, as explained in the text. In the legend, "M" refers to MCXC X-ray clusters, "P" to Planck SZ clusters, "T" and "R" to Tempel et al. (2017) and RedMaPPer optical clusters, respectively, and "No FoG" to the case where all the clusters are used. This figure is for the Legacy MGS case.
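The curvature ratio $L_S/L_C$ and the $L_S/L_C > 0.99$ selection can be written down directly from the definitions above. In this sketch a filament is assumed to be a list of 3D points in comoving Mpc (its segment vertices, extremities included); the function names are illustrative only and do not correspond to any released software.

```python
import math


def straight_length(path):
    """L_S: distance between the two extremities of the filament."""
    return math.dist(path[0], path[-1])


def curved_length(path):
    """L_C: sum of the lengths of the segments along the filament path."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def curvature_ratio(path):
    """L_S / L_C: 1 for a straight filament, towards 0 for a strongly curved one."""
    lc = curved_length(path)
    return straight_length(path) / lc if lc > 0 else math.nan  # zero length: undefined


def select_straight(paths, threshold=0.99):
    """Keep filaments whose path is less than ~1% longer than the straight line."""
    return [p for p in paths if curvature_ratio(p) > threshold]  # NaN compares as False


straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]   # L_S = 5, L_C = 7
print(curvature_ratio(straight), curvature_ratio(bent))
print(len(select_straight([straight, bent])))                # 1
```

Returning NaN for zero-length filaments keeps them out of the straight subsample without raising an error.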
For these very straight filaments, we computed the angle with respect to the LoS, defined as the angle between the straight line connecting the observer with the closer of the two critical points defining the filament and the straight line connecting the two filament extremities. This angle takes values between 0 and 90 deg, and we show the distributions of the angles with the LoS for straight filaments in the skeletons detected with and without FoG distortions in the bottom right panels of Figures 18 and 19. Even though the skeleton detected without FoG distortions contains more short and straight filaments, they are not preferentially aligned perpendicular to the LoS. Rather, the distribution is fairly uniform, meaning that filaments have random orientations with respect to the LoS. No differences are visible between the cases with and without FoG distortions.
In the top right panel, distributions are colour-coded according to the critical point type they refer to (blue: minima, yellow: 1-saddles, green: 2-saddles, red: maxima, brown: bifurcations). Solid lines refer to the distributions obtained for the case where the skeleton has been derived without the FoG effect and dashed lines for the case where the FoG effect is included. In all the other panels, the red line refers to the case where the FoG is included in the derivation of the skeleton and the different shades of cyan to the case where the FoG effect is removed using different samples of clusters or clusters of different mass, as explained in the text. In the legend, "M" refers to MCXC X-ray clusters, "P" to Planck SZ clusters, "W15" and "R" to Wen & Han (2015) and RedMaPPer optical clusters, respectively, and "No FoG" to the case where all the clusters are used. This figure is for the LOWZ+CMASS case.
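The angle with the LoS defined above can be computed from the two filament extremities alone. The minimal sketch below assumes comoving Cartesian coordinates with the observer at the origin and folds the result into the 0 to 90 deg range by taking the absolute value of the cosine; the function name is hypothetical.

```python
import math


def los_angle_deg(path):
    """Angle (0-90 deg) between the LoS to the closer extremity and the line joining
    the two extremities. Positions are comoving Cartesian, observer at the origin."""
    a, b = path[0], path[-1]
    near = a if math.hypot(*a) <= math.hypot(*b) else b
    seg = [q - p for p, q in zip(a, b)]
    cos_t = abs(sum(n * s for n, s in zip(near, seg))) / (math.hypot(*near) * math.hypot(*seg))
    return math.degrees(math.acos(min(cos_t, 1.0)))   # clamp rounding errors above 1


print(los_angle_deg([(100.0, 0.0, 0.0), (110.0, 0.0, 0.0)]))   # along the LoS: 0 deg
print(los_angle_deg([(100.0, 0.0, 0.0), (100.0, 10.0, 0.0)]))  # across the LoS: 90 deg
```

Filaments produced by FoG elongation would cluster near 0 deg in such a distribution.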
As a final check for potential biases due to the cluster samples we used, we derived the distributions of Figures 18 and 19 several times, using an increasing number of clusters each time. In particular, for the Legacy MGS we derived the distributions first considering only the MCXC X-ray cluster sample and then adding, in sequence, the Planck SZ clusters, the Tempel et al. (2017) and RedMaPPer optical clusters, and finally the Wen et al. (2012) sample. For the LOWZ+CMASS sample, we started with only the Wen & Han (2015) clusters, followed by the RedMaPPer clusters, the Planck SZ clusters, the MCXC X-ray clusters, and finally the Wen et al. (2012) sample. The order of the cluster samples differs between the Legacy MGS and LOWZ+CMASS cases because of the different redshift ranges covered by the galaxy surveys and the cluster samples. We also performed the FoG analysis considering all cluster samples, but with different mass thresholds: for the Legacy MGS, all clusters with $M_{\rm vir} \geq 10^{12}$, $10^{13}$, $10^{14}$, and $10^{15}\,M_\odot$. For the LOWZ+CMASS sample, the two intermediate thresholds ($M_{\rm vir} \geq 10^{13}$ and $M_{\rm vir} \geq 10^{14}\,M_\odot$) caused DisPerSE to crash, likely because the removal of the FoG regions modifies the shape of the density field in a way that prevents DisPerSE from computing its gradient. We therefore considered only the two extreme thresholds ($M_{\rm vir} \geq 10^{12}$ and $M_{\rm vir} \geq 10^{15}\,M_\odot$), which still allow us to detect any trend of the FoG effect on the skeleton with cluster mass. We report the resulting distributions in Figures 18 and 19. No trend with the cluster sample adopted or with the mass threshold used is visible.

Edge effects
Figures 20 and 21 show the same distributions of filament lengths, critical point redshifts, critical point densities for the different types, and connectivity of maxima, for a skeleton where the critical points on the boundary have been kept and for one where they have been removed as described in Section 3.2. Removing the critical points and filaments on the boundary removes short filaments and shifts the connectivity distributions of the maxima to lower values. Overall, however, the change in the statistical properties of the skeleton is not dramatic, especially for the redshift distribution of critical points and for the density distributions of critical points divided by type.
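The shift of the connectivity towards lower values can be illustrated with a toy skeleton. The sketch assumes the boundary critical points have already been flagged (Section 3.2 describes how; that step is not reproduced here) and represents filaments as hypothetical (start_id, end_id) pairs. Dropping the filaments that end on a boundary critical point lowers the counts of the maxima they were attached to.

```python
def remove_boundary(filaments, boundary_ids):
    """Drop every filament with an extremity on a boundary critical point."""
    return [f for f in filaments if f[0] not in boundary_ids and f[1] not in boundary_ids]


def mean_connectivity(filaments, maxima_ids):
    """Average number of filaments ending on each maximum."""
    counts = {m: 0 for m in maxima_ids}
    for start, end in filaments:
        for cp in (start, end):
            if cp in counts:
                counts[cp] += 1
    return sum(counts.values()) / len(counts) if counts else float("nan")


maxima = {"M1", "M2"}
boundary = {"B1", "B2"}   # assumed to be already flagged as boundary critical points
filaments = [("M1", "S1"), ("M1", "S2"), ("M1", "B1"), ("M2", "S1"), ("M2", "B2")]
kept = remove_boundary(filaments, boundary)
kept_maxima = maxima - boundary   # maxima on the boundary would be dropped as well
print(mean_connectivity(filaments, maxima), mean_connectivity(kept, kept_maxima))  # 2.5 1.5
```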

Large-scale variations in the galaxy distribution
As a final test, we checked the effect of potential large-scale inhomogeneities of the galaxy distribution on the plane of the sky on the reconstruction of filaments in the Legacy MGS. In particular, the surface density of galaxies seems to show a small but detectable trend with declination in Figure 1. To check for possible systematics, we derived the length distributions in three declination slices, which we report in Figure 22. The length distributions of the filaments in the three declination slices look identical regardless of the persistence and smoothing level adopted. This, combined with visual inspection of the filaments shown in Figure 12, allows us to exclude any systematic trend in the reconstruction of the skeleton with declination.
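A per-slice comparison of length distributions of the kind shown in Figure 22 could be organised as below. The sketch is illustrative only: it assumes each filament is summarised by a midpoint in Cartesian equatorial coordinates (z-axis towards the celestial pole) and a length in Mpc, and returns one normalised length histogram per declination slice. How filaments were actually assigned to slices is not stated in the text, and all names here are hypothetical.

```python
import bisect
import math


def declination_deg(xyz):
    """Declination of a Cartesian point, assuming the z-axis points to the celestial pole."""
    x, y, z = xyz
    return math.degrees(math.atan2(z, math.hypot(x, y)))


def bin_index(edges, value):
    """Index of the half-open bin [edges[i], edges[i+1]) containing value, or None."""
    i = bisect.bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


def length_histograms_by_dec(filaments, dec_edges, len_edges):
    """filaments: list of (midpoint_xyz, length_mpc). Returns one normalised length
    histogram per declination slice (all zeros for an empty slice)."""
    counts = [[0] * (len(len_edges) - 1) for _ in range(len(dec_edges) - 1)]
    for midpoint, length in filaments:
        s = bin_index(dec_edges, declination_deg(midpoint))
        k = bin_index(len_edges, length)
        if s is not None and k is not None:
            counts[s][k] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


filaments = [((100.0, 0.0, 0.0), 5.0), ((100.0, 0.0, 0.0), 15.0), ((100.0, 0.0, 100.0), 25.0)]
hists = length_histograms_by_dec(filaments, dec_edges=[-10, 10, 30, 60], len_edges=[0, 10, 20, 30])
print(hists)   # [[0.5, 0.5, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

Normalising each slice separately lets slices with different numbers of filaments be compared by shape alone.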

Qualitative comparison with the literature
In recent years other catalogues of filaments in the SDSS have been created, using various methods to detect the cosmic web. For example, Tempel et al. (2014a) used the BISOUS algorithm on the full SDSS Legacy MGS footprint, Martínez et al. (2016) detected filaments in the SDSS Legacy MGS by considering pairs of groups of galaxies, and Chen et al. (2016) extracted the cosmic web in 2D redshift slices using the SCMS algorithm on both the Legacy MGS and the LOWZ+CMASS sample. While a comparison between these catalogues is beyond the scope of this work (see e.g. Rost et al. 2019), not least because of the very different algorithms employed, we qualitatively contrast our work with others from the literature. While the sky distributions of filaments look visually similar (see e.g. Figure 2 of Chen et al. 2016, Figure
Fig. 20. Distributions of skeleton quantities when boundary effects are taken into account. The four panels show the distributions of the filament lengths (top left), of the density of critical points divided by type (top right), of the redshift of the critical points (bottom left), and of the connectivity of the maxima (bottom right). In the top right panel, distributions are colour-coded according to the critical point type they refer to (blue: minima, yellow: 1-saddles, green: 2-saddles, red: maxima, brown: bifurcations). Solid lines refer to the distributions obtained for the case where the critical points at the boundary and the filaments connected to them have been removed and dashed lines for when they have been included. In all the other panels, the red line refers to the case where the critical points at the boundary and the filaments connected to them have been removed and the cyan line to the case where they have been included. This figure is for the Legacy MGS case.
Figure 9 of this work shows that our filaments can reach lengths of up to 100 Mpc, with distributions that vary with the smoothing of the density field and the persistence threshold.
Moreover, our skeleton provides topologically motivated information on the critical points (maxima, bifurcations, and saddles) in addition to the detection of filaments of the cosmic web.

Conclusions and summary
This paper presents the catalogues of filaments used in Malavasi et al. (2020), Tanimura et al. (2019), and Bonjean et al. (2019), and describes how they were derived. We provide a full investigation of the effect of the choice of DisPerSE parameters on the skeleton reconstruction. We also provide a complete characterisation of the systematic issues that may affect our filaments, such as FoG redshift distortions and problems at the survey boundary. We characterise and validate the catalogue, ensuring the absence of strong problems in the filament samples used in the above-mentioned works. All the catalogues described in this paper, for all levels of persistence and of smoothing of the density field and for both the Legacy MGS and the LOWZ+CMASS samples, are available to the scientific community upon request, together with several pieces of Python software that may help in their handling and analysis.
In the top right panel, distributions are colour-coded according to the critical point type they refer to (blue: minima, yellow: 1-saddles, green: 2-saddles, red: maxima, brown: bifurcations). Solid lines refer to the distributions obtained for the case where the critical points at the boundary and the filaments connected to them have been removed and dashed lines for when they have been included. In all the other panels, the red line refers to the case where the critical points at the boundary and the filaments connected to them have been removed and the cyan line to the case where they have been included. This figure is for the LOWZ+CMASS case.
With the increasing interest in the study of the cosmic web and with several wide-area galaxy surveys foreseen in the future (e.g. Euclid, Laureijs et al. 2011; Prime Focus Spectrograph, PFS, Takada et al. 2014), the ability to create catalogues for use by the community is vital, both to pave the way for future studies and to improve the techniques used to detect the LSS. This paper provides several reference catalogues with which to perform studies of the cosmic web, as well as to compare against when developing new tools (e.g. Bonnaire et al. 2019).