Open Access
Issue: A&A, Volume 670, February 2023
Article Number: A77
Number of page(s): 9
Section: Cosmology (including clusters of galaxies)
DOI: https://doi.org/10.1051/0004-6361/202244883
Published online: 07 February 2023

© The Authors 2023

Licence: Creative Commons. Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


1. Introduction

The galaxy distribution in the observable Universe is not homogeneous but has a structure that is dictated by matter distribution and gravitational forces. The large-scale structure of the Universe defines the environment galaxies reside in and has a wide range of effects on the properties of those galaxies; for example, the orientation of galaxies in relation to the filaments (Lee & Pen 2000; Aragón-Calvo et al. 2007; Tempel & Libeskind 2013; Ganeshaiah Veena et al. 2019; Kraljic et al. 2020), the satellite distribution around larger galaxies (Knebe et al. 2004; Zentner et al. 2005; Tempel et al. 2015; Wang et al. 2020), the elliptical-to-spiral ratio, and the star formation rate (Alpaslan et al. 2015; Kuutma et al. 2017).

Usually, the large-scale structure is divided into four types of substructures (Libeskind et al. 2018). The densest and most compact are galaxy clusters that host many gravitationally bound galaxies and are called knots in the large-scale structure context. The clusters are connected by chains of galaxies called filaments that populate the intricate cosmic web. Between clusters and filaments are large under-dense volumes named voids encapsulated in sheets of filaments called walls or sheets.

There are many different approaches to detecting the different large-scale structure elements. Usually, the methods use either the relative positions of the galaxies themselves or different scalar and tensor fields derived from galaxy positions and properties from observational surveys or simulation data. For example, the NEXUS+ model (Cautun et al. 2013) uses the Hessian of the shear tensor field. Models that use galaxy positions also employ different approaches. For example, DisPerSE (Sousbie 2011) uses mass estimates and identifies the cosmic web using topological features of the mass distribution, and the Bisous model (Tempel et al. 2016) uses the distribution of the galaxies and a marked point process with interactions. Libeskind et al. (2018) give an overview and a brief comparison of 12 different methods to detect the large-scale structure elements.

The accuracy of these finders depends on the completeness and accuracy of the data. The best results are obtained from simulations, where galaxy positions and properties are accurate and complete phase-space information is available. When using data from surveys where the data are incomplete and have uncertainties, and phase-space information is derived from those observations, the resulting cosmic web maps deteriorate. Some methods are better suited for observational data, but all methods are limited by the completeness and accuracy of spectroscopic redshift data. The current largest spectroscopic survey is the Sloan Digital Sky Survey (SDSS, Eisenstein et al. 2011; Alam et al. 2015), which covers 7221 deg² of the sky. There are upcoming large spectroscopic surveys such as the 4-m Multi-Object Spectroscopic Telescope (4MOST, de Jong et al. 2019) surveys and the Dark Energy Spectroscopic Instrument (DESI, Dey et al. 2019) Bright Galaxy Survey (BGS, Ruiz-Macias et al. 2021). Future surveys will cover larger areas but will still be limited by depth and completeness.

Data with photometric redshifts are much more abundant than the spectroscopic counterpart, as redshifts can be measured in bulk. For example, SDSS Data Release 12 has 100 times more photometric redshifts than spectroscopic ones (Beck et al. 2016). The upcoming J-PAS (Benitez et al. 2014) will observe the sky in 54 narrowband and three broadband filters and is designed to measure the redshifts for a large number of galaxies with a precision of σz ≲ 0.003(1 + z). This precision is comparable to low-resolution spectroscopic surveys and enables wider use of photometric redshift data for applications that require positions of galaxies, such as large-scale structure detection.

In this paper, we use the Bisous filament finder, which was developed to detect filaments from observational data. The Bisous model only needs the galaxy distribution and uses geometric methods and a marked point process with interactions to detect the cosmic web (Tempel et al. 2014, 2016). Bisous has been successfully used in many works, such as Nevalainen et al. (2015), Kuutma et al. (2017), Ganeshaiah Veena et al. (2019), and Tuominen et al. (2021). Kruuse et al. (2019) show a significant positive correlation between the distribution of photometric galaxies and the Bisous filaments, which suggests that the Bisous model could use photometric data to improve the detection of filaments.

In this study, we present a proof of concept that data with photometric redshifts can be used to improve the detection of the filamentary network. For this, we take a simple approach to use data with significant uncertainties in position along one axis with the Bisous model. We generate mock data with photometric and spectroscopic redshifts from a simulation and use samples with only photometric redshifts, mixed samples of photometric and spectroscopic redshifts, and, for comparison and benchmarking, also samples with only spectroscopic redshifts. Using Bisous results from the full spectroscopic redshift data as a reference, we study the recall and false discovery rate of the Bisous runs on different samples. Further aspects of interest are whether or not using data with photometric redshift produces biases in the filaments and the maximum size of uncertainties that Bisous can handle while still improving the filamentary network.

The structure of the paper is as follows. In Sect. 2, we describe the simulation we used to create the mock data and samples in this study. In Sect. 3, we describe the Bisous filament finder and our method to use data with photometric redshifts. In Sect. 4, we present the results from different samples. A discussion of the results, problems, possible improvements, and future applications is presented in Sect. 5 and conclusions are outlined in Sect. 6.

2. Data

2.1. Simulation data

The analysis in this paper is based on simulated mock data. For the mock data set, we used the galaxy catalogue MULTIDARK-GALAXIES, which is based on the MULTIDARK-PLANCK 2 (MDPL2, Klypin et al. 2016) simulation with the SAG semi-analytic model for galaxies described in Knebe et al. (2018). The MDPL2 simulation is based on a dark-matter-only flat Λ cold dark matter (ΛCDM) model with Planck cosmological parameters: Ωm = 0.307, ΩB = 0.048, ΩΛ = 0.693, σ8 = 0.823, ns = 0.96, and h = 0.678 (Planck Collaboration I 2016). The box size is 1000 h⁻¹ Mpc (1475.6 Mpc) with 3840³ particles and a mass resolution of mp = 1.51 × 10⁹ h⁻¹ M⊙ per dark matter particle.

This work uses a smaller box of the whole simulation with a side of 250 Mpc to have a sufficiently large sample size for statistical analysis but a sufficiently small volume to limit the calculation time of the Bisous filament finder (see Sect. 3) applied to the data. We used a magnitude limit of −20.0 in the SDSS r-band to have a galaxy number density similar to observations (for comparison, see Muru & Tempel 2021). This cut leaves us with 181 411 galaxies in a box with a side of 250 Mpc, and the galaxy number density is 0.0116 Mpc⁻³.
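As a quick consistency check on the numbers quoted above, the number density follows directly from the galaxy count and the box volume. The snippet below is only an illustrative sketch; the function name is hypothetical and not part of any pipeline described here.

```python
# Values quoted in the text above.
n_galaxies = 181_411
box_side_mpc = 250.0


def number_density(n_objects, side_mpc):
    """Mean number density (per cubic Mpc) of objects in a cubic box."""
    return n_objects / side_mpc**3


density = number_density(n_galaxies, box_side_mpc)
print(f"{density:.4f} Mpc^-3")
```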

2.2. Photometric redshift mock data

As distance measures from spectroscopic surveys are relatively precise, the spectroscopic redshift mock data are simply the data with exact positions from the simulation. To generate photometric redshift mock data, we have to introduce photometric redshift uncertainties to these positions. The simulation positions form a cube: we take two axes to represent the sky plane, whose coordinates act as sky coordinates and therefore have no extra uncertainty, and the third axis to represent the line of sight. We added a random error to the line-of-sight coordinate of each galaxy. For simplicity, all the coordinates are given in megaparsecs (Mpc), and the errors do not scale with distance.

The random errors for the line-of-sight axis are generated with a Gaussian distribution (𝒩(x, σ²)) with different standard deviation (σ) values for different samples. Within one sample, the standard deviation value is constant. For this study, we used six different standard deviation values of 1 Mpc, 2 Mpc, 3 Mpc, 5 Mpc, 7 Mpc, and 10 Mpc. We also created mixed samples of galaxies with spectroscopic and photometric distances in different proportions and with different photometric uncertainties. This is to emulate a realistic situation where one would start with an observational catalogue of spectroscopic targets and include photometric targets to improve the detection of the large-scale structure. Different mixed samples have 10%, 20%, 30%, 40%, and 50% of the brightest galaxies as spectroscopic galaxies; the rest are photometric galaxies with uncertainties generated with σ = 5 Mpc or 10 Mpc. This means that a chosen percentage of the brightest galaxies have exact positions, and the other galaxies have photometric uncertainties along the line-of-sight axis, that is, in distance.
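The construction of a mixed sample described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the paper: the function name, the array layout (third column as line of sight), and the random toy catalogue are all hypothetical, and brightness is assumed to be given as magnitudes, so that smaller values mean brighter galaxies.

```python
import numpy as np


def make_mixed_sample(positions, r_mag, sigma_mpc, spec_fraction, rng):
    """Return positions with Gaussian line-of-sight errors and a spectroscopic mask.

    positions: (N, 3) array in Mpc; column 2 is treated as the line of sight.
    r_mag: (N,) magnitudes; a smaller value means a brighter galaxy.
    sigma_mpc: standard deviation of the photometric distance error in Mpc.
    spec_fraction: fraction of the brightest galaxies that keep exact positions.
    """
    n = len(positions)
    n_spec = int(round(spec_fraction * n))
    order = np.argsort(r_mag)  # brightest galaxies first
    is_spec = np.zeros(n, dtype=bool)
    is_spec[order[:n_spec]] = True

    noisy = positions.copy()
    errors = rng.normal(0.0, sigma_mpc, size=n)
    # Only photometric galaxies are displaced, and only along the line of sight.
    noisy[~is_spec, 2] += errors[~is_spec]
    return noisy, is_spec


# Toy catalogue (random, for illustration only): an analogue of sample σ10s30.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 250.0, size=(1000, 3))
r_mag = rng.uniform(-23.0, -20.0, size=1000)
noisy, is_spec = make_mixed_sample(positions, r_mag, 10.0, 0.3, rng)
```

Setting `spec_fraction=0.0` gives a photometric-only sample in this sketch, and `sigma_mpc` plays the role of the σ values listed above.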

Figure 1 shows the comparison between a sample of galaxies with no uncertainties (all spectroscopic redshifts), and two samples of galaxies with photometric redshifts with uncertainties with distributions 𝒩(0 Mpc, (5 Mpc)²) and 𝒩(0 Mpc, (10 Mpc)²). The leftmost plot shows a visible web-like structure. In the middle plot, the structure is more diffuse because of the added randomness along the z-axis, but some of the original structure is still somewhat visible. In the rightmost plot, the original structure is no longer visible; instead, there seem to be filamentary structures along the z-axis, which are produced by the random errors added to galaxy positions along that axis.

Fig. 1.

Projection of galaxy distributions of samples σ0, σ5, and σ10 in a slice with a thickness of 10 Mpc. Each dot represents a galaxy. The photometric uncertainties are parallel to the z-axis, which also defines the line of sight in this work. Only an area of 150 Mpc × 150 Mpc is shown for visual clarity. For information about the samples, see Sect. 2.3.

2.3. Samples

We use the following notation to name the samples. The fraction of galaxies in the samples with spectroscopic distance estimates is denoted with sXX, where XX is a number indicating the percentage from the whole sample. The spectroscopic galaxies are always the brightest galaxies in the sample. For example, s40 means the sample contains only the brightest 40% of the galaxies from the whole sample, all with exact distances, and the other 60% of the galaxies are missing. The photometric samples are denoted with σYY, where YY is a number indicating the size of the uncertainties for photometric distance estimates. For example, σ5 means the sample contains galaxies whose distance uncertainties are generated with a Gaussian distribution with a standard deviation of 5 Mpc. For mixed samples, σ10s30 for example means that 30% of the brightest galaxies have exact distances (i.e. spectroscopic distance estimates), and the rest, that is 70% of the galaxies in the sample, have distances with uncertainties generated with a Gaussian distribution with a standard deviation of 10 Mpc. Table 1 lists the samples used in this work, the distributions used to generate uncertainties for distances, and the percentages of galaxies with spectroscopic distances.

Table 1.

Photometric distance uncertainties and percentage of spectroscopic galaxies in each sample.

For brevity, hereafter the term spectroscopic galaxies/data is used as a synonym for galaxies/data with spectroscopic redshifts, and photometric galaxies/data is used as a synonym for galaxies/data with photometric redshift. In this work, the former means data with no uncertainties, and the latter means data with uncertainties along one axis.

3. The Bisous filament finder

We used the Bisous filament finder to detect the filaments from the mock data. This finder is a stochastic tool to identify the spines of the filaments using the spatial distribution of galaxies or haloes (Tempel et al. 2014, 2016). The Bisous has already been applied to a variety of data and has been proven to give similar results to other filament finders (Libeskind et al. 2018). We give a short overview of the method below.

First, the Bisous randomly populates the volume with points with parameters (called marked points), where each point represents the centre of a cylinder and the parameters give the size and orientation of the cylinder. The cylinder’s width is about 1 Mpc, which defines the width of the detected filaments. This width is derived from the gradient of the galaxy density, where there is a peak at approximately 0.5 Mpc from the filament’s spine. Each configuration of cylinders in the volume has a defined energy, which depends on the position of the cylinders in relation to the underlying data of haloes and the interconnectedness of the filamentary network made up of the cylinders. Using the Metropolis–Hastings algorithm and the simulated annealing procedure, the Bisous model optimises the energy function of the system by suggesting random moves to add, remove, or change the cylinders.
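The optimisation loop described above (random add, remove, and change moves accepted with a temperature-dependent probability) can be illustrated with a deliberately simplified one-dimensional toy. This is not the Bisous energy function or its implementation: the energy, the move proposals, and all names and parameter values below are assumptions made for illustration, and the proposal-ratio terms that a full birth-death Metropolis–Hastings sampler requires are omitted in favour of a plain Metropolis-style acceptance rule.

```python
import math
import random


def energy(centres, data, radius=0.5, cost=1.0):
    """Toy energy: every segment costs `cost`; every covered data point earns -1."""
    covered = sum(1 for x in data if any(abs(x - c) <= radius for c in centres))
    return cost * len(centres) - covered


def anneal(data, lo, hi, n_steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Minimise the toy energy with add/remove/change moves and geometric cooling."""
    rng = random.Random(seed)
    centres = []
    e = energy(centres, data)
    for step in range(n_steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        proposal = list(centres)
        move = rng.choice(["add", "remove", "change"])
        if move == "add":
            proposal.append(rng.uniform(lo, hi))
        elif proposal:
            i = rng.randrange(len(proposal))
            if move == "remove":
                proposal.pop(i)
            else:
                proposal[i] += rng.gauss(0.0, 0.3)
        else:
            continue  # nothing to remove or change yet
        e_new = energy(proposal, data)
        # Always accept downhill or flat moves; accept uphill moves with exp(-dE/T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            centres, e = proposal, e_new
    return sorted(centres), e


# Two clumps of points stand in for galaxies along two filaments.
data = [0.9, 1.0, 1.1, 1.2, 4.9, 5.0, 5.1, 5.2]
final_centres, final_energy = anneal(data, lo=0.0, hi=6.0)
print(final_centres, final_energy)
```

In this toy, the lowest possible energy is −6 (two segments, each covering one clump); the real model additionally rewards connected and aligned cylinders, which the sketch does not attempt to reproduce.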

The data of the cylinder configurations are collected over hundreds of thousands of cycles, each consisting of tens of thousands of moves, which is the basis for visit map calculations. In general, one realisation of cylinders in the volume represents the detected filamentary network. As the model is stochastic, the configuration of cylinders changes from realisation to realisation. The combination of many realisations allows us to define the visit map that describes the detected filamentary network. Each coordinate has a defined visit map value, ranging from 0 to 1. The visit map contains information on how often a coordinate in space was ‘visited’ by a cylinder, which signifies how probable it is that a random realisation has a cylinder at that position.
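The idea of a visit map as the fraction of realisations in which a position is covered by a cylinder can be sketched on a small grid. The boolean masks below are made-up toy data, and representing each realisation as a gridded occupancy mask is an assumption of this sketch rather than a description of the actual Bisous implementation.

```python
import numpy as np


def visit_map(realisations):
    """Fraction of realisations in which each grid cell is covered by a cylinder."""
    stack = np.asarray(realisations, dtype=float)
    return stack.mean(axis=0)


# Four toy realisations on a 2 x 3 grid; True marks a cell covered by a cylinder.
masks = np.array(
    [
        [[1, 1, 0], [0, 0, 0]],
        [[1, 0, 0], [0, 0, 0]],
        [[1, 1, 0], [0, 0, 1]],
        [[1, 0, 0], [0, 0, 0]],
    ],
    dtype=bool,
)
vm = visit_map(masks)
print(vm)
```

Cells covered in every realisation get a value of 1, cells never covered get 0, and intermediate values reflect how persistent a structure is across realisations.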

To decrease the effects of Poisson noise, the Bisous model is run many times, usually 50–100. This increases the signal-to-noise ratio as a larger number of independent realisations are combined to obtain the resulting maps.

Muru & Tempel (2021) show how the galaxy number density affects the detected filamentary network. These authors show that the Bisous method underestimates the extent of the filamentary network rather than giving false-positive results. This means that the filament finder underestimates the filamentary structures at higher distances where the galaxy number density drops. To improve the quality of the detected filamentary network, we need to increase the galaxy number density, for example, with photometric data.

3.1. Using photometric data

Filament finders usually need precise data, either scalar or tensor fields or galaxy positions, and therefore the less accurate photometric data are ignored. Here we present a method that benefits from photometric data by having higher input data density and is able to mitigate the uncertainty from distance measures. This subsection gives an overview of a simple method of how the Bisous filament finder can use photometric data.

For each galaxy with a photometric redshift estimate and its probability distribution, we generate NR new distance estimates drawn from the photometric redshift probability distribution. For the mock data in this paper, we used a Gaussian distribution to generate the uncertainties, and so we also use the same Gaussian distribution to generate different distance estimates for every galaxy. Every Bisous run uses a different distance estimate for a galaxy with a photometric distance measure. The number of Bisous runs should be large in order to minimise the Poisson noise in the results but also small to minimise the computational resources used for the model. Usually, there are around 50–100 Bisous runs; for this work, we used NR = 80, which has been shown to give good results in previous works using the Bisous model. For mixed data sets of spectroscopic and photometric targets, only the photometric ones have different distance estimates, whereas spectroscopic targets have the same distance value in every run.
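The per-run resampling described above can be sketched as follows. The function name and the toy values are hypothetical, and the sketch assumes that the new estimates are drawn from a Gaussian centred on the catalogued (already uncertain) line-of-sight coordinate of each photometric galaxy.

```python
import numpy as np


def draw_run_distances(observed_los, sigma_mpc, is_spec, n_runs, rng):
    """Return an (n_runs, N) array of line-of-sight coordinates, one row per run.

    Photometric galaxies get a new Gaussian draw in every run;
    spectroscopic galaxies keep the same value in all runs.
    """
    observed_los = np.asarray(observed_los, dtype=float)
    draws = rng.normal(
        loc=observed_los, scale=sigma_mpc, size=(n_runs, observed_los.size)
    )
    draws[:, is_spec] = observed_los[is_spec]
    return draws


rng = np.random.default_rng(42)
observed = np.array([100.0, 120.0, 140.0, 160.0])  # toy catalogue values in Mpc
is_spec = np.array([True, False, False, True])
runs = draw_run_distances(observed, sigma_mpc=5.0, is_spec=is_spec, n_runs=80, rng=rng)
```

Each row of `runs` would then serve as the input for one Bisous run. Passing an array instead of a scalar for `sigma_mpc` would give every galaxy its own uncertainty in this sketch.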

The novelty of this method is that the runs that have more accurate distance estimates for the photometric galaxies produce more persistent filaments. Galaxies with inaccurate distance estimates generate noise. The Bisous model suppresses the noise by combining a large number of realisations. The more inaccurate distance estimates there are, the more noise, which means the Bisous is able to find fewer filaments. This means that uncertainties still have to be small to produce good results. The generation of new distance estimates is done separately for each galaxy. In practice, we can use a different probability distribution for each galaxy.

4. Results

This work uses three types of samples: spectroscopic, photometric, and mixed. The primary purpose of spectroscopic-only samples is to be a reference value for the other two types of samples. Photometric-only samples show what can be done using only photometric redshift surveys, and mixed samples show what we can do by combining the spectroscopic and photometric redshift surveys, for example in the areas of spectroscopic surveys where galaxies are sparse or at higher distances where the detection is less complete.

As mentioned in Sect. 1, filaments affect the evolution of galaxies and knowing whether a galaxy is in a filament or not is useful when studying the galaxy properties. Therefore, one of the simplest metrics with which to compare the resulting filamentary network is the fraction of galaxies situated inside filaments. Figure 2 shows the fraction of galaxies inside filaments for all the samples used in this work. The sample σ0 is the most complete sample (galaxy positions without uncertainties but with the same magnitude limit as other samples), and the fraction of galaxies in filaments for that sample could be considered as a reference value for an ideal case. Looking at samples with only photometric redshift galaxies, we see the expected trend that the larger the uncertainties for the distance, the fewer galaxies are in filaments. This comes from the fact that the larger the uncertainties, the fewer filaments the Bisous model is able to detect (cf. Fig. 3) as the structure in the galaxy distribution is less obvious, as seen from Fig. 1. Adding spectroscopic redshift galaxies to create the mixed samples considerably increases the number of galaxies in filaments. For example, 33% of the galaxies in σ5 are in filaments, but when 20% of the brightest galaxies have spectroscopic redshifts (σ5s20) this fraction rises to 45%, and with 50% galaxies with spectroscopic redshifts (σ5s50) up to 59% of galaxies are in filaments. This shows that using spectroscopic galaxies together with photometric galaxies increasingly improves the detected filamentary network as the number of spectroscopic galaxies in a sample increases. On the other hand, when comparing spectroscopic-only samples (e.g., s50 or s40) to mixed samples (e.g., σ5s50 or σ5s40) where the galaxy number density is increased with added photometric galaxies, we can see that the mixed samples have more galaxies in filaments when compared to spectroscopic samples. 
This indicates that adding galaxies with photometric redshifts to increase the number density of galaxies in the sample helps to improve the detected filamentary network, as it increases the fraction of galaxies in filaments and is closer to the reference sample (σ0).

Fig. 2.

Fraction of galaxies in filaments for different spectroscopic-only, photometric-only, and mixed samples. The samples are ordered so that the y-axis values of photometric-only and mixed samples are in increasing order. The spectroscopic-only samples are used for reference values to show the increase in the fraction of galaxies in filaments for mixed samples. The sample s30 is the smallest spectroscopic sample in this study because smaller samples had too few galaxies to be able to detect the filamentary network.

This metric can also be used to compare the results with observational data, but different filament finders and different filament definitions give results that are not directly comparable. For example, Tempel et al. (2014) found that when using the Bisous model on SDSS data, the fraction of galaxies in filaments is about 40%, but they use a stricter definition for whether a galaxy is considered to be in a filament or not. Also, results based on observational data are likely missing fainter galaxies that are present in simulations, which affects the fraction of galaxies in filaments.

It is a good idea to look at the spatial distribution of filaments produced by different samples to assess them visually. Figure 3 shows visit map slices from 12 different samples. The colour indicates the likelihood of a coordinate being inside a filament. The plot in the upper left corner is the sample we use as ground truth, the full spectroscopic sample. The vertical axis is parallel to the axis of photometric uncertainties and emulates the line of sight. The photometric-only samples in the left column show that photometric galaxies make it very difficult to detect filaments perpendicular to the line of sight. Only stretched-out filaments parallel to the line of sight remain. In the middle and rightmost columns, mixed samples are used. Including the spectroscopic galaxies helps detect filaments perpendicular to the line of sight. But even in the mixed samples, when photometric galaxies dominate, as in the lower rows, the filaments are preferentially parallel to the line of sight. This does not mean that filaments are parallel to the line of sight, but that these are the filaments the Bisous model is able to detect with the corresponding data.

Fig. 3.

Projections of maximum visit map values in slices obtained from the Bisous model using different samples. Only a smaller 100 Mpc × 100 Mpc area is shown for visual clarity. The thickness of the slice is 10 Mpc. Usually, a visit map limit of 0.05 is used to classify whether or not a coordinate is inside a filament. Therefore, everything besides the blue area is likely part of the filamentary network. The vertical axis (z) is parallel to the axis of the photometric uncertainties, i.e., it emulates the line of sight. The leftmost column shows samples with only photometric galaxies, the middle column shows mixed samples with medium uncertainties (σ = 5 Mpc) for photometric galaxies, and the rightmost column shows mixed samples with the larger uncertainties (σ = 10 Mpc). Different rows in the leftmost column have different photometric uncertainties, and the middle and the rightmost column have different fractions of the brightest galaxies as spectroscopic galaxies. See Table 1 and Sect. 2.3 for the sample naming convention used here.

Figure 3 shows that photometric galaxies, which have large uncertainties along the line of sight, suppress the detection of filaments perpendicular to the line of sight. To study this effect, we describe the distribution of angles between filament spines and the line of sight. These results are shown in Fig. 4. Again, the σ0 sample is the baseline for this work and shows a uniform distribution of angles. Using photometric-only samples skews the distribution closer to 1, meaning the filaments are mostly parallel to the line of sight, as is visible from the visit map projections in Fig. 3. Adding spectroscopic galaxies to the samples significantly reduces the bias of high cosine values in the distributions. This is also visible in Fig. 3, where more filaments are perpendicular to the z-axis in mixed samples.

Fig. 4.

Distributions of the cosine of the angle between filament spines (fil) and the line of sight (los). For each sample, there are two plots. The left one is a bar plot of the quartiles of the distribution, where the black crossbar indicates the second quartile (the median). The right plot is a violin plot that shows the density curve of the distribution. The horizontal grey line indicates the median value for a uniform distribution. The closer the distribution gets to value 1, the more filaments are parallel to the line of sight (z-axis in other plots).
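The orientation statistic can be illustrated with a short sketch. It assumes that spine directions are available as three-dimensional vectors, that the line of sight is the z-axis, and that the absolute value of the cosine is used because a filament spine has no preferred sign; the function name and the random test directions are hypothetical.

```python
import numpy as np


def cos_angle_to_los(directions, los=(0.0, 0.0, 1.0)):
    """Absolute cosine of the angle between each direction vector and the line of sight."""
    d = np.asarray(directions, dtype=float)
    los = np.asarray(los, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    los = los / np.linalg.norm(los)
    return np.abs(d @ los)


# Isotropically oriented directions: normalised Gaussian vectors are uniform on the sphere.
rng = np.random.default_rng(1)
isotropic = rng.normal(size=(100_000, 3))
cosines = cos_angle_to_los(isotropic)
median = np.median(cosines)
print(f"median |cos| for isotropic directions: {median:.2f}")
```

For isotropic orientations the absolute cosine is uniformly distributed between 0 and 1, so its median is close to 0.5; a median well above that would indicate a preference for directions parallel to the line of sight.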

When using the results obtained with a full spectroscopic sample σ0 as ground truth, we can compare other results to it and construct contingency tables called confusion matrices. We assign a binary label for each coordinate depending on the visit map value. If the visit map value is equal to or greater than 0.05, then that coordinate is classified as inside a filament. With each coordinate labelled, we can assign four kinds of results: true positive, true negative, false positive, and false negative. To describe the goodness of the results for sample s, we use two statistics: the recall

$$ \begin{aligned} \text{Recall}_s = \frac{{TP}_s}{{P}_{\sigma 0}}, \end{aligned} $$ (1)

where TPs is the number of true-positive values in the sample s, and Pσ0 the number of positive values in the reference sample σ0;

and the false discovery rate,

$$ \begin{aligned} \text{False discovery rate}_s = \frac{{FP}_s}{{P}_s}, \end{aligned} $$ (2)

where FPs is the number of false-positive values in the sample s, and Ps the number of all positive values in the sample s, which includes both the true-positive and false-positive values. Recall shows the fraction of filaments the model is able to find compared to the filaments present in results obtained with the sample σ0, which we want to maximise. The false discovery rate describes the fraction of false filaments in the results, which we want to minimise.
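The two statistics can be computed from a pair of visit maps as in the sketch below, using the 0.05 visit map limit stated above. The function name, the handling of empty maps, and the toy arrays are assumptions for illustration only.

```python
import numpy as np


def recall_and_fdr(visit_map_s, visit_map_ref, threshold=0.05):
    """Recall and false discovery rate of sample s against the reference visit map."""
    pred = np.asarray(visit_map_s) >= threshold   # positives in sample s
    ref = np.asarray(visit_map_ref) >= threshold  # positives in the reference sample
    tp = np.count_nonzero(pred & ref)
    fp = np.count_nonzero(pred & ~ref)
    p_ref = np.count_nonzero(ref)
    p_s = np.count_nonzero(pred)
    recall = tp / p_ref if p_ref else float("nan")
    fdr = fp / p_s if p_s else 0.0
    return recall, fdr


# Toy visit map values at six coordinates (made up for illustration).
reference = np.array([0.90, 0.50, 0.20, 0.00, 0.01, 0.30])
sample = np.array([0.80, 0.00, 0.10, 0.06, 0.00, 0.02])
recall, fdr = recall_and_fdr(sample, reference)
print(recall, fdr)
```

In this toy case, the reference has four positive coordinates and the sample recovers two of them, giving a recall of 0.5; one of the three positives in the sample is not in the reference, giving a false discovery rate of 1/3.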

Figure 5 shows the recall and the false discovery rates for different samples. As expected, the recall decreases monotonically when photometric uncertainties increase. Using mixed samples improves the recall even when using small fractions of spectroscopic galaxies. For example, this improvement can be seen when comparing the recalls of σ5 (0.45) and σ5s10 (0.54) or σ10 (0.27) and σ10s10 (0.40); in both mixed samples, only the brightest 10% of the galaxies are spectroscopic. Having 50% of the galaxies as spectroscopic boosts the recall above 0.73, which means almost three-quarters of the original filaments are detected. As seen in Fig. 5, the false discovery rate is below 0.05 for every sample. This shows that the Bisous model produces little noise and few false-positive values even with photometric redshift data.

Fig. 5.

Recall and false discovery rates for photometric and mixed samples. All the samples have the same total number of galaxies. The definitions for recall and false discovery rate are given in Sect. 4. The false discovery rate uses the secondary vertical axis on the right side of the plot. Including spectroscopic galaxies improves recall but also increases false discovery rates. The false discovery rates are below 5% for every sample.

In addition, we ran Bisous on mock data without using the method described in Sect. 3 and using the samples σ5, σ5s30, σ10, and σ10s30 as they are. This enables us to compare the Bisous model results obtained with the method in Sect. 3 with results obtained with photometric data without doing anything special to the photometric galaxies and ignoring photometric redshift errors. Table 2 lists the different statistics introduced in this section calculated for these Bisous runs. These results are calculated as a reference and motivation for using the method described in Sect. 3. In comparison to the samples introduced in Sect. 2.3 these results show significantly worse recall values and fewer galaxies in filaments. In some cases, the false discovery rate can have better results, but this comes from the fact that when detecting fewer filaments, there are also fewer false-positive results and therefore a lower false discovery rate.

Table 2.

Comparison of Bisous model results with and without using the method described in Sect. 3.

These results qualitatively confirm the results from Kruuse et al. (2019), showing that galaxies with photometric redshifts are clustered around the Bisous filaments. We show that the Bisous model can use photometric redshift data to detect the filamentary network without producing significant amounts of false-positive results. However, when the uncertainties in the distance measure increase, the model is able to recall fewer filaments. For example, with sample σ10, the recall is only 0.27, and mostly filaments parallel to the line of sight are detected. Including spectroscopic galaxies in the samples considerably improves the recall and helps to mitigate the issue with filament alignment in the detected filamentary network. Results in this work also qualitatively follow the results of Muru & Tempel (2021), where they show how the Bisous filaments depend on the number density of the galaxies in the input data. In this work, the mixed samples show a similar trend, and the photometric galaxies boost the number density of galaxies, although less than the same number of spectroscopic galaxies would.

5. Discussion

Previous works applied the Bisous model to SDSS, which is a spectroscopic survey, and compiled a catalogue of filaments (Tempel et al. 2014). This work extends the applicability of the model and demonstrates the effects of using data with photometric redshifts. The major benefit of photometric redshift data is their comparatively high availability, as measurements are made in bulk and not for single galaxies as in spectroscopic measurements. The problem with photometric redshift data is the significantly larger uncertainties in the redshifts, which result in larger uncertainties in distance measurements. This is problematic for filament finders.

To simulate the large uncertainties in distance measurements, we used simulation data to create data with added uncertainties. For simplicity, the uncertainties of all galaxies in a sample are generated with the same Gaussian distribution. In reality, the uncertainty depends on many properties, and one of the most relevant is the magnitude of the galaxy. However, the dependence of the uncertainty on the magnitude is different for different surveys. Also, using a simulation removes any redshift dependence in the data. In observations, there are two major redshift-dependent effects. Firstly, the number density of galaxies decreases with redshift as we are able to detect fewer galaxies on the fainter end, and this affects the ability of the Bisous model to detect filaments as shown in Muru & Tempel (2021). Secondly, the precision of the photometric redshift values for galaxies depends on their actual distance. These dependencies should be studied in greater depth when concentrating on specific surveys and are outside the scope of this study.

The method we use to overcome this problem of large uncertainties is straightforward. Essentially, we are just guessing the true positions. Each galaxy gets 80 different random positions based on the uncertainties of the redshift estimate. The theory behind this approach is that while random inaccurate positions produce noise, the positions close to the true position of the galaxy produce a strong enough signal to be above the noise level. Regardless of its simplicity, the method shows considerable improvements over results when not using this method (see Table 2).

Although this simple method improves the results, the problems introduced by using the photometric redshift data are still prevalent. Using photometric-only redshift data (σXX samples) results in part of the signal being lost and an incomplete filamentary network. This is visible from the recall values when compared against σ0 (Fig. 5), the fraction of galaxies in filaments (Fig. 2), and the projections of visit map values (Fig. 3). Another problem is that with larger uncertainties for distances, the filaments perpendicular to the line of sight are almost impossible to detect. This creates a strong bias for filaments parallel to the line of sight (cf. Fig. 4). It is important to note that the false discovery rate (cf. Fig. 5) decreases when data with larger uncertainties are used. This is because galaxies with larger uncertainties produce less meaningful signals, and therefore there will be fewer filaments in the results, which also means fewer false-positive filaments. Low false discovery rate values are good because they demonstrate the robustness of the results: the model tends to output fewer filaments rather than false-positive filaments.
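As an illustration of how the two quality measures relate, the following minimal sketch computes recall and false discovery rate from per-galaxy filament membership flags. It uses the standard textbook definitions, which may differ in detail from those given in Sect. 4; the function name and the boolean inputs are hypothetical and are not part of the Bisous code.

```python
import numpy as np

def recall_and_fdr(in_fil_ref, in_fil_sample):
    """Recall and false discovery rate from boolean membership flags.

    in_fil_ref    : True where a galaxy is in a filament in the reference sample
    in_fil_sample : True where the same galaxy is in a filament in the tested sample
    Both inputs are hypothetical per-galaxy flags (assumption for this sketch).
    """
    ref = np.asarray(in_fil_ref, dtype=bool)
    sam = np.asarray(in_fil_sample, dtype=bool)
    tp = np.sum(ref & sam)    # in filaments in both samples
    fp = np.sum(~ref & sam)   # in filaments only in the tested sample
    fn = np.sum(ref & ~sam)   # in filaments only in the reference sample
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    fdr = fp / (tp + fp) if (tp + fp) else float("nan")
    return recall, fdr
```

With these definitions, a sample that yields fewer filaments overall can have a low recall and a low false discovery rate at the same time, which is the behaviour described above.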

All of the aforementioned problems are reduced by using mixed samples of spectroscopic and photometric redshift data instead of only photometric data, as shown in Sect. 4. Figure 2 also shows that using mixed samples to boost the galaxy number density is better than using only the spectroscopic redshift galaxies. This could be useful, for example, in the more distant areas of spectroscopic surveys, where galaxies with spectroscopic redshifts are too sparse to use for the detection of the large-scale structure. Using mixed data could help us extend the area where we can reliably detect the filaments.

Still, this method requires photometric redshift data with relatively small uncertainties, which are not usually achieved by photometric surveys. Unfortunately, all current photometric surveys have unusably large redshift uncertainties, but some new surveys with suitable accuracy are expected in the near future. One prominent candidate for photometric redshift data is the upcoming Javalambre Physics of the Accelerating Universe Astrophysical Survey (J-PAS, Benitez et al. 2014; Bonoli et al. 2021; Laur et al. 2022). J-PAS is designed to measure the positions and redshifts of 14 million galaxies. The estimated precision of the photometric redshifts for galaxies in the redshift range 0.1 < z < 1.2 is σz ≲ 0.003(1 + z). For example, when using SDSS, the spectroscopic redshift galaxy number density is high enough to detect some filaments up to a distance of 400 Mpc, which is approximately z = 0.1 (Tempel et al. 2014; Muru & Tempel 2021). At this distance, the precision of the redshifts is σz ≲ 0.003 × 1.1 = 0.0033, which corresponds to a distance uncertainty of approximately 14 Mpc. This is the same order of magnitude as in the σ10 samples used in this work. We expect the uncertainties to be smaller for brighter galaxies. We aim to apply the Bisous model to J-PAS data when it is released and compile a catalogue of filaments. To obtain the mixed data of photometric and spectroscopic redshift galaxies, we plan to use the Sloan Digital Sky Survey (SDSS; Alam et al. 2015) and the Dark Energy Spectroscopic Instrument (DESI) Bright Galaxy Survey (BGS; Dey et al. 2019; Ruiz-Macias et al. 2021).
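The step from a redshift uncertainty to a distance uncertainty can be sketched with the low-redshift Hubble law, d ≈ cz/H0, so that σd ≈ c σz/H0. The snippet below is only an approximation under that assumption; the Hubble constant value and the function name are assumptions made for this sketch and are not taken from the text.

```python
C_KM_S = 299_792.458  # speed of light in km/s

def sigma_z_to_mpc(z, coeff=0.003, h0=67.8):
    """Approximate line-of-sight distance uncertainty in Mpc.

    Uses sigma_z = coeff * (1 + z) and the low-redshift Hubble law
    d = c * z / H0. The default H0 (km/s/Mpc) is an assumed value.
    """
    sigma_z = coeff * (1.0 + z)
    return C_KM_S * sigma_z / h0

# At z = 0.1 this gives a value of the order of 14 to 15 Mpc.
sigma_d = sigma_z_to_mpc(0.1)
```

The exact number depends on the adopted cosmology, but for any reasonable H0 the result stays at the level of the σ10 samples.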

Although this study is based on the Bisous filament finder, it is likely that the general tendencies when using data with photometric redshifts are similar for other filament finders. Using photometric data will decrease the effectiveness of the filament finder, and filaments parallel to the line of sight are more likely to be detected. It is uncertain whether using mixed data of photometric and spectroscopic redshifts improves the results compared to using only spectroscopic data when using other filament finders. Also, the false discovery rates might have different values for other filament finders. The advantage of the Bisous model is that it models the underlying filamentary network, and galaxies are only used to constrain the model properties. Hence, in the Bisous filament finder, it is straightforward to combine spectroscopic and photometric samples. Because the scale of the filaments is fixed in the Bisous model, we do not need to smooth the galaxy distribution, and the Bisous model is able to detect filaments with a specified scale regardless of the galaxy density.

As mentioned in Sect. 1, one common application for filaments is to study the alignment of galaxies and their host filaments. This means that obtaining accurate filament orientations from the data is essential. In future studies, we aim to improve the Bisous model to reduce the alignment bias of filaments when using data with photometric redshifts.
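The alignment bias can be quantified through the cosine of the angle between each filament spine and the line of sight, as in Fig. 4. The sketch below assumes that filament orientations are available as 3D direction vectors and that the absolute value of the cosine is used, because a spine has no preferred direction; both the input format and the function name are assumptions for illustration.

```python
import numpy as np

def abs_cos_to_los(directions, los=(0.0, 0.0, 1.0)):
    """Absolute cosine of the angle between direction vectors and the line of sight.

    directions : (n, 3) array of filament spine direction vectors (hypothetical input)
    los        : line-of-sight vector; the z-axis here, as in the samples of this work
    """
    d = np.asarray(directions, dtype=float)
    los = np.asarray(los, dtype=float)
    d_unit = d / np.linalg.norm(d, axis=1, keepdims=True)
    los_unit = los / np.linalg.norm(los)
    return np.abs(d_unit @ los_unit)

# Isotropic orientations: |cos| is uniformly distributed on [0, 1],
# so its median is close to 0.5.
rng = np.random.default_rng(1)
isotropic = rng.normal(size=(100_000, 3))
median_isotropic = np.median(abs_cos_to_los(isotropic))
```

A median well above 0.5 in a filament catalogue would indicate an excess of filaments parallel to the line of sight.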

6. Conclusions

Filament finders are limited, among other things, by the abundance of spectroscopic redshift data. This limits the sky areas and depths in which we can detect the filamentary network. Photometric redshift data can be obtained on shorter timescales because many objects can be observed simultaneously, and so many more photometric redshift data are available. We present a method that enables the Bisous filament finder to use data with considerable uncertainties in one coordinate, for example photometric redshift data. We use MULTIDARK-GALAXIES, a dark matter-only simulation with semi-analytical galaxies, to generate the data for analysis. Spectroscopic redshift data are simply the exact positions of galaxies from the simulation, and photometric redshift galaxies have a random error with a Gaussian distribution added along one axis, which represents the line of sight. This work uses three types of samples: firstly, spectroscopic samples with different magnitude cuts, which provide reference values for the other samples; secondly, photometric samples with distance errors generated using standard deviations from σ = 1 Mpc to 10 Mpc; and thirdly, mixed samples, in which the brightest 10%–50% of galaxies (depending on the sample) have spectroscopic redshifts, that is, exact distance measurements, and the other galaxies have distances with uncertainties. An overview of the samples used in this work is given in Sect. 2.3.
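The selection of spectroscopic galaxies in a mixed sample can be sketched as follows. This is a minimal illustration, assuming that each galaxy has a single magnitude value and that smaller magnitudes mean brighter galaxies; the function name and the rounding of the selected count are choices made for this sketch, not details from the paper.

```python
import numpy as np

def brightest_mask(magnitudes, fraction):
    """Boolean mask marking the brightest `fraction` of galaxies.

    magnitudes : per-galaxy magnitudes (hypothetical input; smaller = brighter)
    fraction   : fraction of galaxies treated as spectroscopic, e.g. 0.3 for s30
    """
    mags = np.asarray(magnitudes, dtype=float)
    n_spec = int(round(fraction * mags.size))
    order = np.argsort(mags)            # brightest galaxies first
    mask = np.zeros(mags.size, dtype=bool)
    mask[order[:n_spec]] = True
    return mask
```

Galaxies where the mask is `True` would keep their exact simulation distances, and the remaining galaxies would receive Gaussian distance errors.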

The Bisous model uses a marked point process to fit cylinder-like objects to the underlying galaxy distribution and optimises the distribution of objects based on the galaxy distribution and the interconnectedness of the cylinder network. To use the photometric redshift data with uncertainties along one axis, we modified the coordinates along that axis. Knowing the distribution of the uncertainties for the distance of photometric redshift galaxies, we use the same distribution to add a random value to the distance of a galaxy. For each Bisous run, we generated a new galaxy distribution, where each photometric redshift galaxy has a different random value added to its distance based on the uncertainty distribution. Each Bisous model uses 80 runs. The theory underpinning this approach is that those runs, where some galaxies have random distance values that are closer to their true distances, produce strong signals, and others with scrambled galaxy distributions just produce noise, which will be removed in the post-processing.
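The resampling step described above can be sketched as follows. This is a minimal illustration and not the Bisous implementation: it assumes that the line-of-sight coordinate is stored in Mpc, that all photometric galaxies share one Gaussian standard deviation, and that spectroscopic galaxies are flagged by a boolean mask. The function and argument names are hypothetical.

```python
import numpy as np

def perturbed_realisations(z_los, sigma, is_spec, n_runs=80, seed=0):
    """Return an (n_runs, n_gal) array of line-of-sight coordinates.

    z_los   : observed line-of-sight coordinates in Mpc (hypothetical input)
    sigma   : standard deviation of the distance uncertainty in Mpc
    is_spec : boolean mask, True for galaxies with spectroscopic redshifts
    n_runs  : number of realisations, one per Bisous run (80 in this work)
    """
    rng = np.random.default_rng(seed)
    z_los = np.asarray(z_los, dtype=float)
    is_spec = np.asarray(is_spec, dtype=bool)
    # Draw an independent Gaussian offset for every galaxy in every run.
    offsets = rng.normal(0.0, sigma, size=(n_runs, z_los.size))
    # Spectroscopic galaxies keep their measured distance in all runs.
    offsets[:, is_spec] = 0.0
    return z_los + offsets

# Example: one spectroscopic and two photometric galaxies, sigma = 10 Mpc.
runs = perturbed_realisations([100.0, 200.0, 300.0], 10.0, [True, False, False])
```

Each row of `runs` would then serve as the input galaxy distribution for one run of the filament finder; how the runs are combined and how the noise is removed in post-processing is not covered by this sketch.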

Using photometric-only samples shows that when uncertainties are very small (Gaussian distribution with σ = 1 Mpc or 2 Mpc), the Bisous model can find most of the same filaments as in the full spectroscopic sample σ0. Unfortunately, these uncertainties are unachievable for modern or even future planned photometric surveys. With larger uncertainties in the photometric-only samples, the ability to recall the filaments in the reference sample drops below 50%, and the filaments align with the line of sight. Using mixed samples of photometric and spectroscopic data helps to reduce these problems. As an example, we compare three samples: a spectroscopic-only sample s30, which uses only the brightest 30% of the galaxies; a photometric-only sample σ10, which uses data with errors generated with σ = 10 Mpc; and a mixed sample σ10s30, which uses the same standard deviation (σ = 10 Mpc) for errors and the same number of spectroscopic galaxies (30% of the whole sample). Using the spectroscopic data, which contain only the brightest 30% of the galaxies, results in 36% of galaxies being inside filaments. Using only photometric data, which contain all the galaxies but have uncertainties in one coordinate, we find that 20% of galaxies are inside filaments. Finally, using the mixed data, which contain more galaxies than the spectroscopic sample and, in contrast to the photometric sample, also include spectroscopic redshifts for the brightest 30% of the galaxies, the Bisous model finds that 40% of the galaxies are inside filaments. The reference value for these galaxies and this volume comes from the full spectroscopic sample, which gives a value of 71% of galaxies in filaments. Adding the spectroscopic galaxies from the sample s30 to the photometric sample σ10 increases the recall of filaments from 27% to 60%. This shows that using mixed data is beneficial when spectroscopic data are too sparse and photometric data have uncertainties that are too large for them to be used without spectroscopic data.

J-PAS is an upcoming photometric survey that is designed to produce data with sufficiently small uncertainties for a method such as the one in this article to be applicable. The expected precision of the redshifts is σz ≲ 0.003(1 + z) (Benitez et al. 2014). At a redshift of about z = 0.1, this is σz ≲ 0.003 × 1.1 = 0.0033, which corresponds to a distance uncertainty of approximately 14 Mpc and is close to the values used in this work. The next step is to apply the Bisous model to J-PAS data once available.


Acknowledgments

We thank the referee for their comments and suggested improvements. Part of this work was supported by institutional research funding PRG1006 of the Estonian Ministry of Education and Research. We acknowledge the support by the Centre of Excellence “Dark Side of the Universe” (TK133). Part of this work was carried out in the High-Performance Computing Center of the University of Tartu (University of Tartu 2018). The CosmoSim database used in this paper is a service by the Leibniz-Institute for Astrophysics Potsdam (AIP). The MULTIDARK database was developed in cooperation with the Spanish MultiDark Consolider Project CSD2009-00064. The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) and the Partnership for Advanced Supercomputing in Europe (PRACE, www.prace-ri.eu) for funding the MULTIDARK simulation project by providing computing time on the GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, www.lrz.de). The data exploration was done using TOPCAT (Taylor 2005), and analysis and plotting were done using Julia Language (Bezanson et al. 2017) and the following packages: DrWatson.jl (Datseris et al. 2020), Pluto.jl (van der Plas et al. 2022), Makie.jl (Danisch & Krumbiegel 2021), DataFrames.jl (White et al. 2020), Distributions.jl (Besançon et al. 2021), ColorSchemes.jl, which uses Scientific colour maps (Crameri 2021).

References

  1. Alam, S., Albareti, F. D., Allende Prieto, C., et al. 2015, ApJS, 219, 12
  2. Alpaslan, M., Driver, S., Robotham, A. S. G., et al. 2015, MNRAS, 451, 3249
  3. Aragón-Calvo, M. A., van de Weygaert, R., Jones, B. J. T., & van der Hulst, J. M. 2007, ApJ, 655, L5
  4. Beck, R., Dobos, L., Budavári, T., Szalay, A. S., & Csabai, I. 2016, MNRAS, 460, 1371
  5. Benitez, N., Dupke, R., Moles, M., et al. 2014, ArXiv e-prints [arXiv:1403.5237]
  6. Besançon, M., Papamarkou, T., Anthoff, D., et al. 2021, J. Stat. Softw., 98, 1
  7. Bezanson, J., Edelman, A., Karpinski, S., & Shah, V. B. 2017, SIAM Rev., 59, 65
  8. Bonoli, S., Marín-Franch, A., Varela, J., et al. 2021, A&A, 653, A31
  9. Cautun, M., van de Weygaert, R., & Jones, B. J. T. 2013, MNRAS, 429, 1286
  10. Crameri, F. 2021, https://doi.org/10.5281/zenodo.5501399
  11. Danisch, S., & Krumbiegel, J. 2021, J. Open Source Softw., 6, 3349
  12. Datseris, G., Isensee, J., Pech, S., & Gál, T. 2020, J. Open Source Softw., 5, 2673
  13. de Jong, R. S., Agertz, O., Berbel, A. A., et al. 2019, The Messenger, 175, 3
  14. Dey, A., Schlegel, D. J., Lang, D., et al. 2019, AJ, 157, 168
  15. Eisenstein, D. J., Weinberg, D. H., Agol, E., et al. 2011, AJ, 142, 72
  16. Ganeshaiah Veena, P., Cautun, M., Tempel, E., van de Weygaert, R., & Frenk, C. S. 2019, MNRAS, 487, 1607
  17. Klypin, A., Yepes, G., Gottlöber, S., Prada, F., & Heß, S. 2016, MNRAS, 457, 4340
  18. Knebe, A., Gill, S. P. D., Gibson, B. K., et al. 2004, ApJ, 603, 7
  19. Knebe, A., Stoppacher, D., Prada, F., et al. 2018, MNRAS, 474, 5206
  20. Kraljic, K., Davé, R., & Pichon, C. 2020, MNRAS, 493, 362
  21. Kruuse, M., Tempel, E., Kipper, R., & Stoica, R. S. 2019, A&A, 625, A130
  22. Kuutma, T., Tamm, A., & Tempel, E. 2017, A&A, 600, L6
  23. Laur, J., Tempel, E., Tamm, A., et al. 2022, A&A, 668, A8
  24. Lee, J., & Pen, U.-L. 2000, ApJ, 532, L5
  25. Libeskind, N. I., van de Weygaert, R., Cautun, M., et al. 2018, MNRAS, 473, 1195
  26. Muru, M. M., & Tempel, E. 2021, A&A, 649, A108
  27. Nevalainen, J., Tempel, E., Liivamägi, L. J., et al. 2015, A&A, 583, A142
  28. Planck Collaboration I. 2016, A&A, 594, A1
  29. Ruiz-Macias, O., Zarrouk, P., Cole, S., et al. 2021, MNRAS, 502, 4328
  30. Sousbie, T. 2011, MNRAS, 414, 350
  31. Taylor, M. B. 2005, ASP Conf. Ser., 347, 29
  32. Tempel, E., & Libeskind, N. I. 2013, ApJ, 775, L42
  33. Tempel, E., Stoica, R. S., Martínez, V. J., et al. 2014, MNRAS, 438, 3465
  34. Tempel, E., Guo, Q., Kipper, R., & Libeskind, N. I. 2015, MNRAS, 450, 2727
  35. Tempel, E., Stoica, R. S., Kipper, R., & Saar, E. 2016, Astron. Comput., 16, 17
  36. Tuominen, T., Nevalainen, J., Tempel, E., et al. 2021, A&A, 646, A156
  37. University of Tartu 2018, https://doi.org/10.23673/ph6n-0144
  38. van der Plas, F., Dral, M., Berg, P., et al. 2022, https://doi.org/10.5281/zenodo.6916713
  39. Wang, P., Libeskind, N. I., Tempel, E., et al. 2020, ApJ, 900, 129
  40. White, J. M., Kamiński, B., powerdistribution, et al. 2020, https://doi.org/10.5281/zenodo.4282946
  41. Zentner, A. R., Kravtsov, A. V., Gnedin, O. Y., & Klypin, A. A. 2005, ApJ, 629, 219

All Tables

Table 1.

Photometric distance uncertainties and percentage of spectroscopic galaxies in each sample.

Table 2.

Comparison of Bisous model results with and without using the method described in Sect. 3.

All Figures

Fig. 1.

Projection of galaxy distributions of samples σ0, σ5, and σ10 in a slice with a thickness of 10 Mpc. Each dot represents a galaxy. The photometric uncertainties are parallel to the z-axis, which also defines the line of sight in this work. Only an area of 150 Mpc × 150 Mpc is shown for visual clarity. For information about the samples, see Sect. 2.3.

Fig. 2.

Fraction of galaxies in filaments for different spectroscopic-only, photometric-only, and mixed samples. The samples are ordered so that the y-axis values of photometric-only and mixed samples are in increasing order. The spectroscopic-only samples are used for reference values to show the increase in the fraction of galaxies in filaments for mixed samples. The sample s30 is the smallest spectroscopic sample in this study because smaller samples had too few galaxies to be able to detect the filamentary network.

Fig. 3.

Projections of maximum visit map values in slices obtained from the Bisous model using different samples. Only a smaller 100 Mpc × 100 Mpc area is shown for visual clarity. The thickness of the slice is 10 Mpc. Usually, a visit map limit of 0.05 is used to classify whether or not a coordinate is inside a filament. Therefore, everything besides the blue area is likely part of the filamentary network. The vertical axis (z) is parallel to the axis of the photometric uncertainties, i.e., it emulates the line of sight. The leftmost column shows samples with only photometric galaxies, the middle column shows mixed samples with medium uncertainties (σ = 5 Mpc) for photometric galaxies, and the rightmost column shows mixed samples with the larger uncertainties (σ = 10 Mpc). Different rows in the leftmost column have different photometric uncertainties, and the middle and the rightmost column have different fractions of the brightest galaxies as spectroscopic galaxies. See Table 1 and Sect. 2.3 for the sample naming convention used here.

Fig. 4.

Distributions of the cosine of the angle between filament spines (fil) and the line of sight (los). For each sample, there are two plots. The left one is a bar plot of the quartiles of the distribution, where the black crossbar indicates the second quartile (the median). The right plot is a violin plot that shows the density curve of the distribution. The horizontal grey line indicates the median value for a uniform distribution. The closer the distribution gets to value 1, the more filaments are parallel to the line of sight (z-axis in other plots).

Fig. 5.

Recall and false discovery rates for photometric and mixed samples. All the samples have the same total number of galaxies. The definitions for recall and false discovery rate are given in Sect. 4. The false discovery rate uses the secondary vertical axis on the right side of the plot. Including spectroscopic galaxies improves recall but also increases false discovery rates. The false discovery rates are below 5% for every sample.
