The GW-Universe Toolbox II: constraining the binary black hole population with second and third generation detectors

We employ the method used by the GW-Universe Toolbox to generate synthetic catalogues of detections of stellar-mass binary black hole (BBH) mergers. We study advanced LIGO (aLIGO) and the Einstein Telescope (ET) as representatives of the second and third generation of GW observatories, and investigate how GW observations of BBHs can be used to constrain the merger rate as a function of redshift and masses. We also simulate observations from a detector that is half as sensitive as the ET at design, representing an early phase of ET. Two methods are used to obtain constraints on the source population properties from the catalogues: the first assumes a parametric differential merger rate model and applies Bayesian inference on its parameters; the second is non-parametric and uses weighted kernel density estimators. The results show the overwhelming advantages of the third-generation detector over the second generation for the study of BBH population properties, especially at redshifts higher than ~2, where the merger rate is believed to peak. With the simulated aLIGO catalogue, the parametric Bayesian method can still place some constraints on the merger rate density and mass function beyond the detecting horizon, while the non-parametric method loses its constraining ability completely there. We also find that, although the number of detections with half-ET can easily match that of the full ET given a longer observation duration, the catalogue from the full ET still gives much better constraints on the population properties, owing to its smaller uncertainties on the physical parameters of the GW events.


Introduction
The era of gravitational wave (GW) astronomy began with the first detection of a binary black hole (BBH) merger in 2015 (Abbott et al. 2016). There have been ∼ 90 BBH systems discovered in ∼ 3 years of observations in the LIGO/Virgo O1/O2/O3 runs (GWTC-1 and -2 catalogues: Abbott et al. 2019b, 2020). This number is catching up on that of Galactic black holes (BHs) discovered in X-ray binaries during the last half century (Corral-Santana et al. 2016). LIGO and Virgo are being upgraded towards their design sensitivity, and new GW observatories such as the Kamioka Gravitational Wave Detector (KAGRA, Kagra Collaboration et al. 2019) and LIGO-India (Iyer et al. 2011) will join the network in the near future. Furthermore, there are plans for the next generation of GW observatories, such as the Einstein Telescope (ET, Punturo et al. 2010) and the Cosmic Explorer (CE, Reitze et al. 2019), which will push the detecting horizon forward significantly. With the expected explosion in the size of the BBH population, the community has been actively studying what science can be extracted from GW observations of BBHs. Applications include, for example, the cosmic merger rate density of BBHs (e.g. Vitale et al. 2019), the mass distribution of stellar-mass black holes (SMBHs; e.g. Kovetz 2017; Kovetz et al. 2017; Abbott et al. 2019a, 2021a), the rate of stellar nucleosynthesis (e.g. Farmer et al. 2020), and constraints on the expansion history of the Universe. Simulating a catalogue of observations was the crucial step in this latter work. In all of the literature mentioned above, the simulation is performed for specific detectors and observation durations, and the resulting catalogues are difficult to extend to other uses. Therefore, we are building a set of simulation tools with the flexibility to be used with different detectors, observation durations, source populations, and cosmological models.
The GW-Universe Toolbox is such a tool set; it simulates observations of the GW Universe (including populations of stellar-mass compact binary mergers, supermassive black hole binary inspirals and mergers, extreme-mass-ratio inspirals, etc.) with different types of GW detectors, including Earth-based and space-borne laser interferometers and pulsar timing arrays. It is accessible as a website (gw-universe.org), or can be run locally as a Python package. For a detailed overview of the functionalities and methods of the GW-Universe Toolbox, see Yi et al. (2021), referred to as Paper-I hereafter.
Here, we present one application of the GW-Universe Toolbox, showing how the cosmic merger rate and mass function of BHs can be constrained by GW observations using advanced LIGO (aLIGO, at design sensitivity) and ET. The current GW-Universe Toolbox can only simulate observations with single detectors. A network of detectors is expected to be more efficient in the following respects: (1) the total S/N of a jointly observed GW source would be higher than (but of the same order of magnitude as) that from a single detector; and (2) the uncertainties on the source parameters would be smaller. In the current GW-Universe Toolbox, the duty cycle of detectors is assumed to be 100%. In a more realistic simulation, where the duty cycles of single detectors are not 100%, a detector network also benefits from a longer time coverage. The cosmic merger rate density, or the volumetric merger rate as a function of redshift, R(z), provides valuable information on the history of the star formation rate (SFR), which is crucial for understanding the evolution of the Universe, for example in terms of the chemical enrichment history and the formation and evolution of galaxies (Madau & Dickinson 2014). The SFR as a function of redshift has been studied for decades, mainly with UV, optical, and infrared observations of galaxies at different redshifts (Serjeant et al. 2002; Doherty et al. 2006; Thompson et al. 2006; Carilli et al. 2008; Magdis et al. 2010; Rujopakarn et al. 2010; Ly et al. 2011; Sedgwick et al. 2011; Kennicutt & Evans 2012; Curtis-Lake et al. 2013; Madau & Dickinson 2014). Such electromagnetic (EM) probes are limited by dust extinction, and the completeness of samples at high redshift is difficult to estimate (e.g. Maiolino & Mannucci 2019). Moreover, the connection between the direct observables and the SFR history is model dependent on multiple levels (see e.g. Chruślińska et al. 2021). GWs provide a unique new probe that suffers less from the above-mentioned limitations.
Moreover, R(z) also depends on the distribution of delay times τ, that is, the time lag between the formation of the binary of massive stars and the merger of the two compact objects (e.g. Belczynski et al. 2010; Kruckow et al. 2018). Therefore, a combination of the SFR from EM observations and R(z) from GWs can improve our knowledge of the delay-time distribution P(τ). P(τ), together with the primary mass function p(m_1) (where m_1 is the mass of the primary BH) of BBHs, helps us to better understand the physics of massive binary evolution, for example in the common-envelope scenario (e.g. Belczynski et al. 2016; Kruckow et al. 2018).
Several previous works have studied the prospect of using simulated GW observations to study R(z) and p(m_1) of BBHs. Vitale et al. (2019) show that with three months of observation with a third-generation detector network, R(z) can be constrained to high precision up to z ∼ 10. In their work, the authors did not take into account the signal-to-noise ratio (S/N) limits of the detectors, and instead assumed that all BBH mergers in the Universe are in the detected catalogue. Kovetz et al. (2017) studied how observations with aLIGO can be used to place constraints on p(m_1). These authors assumed a certain parameterisation of p(m_1), and used an approximate method to obtain the covariance of the parameters based on a simulated catalogue of aLIGO observations. Recently, Singh et al. (2021) simulated observations with ET of compact object binary mergers, taking into account the effect of the rotation of the Earth. They found that R(z) and the mass distribution can be so well constrained from ET observations that mergers from different populations can be distinguished.
In this work, we simulate the observed catalogues of BBH mergers for ten years of aLIGO observations at design sensitivity and one month of ET observations using the GW-Universe Toolbox. We also simulate observations from a detector that is half as sensitive as the ET at design, representing the early phase of ET. The paper is organised as follows. In section 2, we summarise the method used by the GW-Universe Toolbox to simulate the catalogue of BBH mergers. In section 3, we use the synthetic catalogues to set constraints on the cosmic merger rate and mass function of the population; we employ two ways to constrain R(z) along with the mass function of the primary BH. The first assumes parametric forms of R(z) and the mass function and uses Bayesian inference on their parameters; the other does not assume a parametric formula and uses weighted kernel density estimators. The results are presented and compared. We conclude and discuss in section 4.

Method
We use the GW-Universe Toolbox to simulate observations of BBHs with three different detectors, namely aLIGO (design sensitivity), ET, and 'half-ET'. The noise power spectra S_n for aLIGO and ET are obtained from the LIGO website. The 'half-ET' represents our expectation of the performance of ET in its early phase. In figure 1, we plot √S_n as a function of frequency for aLIGO, ET at its design sensitivity, and half-ET. For the S_n of half-ET, we multiply the S_n of ET by a factor of 10.89, which gives a noise curve halfway between those of the design aLIGO and ET. Here we give a brief summary of the method; for details, we refer to Yi et al. (2021). We first calculate a function D(z, m_1, m_2), which represents the detectability of a BBH with redshift z and primary and secondary masses m_1 and m_2. Here, D(z, m_1, m_2) depends on the antenna pattern and S_n of the detector, and on the user-designated S/N threshold. The other ingredient is the merger rate density distribution of BBHs in the Universe, ṅ(z, m_1, m_2). In the GW-Universe Toolbox, we implement two parameterised functional forms for ṅ(z, m_1, m_2), namely pop-A and pop-B (see Yi et al. 2021 for a description of the population models). For a more sophisticated ṅ(z, m_1, m_2) from population synthesis simulations, see e.g. Belczynski et al. (2002). The distribution of detectable sources, which we denote N_D(z, m_1, m_2), can be calculated with:

N_D(Θ) = T ṅ(Θ) D(Θ), (1)

where T is the observation duration and Θ = (z, m_1, m_2). An integration of N_D(Θ) gives the expectation of the total number of detections:

λ = ∫ N_D(Θ) dΘ. (2)

The actual total number in a realisation of the simulated catalogue is a Poisson random variable around this expectation. The synthetic catalogue of GW events is drawn from N_D(Θ) with a Markov chain Monte Carlo (MCMC) algorithm. We can also provide a rough estimation of the uncertainties on the intrinsic parameters of each event in the synthetic catalogue using the Fisher information matrix (FIM). Previous studies (e.g. Rodriguez et al. 2013) found that the FIM can sometimes severely overestimate the uncertainties, especially when the S/N is close to the threshold. We implement a correction whereby, if the relative uncertainty of the masses calculated with the FIM is larger than 20%, δm_{1,2} = 0.2 m_{1,2} is applied instead.
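The FIM correction above amounts to capping the relative mass error at 20%; a minimal sketch (the function name is ours, not Toolbox code):

```python
def corrected_mass_error(m, dm_fim, max_rel=0.2):
    """Cap the FIM mass uncertainty at a 20% relative error.

    If the relative uncertainty from the Fisher information matrix
    exceeds max_rel, the value 0.2 * m is applied instead, as
    described above.
    """
    return dm_fim if dm_fim <= max_rel * m else max_rel * m
```

For example, a 30 M⊙ component whose FIM error is 10 M⊙ would be assigned an uncertainty of 6 M⊙, while a 3 M⊙ FIM error would be kept as is.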
The uncertainty on the redshift is correlated with those of other extrinsic parameters, such as the localisation error. The actual localisation uncertainties are largely determined by the triangulation error of the detector network, which cannot yet be estimated with the current version of the GW-Universe Toolbox. As a result, we do not have a reliable way to accurately estimate the errors on redshift. As an ad hoc method, we assign δz = 0.5z as a typical representative of the uncertainties for aLIGO, and δz = 0.017z + 0.012 for ET (Vitale & Evans 2017). For half-ET, we assign δz = 0.1z + 0.01.
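These ad hoc prescriptions can be collected in a small helper (a direct transcription of the formulae above, not Toolbox code):

```python
def delta_z(z, detector):
    """Ad hoc redshift uncertainties adopted in this work."""
    if detector == "aLIGO":
        return 0.5 * z
    if detector == "ET":
        return 0.017 * z + 0.012   # Vitale & Evans (2017)
    if detector == "half-ET":
        return 0.1 * z + 0.01
    raise ValueError(f"unknown detector: {detector}")
```

At z = 1, for instance, this gives δz = 0.5 for aLIGO but only 0.029 for ET, which already anticipates the difference in constraining power discussed below.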

Catalogues
Using the above-mentioned method, we generate synthetic catalogues corresponding to 10 years of aLIGO observations, 1 month of ET observations, and 1 month of observations with half-ET, along with the corresponding marginalised N_D(Θ). The numbers of events in the catalogues are 2072, 1830, and 889, respectively. The catalogues agree with the theoretical number density, which validates the MCMC sampling process. The underlying population model for ṅ(z, m_1, m_2) is pop-A, with parameters R_n = 13 Gpc^-3 yr^-1, τ = 3 Gyr, µ = 3, c = 6, γ = 2.5, m_cut = 95 M_⊙, q_cut = 0.4, and σ = 0.1. These parameters are chosen because they result in simulated catalogues that are compatible with real observations (see Yi et al. 2021).
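The catalogue drawing can be illustrated in one dimension. The Toolbox samples the full N_D(z, m_1, m_2) with MCMC; the sketch below instead uses inverse-CDF sampling of a tabulated marginal N_D(z), with a placeholder rate shape (both are simplifying assumptions, not the Toolbox implementation):

```python
import numpy as np

def draw_catalogue_redshifts(z_grid, N_D_z, rng):
    """Draw one catalogue realisation from a tabulated marginal N_D(z).

    The expected total number of detections is the integral of N_D(z)
    (eq. 2, marginalised to z); the realised number is Poisson around it.
    """
    dz = z_grid[1] - z_grid[0]
    lam = float(np.sum(N_D_z) * dz)      # expected number of detections
    n = rng.poisson(lam)                 # realised catalogue size
    cdf = np.cumsum(N_D_z)
    cdf = cdf / cdf[-1]                  # normalised CDF on the grid
    return np.interp(rng.uniform(size=n), cdf, z_grid)
```

The same Poisson scatter around λ is why repeated realisations of, say, the aLIGO catalogue would not all contain exactly 2072 events.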

Constraints on R(z) and p(m 1 ) from BBH catalogues
In this section, we study how well R(z) and p(m_1) can be constrained with BBH catalogues from observations with second- and third-generation GW detectors. Two distinct methods are used.

Bayesian method
In the first method, we use a parameterised form of ṅ(Θ|B). Here, B are the parameters describing the population rate density, which are B = (R_0, τ, µ, c, γ, q_cut, σ) in our case. Θ^k are the physical parameters of the k-th source in the catalogue, and we use {Θ} to denote the whole catalogue.
The posterior probability of B given an observed catalogue {Θ}_obs is:

p(B|{Θ}_obs) ∝ p(B) p(N|λ(B)) ∏_{k=1}^{N} p(Θ^k_obs|B), (3)

where p(N|λ(B)) is the probability of detecting N sources if the expected total number is λ, which is a Poisson function:

p(N|λ) = λ^N e^{-λ} / N!. (4)

The probability of detecting a source k with observed parameters Θ^k_obs is:

p(Θ^k_obs|B) = ∫ p(Θ^k_obs|Θ^k_true) p(Θ^k_true|B) dΘ^k_true. (5)

We further assume that the observational errors are Gaussian, and therefore p(Θ^k_obs|Θ^k_true) = p(Θ^k_true|Θ^k_obs). Taking this symmetric relation into equation (5), we have:

p(Θ^k_obs|B) = ∫ p(Θ^k_true|Θ^k_obs) p(Θ^k_true|B) dΘ^k_true. (6)

The above integration is equivalent to the average of p(Θ^k_true|B) over all possible Θ^k_true. Therefore,

p(Θ^k_obs|B) = ⟨p(Θ^k_true|B)⟩, (7)

where ⟨···⟩ denotes an average over a sample of Θ^k drawn from a multivariate Gaussian characterised by the observational uncertainties. Inserting equation (7) into equation (3) gives the posterior that we sample. For the prior distribution p(B), we use log-normal distributions for τ, R_0, c, γ, and µ, centred at the true values and with standard deviations wide enough that the posteriors are not influenced by them. The posterior probability distributions of the parameters are obtained with MCMC sampling from the posterior in equation (3). We keep the nuisance parameters q_cut and σ fixed to their true values in order to reduce the necessary length of the chains; the posteriors of the parameters of interest would not be influenced significantly by leaving these free. In Figure 5, we plot the MCMC chains sampled from the posterior distribution of the population parameters; the chains are well mixed, and thus serve as a good representation of the underlying posterior distribution. In Figure 6, we plot the correlations between the parameters of the redshift dependence (R_n, τ) and between the parameters of the mass function (µ, c, γ). We find strong correlations between the R(z) parameters τ and R_n, and also among the p(m_1) parameters γ, c, and µ. The correlations between the parameters of the redshift dependence and those of the mass function are small and are therefore not plotted.
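Equation (7) can be evaluated with a simple Monte Carlo average: draw samples of Θ^k_true from the Gaussian error kernel of the event and average the population density over them. A one-dimensional sketch (the population density `pop_pdf` passed in is a placeholder, not the pop-A model):

```python
import numpy as np

def event_likelihood(theta_obs, sigma_obs, pop_pdf, rng, n_samp=2000):
    """p(Theta_obs | B) ~= <p(Theta_true | B)>, as in eq. (7):
    average the population density over Gaussian-perturbed samples."""
    samples = rng.normal(theta_obs, sigma_obs, size=n_samp)
    return float(np.mean(pop_pdf(samples)))
```

The full log-posterior then adds the Poisson term log p(N|λ(B)) and the log-prior, and is what the MCMC sampler explores.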
In figure 7, we plot the inferred differential merger rate as a function of redshift and of the mass of the primary BH. The bands correspond to the 95% confidence level of the posterior distribution of the population parameters. We can see from the panels of figure 7 that, although 10 yr of aLIGO observations and 1 month of ET observations result in a similar number of detections, ET still obtains much better constraints on both R(z) and p(m_1). This is due to (1) the much smaller uncertainties on the parameters with ET observations compared with those with aLIGO, and (2) the fact that the catalogue produced from observations with ET covers larger ranges in redshift and mass than that of aLIGO. For half-ET and ET, at low redshift and high mass the two detectors yield a similar number of detections, and therefore they give comparable constraints on R(z) at low redshift and on p(m_1) at large mass. On the other hand, at high redshift and low mass, the full ET observes more events and with lower uncertainties, and therefore gives better constraints than half-ET.
We see from figure 2 that the maximum redshift in the aLIGO catalogue is z ∼ 1.5, while R(z) can still be constrained at z > 1.5, and the constraint even becomes more stringent towards larger redshift. This constraint on R(z) at large redshift does not come from the observed catalogue, but is intrinsically imposed by the model assumed in the Bayesian method: we fixed the star formation rate as a function of redshift so that ψ(z_f) peaks at z ∼ 2, and the peak of the BBH merger rate can therefore only appear later.

Non-parametric method
From equation (1), we know that

ṅ(Θ) = N_D(Θ) / (T D(Θ)), (9)

and N_D(Θ) can be inferred from the observed catalogue. Marginalising both sides of equation (9) over m_1 and m_2, we have

R(z) = I(z) / T, (10)

where I(z) is the 1D intensity function (number density) over z calculated from the observed catalogue, with weights w_i equal to the corresponding 1/D(Θ_i). The intensity can be calculated with an empirical density estimator of the (synthetic) catalogue or with a (non-normalised) weighted kernel density estimator (KDE; see appendix A). By simulating a number of different realisations of the Universe population inferred from I(z), the confidence intervals of the estimated I(z) are obtained.
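A weighted KDE intensity of this kind can be sketched with scipy (an illustration, not the Toolbox implementation). The estimate is fitted in log space and transformed back with the Jacobian 1/x, so that no intensity leaks onto negative support:

```python
import numpy as np
from scipy.stats import gaussian_kde

def intensity(x_obs, weights, x_grid):
    """Non-normalised weighted KDE: integrates to the sum of the weights.

    Fitted in log x and transformed back (Jacobian 1/x), so the
    estimate has no support at x <= 0.
    """
    kde = gaussian_kde(np.log(x_obs), weights=weights)
    return weights.sum() * kde(np.log(x_grid)) / x_grid
```

With x = z and weights 1/D(Θ_i), dividing the result by T gives the merger rate of equation (10).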
Here, in order to include the observational uncertainties and at the same time give an estimate of the confidence intervals of the KDE, a specific bootstrap is employed, as follows:

1. Start from the catalogue of true parameters {Θ^k_true}, where Θ^k_true denotes the true parameters of the k-th event in the catalogue, including its m_1, m_2, and z, and their uncertainties δm_1, δm_2, and δz. Here, k goes from 1 to N, where N is the total number of events.
2. Shift the true values Θ^k_true according to a multivariate Gaussian centred on the true values and with covariance matrix given by the corresponding uncertainties. A new catalogue {Θ^k_obs} is thereby obtained.
3. Estimate the detection probabilities D(Θ^k_obs).
4. Calculate the (non-normalised) weighted KDE of {Θ^k_obs}, with weights 1/D(Θ^k_obs). The intensity is estimated in log space and then transformed back to the original support; this ensures that there is no leakage of the intensity onto negative support.
5. Take M ≡ Σ_{k=1}^{N} 1/D(Θ^k_obs) as an estimate of the total number of mergers in the Universe in the parameter space.
6. Generate a realisation M' according to a Poisson distribution with expected value M.
7. Draw a sample of size M' from the KDE calculated in step 4. This is the equivalent of generating a new Universe sample based on the estimate of the distribution in the Universe from the observations.
8. For each of the events in the above sample, calculate its detection probability. Draw M' uniform random values U between 0 and 1, and drop the events whose detectability is less than U. A smaller sample is left, which represents a new realisation of the detected catalogue.
9. Estimate the covariance Σ_{Θ^k} for each event of the generated data set using interpolation of the original catalogue, and shift the events according to a multivariate Gaussian centred on each of them with covariance matrix Σ_{Θ^k}.
10. Estimate their detection probabilities.
11. Calculate the (non-normalised) weighted KDE of the above-mentioned sample.
12. Repeat steps 6 to 11 a large number of times B (we find B = 200 is large enough), and calculate the confidence intervals according to the assigned quantiles.
13. Based on the results above, correct for the bias introduced by the measurement uncertainties: we estimate the bias between the bootstrap intensities and the intensity estimated on the observed sample, and use it to rescale the estimated intensity of the observed sample.
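The bootstrap loop can be sketched in one dimension (redshift only). The detection-probability function `D` and the error interpolation are simplified stand-ins for the full multivariate procedure, and the KDE resampling is written out explicitly:

```python
import numpy as np
from scipy.stats import gaussian_kde

def weighted_log_kde(z, w, z_grid):
    """Steps 4/11: non-normalised weighted KDE, fitted in log z."""
    kde = gaussian_kde(np.log(z), weights=w)
    return w.sum() * kde(np.log(z_grid)) / z_grid

def bootstrap_intensity(z_obs, dz_obs, D, z_grid, rng, B=200):
    """Steps 5-12: bootstrap confidence band on the intensity I(z)."""
    w = 1.0 / D(z_obs)
    I_obs = weighted_log_kde(z_obs, w, z_grid)
    log_z = np.log(z_obs)
    kde = gaussian_kde(log_z, weights=w)
    bw = np.sqrt(kde.covariance[0, 0])       # KDE bandwidth in log z
    M_exp = w.sum()                          # step 5: total mergers estimate
    order = np.argsort(z_obs)
    boots = []
    for _ in range(B):
        M = rng.poisson(M_exp)               # step 6: Poisson realisation
        # step 7: resample a new "Universe" from the weighted KDE
        idx = rng.choice(log_z.size, size=M, p=w / w.sum())
        z_new = np.exp(log_z[idx] + rng.normal(0.0, bw, size=M))
        # step 8: keep only events that pass a detectability coin flip
        z_det = z_new[rng.uniform(size=M) < D(z_new)]
        # step 9: scatter by uncertainties interpolated from the catalogue
        dz = np.interp(z_det, z_obs[order], dz_obs[order])
        z_det = np.abs(rng.normal(z_det, dz))
        w_b = 1.0 / D(z_det)                 # step 10
        boots.append(weighted_log_kde(z_det, w_b, z_grid))  # step 11
    lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)      # step 12
    return I_obs, lo, hi
```

The bias correction of step 13 would compare the median of the bootstrap intensities with `I_obs` and rescale accordingly.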
Similarly, the merger rate in the Universe as a function of the mass of the primary BH is

R(m_1) = n_tot p(m_1) = I(m_1) / T, (11)

where p(m_1) is the normalised mass function of the primary BH, n_tot is the total number of BBH mergers in the Universe per unit local time, and I(m_1) is the 1D intensity function of m_1, calculated from the observed catalogue. The procedure for obtaining I(m_1) is the same as that used to calculate I(z). Figures 8 and 10 show R(z) and R(m_1) reconstructed with equations (10) and (11) from the synthetic aLIGO, ET, and half-ET catalogues. The shaded areas are the 95% confidence intervals.
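Equation (11) splits the intensity into a total rate n_tot and a normalised mass function p(m_1); a sketch using the same log-space weighted KDE (the grids and detection probabilities are illustrative assumptions):

```python
import numpy as np
from scipy.stats import gaussian_kde

def rate_vs_mass(m1_obs, D_vals, T, m_grid):
    """R(m1) = I(m1)/T = n_tot * p(m1), as in eq. (11)."""
    w = 1.0 / D_vals
    kde = gaussian_kde(np.log(m1_obs), weights=w)
    I_m = w.sum() * kde(np.log(m_grid)) / m_grid  # non-normalised intensity
    n_tot = w.sum() / T                           # total merger rate
    p_m1 = I_m / w.sum()                          # normalised mass function
    return I_m / T, n_tot, p_m1
```

The decomposition is exact by construction: the returned rate equals n_tot times the normalised mass function at every grid point.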
We note in figure 8 that, for aLIGO observations, the obtained R(z) at z > 0.3 underestimates the underlying model. This is because the range of BBH masses detectable by aLIGO becomes increasingly incomplete towards higher redshift. An intuitive illustration is shown in Figure 9, where we plot the 2D merger rate density as a function of m_1 and z. The black region marks the part of parameter space where the detectability is essentially zero, that is, the region beyond aLIGO's horizon. When plotting Figure 9, we set the cut at D(z, m_1, m_2 = m_1) < 10^-2. We can see that the fraction of the mass spectrum covered by the black region increases towards high redshift, which corresponds to the fact that BBHs with small masses cannot be detected by aLIGO at higher redshifts.
Similarly, in figure 10, we see that R(m_1) deviates from the theory towards low m_1. To compare the estimate with the model, we limit both the aLIGO catalogue and the theory to the range z < 0.6. The meaning of R(m_1) is therefore no longer the merger rate density of the complete Universe, but that of the local Universe up to z = 0.6, which is calculated with:

R(m_1)|_{z<0.6} = p(m_1) ∫_0^{0.6} R(z) dz, (12)

where p(m_1) is the normalised mass function. These results are unlike those obtained with the Bayesian inference method, where constraints can still be placed in regimes with no detections. The constraints on R(z) and the mass function are also less tight with the non-parametric method than with the parametric one. The difference is rooted in the degree of prior knowledge assumed about the population model in the two methods. In the parametric method, we assume functional forms that are exactly the same as those of the underlying population model used to generate the simulated catalogue; this represents an overly optimistic extreme, because in reality we would never be able to guess a perfect functional representation of the underlying population. In the non-parametric method, on the other hand, we impose no assumptions on the properties of the population model; this represents a conservative extreme, because in reality we do have a good idea of the general shape or trends of the population.

Conclusion and Discussion
In this paper, we describe the details of how the GW-Universe Toolbox simulates observations of BBH mergers with different ground-based GW detectors. As a demonstration of its scientific application, we use synthetic catalogues produced with this software to reconstruct and constrain the cosmic merger rate and mass function of the BBH population. The comparison between the results from 1 month of (half-)ET observations and 10 yr of aLIGO observations reveals the overwhelming advantages of the third-generation detectors over the second generation, especially at redshifts higher than ∼ 2, which is approximately where the cosmic merger rate is believed to peak. The reconstruction and constraint of R(z) and p(m_1) are performed with two methods, namely (1) a Bayesian method assuming parametric formulae for R(z) and p(m_1), and (2) a KDE method with non-parametric R(z) and p(m_1). For the catalogue of ET observations, the results from both methods are qualitatively the same. However, for aLIGO, the Bayesian method can put constraints on R(z) beyond the detecting limit, where the non-parametric method loses its ability completely. The m_1 dependence is also much more accurately reconstructed by the Bayesian method than by the non-parametric method for the aLIGO catalogue. The difference is due to the extra information introduced by assuming a specific parameterisation of the population model in the Bayesian method. In our example, the underlying R(z) and p(m_1) used to generate the catalogues are the same as those assumed in the Bayesian method, while in the KDE method no assumptions are made about the general shape of R(z) and p(m_1), not even about their smoothness. These represent the two extreme situations in the reconstruction of the differential merger rate.
In reality, the underlying shapes of R(z) and p(m_1) are unknown, and they could be very different from those used here. The parameterised Bayesian method may then give biased inferences and underestimated uncertainties. Furthermore, any unexpected structures in R(z) and p(m_1) cannot be recovered with the parameterised Bayesian method; for instance, there could be additional peaks in the BH mass spectrum corresponding to pulsational pair-instability supernovae or primordial BHs. On the other hand, although the functional form of the merger rate is unknown, its general trends should be more or less known. A better reconstruction method should therefore lie in between these two extremes.
An alternative method is to parameterise R(z) and p(m_1) as piecewise-constant functions and use Bayesian inference to obtain the posterior of the value in each bin, as in Vitale et al. (2019). This has the advantage that no assumptions on the shapes of R(z) and p(m_1) are needed, as in the kernel density estimator method. The disadvantage is that MCMC sampling in high dimensions needs to be performed. If one assumes that p(m_1) is independent of z, there are more than N + M free parameters, where N and M are the numbers of bins in z and m_1, respectively. If we want resolutions of 0.5 in z and 1 M_⊙ in m_1, the dimension of the parameter space is > 80 for the ET catalogue. In the more general case, where one includes the dependence of p(m_1) on z, the number of parameters becomes N × M, and the dimension of the parameter space then exceeds 1200.
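As a quick check of these dimension counts, consider a hypothetical grid with z up to 10 and m_1 from 5 to 95 M_⊙ (the exact ranges are our assumptions for illustration, chosen to be consistent with the bounds quoted above):

```python
# Hypothetical binning: z in [0, 10] in 0.5-wide bins,
# m1 in [5, 95] Msun in 1-Msun bins.
N = int(10 / 0.5)          # 20 redshift bins
M = int((95 - 5) / 1.0)    # 90 mass bins
dim_independent = N + M    # p(m1) independent of z
dim_joint = N * M          # p(m1) allowed to vary with z
print(dim_independent, dim_joint)  # 110 1800, i.e. > 80 and > 1200
```

Sampling a posterior in ~1800 dimensions is what makes the z-dependent piecewise model computationally demanding.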
Another conclusion we draw is that the early-phase ET (half-ET) can also give a good constraint on R(z) in the high-redshift Universe. However, both R(z) and the primary mass function are less accurately constrained than with the full ET, even with the larger number of detected events accumulated over a longer observation duration. This is because the full ET yields smaller uncertainties on the physical parameters of the GW events.
In this paper, we compare the results from a single aLIGO detector with those of ET. In reality, there will always be multiple detectors working as a network (e.g. LIGO-Virgo-KAGRA), which is expected to lead to better results than those provided by a single aLIGO. However, we do not expect a qualitatively different conclusion from what we find here, namely that ET (as a representative of the third-generation GW detectors) has an overwhelmingly greater ability to constrain the BBH population properties than the second-generation detectors, even when the latter operate as a network. We anticipate that the same conclusion applies to other third-generation GW detectors, for example the Cosmic Explorer.