Evidence for the evolution of black hole mass function with redshift

We investigate the joint primary mass, mass ratio, and redshift observed distribution of astrophysical black holes using the gravitational wave events detected by the LIGO-Virgo-KAGRA collaboration and included in the third gravitational wave transient catalogue. We reconstruct this distribution using Bayesian non-parametric methods, which are data-driven models able to infer arbitrary probability densities under minimal mathematical assumptions. We find evidence for the evolution with redshift of both the primary mass and mass ratio distribution: our analysis shows the presence of two distinct sub-populations in the primary mass - redshift plane, with the lighter population, $\lesssim$ 20 $M_\odot$, disappearing at higher redshifts, $z>0.4$. The mass ratio distribution shows no support for symmetric binaries. The observed population of coalescing binary black holes evolves with look-back time, suggesting a trend in metallicity with redshift and/or the presence of multiple, redshift-dependent formation channels.


Introduction
GW150914 (Abbott et al. 2016a), the first gravitational wave (GW) event detected by the two LIGO detectors (Aasi et al. 2015) during the first observing run (O1), was the first direct measurement of the intrinsic properties of a binary black hole (BBH).The two compact objects that composed this system were a surprise for the astrophysical community: before GW150914, stellar-sized black holes (BHs) were studied mainly via single-line spectroscopic binaries from the very first observation of the BH in the X-ray source Cygnus X-1 (Byram et al. 1966;Bahcall 1978).The mass distribution for such objects was modelled as a Gaussian distribution centred at 7.8 M ⊙ with a standard deviation of 1.2 M ⊙ (Özel et al. 2010).In contrast, the primary and secondary masses of GW150914, 36 +5  −4 and 29 +4 −4 M ⊙ respectively (Abbott et al. 2016b(Abbott et al. , 2021a)), are significantly more massive than the maximum BH mass observed in Milky Way X-ray binaries (Farr et al. 2011).A common interpretation, although still controversial in its details, is that the mass of a BH depends on the metallicity of its stellar progenitor, with metal-poor stars losing less mass by stellar winds and leading to the formation of more massive compact remnants than metalrich stars (e.g., Heger et al. 2003;Mapelli et al. 2009Mapelli et al. , 2010;;Zampieri & Roberts 2009;Belczynski et al. 2010).
⋆ E-mail: stefano.rinaldi@uni-heidelberg.de ⋆⋆ E-mail: walter.delpozzo@unipi.it⋆⋆⋆ E-mail: mapelli@uni-heidelberg.de Joined by the Virgo detector (Acernese et al. 2015) during the second observing run (O2) in 2017, by KAGRA (Aso et al. 2013;Akutsu et al. 2020) in 2020 at the end of the third run (O3), and expected to include LIGO-India (Iyer et al. 2011) during the next decade, the LIGO-Virgo-KAGRA (LVK) collaboration detected, to date, almost 100 GW signals, most of them generated by BBH coalescences (Abbott et al. 2021c).Since the LVK collaboration is now in the middle of the fourth observing run (O4), this number is expected to grow in the upcoming months.
These observations confirmed that the mass distribution inferred from Galactic BHs does not describe the general BBHs across the Universe, thus hinting that either coalescing BBHs follow different evolutionary paths from Galactic BHs, or they have a different origin altogether.To discriminate among the various proposed scenarios, or unveil new ones, several groups have put considerable efforts towards the study of BBH populations (Abbott et al. 2019(Abbott et al. , 2021e, 2023b;;Talbot & Thrane 2018;Stevenson et al. 2019;Edelman et al. 2021;Fishbach et al. 2021;Roulet et al. 2021;Galaudage et al. 2021;Bouffanais et al. 2021;Stevenson & Clarke 2022;Callister & Farr 2023;Sadiq et al. 2023).
The processes undergone by stars during their life are expected to leave signatures in the BH astrophysical distributions.For example, stellar winds (Vink et al. 2001;Lamers & Levesque 2017), core-collapse supernovae (Fryer et al. 2012;Schneider et al. 2021), envelope removal after common envelope (Hurley et al. 2002), and pulsational pairinstability supernovae (Heger et al. 2003;Woosley & Heger 2015) result in non-negligible mass loss for the star.In addition to this, there is also the possibility that the BH is not of astrophysical origin, like primordial BHs (Carr 1975).
The LVK collaboration explored the properties of BBH populations in a series of papers (Abbott et al. 2019(Abbott et al. , 2021e, 2023b) ) employing different parametric and semi-parametric phenomenological models to investigate the BH properties.In particular, the BH mass distribution is inspired by the initial stellar mass function (Salpeter 1955;Kroupa 2001) and revolves around a power-law model.Several additional features have been included throughout the years, such as a high mass hard cutoff and a power-law distribution for the mass ratio (Fishbach & Holz 2017;Wysocki et al. 2019), or a Gaussian feature at around 35 − 40 M ⊙ (Talbot & Thrane 2018), resulting in the so-called 'PowerLaw+Peak' model.This model, described in App.B.1b of Abbott et al. (2023b), is the fiducial population model for LVK analysis.
Models as such follow a phenomenological approach: they are inspired by the underlying physics, but capturing the many facets of astrophysical processes is beyond their scope.Moreover, these models are constrained to a given functional form that may or may not represent the underlying population.As a consequence the inference based on them could be biased if some of the features of the BH distribution are not properly accounted for in the model.
Nonetheless, building a reliable parametric model that takes into account the complexity of astrophysical models is a formidable task: to date, and to the best of our knowledge, no all-encompassing parametric model has been proposed, although mixture models start to be explored.For instance, Zevin et al. (2021) show that single channel models are disfavoured compared to models that include multiple channels.
An alternative route for the inference of BBH astrophysics, based on an opposite approach to the allencompassing modelling, is to use the so-called 'nonparametric methods'.These methods allow to reconstruct probability densities from observations without being committal to any model-specific prescription.Being notably flexible and powerful, they are able to reconstruct arbi-trary probability densities under minimal assumptions.Despite the potentially misleading name, these models do have parameters, either countably infinitely many or a flexible number of them.Their flexibility comes with the cost that no physical information can directly be inferred from the reconstructed distribution: any interpretation in terms of astrophysical processes and formation channels has to be done a posteriori.Some works do explore this direction (Mandel et al. 2016;Tiwari 2018;Edelman et al. 2021;Tiwari 2021;Li et al. 2021a;Toubiana et al. 2023;Ray et al. 2023;Sadiq et al. 2023;Callister & Farr 2023;Edelman et al. 2023) as well as some of the models included in Abbott et al. (2023b), however most of them are semi-parametric rather than non-parametric.The distinction among the two classes lies in the number of free parameters included in the model.Both classes include effective parameters that have the sole purpose of modelling the underlying distribution, meaning that a direct physical interpretation of them is not to be sought: for non-parametric models the number of such parameters is potentially infinite, allowing for a very large flexibility in modelling arbitrary distributions.Conversely, semi-parametric models include a finite number of effective parameters, and often the value of some of them has to be specified a priori.Thus, while being more flexible than parametric models, semi-parametric models often require non-negligible tuning of some of their parameters, for instance the maximum number of components in flexible mixture models or the typical width of the expected features.
In a previous publication, we proposed (H)DPGMM (short for 'a hierarchy of Dirichlet process Gaussian mixture model'): a fully non-parametric multivariate method developed to investigate the BH mass distribution (Rinaldi & Del Pozzo 2022a).In this study, we limited our investigations to the reconstruction of the one-dimensional distribution of the masses of BHs in merging binaries, essentially confirming what has been found by the LVK (Abbott et al. 2021e, 2023b.In this paper, we extend our investigation to a multi-dimensional slice of the full BBH parameter space.We apply (H)DPGMM to the third GW transient catalogue (GWTC-3) to investigate the correlations among the primary mass of the binary M 1 , the mass ratio q = M 2 /M 1 , and the redshift z.The presence of such correlations would be completely hidden within any one-dimensional -hence marginal -distribution.Spotting any correlations is paramount, since they could be the signature of different formation channels, shedding light on the processes that lead to the observed population of astrophysical BHs: Callister et al. (2021) and more recently Adamcewicz et al. (2023) report the presence of an anti-correlation among q and the effective spin χ eff , whereas Biscoveanu et al. (2022) suggest that the spin distribution broadens with redshift.
The paper is organised as follows: in Sec. 2, we sketch the non-parametric method used in this work and discuss an improved method used here to account for selection effects over mass and redshift.In Sec.3.1 we follow up the work of Rinaldi & Del Pozzo (2022a), reconstructing the primary mass distribution with GWTC-3; we then expand the parameter space in Sec.3.2 to include the primary mass, mass ratio, and redshift, the main result of this paper.Finally, we propose an astrophysical interpretation of such distribution in Sec. 4.

Methods
The methods used in this work are largely based on a previous paper by the same authors (Rinaldi & Del Pozzo 2022a), where we investigate the BBH mass distribution with a fully non-parametric approach using the data from the second GW transient catalogue (GWTC-2) (Abbott et al. 2021b).Rinaldi & Del Pozzo (2022a) find that the non-parametric reconstruction of the BBH mass distribution is consistent with all the four parametric models used in Abbott et al. (2021e).Hereafter, we briefly outline the main features of (H)DPGMM, the hierarchical non-parametric method introduced by Rinaldi & Del Pozzo (2022a), based on the Dirichlet process Gaussian mixture model (DPGMM).The notation follows the one of the aforementioned paper.Details about the implementation and sampling scheme can be found in Rinaldi & Del Pozzo (2022b).Details of both the model used and the inference scheme can be found in these papers.The notation of this section closely follows the one introduced in Rinaldi & Del Pozzo (2022a).
The basic idea of the DPGMM is that any arbitrary probability density p(x) can be approximated as infinite weighted sums of Gaussian distributions (Nguyen et al. 2020): (1) This result holds also for multivariate probability densities, simply replacing the Gaussian distribution with its multivariate extension as kernel function: this paper is based on this feature1 .The DPGMM has a countably infinite number of parameters, θ = {w, µ, σ}: these can be inferred by mean of samples from the underlying distribution x, where we assumed that the samples are exchangeable, hence independent and identically distributed.The index j labels the different samples.
In the case of a hierarchical inference, we have access only to the posterior distributions for individual realisations for the target hyper-distribution (as it is the case for GW parameter estimation).(H)DPGMM models both the hyperdistribution and the individual posterior distributions with DPGMMs, linking them in a hierarchical fashion making use of the set of available individual event posterior distributions Y = {y 1 , . . ., y N }; hence the name 'Hierarchy of Dirichlet process Gaussian mixture models'.The posterior distribution then reads p(Θ|Y) ∝ j p(x j |Θ) p(x j |θ j ) p(θ j |y j ) p(Θ) dθ j dx i , (3) where both p(x j |Θ) and p(x j |θ j ) are DPGMMs.Here, θ j denotes the DPGMM parameters of the j-th event and Θ denotes the DPGMM parameters {w, µ, σ} for the hyperdistribution.
The posterior distribution is explored using a variation of the collapsed Gibbs sampling scheme described by Rinaldi & Del Pozzo (2022b) and implemented in figaro2 , a Python code designed to reconstruct arbitrary probability densities with DPGMM and (H)DPGMM.All the results presented in this paper are obtained making use of this code.
Throughout the paper, for all the calculations that involve cosmological quantities (conversion between luminosity distance and redshift, between detector-frame and sourceframe masses and the comoving volume element dV c /dz) we will assume the cosmological parameters reported by the Planck collaboration (Aghanim et al. 2020, Table 1, 'Combined' column).

Selection function
Key ingredient to any astrophysical population inference result is the correction to account for the uneven sensitivity of the detectors to different corners of the parameter space, resulting in different detection probabilities for different events (Loredo 2004;Mandel et al. 2019a).In particular, the non-parametric approach we are using is even more sensible to the so-called 'selection effects', due to the fact that the only source of information about the underlying distribution is the data set on which the inference is based.In this paragraph, we will denote with λ the parameters of the GW model: component masses, luminosity distance, spins, etc.
As all the methods based on the Dirichlet process, (H)DPGMM is only able to reconstruct the observed distribution p obs (λ|Λ), rather than the astrophysical one3 p astro (λ|Λ).Here we denote with Λ the parameters required by the astrophysical distribution.In this context, the selection effects act as a filter through the so-called 'selection function' p det (λ): As pointed out by Essick & Fishbach (2023), directly inferring the observed distribution p obs (λ|Λ) with the astrophysical parameters Λ and then correcting for selection effects a posteriori might lead to biased results due to the missed convolution with selection effects.In this framework, we approximate p obs (λ|Λ) with (H)DPGMM, thus having an effective representation of this probability density that does not depend on the astrophysical parameters Λ but rather on the DPGMM parameters Θ introduced in Eq. ( 3): By doing so, the convolution described in Essick & Fishbach (2023) becomes a simple reweighing that can be done a posteriori in the analysis, thanks to the separation between Λ and Θ.In other words, being Λ the set of parameters of the astrophysical distribution p astro (λ|Λ) before the application of selection effects, the inference scheme must account for selection biases directly while exploring the parameter space.
Our approach, aiming at describing the observed distribution p obs (λ|Θ) only after the application of selection effects, is not directly affected by this bias: in this framework, the parameters Θ are not the parameters of the astrophysical distribution, thus the correction can be done a posteriori.The reconstructed astrophysical distribution is then transformed into a differential merger rate as where we restricted only to the parameters of interest for this work.Knowledge of the selection function allows us to correct for the selection bias and recover the astrophysical distribution.It goes without saying that the details of the selection function greatly affect the inference, weighting differently different parts of the parameter space.Due to its crucial role, the greatest care must be put in studying this function.
Traditionally, the selection function is evaluated either via Monte Carlo integration carried out during the population study itself (Tiwari 2018;Farr 2019;Essick 2021) or through (semi-)analytical approximations (Wysocki et al. 2019;Veske et al. 2021).Both approaches require a set of injections4 dedicated to studying the performance of the detector in a Monte Carlo fashion, either to be used directly in the analysis (first method) or to calibrate the approximant (second).
In this paper, due to the requirement of exchangeability5 , we cannot include selection effects directly in the inference process: hence, we correct for selection effects a posteriori, after the reconstruction of the observed distribution, as in Eq. 5. We rely, therefore, on the analytical approximant for the selection function that will be presented by Lorenzo-Medina & Dent (2023)6 .The selection function p det (M 1 , M 2 , z) is modelled as a sigmoid function7 , calibrated on the O3 injections data set released by the LVK collaboration (Abbott et al. 2021d).We give further details on this approximant in Appendix A. In the LVK sensitivity estimate campaign, the authors injected events drawn from a fiducial, astrophysically inspired distribution modelled as a non-evolving power-law for both the primary and secondary mass.This is reasonable having in mind the Monte Carlo approach to selection effects: injecting a distribution that is similar in shape to the one that is expected to be found in real data minimises the statistical uncertainty.For the purposes of this work, however, and for non-parametric approaches in general, it is crucial to be able to characterise the detection probability of events that are not predicted by the fiducial model: these are the events that might be smoking guns for the presence of new formation channels.This will be considered in future works.
The way in which we account for selection effects implicitly relies on the assumption that the available data is representative of the actual population, meaning that the underlying distribution has little to no support in regions of the parameter space where either our detector is insensitive or we observed for too little time to have a reasonable chance of detecting events there.If this assumption fails, the recovered astrophysical distribution will be biased by the fact that the non-parametric reconstruction cannot be informed about the BH population in the so-called 'censored' regions of the parameter space and will not be representative of the actual population.We address this potential issue in Appendix D, finding that our results are not likely to be affected by this bias.
As a final note, we want to reiterate that the results of a population study in general, and of a non-parametric one in particular, are extremely dependent on correctly taking into account the corrections for selection effects.We believe that the approximant used in this paper captures all the relevant features of the selection function: nonetheless, we cannot exclude the possibility that some of the features henceforth presented could be ascribed to an inaccurate modelling of selection effects.To ensure the robustness of our results, we will explore different choices of the selection function as soon as they become available.Moreover, we are making the assumption that a single selection function calibrated on the O3 injection campaign can be used for a data set composed of observations coming from O1, O2 and O3.This is a reasonable approximation considering that the majority of the GW events has been detected during O3 and the fact that, since we are not reconstructing the total merger rate, we are mainly interested in the shape of the selection function: nonetheless, we plan to investigate the possibility of using different selection functions for different observing runs in a future work.

Multivariate plots
The multivariate plots reporting the differential merger rate density d 3 R/dM 1 dqdz presented in Sec. 3 are obtained making use of the specific functional form of the Gaussian mixture model, that allows for an analytical marginalisation and conditioning.In particular, given that the three-dimensional reconstruction obtained with figaro is inherently continuous, our plots do not rely on a redshift binning but simply report slices of the distribution at arbitrary redshift values for visualisation purposes.This does not apply to Figure 1, which is a marginal distribution and thus inherently onedimensional.
Since (H)DPGMM is meant to reconstruct probability densities, with this specific framework we are not able to estimate the local merger rate density R 0 : all the distributions are therefore re-scaled by this quantity and the y−axis values are not to be taken as indicative of actual values for the differential merger rate.Lastly, concerning Figs.3a and 4a: due to the way in which they are built, each level has its own vertical linear scale, which may vary significantly within the same plot.By construction, every level is re-scaled in such a way that all levels appear to have the same height.We opted for this particular representation because these plots aim at showing the morphological evolution of the differential merger rate rather than than the relative magnitude at different redshifts.

BBH sample
GWTC-3 (Abbott et al. 2021c) adds to the already existing GW catalogues the events detected during the second half of O3, for a total of just below 100 events.Among these observations, Abbott et al. (2023b) uses a subset of 62 high-purity BBH events to investigate their astrophysical properties using both astrophysically motivated phenomenological models and semi-parametric models.In this work, we apply the non-parametric method sketched in Section 2 to the same subset of events in order to be able to compare our findings to the results presented by the LVK collaboration (Abbott et al. 2023b).

Primary mass distribution
Figure 1 shows the marginal primary mass differential merger rate density we obtained with figaro, applied to the subset of 62 high-purity events from GWTC-3 used in Abbott et al. (2023b).We accounted for selection effects using a marginalised version of the three-dimensional selection function approximant.Despite being consistent for the majority of the mass spectrum, our non-parametric model departs from the simpler form of the parametric distribution: the power-law behaviour is present only in the low-mass end of the spectrum (< 25 M ⊙ ) and the smooth tapering of the Power-Law+Peak is less pronounced in our reconstruction.
way less localised than the 4.6 +4.1 −2.5 M ⊙ standard deviation reported in Abbott et al. (2023b), spreading all the way up to ∼ 70 M ⊙ .Lastly, the non-parametric distribution shows a cutoff at around 90 M ⊙ (mainly due to GW190521, Abbott et al. 2020a).
We made sure that the features we observe, especially the excess between 35 and 70 M ⊙ , are not an artefact of the selection function repeating the analysis with a marginalised version of the two-dimensional (primary and secondary mass) approximant presented in Veske et al. (2021).We find that the features are robust in this respect.
From the mass distribution alone it is understandably difficult to decide whether the features we find in the BH mass distribution presented in this section are intrinsic or due to stochastic fluctuations, let alone providing a reliable interpretation in terms of astrophysical processes.We therefore exploited the possibility offered by (H)DPGMM of inferring multivariate probability density to investigate the correlations among the primary mass M 1 , the mass ratio q and the redshift z: the presence of such correlations will both confirm the astrophysical origin of the features and guide the development of theoretical models for BH formation channels.

Joint primary mass, mass ratio, and redshift distribution
In this section, we report and describe the non-parametric reconstruction of the differential merger rate as obtained with figaro.
The three-dimensional distribution displayed as a corner plot in Fig. 2 shows correlations among all possible pairs of parameters.In particular, we find worth mentioning the following two things: -The mass ratio does not show support for symmetric binaries: this is particularly interesting since most binary evolution models predict the mass ratio to have a maximum at q ∼ 1 (Dominik et al. 2012;Belczynski et al. 2016;de Mink & Mandel 2016;Giacobbo & Mapelli 2018;Klencki et al. 2018;Broekgaarden et al. 2022).The central panel of Fig. 2 shows a bimodal marginal distribution for the mass ratio, with the main mode centred around q = 0.7.This particular feature is not an artefact of either the model or the code we are employing, as shown in Appendix C. -The mass distribution evolves with redshift: the lower-left panel of Fig. 2 shows that the primary mass changes with redshift: at high redshift, the mass distribution shows no support for BHs with masses < 20 M ⊙ .
Conversely, for low z, the distribution peaks at around 8 − 10 M ⊙ with a cutoff just above 40 M ⊙ .This is in contrast with the findings of Abbott et al. (2021e) and Ray et al. (2023), who do not find evidence for the evolution of the mass distribution.Karathanasis et al. (2023), on the other hand, find an indication of such redshift dependence.
In the following, we will describe the different features that are present in the evolving primary mass and mass ratio distribution, deferring a discussion of such in terms of astrophysical models to Section 4.

Mass ratio
Figure 3 shows the evolution with redshift of the mass ratio differential merger rate, dR/dq.Overall, the mass ratio distribution is roughly constant at all redshifts, being composed of a single pileup at q ∼ 0.7.The only additional feature is a peak at around q ∼ 0.25 − 0.3 at redshift z ≲ 0.2, populated by two events only (GW190412 and GW190917_114630).This latter feature could simply be due to the small number of events: with more events coming from the current and future observing run, it will be possible to assess its actual significance in the astrophysical distribution.
The differential merger rate in Fig. 3b is not precisely Gaussian: in particular, the distribution favours asymmetric binaries, with a preference for smaller redshifts.Symmetric binaries, conversely, are equally disfavoured independently of the redshift, being suppressed just above q = 0.9 at 90% credible level (see Figure 2, second row, second column panel and online-only material): support for symmetric binaries is practically absent.A more detailed discussion of the shape of this distribution in terms of observed events can be found in Appendix B. This is in contrast with the parametric model used in Abbott et al. (2023b), where the mass ratio is modelled as a power-law: dR/dq ∝ q β , with β = 1.1 +1.7  −1.3 .The specific parametric model employed in this analysis inherently favours symmetric binaries, having a maximum at q = 1 for β > 0, potentially making the preference for symmetric systems an 'artefact' of the model.Other works making use of semi-parametric models investigate this distribution (Tiwari 2022;Callister & Farr 2023), not finding the results presented in this work: Tiwari (2022), however, makes use of a model based on a power-law for q (Tiwari 2021).Edelman et al. (2023) hint at the possibility of a decrease in the merger rate near equal mass binaries.The suppression of symmetric binaries, not predicted by the majority of the existing literature on the subject, is however confidently found in the data by our non-parametric model.
In order to assess that this feature is not an artefact induced by the model we are using, we validated our method using a simulated data set modelled after the findings of Abbott et al. (2023b).The results, presented in Appendix C, show that our model is capable of retrieving the simulated distribution, including a power-law q 1.1 for the mass ratio.

Primary mass
The primary mass differential merger rate is reported in Fig. 4. The primary mass distribution changes its morphology with redshift, from a power-law-like distribution with a preference for low-mass objects at redshift z < 0.4 to a high-mass pileup for z > 0.4.The combination of these two sub-populations, once marginalised over redshift, is the basis for the PowerLaw+Peak model.Abbott et al. (2023b) find no evidence in favour of a redshift dependence of the primary mass, in contrast with our findings.We ascribe this difference to the parametric model Fishbach et al. (2021); Abbott et al. (2023b) adopted to investigate this correlation: they assume that the parameters of the mass distribution change with redshift, rather than allowing for a change in the functional form.In particular, they model the high-mass tail of the distribution with a separate power-law index with the cutoff location evolving with redshift.Based on our findings, this model does not properly capture the actual evolution of the mass spectrum.We believe that this is the main reason why Abbott et al. (2023b) do not find evidence in favour of the evolution of the BH mass function with redshift: while performing model selection in the context of Bayesian statistics, the addition of one or more degrees of complexity (such as the second power-law index) that do not improve the modelling of the available data is penalised by the Bayes' factor.
Given our non-parametric reconstruction, we believe that a good phenomenological parametric model, still based on the PowerLaw+Peak model, would be a superposition of a power-law and a Gaussian peak with a redshift-dependent relative weight w(z), to smoothly transition from the lowredshift, low-mass power-law to the high-redshift, high-mass Gaussian peak with a mean that evolves with redshift.Using a model along the lines of this one, a parametric analysis should in principle be able to pick up the redshift evolution of the mass function.van Son et al. (2022) propose a similar model where the relative weights of the power-law and the Gaussian peak are proportional to (1 + z) κ pl and (1 + z) κ peak respectively.The authors find marginal evidence in favour of the evolution of the mass with redshift.

Transition from power-law to peak
Figure 5 compares the mass distribution in the local Universe, z ≤ 0.2, with a pure power-law distribution with index α = −3.5, as found in Abbott et al. (2023b).This definition of local Universe is purely arbitrary and chosen for visualisation purposes.The non-parametric reconstruction is consistent with the power-law distribution in the 10 ∼ 40 M ⊙ region, up to redshift z ∼ 0.13, with cutoffs at both ends of the mass spectrum.At larger redshift, the distribution starts to deviate from the simple power-law behaviour, transitioning towards the ∼ 35 M ⊙ Gaussian- like feature.Given this qualitative agreement between our reconstruction and the power-law reported by Abbott et al. (2023b), we suggest that the PowerLaw+Peak model is actually driven by the local merger rate density, with a contribution for the Gaussian peak by the BHs further away in look-back time.

Peak evolution with redshift
Figure 4a shows that the pileup drifts towards larger masses with redshift.This behaviour is highlighted in Fig. 6.We define the peak of the pileup as the maximum of the distri-bution for8 M 1 > 25 M ⊙ .With this, we can isolate three different regimes: z < 0.2: this region is dominated by the low-mass powerlaw, therefore the maximum leans towards 25 M ⊙ , driven by the preference of the local Universe for lighter BHs.
The pileup here, if present, is subdominant; -0.2 < z < 0.7: in this redshift interval the number of available events is large enough to have an accurate reconstruction of the pileup: its peak steadily moves towards larger masses with look-back time, with a somewhat linear dependence; z > 0.7: the number of events beyond z = 0.7 is too limited: this fact, combined with a poor resolution of the binary parameters, makes the non-parametric reconstruction merely a hint of the actual behaviour of the underlying distribution.Therefore, the peak in this region is ill-defined, and its 90% credible interval encompasses almost all the mass spectrum.

Minimum and maximum mass of a black hole
Following Abbott et al. (2023b), we define the minimum and maximum mass of a BH as the 1st and 99th percentile of the mass distribution, M 1% and M 99% .We report these values in Table 1.The redshift values are the same used for the primary mass joy plot.For the same quantities, Abbott  2023b) report M 1% = 5.0 +0.9 −1.7 and M 99% = 44 +9.2 −5.1 : both values are in agreement with our findings for the lowredshift region, in which the mass distribution is dominated by the power-law-like feature.
We did not quote the credible interval for M 1% for redshifts above 0.25 due to the dominance of selection effects in the low-mass, high-redshift region of the parameter space.The vetted region bounds below the credible interval for this quantity: we deemed the uncertainty estimate for z > 0.25 unreliable, thus opting for quoting the median value only, for reference purposes and to show that a growing trend is however present.

GW190521: a stand-alone?
The primary mass spectrum also shows a small peak 9 at M 1 ∼ 90 M ⊙ , z ∼ 0.8 (see Figure 7).
This feature marks the presence of GW190521 (Abbott et al. 2020a), the most massive BBH in the high-purity catalogue we are considering here, with a total mass of ∼ 150 M ⊙ .In Abbott et al. (2020b) the primary mass of this BBH is said to be in tension with the population inferred from O1 and O2, thus suggesting the possibility for this event to be an 'outlier'.Despite its peculiar component masses, in the latest population studies (most notably Abbott et al. 2021e andAbbott et al. 2023b) GW190521 is found to be consistent with the rest of the BBH population.Here we do not quantitatively assess whether if this event is an outlier or not, but rather we take the occasion to briefly discuss the concept of outliers in the context of non-parametric methods.for z = 0.8.The shaded area correspond to the 68% credible region.The peak at M 1 ∼ 90 M ⊙ marks the presence of GW190521.
An outlier is often defined as a data point that is difficult to explain with or is unlikely to have been produced by a specific population model: this is the case, for example, of a point that has been drawn by a sub-population not included in the model.However, in the context of non-parametric analysis we cannot really talk about outliers in this sense, given that the inferred distribution is the one that best describes all the available data, regardless of their origin.It is up to the analysts to decide whether to include a certain data point (or, in our case, an event) in the data set.An event or a group of events that shows different properties from the rest of the catalogue, in a non-parametric analysis, is simply modelled as a sub-population on its own, as it is also the case for the mass ratio of GW190412 and GW190917_114630.We therefore think that, in this context, 'stand-alones' could be a better name for these events.
GW190521 could be, qualitatively, a stand-alone event in this sense, and might be hinting towards the presence of a new, previously unobserved sub-population of BBHs.This possibility is supported by the results presented by Morton et al. (2023), where we find evidence for GW190521 to be formed in an AGN disk.In AGN disks, BBHs might form with masses that reach and exceed 100 M ⊙ , due to the increased probability for hierarchical mergers and accretion (McKernan et al. 2012(McKernan et al. , 2018;;Bellovary et al. 2016;Bartos et al. 2017;Stone et al. 2017;Yang et al. 2019;Tagawa et al. 2020;Gayathri et al. 2022).GW190521 could be the first observed BBH to belong to this sub-population (Graham et al. 2020).

Discussion: interpretation of the astrophysical distribution
We now discuss the astrophysical implications of the distribution reported in the previous section.In what follows, we will assume that the inferred distribution is correct and not affected by the possible caveats discussed above.

Primary mass
We find that the astrophysical population of BBHs changes with look-back time.Our main result is the morphological evolution of the distribution of primary masses as a function of redshift.In the low-redshift Universe most BBHs have primary mass < 20 M ⊙ , in agreement with the X-ray binaries in the Milky Way (e.g., Özel et al. 2010;Farr et al. 2011).
A second possible interpretation for the evolution of the primary mass function is a change of the relative importance of different formation channels with redshift.For example, the dominant formation channel might change from isolated binary evolution to dynamical assembly, as redshift increases.Several previous studies indicate that we expect higher BBH masses from dynamical formation in dense star clusters (e.g., Rodriguez et al. 2016a;Askar et al. 2017;Chatterjee et al. 2017;Antonini et al. 2019;Kremer et al. 2020;Torniamenti et al. 2022;Antonini et al. 2023) and AGN disks (e.g., McKernan et al. 2012, 2018;Bellovary et al. 2016;Bartos et al. 2017;Stone et al. 2017;Tagawa et al. 2020).In particular, globular clusters' formation peaks at redshift z ∼ 3 − 4 (VandenBerg et al. 2013;Choksi et al. 2018;El-Badry et al. 2019).This redshift-dependent formation history, combined with longer-than-expected delay times and/or with a higher-than-expected merger efficiency, might produce a dominant population of massive BH mergers down to a redshift z ∼ 0.4 (Rodriguez & Loeb 2018;Choksi et al. 2019;Antonini & Gieles 2020;Antonini et al. 2023;Mapelli et al. 2022).
If this interpretation is correct, we should observe an evolution with redshift not only of the primary mass function but also of the effective and precessing spin parameters.In fact, dynamical interactions randomise the orientation of BH spins, leading to a nearly-isotropic spin distribution, while isolated binary evolution tends to align BH spins with the orbital angular momentum of the binary systems (e.g., Kalogera 2000;Rodriguez et al. 2016b).Moreover, hierarchi-cal BBH mergers are associated with large spin magnitudes, at least for the primary BH, because the dimensionless spins of merger remnants peak at ∼ 0.7 (Buonanno et al. 2008;Jiménez-Forteza et al. 2017).Hence, an increase of the hierarchical merger population with redshift would result into a broadening of the effective spin distribution with redshift, and would lead to the emergence of a peak in the precessing spin around ∼ 0.7 (Baibhav et al. 2020;Mapelli et al. 2021).The selection function implementation used in this work does not account for spins, therefore we did not include these parameters in this work.We plan to investigate the effective spin distribution in a future paper.Here, we mention that Biscoveanu et al. (2022) find an increase in the width of the effective spin distribution of BBHs with increasing redshift.This might be a hint of a larger contribution of the hierarchical merger scenario at higher redshift (Baibhav et al. 2020;Mapelli et al. 2021;Kritos et al. 2022;Santini et al. 2023), but it is still consistent with other scenarios.
A third hypothesis (not necessarily alternative to the other two) is that the evolution of the primary mass function captured by our method stems from processes that current models fail to properly describe.Overall, models that do not predict any evolution (or predict only a mild evolution) with redshift (e.g., Mapelli et al. 2019) are already challenged by our findings.

Mass ratio
Our results seem to indicate that the preferred BBH mass ratio is q ∼ 0.7, with little or no evolution with redshift (with the exception of the lowest redshift region, dominated by the events GW190412 and GW190917_114630).In contrast, most current models (especially isolated binary evolution via common envelope and chemically homogeneous evolution) indicate a preference for equal-mass or nearly-equal mass systems (e.g., Belczynski et al. 2016;de Mink & Mandel 2016;Giacobbo & Mapelli 2018;Klencki et al. 2018;Kruckow et al. 2018;Broekgaarden et al. 2022), with a few exceptions (van Son et al. 2022;Santoliquido et al. 2023).
Most dynamical formation channels allow for a larger fraction of unequal-mass systems than isolated binary evolution (e.g., Di Carlo et al. 2019;Arca Sedda et al. 2020;Torniamenti et al. 2022).In particular, BBHs born from hierarchical mergers have a strong preference for unequal mass mergers (Rodriguez et al. 2019;Fragione & Silk 2020;Zevin & Holz 2022).However, recent studies show that hierarchical mergers cannot account for the majority of LVK events (Zevin & Holz 2022;Fishbach et al. 2022).
Recently, Santoliquido et al. (2023) have shown that metal-poor and metal-free BBH mergers tend to have q ∼ 0.7 at redshift z ≲ 5 (see their Fig. 7).This feature is mostly an effect of the evolutionary channel: in their models, all the BBHs with q ≤ 0.7 form from binary systems in which the progenitor star of the primary BH retains at least a fraction of its H-rich envelope at the time it collapses to BH.In most cases, the binary system evolves via stable mass transfer.Similarly, van Son et al. (2022) find that BBHs formed via stable mass transfer in binary systems peak at q ∼ 0.6 − 0.7 (see their Fig.9).Notably, the revised treatment of Roche-lobe overflow stability by Olejak et al. (2021) leads to significantly lower values of q ∼ 0.4 − 0.6 (see their Figure 4): most of the unequal-mass BBHs in their model form without common-envelope evolution.Clues in this direction are also reported in Gallegos-Garcia et al. (2021).
Overall, none of the current models in the literature is able to explain at the same time a non-evolving mass ratio q ∼ 0.7 and the rapid evolution of the primary mass with redshift we find with (H)DPGMM.This might indicate that astrophysical models fail to describe some key physical process, for example the efficiency of mass accretion and angular momentum transport during mass transfer (e.g., Gallegos-Garcia et al. 2023;Willcox et al. 2023), the impact of stellar rotation (e.g., Mapelli et al. 2020;Marchant & Moriya 2020;Riley et al. 2021), or the efficiency of dynamical assembly (e.g., Fragione & Kocsis 2018;Mapelli et al. 2022).

Conclusions
We have applied our non-parametric model, (H)DPGMM, to the GW events included in GWTC-3, focusing on the primary mass, mass ratio, and redshift distribution.Such distribution, once corrected for selection effects, clearly shows the presence of correlation among the three parameters, providing support in favour of the evolution of the primary mass distribution with redshift.In particular, we found that: -The mass ratio distribution does not show support for symmetric binaries, favouring systems with q ∼ 0.7 mostly independent of redshift; -The primary mass distribution is composed of two distinct sub-populations: a low-mass, low-redshift powerlaw-like population and an high-mass, high-redshift Gaussian-like feature.The change between the two subpopulations happens between z ∼ 0.2 and z ∼ 0.4; -The position of the Gaussian-like feature evolves with redshift, drifting towards larger masses with look-back time.
We outlined two possible, non-mutually exclusive explanations for the evolution of the astrophysical BBH population with look-back time.The mass could follow a trend in metallicity, changing from low-mass BHs born from metalrich progenitor stars at low redshift to more massive BHs, produced by metal-poor and metal-rich stars.The other possible interpretation is a change of the relative importance of different formation channels, from a isolated evolutiondominated scenario at z < 0.4 to a prevalent contribution of dynamically assembled systems for z > 0.4.
Our findings for the mass ratio are in contrast with most of the current models for isolated binary evolution, where the favoured systems are symmetric or near-symmetric binaries; unequal-mass systems are more likely to be produced by dynamical formation channels or hierarchical mergers.The apparent contradiction between the rapid evolution of the BH mass function and the constancy of the mass ratio distribution could hint towards some missing physics in current astrophysical models.
The results presented in this paper will be crucial not only for the advancement of BH astrophysics, but also for all those investigations that, at different levels, rely on the BH mass distribution.For example, the mass function is a key ingredient of methods for the inference of cosmological parameters (Abbott et al. 2023a), the galaxy catalogue method (Gray et al. 2020), and the joint population and cosmology approach (Mastrogiovanni et al. 2021).Moreover, the presence of non-scale-invariant and redshift-evolving features like the pileup at z > 0.4 will have a dramatic impact on the methods that relies on features in the mass distribution to infer the cosmological parameters (Farr et al. 2019;Mancarella et al. 2022;Mukherjee 2022).
As a final remark, we remind that the results presented in this paper are obtained under two important caveats, namely that the data are representative of the underlying population (we are not in presence of censored data and the features in the underlying distribution are properly sampled) and that the approximant we employed captures all the features of the selection function.We made sure that these assumptions are met (see Appendices A, C and D).Nonetheless, it is still possible that some of the features we reported are to be ascribed to the limited number of available GW events: the new events detected during the currently ongoing LVK run, together with the flexibility of data-driven models such as (H)DPGMM to adapt themselves to new observations, will either confirm or disprove the findings reported in this paper and will unveil new features of the BH population.that we properly sampled all the regions of the parameter space.This assumption does not hold, however, in the case in which the astrophysical distribution has support in a region where the detection probability is negligible 12 or exactly zero, like in a region of the parameter space which is not covered by any waveform model.The selection function would in this case censor a portion of the data.An astrophysical distribution that extends its support in a censored region would not therefore be reconstructed properly by a non-parametric method: the observed distribution would be cut off by the selection function, biasing the subsequent inference of the astrophysical distribution.
In The median two-dimensional M 1 − z distribution follows, in the low-mass, high-redshift region, the iso-probability curves of the selection function, thus suggesting the censoring of a portion of the data.Conversely, this behaviour is not present for the credible regions of the observed distribution reconstructed with GWTC-3 (reported in Fig. D.2) suggesting, although heuristically, that we are not in presence of censored data.
12 For example, consider the case in which we expect one detection in a hundred years and we observed for one year only.Observed distribution for GWTC-3 reconstructed with figaro (blue line) along with the iso-probability levels of the selection (orange).The diagonal panels show the marginal distributions for M1, q and z (top to bottom) respectively and the panels below diagonal show the median distribution marginalised over the third variable.

Fig. 1 .
Fig. 1.Primary mass differential merger rate density reconstructed with figaro along with the PowerLaw+Peak model from Abbott et al. (2023b).The shaded areas correspond to 68 and 90% credible regions for (H)DPGMM (blue) and Power-Law+Peak (orange).

Fig. 3 .Fig. 4 .
Fig. 3. Mass ratio evolution with redshift: joy plot (a) and tomography (b).Both panels report the median distribution only:an animated version of this plot including the 68% credible regions is available as online-only supplementary material along with plots showing individual slices.This distribution, set aside an excess at low redshifts (z ≲ 0.2), is consistent throughout the whole redshift spectrum in having support at around 0.7, with negligible support for q = 1.

Fig. 7 .
Fig. 7.Primary mass conditional differential merger rate density Fig. C.1.figaro reconstruction (blue) of the M1, q and z simulated distribution (red).The diagonal panels show the marginal distributions for M1, q and z (top to bottom) respectively and the panels below diagonal show the median distribution marginalised over the third variable.
Fig. D.1 we show the behaviour of an observed distribution whose astrophysical distribution has support in a censored region of the selection function used throughout this paper (Lorenzo-Medina & Dent 2023).The simulated underlying distribution is the same as in Appendix C, with the exception of p(z Fig. D.1.Observed distribution for the simulated catalogue reconstructed with figaro (blue line) along with the iso-probability levels of the selection function (red).The diagonal panels show the marginal distributions for M1, q and z (top to bottom) respectively and the panels below diagonal show the median distribution marginalised over the third variable.
Fig. D.2.Observed distribution for GWTC-3 reconstructed with figaro (blue line) along with the iso-probability levels of the selection (orange).The diagonal panels show the marginal distributions for M1, q and z (top to bottom) respectively and the panels below diagonal show the median distribution marginalised over the third variable.

Table 1 .
Minimum and maximum value for the BH mass as a function of redshift.For the minimum mass, we quote the credible interval only for z < 0.25 due to the dominance of selection effects in the low-mass, high-redshift region of the parameter space.