icarogw: A python package for inference of astrophysical population properties of noisy, heterogeneous, and incomplete observations

We present icarogw 2.0, a pure python code, runnable on both CPUs and GPUs, developed to infer astrophysical and cosmological population properties of noisy, heterogeneous, and incomplete observations. icarogw 2.0 is mainly developed for compact binary coalescence (CBC) population inference with gravitational wave (GW) observations. The code contains several models for the masses, spins, and redshift distributions of CBCs.


Introduction
Inferring Cosmology and AstRophysics with Observations of Gravitational Waves (ICAROGW) is a pure python code developed to infer the population properties of compact binary coalescences (CBCs) observed with gravitational waves (GWs). The problem of inferring the population properties from a sample of observations of astrophysical sources is a common and longstanding one across a range of research topics. With almost 100 GW observations from the latest observing runs of the LIGO, Virgo, and KAGRA (LVK) collaboration (Abbott et al. 2023a), GW sources are moving rapidly into the "population-inference" domain. The first version of ICAROGW was presented in the context of GW cosmology in Mastrogiovanni et al. (2021), first used in the LVK analysis of Abbott et al. (2023b), and released on the official LVK GitLab page. The first version of ICAROGW was also used in independent studies for population code validations (Karathanasis et al. 2023b; Turski et al. 2023), tests beyond general relativity (GR; Leyde et al. 2022), astrophysical processes (Karathanasis et al. 2023a), and primordial black hole models (Liu et al. 2023; Zheng et al. 2023).
For astrophysics, population inference involves the correction of a selection bias, namely, the Malmquist bias (Malmquist 1922), which prevents the observation of a particular class of astrophysical processes, thus biasing the population analysis when not properly taken into account. Selection biases appear in many astrophysical observations involving neutrino physics (Loredo & Lamb 2002), exoplanets (Foreman-Mackey et al. 2014; Winn & Fabrycky 2015), galaxies and galaxy clusters (Gonzalez & Faber 1997), and γ-rays (Loredo & Wasserman 1998). For GW observations, selection biases are introduced by the sensitivity of the detector as a function of the GW frequency, which is also related to the binary parameters such as masses and redshift.
Besides the correction for the selection bias, GW population inference also has to account for possible uncertainties in the measurement of the source parameters, since the GW signal is always observed in addition to noise in the data (heterogeneous data). Therefore, when reconstructing the population properties of GW signals, we face noisy, heterogeneous, and incomplete observations that require a specific statistical framework. Bovy et al. (2011) also referred to this type of analysis as extreme deconvolution. Current techniques for population inference include the use of hierarchical Bayesian inference (Mandel et al. 2019; Vitale et al. 2022). The code presented in this paper implements this statistical framework.

Inhomogeneous Poisson process and Bayesian inference
The main application of ICAROGW is to infer the population parameters, Λ, that describe the production rate of events in terms of the GW parameters, θ, namely, dN/(dt dθ)(Λ). For instance, in Sect. 4 we will consider the rate of CBCs in terms of the source-frame masses. The hierarchical likelihood of obtaining N_obs observations, each described by some parameters, θ, in a data collection {x} for a given observing time, T_obs, from a population of events with a constant rate and in the presence of selection biases is given by (see Mandel et al. 2019; Vitale et al. 2022 for a detailed derivation):

L({x}|Λ) ∝ e^{−N_exp(Λ)} ∏_{i=1}^{N_obs} T_obs ∫ L(x_i|θ, Λ) dN/(dt dθ)(Λ) dθ.   (1)

Equation (1) is often referred to as a "hierarchical likelihood". By assuming a "scale-free" prior π(N_exp) ∝ 1/N_exp on the expected number of detections (i.e., neglecting information about the rate), an equivalent form of Eq. (1) can be derived:

L({x}|Λ) ∝ ∏_{i=1}^{N_obs} [ ∫ L(x_i|θ, Λ) dN/(dt dθ) dθ ] / [ ∫ p_det(θ, Λ) dN/(dt dθ) dθ ].   (2)

Both the hierarchical likelihoods in Eqs. (1) and (2) contain several quantities crucial to the inference problem. In what follows, we explain how ICAROGW numerically computes these quantities and obtains a numerical evaluation of the full hierarchical likelihood defined in Eq. (1).
The first central quantity is the single-event likelihood, L(x_i|θ, Λ), which quantifies the measurement uncertainties of the GW parameters, θ. The single-event likelihood is usually not directly accessible; instead, we are provided with N_{s,i} parameter estimation (PE) posterior samples drawn from p(θ|x_i, Λ) ∝ L(x_i|θ, Λ) π_PE(θ|Λ), where π_PE(θ|Λ) is the prior used to generate the samples. The PE samples represent the source parameters that we believe given the observation of the data, x. ICAROGW numerically evaluates the likelihood integral in Eq. (1) and in the numerator of Eq. (2) via Monte Carlo integration by summing over PE samples:

∫ L(x_i|θ, Λ) dN/(dt dθ) dθ ≈ (1/N_{s,i}) Σ_{j=1}^{N_{s,i}} [1/π_PE(θ_{i,j}|Λ)] dN/(dt dθ)|_{θ_{i,j}} ≡ (1/N_{s,i}) Σ_j w_{i,j},   (3)

where the index i refers to the event and the index j to its posterior samples. The weight w_{i,j} has the dimension of a number of CBC mergers per unit time. As Eq. (3) is evaluated with a finite sum over posterior samples, we introduce a numerical stability estimator, the "effective number of posterior samples" per event i, as introduced in Talbot & Golomb (2023):

N_{eff,i} = (Σ_j w_{i,j})² / Σ_j w_{i,j}².   (4)

This estimator quantifies how many samples per event actually contribute to the evaluation of the integral. Typically, in population analyses such as the one presented in Abbott et al. (2023b), it is necessary to have an effective number of posterior samples of at least 20 for each event and population model supported by the analysis. If this requirement is not satisfied, ICAROGW artificially associates a null likelihood with the population model, as the model cannot be trusted.
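For illustration, the per-event Monte Carlo sum of Eq. (3) and the stability check of Eq. (4) can be sketched in a few lines of python; the callables rate_density and pe_prior below are hypothetical stand-ins for a population model and a PE prior, not the actual ICAROGW interface:

```python
import numpy as np

def event_log_integral(theta_pe, rate_density, pe_prior, neff_min=20.0):
    """Monte Carlo estimate of Eq. (3) for a single event, with the
    effective-number-of-samples check of Eq. (4)."""
    w = rate_density(theta_pe) / pe_prior(theta_pe)   # weights w_{i,j}
    neff = w.sum() ** 2 / np.sum(w ** 2)              # Eq. (4)
    if neff < neff_min:
        return -np.inf, neff   # model vetoed: too few effective samples
    return np.log(w.mean()), neff                     # log of Eq. (3)
```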
The second central quantity is the "expected number of events", N_exp(Λ), which is related to the selection bias and can be evaluated as:

N_exp(Λ) = T_obs ∫ p_det(θ, Λ) dN/(dt dθ) dθ,   (5)

where p_det(θ, Λ) is a detection probability that can be calculated as:

p_det(θ, Λ) = ∫_{x ∈ detectable} L(x|θ, Λ) dx.   (6)

However, we do not have access to an analytical form of the detection probability unless some simplifying assumptions are made; we refer to Gair et al. (2023) for an introductory example in the context of GW cosmology with galaxy catalogues. The current approach to evaluating selection biases is to use Monte Carlo simulations of injected and detected events (Abbott et al. 2021b), often referred to simply as "injections". The injections are used to evaluate the detectable signal volume in the parameter space and correct for selection biases. A total of N_gen injections are generated from a prior π_inj(θ), and the N_det detected ones are used to calculate the integral in Eq. (5) via Monte Carlo integration as follows:

N_exp(Λ) ≈ (T_obs/N_gen) Σ_{j=1}^{N_det} [1/π_inj(θ_j)] dN/(dt dθ)|_{θ_j} ≡ (T_obs/N_gen) Σ_j s_j.   (7)

Here, we have again defined a weight, s_j, with the dimension of a CBC merger rate per detector time. The injection prior π_inj(θ) must be properly normalized to obtain a reasonable value of N_exp, while a wrong normalization of π_PE(θ) (used in Eq. (3)) only results in an overall normalization factor for the hierarchical likelihood. Following Farr (2019), for Eq. (7) we can also define a numerical stability estimator, the "effective number of injections":

N_eff,inj = μ²/σ², with μ = (1/N_gen) Σ_j s_j and σ² = (1/N_gen²) Σ_j s_j² − μ²/N_gen,   (8)

where a typical value required for numerical stability is N_eff,inj > 4 N_obs. In summary, the hierarchical likelihoods defined in Eqs. (1) and (2) are calculated using the Monte Carlo approximations of Eqs. (3) and (7). The Monte Carlo version of Eq. (1) is:

L({x}|Λ) ∝ e^{−N_exp(Λ)} ∏_{i=1}^{N_obs} (T_obs/N_{s,i}) Σ_{j=1}^{N_{s,i}} w_{i,j},   (9)

while for the scale-free version of Eq. (2) it is:

L({x}|Λ) ∝ ∏_{i=1}^{N_obs} [ (1/N_{s,i}) Σ_j w_{i,j} ] / [ (1/N_gen) Σ_j s_j ].   (10)

For each population model, ICAROGW calculates one of the two likelihoods together with the numerical stability estimators. If at least one of these estimators is below the threshold set by the user, ICAROGW returns ln[L({x}|Λ)] = −∞, which prevents the population model from being selected.
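The selection-bias side of the computation can be sketched in the same spirit; the function names are again illustrative, while the estimators follow Eqs. (7)-(8):

```python
import numpy as np

def expected_events(theta_det, rate_density, inj_prior, n_gen, t_obs, n_obs):
    """Monte Carlo estimate of N_exp (Eq. 7) with the effective number of
    injections (Eq. 8, Farr 2019); theta_det are the detected injections
    out of n_gen generated ones."""
    s = rate_density(theta_det) / inj_prior(theta_det)   # weights s_j
    mu = s.sum() / n_gen
    var = np.sum(s ** 2) / n_gen ** 2 - mu ** 2 / n_gen
    neff_inj = mu ** 2 / var                             # Eq. (8)
    if neff_inj < 4.0 * n_obs:
        return -np.inf, neff_inj   # selection bias unresolved: veto the model
    return t_obs * mu, neff_inj    # N_exp, Eq. (7)
```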

Code structure and GPU performance
ICAROGW contains several python modules used for population inference. In Fig. 1, we display a schematic view of ICAROGW. The PE samples and injections are stored in two different objects, although they interact with common functions to calculate the CBC merger rate. The CBC rate models are the central objects of the code: they are python classes that contain the production rate dN/(dt dθ). Each class specifies the event-level parameters, θ, used to calculate dN/(dt dθ) and the population-level parameters, Λ, used to calculate the rate model. The rate class also contains instructions on how to update the rate model from the population parameters, Λ. As we describe in more detail in Sect. 4, for GW cosmology, ICAROGW distinguishes between detector-frame parameters, θ_D, and source-frame parameters, θ_S. Detector-frame parameters are the CBC physical parameters observed on Earth, while the source-frame ones are the actual parameters of the source. The GW parameters θ_D that are necessary to calculate the CBC merger rate are passed to a CBC rate class that performs the conversion to source-frame quantities, θ_S, and calculates the weights, w_{i,j} and s_j, to be used for the calculation of the hierarchical likelihood and the numerical stability estimators. Finally, the hierarchical likelihood computation is handled by a function embedded in the python package BILBY (Ashton et al. 2019b,a; Ashton & Talbot 2021) for Bayesian inference.
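To make the rate-class pattern concrete, the toy sketch below declares its event-level and population-level parameters, updates itself from Λ, and evaluates a rate; the names and the linear-cosmology shortcut are simplifying assumptions for illustration, not the ICAROGW API:

```python
import numpy as np

C_KMS = 299792.458  # speed of light (km/s)

class ToyRateModel:
    """Illustrative sketch of the rate-class pattern described above,
    NOT the actual icarogw API."""

    event_parameters = ["d_l", "m1_det", "m2_det"]   # theta_D
    population_parameters = ["H0", "R0", "alpha"]    # Lambda

    def update(self, H0, R0, alpha):
        # store the current population parameters
        self.H0, self.R0, self.alpha = H0, R0, alpha

    def log_rate(self, d_l, m1_det, m2_det):
        # detector frame -> source frame (linear cosmology for brevity)
        z = self.H0 * d_l / C_KMS
        m1_src = m1_det / (1.0 + z)
        # toy source-frame rate: power law in the primary mass,
        # with the 1/(1+z) time-dilation factor
        return np.log(self.R0) - self.alpha * np.log(m1_src) - np.log1p(z)
```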
Almost all the functions of ICAROGW can also be executed on graphics processing units (GPUs). A GPU is a specialized processor devoted to lightweight, massively parallel computations. ICAROGW uses cupy (Okuta et al. 2017) to parallelize on the GPU the computation of the weights, w_{i,j} and s_j, and of all the quantities derived from them.
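A common way to obtain this kind of CPU/GPU transparency, consistent with the description above (cupy mirrors the numpy API), is a backend-dispatch import; the actual ICAROGW internals may differ:

```python
# Backend dispatch: cupy mirrors the numpy API, so the same array code
# can run on either processor.
try:
    import cupy as xp      # GPU arrays, if cupy and a CUDA device are available
except ImportError:
    import numpy as xp     # transparent CPU fallback

def weights(log_rate, log_prior):
    # w = exp(log dN/dtdtheta - log pi); identical on CPU and GPU
    return xp.exp(log_rate - log_prior)
```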
The use of a GPU greatly reduces the time needed to compute the hierarchical likelihood with respect to central processing units (CPUs). In Fig. 2, we show a comparison between the CPU and GPU timings of the hierarchical likelihood computation of Eq. (1), using double floating-point precision. The figure reports two figures of merit, the total time for the likelihood computation and the time taken per sample, as a function of the number of samples (PE plus injections) used in the code. The figures of merit were generated on the computing machine employed to obtain the results of Mastrogiovanni et al. (2023).
The total time of the likelihood is a proxy for the time required to obtain a population fit as a function of the data size.
From Fig. 2, we can observe three regimes. For a low number of samples (<10^4), the CPU and GPU timings coincide, meaning that the computing time is dominated not by the sample size but by the internal python routines. For a number of samples between 10^4 and 10^6, the CPU time starts to be dominated by the sample size. In this middle regime, the GPU timing can still be improved by acting on routines that do not use data samples. The last regime is reached above 10^6 samples, where both the CPU and GPU timings are dominated by the data samples. In agreement with Talbot et al. (2019), we find that the GPU speeds up the population-inference computation by a factor of ∼100.
In Mastrogiovanni et al. (2023), about 10^5 samples were used on a CPU to obtain a population fit in one week of computation time. If 10^6 samples were used on a CPU (i.e., if ten times more GW events were considered), the total time required would rise to as much as ten weeks. Ten weeks of computation would not be strictly prohibitive, but it would certainly limit the number of models and exploratory studies that could be performed.
The efficiency per sample is another way of quantifying when the CPU and GPU timings start scaling with the number of samples. The efficiency is defined as the data sample size divided by the total computing time. Figure 2 shows three regimes analogous to those of the total likelihood time. For a low number of samples (<10^4), the efficiency improves for both the CPU and GPU, and the computing time is not dominated by the sample size. Above 10^4 samples, the CPU has saturated its efficiency at a value of 10^6 samples per second, and the CPU total time starts scaling linearly with the number of samples. Finally, above 10^6 samples, the GPU efficiency saturates at a value of 10^8 samples per second, and the total likelihood time starts scaling linearly with the number of samples.
Naturally, we expect different hardware configurations to display different values for the total likelihood time and efficiency, and for the transitions between the three regimes. However, the considerations discussed above should apply in general to all hardware, both GPU and CPU. GPU computing will therefore play a crucial role for future GW population analyses.

Application to a compact binary coalescence case
Although ICAROGW can be adapted to any problem involving an inhomogeneous Poisson process in the presence of selection biases, the code is mostly developed for GW science with CBCs. From the GW detections, we estimate the luminosity distance of the source, d_L, the detector masses, m_{1,d} and m_{2,d}, and the sky position, Ω, as well as a number of parameters related to the spins of the compact objects (we indicate them generically with χ, although two parameterizations are available; see Appendix B.3). Therefore, the rate of CBC mergers as seen in the detectors' data must be modelled as a function of d_L, m_{1,d}, m_{2,d}, and χ, namely with a rate:

dN/(dt_d dd_L dm_{1,d} dm_{2,d} dχ)(Λ),   (11)

where t_d is the detector time. Equation (11) is the CBC detector rate model. The rate in Eq. (11) does not strictly include any information about the Universe expansion model and its parameters. The information about cosmology can be included by rewriting the CBC merger rate in terms of source-frame parameters, such as the source masses and the redshift. The source masses are related to the detector masses through the relation:

m_{i,s} = m_{i,d}/(1 + z), i = 1, 2,   (12)

and the luminosity distance is related to the redshift by the choice of a cosmological model and possible GR deviations at cosmological scales. ICAROGW contains several parameterizations of the luminosity distance, reported explicitly with their population parameters in Appendix A.1 for GR models and in Appendix A.2 for modified gravity models.
The CBC merger rate at the detector can be written as a function of source-frame parameters by performing a change of variables that involves the choice of the cosmological parameters, namely:

dN/(dt_d dd_L dm_{1,d} dm_{2,d} dχ) = dN/(dt_s dz dm_{1,s} dm_{2,s} dχ) × [1/(1 + z)] × [(1 + z)² ∂d_L/∂z]^{−1}.   (13)

In the equation above, the factor of 1/(1 + z) comes from the difference between source-frame and detector-frame time, while the factor [(1 + z)² ∂d_L/∂z]^{−1} is the Jacobian for the change of variables between the detector and the source frame. The expression for the differential of the luminosity distance can be found in Appendix A.1 for standard cosmological models. The quantity dN/(dt_s dz dm_{1,s} dm_{2,s} dχ) is the CBC merger rate in the source frame, and it is described by the distributions of CBCs in terms of redshift, spins, and source-frame masses. Several parameterizations of the source-frame CBC merger rate are possible, depending on the observational data available. We cover all these cases in the subsections below.
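Before turning to those cases, the following minimal sketch illustrates the detector-to-source conversion of Eqs. (12)-(13) for one sample, assuming a flat ΛCDM cosmology via astropy (the parametrizations of Appendix A.1 generalize this step):

```python
import astropy.units as u
from astropy.cosmology import FlatLambdaCDM, z_at_value

# One detector-frame sample -> source frame under flat LambdaCDM.
cosmo = FlatLambdaCDM(H0=67.7, Om0=0.308)

d_l = 1000.0 * u.Mpc            # detector-frame luminosity distance
m1_det, m2_det = 40.0, 30.0     # detector-frame masses (Msun)

z = z_at_value(cosmo.luminosity_distance, d_l)  # invert d_L(z)
m1_src = m1_det / (1 + z)                       # Eq. (12)
m2_src = m2_det / (1 + z)
```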

Spectral sirens merger rates
The first case that we discuss is the "spectral siren" analysis (Ezquiaga & Holz 2022). In this case, we are interested in inferring the population properties of the source rate model, as well as cosmology and GR deviations, from a population of GW events alone. For this model, the detector event parameters are θ_D = (d_L, m_{1,d}, m_{2,d}, χ), namely, the luminosity distance, the detector masses, and the spin parameters. The source event parameters are θ_S = (z, m_{1,s}, m_{2,s}, χ), namely, the redshift, the two source masses, and the spin parameters. The detector rate for the spectral siren analysis is parameterized as:

dN/(dt_s dz dm_{1,s} dm_{2,s} dχ) = R_0 Ψ(z; Λ) p_pop(m_{1,s}, m_{2,s}|Λ) p_pop(χ|Λ) dV_c/dz,   (14)

where R_0 is the local CBC merger rate per comoving volume per year (in Gpc^{−3} yr^{−1}), Ψ(z; Λ) is a function parametrizing the rate evolution in redshift such that Ψ(z = 0; Λ) = 1, p_pop(m_{1,s}, m_{2,s}|Λ) is a prior distribution describing the production of source masses, and p_pop(χ|Λ) is a prior distribution for the production of spin parameters. ICAROGW implements the prior and rate distributions that are typically used for CBC population studies and motivated by population synthesis studies of CBCs (Abbott et al. 2023b). Appendix B.2 lists and describes the models implemented for the masses, Appendix B.3 the models for the spins, and Appendix B.1 the models for the merger rate as a function of redshift. Finally, V_c is the comoving volume, which also depends on the cosmological parameters.
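For illustration, a Madau-Dickinson-like Ψ(z; Λ) normalized to Ψ(0) = 1 can be written as below; the (γ, κ, z_p) parametrization is the form commonly used in CBC population studies, and we assume here, as a sketch, that the redshift models of Appendix B.1 are of this type:

```python
import numpy as np

def psi_md(z, gamma, kappa, zp):
    """Madau-Dickinson-like rate evolution with psi(0) = 1."""
    norm = 1.0 + (1.0 + zp) ** (-gamma - kappa)
    return norm * (1.0 + z) ** gamma / (
        1.0 + ((1.0 + z) / (1.0 + zp)) ** (gamma + kappa))

# sanity check of the normalization: psi_md(0.0, 2.7, 3.0, 2.0) == 1.0
```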

Galaxy catalog merger rates
The "galaxy catalog" parametrization adds information on the GW event redshift from galaxy surveys (Schutz 1986;Del Pozzo 2012;Gray et al. 2020Gray et al. , 2022)).Also in this case, we are interested in inferring population properties of the source rate model, as well as the cosmology and GR deviations from a population of GW events, this time with extra information coming from the galaxy catalog.
In this method, the detector parameters are θ_D = (d_L, m_{1,d}, m_{2,d}, Ω, χ), namely, the luminosity distance, the detector masses, the sky direction pixel, and the spins. The sky direction pixel area is measured in steradians. The source event parameters are θ_S = (z, m_{1,s}, m_{2,s}, Ω, χ), that is, the redshift, the two source masses, the sky direction, and the spins. Here, we directly provide the parametrization of the CBC merger rate for this method; for a more detailed derivation, see Mastrogiovanni et al. (2023). The detector rate for the galaxy catalog analysis is parameterized as:

dN/(dt_s dz dΩ dm_{1,s} dm_{2,s} dχ) = R*_{gal,0} Ψ(z; Λ) p_pop(m_{1,s}, m_{2,s}|Λ) p_pop(χ|Λ) × [ dV_c/(dz dΩ) ϕ* Γ_inc(α + ϵ + 1, x_max, x_min(M_thr)) + Σ_j f_L(M(m_j, z); ϵ) p(z|z_obs^j, σ_{z,obs}^j) ],   (15)

where R*_{gal,0} is the local CBC merger rate per galaxy per year (in yr^{−1}). The sum of the two terms in the square brackets represents the galaxy number density in redshift and sky area that could host GW sources (see Mastrogiovanni et al. 2023 for more details). The first term is the completeness correction, that is, it accounts for the galaxies missing from the catalog. It depends on the absolute magnitude threshold of galaxy detection, M_thr, on how likely more luminous galaxies are to host GW events (through the ϵ parameter), and on the Schechter luminosity function and its parameters ϕ* and α; x_min/max are defined in Mastrogiovanni et al. (2023) and are related to the minimum and maximum of the Schechter function. The second term in the square brackets is given by the galaxy distribution reported in the catalog. The function f_L(M(m_j, z); ϵ) quantifies how likely luminous galaxies are to emit GW events, while p(z|z_obs^j, σ_{z,obs}^j) is the probability of a given value of z given the observed values of the galaxy redshift inside the catalog.
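As a rough illustration of the completeness ingredient, the sketch below integrates a luminosity-weighted Schechter function above the detection threshold; the variable convention (x = L/L*) and the simple ratio are assumptions made for the example and do not reproduce the exact ICAROGW expressions:

```python
import numpy as np
from scipy.integrate import quad

def schechter_detected_fraction(x_thr, x_min, x_max, alpha, epsilon=1.0):
    """Fraction of the luminosity-weighted Schechter function
    x**(alpha + epsilon) * exp(-x) lying above the detection threshold
    x_thr = L_thr/L*; 1 minus this is the 'missing galaxies' fraction."""
    integrand = lambda x: x ** (alpha + epsilon) * np.exp(-x)
    bright, _ = quad(integrand, max(x_thr, x_min), x_max)   # detected part
    total, _ = quad(integrand, x_min, x_max)                # all hosts
    return bright / total

# e.g. with the K-band-like slope alpha = -1.09 and epsilon = 1:
# schechter_detected_fraction(0.1, 1e-3, 10.0, -1.09)
```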

Sources with observed EM counterparts
A third methodology, to infer population and cosmological properties for GW events with an associated electromagnetic (EM) counterpart, is also implemented. The model now takes into account additional constraints on the sky position and redshift from a putative EM counterpart. In this case, the CBC merger rate can still be parametrized as in the previous sections, but the overall hierarchical likelihood must be modified to include the additional EM information. Two observational scenarios are possible, which we describe in the following.

High-latency analyses
When performing a "high-latency" population analysis for a GW and EM observation, we expect GW studies to have produced PE samples for all the GW parameters, such as sky position, redshift, and masses. In addition, the EM counterpart provides an accurate sky localization of the source and a redshift localization with small uncertainties due to the peculiar velocity of the host galaxy.
By assuming that the GW measurement is independent of the EM measurement, the overall likelihood term is now L_EM+GW(x_i|z, Ω, m_{1,s}, m_{2,s}), which describes the measurement of z, Ω, m_{1,s}, m_{2,s} from the EM and GW data. Here, we assume that the EM data measure Ω and z, while the GW data measure Ω, z, m_{1,s}, m_{2,s} independently, so that:

L_EM+GW(x_i|z, Ω, m_{1,s}, m_{2,s}) = L_EM(x_i|z, Ω) L_GW(x_i|z, Ω, m_{1,s}, m_{2,s}).   (17)

The integral in the numerator of Eq. (2) becomes:

I = ∫ L_EM(x_i|z, Ω) L_GW(x_i|z, Ω, m_{1,s}, m_{2,s}) dN/(dt_s dz dΩ dm_{1,s} dm_{2,s}) dz dΩ dm_{1,s} dm_{2,s}.   (18)

Regarding the denominator of Eq. (2), namely, the selection bias, following the implementation of the same methodology in Gray et al. (2020), we assume that the probability of detecting an EM counterpart (if present) given the GW detection is always 1. This assumption allows us to calculate the selection bias in Eq. (2) using only GW-detectable signals, without the need to fold in a detection probability model for the EM counterpart. The assumption is physically valid only when the detection range of GW observatories is significantly smaller than that of EM observatories. As GW detectors improve in sensitivity, this assumption should be revised, as it could introduce a systematic bias on the estimation of H_0 (Chen 2020). The integral in Eq. (18) cannot be directly evaluated by calculating the CBC rate weights w_{i,j} and s_j as described in Sect. 2; some additional steps are needed to account for the information from the EM counterpart. To perform the integral in Eq. (18), ICAROGW first identifies all the GW PE samples falling in the sky pixel in which we believe the EM counterpart is located. This PE selection practically corresponds to performing the integral over dΩ in Eq. (18). Next, a function of redshift, F(z), is defined from the selected GW PE samples:

F(z) = KDE({z_i}; {w_i}),   (19)

where KDE is a kernel density estimate performed on the GW redshift samples z_i with weights w_i. The weights are given by:

w_i = [1/π_PE(z_i, Ω_i, m_{1,s}^i, m_{2,s}^i)] dN/(dt_s dz dΩ dm_{1,s} dm_{2,s})|_{z_i, Ω_i, m_{1,s}^i, m_{2,s}^i}.   (20)

The definition of F(z), and the fact that we selected the GW PE samples in the EM counterpart localization area, allow us to rewrite Eq. (18) as:

I ∝ ∫ [p_EM(z|x_i, Ω)/π_EM(z)] F(z) dz,   (21)

where π_EM(z) is the prior used by the EM experiment to provide the redshift measurement and p_EM(z|x_i, Ω) is the EM posterior that localizes the EM counterpart in redshift given a certain sky area. The above integral can be evaluated with the usual Monte Carlo sum, by drawing a set of PE samples on z from the EM posterior:

I ∝ (1/N_{s,EM}) Σ_{j=1}^{N_{s,EM}} F(z_j)/π_EM(z_j), with z_j ∼ p_EM(z|x_i, Ω).   (22)
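The high-latency procedure can be sketched as follows; the interfaces are hypothetical, and scipy's weighted Gaussian KDE is used as one possible choice for the KDE of Eq. (19):

```python
import numpy as np
from scipy.stats import gaussian_kde

def high_latency_integral(z_gw, w_gw, z_em, pi_em):
    """Sketch of Eqs. (19)-(22).

    z_gw, w_gw : GW PE redshifts in the counterpart pixel and their weights (Eq. 20)
    z_em       : samples drawn from the EM redshift posterior p_EM(z|x, Omega)
    pi_em      : callable, prior used by the EM experiment
    """
    F = gaussian_kde(z_gw, weights=w_gw)      # weighted KDE, Eq. (19)
    return np.mean(F(z_em) / pi_em(z_em))     # Monte Carlo sum, Eq. (22)
```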

Low-latency analyses
In some cases, we might want to produce a "low-latency" inference of population parameters (e.g., the Hubble constant) given a GW and an EM observation. In this case, we are typically provided with low-latency EM and GW skymaps. The low-latency GW skymap can be thought of as a "posterior", p_GW(d_L, Ω|x_i), on the luminosity distance and sky position given the observed GW event. The low-latency EM skymap can be thought of as a posterior, p_EM(z, Ω|x_i), on the redshift and sky position. In this case too, we make the assumption that the selection bias is dominated by the GW detection process, and therefore no modelling of the EM selection bias is needed.
The GW-EM likelihood is a function only of the sky position and redshift and can be written as:

L_EM+GW(x_i|z, Ω) = L_EM(x_i|z, Ω) L_GW(x_i|z, Ω).   (23)

The integral in the numerator of the hierarchical likelihood in Eq. (2) becomes:

I = ∫ L_EM(x_i|z, Ω) L_GW(x_i|z, Ω) dN/(dt_s dz dΩ) dz dΩ,   (24)

and the CBC merger rate is parameterized only in terms of redshift and sky position. We can now use Bayes' theorem to rewrite the above equation as:

I ∝ ∫ [p_EM(z, Ω|x_i)/π_EM(z, Ω)] [p_GW(d_L(z; Λ), Ω|x_i)/π_GW(d_L(z; Λ), Ω)] dN/(dt_s dz dΩ) dz dΩ,   (25)

where Λ indicates any population parameter required to calculate the luminosity distance from the redshift, while π_EM and π_GW are the priors used to generate the low-latency EM and GW skymaps. The integral now explicitly depends on the EM and GW low-latency skymaps. Eq. (25) can be evaluated numerically by drawing N_EM samples of z and Ω from the EM skymap, namely:

I ∝ (1/N_EM) Σ_{j=1}^{N_EM} [p_GW(d_L(z_j; Λ), Ω_j|x_i) / (π_GW(d_L(z_j; Λ), Ω_j) π_EM(z_j, Ω_j))] dN/(dt_s dz dΩ)|_{z_j, Ω_j}.   (26)

The motivation for evaluating the integral by drawing samples from the EM skymap is that the EM source is typically better localized than the GW source. By drawing samples from the EM skymap, we have a higher probability that the weights in Eq. (26) are different from zero. Consequently, the numerical evaluation of I is more stable.
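A corresponding sketch of the low-latency Monte Carlo sum of Eq. (26), with illustrative callables standing in for the skymaps, priors, and rate:

```python
import numpy as np

def low_latency_integral(z_em, omega_em, p_gw, pi_gw, pi_em, rate, d_l_of_z):
    """Sketch of Eq. (26); (z_em, omega_em) are drawn from the EM skymap.

    p_gw, pi_gw : GW skymap posterior and its prior at (d_L, Omega)
    pi_em       : prior of the EM skymap at (z, Omega)
    rate        : CBC merger rate dN/(dt dz dOmega) at (z, Omega)
    d_l_of_z    : luminosity distance for the population parameters Lambda
    """
    d_l = d_l_of_z(z_em)   # vectorized over the EM samples
    w = (p_gw(d_l, omega_em) / pi_gw(d_l, omega_em)
         * rate(z_em, omega_em) / pi_em(z_em, omega_em))
    return w.mean()
```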

Code validation
We performed several tests reproducing CBC population results from previous analyses. In all cases, we found that ICAROGW 2.0 reproduces the results of previous works in the literature.

Spectral siren analyses
We used the same 42 BBHs as in Abbott et al. (2023b) to reproduce population, cosmological, and beyond-GR analyses generated with ICAROGW 1.0. For the benchmarking, we employed a POWER LAW + PEAK model (see Appendix B.2) for the source mass distribution of BBHs and a MADAU-DICKINSON merger rate model (see Appendix B.1). To evaluate the selection biases, we used the procedure described in Sect. 2, based on the set of detectable GW signals released with Abbott et al. (2023b).
The first analysis that we reproduced is the one presented in Abbott et al. (2023b), which jointly estimates the population parameters of the BBH distribution together with the cosmological parameters H_0, Ω_m, and w_0 (see Appendix A.1 for more details). The first column of Fig. 3 shows the quantile-quantile plots (QQ-plots) for the marginal posteriors on the population parameters obtained with ICAROGW 1.0 and ICAROGW 2.0. When the two posteriors are in agreement, the QQ-plot displays a bisector. For all the population parameters, we obtained an excellent agreement between the two posteriors, as shown in the first column of Fig. 3. We also reproduced the constraints on modified gravity models and on the population of BBHs generated with ICAROGW 1.0 in Leyde et al. (2022). The second, third, and fourth columns of Fig. 3 show the QQ-plots for three modified gravity models (see Appendix A.2 for more details). In this case too, the posteriors generated with ICAROGW 1.0 and ICAROGW 2.0 are in good agreement with each other. The tests with spectral sirens are also in good agreement with the results obtained with the independent code MGcosmoPop (Mancarella & Genoud-Prachex 2022) in Mancarella et al. (2022) for the same set of BBH events. They also agree with the results of Ezquiaga (2021), obtained using O3a events and modified gravity propagation, and with the population-only results generated with the code gwpopulation (Talbot et al. 2019).

Galaxy catalog analysis
We tested ICAROGW against the results generated by gwcosmo (Gray et al. 2020, 2022) in Abbott et al. (2023b), using the GLADE+ (Dálya et al. 2022) galaxy catalog with the infrared K-band.
The procedure used to construct a redshift distribution of possible host galaxies from the catalog is described in detail in Mastrogiovanni et al. (2023). In summary, all the galaxies reported in GLADE+ are divided into equal-sized sky pixels of 53 deg². For each pixel, an apparent magnitude threshold is defined as the median of the apparent magnitudes reported for all the galaxies in the pixel. Galaxies fainter than the apparent magnitude threshold are removed from the catalog. Then, the galaxies are subdivided into smaller pixels of 13 deg², which are the ones used to run the analysis. The completeness correction is applied following the procedure described in Mastrogiovanni et al. (2023), which consists of assuming a Schechter function model for the galaxy distribution and counting all the galaxies missing due to the apparent magnitude threshold cut. The Schechter function model assumed for the K-band galaxies is specified by the parameters M_min = −27.85, M_max = −19.84, α = −1.09, and ϕ* = 0.03 Mpc⁻³ (for H_0 = 67.7 km s⁻¹ Mpc⁻¹). For this benchmark analysis, following Abbott et al. (2023b), we assume that, for each galaxy reported in the catalog, the probability of hosting a GW event is proportional to the galaxy's intrinsic luminosity. The source mass and redshift population models are fixed to the same ones used by gwcosmo in Abbott et al. (2023b).
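For illustration, the per-pixel magnitude threshold can be computed with HEALPix pixelations (nside = 8 gives ≈54 deg² pixels, close to the 53 deg² quoted; nside = 16 gives ≈13 deg²); the catalog arrays below are synthetic stand-ins for GLADE+ columns:

```python
import numpy as np
import healpy as hp

rng = np.random.default_rng(0)
n_gal = 10000
ra_gal = rng.uniform(0.0, 360.0, n_gal)                         # stand-in RA (deg)
dec_gal = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, n_gal)))  # stand-in Dec (deg)
m_app = rng.uniform(10.0, 20.0, n_gal)                          # stand-in apparent magnitudes

nside_coarse = 8                                   # ~54 deg^2 pixels
pix = hp.ang2pix(nside_coarse, ra_gal, dec_gal, lonlat=True)

# median apparent magnitude per coarse pixel defines the threshold
m_thr = np.full(hp.nside2npix(nside_coarse), np.inf)
for p in np.unique(pix):
    m_thr[p] = np.median(m_app[pix == p])

keep = m_app < m_thr[pix]       # drop galaxies fainter than the threshold
```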
We only considered the inference of the Hubble constant, H_0, to make a direct comparison with Abbott et al. (2023b). The results that we obtained are shown in Fig. 4. We found a good agreement for almost all of the 42 BBHs present in the dataset. This is not surprising, since most of the results are dominated by the assumption on the source mass distribution (Abbott et al. 2023b) and, as discussed in the previous section, ICAROGW 2.0 agrees well with the previous literature when using the spectral siren method. However, we note that the closest events, such as GW150914, GW170814, and GW190824_021846, are only partially dominated by the population assumptions on masses, and even for these events we obtained posteriors that are in agreement.
We note that the posteriors we obtained for the catalog analysis are also consistent with the ones generated in Finke et al. (2021) with the DarkSirensStat code. However, the results in Finke et al. (2021) were generated with a different choice of BBHs and a different galaxy catalogue description compared to the analyses performed in Abbott et al. (2023b).
We also tested the H_0 catalog analysis on GW190814, one of the best localized and closest dark sirens. For this event, we compared with the new version of GWCOSMO (Gray et al. 2023). We performed two analyses: the first using only the GW event and the spectral siren method, the second using the galaxies reported in the K-band of GLADE+. In contrast to the previous galaxy catalog analysis, for GW190814 we used a pixel size (after computing the apparent magnitude cut) of 3 deg², because the GW190814 localization area spans only a few tens of square degrees. Following Abbott et al. (2023b), we assumed that GW190814 is an NSBH, and we used a POWER LAW + PEAK source mass model with the same population parameters as in Abbott et al. (2023b). In Fig. 5, we show the comparison between GWCOSMO and ICAROGW for the spectral siren and catalog analyses. The two results are in excellent agreement, thus indicating the robustness of our analysis.

Sources with observed EM counterparts
To test the electromagnetic counterpart method, we inferred the Hubble constant using the BNS merger GW170817 and its EM counterpart. We performed two types of analyses, one using the "high-latency" approach and the other using the "low-latency" approach. In both cases, the injection set used to evaluate the selection biases is the BNS injection set released for O3 sensitivity in Abbott et al. (2023c).
For the high-latency approach, we used the IMRPhenom PE samples from Abbott et al. (2021b), consistent with the GWCOSMO analysis, while for the low-latency approach we used the GW170817 skymap publicly released with GWTC-2. For both analyses, we used a sky position for the EM counterpart at right ascension 197.6 deg and declination −23.4 deg. Moreover, we assumed a recessional velocity in the Hubble flow of v_H = 3017 km s⁻¹ with an uncertainty of σ_v = 166 km s⁻¹ (Abbott et al. 2021b).
The analysis done with the code gwcosmo in Abbott et al. (2021b) used slightly different population and cosmological models from those currently implemented in ICAROGW. Therefore, to make a fair comparison, we implemented in ICAROGW the population and cosmological models used in Abbott et al. (2021b). First, the luminosity distance was approximated using linear cosmology, namely:

d_L = c z / H_0.   (27)

This assumption corresponds to the following relations for the Jacobian between the source and detector frames and for the differential of the comoving volume:

∂d_L/∂z = c/H_0,   (28)

dV_c/dz = 4π c³ z² / H_0³.   (29)

Second, the CBC merger rate model used for GW170817 in that analysis was uniform in detector masses, namely:

p_pop(m_{1,d}, m_{2,d}) ∝ Θ(m_{1,d} − m_{2,d}),   (30)

where the Θ function ensures that the detector-frame secondary mass is lighter than the primary one.

Fig. 6. Posterior distributions for H_0 obtained from GW170817 with ICAROGW 2.0 with the high-latency approach (blue line) and the low-latency approach (green dotted line), in comparison with gwcosmo using the high-latency approach (orange dashed line). The posterior of gwcosmo is taken from Abbott et al. (2021b).

The analysis also neglected the
1/(1 + z) factor arising from the difference between the source and detector frames. The overall merger rate was therefore:

dN/(dt_d dz dm_{1,d} dm_{2,d}) ∝ (dV_c/dz) Θ(m_{1,d} − m_{2,d}).   (32)

We remark that the aforementioned assumptions on the cosmology and rate model are not expected to make a noticeable difference when calculating the weights w_{i,j} for the GW170817 PE samples. This is because GW170817 is a very close-by GW event and, even for extreme values of H_0, it remains at low redshift, where the linear cosmology approximation is sufficient. Moreover, assumptions about the masses of GW170817 are not expected to strongly bias the result in the presence of an EM counterpart, as shown in Mastrogiovanni et al. (2021). However, both the mass and cosmological assumptions are expected to have an impact on the calculation of the selection bias. With O3 sensitivities, BNSs are detected up to a luminosity distance of ∼300−400 Mpc, where the linear cosmology approximation can fail (especially if a high value of H_0 is chosen). So, even when reproducing GW170817, it is important to consider the rate model assumed in Abbott et al. (2023c). Thus, we implemented the rate model in Eq. (32) for this comparison.
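The linear-cosmology ingredients of Eqs. (27)-(29) and the benchmark rate of Eq. (32) are simple enough to sketch directly; this is a check of the text, not the gwcosmo or ICAROGW implementation:

```python
import numpy as np

C_KMS = 299792.458  # speed of light (km/s)

def d_l_linear(z, H0):
    return C_KMS * z / H0                             # Eq. (27)

def dVc_dz_linear(z, H0):
    return 4.0 * np.pi * (C_KMS / H0) ** 3 * z ** 2   # Eq. (29)

def rate_gw170817(z, m1_det, m2_det, H0):
    # uniform in detector masses, no 1/(1+z) factor, Eq. (32)
    return dVc_dz_linear(z, H0) * (m1_det >= m2_det)
```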
Figure 6 shows the posteriors that we obtained with ICAROGW 2.0 and the methods described in Sect. 4 for GW170817, in comparison with gwcosmo. The posteriors for the high-latency analyses are in good agreement with each other. The main difference that we found is that the low-latency posterior, based on the GW skymap, supports higher values of H_0. This result can be explained as follows. Fixing the sky localization of the GW170817 EM counterpart, the GW PE samples and the skymap have different supports for the luminosity distance. In Fig. 7, we show that the skymap has an approximately Gaussian posterior distribution that peaks at lower distance values than the PE samples. At a fixed value of the redshift of the EM counterpart, a lower luminosity distance is fit by a higher value of H_0. This is why the low-latency analysis supports slightly higher values of H_0. Concerning the high-latency analyses, the ICAROGW analysis agrees excellently with the GWCOSMO analysis of Abbott et al. (2021b).

Conclusions and future developments
In this paper, we present ICAROGW 2.0, a python software for population property inference in the presence of selection biases. We describe some of the tests performed to check the validity of the implemented CBC rate models. We show that the results obtained by ICAROGW 2.0 with the spectral siren method are consistent with the previous version of the code (with and without the use of modified gravity models). We also show that the galaxy catalog analysis matches the results generated by the code GWCOSMO. Finally, we show how EM counterparts can be employed with the CBC rate models to infer the value of the Hubble constant, and we discuss two approaches for the EM counterpart case that produce results consistent with GWCOSMO.
ICAROGW 2.0 can easily be adapted to any custom population inference problem involving noisy measurements and selection biases. Future development plans for ICAROGW in GW science include more realistic models for CBCs that might include correlations among different variables (e.g., mass and redshift), the inclusion of more beyond-GR models, and time-delay models. The latest version of ICAROGW 2.0 is available in a public GitHub repository.
The Jacobian is given by: (A.8)

Appendix A.2.2: The phenomenological log parametrization

The luminosity distance is given by: (A.9) The Jacobian is given by: (A.10)

Appendix A.2.3: Extra dimensions

In the extra-dimensional model, the luminosity distance is given by (see Eq. 2.22 in Corman et al. 2022):

d_L^GW = d_L^EM [1 + (d_L^EM / R_c)^n]^{(D−4)/(2n)}.   (A.11)

We define the following function: (A.12) We also define the exponential: (A.13) We can write the previous equation as: (A.14)

Fig. 1. ICAROGW logical structure. The orange boxes highlight the computation of the rate weights, which are needed for the calculation of the hierarchical likelihood given in Eq. (9).

Fig. 2. ICAROGW computational benchmarking on CPU and GPU. Top: average elapsed time taken by ICAROGW to evaluate the hierarchical likelihood in Eq. (1). Bottom: efficiency, defined as the number of samples processed per second when evaluating the hierarchical likelihood in Eq. (1). For both panels, the CPU is reported with blue pentagons and the GPU with orange squares. The two panels are reported as a function of the number of samples, namely, the sum of the number of PE samples and injections used for the computation. The CPU markers are generated up to 10^6 samples, since at 10^5 samples the efficiency is already saturated. Figures generated with double floating-point precision on an Intel Core i9-11950H CPU (8 cores HT, 2.6-5.0 GHz Turbo) and an NVIDIA GeForce RTX 3080 GPU (16 GB GDDR6, 6144 CUDA cores).

Fig. 3. Quantile-quantile plots of the marginal posterior distributions of the CBC merger rate population parameters inferred from the 42 BBHs in GWTC-3. The plot columns correspond to the model used to calculate the GW luminosity distance as a function of redshift; see Appendices A.1-A.2. The legend and the different lines report the population parameters. All the benchmark runs use the POWER LAW + PEAK model for the source mass (cf. Appendix B.2) and the MADAU-DICKINSON merger rate model (cf. Appendix B.1).

Fig. 5. Hubble constant posteriors generated for the well-localized event GW190814. The plot reports results generated with ICAROGW and the GWCOSMO code for two cases: the spectral siren and galaxy catalog analyses (see legend).

Fig. 7. Luminosity distance posterior probability density function conditioned on the EM counterpart line of sight for GW170817, using PE samples (orange histogram) and the low-latency skymap (blue line).