The potential role of binary neutron star merger afterglows in multimessenger cosmology

Binary neutron star mergers offer a new and independent means of measuring the Hubble constant $H_0$ by combining the gravitational-wave inferred source luminosity distance with its redshift obtained from electromagnetic follow-up. This method is limited by intrinsic degeneracy between the system distance and orbital inclination in the gravitational-wave signal. Observing the afterglow counterpart to a merger can further constrain the inclination angle, allowing this degeneracy to be partially lifted and improving the measurement of $H_0$. In the case of the binary neutron star merger GW170817, afterglow light-curve and imagery modeling thus allowed to improve the $H_0$ measurement by a factor of 3. However, systematic access to afterglow data is far from guaranteed. In fact, though each one allows a leap in $H_0$ precision, these afterglow counterparts should prove rare in forthcoming multimessenger campaigns. We combine models for emission and detection of gravitational-wave and electromagnetic radiation from binary neutron star mergers with realistic population models and estimates for afterglow inclination angle constraints. Using these models, we quantify how fast $H_0$ will be narrowed-down by successive multimessenger events with and without the afterglow. We find that, because of its rareness and though it greatly refines angle estimates, the afterglow counterpart should not significantly contribute to the measurement of $H_0$ in the long run.


Introduction
The detection of gravitational waves (GWs) from compact binary coalescence (Abbott et al. 2019) has opened a new window to study the Universe. Gravitational-wave sources are a new type of "standard candle", usually referred to as "standard sirens" as it is possible to directly measure their luminosity distance (d L ) from the GW signal (Schutz 1986). Hence, if supplied with the source redshift information, GW detections can be used to measure cosmological parameters (Holz & Hughes 2005;Nissanke et al. 2013;Chen et al. 2018;Mortlock et al. 2019a), such as the Hubble constant H 0 . This possibility is of great interest given the current tension between the H 0 measurement at early and late epochs of the Universe (Freedman 2017).
The first GW measurement of H 0 was made possible by the multimessenger observation of the binary neutron star (BNS) merger GW170817 (Abbott et al. 2017c) and its associated kilonova, which enabled the identification of the host galaxy and its redshift, leading to a new and independent measurement of H 0 = 70 +12 −8 km/s/Mpc (Abbott et al. 2017a). Currently, GW170817 is the only GW event observed with an electromagnetic counterpart that allowed this kind of cosmological measure. In the absence of an electromagnetic counterpart, one can use the redshifts of all cataloged galaxies with positions consistent with the 3D GW skymap to measure H 0 (the so-called "dark siren" method; Fishbach et al. 2019;Soares-Santos et al. 2019;Gray et al. 2020), leverage tidal effects in the GW waveform to estimate the source redshift (Messenger & Read 2012;Del Pozzo et al. 2017), or exploit the power spectrum mastrosi@apc.in2p3.fr duque@iap.fr of GWs and galaxy distributions (Mukherjee et al. 2021b(Mukherjee et al. , 2020.
These methods encounter challenges due to galaxy catalog incompleteness and the difficulty in detecting tidal effects in the GW signal, as well as uncertainty in the equation of state of neutron star matter. Overall, an electromagnetic counterpart detection remains the best prospect for obtaining precise H 0 measurements with GWs.
The H 0 measurement is limited by the large uncertainty on the GW luminosity distance. This arises because the GW emission is not isotropic and the luminosity distance determination is degenerate with the binary orbital inclination to the line-of-sight, hereafter referred to as ι (Chen et al. 2019). To improve the H 0 measurement, it is crucial to break this degeneracy.
One solution is to measure ι from anisotropic electromagnetic signals emitted after the merger. The most stringent constraint comes from the photometry and very long baseline interferometry (VLBI) imaging of the afterglow. This is nonthermal radiation from the shock formed as the relativistic merger outflow decelerates in the circum-merger environment. For GW170817, these observations provided inclination angle measurements precise to ∼ 12 deg (e.g., Guidorzi et al. 2017;Troja et al. 2019;Hajela et al. 2019) and ∼ 5 deg, respectively (Mooley et al. 2018a;Coughlin et al. 2019b;Ghirlanda et al. 2019;Ascenzi et al. 2021), and refined the H 0 measurement to H 0 = 70.3 +5.3 −5.0 km/s/Mpc (Hotokezaka et al. 2019).
Further afterglow counterparts could drastically improve H 0 measurements and play a leading role in multimessenger cosmology. However, they are faint and difficult to detect for distant or very inclined binaries and should therefore prove rare in the future (Saleem et al. 2018; Got-  Duque et al. 2019). Furthermore, using electromagnetic measurements of ι entails selection effects that, if not correctly understood, can significantly bias the subsequent measurement of H 0 and must be carefully taken into account (Chen 2020).
In this prospective study, we consider both the likeliness of the afterglow to be observed and its capability to constrain ι to realistically quantify its benefit for the measurement of the Hubble constant with future multimessenger events. We will assume that the peculiar motion of the galaxies is accurately known, as their mismatch could introduce a systematic bias.
Afterglow-derived information on ι is much more precise than that from GW data only. Nonetheless, we find that, for advanced GW detectors in their O3 and O4+ configurations, detecting afterglows should be so rare that, statistically, afterglows will not accelerate the narrowing-down of H 0 . This paper is organized as follows. In Sect. 2 we describe the possible multimessenger observation scenarios of a GW event with electromagnetic counterparts following which a measurement of H 0 can be made. In Sect. 3 we evaluate and quantify the likeliness of the different scenarios by combining models for the emission of GWs and electromagnetic radiation from BNSs, models for the detection of these signals, and population models for the sources. In Sect. 4, we briefly describe the prospects of joint GW and electromagnetic detections in these different scenarios. In Sect. 5, we evaluate the potential of electromagnetic counterparts in contributing to the measurement of H 0 by combining their capability to measure the inclination angle ι with their likelihood of being observed in the scenarios described above.

Inferring H 0 with gravitational waves and electromagnetic counterparts
In this section, we review the possible multimessenger observation scenarios expected after a GW event from a BNS merger. Until the advent of deep radio surveys such as the Square Kilometer Array (Dewdney et al. 2009), the kilonova will continue to be the only electromagnetic counterpart to GW events that can lead to an identification of the system host galaxy and redshift. For a kilonova to be detected during follow-up of a GW event, it must be in reach of the follow-up telescopes in terms of both magnitude and sky position, and the source must be localized sufficiently well to be discovered in time before it fades. Magnitude and localization are, however, not the only conditions for detection. As recent searches for kilonova signals both during GW follow-up campaigns (Kasliwal et al. 2020) and in archival data (Andreoni et al. 2020) have shown, difficulties can arise in recognizing kilonovae among a myriad of optical transients, even with quality spectroscopic or color evolution observations. Detection of the kilonova counterpart and thereby acquiring the system's redshift through its host galaxy is the minimal scenario required for a multimessenger measurement of H 0 . We refer to this scenario as "Level 1." It is subject to detection criteria concerning the GW and kilonova signals, described in Sect. 3.
In this scenario, the information on d L is provided solely by the GW data, without any contribution from electro-magnetic counterparts. Indeed, any direct distance information from an electromagnetic counterpart would require using the cosmic distance ladder, which is of course excluded in the perspective of measuring H 0 .
In principle, the kilonova signal could indirectly contribute through the measurement of ι from color-evolution considerations (Kashyap et al. 2019;Dhawan et al. 2020). However, currently, these signals crucially lack modeling and observing history. Inclination angle measurements from kilonova data are very model dependent and lack robustness (Doctor 2020;Heinzel et al. 2021). Only when the kilonova sample grows will the potential impact of kilonova-derived angle constraints be appreciated. Therefore, we excluded any contribution to ι from kilonova data.
As expected from the observation of short gamma-ray bursts and evidenced in the case of GW170817, relativistic jets are launched from BNS mergers (Mooley et al. 2018a;Ghirlanda et al. 2019). The interaction of this jet with the circum-merger environment leads to long-lived 1 nonthermal emission in the radio to the X-ray bands: the afterglow counterpart.
The afterglow photometry can provide an independent measurement of inclination angle. In such measurements from off-axis jet afterglow data fitting, there is unavoidable degeneracy with the jet opening angle (Nakar & Piran 2021). However, when combined with prior estimates for short gamma-ray burst jet opening angles (see references in Sect. 3.2.3), afterglow data can still lead to inclination angle information. It can therefore indirectly inform on d L by breaking the d L /ι degeneracy in the GW data. We refer to the scenario where Level 1 is realized and a ι measurement from the jet afterglow photometry is made as "Level 2." It is subject to the realization of the GW and kilonova detection criteria and to detection criteria on the jet afterglow light curve, which we detail in the next section.
Additionally, the relativistic nature of the jetted outflow can be revealed by VLBI observations (Mooley et al. 2018a;Ghirlanda et al. 2019) that evidence an apparent superluminal motion of the jet head. Detecting this centroid displacement is possible for events that are particularly close or bright or under specific inclination angle conditions (Duque et al. 2019;Dobie et al. 2020). Doing so further constrains ι and narrows down the measurement of H 0 , as shown in the case of GW170817. We refer to the scenario where such a constraint on ι can be extracted from afterglow VLBI imaging, in addition to those of afterglow photometry, as "Level 3." This final level of ι constraint is the most informative on ι and H 0 , but also the most difficult to obtain.
Other electromagnetic counterparts could potentially provide further independent measurements of ι. These are the short gamma-ray burst and the rebrightening in the source's multiwavelength signal due to the emergence of emission from the front shock of the decelerating mildly relativistic ejecta responsible for the kilonova emission, called the "kilonova afterglow" (Hotokezaka et al. 2018;Kathirgamaraju et al. 2019). However, the short gamma-ray burst should prove extremely rare in future events Beniamini et al. 2019) and there is a lack of robust modeling for gamma-ray signals, especially for very inclined events. Furthermore, the singularity of GRB170817A with respect to other short gamma-ray bursts has cast more uncertainty on the modeling of gamma-ray emission from BNS mergers (Kasliwal et al. 2017;Nakar et al. 2018). Modeling of the kilonova afterglow is still uncertain, rendering any angle measurement difficult (Duque et al. 2020). To sum up, we consider neither the gamma-ray burst nor the kilonova afterglow as viable to measure ι and thus H 0 .

Evaluation of detection probabilities and selection effects
Here, we evaluate the likelihoods of the different observing scenarios to occur, by computing the probabilities of detection of the GW and electromagnetic signals.

Gravitational-wave detection probability
In GW searches an event is detected if its detected signalto-noise ratio (S/N)ρ det exceeds a certain threshold. The detected S/N is a measure of the power of the GW signal registered at the GW detectors. The higher the S/N, the more chance there is to recognize the signal from detector noise. It is important to note thatρ det differs from the optimal-filter S/N ρ opt that is calculated taking into account the average sensitivity of the detector network. Indeed, noise fluctuations are not included in optimal-filter S/N. Also, due to the same noise fluctuations, the GW event will be detected with valuesd L and cosι that differ from the true d L and cos ι. The detectedd L , cosι, and the correspondingρ det can be sampled from the GW likelihood model. In this article, we used the Cutler and Flanagan (CF) approximant for the GW likelihood (Cutler & Flanagan 1994;Poisson & Will 1995;Chassande-Mottin et al. 2019). This approximant can reproduce the GW likelihood in the cosι andd L space under the assumption that the chirp mass of the signal is well estimated, as is always the case for BNS detections (Cutler & Flanagan 1994). By sampling from the CF approximate, we can obtain a value of the detected cosι andd L , which can then be used to compute the detected S/Nρ det with the following (Cutler & Flanagan 1994;Chassande-Mottin et al. 2019): where χ + ≡ (1 + cos 2ι )/2, while d , σ d are variables that depend on the detector network and the sky-position of the GW source andψ is the GW polarization angle. The variable ρ fo is the optimal-filter S/N that the binary would have had if it had been face-on, with S n,aver (f ) the harmonic mean of the noise power spectrum densities (PSD) of the interferometric detectors composing the network, G and c universal gravitational and speed-of-light constants and M c the system's chirp mass in the detector frame. As per the low redshift range of the systems in this study (see below), we assimilate the chirp mass and last stable orbit frequency f LSO in the detector frame to their values in the source frame. The integral low boundary f low = 20 Hz is set to the low-frequency cut-off for ground-based GW detectors.
To evaluate the GW detection probability, we perform a Monte Carlo simulation with synthetic signals. We divide the cos ι range into 200 bins and we simulate 10000 BNS merging at fixed d L and uniformly distributed over the celestial sphere. The BNS masses are generated from a Gaussian distribution with a mean of 1.35M and a standard deviation of 0.15M (Farrow et al. 2019). For each binary, we then draw a detectedd L and cosι from the CF likelihood and calculateρ det following Eqs. 1-2. We then count the binaries with S/N exceeding the online matchfiltering threshold for detection of 14 2 and compute the detection probability as the fraction of BNS events detected. The previous procedure is repeated in d L with a step of 1 Mpc until only 1 of the 10000 simulated systems is detected. Following Chen et al. (2021a), we define the 0.02% "response distance" d r 0.2% at which 0.2% of the simulated binaries (with isotropic distribution in the sky and orientation) will be detected by the network. We verified by running the simulation several times that the GW detection probability estimation is not prone to Monte Carlo statistical fluctuations.
For this paper, we use three PSDs for the HLV network, composed of the LIGO Hanford, LIGO Livingston, and Virgo detectors. The first PSD is indicative of the detectors' sensitivity during O2 (Abbott et al. 2017b), the second is indicative of the sensitivity reached during the first 3 months of O3 (Abbott et al. 2020b), while the third one is a projection for the O4 run sensitivity (Biwer et al. 2019). Fig. 1 shows the GW detection probability as a function of the BNS d L and cos ι marginalized over the GW polarization angle and sky-position. The systems we consider have redshifts ≤ 0.06 for both Planck (Planck Collaboration et al. 2020) and SH0ES (Riess et al. 2019) values of H 0 . Because of the low redshift range of our simulation, the shape of the GW detection probability functions are not affected by the nonlinear d L -z relationship and are therefore the same for all the PSD hypotheses. As it can be seen from the figure, for all three GW sensitivity levels, face-on binaries are easier to detect as the GW emission is stronger perpendicularly to the orbital plane, and they can be observed at higher luminosity distances.

Level 1 observing scenario
In the perspective of upcoming high-cadence and largefield-of-view optical facilities such as the Zwicky Transient Facility (ZTF, Bellm et al. 2019) or Vera C. Rubin Observatory (LSST, Ivezic et al. 2008), we assume the detection of the kilonova is limited only by the kilonova magnitude and sky-position, and not by the size of the GWprovided skymap that the follow-up network must cover in its searches. Indeed, we consider that these survey facilities can cover all the sky available to them within the first

Notation
Description Peak absolute magnitude of a face-on kilonova in the g band Viewing angle after which the kilonova magnitude varies no more with viewing angle, in both r and g bands ∆G = 7 magnitude contrast between a face-on and a θ max,KN v -angle view of a kilonova in the g band ∆R = 4 Same, in the r band p night = 0.52 Fraction of the entire sky accessible to a ∼ 33 deg-latitude followup instrument in a single night r lim , g lim = 21 Limiting magnitude of the optical follow-up instruments F lim = 15 µJy Flux sensitivity of the radio follow-up instruments θ j = 0.1 rad Half-opening angle of the jet launched by the merger ∆θ VLBI = 2 mas Angular resolution of the VLBI array nights of the search, that is, before the estimated time for significant dimming of kilonova signals. While this can be considered a "best-case" assumption, this level of performance was reached during the campaigns following GW events in the recent O3 run of the LIGO-Virgo Collaboration by, for example, the GROWTH collaboration (Kasliwal et al. 2020). In particular, for the only confirmed BNS event of the O3 run GW190425 (Coughlin et al. 2019a), the ZTF covered the ∼ 8000 deg 2 of the skymap overlapping with their night sky.
Similarly to the evaluation of GW detection probability, we place systems uniformly in the sky and uniformly in cos ι and d L out to 600 Mpc. This maximum distance ensures that a fraction of less than 10 −4 of kilonovae were detected in the furthest distance bin. Our model for the inclination-angle-dependent peak magnitude of the kilonova signal is an empirical fit to model "W2" in Wollaeger et al. (2018) (their Fig. 19). It appears that the peak magnitudes are approximately linear in cos ι up to a maximum angle θ max,KN v ∼ 60 deg after which the dependence on angle strongly decreases. We therefore set the peak absolute magnitude of these events as: with numerical values reported in Table 1. We note that a similar dependence of kilonova magnitudes on ι was considered in Villar et al. (2017) (their "asymmetric model") and found in Kawaguchi et al. (2020). We select the detectable events as those satisfying the magnitude threshold criterion g < g lim or r < r lim , with lowercase letters denoting the apparent magnitudes.
Among all the events with large enough flux, follow-up can only detect those in its accessible sky. For observatories at latitudes of ∼ 33 deg such as the ZTF (northern hemisphere) and the LSST (southern hemisphere), this represents a season-averaged fraction p night ∼ 52% of the whole sky (Bellm 2016). Thus, we deem detected the remaining events with a coin-toss with probability p night = 0.52.
Forming the ratio with the original number of events allows us to evaluate the detection probability p z det , the probability of acquiring an event redshift as a function of its luminosity distance and inclination angle. This is plotted in Fig. 2. Here, the detection probabilities were normalized to p night . As the p det are even in ι, we represent only the cos ι > 0 range. Following what was developed in Sect. 2 and given the independence of the GW and kilonova detection processes, we define the probability of the Level 1 scenario occurring as p L1 = p GW det × p z det .

Level 2 observing scenario
For events where emission from the core jet dominates the afterglow radiation, the afterglow light curves are expected to display a single peak occurring when the jet has decelerated to a Lorentz factor of Γ ∼ 1/ι (e.g., Gill & Granot 2018;Mooley et al. 2018b, but see Beniamini et al. 2020 for a study of multipeaked afterglows). This peak can occur up to hundreds of days after the merger (Duque et al. 2019).
Assuming instrument availability and long-term follow-up efforts, detecting the afterglow is simply a matter of flux sensitivity, once the source's position is settled by the kilonova. However, detecting the afterglow at its peak does not suffice to make an inclination angle measurement. This requires an extended and well-sampled light curve on which to fit afterglow models, as was extensively done for GW170817 (e.g., Lamb & Kobayashi 2017;Resmi et al. 2018;Lazzati et al. 2018;Troja et al. 2019). We therefore define the criterion for ι measurement with the afterglow light curve as: where F p is the peak flux of the afterglow light curve, and F lim is the limiting sensitivity of the follow-up facility. In this study, we consider the 3 GHz band and the Very Large Array (VLA) as the limiting radio facility, with F lim = 15 µJy. In addition to ι and d L , the peak flux F p of every event depends on the jet's kinetic energy, the particle density in the surrounding medium, and on the microphysical parameters of the front shock formed by the decelerating jet, such as the spectral index of the shock-accelerated population of electrons, denoted p. The analytical form for F p as a function of these parameters that we use can be found in Nakar et al. (2002). There is some uncertainty in the distributions of these parameters in the population of jets from BNS mergers, in particular for the jet kinetic energy. To establish p z+AG det , the probability of making the inclination measurement allowed by the afterglow photometric data in addition to acquiring the redshift, we use the same distribution of parameters as the population model of Duque et al. (2019). In particular, we use two distinct hypotheses regarding the distribution of the jets' kinetic energies. These are labeled G16 and WP15 in the sequel and are derived by assuming a constant conversion factor between the luminosity of a short gamma-ray burst and the post-burst jet kinetic energy, and starting from the short gamma-ray bursts luminosity functions found by Ghirlanda et al. (2016) and Wanderman & Piran (2015), respectively. Among published short gamma-ray burst luminosity functions, these two represent extremes in the steepness of the luminosity function, with G16 predicting many more bright bursts (and therefore many more energetic afterglows) than WP15.
By applying this population model to those events selected to establish p z det and applying the detection criterion in Eq. 4, we calculate p z+AG det . It is plotted in Fig. 2. We define the likelihood that a Level 2 scenarios occurs as p L2 = p GW det × p z+AG det . As expected, the range of parameter space allowed by this scenario is much smaller than for a kilonova-only event. In particular, the distant or inclined events are largely cut off because F p ∝ ι −4.4 /d 2 L (for a shock-accelerated population spectral index of p = 2.2, Nakar et al. 2002), whereas r, g ∼ 1 − cos ι + log d L .
We note that GW170817 does not exactly qualify for our Level 2 scenario. Indeed, it had log F p /F lim ∼ 0.9 < 1 (Mooley et al. 2018b). It seems that much of the uncertainty in the measurement of ι with GW170817 is held in the very early phases of the afterglow, where the fitting models most diverge (Ghirlanda et al. 2019). Had the afterglow been brighter-at the level of our Level 2 scenario-and these earlier points observed, a better measurement of ι would have certainly ensued. Nonetheless, as we detail in Sect. 5, we consider the case of GW170817 as representative of the ι measurements possible in the Level 2 scenario.

Level 3 observing scenario
The Level 3 scenario occurs if the motion of the jet can be measured by VLBI imaging, in addition to the measurements of afterglow photometry. Once again this assumes constant instrument availability and long-term follow-up. We estimate the angular displacement of the jet centroid as: where δt VLBI is the total time the afterglow remains detectable by the radiotelescope array, that is, the time its flux is above F lim 3 , and where dθ/dt| max is the proper motion of the remnant at the time of the afterglow peak, that is when Γ × ι ∼ 1. At this time, it is straightforward to estimate the remnant's proper motion as: with β app = β sin ι 1−β cos ι the apparent velocity of the remnant, β = 1 − 1/Γ 2 p and Γ p = 1/ι the jet head Lorentz factor at the afterglow peak. By considering the source's proper motion to be that at afterglow peak during the entire followup, Eq. 6 in fact over-estimates δθ r . Our Level 3 events are therefore likely treated in an optimistic manner, as we discuss in Sect. 6.
Also, Eq. 6 is not valid if the observer is within the jet's opening, when ι < θ j with θ j the half-opening angle of the jet. In this case, no jet displacement is observed, δθ r = 0. We consider θ j = 0.1 ∼ 6 deg, in line with measurements on GW170817 (Gill & Granot 2018;van Eerten et al. 2018;Mooley et al. 2018b) and other short gamma-ray burst studies (Fong et al. 2015;Beniamini et al. 2019).
The afterglow flux will remain in reach of the radio network for a duration that depends on the details of its light curve and therefore on, for example, the jet structure, its expansion dynamics, and the surrounding medium density profile. To simplify, we assume all jets launched from mergers have the same structure as GW170817 and all mergers occur in a rarefied medium with a constantdensity profile as suggested by short gamma-ray burst observations. In this case, the slopes of the increasing and decreasing phases of the light curve are the same as for GW170817's afterglow, regardless of inclination angle (Beniamini et al. 2020). Therefore, we empirically modeled the afterglow light curves as a broken power-law with slopes +0.80 and −2.2 (Mooley et al. 2018b) respectively before and after a peak occurring at a time T p and a flux F p . The time T p depends on the same parameters as F p , and relevant equations can be found in Nakar et al. (2002). The time the signal is above the radio threshold can thus be analytically estimated, and thereby the total source displacement. Then, detection of this displacement is simply conditioned by the VLBI array angular resolution: δθ r > ∆θ VLBI ∼ 2 mas (Ghirlanda et al. 2019).
As for the other electromagnetic counterparts, we determine the probability of detecting the remnant proper motion and making the corresponding angle measurement in addition to the other measurements denoted p z+AG+VLBI det and define p L3 = p GW det × p z+AG+VLBI det . This is plotted in Fig. 2, where one can see further suppression of events.

Selection effects in multimessenger cosmology
In Appendix A, we make estimates of the selection effects impacting the measurements of H 0 with GW and electromagnetic data.
In the circumstances of measuring H 0 with data from GW170817, that is, a Level 3 scenario during the O2 run, the selection effects are less than 2.0% over the 60-80 km/s/Mpc range, according to below the 14% precision claimed by studies making this measurement (Hotokezaka et al. 2019). We conclude that no selection effects significantly impacted the measurement of H 0 with the GW170817 data.
However, the selection effect is not negligible compared to the 4% precision required to resolve the H 0 tension (Freedman 2017;Feeney et al. 2018), particularly with events detected in O3-and O4-type runs, where selection effects reach 2% and 4% in the 60-80 km/s/Mpc range, respectively. Then, careful consideration of selection effects becomes necessary.

Prospects of joint electromagnetic and gravitational-wave detections
In this section, we use the detection probability models to generate a population of events detected jointly in GW and the different electromagnetic counterparts. This allows us to study the dominant effects in the multimessenger detection process.

Simulation description
To evaluate the multimessenger capability to measure H 0 , we simulate BNS events in a Universe described by flat ΛCDM cosmology and with H 0 = 70 km/s/Mpc and Ω m = 0.308. We generate 80000 BNS mergers uniformly distributed in the sky up to a luminosity distance of ∼ 1.5d r 0.2% for a given detector network with BNS 0.2% response distance d r 0.2% (see Fig.1). We assume a system formation rate uniform in redshift and the BNS masses are generated from a Gaussian distribution with a mean of 1.35M and standard deviation 0.15M (Farrow et al. 2019), cos ι is generated uniform on the unitary sphere and the GW polarization angle is distributed uniformly in the range ∈ [0, π].
For each BNS merger we calculate the measured GW S/N as described in Sect. 3.1 by drawing a measured d L and cos ι from the CF approximation. If the GW detected S/N exceeds a threshold of 14, then we assume that the event has been detected by the HLV network. Each GW event detected is passed to the electromagnetic follow-up chain described in Sect. 3.2. We use the different p EM det as described previously to decide which electromagnetic counterpart is detected according to the value of d L and cos ι. At the end of the simulation, each detected GW event is associated with a flag describing the corresponding scenario: either not detected in the electromagnetic domain or detected with Levels 1, 2, or 3. We repeat this process for O2-, O3-and O4-type GW sensitivities and for the G16 and WP15 electromagnetic counterpart population models, for a total of six GW and population model combinations.

Rates of electromagnetic counterpart detections
In Table 2, we show the fraction of GW detections with different electromagnetic information levels predicted for O2-, O3-and O4-like runs. As the population models G16 and WP15 are extremes in terms of afterglow luminosities, one can consider the figures in Table 2 as confidence interval bounds for the corresponding fractions 4 .
In an O2-type run, 52% of the GW events are expected to have a detectable kilonova counterpart. Since we assumed the optical instruments can cover a fraction p night = 52% of the sky, this means that the kilonova magnitude was not a limiting effect for multimessenger events during O2. In other words, the GW detection probability dominates the multimessenger detection probability, as was predictable from Figs. 1 and 2.
Conversely, the fraction of events expected to have been detected in O2 with Levels 2 and 3 are between 4% and 12% and between 1% and 7%, respectively. This is from 10% to 25% and from 2% to 15% of kilonovae with detectable afterglow light curve and remnant proper motion, respectively. Therefore, as of O2, the sample of events with afterglow counterpart was limited by selection. As it can be seen from Figs. 1 and 2, this is mostly because electromagnetic signals are much more sensitive to binary inclination than GW signals.
GW170817 was a Level 3 event during O2 (Abbott et al. 2017c;Hotokezaka et al. 2019). In light of our study, the probability of this to occur was between 1% and 7%, and GW170817 was a very lucky event.
In O3-type runs, the kilonova counterparts are still largely in reach of follow-up instruments. However, the probability of measuring cos ι with afterglow counterparts is strongly reduced compared to an O2-type run, meaning the multimessenger detection process is strongly dominated by electromagnetic selection. Nonetheless, there is a nonnegligible probability of an afterglow counterpart to a GW event. In the first 6 months of O3, GW190425 (Abbott et al. 2020a) was the only plausible BNS event observed. For this event, there was no electromagnetic counterpart reported by large field-of-view facilities (Coughlin et al. 2020b). Let us note that, as opposed to the events considered in this 4 We note that the results on afterglow counterpart fractions presented in this section are consistent with the predictions of Duque et al. (2019), though the treatment of the GW and jet motion in the present study is more refined. simulation, GW190425 was detected by only two detectors (LIGO Livingston and Virgo), thus producing a large skylocalization area covering 10,200 deg 2 .
From our simulations, we estimate that maintaining an O2-like fraction of radio-detectable afterglows during an O3-like GW run would require a factor of five increase in the radio sensitivity. That is, a F lim on the order of 3 µJy, as projected for the Square Kilometer Array 1 "Mid" band (Dewdney et al. 2009).
In an O4-like run, GW detectors detect a large fraction of BNS mergers at higher distances. In this case, the effects of magnitude limitations start to kick in, with only half of the kilonovae with sufficient flux. Taking into account sky-position limitations, this number is one out of 4. In O4-type runs, it should prove extremely rare to obtain an electromagnetic cos ι measurement (Level 2 and Level 3).
The different selection biases introduced by the GW and electromagnetic detections processes when going from an O2-like to an O4-like run are depicted in Figs. 3-4. The two figures show the distributions in d L , cos ι, and optimalfilter S/N detected in different levels. In an O2-like run, the GW and electromagnetic detections roughly probe the same BNS mergers, with the exception of some regions in cos ι that forbid a detection by VLBI. On the other hand, in an O4-like run, the electromagnetic and GW detection clearly corresponds to a different subpopulation of sources. In fact, the electromagnetic facilities can observe only closeby events-which thus have a high GW S/N-corresponding to a very small fraction of GW-detected events.
In Table 3 we report the average number of years of observation required to detect BNS mergers with different levels of EM counterparts. We assume a BNS merger rate of R 0 = 320 +490 −240 Gpc −3 yr −1 as estimated after the first six months of O3 (The LIGO Scientific Collaboration et al. 2020). We note that this is overestimated as this is based on a conservative choice for the detection threshold. For this simulation we selected confident BNS detections with S/N > 14 (∼ 8 in each detector) while current GW detectors can detect BNS with S/N ∼ 8 (Abbott et al. 2020b). Choosing a detection threshold of S/N ∼ 8 corresponds to a surveyed volume ∼ 6 times larger, and thus an observation time to detection reduced by the same factor. Table 2 reports the relative fractions of events detected with different levels of EM emission. From the results in Table 3 we conclude that the probability of detecting a BNS merger during a one-year O2-like run is small, and especially so for a loud event such as GW170817.

Method
We now set to quantify the benefit of electromagneticprovided information in measuring H 0 . We use the statistical framework described in Appendix A with the posterior as in Eq. A.1. The likelihood p(d|H 0 ) encodes the statistical uncertainties of the GW and electromagnetic data x GW/EM measurements: p(d|H 0 ) = dz d cos ι p GW (x GW |d L (H 0 , z), cos ι) × p EM (x EM | cos ι, z)p pop (z, cos ι|H 0 ), (7) Article number, page 7 of 15 A&A proofs: manuscript no. main  Table 3. Average observation time (years) needed to detect one BNS merger with different EM counterpart levels. Observation times are computed assuming R0 = 320 +490 −240 Gpc −3 yr −1 , the best estimate after the first half of the O3 run. We assume a network S/N of at least 14 for GW detection; relaxing to a threshold of 8 would decrease these waiting times by a factor of ∼ 6 (see text for details).

Electromagnetic information level GW Run
Level 1  where p pop (z, cos ι|H 0 ) is the assumed population distribution in redshift and inclination for the entire BNS population.
The function p GW (x GW |d L (H 0 , z), cos ι) is the GW likelihood, which provides the GW-detected source parameters distribution for given true parameters. We take the CF ap- proximation for consistency with our earlier computation of selection effects. Likewise, p EM (x EM | cos ι, z) is the likelihood for electromagnetic measurement of source parameters cos ι and z. This is informative on z only for a Level 1 scenario and on both z and cos ι for higher-level scenarios. It is not informative on d L . We decompose this electromagnetic likelihood as p EM (x EM |z)p EM (x EM | cos ι) by supposing the electromagnetic measurements of redshift and angle are independent. This is reasonable as the redshift information is deduced from the host galaxy alone while the ι information is expected to be provided by the jet itself.
We assume the redshift measurement is unbiased and set p EM (x EM |z) to a Gaussian distribution centered on the true event redshift with standard deviation 5·10 −4 for all our scenarios. This is the same accuracy measured for GW170817's redshift (Abbott et al. 2017a). For Level 1 scenarios, the electromagnetic counterpart is uninformative on cos ι and we set p EM (x EM | cos ι) to a flat function.
For Level 2 scenarios, we assumed that one can obtain an unbiased ι constraint at the level of that deduced from GW170817's afterglow light curve alone. That is, a Gaussian constraint with a 12 deg 1-σ uncertainty (Troja et al. 2019;Hajela et al. 2019) for p EM (x EM | cos ι).
For Level 3 scenarios, we based our predicted constraints on those of GW170817 and set p EM (x EM | cos ι) to an unbiased Gaussian constraint with a width of 4 deg (Mooley et al. 2018a;Ghirlanda et al. 2019;Hotokezaka et al. 2019) for all the events. We discuss the validity and impacts of these assumptions in Sect. 6.
Using Eqs. A.1, A.3 and 7, we simulated the H 0 measurement process of 500 binary systems in all three observing scenarios and all three GW sensitivity hypotheses. For the individual measurements, we systematically assumed a prior on H 0 uniform in [40,120] km/s/Mpc .
We then combined the measurements of the first 100 events to emulate a thread of multimessenger events. We repeated the combining step after reordering the 500 events to reproduce different possible time orderings of the events. This allowed us to study the reconstruction of H 0 by the multimessenger measurements, and in particular the speed of convergence.

Bulk comparison of observing scenarios
Breaking the d l − cos ι degeneracy is fundamental for measuring H 0 . Let us show this by inferring H 0 using only Level 2 or 3 scenarios in comparison with Level 1 scenarios. Fig. 5 shows the H 0 posteriors obtained by combining 10 BNS events in different observing scenarios. From the plot, we can see that, when the knowledge of cos ι is refined by electromagnetic observations, the estimation of H 0 improves, as also noted in Chen et al. (2019). We can also observe that the H 0 posterior reaches Gaussian convergence after under 10 events.
The bottom panel of Fig. 6 shows the relative uncertainty ∆H 0 /H 0 with 1 − σ confidence intervals for the H 0 estimation as more events are detected. The uncertainty corresponds to the different population realizations of the detected events. In Fig. 6, one can read that a single Level 3 event during an O2-type run results on average in the uncertainty of 14% on the estimation of the Hubble constant, as observed for GW170817 (Hotokezaka et al. 2019).
It is clear that (i) the precision on H 0 improves as more events are combined, (ii) the convergence is faster when cos ι is more constrained from the electromagnetic emission, (iii) above about ten events (Chen et al. 2018), the combined H 0 posterior becomes Gaussian, with ∆H 0 /H 0 ∼ Θ/ √ N where N is the number of events, allowing us to define Θ as an effective single-event H 0 estimation standard deviation.
In the top panel of Fig. 6, we show the expected number of years of continuous observing required to detect the number of events read on the bottom horizontal axis. We assume an O2-like sensitivity 5 and the detection rates reported in Table 3. From the bottom panel of Fig. 6, the number of Level 2 events required to resolve the H 0 tension assuming an O2-like sensitivity is ∼ 25. According to the top panel, one would need 100 to 600 years of observation to collect these events, depending on the population model.
Counterparts to GW events detected with different GW sensitivities probe different regions of the distance and inclination parameter space (compare Figs. 3 and 4). Therefore, one cannot deduce the number of events or observing time required to resolve the H 0 tension for an O4-type run from those for an O2-type run by simply comparing the GW detection rates. In the next section, we derive the number of events required to resolve the H 0 tension assuming a longlived O4-type run.
In Fig. 7 we show the values of the average effective single-event standard deviation Θ of the different scenarios, as fit on the curves in Fig 6. There is a clear boost in the H 0 convergence speed when considering the information in cos ι from the electromagnetic counterparts. We find that a cos ι precision of 4 deg (Level 3) provides a convergence that is 1.4 times faster than one of 12 deg (Level 2), which is itself about 1.5 times faster when there is no angle information at all from the electromagnetic domain. Roughly, it means that the H 0 accuracy reached combining 10 Level 2 events is equivalent to that reached by combining 5 Level 3 events.
We also find that detectors with better sensitivities will be able to better constrain H 0 , even without cos ι measurements from electromagnetic counterpart. This is due to the higher redshifts of the events used to infer H 0 , as we consider a constant uncertainty in redshift measurements. Considering events without electromagnetic contributions to cos ι for O4-type runs, we find an average effective singleevent standard deviation of 14%, consistent with previous simulations (Chen et al. 2018;Gray et al. 2020).
We note that for a given GW sensitivity, there is no difference in this first approach between G16 and WP15 population models as these only impact the probabilities of detecting the electromagnetic counterparts. This is clear in Figs. 6 and 7. Bulk comparison of scenarios in O2-like run Fig. 7. Effective single-event standard deviation Θ in different observing scenarios, assuming all events are at a given electromagnetic information level, in an O2-type GW run.

Considering realistic detection rates
In a real observing run, not all the GW events with a redshift estimation will have cos ι measurements from electromagnetic counterparts. In a second approach, we estimate the H 0 convergence by including the relative detection rates of the different electromagnetic counterparts. More precisely, we generated threads of events as in the bulk comparison of the previous section, but successively allowing for only Level 1 events, then up to Level 2, then up to Level 3, to quantify the acceleration of H 0 convergence each EM level allows. We then simulated the H 0 measurement for each of them, and combined their measurements throughout the first 100 events. We repeat this process 500 times with varying event time orderings to simulate different realizations of the subset of 100 detections. Fig. 8 shows the H 0 convergence as a function of the total number of detections, while Fig. 9 shows the singleevent standard deviation Θ for all scenarios. According to Table 2, about 7%-23% of O2 events with associated redshift would have had Level 2 information on ι, and about 1%-13% Level 3. As seen in Fig. 8, this detection fraction is enough to somewhat improve the H 0 convergence. In other words, the convergence speed allowing for Level 2 and 3 events is larger than with kilonova events alone.
The situation changes drastically when we consider the multimessenger events in O3-and O4-type observing runs. In this case, the fraction of Level 2 or Level 3 events are so small that on average they bring no additional improvement. Indeed, the precision on H 0 obtained with a given number of GW events does not change whether we allow for afterglow counterparts or if we do not. In particular, for an O3-like run, only allowing for Level 3 events and assuming the optimistic G16 population prescription could provide a slight acceleration in the H 0 narrowing-down, while for all the other cases the improvement is negligible. In O4-type runs, neither Level 2 or Level 3 events should statistically speed up the convergence of H 0 .
During an O4-type run, Level 2 or 3 events are too rare to significantly improve the H 0 convergence and shorten the time needed resolve the H 0 tension. Relying on Level 1 events only is just as fast. In Sect. 6, we argue that, in this case, discarding measurements of the inclination angles from afterglows prevent further biases in the H 0 estimate.
For O4-like runs, Table 3 shows that one Level 1 events are detected every 6 months on average. Fig. 8 shows that, to resolve the H 0 tension problem, 30 such events are required. Thus, with an O4-like sensitivity, 15 years of data taking are necessary to collect the number of detections with measured redshift.

Discussion
In this article, we studied the prospects of measuring the Hubble constant with GW standard sirens coupled to inclination angle measurements from merger afterglow counterparts. We first studied the potential impact of selection effects in multimessenger cosmology and showed that these were negligible in the H 0 measurement reported after GW170817. We illustrated how selection effects increase with the GW sensitivity. For events in future GW observing runs, selection effects will be important and should be taken into account.
We then studied the likelihood of detecting the electromagnetic counterparts required for multimessenger measurements of H 0 . We showed that for O2-and O3-type GW sensitivities, their optical magnitudes are not limiting in the detection of kilonova signals, and therefore in acquiring source redshifts. This is true provided GW interferometers are operating and follow-up facilities available and efficient.
We showed that, in O4-like runs and beyond, the GW detection probability will extend further in distance compared to the electromagnetic detection probabilities, largely    decreasing the likelihood of detecting electromagnetic counterparts.This will especially be the case when secondgeneration GW interferometers reach their design sensitivity level. Fortunately, the lower number of observable counterparts should be counter-balanced by the advent of large field-of-view, high-cadence optical instruments such as the ZTF and LSST, which are not limited in terms of skymap coverage (but only by their limiting magnitude). Also, the advent of additional GW interferometers such as KAGRA (Kagra Collaboration et al. 2019) will largely improve the median GW localization skymap down to ∼ 40 deg 2 (Abbott et al. 2020b), paving the way to more effective follow-up by smaller field-of-view instruments and therefore better-sampled kilonova light curves and better leveraging of events.
In the future, third-generation GW detectors such as the Einstein Telescope (ET, Punturo et al. 2014) and Cosmic Explorer (Reitze et al. 2019) will open up the detection range of even larger redshifts. Naturally, the low-redshift events will still be accessible, with a much better sky resolution that can be covered entirely by small field-of-view facilities. For larger redshifts, follow-up observations will be limited by the sky localization, with an average resolution of 200 deg 2 for BNS at z = 0.1 with ET. This coverage issue combined with the unprecedented dimness of the counterparts at these redshifts may call for totally different follow-up strategies for the high-cadence large field-of-view survey facilities. Supposing the source is identified, photometric and spectroscopic follow-up would still be limited to z = 0.5 and z = 0.3, respectively, for the largest optical telescopes such as the Extremely Large Telescope (Maggiore et al. 2020). Therefore, the access to those events at cosmic redshifts should rely on the observation of counterparts other than kilonova, such as the short gamma-ray burst, the detection of which could be facilitated by the GW signal being present in the ET band hours before merger.
We finally studied whether the observation of merger afterglow signals and subsequent measurements of cos ι will significantly accelerate the narrowing-down of H 0 when combined with GW detections in the future. We considered only the afterglow signal-its photometry and imaging-as potential providers of ι measurements. We deemed the other counterparts such as the kilonova and gamma-ray burst unfit for such a measurement, for their still large modeling uncertainties.
We considered an optimistic measurement model in which all events with an afterglow counterpart contribute a ι measurement with an accuracy comparable to GW1710817, for both afterglow photometry and imaging. This is an optimistic assumption as the uncertainty on ι depends on the number of photometric points detected from the light curve, and thereby on the event distances and density of the follow-up. Also, only a subset of the follow-up campaigns is expected to provide such detailed multiwavelength photometric data. Taking the variability of the follow-up scope and data quality into account is a possible extension to this study.
Furthermore, our analysis assumes that cos ι estimated from EM is accurate, which is a simplifying assumption. For both the afterglow photometry and imaging analysis, the leading uncertainty in the electromagnetic modeling is the treatment of the jet lateral expansion (Ghirlanda et al. 2019). The jet expansion affects both the time of afterglow peak flux (e.g., Duque et al. 2019) and the dynamics of the VLBI image (Fernández et al. 2021), possibly biasing and widening the electromagnetic posterior on ι more than assumed here.
From the point of view of observations, most of the indeterminations in measuring ι from GW170817's afterglow lie in the very early phases of the light curve and the late phases of the source displacement curve (see the posterior sampling in Ghirlanda et al. 2019). Acquiring early photometric data with deeper searches would have provided a better estimate of ι from this source. Recently, radio points were acquired from GW170817 to flux levels much deeper than the early radio monitoring of GW170817 (Balasubramanian et al. 2021), proving that this is in reach of current radio facilities. We thus advocate for such early deep searches.
Also, we assumed perfect kilonova detection and identification above a magnitude threshold over the accessible sky, and our expression of the jet proper motion in Eq. 6 is clearly an overestimate, leading to overpredict the number of sources with detectable proper displacement.
Finally, the effect of peculiar velocities can bias the estimation of H 0 . Galaxies' peculiar motions can be as high as v p = 300 km/s, with an associated error of their measurement of the same order (Mukherjee et al. 2021a). This corresponds to a redshift correction (and additional uncertainty on H 0 ) of 10% for events such as GW170817, and it is fundamental to take into account. With future GW detectors, whici will detect BNSs up to redshift ∼ 0.1, this type of correction will generally be negligible though it will remain important in the case of close-by, high-S/N events.
For all these reasons, our hypotheses are optimistic. Even so, we have found that, for all GW runs after O3, events with afterglow counterparts should prove so rare that, statistically, using the afterglow counterparts when available for the multimessenger measurement of H 0 on individual events will not bring any acceleration to the measurement of H 0 .
We found that, for the electromagnetic measurement of cos ι to significantly increase the H 0 convergence, the GW and electromagnetic detection probabilities should be comparable, or the understanding of electromagnetic emission from compact binary mergers should drastically improve, to the point where a degree-level precision on ι is accessible from a typical afterglow light curve. Even then, care should be taken with systematic effects in angle measurements, for example, from VLBI imaging, to not create a second tension on H 0 .
The current state of kilonova modeling does not allow for robust measurements of the inclination angle. This may change in the future as the model uncertainties will probably reduce after more signals are observed. Given the abovementioned rates of kilonova associations with GW, we estimate that kilonovae could accelerate the narrowing-down of H 0 if a ∼ 10% calibration can be reached between kilonova data (such as light curves or color evolution) and inclination angle. In this respect, the advent of wide field-of-view, highcadence optical facilities is an asset as they are expected to collect a large sample of kilonovae detected both serendipitously and as GW counterparts. With tens of well-sampled kilonova light curves, the 10% accuracy level for the light curve versus inclination relation may be within reach, especially if these sources are standardizable (Kashyap et al. 2019;Coughlin et al. 2020a,c). This perspective would truly allow multimessenger cosmology to develop.
Our results should not be misunderstood: If the opportunity of making an electromagnetic measurement of cos ι occurs, then it should obviously be made as the improvement on H 0 from such an event is important. All that we found is that, statistically, being able to make such measurements will not significantly speed up the narrowingdown of H 0 and the resolution of the Hubble tension on the long run. This is because of the rareness of electromagnetic counterparts and their still too-poor constraints on cos ι.
We proved that the electromagnetic-provided cos ι measurements will likely not drive the H 0 narrowing-down. Therefore, direct biases to H 0 through electromagnetic mismeasurements of cos ι should not be feared. However, as the detection probabilities of the electromagnetic counterparts should dominate the selection effect for GW-EM standard sirens, incorrectly modeled dependence of the kilonova signal on the inclination can lead to H 0 biases through uncontrolled selection effects, as discussed in Chen et al. (2018); Chen (2020). Correct modeling of the kilonova signal to control the selection effects in follow-up campaigns should be a point of care for future endeavors in multimessenger cosmology.
Once these selection biases are dealt with, the limiting uncertainty in multimessenger cosmology should be the GW data calibration. This uncertainty is at the level of ∼ 1% (Karki et al. 2016), below the Hubble constant tension, and therefore should not impede the resolution of the tension by multimessenger cosmology when combining a low number of events. However, a systematic effort on calibration uncertainties when combining a large number of events should be performed.
As the afterglow counterparts should not accelerate the measurement of H 0 , we can state that the number of multimessenger events necessary to resolve the H 0 tension is still that given by Chen et al. (2021b) and Mortlock et al. (2019b), that is, 20-50. This represents about fifteen years of continuous O4-level GW observation.

Conclusion
The afterglow counterparts of binary neutron star mergers represent viable means to measure the inclination angle of sources, and thereby to improve the standard-siren measurement of the Hubble constant. Afterglows could therefore play the role of narrowing down H 0 and possibly resolve the Hubble tension with fewer events than by leveraging only the gravitational-wave data and source redshift. To quantify how much faster afterglow-enhanced H 0 measurements could solve the Hubble tension, we carried out a realistic population model considering that every future afterglow counterpart could provide a constraint on the source inclination angle at the same level as GW170817. We found that, while each afterglow allows for a jump in H 0 precision, events with afterglow counterparts should prove very rare, to the point that allowing for afterglow-enhanced measurements should not statistically make any difference in the number of events required. Once models have improved, kilonova light curves could be viable for inclination angle measurements and, as these should be much more frequently acquired, kilonovae could play the leading role in multimessenger cosmology. Whether for kilonova or afterglow counterparts, one must treat selection effects with care so as to not produce yet another tension because upcoming gravitational-wave observing runs will probe distances where selections effects are important.