Binary orbits from combined astrometric and spectroscopic data

L. B. Lucy

doi:10.1051/0004-6361/201732145

A&A, 618 (2018) A100

Issue		A&A Volume 618, October 2018


Article Number		A100
Number of page(s)		9
Section		Numerical methods and codes
DOI		https://doi.org/10.1051/0004-6361/201732145
Published online		18 October 2018

L. B. Lucy^*

Astrophysics Group, Blackett Laboratory, Imperial College London, Prince Consort Road, London SW7 2AZ, UK
e-mail: l.lucy@imperial.ac.uk

Received: 20 October 2017
Accepted: 1 January 2018

Abstract

An efficient Bayesian technique for estimation problems in fundamental stellar astronomy is tested on simulated data for a binary observed both astrometrically and spectroscopically. Posterior distributions are computed for the components’ masses and for the binary’s parallax. One thousand independent repetitions of the simulation demonstrate that the 1- and 2-σ credibility intervals for these fundamental quantities have close to the correct coverage fractions. In addition, the simulations allow the investigation of the statistical properties of a Bayesian goodness-of-fit criterion and of the corresponding p-value. The criterion has closely similar properties to the traditional χ² test for minimum-χ² solutions.

Key words: binaries: visual / binaries: spectroscopic / stars: fundamental parameters / methods: statistical

^*

With profound sadness we note that Prof. Leon Lucy passed away after the final stages of acceptance of this article.

© ESO 2018

1 Introduction

In fundamental stellar astronomy, all statistical estimation problems involve mathematical models with both linear and non-linear parameters – the so-called hybrid problems. The linear parameters determine scale and location; the non-linear parameters appear as arguments in dimensionless functions of time.

The presence of linearities suggests that estimation can be made more efficient than when all parameters are treated as non-linear (Wright & Howard 2009; Catanzarite 2010). However, some attempts to achieve this lead to significantly underestimated error bars (Eastman et al. 2013; Lucy 2014a, hereafter L14a). This poses the challenge of developing a technique that achieves the computational efficiency allowed by linearity while still giving confidence or credibility intervals with correct coverage – so that, for example, 1-σ error bars contain the correct answer with probability 0.683. A solution to this challenge is presented in Lucy (2014b, hereafter L14b), where a grid search in the space defined by the non-linear parameters is combined with Monte Carlo sampling of the space defined by the linear parameters. In L14b, this technique is applied to simulated observations of a visual binary, and coverage experiments confirm that 1- and 2-σ error bars enclose the exact values with close to the frequencies expected for normally distributed errors.

In this paper, a significantly harder problem is posed, that of analysing a binary with both astrometric and spectroscopic data. Such data could be analysed separately, but this is sub-optimal since information concerning several orbital elements is present in both data sets. Accordingly, the aim here is to obtain the posterior distribution over the entire parameter space using both data sets and to test if the derived error bars are trustworthy, an essential requirement for fundamental data on stellar masses and luminosities.

The posed problem is intended to demonstrate proof of concept in the treatment of hybrid problems and to provide a template for the many such problems in statistical astronomy. To this end, the code developed for this investigation is freely available.

Although here a technical exercise, the simultaneous analysis of astrometric and spectroscopic data is of practical importance in the era of adaptive optics (AO) and speckle interferometry. As emphasized by Mason et al. (1999), the ability to resolve binary stars at or near the diffraction limit results in a powerful synergy between short-period visual and long-period spectroscopic binaries, leading to stellar masses and improved mass-luminosity relations.

2 Synthetic orbits

The physical model comprises an isolated pair of stars undergoing Keplerian motion due to their mutual gravitational attraction. This binary is observed astrometrically and spectroscopically, yielding two independent data sets D_a and D_s, respectively. To analyse these data sets, the mathematical models predicting the components’ relative motion on the sky as well as their radial velocity variations are used simultaneously to derive the posterior distribution over parameter space.

2.1 Orbital elements

For the astrometric orbit, L14a is followed closely with regard both to notation and the creation of synthetic data.

The motion on the sky of the secondary relative to the primary is conventionally parameterized by the Campbell elements P, e, T, a, i, ω, Ω. Here P is the period, e is the eccentricity, T is a time of periastron passage, a is the semi-major axis, i is the inclination, ω is the longitude of periastron, and Ω is the position angle of the ascending node. However, from the standpoint of computational economy, many investigators – references in L14a, Sect. 2.1 – prefer the Thiele-Innes elements. Thus, the Campbell parameter vector θ = (ϕ, ϑ), where ϕ = (P, e, τ) and ϑ = (a, i, ω, Ω), is replaced by the Thiele-Innes vector (ϕ, ψ), where the components of the vector ψ are the Thiele-Innes constants A, B, F, G. Note that in the ϕ vector, T has been replaced by τ = T∕P, which by definition lies in (0, 1).
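The change of variables from ϑ to ψ can be illustrated with a minimal sketch. It assumes the standard textbook definitions of the Thiele-Innes constants and the usual linear form x = AX + FY, y = BX + GY; axis and sign conventions differ between authors, and the function names are hypothetical rather than taken from the author's code.

```python
import math

def thiele_innes(a, i_deg, omega_deg, Omega_deg):
    """Thiele-Innes constants (A, B, F, G) from Campbell elements; angles in degrees."""
    i, w, W = (math.radians(v) for v in (i_deg, omega_deg, Omega_deg))
    A = a * (math.cos(w) * math.cos(W) - math.sin(w) * math.sin(W) * math.cos(i))
    B = a * (math.cos(w) * math.sin(W) + math.sin(w) * math.cos(W) * math.cos(i))
    F = a * (-math.sin(w) * math.cos(W) - math.cos(w) * math.sin(W) * math.cos(i))
    G = a * (-math.sin(w) * math.sin(W) + math.cos(w) * math.cos(W) * math.cos(i))
    return A, B, F, G

def sky_position(E, e, A, B, F, G):
    """Relative position (x, y) at eccentric anomaly E; linear in (A, B, F, G)."""
    X = math.cos(E) - e
    Y = math.sqrt(1.0 - e * e) * math.sin(E)
    return A * X + F * Y, B * X + G * Y

# Campbell elements of the model binary adopted in Sect. 2.2 (a in arcsec)
A, B, F, G = thiele_innes(0.3, 70.0, 250.0, 120.0)
x_peri, y_peri = sky_position(0.0, 0.6, A, B, F, G)   # position at periastron
```

Because the predicted positions are linear in (A, B, F, G), the four constants can be fitted by linear least squares once ϕ is fixed, which is the property exploited in Sect. 4.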

The spectroscopic orbits of the components introduce three additional parameters, the systemic velocity γ and the semi-amplitudes K_1,2. The predicted radial velocities are then $\begin{equation*} v_{1,2}(t) = \gamma + K_{1,2} \: [ \cos(\nu + \omega_{1,2}) + e \: \cos \omega_{1,2} ],\end{equation*}$ (1)

where ν(t) is the true anomaly, ω₂ = ω and ω₁ = ω + π.

Note that v₂ − v₁ = ż, where z, the companion’s displacement perpendicular to the sky, is given by the Thiele-Innes constants C and H. This is a useful check on coding.
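A minimal sketch of Eq. (1) follows, assuming a plain Newton iteration for Kepler's equation (one common choice, not necessarily the one used in the author's code). The function names are hypothetical. Because ω₁ = ω + π, the two velocity curves are mirror images about γ, scaled by K₁ and K₂, which gives a simple coding check.

```python
import math

def true_anomaly(t, P, e, tau):
    """True anomaly nu(t) for period P, eccentricity e and tau = T/P."""
    M = 2.0 * math.pi * ((t / P - tau) % 1.0)     # mean anomaly in [0, 2*pi)
    E = M + e * math.sin(M)                       # starting guess
    for _ in range(100):                          # Newton iteration on E - e sin E = M
        dE = (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
        E -= dE
        if abs(dE) < 1e-14:
            break
    return 2.0 * math.atan2(math.sqrt(1.0 + e) * math.sin(0.5 * E),
                            math.sqrt(1.0 - e) * math.cos(0.5 * E))

def radial_velocities(t, P, e, tau, omega_deg, gamma, K1, K2):
    """Predicted (v1, v2) from Eq. (1) with omega_2 = omega and omega_1 = omega + pi."""
    nu = true_anomaly(t, P, e, tau)
    w2 = math.radians(omega_deg)
    w1 = w2 + math.pi
    v1 = gamma + K1 * (math.cos(nu + w1) + e * math.cos(w1))
    v2 = gamma + K2 * (math.cos(nu + w2) + e * math.cos(w2))
    return v1, v2

# Model binary of Sect. 2.2, evaluated at periastron (t = tau * P) and at t = 7.3 yr
v1_peri, v2_peri = radial_velocities(4.0, 10.0, 0.6, 0.4, 250.0, 0.0, 8.64, 12.35)
v1_mid, v2_mid = radial_velocities(7.3, 10.0, 0.6, 0.4, 250.0, 0.0, 8.64, 12.35)
```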

With the inclusion of spectroscopic data, the combined data sets allow inferences about the 10-dimensional vector $\begin{equation*} \Theta = (\phi, \psi, \lambda),\end{equation*}$ (2)

where λ = (γ, K₁, K₂).

In a Bayesian analysis, the task is to compute the posterior probability density in Θ-space given D_a and D_s.

2.2 Model binary

The adopted model binary has the following Campbell elements: $\begin{eqnarray*} P = 10\,\mathrm{yr} \;\;\; \tau = 0.4 \;\;\; e = 0.6 \;\;\; a = 0.3^{\prime\prime} \\ i = 70^{\circ} \;\;\; \omega = 250^{\circ} \;\;\; \Omega = 120^{\circ}.\end{eqnarray*}$ (3)

With this a, the binary would be unresolved in seeing-broadened images but should be resolved in images approaching diffraction limits.

If we now take the parallax ϖ = 0.05′′, the total mass is $2.16~{\cal M}_{\odot}$. With mass ratio q = 0.7, the component masses are ${\cal M}_{1} = 1.27$ and ${\cal M}_{2} = 0.889~{\cal M}_{\odot}$. The resulting semi-amplitudes are K₁ = 8.64 and K₂ = 12.35 km s⁻¹, and without loss of generality we take γ = 0.0 km s⁻¹.
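These numbers can be reproduced with a short calculation. The sketch below assumes Kepler's third law in solar units (a in AU, P in years) and the standard expression for the semi-amplitude of a spectroscopic orbit. The constants and the function name are illustrative choices, not taken from the paper.

```python
import math

AU_KM = 1.495978707e8          # astronomical unit in km
YEAR_S = 365.25 * 86400.0      # Julian year in s

def model_binary(P_yr, e, a_arcsec, i_deg, parallax_arcsec, q):
    """Total mass, component masses (solar units) and semi-amplitudes (km/s)."""
    a_au = a_arcsec / parallax_arcsec              # semi-major axis in AU
    m_tot = a_au ** 3 / P_yr ** 2                  # Kepler's third law
    m1 = m_tot / (1.0 + q)
    m2 = q * m1
    v_scale = 2.0 * math.pi * AU_KM / YEAR_S       # km/s per (AU/yr) of 2*pi*a/P
    geom = math.sin(math.radians(i_deg)) / math.sqrt(1.0 - e * e)
    K1 = v_scale * a_au * (m2 / m_tot) * geom / P_yr   # primary moves on the smaller orbit
    K2 = v_scale * a_au * (m1 / m_tot) * geom / P_yr
    return m_tot, m1, m2, K1, K2

m_tot, m1, m2, K1, K2 = model_binary(10.0, 0.6, 0.3, 70.0, 0.05, 0.7)
```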

2.3 Observing campaigns

An astrometric observing campaign is simulated by creating measured Cartesian sky coordinates $(\widetilde{x}_{n},\widetilde{y}_{n})$ with weights $w_{n}^{a} = 1/\sigma_{a,n}^{2}$ for both coordinates. One observation is created at a random time within an observing season of length 0.3 yr in each of 10 successive years. We take σ_a,n = 0.05′′ for all n.

In the above, we assume equal precision for each coordinate and uncorrelated errors. These assumptions are well justified if the AO or speckle image reconstructions give circularly symmetric stellar profiles. If necessary, the technique can be generalized to treat unequal and correlated errors (Appendices A.1, C.1).

A spectroscopic observing campaign is simulated by creating measured radial velocities $\widetilde{v}_{1n},\widetilde{v}_{2n}$ at random times in 10 successive observing seasons. The observations have weights $w_{1n}^{s} = 1/\sigma_{s,n}^{2}$ and $w_{2n}^{s} = 0.5/\sigma_{s,n}^{2}$. We take σ_s,n = 0.5 km s⁻¹ for all n.

All 40 simulated observations have normally distributed errors.

3 Conditional probabilities

In order to benefit from the hybrid character of the problems arising in orbit estimation, the chain rule for conditional probabilities is used to factorize multi-dimensional posterior distributions Λ in such a way that linear and non-linear parameters are separated. This facilitates the construction of efficient hybrid numerical schemes that combine grid scanning with Monte Carlo sampling.

3.1 Approach

Consider a problem with two scalar parameters, α and β, and suppose the model is non-linear in α and linear in β. Applying the chain rule, we can write the posterior density as $\begin{equation*} \Lambda(\alpha,\beta) = Pr(\alpha) \: Pr(\beta|\alpha),\end{equation*}$ (4)

where $\begin{equation*} Pr(\alpha) = \int \Lambda(\alpha,\beta) \: \textrm{d} \beta.\end{equation*}$ (5)

Pr(α) is thus the projection of Λ(α, β) onto the α axis.

The 1D function Pr(α) can be approximated by the discrete values Pr(α_i), where the α_i are the mid-points of a uniform grid with steps Δα. In contrast, because of linearity, ${\cal N}_{i\ell}$ values β_iℓ can readily be derived that randomly sample Pr(β|α_i). Combining these approaches, we derive the following approximation for the posterior distribution: $\begin{equation*} \Lambda(\alpha,\beta) = \sum_{i\ell} \Delta \alpha \: Pr(\alpha_{i}) \times {\cal N}_{i \ell}^{-1} \: \delta(\beta - \beta_{i \ell}).\end{equation*}$ (6)

With this approximation, all the quantities we wish to infer from the posterior distribution become weighted summations over the points (α_i, β_iℓ), and these summations converge to exact values as Δα → 0 and ${\cal N}_{i \ell} \rightarrow \infty$. Arbitrary accuracy can therefore be achieved.
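The two-parameter scheme can be sketched on a toy problem. The model y = β sin(αt), the data, the grid limits and all names below are hypothetical and chosen only to make the example concrete; flat priors and known, equal measurement errors σ are assumed. At each grid point α_i, the linear parameter is fitted without iteration, Pr(α_i) is obtained by integrating the Gaussian in β analytically, and β is then sampled from Pr(β|α_i).

```python
import math
import random

def hybrid_posterior(t, y, sigma, alpha_lo, alpha_hi, n_grid, n_beta, rng):
    """Weighted cloud (alpha_i, beta_il, weight) for the toy model y = beta*sin(alpha*t)."""
    d_alpha = (alpha_hi - alpha_lo) / n_grid
    w = 1.0 / sigma ** 2
    rows = []
    for i in range(n_grid):
        alpha = alpha_lo + (i + 0.5) * d_alpha            # cell mid-point
        f = [math.sin(alpha * tn) for tn in t]
        sff = w * sum(fn * fn for fn in f)
        sfy = w * sum(fn * yn for fn, yn in zip(f, y))
        beta_hat = sfy / sff                              # linear least squares, no iteration
        var = 1.0 / sff                                   # variance of beta at fixed alpha
        chi2_hat = w * sum((yn - beta_hat * fn) ** 2 for fn, yn in zip(f, y))
        rows.append((alpha, beta_hat, var, chi2_hat))
    chi2_min = min(r[3] for r in rows)                    # subtracted to avoid underflow
    cloud = []
    for alpha, beta_hat, var, chi2_hat in rows:
        # Pr(alpha_i) up to a constant: the beta-integral of exp(-chi2/2)
        pr_alpha = math.sqrt(2.0 * math.pi * var) * math.exp(-0.5 * (chi2_hat - chi2_min))
        for _ in range(n_beta):
            beta = rng.gauss(beta_hat, math.sqrt(var))    # random sample of Pr(beta | alpha_i)
            cloud.append((alpha, beta, d_alpha * pr_alpha / n_beta))
    total = sum(c[2] for c in cloud)
    return [(a, b, wt / total) for a, b, wt in cloud]

rng = random.Random(1)
t = [0.5 * n for n in range(1, 21)]
y = [2.0 * math.sin(1.3 * tn) + rng.gauss(0.0, 0.3) for tn in t]
cloud = hybrid_posterior(t, y, 0.3, 0.5, 2.0, 600, 20, rng)
alpha_mean = sum(a * wt for a, b, wt in cloud)            # posterior means as weighted sums
beta_mean = sum(b * wt for a, b, wt in cloud)
```

Every inference is a weighted sum over the cloud, as stated above; the factor sqrt(2π var) plays the role that the determinant factor plays in Sect. 4.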

3.2 Astrometry only

If we only have astrometric data, Λ is a function of seven parameters. With the Thiele-Innes parameterization, the parameter vector is (ϕ, ψ), and the mathematical model is linear in the four ψ parameters and non-linear in the three ϕ parameters.

Following the 2D example of Sect. 3.1, we apply the chain rule to obtain $\begin{equation*} \Lambda(\phi,\psi| D_{a}) = Pr(\phi|D_{a}) \: Pr(\psi|\phi,D_{a}),\end{equation*}$ (7)

where $\begin{equation*} Pr(\phi|D_{a}) = \int \Lambda \: \textrm{d}\psi.\end{equation*}$ (8)

Here Pr(ϕ|D_a) is the projection of the 7D posterior distribution Λ(ϕ, ψ|D_a) onto the 3D ϕ-space. The second factor Pr(ψ|ϕ, D_a) then specifies how this projected or summed probability is to be distributed in ψ-space.

3.3 Astrometry and spectroscopy

With spectroscopic data included, Λ is now a function of 10 parameters (ϕ, ψ, λ). Again applying the chain rule, we write $\begin{eqnarray*} \Lambda(\phi,\psi,\lambda| D_{a}, D_{s}) &=& Pr(\phi|D_{a},D_{s}) \\ && \times\, Pr(\psi|\phi,D_{a},D_{s}) \times Pr(\lambda |\phi,\psi,D_{s})\end{eqnarray*}$ (9)

where $\begin{equation*} Pr(\phi|D_{a},D_{s}) = \int \Lambda \: \textrm{d}\psi~\textrm{d}\lambda,\end{equation*}$ (10)

and $\begin{equation*} Pr(\psi|\phi,D_{a},D_{s}) = \left. \int \Lambda \: \textrm{d}\lambda \: \middle/ \int \Lambda \: \textrm{d}\psi~\textrm{d}\lambda\right..\end{equation*}$ (11)

Here Pr(ϕ|D_a, D_s) is the projection of the 10D posterior distribution Λ(ϕ, ψ, λ|D_a, D_s) onto the 3D ϕ-space. The product Pr(ψ|ϕ, D_a, D_s) × Pr(λ|ϕ, ψ, D_s) then specifies how this summed probability is to be distributed first into ψ-space and then into λ-space.

The dependence of these probability factors on D_a and D_s merits comment. Both data sets contain information on ϕ = (log P, e, τ). Accordingly, Pr(ϕ) depends on both D_a and D_s.

The ψ-vector (A, B, F, G) determines the Campbell elements (a, i, ω, Ω) and vice versa. Since ω is a spectroscopic as well as an astrometric element, Pr(ψ|ϕ) must depend on D_s as well as on D_a.

If ϕ and ψ are given, then, since ω = ω(ψ), the spectroscopic elements P, e, τ, ω are known. The data D_s then suffice to determine the remaining spectroscopic elements λ = (γ, K₁, K₂). Thus Pr(λ|ϕ, ψ) does not depend on D_a.

4 Likelihoods

The probability factors defined in Sect. 3 are now evaluated using Bayes’ theorem and the appropriate likelihoods. Throughout this paper, we assume weak, non-informative priors whose impact on posterior distributions can be neglected.

4.1 Astrometry only

In this case, the posterior distribution is $\begin{equation*} \Lambda(\phi,\psi|D_{a}) \propto {\cal L}_{a},\end{equation*}$ (12)

where, ignoring a constant factor, $\begin{equation*} {\cal L}_{a} = \exp\left( -\frac{1}{2} \chi^{2}_{a}\right),\end{equation*}$ (13)

and $\begin{equation*} \chi^{2}_{a} = \sum_{n} w_{n}^{a} (x_{n}-\widetilde{x}_{n})^{2} + \sum_{n} w_{n}^{a} (y_{n}-\widetilde{y}_{n})^{2}.\end{equation*}$ (14)

Because of linearity, $\widehat{\psi}(\phi)$, the minimum-χ² Thiele-Innes vector at given ϕ, is obtained without iteration, and we can write $\begin{equation*} \chi^{2}_{a}(\psi|\phi) = \widehat{\chi}^{2}_{a}(\widehat{\psi}|\phi) + \delta \chi^{2}_{a}(\delta \psi|\phi),\end{equation*}$ (15)

where $\delta \chi^{2}_{a}$ is the positive increment in $\chi^{2}_{a}$ due to the displacement to $\psi = \widehat{\psi} + \delta \psi$.

Correspondingly, we write $\begin{equation*} {\cal L}_{a}(\phi,\psi) = \widehat{{\cal L}}_{a}(\phi) \: \widetilde{{\cal L}}_{a}(\psi|\phi), \end{equation*}$ (16)

where $\begin{equation*} \widehat{{\cal L}}_{a} = \exp\left(- \frac{1}{2} \widehat{\chi}^{2}_{a}\right) \;\;\; \textrm{and} \;\;\; \widetilde{{\cal L}}_{a} = \exp\left( -\frac{1}{2} \delta \chi^{2}_{a}\right).\end{equation*}$ (17)

The statistics of displacements in ψ-space is treated in Appendix A of L14b. These follow a quadrivariate normal distribution such that $\begin{equation*} Pr(\psi| \phi, D_{a}) = {\cal C}^{-1} \exp\left(-\frac{1}{2} \delta \chi^{2}_{a}\right),\end{equation*}$ (18)

where ${\cal C}(\phi) = (2 \pi)^{2} \sqrt{\Delta}$ and Δ is the determinant of the covariance matrix. It follows that $\begin{equation*} \widetilde{{\cal L}}_{a} = {\cal C} (\phi) Pr(\psi| \phi, D_{a}).\end{equation*}$ (19)

Substituting $\Lambda \propto \widehat{{\cal L}}_{a} \widetilde{{\cal L}}_{a}$ into Eq. (8) and eliminating $\widetilde{{\cal L}}_{a}$ with Eq. (19), we obtain $\begin{equation*} Pr(\phi|D_{a}) \propto {\cal C}(\phi) \exp\left(-\frac{1}{2} \widehat{\chi}^{2}_{a}\right).\end{equation*}$ (20)

This determines the relative weights of the grid points ϕ_ijk and agrees with Eq. (14) in L14b.

From a random sampling of the quadrivariate normal distribution Pr(ψ|ϕ, D_a) we obtain the approximation $\begin{equation*} Pr(\psi|\phi,D_{a}) = {\cal N}^{-1}_{\psi} \sum_{\ell} \delta(\psi - \psi_{\ell}).\end{equation*}$ (21)

Accordingly, the relative weights from Eq. (20) are distributed equally among the points ψ_ℓ in ψ-space (note that at each ϕ_ijk an independent sample {ψ_ℓ} is drawn).

If the errors in $\widetilde{x}_{n}$ and $\widetilde{y}_{n}$ are uncorrelated, the quadrivariate distribution Pr(ψ|ϕ, D_a) simplifies to the product of two bivariate normal distributions – see Appendix A in L14b.
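For the uncorrelated, equal-weight case just described, the quantities entering Eq. (20) can be sketched as follows. The sketch assumes the usual linear form x = AX + FY, y = BX + GY, so that (A, F) and (B, G) are each determined by the same 2×2 normal matrix; the covariance determinant Δ is then the inverse square of that matrix's determinant. Function names and the test values of the Thiele-Innes constants are hypothetical.

```python
import math

def ecc_anomaly(t, P, e, tau):
    """Solve Kepler's equation E - e sin E = M by Newton iteration."""
    M = 2.0 * math.pi * ((t / P - tau) % 1.0)
    E = M + e * math.sin(M)
    for _ in range(100):
        dE = (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
        E -= dE
        if abs(dE) < 1e-14:
            break
    return E

def basis(t, P, e, tau):
    """Dimensionless functions X_n, Y_n of time at given phi = (P, e, tau)."""
    Es = [ecc_anomaly(tn, P, e, tau) for tn in t]
    X = [math.cos(E) - e for E in Es]
    Y = [math.sqrt(1.0 - e * e) * math.sin(E) for E in Es]
    return X, Y

def fit_thiele_innes(t, x_obs, y_obs, sigma, P, e, tau):
    """Minimum-chi2 (A, B, F, G), chi2_hat and C(phi) at fixed phi, equal weights."""
    w = 1.0 / sigma ** 2
    X, Y = basis(t, P, e, tau)
    sxx = w * sum(a * a for a in X)
    sxy = w * sum(a * b for a, b in zip(X, Y))
    syy = w * sum(b * b for b in Y)
    det = sxx * syy - sxy * sxy              # determinant of the 2x2 normal matrix

    def solve(obs):
        bx = w * sum(a * o for a, o in zip(X, obs))
        by = w * sum(b * o for b, o in zip(Y, obs))
        return (syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det

    A, F = solve(x_obs)
    B, G = solve(y_obs)
    chi2_hat = w * sum((A * a + F * b - xo) ** 2 + (B * a + G * b - yo) ** 2
                       for a, b, xo, yo in zip(X, Y, x_obs, y_obs))
    c_phi = (2.0 * math.pi) ** 2 / det       # (2 pi)^2 sqrt(Delta), with sqrt(Delta) = 1/det
    return (A, B, F, G), chi2_hat, c_phi

# Noise-free check with hypothetical Thiele-Innes constants
t = [0.15 + k for k in range(10)]
X, Y = basis(t, 10.0, 0.6, 0.4)
truth = (0.10, -0.20, 0.25, 0.05)
x_obs = [truth[0] * a + truth[2] * b for a, b in zip(X, Y)]
y_obs = [truth[1] * a + truth[3] * b for a, b in zip(X, Y)]
psi_hat, chi2_hat, c_phi = fit_thiele_innes(t, x_obs, y_obs, 0.05, 10.0, 0.6, 0.4)
```

The relative weight of a grid point in the astrometry-only case would then be `c_phi * math.exp(-0.5 * chi2_hat)`, following Eq. (20).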

4.2 Astrometry and spectroscopy

With the addition of spectroscopic data and again assuming non-informative priors, the posterior density is $\begin{equation*} \Lambda(\phi,\psi,\lambda|D_{a},D_{s}) \propto {\cal L}_{a} {\cal L}_{s},\end{equation*}$ (22)

where, ignoring a constant factor, $\begin{equation*} {\cal L}_{s} = \exp\left( -\frac{1}{2} \chi^{2}_{s}\right),\end{equation*}$ (23)

and $\begin{equation*} \chi^{2}_{s} = \sum_{n} w_{1n}^{s} (v_{1n}-\widetilde{v}_{1n})^{2} + \sum_{n} w_{2n}^{s} (v_{2n}-\widetilde{v}_{2n})^{2}.\end{equation*}$ (24)

Because of linearity in λ = (γ, K₁, K₂) when ϕ and ψ are fixed, $\widehat{\lambda}(\phi,\psi|D_{s})$, the minimum-χ² vector, is obtained without iteration, and we can write $\begin{equation*} \chi^{2}_{s}(\lambda|\phi,\psi) = \widehat{\chi}^{2}_{s}(\widehat{\lambda}|\phi,\psi) + \delta \chi^{2}_{s}(\delta \lambda| \phi,\psi),\end{equation*}$ (25)

where $\delta \chi^{2}_{s}$ is the positive increment in $\chi^{2}_{s}$ due to the displacement to $\lambda = \widehat{\lambda} + \delta \lambda$.

Correspondingly, we write $\begin{equation*} {\cal L}_{s}(\lambda|\phi,\psi) = \widehat{{\cal L}}_{s}(\phi,\psi) \: \widetilde{{\cal L}}_{s}(\lambda|\phi,\psi),\end{equation*}$ (26)

where $\begin{equation*} \widehat{{\cal L}}_{s} = \exp\left(- \frac{1}{2} \widehat{\chi}^{2}_{s}\right) \;\;\; \textrm{and} \;\;\; \widetilde{{\cal L}}_{s} = \exp\left( -\frac{1}{2} \delta \chi^{2}_{s}\right).\end{equation*}$ (27)

The statistics of displacements in λ-space is treated in Appendix A. These follow a trivariate normal distribution Pr(λ|ϕ, ψ, D_s) such that Eq. (A.1) holds. From Eqs. (27) and (A.1), we obtain $\begin{equation*} \widetilde{{\cal L}}_{s} = {\cal D} (\phi,\psi) Pr(\lambda| \phi,\psi, D_{s}).\end{equation*}$ (28)
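The non-iterative solution for the minimum-χ² vector λ can be sketched as below. With ν_n and ω fixed by (ϕ, ψ), Eq. (1) is linear in (γ, K₁, K₂), and the 3×3 normal equations reduce to a closed form because K₁ and K₂ couple only through γ. The weights follow Sect. 2.3. The function name and the list of true anomalies used in the check are hypothetical, and the factor 𝒟 of Eq. (28) is not computed here.

```python
import math

def fit_lambda(nu, v1_obs, v2_obs, sigma, e, omega_deg):
    """Minimum-chi2 (gamma, K1, K2) and chi2_hat for fixed true anomalies nu_n and omega."""
    w1 = 1.0 / sigma ** 2                     # weight of primary velocities
    w2 = 0.5 / sigma ** 2                     # weight of secondary velocities
    om2 = math.radians(omega_deg)             # omega_2 = omega
    om1 = om2 + math.pi                       # omega_1 = omega + pi
    g1 = [math.cos(n + om1) + e * math.cos(om1) for n in nu]
    g2 = [math.cos(n + om2) + e * math.cos(om2) for n in nu]
    n0 = (w1 + w2) * len(nu)
    s1, q1 = w1 * sum(g1), w1 * sum(g * g for g in g1)
    s2, q2 = w2 * sum(g2), w2 * sum(g * g for g in g2)
    b0 = w1 * sum(v1_obs) + w2 * sum(v2_obs)
    b1 = w1 * sum(g * v for g, v in zip(g1, v1_obs))
    b2 = w2 * sum(g * v for g, v in zip(g2, v2_obs))
    # Eliminate K1 and K2 from the normal equations, then solve for gamma
    gamma = (b0 - s1 * b1 / q1 - s2 * b2 / q2) / (n0 - s1 * s1 / q1 - s2 * s2 / q2)
    K1 = (b1 - gamma * s1) / q1
    K2 = (b2 - gamma * s2) / q2
    chi2_hat = (w1 * sum((gamma + K1 * g - v) ** 2 for g, v in zip(g1, v1_obs))
                + w2 * sum((gamma + K2 * g - v) ** 2 for g, v in zip(g2, v2_obs)))
    return (gamma, K1, K2), chi2_hat

# Noise-free check with hypothetical true anomalies
nu = [0.3 + 0.6 * k for k in range(10)]
om = math.radians(250.0)
v1_obs = [1.5 + 8.64 * (math.cos(n + om + math.pi) + 0.6 * math.cos(om + math.pi)) for n in nu]
v2_obs = [1.5 + 12.35 * (math.cos(n + om) + 0.6 * math.cos(om)) for n in nu]
lam_hat, chi2_s_hat = fit_lambda(nu, v1_obs, v2_obs, 0.5, 0.6, 250.0)
```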

The statistics of displacements in ψ-space is modified by the spectroscopic data as noted in Sect. 3.3, and this is treated in Appendix B.

We now calculate Pr(ϕ|D_a, D_s). Substituting $\Lambda \propto \widehat{\cal L}_{a} \widetilde{\cal L}_{a} \widehat{\cal L}_{s} \widetilde{\cal L}_{s}$ into Eq. (10), eliminating $\widetilde{\cal L}_{s}$ with Eq. (28), and integrating over λ, we obtain $\begin{equation*} Pr(\phi|D_{a},D_{s}) \: \propto \: \widehat{\cal L}_{a} \int {\cal D} \: \widetilde{\cal L}_{a} \widehat{\cal L}_{s} \: \textrm{d}\psi.\end{equation*}$ (29)

We now eliminate $\widetilde{\cal L}_{a}$ using Eq. (19) to obtain $\begin{equation*} Pr(\phi|D_{a},D_{s}) \: \propto \: {\cal C} \widehat{\cal L}_{a} \int {\cal D} \: \widehat{\cal L}_{s} Pr(\psi|\phi,D_{a}) \: \textrm{d}\psi.\end{equation*}$ (30)

If we now replace Pr(ψ|ϕ, D_a) by the approximation given in Eq. (21) and assume that ${\cal N}_{\psi}$ is independent of ϕ, then $\begin{equation*} Pr(\phi|D_{a},D_{s}) \: \propto \: {\cal C} \widehat{\cal L}_{a} \: \sum_{\ell} ({\cal D} \widehat{\cal L}_{s})_{\psi_{\ell}}.\end{equation*}$ (31)

Accordingly, the relative weights of points ϕ_ijk in the ϕ-grid are $\begin{equation*} \mu_{ijk} = {\cal C} \widehat{\cal L}_{a} \times \sum_{\ell} ({\cal D} \widehat{\cal L}_{s})_{\psi_{\ell}}.\end{equation*}$ (32)

Here the first factor ${\cal C} \widehat{\cal L}_{a}$ depends only on D_a. The dependence on D_s is introduced by the second factor: if, at a given ϕ, all ψ_ℓ correspond to poor fits to D_s, then the second factor disfavours that ϕ.

5 Numerical results

The technique developed in Sects. 3 and 4 is now applied to synthetic data D_a and D_s created as described in Sect. 2.3 for the model binary defined in Sect. 2.2. All calculations use a 100³ grid for ϕ-space, and Monte Carlo sampling with ${\cal N}_{\psi} = 20$ for ψ-space and ${\cal N}_{\lambda} = 20$ for λ-space.

5.1 Parameter cloud

Let ϕ_ijk denote cell mid-points of the 3D grid spanning ϕ-space. At each ϕ_ijk, the technique generates ${\cal N}_{\psi}$ points ψ_ℓ in ψ-space. Then, at each ψ_ℓ, the technique generates ${\cal N}_{\lambda}$ points λ_m in λ-space. Thus, with this cascade, a cloud of points (ϕ_ijk, ψ_ℓ, λ_m) is generated in the 10D (ϕ, ψ, λ)-space.

Note that this is a cloud of orbit parameters and not a cloud of orbits. Because linearity in ψ and λ is fully exploited, the values of $\chi^{2}_{a}$ and $\chi^{2}_{s}$ at cloud points are derived without computing astrometric and spectroscopic orbits, and this is the origin of the technique’s computational efficiency.

The relative weights of cloud points (ϕ_ijk, ψ_ℓ, λ_m) are $\begin{equation*} \mu_{ijk,\ell,m} = \mu_{ijk} \times \zeta_{\ell} \times {\cal N}^{-1}_{\lambda}.\end{equation*}$ (33)

The first factor comes from Eq. (32), the second from Eq. (B.5), and the third from Eq. (A.4).

Note that the third factor is only relevant if ${\cal N}_{\lambda}$ varies with (ϕ, ψ). Note also that if Pr(ψ|ϕ, D_a, D_s) given by Eq. (B.2) were randomly sampled, the second factor would be ${\cal N}^{-1}_{\psi}$. Instead, the quadrivariate normal distribution Pr(ψ|ϕ, D_a) is sampled and then corrected via the coefficients ζ_ℓ, which are such that ∑ ζ_ℓ = 1 – see Appendix B.

5.2 Inferences

Suppose Q(Θ) is a quantity of interest. Its posterior distribution derived from the parameter cloud is $\begin{equation*} \Theta(Q) = \left. \sum_{t} \: \mu_{t} \: \delta(Q-Q_{t}) \middle/ \sum_{t} \: \mu_{t}\right.,\end{equation*}$ (34)

where t ≡ (ijk, ℓ, m). The corresponding cumulative distribution function is $\begin{equation*} F(Q) = \left.\sum_{Q_{t} < Q} \mu_{t} \middle/ \sum_{t} \mu_{t}\right..\end{equation*}$ (35)

The equal-tail credibility interval (Q_L, Q_U) corresponding to ±1σ is then obtained from the equations $\begin{equation*} F(Q_{L}) = 0.1587 \;\;\;\;\; F(Q_{U}) = 0.8413,\end{equation*}$ (36)

so that the enclosed probability is 0.6826.

These credibility intervals are asymptotically rigorous – i.e., are exact in the limits ${\cal N}_{\psi} \rightarrow \infty$, ${\cal N}_{\lambda} \rightarrow \infty$, and grid steps → 0.
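Equations (35) and (36) amount to a weighted quantile over the cloud. A minimal sketch, with hypothetical function names and a uniform test cloud chosen only to make the behaviour easy to verify:

```python
def weighted_quantile(q_values, weights, prob):
    """Smallest Q_t whose cumulative relative weight reaches prob (discrete form of Eq. 35)."""
    pairs = sorted(zip(q_values, weights))
    total = sum(w for _, w in pairs)
    acc = 0.0
    for q, w in pairs:
        acc += w
        if acc >= prob * total:
            return q
    return pairs[-1][0]

def credibility_interval(q_values, weights, lower=0.1587, upper=0.8413):
    """Equal-tail interval (Q_L, Q_U); the defaults correspond to +/- 1 sigma as in Eq. (36)."""
    return (weighted_quantile(q_values, weights, lower),
            weighted_quantile(q_values, weights, upper))

# Uniform test cloud: Q_t = 0, 1, ..., 9999 with equal weights
q_lo, q_hi = credibility_interval(list(range(10000)), [1.0] * 10000)
```

The 2-σ interval follows by passing the tail probabilities 0.0228 and 0.9772 instead.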

Fig. 1

Posterior densities for $\log {\cal M}_{1}$ and $\log {\cal M}_{2}$. The long vertical arrows indicate exact values. The short vertical arrows and lines indicate the posterior means and the 1- and 2-σ credibility intervals.

5.3 An example

The fundamental data derivable from the combined astrometric and spectroscopic data are the component masses ${\cal M}_{1}, {\cal M}_{2}$ and the parallax ϖ. None of these quantities can be derived if only one data set is available.

At every cloud point t, Kepler’s law and the two spectroscopic mass functions can be solved for ${\cal M}_{1}, {\cal M}_{2}$ and ϖ, and these values each have relative weight μ_t. Accordingly, the posterior densities of these quantities can be calculated from Eq. (34) and their credibility intervals from Eq. (35).
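One way to carry out this solution at a single cloud point is sketched below. It assumes that the Campbell elements a and i have already been recovered from ψ, and it uses the equivalent route through the linear size of the relative orbit: K₁ + K₂ fixes a sin i in AU, the ratio to the angular a gives ϖ, Kepler's third law gives the total mass, and K₁/K₂ gives the mass ratio. Constants and names are illustrative assumptions.

```python
import math

AU_KM = 1.495978707e8          # astronomical unit in km
YEAR_S = 365.25 * 86400.0      # Julian year in s

def masses_and_parallax(P_yr, e, a_arcsec, i_deg, K1, K2):
    """Component masses (solar units) and parallax (arcsec) at one cloud point."""
    v_scale = 2.0 * math.pi * AU_KM / YEAR_S                 # km/s per (AU/yr) of 2*pi*a/P
    sin_i = math.sin(math.radians(i_deg))
    a_au = (K1 + K2) * P_yr * math.sqrt(1.0 - e * e) / (v_scale * sin_i)
    parallax = a_arcsec / a_au
    m_tot = a_au ** 3 / P_yr ** 2                            # Kepler's third law
    m1 = m_tot * K2 / (K1 + K2)                              # M1/M2 = K2/K1
    m2 = m_tot * K1 / (K1 + K2)
    return m1, m2, parallax

# Round trip on the model binary of Sect. 2.2, using the rounded semi-amplitudes
m1, m2, parallax = masses_and_parallax(10.0, 0.6, 0.3, 70.0, 8.64, 12.35)
```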

For a particular simulation of D_a and of D_s, the posterior densities so derived are plotted in Figs. 1 and 2. Also plotted are the posterior means, the 1- and 2-σ credibility intervals, as well as the exact values from Sect. 2.2. In each case, the exact values fall within the 1-σ limits.

In Appendix C, following L16, a Bayesian goodness-of-fit (GOF) statistic, $\chi^{2}_{B}$, is defined together with the corresponding Bayesian p-value. We now apply this test. For the astrometric data, the posterior mean of $\chi^{2}_{a}$ is $\langle \chi^{2}_{a} \rangle_{u} = 22.2$, and for the spectroscopic data $\langle \chi^{2}_{s} \rangle_{u} = 16.9$, so that in total $\langle \chi^{2} \rangle_{u} = 39.1$. Since the total number of parameters is k = 10, Eq. (C.4) gives $\chi^{2}_{B} = 29.1$.

The total number of measurements is n = 40, comprising two astrometric (x, y) and two spectroscopic (v₁, v₂) measurements in each of ten years. The number of degrees of freedom is therefore ν = n − k = 30. Substitution in Eq. (C.5) then gives p_B = 0.51, a value consistent with the belief that the data is analysed with a valid model and that Bayesian inferences are not suspect.
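The arithmetic can be checked with a few lines. Appendix C is not reproduced here, so the sketch assumes what the quoted numbers imply: that Eq. (C.4) subtracts k from the posterior mean of χ², and that Eq. (C.5) is the usual upper-tail probability of a χ² distribution with ν degrees of freedom. The closed form used below is valid only for even ν; the function name is hypothetical.

```python
import math

def chi2_sf_even(x, nu):
    """Upper-tail probability Pr(chi2_nu > x), closed form valid for even nu only."""
    if nu <= 0 or nu % 2:
        raise ValueError("closed form requires a positive even number of degrees of freedom")
    half = 0.5 * x
    term, total = 1.0, 1.0
    for j in range(1, nu // 2):          # sum of (x/2)^j / j! for j = 0 .. nu/2 - 1
        term *= half / j
        total += term
    return math.exp(-half) * total

k, n = 10, 40
chi2_B = 22.2 + 16.9 - k                 # assumed form of Eq. (C.4)
p_B = chi2_sf_even(chi2_B, n - k)        # assumed form of Eq. (C.5)
```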

Fig. 2

Posterior density for log ϖ (′′). The long vertical arrow indicates the exact value. The short vertical arrow and lines indicate the posterior mean and the 1- and 2-σ credibility intervals.

Table 1

Coverage fractions from 10³ trials.

5.4 Coverage

An accurate gauge of the statistical performance of the technique requires many repetitions of the above calculation with independently drawn samples of D_a and D_s.

With different data, the posterior densities and corresponding credibility limits in Figs. 1 and 2 change. But the long vertical arrows marking exact values remain fixed. For each independent repetition, we can record whether or not the exact values are enclosed by the 1- and 2-σ credibility intervals. In this way, we carry out a coverage experiment as in L14a,b – see also Martinez et al. 2017.

The results obtained from 1000 repetitions are summarized in Table 1. These show reasonable agreement with ε(f), the expected fractions for errors obeying a normal distribution. Thus, despite the non-linearities, the credibility intervals retain their conventional interpretations.
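The logic of a coverage experiment can be illustrated on a deliberately trivial problem, the mean of normally distributed data with known σ, for which the 1- and 2-σ intervals are exact. This toy is an assumption made for brevity and is not the binary-orbit calculation of Table 1; only the bookkeeping is the same.

```python
import math
import random

def coverage_experiment(n_trials, n_obs, sigma, truth, rng):
    """Fractions of trials in which the 1- and 2-sigma intervals enclose the exact value."""
    hits_1 = hits_2 = 0
    err = sigma / math.sqrt(n_obs)                 # standard error of the mean
    for _ in range(n_trials):
        data = [truth + rng.gauss(0.0, sigma) for _ in range(n_obs)]
        mean = sum(data) / n_obs
        hits_1 += abs(mean - truth) <= err
        hits_2 += abs(mean - truth) <= 2.0 * err
    return hits_1 / n_trials, hits_2 / n_trials

f1, f2 = coverage_experiment(1000, 10, 0.5, 3.0, random.Random(2))
```

With 1000 trials, the sampling scatter of the 1-σ fraction about 0.683 is about 0.015, which sets the scale for what "reasonable agreement" means in such an experiment.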

6 Discussion: Bayesian hypothesis testing

In L16 and references therein, the relative absence in the astronomical literature of statistical testing of Bayesian models is commented upon.

6.1 Some quotes

The following quote from a statistician (Anscombe 1963) indicates that concern on this issue is of long standing:

“To anyone sympathetic with the current neo-Bernoullian neo-Bayesian Ramseyesque Finettist Savageous movement in statistics, the subject of testing goodness-of-fit is something of an embarrassment.”

A very recent comment (Fischer et al. 2016 in Sect. 4.1, authored by E. Ford) referring to exoplanets is:

“Too often people using Bayesian methods ignore model checking, because it doesn’t have a neat and tidy formal expression in the Bayesian approach. But it is no less necessary to do GOF type checks for a Bayesian analysis than it is for a frequentist analysis”

6.2 Additional justifications

When authors ignore model checking, they seldom, if ever, explain why. The quote above suggests that Bayesians are deterred by the absence of a readily-applied test. In contrast, frequentists reporting a minimum-χ² analysis generally include $\chi^{2}_{0}$, the χ² minimum, and often also the p-value derived from the known distribution of $\chi^{2}_{0}$. Thus, this traditional, frequentist approach has a built-in reality check. Moreover, this check is rigorously justified for linear models and normally-distributed measurement errors.

Note that minimum-χ² codes return estimates and confidence intervals even when $\chi^{2}_{0}$ corresponds to a vanishingly small p-value. Thus, we may surmise that innumerable spurious inferences from false hypotheses or poor data are absent from the scientific literature precisely because of this built-in check.

Besides the difficulty of Bayesian model-checking, it seems likely that the following additional reasons play a role in checking being ignored:

1) The detection of the expected signal confirms the hypothesis.

This is endemic in studies of orbits, including frequentist analyses going back decades. If a star is investigated for reflex motion due to a companion and a periodic signal is detected, then it is all too easy to take this as confirmation of a companion. A more critical approach recognizes that a harmonic expansion of Keplerian motion provides quantitative tests of the orbit hypothesis. When this approach is applied to a sample of spectroscopic binaries with exquisitely accurate radial velocities, significant departures from exact Keplerian motion are found (Lucy 2005; Hearnshaw et al. 2012).

A notable recent signal detection is that of gravitational waves from coalescing black holes (Abbott et al. 2016). The published parameters for the initial black hole binary derive from a Bayesian analysis. But these authors do not ignore model checking: their Bayesian analysis is preceded by a standard frequentist χ² test of template fits.

2) The Bayesian model has so many parameters that a poor fit is improbable.

In this case, the acceptance-rejection aspect of the scientific method is replaced by the posterior density favouring or disfavouring regions of parameter space. The expectation is that with enough high quality data, the posterior density will be sharply peaked at the point corresponding to the true hypothesis. But what if the true hypothesis is not part of the adopted multi-parameter model? How does the investigator detect this?

3) A hypothesis should not be rejected if there is no alternative.

On this view, Bayesian model checking, even if readily carried out, should not lead to the rejection of a hypothesis. Rather, one should wait until an alternative hypothesis is proposed and then implement the model selection machinery. This view goes back to Jeffreys (1939, Sect. 7.2.2) – see also Sivia & Skilling (2006, p. 85).

Jeffreys supports this view by remarking that there was never a time over previous centuries when Newton’s theory of gravity would not have failed a p-test. It is therefore instructive to recall how Adams and Le Verrier reacted to the large residuals in the motion of Uranus (i.e., to a small p-value). Crucially, their view was that the hypothesis being tested was not Newton’s theory but the then current 7-planet model of the solar system. Because they had a far greater degree of belief in Newtonian gravity than in the 7-planet model, they doubted the latter and went on to successfully predict Neptune’s position. This example illustrates that ambitious and effective scientists take small p-values seriously even in the absence of alternative hypotheses. By doing so, they create alternative hypotheses.

6.3 The $\chi^{2}_{B}$ statistic

A Bayesian GOF statistic $\chi^{2}_{B}$ and the corresponding p-value are defined in Appendix C.

A crucial requirement of any GOF statistic is that it should not often falsely lead one to reject or doubt a hypothesis when that hypothesis is true. In frequentist terms, this is a Type I error. Such an error arises when the statistic gives a small p-value, say p = 0.001, even when the null hypothesis (H₀) is true. Of course, such a value can occur by chance, but for an acceptable GOF statistic the frequency of Type I errors should not markedly exceed p.

In the simulation reported in Sects. 5.3 and 5.4, the null hypothesis is correct by construction, since the data is generated from the exact formulae for the astrometric and spectroscopic orbits. Thus, if the mathematical models were completely linear, we would expect $\chi^{2}_{B}$ to be distributed exactly as $\chi^{2}_{\nu}$ with ν = n − k degrees of freedom. The p-value defined by Eq. (C.5) would then have an exactly uniform distribution in the interval (0, 1).

The N_tot = 1000 simulations used for the coverage experiment in Sect. 5.4 allow the uniformity of the p_B values to be tested. In Fig. 3, the fraction with p_B < p is plotted against p. We see that uniformity is obeyed with reasonable precision for p ∈ (0.01, 1.00). In particular, there is no indication of a significant departure from uniformity that could be attributed to the non-linearities. Accordingly, the p_B values derived from $\chi^{2}_{B}$ can be interpreted in the same way and with the same confidence as p-values in minimum-χ² estimation.
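The logic of this uniformity test can be illustrated on a much simpler problem. The following Python sketch is not the 10-parameter binary code used in the paper: it assumes a hypothetical straight-line model with known Gaussian errors, for which the minimum χ² is exactly $\chi^{2}_{\nu}$-distributed. The function names (`chi2_sf_even`, `min_chi2_line`, `simulate_p_values`) and all numerical values are illustrative assumptions, and the closed-form tail probability used here is valid only for even ν.

```python
import math
import random

def chi2_sf_even(x, nu):
    """Pr(chi^2_nu > x) for even nu, from the closed-form finite sum."""
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this sketch handles even nu only")
    half = 0.5 * x
    term, total = 1.0, 1.0
    for j in range(1, nu // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def min_chi2_line(xs, ys, sigma):
    """Minimum chi^2 of a straight-line fit y = a + b x with equal errors."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return sum(((y - a - b * x) / sigma) ** 2 for x, y in zip(xs, ys))

def simulate_p_values(n_sim=1000, n=20, sigma=1.0, seed=1):
    """p-values from repeated simulations in which the null hypothesis is true."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    p_values = []
    for _ in range(n_sim):
        ys = [1.0 + 2.0 * x + rng.gauss(0.0, sigma) for x in xs]
        # k = 2 fitted parameters, so nu = n - 2 (even for n = 20)
        p_values.append(chi2_sf_even(min_chi2_line(xs, ys, sigma), n - 2))
    return p_values

p_values = simulate_p_values()
for p_cut in (0.1, 0.5, 0.9):
    frac = sum(p < p_cut for p in p_values) / len(p_values)
    print(f"fraction with p < {p_cut}: {frac:.3f}")
```

If the p-values are uniform, each printed fraction should be close to the corresponding threshold, up to the sampling noise expected from 1000 trials.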

Note that the calculation of $\chi^{2}_{B}$ is a trivial addition to an existing Bayesian code that very likely already calculates the posterior means of other quantities.

6.4 Posterior predictive p-values

In the contribution authored by E. Ford from which the quote in Sect. 6.1 is taken, readers are referred to Gelman et al. (2013) who recommend posterior predictive p-values for testing Bayesian models.

In the context of the technique developed here, this recommendation proceeds as follows:

1) Randomly select a point in parameter space from the posterior distribution. Thus, if t is an index that gives a 1D enumeration of the parameter cloud, a random point t′ is that which most closely satisfies the equation

$$ \left. \sum_{t < t'} \mu_{t} \middle/ \sum_{t} \mu_{t} \right. = x, \tag{37} $$

where x is a random number ∈ (0, 1).

2) From the 10 parameters at t′, create synthetic data $D' = D'_{a} + D'_{s}$, compute $\chi^{2} = \chi^{2}_{a} + \chi^{2}_{s}$, and then compare to the χ² at t′ for the original data D = D_a + D_s.

3) Repeat steps 1) and 2) ${\cal N}_{tot}$ times.

A Bayesian p-value is then defined to be

$$ p_{B} = {\cal N} [ \chi^{2}(D') > \chi^{2}(D) ] \,/\, {\cal N}_{tot}. \tag{38} $$

Thus a small value of p_B indicates that it is hard to find points t′ giving a worse fit than the original data, indicating that the original data gives a poor fit.
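The three steps and Eq. (38) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model is a hypothetical one-parameter linear law v = H₀ d with equal Gaussian errors, the "parameter cloud" is a uniform grid of trial H₀ values with weights μ_t ∝ exp(−χ²/2), and the index t′ is taken as the first grid point whose cumulative weight reaches x. The names `chi2` and `posterior_predictive_p` and all numbers are invented for the example.

```python
import bisect
import itertools
import math
import random

def chi2(h0, dist, vel, sigma):
    """chi^2 of the linear law v = h0 * d with equal errors sigma."""
    return sum(((v - h0 * d) / sigma) ** 2 for d, v in zip(dist, vel))

def posterior_predictive_p(dist, vel, sigma, grid, n_tot=2000, seed=2):
    """Posterior predictive p-value following steps 1)-3) and Eq. (38)."""
    rng = random.Random(seed)
    chi2_data = [chi2(h, dist, vel, sigma) for h in grid]
    c_min = min(chi2_data)
    # Weights mu_t of the cloud points (uniform prior on a uniform grid assumed).
    mu = [math.exp(-0.5 * (c - c_min)) for c in chi2_data]
    cum = list(itertools.accumulate(mu))
    worse = 0
    for _ in range(n_tot):
        # Step 1: choose t' from the cumulative weights, cf. Eq. (37).
        t = bisect.bisect_left(cum, rng.random() * cum[-1])
        # Step 2: synthetic data D' from the parameters at t', then compare chi^2.
        synth = [grid[t] * d + rng.gauss(0.0, sigma) for d in dist]
        if chi2(grid[t], dist, synth, sigma) > chi2_data[t]:
            worse += 1
    return worse / n_tot

rng = random.Random(1)
sigma, h_true = 50.0, 70.0
dist = [float(d) for d in range(1, 11)]
good = [h_true * d + rng.gauss(0.0, sigma) for d in dist]
# Engineer a poor fit by corrupting the three most distant points by 10 sigma.
bad = [v + (500.0 if d >= 8.0 else 0.0) for d, v in zip(dist, good)]
grid = [0.25 * i for i in range(801)]  # trial values 0 ... 200

p_good = posterior_predictive_p(dist, good, sigma, grid)
p_bad = posterior_predictive_p(dist, bad, sigma, grid)
print("p_B (consistent data):", p_good)
print("p_B (corrupted data): ", p_bad)
```

With the corrupted data, essentially no synthetic data set fits worse than the original, so the estimated p_B is limited by the resolution 1/N_tot of the Monte Carlo count.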

To quote E. Ford again (Sect. 6.1), posterior predictive checking is evidently not “a neat and tidy” formalism. Moreover, physical scientists have a strong interest in having reliable p-values when p ≲ 0.001, since such values raise serious doubts about a model’s validity. This then requires ${\cal N}_{tot} \sim 10\,000$ repetitions of the above steps, which may be infeasible.

Posterior predictive p-values have been compared to the values given by Eq. (C.5) for a simple 1D toy model, namely a Hubble flow in Euclidean space populated by perfect standard candles. Synthetic data is created and the posterior density for the Hubble constant derived. A poor fit can be engineered by corrupting the data at high redshift; the two resulting small p-values are then compared. They agree closely.

This suggests that the readily calculated p_B given by Eq. (C.5) eliminates any need for the cumbersome direct calculation of the posterior predictive p-value given by Eq. (38).

Fig. 3

Test of Bayesian p-values. From 1,000 simulations, the fraction with p_B > p is plotted against p for p = 0.01(0.01)1.00. The dashed line shows the expected result if the null hypothesis is correct and if the statistic $\chi^{2}_{B}$ obeys the $\chi^{2}_{\nu}$ distribution with ν = n − k degrees of freedom.

7 Conclusion

In this paper, a non-trivial example of a wide class of problems in statistical astronomy is addressed. These are the so-called hybrid problems where the mathematical models predicting the observations are partly linear and partly non-linear in the basic parameters. As in the simpler, purely astrometric case considered in L14b, when spectroscopic data is added, a grid search over the non-linear parameter space combined with Monte Carlo sampling in the linear parameter spaces still leads to a computationally efficient scheme and again yields credibility intervals with close to correct coverage (Sect. 5.4), a result of prime importance generally, but especially so when estimating fundamental stellar parameters.

In contrast to L14a, the formulation in Sects. 3 and 4 is mostly quite general and so should be readily adapted to other hybrid problems.

In addition to exhibiting correct coverage, the large number of independent simulations allows the testing (Sect. 6.3) of $\chi^{2}_{B}$, a Bayesian GOF criterion (Appendix C) for posterior probability densities. Even though the test problem involves some non-linear parameters, the exact sampling distribution in the linear case is closely followed, thus providing a readily calculated p-value that quantifies one’s confidence in the inferences drawn from the posterior distribution. Since in problems that are exactly linear, the Bayesian and frequentist p-values are identical, investigators can make decisions on the basis of the Bayesian p_B-value exactly as they would for a frequentist p-value. Moreover, since calculating the statistic $\chi^{2}_{B}$ involves trivial changes to a Bayesian code, it provides “the neat and tidy formal expression” that is missing in current Bayesian methodology (see the quote from E. Ford in Sect. 6.1).

Acknowledgements

The issue of error underestimation in hybrid problems was raised by the referee of L14a and was the direct stimulus of L14b and of this investigation. A useful correspondence with E. L. N. Jensen is also acknowledged.

Appendix A Statistics in λ-space

Statistics in the 4D Thiele-Innes ψ-space is treated in Appendix A of L14b. Analogous results are briefly stated here for the 3D λ-space.

Given ϕ and ψ, the minimum-$\chi^{2}_{s}$ vector $\widehat{\lambda} = (\widehat{\gamma}, \widehat{K_{1}}, \widehat{K_{2}})$ is obtained without iteration from the normal equations derived from Eqs. (1) and (24).

The displacement $\lambda = \widehat{\lambda} + \delta \lambda$ gives $\chi^{2}_{s} = \widehat{\chi}^{2}_{s} + \delta \chi^{2}_{s}$ with positive $\delta \chi^{2}_{s}$. On the assumption of normally-distributed errors, the probability density at λ is a trivariate normal distribution such that

$$ Pr(\lambda|\phi,\psi,D_{s}) = {\cal D}^{-1} \exp(-\frac{1}{2} \delta \chi^{2}_{s}), \tag{A.1} $$

where ${\cal D}(\phi,\psi) = (2 \pi)^{3/2} \sqrt{\Delta}$ and Δ is the determinant of the covariance matrix.

A.1 Random sampling in λ-space

Points λ_ℓ randomly sampling the trivariate normal distribution Pr(λ|ϕ, ψ, D_s) are derived with a standard procedure (Gentle 2009) for sampling multivariate distributions. The first step is to make a Cholesky decomposition (Press et al. 2007, p. 100) of the covariance matrix C – i.e., to find the lower triangular matrix L such that

$$ \vec{L} \vec{L}' = \vec{C}. \tag{A.2} $$

A random sample from Pr(λ|ϕ, ψ, D_s) is then

$$ \lambda = \widehat{\lambda} + \vec{L} \cdot \vec{z} = \widehat{\lambda} + \delta \lambda, \tag{A.3} $$

where the elements of z = (z₁, z₂, z₃) are random Gaussian variates. The resulting approximation to Pr(λ) is

$$ Pr(\lambda|\phi,\psi,D_{s}) = {\cal N}^{-1}_{\lambda} \sum_{m} \delta(\lambda - \lambda_{m}). \tag{A.4} $$

Note that ${\cal N}_{\lambda}$ can vary with (ϕ, ψ). The increment in $\chi^{2}_{s}$ due to the displacement from $\widehat{\lambda}$ is

$$ \delta \chi^{2}_{s} = z_{1}^{2} + z_{2}^{2} + z_{3}^{2}. \tag{A.5} $$

Accordingly, as in the analogous problem in ψ-space – see Eq. (A.22) in L14b, the increment in χ² is obtained without computing the spectroscopic orbits at $\widehat{\lambda} + \delta \lambda$ – though this should be checked during code development. This is a consequence of linearity and accounts for the computational efficiency of the technique.
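The sampling procedure of Eqs. (A.2)-(A.5) can be sketched as follows. This is a minimal pure-Python illustration, not the author's code: the covariance matrix `cov` and the vector `lam_hat` are made-up numbers standing in for the output of the normal equations, and the helper names are hypothetical. The function `quad_form` evaluates δλ′ C⁻¹ δλ independently, so the identity in Eq. (A.5) can be checked numerically.

```python
import math
import random

def cholesky(c):
    """Lower-triangular L with L L' = C for a symmetric positive-definite C."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(c[i][i] - s)
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low

def sample_lambda(lam_hat, low, rng):
    """One draw lambda = lam_hat + L z, Eq. (A.3); also returns z.z, Eq. (A.5)."""
    z = [rng.gauss(0.0, 1.0) for _ in lam_hat]
    d_lam = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]
    lam = [a + b for a, b in zip(lam_hat, d_lam)]
    return lam, d_lam, sum(v * v for v in z)

def quad_form(d_lam, low):
    """d_lam' C^{-1} d_lam, computed by forward substitution on L y = d_lam."""
    y = []
    for i, row in enumerate(low):
        y.append((d_lam[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return sum(v * v for v in y)

rng = random.Random(3)
cov = [[0.04, 0.01, 0.00],
       [0.01, 0.09, 0.02],
       [0.00, 0.02, 0.16]]        # hypothetical covariance matrix
lam_hat = [5.0, 20.0, 30.0]       # hypothetical (gamma, K1, K2)
low = cholesky(cov)
lam, d_lam, d_chi2 = sample_lambda(lam_hat, low, rng)
print("sample:", lam)
print("delta chi^2 from z:", d_chi2, " from quadratic form:", quad_form(d_lam, low))
```

The two printed values of δχ² should agree to rounding error, which is the kind of check during code development that the text recommends.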

In Appendix A of L14b, Cholesky decomposition is not needed because the quadrivariate normal distribution Pr(ψ|ϕ, D_a) is the product of bivariate distributions. But this simplification is lost if $\widetilde{x}_{n}$ and $\widetilde{y}_{n}$ have correlated errors (Sect. 2.3). In that circumstance, the above Cholesky approach is the necessary generalization.

Appendix B Modified statistics in ψ-space

The treatment of statistics in ψ-space in Appendix A of L14b does not apply when spectroscopic data is included. As noted in Sect. 3.3 – see Eq. (11), Pr(ψ) depends on both D_a and D_s.

The required modification is obtained by substituting $\Lambda \propto \widehat{\cal L}_{a} \widetilde{\cal L}_{a} \widehat{\cal L}_{s} \widetilde{\cal L}_{s}$ into Eq. (11), integrating over λ using Eq. (28), and noting that $\widehat{\cal L}_{a}$ is independent of ψ. This gives

$$ Pr(\psi|\phi,D_{a},D_{s}) \propto {\cal D}\, \widetilde{\cal L}_{a} \widehat{\cal L}_{s}. \tag{B.1} $$

We now eliminate $\widetilde{\cal L}_{a}$ using Eq. (19) and noting that ${\cal C}$ is independent of ψ. This gives

$$ Pr(\psi|\phi,D_{a},D_{s}) \propto {\cal D}\, \widehat{\cal L}_{s}\, Pr(\psi|\phi,D_{a}), \tag{B.2} $$

showing that Pr(ψ) is modified from the pure astrometry case by the factor ${\cal D}\widehat{\cal L}_{s}$ introduced by spectroscopy. Because of this modification, Pr(ψ|ϕ, D_a, D_s) is not a multivariate normal distribution and so not as readily sampled.

The adopted sampling procedure is as follows: from Eq. (A.20) in L14b, we have the approximation

$$ Pr(\psi|\phi,D_{a}) = {\cal N}^{-1}_{\psi} \sum_{\ell} \delta(\psi - \psi_{\ell}), \tag{B.3} $$

where the ψ_ℓ randomly sample the quadrivariate normal distribution appropriate in the pure astrometry case (Appendix A, L14b). Substituting into Eq. (B.2), we obtain the corresponding approximation when spectroscopy is included

$$ Pr(\psi|\phi,D_{a},D_{s}) = \sum_{\ell} \zeta_{\ell}\, \delta(\psi - \psi_{\ell}), \tag{B.4} $$

where

$$ \zeta_{\ell}(\phi) = \left. ({\cal D} \widehat{\cal L}_{s})_{\psi_{\ell}} \middle/ \sum_{\ell} ({\cal D} \widehat{\cal L}_{s})_{\psi_{\ell}} \right. . \tag{B.5} $$

As might be expected, because of the factor $(\widehat{\cal L}_{s})_{\psi_{\ell}}$, points ψ_ℓ in ψ-space are strongly disfavoured if that ψ_ℓ gives a poor fit to the spectroscopic data.
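The weights of Eq. (B.5) amount to a normalization of the factors ${\cal D}\widehat{\cal L}_{s}$ over the sampled points. A minimal Python sketch follows, assuming that $\widehat{\cal L}_{s} \propto \exp(-\widehat{\chi}^{2}_{s}/2)$ and that the quantities are carried as logarithms to avoid underflow; the function name `zeta_weights` and the input numbers are illustrative only.

```python
import math

def zeta_weights(log_d, log_like_s):
    """Normalized weights zeta_l of Eq. (B.5) from log D and log L_s-hat."""
    log_w = [a + b for a, b in zip(log_d, log_like_s)]
    shift = max(log_w)                        # guards against underflow in exp
    w = [math.exp(v - shift) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

# Hypothetical values at three sampled points psi_l.
chi2_hat_s = [12.0, 15.0, 40.0]               # minimum chi^2_s at each psi_l
log_d = [0.0, 0.1, -0.1]                      # log of the factor D at each psi_l
zeta = zeta_weights(log_d, [-0.5 * c for c in chi2_hat_s])
print(zeta)
```

The third point, which fits the spectroscopic data far worse than the others, receives a negligible weight, as the text describes.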

Appendix C The $\chi^{2}_{B}$ and $\psi^{2}_{B}$ statistics

In an earlier paper (Lucy 2016; L16), we define

$$ \langle \chi^{2} \rangle_{u} = \int \chi^{2}(\alpha)\, \Lambda(\alpha|D)\, \textrm{d}V_{\alpha}, \tag{C.1} $$

where

$$ \Lambda(\alpha|D) = {\cal L}(\alpha|D) \,/ \int {\cal L}(\alpha|D)\, \textrm{d}V_{\alpha}. \tag{C.2} $$

Thus $\langle \chi^{2} \rangle_{u}$ is the posterior mean of χ²(α) when the posterior density Λ is computed under the assumption of a uniform (u) prior.

If the model is linear in the parameter vector α and if errors are normally-distributed, then (Appendix A, L16)

$$ \langle \chi^{2} \rangle_{u} = \chi^{2}_{0} + k, \tag{C.3} $$

where $\chi^{2}_{0}$ is the minimum value of χ²(α) and k is the number of parameters. Moreover, under the stated assumptions, $\chi^{2}_{0}$ is distributed as $\chi^{2}_{\nu}$, where ν = n − k is the number of degrees of freedom and n is the number of measurements.

It follows that if we define the statistic

$$ \chi^{2}_{B} = \langle \chi^{2} \rangle_{u} - k, \tag{C.4} $$

then, for a linear model and normally-distributed errors, $\chi^{2}_{B}$ is distributed as $\chi^{2}_{\nu}$ with ν = n − k. Accordingly, a p-value that quantifies the quality of the posterior distribution Λ(α|D) from which all Bayesian inferences are drawn is given by

$$ p_{B} = Pr(\chi^{2}_{\nu} > \chi^{2}_{B}) \quad \textrm{for} \quad \nu = n-k. \tag{C.5} $$

If the model is indeed linear in α, it follows from Eqs. (C.3) and (C.4) that $\chi^{2}_{B} = \chi^{2}_{0}$, and so the frequentist and Bayesian p-values agree, a gratifying result.
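Equations (C.1)-(C.5) are easy to check numerically for a linear model. The Python sketch below is an illustration under stated assumptions: a hypothetical one-parameter model y = αx with unit errors, a uniform prior represented by a uniform grid in α, and a closed-form tail probability that is valid only for even ν (here ν = 10). The data values and the names `chi2_of` and `chi2_bayes` are invented for the example.

```python
import math

def chi2_sf_even(x, nu):
    """Pr(chi^2_nu > x) for even nu, from the closed-form finite sum."""
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this sketch handles even nu only")
    half = 0.5 * x
    term, total = 1.0, 1.0
    for j in range(1, nu // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def chi2_of(alpha, xs, ys, sigma):
    """chi^2 of the linear model y = alpha * x."""
    return sum(((y - alpha * x) / sigma) ** 2 for x, y in zip(xs, ys))

def chi2_bayes(xs, ys, sigma, grid, k):
    """chi^2_B = <chi^2>_u - k, Eqs. (C.1), (C.2) and (C.4), on a uniform grid."""
    c = [chi2_of(a, xs, ys, sigma) for a in grid]
    c_min = min(c)
    w = [math.exp(-0.5 * (v - c_min)) for v in c]   # posterior weights
    mean = sum(wi * ci for wi, ci in zip(w, c)) / sum(w)
    return mean - k

xs = [float(i) for i in range(1, 12)]               # n = 11 measurements
resid = [0.3, -0.5, 0.8, -1.1, 0.2, 0.9, -0.4, -0.7, 1.2, -0.3, 0.1]
ys = [2.0 * x + r for x, r in zip(xs, resid)]       # made-up data
sigma, k = 1.0, 1

alpha_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
chi2_0 = chi2_of(alpha_hat, xs, ys, sigma)          # frequentist minimum chi^2
grid = [1.5 + 0.0005 * i for i in range(2001)]      # covers the posterior peak
chi2_b = chi2_bayes(xs, ys, sigma, grid, k)
p_b = chi2_sf_even(chi2_b, len(xs) - k)
print("chi2_0 =", chi2_0, " chi2_B =", chi2_b, " p_B =", p_b)
```

For this linear model the two χ² values coincide to within the accuracy of the grid integration, in line with Eq. (C.3).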

If the model is non-linear in some parameters, then this GOF test should still be useful if the data is such that the fractional errors of the non-linear parameters are small, for then a linearized model could be used.

In most Bayesian analyses in astronomy, the imposed priors are uninformative and so this analysis holds. In the rare cases where an informative prior π is imposed, perhaps from a previous experiment, the discussion in L16, Sect. 4.1 suggests that the criterion $\chi^{2}_{B}$ with $\langle \chi^{2} \rangle_{\pi}$ replacing $\langle \chi^{2} \rangle_{u}$ will have closely similar characteristics.

C.1 Generalization

The above analysis assumes uncorrelated measurement errors. The inclusion of correlations is a straightforward application of the statistics of quadratic forms – see, e.g., Hamilton (1964, Chap. 4).

When correlations are included, the χ² summation is replaced by

$$ \psi^{2} = \vec{v}' \vec{M}^{-1} \vec{v}, \tag{C.6} $$

where M is the covariance matrix and v is the vector of residuals. The previous analysis assumes that the off-diagonal elements of M⁻¹ are zero.
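As an illustration of Eq. (C.6), the quadratic form can be evaluated without forming M⁻¹ explicitly. The Python sketch below is one possible approach, not taken from the paper: it factors M = LL′ by Cholesky decomposition and solves Ly = v by forward substitution, so that ψ² = y·y. The function name `psi2` and the test matrices are hypothetical.

```python
import math

def psi2(v, m):
    """psi^2 = v' M^{-1} v via Cholesky M = L L' and forward substitution."""
    n = len(v)
    low = [[0.0] * n for _ in range(n)]
    y = [0.0] * n
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
        # Row i of L is now complete, so solve row i of L y = v.
        y[i] = (v[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    return sum(t * t for t in y)

# Uncorrelated errors (sigma = 0.5): psi^2 reduces to the usual chi^2 sum.
chi2_diag = psi2([0.4, -0.3], [[0.25, 0.0], [0.0, 0.25]])
# Unit variances with correlation coefficient 0.5.
psi2_corr = psi2([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
print(chi2_diag, psi2_corr)
```

In the uncorrelated case the result equals (0.4/0.5)² + (0.3/0.5)², the ordinary χ² sum, while the correlated case shows how off-diagonal elements of M change the value for the same residuals.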

With linearity in the parameter vector α and normally-distributed measurement errors $\tilde{\vec{x}}-\vec{x}$, the sampling distribution of $\tilde{\vec{x}}$ is a k-dimensional multivariate normal distribution ∝ exp(−ψ²∕2), where k is the number of parameters. It follows that the likelihood is

$$ {\cal L}(\vec{\alpha}|D) \propto \exp(-\frac{1}{2} \psi^{2}). \tag{C.7} $$

Exploiting linearity in α and assuming a weak prior, we can write the posterior density as

$$ \Lambda(\vec{\alpha}|D) \propto \exp(-\frac{1}{2} \psi^{2}_{0}) \times \exp(-\frac{1}{2} \Delta \psi^{2}), \tag{C.8} $$

where $\psi^{2}_{0}$ is the minimum of ψ² at α₀ and Δψ² is the positive increment due to the displacement α − α₀.

Accordingly, in the case of a uniform prior, the posterior mean of ψ² is

$$ \langle \psi^{2} \rangle_{u} = \psi^{2}_{0} + \frac{\int \Delta \psi^{2} \exp(-\Delta \psi^{2}/2)\, dV_{\alpha}}{\int \exp(-\Delta \psi^{2}/2)\, dV_{\alpha}}. \tag{C.9} $$

This has the same form as Eq. (A.4) in L16, with Δψ² replacing Δχ². Therefore, since surfaces of constant Δψ² are also self-similar k-dimensional ellipsoids, we immediately have

$$ \langle \psi^{2} \rangle_{u} = \psi^{2}_{0} + k. \tag{C.10} $$
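For readers who want the intermediate step, the ratio of integrals in Eq. (C.9) can be evaluated directly. The sketch below is a standard argument and is not quoted from L16. It assumes a linear change of variables that maps the ellipsoids of constant Δψ² onto spheres of radius r, so that Δψ² = r² and the volume element scales as r^{k−1} dr; the constant Jacobian cancels in the ratio.

```latex
\begin{align*}
\frac{\int \Delta\psi^{2}\, e^{-\Delta\psi^{2}/2}\, dV_{\alpha}}
     {\int e^{-\Delta\psi^{2}/2}\, dV_{\alpha}}
 &= \frac{\int_{0}^{\infty} r^{k+1} e^{-r^{2}/2}\, dr}
         {\int_{0}^{\infty} r^{k-1} e^{-r^{2}/2}\, dr} \\
 &= \frac{2^{k/2}\, \Gamma(k/2 + 1)}{2^{k/2 - 1}\, \Gamma(k/2)}
  = 2 \cdot \frac{k}{2} = k ,
\end{align*}
```

where the second line uses $\int_{0}^{\infty} r^{m} e^{-r^{2}/2}\, dr = 2^{(m-1)/2}\, \Gamma((m+1)/2)$ and $\Gamma(z+1) = z\,\Gamma(z)$.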

Now $\psi^{2}_{0}$ is distributed as $\chi^{2}_{\nu}$ with ν = n − k degrees of freedom. Accordingly, the statistic

$$ \psi^{2}_{B} = \langle \psi^{2} \rangle_{u} - k, \tag{C.11} $$

is distributed as $\chi^{2}_{\nu}$ with ν = n − k degrees of freedom.

This is the required generalization of $\chi^{2}_{B}$ given by Eq. (C.4).

References

Abbott, B. P., et al. (LIGO Scientific & Virgo Collaborations) 2016, Phys. Rev. Lett., 116, 061102
Anscombe, F. J. 1963, J. Roy. Stat. Soc. Ser. B, 25, 81
Catanzarite, J. 2010, ArXiv e-prints [arXiv:1008.3416]
Eastman, J., Gaudi, B., & Agol, E. 2013, PASP, 125, 83
Fischer, D. A., Anglada-Escude, G., Arriagada, P., et al. 2016, PASP, 128, 6001
Gelman, A., Carlin, J. B., Stern, H. S., et al. 2013, Bayesian Data Analysis (3rd edn.), ed. Chapman and Hall (Boca Raton, FL: CRC Press)
Gentle, J. E. 2009, Computational Statistics (New York: Springer), 315
Hamilton, W. C. 1964, Statistics in Physical Science (New York: Roland Press)
Hearnshaw, J. B., Komonjinda, S., Skuljan, J., & Kilmartin, P. M. 2012, MNRAS, 427, 298
Jeffreys, H. 1939, Theory of Probability (Oxford: Clarendon Press)
Lucy, L. B. 2005, A&A, 439, 663
Lucy, L. B. 2014a, A&A, 563, A126 (L14a)
Lucy, L. B. 2014b, A&A, 565, A37 (L14b)
Lucy, L. B. 2016, A&A, 588, A19 (L16)
Martinez, G. D., Kosmo, K., Hees, A., Ahn, J., & Ghez, A. 2017, IAU Symp., 322, 239
Mason, B. D., Douglass, G. G., & Hartkopf, W. I. 1999, AJ, 117, 1023
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 2007, Numerical Recipes, 3rd edn. (Cambridge: Cambridge Univ. Press)
Sivia, D. S., & Skilling, J. 2006, Data Analysis, A Bayesian Tutorial, 2nd edn. (Oxford University Press)
Wright, J. T., & Howard, A. W. 2009, ApJS, 182, 205

Table 1

Coverage fractions from 10³ trials.

Fig. 1

Posterior densities for $\log {\cal M}_{1}$ and $\log {\cal M}_{2}$. The long vertical arrows indicate exact values. The short vertical arrows and lines indicate the posterior means and the 1- and 2-σ credibility intervals.

Fig. 2

Posterior density for log ϖ (′′). The long vertical arrow indicates the exact value. The short vertical arrow and lines indicate the posterior mean and the 1- and 2-σ credibility intervals.

