Pitfalls of statisticslimited Xray polarization analysis
^{1}
KTH Royal Institute of Technology, Department of Physics,
106 91
Stockholm,
Sweden
email: mikhalev@kth.se
^{2}
The Oskar Klein Centre for Cosmoparticle Physics, AlbaNova University Centre,
106 91
Stockholm,
Sweden
Received:
19
September
2017
Accepted:
23
March
2018
Context. One of the difficulties with performing polarization analysis is that the mean polarization fraction of subdivided data sets is larger than the polarization fraction for the integrated measurement. The resulting bias is one of the properties of the generating distribution discussed in this work. The limitations of Gaussian approximations in standard analysis based on Stokes parameters for estimating polarization parameters and their uncertainties are explored by comparing with a Bayesian analysis. The effect of uncertainty on the modulation factor is also shown, since it can have a large impact on the performance of gammaray burst polarimeters. Results are related to the minimum detectable polarization (MDP), a common figure of merit, making them easily applicable to any Xray polarimeter.
Aims. The aim of this work is to quantify the systematic errors induced on polarization parameters and their uncertainties when using Gaussian approximations and to show when such effects are nonnegligible.
Methods. The probability density function is used to deduce the properties of reconstructed polarization parameters. The reconstructed polarization parameters are used as sufficient statistics for finding a simple form of the likelihood. Bayes theorem is used to derive the posterior and to include nuisance parameters.
Results. The systematic errors originating from Gaussian approximations as a function of instrument sensitivity are quantified here. Different signaltobackground scenarios are considered making the analysis relevant for a large variety of observations. Additionally, the change of posterior shape and instrument performance MDP due to uncertainties on the polarimeteric response of the instrument is shown.
Key words: polarization / methods: data analysis / methods: statistical
© ESO 2018
1 Introduction
The first observations of astrophysical Xray polarization were made more than forty years ago (Novick et al. 1972; Weisskopf et al. 1976). The field has been reinvigorated in the past decade by a series of measurements by satellite, including INTEGRAL/SPI (Dean et al. 2008; Chauvin et al. 2013); INTEGRAL/IBIS (Forot et al. 2008; Moran et al. 2016), AstroSat/CZTI (Vadawale et al. 2018); and IKAROS/GAP (Yonetoku et al. 2011) as well as balloonborne instruments PoGOLite (Chauvin et al. 2016) and PoGO+ (Chauvin et al. 2017). Results from several ongoing missions are expected in the near future (POLAR, Produit et al. 2005; XCalibur, Beilicke et al. 2014), and a dedicated satellite mission is in development (IXPE, Weisskopf et al. 2016). Although some instruments, for example, IXPE, will be able to make polarization measurements of astrophysical sources with high precision, where the statistical analysis becomes relatively trivial, it is always desirable to observe weaker sources or to subdivide the data. Fine splitting of data may be necessary for understanding physical phenomena, for example, gammaray bursts (GRBs) and pulsars require temporal binning, nebulae require spatial binning, and spectral binning is interesting for all objects. It is therefore important to know when the number of photons is sufficient for making an accurate analysis using simple methods and when a more rigorous approach is necessary.
The parameters describing linear polarization are the polarization fraction and the polarization angle. In the frequentist approach, the major challenge in estimating the polarization fraction is that it is a positive definite quantity, meaning that a nonzero fraction is measured even for an unpolarized source, thus introducing a bias. Ways of correcting for this bias have been studied by Simmons & Stewart (1985) and more recently by Maier et al. (2014). A Bayesian approach was first introduced by Vaillancourt (2006) and extensively expanded upon by Quinn (2012), where the shapes of the resulting parameter distributions are described. These preceding works focus on optical measurements of polarization for which some formulae differ from the Xray counterpart (Kislat et al. 2015).
This work quantifies, as a function of measurement sensitivity, the error incurred when using the conventional Stokes parameter analysis. This is shown for both the polarization fraction and angle as well as for their uncertainties. Henceforth the expression “low statistics” is used to indicate poor data quality having a low , where S is the number of signal photons and N is the total number of photons (signal and background). Since a pointsource polarimeter working in the lowstatistics regime is likely to have a low signaltobackground ratio R, whereas a GRB polarimeter is expected to have a higher R (especially if the GRB is bright and short in duration), the study is conducted for different values of R where meaningful. Additionally, the effects of uncertainty on the modulation factor μ_{0}, defined as the polarization fraction measured for a 100% polarized beam, on the performance of the polarimeter are studied. The uncertainty is typically large for GRB polarimeters which are not optimized for localization of GRBs since μ_{0} varies with the photon incidence angle.
2 Parameter estimation
For a Compton scattering (Lei et al. 1997) or photoelectric polarimeters (Bellazzini & Muleri 2010) with a uniform response, the conditional distribution for a measured polarization angle ψ_{i} given a polarization fraction p_{0}, a polarization angle ψ_{0} and a modulation factor μ_{0} follows (1)
where ψ = ϕ − 90° for the Compton process and ψ = ϕ for the photoelectric effect, ϕ is the measured scattering angle, S is the number of signal photons and B is the number of background photons. Treating signal and background as latent variables is beyond the scope of this paper so it is assumed that the background is unpolarized and is known with much higher precision than the polarization parameters. The modulation factor μ_{0} is a property of the polarimeter and is derived from calibration. In what follows, the notation with subscript “0” means a physical parameter generating the data, in this case a set of measured scattering angles transformed to the polarization frame {ψ_{i} }, while subscript “r” denotes reconstructed parameters from this set. The reconstructed parameters are actually sufficient statistics for the data set allowing it to be represented by 2 scalars rather than a set of N angles.
The two most common ways of computing p_{r} and ψ_{r} are by performing a χ^{2}fit of Eq. (1) to a histogram of scattering angles or by computing the Stokes parameters. This work only considers the latter as it avoids complicating the analytical form of the likelihood due to binning effects.
2.1 Stokes parameters
Polarimeters operating in the radio, infrared, or optical domain measure photon intensities rather than individual photons as opposed to Xray polarimeters. Clarke et al. (1983) discuss how the Stokes parameter distributions vary depending on the instrumental technique used for measuring these intensities. Since Eq. (1) is not used in the lowenergy domain, not all results presented in that publication can be extrapolated to the Xray energy band. In particular, the standard deviations and the correlation coefficient of Stokes parameters are affected.
In the Xray energy band, the Stokes parameters are derived from individual photon events comprising two quantities (2)
For a total number of photons N = S + B the normalized Stokes parameters are written as (3)
The normalization is only proportional to the signal because the background is assumed to be unpolarized. Therefore the background contributes only to the variance of (Q_{r}, U_{r}).
The Central Limit Theorem (CLT) makes Q_{r} and U_{r} Gaussian distributed as long as N is sufficiently large. This is referred to here as the CLT approximation. Thus Q_{r} and U_{r} follow the bivariate Gaussian distribution (4)
Since the mean, the standard deviation, and the correlation coefficient are sufficient statistics for Gaussian distributed data, so are Q_{r} and U_{r} together with the second moments of the data derived in Appendix B. Although not written explicitly in the conditioning, Q_{0}, U_{0}, σ_{U}, σ_{Q} and ρ are functions of p_{0} and ψ_{0}.
2.2 Polar coordinates
Stokes parameters (Q, U) are in Cartesian coordinates but can be transformed to polar coordinates to allow the polarization parameters to be expressed in terms of (p, ψ). The sufficient statistics become (5) (6)
as per the derivations in Appendix A.
Now Eq. (4) can be transformed to polar coordinates by using the general coordinate transformation for a probability density function (p.d.f.) yielding the likelihood (7)
where det(J) is the determinant of the Jacobian given by (8)
Although the polarization parameters have a very similar form to that of the sufficient statistics (simply replacing the subscript “r ” by “0”) (9) (10)
p_{r} does not correspond to the most probable estimate of p_{0}. This occurs because (σ_{Q}, σ_{U}, ρ) depend on (p_{0}, ψ_{0}) and the fact that 0 ≤ p_{0} ≤ 1 (since a polarization greater than 100% is unphysical) but there is no upper limit on p_{r}, that is, 0 ≤ p_{r}. Therefore, using the sufficient statistic p_{r} as the estimator of the polarization fraction p_{0} incurs an error.
The error is nonneglible for the onedimensional likelihood of p_{0} (obtained after marginalizing over ψ_{0}) if the statistical significance of the measurement is low. It results in and ⟨p_{r}⟩≠p_{0}, where ⟨p_{r} ⟩ is the expected value of p_{r}. Hence p_{r}, is neither the maximum likelihood nor an unbiased estimator of p_{0}. This is the case even if μ_{r} = μ_{0} which occurs when there is no uncertainty on μ as is assumed throughout this section.
As the statistical precision of a measurement improves and . Conversely, due to symmetry, there is no bias on ψ so ⟨ψ_{r}⟩ = ψ_{0} and .
Fig. 1 The comparison of the CLT approximation to the full likelihood, as given by Eqs. (7) and (11), respectively. For p_{0} = 0.5, both pure signal and mixed scenarios are well described by the CLT approximation. For p_{0} = 1.0, the pure signal scenario deviates farther from the approximation than the scenario with background because the pure signal scenario has fewer photons. All scenarios use μ_{0} = 0.5 and MDP = 2. 

Open with DEXTER 
2.3 Central Limit Theorem approximation of the likelihood
The CLT approximation in B(Q_{r}, U_{r}Q_{0}, U_{0}) provides an always easily computable analytical form of the likelihood given by Eq. (7). However, if the polarization fraction is high and the number of photons is low, the likelihood is not well described by such an approximation. The full form of the likelihood is given by a product over Eq. (1). (11)
It has N factors and is therefore cumbersome to evaluate when the number of photons is large. This is unlike optical polarimetery where photon intensities of (Q, U) are directly measured instead of individual scattering angles.
Figure 1 shows the difference between the CLT approximation and L(p_{r} p_{0}, ψ_{0}, μ_{0}) as calculated by generating adata set {ψ_{i}} from Eq. (11), computing p_{r} using the Stokes parameters, repeating for many iterations and making a normalized histogram of p_{r}.
In what follows, the concept of minimum detectable polarization (MDP) at 99% confidence level is important. The MDP Weisskopf et al. (2010) is given by (12)
Its statistical meaning is that, given an unpolarized source (p_{0} = 0), the probability of measuring p_{r} > MDP is 1%. This quantity is a standard figure of merit for polarimeter performance and can easily be calculated for any measurement. The number of signal photons in Fig. 1 is chosen such that MDP = 2, since this is the highest (“worst”) MDP considered in later sections. As seen from Fig. 1, the CLT approximation becomes more accurate as the number of photons increases and the polarization fraction p_{0} decreases. In particular, only measurements with high signaltobackground ratio and high polarization fraction require the full likelihood given by Eq. (11). Although the error incurred using the CLT approximation is negligible in most cases, it becomes larger when using Eq. (7) to derive the posterior, asis discussed below. Figure 1 is for qualitative purposes only; the equivalent figure for ψ_{r} has been omitted since it leads to the same conclusions.
2.4 Magnitude and importance of bias
Bias is a frequentist concept which relies on fixing (p_{0}, ψ_{0}) and investigating ⟨p_{r}⟩. This approach provides an intuitive understanding for how an unpolarized source can produce a polarized signal (p_{r} > 0). Several previous measurements use the sufficient statistic p_{r} as an estimate of the polarization fraction p_{0} (e.g., Weisskopf et al. 1978; Slowikowska et al. 2009) and it is therefore necessary to understand when the bias is negligible and when a more sophisticated approach is necessary.
In the frequentist approach, the p.d.f. B(Q_{r}, U_{r}Q_{0}, U_{0}) is not a function of ψ_{0} (it is a fixed parameter) but of ψ_{r}. It is assumed, without loss of generality due to the angular symmetry of the problem, that ψ_{0} = 0 so that ρ = 0. This can be thought of as rotating the angular coordinate system which does not have any special reference point. The definition of the “zero angle” of such a system has no influence on the polarization fraction. The p.d.f. now simplifies to the product of two normal distributions (13)
It can now be transformed to polar coordinates similarly to Eq. (7) yielding (14)
The relative mean bias β is now given by (15)
where Δψ = ψ_{r} − ψ_{0} = ψ_{r}.
To understand which parameters have a significant impact on β an approximate analytical expression can be derived by introducing (16)
It is now possible to write Eq. (14) as (17)
Integrating over Δψ results in (18)
which is the Rice distribution where I_{0} is the modified Bessel function of zeroth order. In the limit of high statistics, the relative mean bias is given by (19)
as shown in Appendix C.
Since p_{0} is not known a priori, Eq. (19) needs to be expressed as a function of p_{r} by recursion. After some simplification the result is (20)
where x ≡MDP∕4.29p_{r}. This shows that MDP∕p_{r} is a good choice of independent variable. Equation (20) is shown in Fig. 2 where it is seen that β > 0 and increases monotonously, i.e., on average the reconstructed polarization p_{r} will always be greater than the fixed polarization p_{0}. Exact numerical integration of Eq. (15) is also provided for different signaltobackground ratios R yielding similar results to the approximation in Eq. (20). Here R = 0 is the limit of large N, low S and yet finite MDP. To avoid inaccurately computing β for small S (a problem under the CLT approximation) Eq. (11) is used for calculating L(p_{r}p_{0}, ψ_{0}, μ_{0}) and ultimately its mean when S < 200.
To understand when the bias is significant with respect to the statistical uncertainty on p_{r}, the bias fraction is shown in Fig. 3. Here is derived (see Appendix D) using standard error propagation yielding (21)
It becomes clear that this bias cannot be ignored in the low statistics regime. For a measured polarization below MDP, i.e., when MDP∕p_{r} > 1, the bias is more than 18% of the statistical uncertainty. Hence, one should not use Eq. (5) for estimating p_{0} in this regime.
A conclusion of this analysis is that significant errors are incurred when dividing the data into several data sets as is done for example by Dean et al. (2008) in order to estimate the statistical uncertainty. The smaller the data set, the bigger the bias, and thus, on average, the result of an integrated measurement will be lower than the mean of its constituent data sets. This is also relevant when fitting models to polarization fraction subdivided with respect to energy or time. Binning the data will result in a higher reconstructed polarization fraction, thus biasing the fit and therefore physical conclusions should not be drawn from p_{r}, as is done for example in Vadawale et al. (2018) where the evolution of polarization parameters throughout the Crab pulsar pulse phase is investigated.
Fig. 2 Relative mean bias β as given by Eq. (15) (solid colored lines) and the approximation of Eq. (20) (black dashed line). Here p_{0}= μ_{0} = 1 has been used to show the maximum difference between the different signaltobackground ratios R. A loglog plot is shown in the inset. 

Open with DEXTER 
Fig. 3 Ratio of the absolute bias to the statistical uncertainty . The β in the approximation (black dashed line) is given by Eq. (20). Here p_{0} = μ_{0} = 1 has been used to show the maxi mum difference between the different signaltobackground ratios R. 

Open with DEXTER 
3 Maximum a posteriori estimate
The previous section showed the bias arising when using Eq. (5) for estimating p_{0} independently of the polarization angle. It is not obvious if such a bias exists when using (p_{r}, ψ_{r}) for the joint point estimate of (p_{0}, ψ_{0}). However, in this case it does not make sense to find a ⟨p_{r}, ψ_{r}⟩ for a fixed (p_{0}, ψ_{0}) so here the word “bias” is interpreted as the difference between the Maximum A Posteriori (MAP) estimate, p_{MAP}, and p_{r}. Such an analysis requires a Bayesian approach.
3.1 Central Limit Theorem approximation of the posterior
The posterior P(p_{0}, Δψ_{0}p_{r}) is derived by applying the Bayes theorem to the likelihood given by Eq. (7) (22)
where is the normalization factor, P(p_{0}, ψ_{0}) is the prior and Δ ψ_{0} = ψ_{0} − ψ_{r}. Since in a Bayesian approach the parameters (p_{0}, ψ_{0}) vary and the data (p_{r}, ψ_{r}) are fixed, L(p_{r}, 0p_{0}, Δψ_{0}) is a function of ψ_{0}. It follows that σ_{Q}≠σ_{U} and ρ≠ 0. This differs from the typical assumption of σ_{Q} = σ_{U} and no correlation between Q and U as done in other works (Simmons & Stewart 1985; Vaillancourt 2006; Quinn 2012; Maier et al. 2014).
The Jeffreys prior has the desirable property of being invariant under reparametrization (Jeffreys 1939). In this case it is a uniform prior in the Cartesian coordinates (Q_{0}, U_{0}). It will result in a preference of high polarization fractions after transforming to polar coordinates, as discussed in Quinn (2012). This is unphysical because it is more difficult to make a highly polarized photon beam in nature since any disorder in the emission region will lower the polarization fraction. A more realistic approach is to instead take a uniform prior 0 ≤ p_{0} ≤ 1 in polar coordinates. The prior for ψ_{0} is also uniform due to symmetry.
Figure 4 shows the posterior as given by Eq. (22) for a lowstatistics measurement MDP∕p_{r} = 1. The asymmetry is apparent as the posterior broadens for low values of p_{0}. The effect of the prior is illustrated by including the unbound prior scenario where p_{0} ≥ 0.
Since the posterior is derived using a likelihood which utilizes the CLT approximation, the posterior may, for certain combinations of parameters, be inaccurate. To check the magnitude of the effect, pairs of (p_{0}, ψ_{0}) are sampled from the prior and then used to generate a data set. Data sets falling within a narrow window of a chosen p_{r} and ψ_{r} (p_{r} ± 0.002 and ψ_{r} ± 0.36°) are selected andthe posterior for each data set is found using Eq. (11) as the likelihood. Finally, the intervals containing 90% of the posteriors marginalized over the polarization angle (for clarity) are shown in Fig. 5 along with the CLT approximation. The situation is similar for the posterior of the polarization angle marginalized over the polarization fraction.
The most important feature of Fig. 5 is that the posterior is not the same for fixed (p_{r}, ψ_{r}) when the number of signal photons is low. Hence (p_{r}, ψ_{r}) are not always sufficient statistics and the full data set {ψ_{i}} is required for deriving the posterior. However, the posterior converges quickly towards the CLT approximation when R < 0.5 since any meaningful measurement with such a low R requires a large number of photons. An effect not shown in the figure is that the difference between the full posterior and the CLT approximation increases as p_{r} and μ_{r} increase requiring more photons for good convergence. However, few Xray polarimeters have μ_{0} higher than 0.5 (Krawczynski et al. 2011; Kaaret 2014) and synchrotron emission, a proposed mechanism for many astrophysical sources, is not expected to produce a polarization fraction above ~ 0.6 (Lyutikov et al. 2003).
In conclusion, the full likelihood in Eq. (11) rather than the CLT approximation in Eq. (7) is required for sources where the signaltobackground ratio is high but the total number of photons is low (e.g., shortduration GRBs). A guideline is to use the CLT approximation only for data sets where S >10^{3}. As mentioned before, polarimeters foreseen to be active in the near future (Astrosat/CZTI, POLAR, IXPE, XCalibur) have μ_{0} < 0.5 so this guideline translates to the CLT approximation being generally valid for MDP < 0.27.
Fig. 4 Highest posterior density credibility regions for the posterior given by Eq. (22) for MDP ∕p_{r} = 1, R = 0 and μ_{0} = 1. The contours correspond to 1σ, 2σ and 3σ probability content. The posterior is truncated by the prior at p_{0} = 1. The unbound prior scenario (black line) corresponds to the unnormalized prior p_{0} ≥ 0. 

Open with DEXTER 
Fig. 5 Intervals containing 90% of the posteriors for (p_{r} = 0.5, ψ_{r} = 22.5°) and the CLT approximation. As the number of signal photons increases, the posteriors converge towards the approximation. Here μ_{0} = μ_{r} = 0.5 and MDP∕p_{r} = 1. 

Open with DEXTER 
Relative MAP bias β_{MAP} for MDP ∕p_{r} = 2, different signaltobackground ratios R, sufficient statistic p_{r} and modulation factor μ_{0}.
3.2 Relative MAP bias
The MAP estimate is at p_{MAP} > p_{r} and ψ_{0} = ψ_{r}. The inequality occurs because σ_{Q} and σ_{U} depend on p_{0} but only measurements with high R and are significantly affected. In all other cases and p_{MAP} ≈ p_{r}.
The relative MAP bias is defined as (23)
Table 1 shows β_{MAP} for the extreme case MDP∕p_{r} = 2 (a monotonously decreasing function) for different polarization parameters. Only scenarios with a sufficient number (S > 884 based on the results shown in Fig. 5) of signal photons are considered. The results show that β_{MAP} increases with p_{r} and μ_{0} but its value is negligible for all cases where the CLT approximation is valid. Therefore, when (p_{r}, ψ_{r}) are sufficient statistics of the data, they are a good estimate for the MAP.
The broadening of the joint posterior at low values of p_{0}, as shown in Fig. 4, results inp_{MAP} being a poor estimate for p_{0} when marginalizing over ψ_{0}. More generally, this occurs for any twodimensional distribution when marginalizing over the nuisance parameter if the estimated parameter has an asymmetric distribution.
Fig. 6 Skewness of the marginalized posterior for the polarization fraction. The unbound prior scenario (black line) uses p_{0} ≥ 0 and p_{r} = 1 (here the result is independent of p_{r}). A positive skewness implies an asymmetric distribution with a longer tail on the right, while a negative skewness results in a tail on the left. In all cases R = 0 is assumed. 

Open with DEXTER 
4 Uncertainties on polarization parameters
There are two reasons why it is not always appropriate to use naive Stokes estimates for uncertainties of polarization parameters: nonGaussianity and the prior. Equation (21) gives the correct uncertainty on the polarization fraction when statistics are high and the reconstructed fraction p_{r} is well below 1. If these conditions are not fulfilled, the marginalized posterior becomes highly asymmetric either due to low statistics or because a part of the likelihood is truncated by the prior at p_{0} = 1 and is thus poorly approximated by a Gaussian.
The asymmetry of the marginalized posterior for the polarization fraction is shown in terms of the skewness in Fig. 6. The unbound prior scenario employs the unnormalized uniform prior p_{0} ≥ 0 and p_{r} = 1 (here all values of p_{r} produce the same result). It shows that when statistics are low, the marginalized posterior gets a longer tail on the right (positive skewness). This is because the peak moves left, which is equivalent to measuring a higher p_{r} for a fixed p_{0}, that is, the same interpretation as for the relative mean bias shown in Fig. 2.
Although the marginalized posterior for the polarization angle is always symmetric, it deviates from Gaussianity when statistics are low, for example, approaching the uniform distribution since all angles are equally likely. It is also affected by the prior when p_{r} is sufficiently high, resulting in a distribution with longer tails because, as shown in Fig. 4, the prior only truncates the part of the likelihood contributing to the central part of the marginalized distribution. Therefore, the uncertainty given by (24)
(derived in Appendix D) is not always valid.
In a Bayesian approach, the posterior is used instead of a parameter estimate and its uncertainty since the posterior contains the full information about the parameter. The posterior cannot easily be described in text so a simplified description is necessary such as its peak and the region of Highest Posterior Density (HPD) containing the probability content corresponding to one Gaussian standard deviation.
It is possible to quantify when the HPD region must be derived from the posterior and when Eqs. (21) and (24) are good approximations by considering the ratios and as functions of MDP∕p_{r}, which are shown in Figs. 7 and 8, respectively. In both figures, the unbound prior scenario uses the unnormalized uniform prior p_{0} ≥ 0 and p_{r} = 1. Only scenarios with R = 0 are considered, meaning that the CLT approximation is valid in the entire range of MDP∕p_{r} for any p_{r}. Increasing R does not change the results shown in this section but may invalidate the CLT approximation. Such cases cannot be represented because p_{r} is not a sufficient statistic. However, since the posterior shape does not change drastically (as shown in Fig. 5), the following discussion is a good qualitative description of its behavior for any R.
For ease of comparison, is defined as half of the region containing 1σ Gaussian probability. Additionally, the Gaussian interval is limited to account for the prior, for example, p_{0} = 0.8 ± 0.3 is the interval [0.5, 1.0] which gives an uncertainty of , and therefore it does not contain 68.3% (1σ) probability content.
The unbound prior scenario in Fig. 7 shows that for MDP∕p_{r} ~ 1 the marginalized posterior has longer tails than a Gaussian. Additionally, the prior truncates the posterior for high p_{r} or for low statistics. This can result in either a higher or alower depending on where exactly the distribution is truncated. Unless p_{r} is high, is a good approximation for the magnitude of the uncertainty at MDP∕p_{r} ~ 1 but it is important to remember that the posterior is asymmetric for such low statistics.
For the polarization angle, Fig. 8 shows that . The situation is simpler than for p_{0} because the angle has a symmetric posterior. The unbound prior shows the deviation from a Gaussian manifesting in longer tails. Figure 4 shows that when adding the prior 0 ≤ p_{0} ≤ 1, the part of the likelihood which extends past p_{0} = 1 is truncated. This part contributes to the density at the center of the marginalized posterior. Removing it further extends the tails. It follows that Eq. (24) underestimates the uncertainty on the polarization angle by relative 10–20% at MDP∕p_{r} ~ 1.
This analysis shows that in the limit of low statistics, MDP∕p_{r} ~ 1, the uncertainty on the polarization angle is not welldescribed by Eq. (24) and a Bayesian treatment is necessary. For high p_{r}, such a treatment is required even for highstatistics measurements, MDP∕p_{r} ~ 0.5, because of the asymmetry in the uncertainty on the polarization fraction.
To illustrate the effects described above, two examples are presented in Table 2. The first is a recent measurement of the Crab nebula by PoGO+ (Chauvin et al. 2017) and the second is a hypothetical measurement of a high reconstructed polarization fraction highlighting the importance of the prior 0 ≤ p_{0} ≤ 1. The largest differences are in the polarization fraction and the uncertainty on the polarization angle.
Fig. 7 Ratio of the uncertainty derived from the HPD region of the marginalized posterior and the Gaussian uncertainty given by Eq. (21) for the polarization fraction. Since the posterior is asymmetric, an effective is defined as half of the region containing 1σ Gaussianprobability. The unbound prior scenario corresponds to the unnormalized prior 0 ≤ p_{0} and p_{r} = 1 (here the result is independent of p_{r}). In all cases R = 0 is assumed. 

Open with DEXTER 
Fig. 8 Ratio of the uncertainty derived from the HPD region of the marginalized posterior and the Gaussian uncertainty given by Eq. (24) for the polarization angle. The unbound prior scenario corresponds to the unnormalized prior 0 ≤ p_{0} and p_{r} = 1 (here the result is independent of p_{r}). In all cases R = 0 is assumed. 

Open with DEXTER 
Examples of the difference between a Bayesian approach and a Gaussian approximation.
5 Uncertainties on the modulation factor
The uncertainty σ_{μ} on the modulation factor μ_{0} can be minimized for most polarimeters designed for measuring point sources by increasing the statistics during calibration tests. However, for polarimeters measuring GRBs it is often impossible to make σ_{μ} arbitrarily small. The problem is that μ varies depending on the location of the GRB, typically having the highest values for onaxis GRBs but significantly lower values for GRBs located at a large angular separation from the detector axis. Determining the location of the GRB by using a polarimeter involves large uncertainties since it is often not optimized for the task. The uncertainty on the location propagates to a nonnegligible σ_{μ}. Additionally, the primary spectrum of GRBs may not be reconstructed with sufficient precision by the polarimeter, introducing further uncertainties in the simulation required for deducing the μ_{0} for a particular GRB. If the GRB is simultaneously observed by a dedicated GRB monitor, the uncertainty on its location and spectrum will be smaller, but no GRB monitor has complete sky coverage or 100% dutycycle, so it is inevitable that some GRBs will only be seen by the polarimeter.
An example of a GRB polarimeter is POLAR (Produit et al. 2005). For a typical bright GRB, POLAR is expected to have a σ_{μ} ∕μ of between 5 and 15% assuming that the burst occurs onaxis (SuarezGarcia et al. 2010). The simplest way to account for this additional uncertainty is to propagate σ_{μ} through (25)
where M is the measured modulation. The total symmetric uncertainty is then given by (26)
To check for which parameters the symmetry is a good approximation, an additional prior can be added to the posterior Eq. (22). For simplicity, this prior is assumed to be Gaussian but it can vary depending on localization sensitivity. The nuisance parameter μ_{0} can then be marginalized over, yielding (27)
where the integral is taken between 0 and 1 because μ_{0} > 1 is unphysical. As an example, the posterior (marginalized over the polarization angle) for a signalonly measurement at MDP = 10% (CLT approximation is valid because S is large), p_{r} = 0.4 and μ =0.4 is shown in Fig. 9 for different σ_{μ}∕μ. Due to the high statistics, MDP∕p_{r} = 0.25, the distribution is symmetric for low σ_{μ}∕μ, however, the tail on the right grows rapidly as σ_{μ}∕μ increases. The behavior is governed by the reciprocal Gaussian distribution which the posterior approaches in the limit of high photon statistics (28)
shown as an approximation in Fig. 9. If not for the prior 0 ≤ p_{0} ≤ 1, the moments of this distribution would be undefined.
Figure 9 demonstrates that for σ_{μ}∕μ > 10% the shape of the posterior changes significantly and Eq. (27) is required. In the extreme case of σ_{μ} ∕μ = 20%, the halfwidth of the HPD region is 0.092, whereas Eq. (26) yields 0.086, implying not only that the uncertainty is asymmetric but also that it is significantly larger.
These results can be related to the instrument perfomance by studying the effect σ_{μ} ∕μ has on the MDP. A frequentist approach must be followed since MDP is a frequentist concept. MDP is derived by finding the 99% upper limit for the Rayleigh distribution – a special case of the Rice distribution where p_{0} = 0. However, since there is an uncertainty on μ, the Rayleigh distribution must be multiplied by the likelihood for μ (assumed tobe a Gaussian for simplicity) and integrated to yield (29)
Finally, f(p_{r}p_{0} = 0, μ_{0}) can be integrated to find the 99% upper limit so that . Figure ?? shows the relative increase in the MDP, defined as the ratio MDP_{σ}∕MDP where MDP is given by Eq. (12). Although the effect is small for low σ_{μ} ∕μ (1% at σ_{μ} ∕μ = 5%), it becomes significant at larger σ_{μ}∕μ (14% at σ_{μ}∕μ = 15%) and deteriorates the instrument performance for extreme values (80% at σ_{μ} ∕μ = 25%). The result is independent of the intial MDP.
Fig. 9 Polarization fraction posterior density for MDP∕p_{r} = 0.25, p_{r} = 0.4 and μ_{r} = 0.4 for different σ_{μ}∕μ. As σ_{μ} ∕μ increases, the distribution becomes wider and more asymmetric. The approximation is the reciprocal Gaussian distribution in Eq. (28). 

Open with DEXTER 
Fig. 10 Relative increase in MDP, defined as the ratio MDP_{σ}∕MDP where MDP_{σ} is the solution to as a function of the relative uncertainty on μ. The results are independent of the initial MDP. 

Open with DEXTER 
6 Conclusions
The results presented here provide a means of quantifying the errors incurred when using the simple estimators for polarization parameters as well as for their uncertainties. These errors are related to the wellestablished figure of merit MDP and the reconstructed polarization fraction, making the results easily applicable to any Xray polarimeter. Unlike someprevious works, this analysis does not ignore the correlation between the Stokes parameters Q and U.
Additionally, the extent to which the reconstructed fraction p_{r} and polarization angle ψ_{r} can be used as sufficient statistics is explored. In certain situations, such as for high polarization fraction and high signaltobackground ratio, Q and U are not Gaussian distributed and their likelihood is nonGaussian, resulting in a significantly different posterior for the polarization parameters when the number of photons is low. In such cases, the Stokes parameters cannot be used at all because (p_{r}, ψ_{r}) are not sufficient statistics.
The fact that the reconstructed polarization fraction is biased towards higher values implies that binning a highstatistics data set into smaller subsets to estimate the evolution of the polarization fraction will yield incorrect results when using the simple Gaussian estimator p_{r}. This work provides a way to decide how coarsely the data must be binned in order to justify the use of Gaussian estimators.
The uncertainties on the estimated parameters are as important as the parameter estimates themselves. When statistics are high (MDP∕p_{r} < 0.5) the errors are small and can usually be neglected (justifying Gaussian assumptions made when using simple estimators) unless the reconstructed polarization fraction is high, for example, > 0.8. However, in the statisticslimited regime (MDP∕p_{r} > 1) the systematic error made from using such simple estimators is nonnegligible compared to the statistical uncertainty. Additionally, it is shown how quickly the posterior becomes asymmetric as the statistical power decreases. This effect is strongly dependent on the reconstructed polarization fraction because of the prior 0 ≤ p_{0} ≤ 1.
Lastly, the effect of uncertainties on the modulation factor μ is studied. It is shown to be important once the relative uncertainty exceeds 10% and to dominate the performance of an instrument when it is above 20%. This is relevant for the optimization of future GRB polarimeters, since they tend to have large uncertainties on μ due to the difficulty of localizing bursts and measuring the primary GRB spectrum. It also shows that simple Gaussian estimators cannot be used in the highphotoncounts regime for GRB polarimeters when localization uncertainties are high.
Acknowledgements
This research was supported by the Swedish National Space Board. M. Kiss, M. Pearce, F. Xie and the Referee are thanked for providing constructive feedback on the manuscript.
Appendix A Stokes parameters
The Stokes parameters can be constructed from (A.1)
To find an expression for p_{0}, the means ⟨q⟩ and ⟨u⟩ must be computed. (A.2) (A.3)
where f(ψ) is given by Eq. (1). Similarly (A.4)
The polarization fraction p_{0} is then given by (A.5)
where Q_{0} and U_{0} are the normalized Stokes parameters defined as (A.6)
The sufficient statistics for data generated from (Q_{0}, U_{0}) are (Q_{r}, U_{r}) defined as (A.7)
where N = B + S. In polar coordinates this becomes (A.8)
Appendix B Uncertainties on Stokes parameters
The uncertainties on Q and U as well as theircovariance can be derived by computing their second moments. (B.1)
Combining Eqs. (A.2) and (B.1) yields the standard deviation (B.4)
Here σ_{q} describes the dispersion of q but it is σ_{Q} which is of interest. It is given by (B.5)
where the division by S^{2} comes from the definition of normalized Stokes parameters. (B.6)
It is finally possible to compute the covariance and the correlation coefficient ρ (B.8) (B.9)
Appendix C Relative mean bias
An approximate expression for the relative mean bias can be derived from the Rice distribution (C.1)
where . The mean ⟨p_{r}⟩ is given by (C.2)
where I_{0} and I_{1} are modified Bessel functions of the zeroth and first order, respectively. For highstatistics measurements (where p_{0} ≫ 2σ) the Bessel functions can be approximated by expanding them to second order. (C.3) (C.4)
So in the limit of high statistics, the relative mean bias β is given by (C.6)
By using recursion, β can be expressed as a function of p_{r} instead of p_{0} which is not known a priori. This simplifies to (C.7)
where x = MDP∕4.29p_{r} and C_{n} is the nth Catalan number.
Appendix D Gaussian uncertainties on polarization parameters
The uncertainty on p_{r} can be derived by using the standard uncertainty propagation formula (D.1)
References
 Beilicke, M. Kislat, F., Zajczyk, A., et al., 2014, JAI, 3, 1440008 [NASA ADS] [Google Scholar]
 Bellazzini, R., & Muleri, F., 2010, NIM A, 623, 766 [NASA ADS] [CrossRef] [Google Scholar]
 Chauvin, M., Roques, J. P., Clark, D. J., et al., 2013, ApJ, 769, 137 [NASA ADS] [CrossRef] [Google Scholar]
 Chauvin, M., Florén, H.G., Jackson, M., et al., 2016, MNRAS, 456, L86 [NASA ADS] [CrossRef] [Google Scholar]
 Chauvin, M., Florén, H.G., Friis, M., et al., 2017, Sci. Rep., 7, 7816 [NASA ADS] [CrossRef] [Google Scholar]
 Clarke, D., Stewart, B. G., Schwarz, H. E., et al., 1983, A&A, 126, 260 [NASA ADS] [Google Scholar]
 Dean, A. J., Clark, D. J., Stephen, J. B., et al., 2008, Science, 321, 1183 [NASA ADS] [CrossRef] [PubMed] [Google Scholar]
 Forot, M., Laurent, P., Grenier, I. A., et al., 2008, ApJ, 688, L29 [NASA ADS] [CrossRef] [Google Scholar]
 Jeffreys, H., 1939, Theory of Probability (Oxford: Oxford University Press) [Google Scholar]
 Kaaret, P., 2014, ArXiv eprint [arXiv:1408.5899] [Google Scholar]
 Kislat, F., Clark, B., Beilicke, M., et al., 2015, Astropart. Phys., 68, 45 [NASA ADS] [CrossRef] [Google Scholar]
 Krawczynski, H., Garson, A., Guo, Q., et al., 2011, Astropart. Phys., 34, 550–67 [NASA ADS] [CrossRef] [Google Scholar]
 Lei, F., Dean, A. J., Hills, G. L., 1997, Space Sci. Rev., 82, 309–88 [NASA ADS] [CrossRef] [Google Scholar]
 Lyutikov, M., Pariev, V. I., & Blandford, R. D., 2003, ApJ, 597, 998 [NASA ADS] [CrossRef] [Google Scholar]
 Maier, D., Tenzer, C., & Santangelo, A., 2014, PASP, 126, 459 [NASA ADS] [CrossRef] [Google Scholar]
 Moran, P., Kyne, G., Gouiffès C., et al., 2016, MNRAS, 456, 2974 [NASA ADS] [CrossRef] [Google Scholar]
 Novick, R., Weisskopf, M. C., Berthelsdorf, R., et al., 1972, ApJ, 174, L1 [NASA ADS] [CrossRef] [Google Scholar]
 Produit, N., Barao, F., Deluit, S., et al., 2005, NIM A, 550, 616 [NASA ADS] [CrossRef] [Google Scholar]
 Quinn, J. L., 2012, A&A, 538, A65 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Simmons, J. F. L., & Stewart, B. G., 1985, A&A, 142, 100 [NASA ADS] [Google Scholar]
 Słowikowska, A., Kanbac, G., Kramer, M., et al., 2009, MNRAS, 397, 103 [NASA ADS] [CrossRef] [Google Scholar]
 SuarezGarcia, E., Haas D., Hajdas W., et al., 2010, NIM A, 624, 624 [NASA ADS] [CrossRef] [Google Scholar]
 Vadawale S. V., Chattopadhyay, T., Mithun N. P. S., et al., 2018, Nat. Astron., 2, 50 [NASA ADS] [CrossRef] [Google Scholar]
 Vaillancourt, J. E., 2006, PASP, 118, 1340 [NASA ADS] [CrossRef] [Google Scholar]
 Weisskopf, M. C., Cohen, G. G., Kestenbaum, H. L., Long, K. S., et al., 1976, ApJ, 208, L125 [NASA ADS] [CrossRef] [Google Scholar]
 Weisskopf, M. C., Silver, E. H., Kestenbaum, H. L., et al., 1978, ApJ, 220, L117 [NASA ADS] [CrossRef] [Google Scholar]
 Weisskopf, M. C., Elsner R. F., & O’Dell S. L., 2010, Proc. SPIE, 7732, 77320E [CrossRef] [Google Scholar]
 Weisskopf, M. C, Ramsey, B., O’Dell, S., et al., 2016, Proc. SPIE, 9905, 990517 [CrossRef] [Google Scholar]
 Yonetoku, D., Murakami, T., Gunji, S., et al., 2011, ApJ, 743, L30 [NASA ADS] [CrossRef] [Google Scholar]
All Tables
Relative MAP bias β_{MAP} for MDP ∕p_{r} = 2, different signaltobackground ratios R, sufficient statistic p_{r} and modulation factor μ_{0}.
Examples of the difference between a Bayesian approach and a Gaussian approximation.
All Figures
Fig. 1 The comparison of the CLT approximation to the full likelihood, as given by Eqs. (7) and (11), respectively. For p_{0} = 0.5, both pure signal and mixed scenarios are well described by the CLT approximation. For p_{0} = 1.0, the pure signal scenario deviates farther from the approximation than the scenario with background because the pure signal scenario has fewer photons. All scenarios use μ_{0} = 0.5 and MDP = 2. 

Open with DEXTER  
In the text 
Fig. 2 Relative mean bias β as given by Eq. (15) (solid colored lines) and the approximation of Eq. (20) (black dashed line). Here p_{0}= μ_{0} = 1 has been used to show the maximum difference between the different signaltobackground ratios R. A loglog plot is shown in the inset. 

Open with DEXTER  
In the text 
Fig. 3 Ratio of the absolute bias to the statistical uncertainty . The β in the approximation (black dashed line) is given by Eq. (20). Here p_{0} = μ_{0} = 1 has been used to show the maxi mum difference between the different signaltobackground ratios R. 

Open with DEXTER  
In the text 
Fig. 4 Highest posterior density credibility regions for the posterior given by Eq. (22) for MDP ∕p_{r} = 1, R = 0 and μ_{0} = 1. The contours correspond to 1σ, 2σ and 3σ probability content. The posterior is truncated by the prior at p_{0} = 1. The unbound prior scenario (black line) corresponds to the unnormalized prior p_{0} ≥ 0. 

Open with DEXTER  
In the text 
Fig. 5 Intervals containing 90% of the posteriors for (p_{r} = 0.5, ψ_{r} = 22.5°) and the CLT approximation. As the number of signal photons increases, the posteriors converge towards the approximation. Here μ_{0} = μ_{r} = 0.5 and MDP∕p_{r} = 1. 

Open with DEXTER  
In the text 
Fig. 6 Skewness of the marginalized posterior for the polarization fraction. The unbound prior scenario (black line) uses p_{0} ≥ 0 and p_{r} = 1 (here the result is independent of p_{r}). A positive skewness implies an asymmetric distribution with a longer tail on the right, while a negative skewness results in a tail on the left. In all cases R = 0 is assumed. 

Open with DEXTER  
In the text 
Fig. 7 Ratio of the uncertainty derived from the HPD region of the marginalized posterior and the Gaussian uncertainty given by Eq. (21) for the polarization fraction. Since the posterior is asymmetric, an effective is defined as half of the region containing 1σ Gaussianprobability. The unbound prior scenario corresponds to the unnormalized prior 0 ≤ p_{0} and p_{r} = 1 (here the result is independent of p_{r}). In all cases R = 0 is assumed. 

Open with DEXTER  
In the text 
Fig. 8 Ratio of the uncertainty derived from the HPD region of the marginalized posterior and the Gaussian uncertainty given by Eq. (24) for the polarization angle. The unbound prior scenario corresponds to the unnormalized prior 0 ≤ p_{0} and p_{r} = 1 (here the result is independent of p_{r}). In all cases R = 0 is assumed. 

Open with DEXTER  
In the text 
Fig. 9 Polarization fraction posterior density for MDP∕p_{r} = 0.25, p_{r} = 0.4 and μ_{r} = 0.4 for different σ_{μ}∕μ. As σ_{μ} ∕μ increases, the distribution becomes wider and more asymmetric. The approximation is the reciprocal Gaussian distribution in Eq. (28). 

Open with DEXTER  
In the text 
Fig. 10 Relative increase in MDP, defined as the ratio MDP_{σ}∕MDP where MDP_{σ} is the solution to as a function of the relative uncertainty on μ. The results are independent of the initial MDP. 

Open with DEXTER  
In the text 