EDP Sciences
Free Access
Issue
A&A
Volume 615, July 2018
Article Number A54
Number of page(s) 10
Section Astronomical instrumentation
DOI https://doi.org/10.1051/0004-6361/201731971
Published online 11 July 2018

© ESO 2018

1 Introduction

The first observations of astrophysical X-ray polarization were made more than forty years ago (Novick et al. 1972; Weisskopf et al. 1976). The field has been reinvigorated in the past decade by a series of measurements by satellite, including INTEGRAL/SPI (Dean et al. 2008; Chauvin et al. 2013); INTEGRAL/IBIS (Forot et al. 2008; Moran et al. 2016), AstroSat/CZTI (Vadawale et al. 2018); and IKAROS/GAP (Yonetoku et al. 2011) as well as balloon-borne instruments PoGOLite (Chauvin et al. 2016) and PoGO+ (Chauvin et al. 2017). Results from several on-going missions are expected in the near future (POLAR, Produit et al. 2005; X-Calibur, Beilicke et al. 2014), and a dedicated satellite mission is in development (IXPE, Weisskopf et al. 2016). Although some instruments, for example, IXPE, will be able to make polarization measurements of astrophysical sources with high precision, where the statistical analysis becomes relatively trivial, it is always desirable to observe weaker sources or to sub-divide the data. Fine splitting of data may be necessary for understanding physical phenomena, for example, gamma-ray bursts (GRBs) and pulsars require temporal binning, nebulae require spatial binning, and spectral binning is interesting for all objects. It is therefore important to know when the number of photons is sufficient for making an accurate analysis using simple methods and when a more rigorous approach is necessary.

The parameters describing linear polarization are the polarization fraction and the polarization angle. In the frequentist approach, the major challenge in estimating the polarization fraction is that it is a positive definite quantity, meaning that a non-zero fraction is measured even for an unpolarized source, thus introducing a bias. Ways of correcting for this bias have been studied by Simmons & Stewart (1985) and more recently by Maier et al. (2014). A Bayesian approach was first introduced by Vaillancourt (2006) and extensively expanded upon by Quinn (2012), where the shapes of the resulting parameter distributions are described. These preceding works focus on optical measurements of polarization for which some formulae differ from the X-ray counterpart (Kislat et al. 2015).

This work quantifies, as a function of measurement sensitivity, the error incurred when using the conventional Stokes parameter analysis. This is shown for both the polarization fraction and angle as well as for their uncertainties. Henceforth the expression “low statistics” is used to indicate poor data quality having a low , where S is the number of signal photons and N is the total number of photons (signal and background). Since a point-source polarimeter working in the low-statistics regime is likely to have a low signal-to-background ratio R, whereas a GRB polarimeter is expected to have a higher R (especially if the GRB is bright and short in duration), the study is conducted for different values of R where meaningful. Additionally, the effects of uncertainty on the modulation factor μ0, defined as the polarization fraction measured for a 100% polarized beam, on the performance of the polarimeter are studied. The uncertainty is typically large for GRB polarimeters which are not optimized for localization of GRBs since μ0 varies with the photon incidence angle.

2 Parameter estimation

For a Compton scattering (Lei et al. 1997) or photo-electric polarimeters (Bellazzini & Muleri 2010) with a uniform response, the conditional distribution for a measured polarization angle ψi given a polarization fraction p0, a polarization angle ψ0 and a modulation factor μ0 follows (1)

where ψ = ϕ − 90° for the Compton process and ψ = ϕ for the photo-electric effect, ϕ is the measured scattering angle, S is the number of signal photons and B is the number of background photons. Treating signal and background as latent variables is beyond the scope of this paper so it is assumed that the background is unpolarized and is known with much higher precision than the polarization parameters. The modulation factor μ0 is a property of the polarimeter and is derived from calibration. In what follows, the notation with subscript “0” means a physical parameter generating the data, in this case a set of measured scattering angles transformed to the polarization frame {ψi }, while subscript “r” denotes reconstructed parameters from this set. The reconstructed parameters are actually sufficient statistics for the data set allowing it to be represented by 2 scalars rather than a set of N angles.

The two most common ways of computing pr and ψr are by performing a χ2-fit of Eq. (1) to a histogram of scattering angles or by computing the Stokes parameters. This work only considers the latter as it avoids complicating the analytical form of the likelihood due to binning effects.

2.1 Stokes parameters

Polarimeters operating in the radio, infra-red, or optical domain measure photon intensities rather than individual photons as opposed to X-ray polarimeters. Clarke et al. (1983) discuss how the Stokes parameter distributions vary depending on the instrumental technique used for measuring these intensities. Since Eq. (1) is not used in the low-energy domain, not all results presented in that publication can be extrapolated to the X-ray energy band. In particular, the standard deviations and the correlation coefficient of Stokes parameters are affected.

In the X-ray energy band, the Stokes parameters are derived from individual photon events comprising two quantities (2)

For a total number of photons N = S + B the normalized Stokes parameters are written as (3)

The normalization is only proportional to the signal because the background is assumed to be unpolarized. Therefore the background contributes only to the variance of (Qr, Ur).

The Central Limit Theorem (CLT) makes Qr and Ur Gaussian distributed as long as N is sufficiently large. This is referred to here as the CLT approximation. Thus Qr and Ur follow the bivariate Gaussian distribution (4)

Since the mean, the standard deviation, and the correlation coefficient are sufficient statistics for Gaussian distributed data, so are Qr and Ur together with the second moments of the data derived in Appendix B. Although not written explicitly in the conditioning, Q0, U0, σU, σQ and ρ are functions of p0 and ψ0.

2.2 Polar coordinates

Stokes parameters (Q, U) are in Cartesian coordinates but can be transformed to polar coordinates to allow the polarization parameters to be expressed in terms of (p, ψ). The sufficient statistics become (5) (6)

as per the derivations in Appendix A.

Now Eq. (4) can be transformed to polar coordinates by using the general coordinate transformation for a probability density function (p.d.f.) yielding the likelihood (7)

where det(J) is the determinant of the Jacobian given by (8)

Although the polarization parameters have a very similar form to that of the sufficient statistics (simply replacing the subscript “r ” by “0”) (9) (10)

pr does not correspond to the most probable estimate of p0. This occurs because (σQ, σU, ρ) depend on (p0, ψ0) and the fact that 0 ≤ p0 ≤ 1 (since a polarization greater than 100% is unphysical) but there is no upper limit on pr, that is, 0 ≤ pr. Therefore, using the sufficient statistic pr as the estimator of the polarization fraction p0 incurs an error.

The error is non-neglible for the one-dimensional likelihood of p0 (obtained after marginalizing over ψ0) if the statistical significance of the measurement is low. It results in and ⟨pr⟩≠p0, where ⟨pr ⟩ is the expected value of pr. Hence pr, is neither the maximum likelihood nor an unbiased estimator of p0. This is the case even if μr = μ0 which occurs when there is no uncertainty on μ as is assumed throughout this section.

As the statistical precision of a measurement improves and . Conversely, due to symmetry, there is no bias on ψ so ⟨ψr⟩ = ψ0 and .

thumbnail Fig. 1

The comparison of the CLT approximation to the full likelihood, as given by Eqs. (7) and (11), respectively. For p0 = 0.5, both pure signal and mixed scenarios are well described by the CLT approximation. For p0 = 1.0, the pure signal scenario deviates farther from the approximation than the scenario with background because the pure signal scenario has fewer photons. All scenarios use μ0 = 0.5 and MDP = 2.

Open with DEXTER

2.3 Central Limit Theorem approximation of the likelihood

The CLT approximation in B(Qr, Ur|Q0, U0) provides an always easily computable analytical form of the likelihood given by Eq. (7). However, if the polarization fraction is high and the number of photons is low, the likelihood is not well described by such an approximation. The full form of the likelihood is given by a product over Eq. (1). (11)

It has N factors and is therefore cumbersome to evaluate when the number of photons is large. This is unlike optical polarimetery where photon intensities of (Q, U) are directly measured instead of individual scattering angles.

Figure 1 shows the difference between the CLT approximation and L(pr |p0, ψ0, μ0) as calculated by generating adata set {ψi} from Eq. (11), computing pr using the Stokes parameters, repeating for many iterations and making a normalized histogram of pr.

In what follows, the concept of minimum detectable polarization (MDP) at 99% confidence level is important. The MDP Weisskopf et al. (2010) is given by (12)

Its statistical meaning is that, given an unpolarized source (p0 = 0), the probability of measuring pr > MDP is 1%. This quantity is a standard figure of merit for polarimeter performance and can easily be calculated for any measurement. The number of signal photons in Fig. 1 is chosen such that MDP = 2, since this is the highest (“worst”) MDP considered in later sections. As seen from Fig. 1, the CLT approximation becomes more accurate as the number of photons increases and the polarization fraction p0 decreases. In particular, only measurements with high signal-to-background ratio and high polarization fraction require the full likelihood given by Eq. (11). Although the error incurred using the CLT approximation is negligible in most cases, it becomes larger when using Eq. (7) to derive the posterior, asis discussed below. Figure 1 is for qualitative purposes only; the equivalent figure for ψr has been omitted since it leads to the same conclusions.

2.4 Magnitude and importance of bias

Bias is a frequentist concept which relies on fixing (p0, ψ0) and investigating ⟨pr⟩. This approach provides an intuitive understanding for how an unpolarized source can produce a polarized signal (pr > 0). Several previous measurements use the sufficient statistic pr as an estimate of the polarization fraction p0 (e.g., Weisskopf et al. 1978; Slowikowska et al. 2009) and it is therefore necessary to understand when the bias is negligible and when a more sophisticated approach is necessary.

In the frequentist approach, the p.d.f. B(Qr, Ur|Q0, U0) is not a function of ψ0 (it is a fixed parameter) but of ψr. It is assumed, without loss of generality due to the angular symmetry of the problem, that ψ0 = 0 so that ρ = 0. This can be thought of as rotating the angular coordinate system which does not have any special reference point. The definition of the “zero angle” of such a system has no influence on the polarization fraction. The p.d.f. now simplifies to the product of two normal distributions (13)

It can now be transformed to polar coordinates similarly to Eq. (7) yielding (14)

The relative mean bias β is now given by (15)

where Δψ = ψrψ0 = ψr.

To understand which parameters have a significant impact on β an approximate analytical expression can be derived by introducing (16)

It is now possible to write Eq. (14) as (17)

Integrating over Δψ results in (18)

which is the Rice distribution where I0 is the modified Bessel function of zeroth order. In the limit of high statistics, the relative mean bias is given by (19)

as shown in Appendix C.

Since p0 is not known a priori, Eq. (19) needs to be expressed as a function of pr by recursion. After some simplification the result is (20)

where x ≡MDP∕4.29pr. This shows that MDP∕pr is a good choice of independent variable. Equation (20) is shown in Fig. 2 where it is seen that β > 0 and increases monotonously, i.e., on average the reconstructed polarization pr will always be greater than the fixed polarization p0. Exact numerical integration of Eq. (15) is also provided for different signal-to-background ratios R yielding similar results to the approximation in Eq. (20). Here R = 0 is the limit of large N, low S and yet finite MDP. To avoid inaccurately computing β for small S (a problem under the CLT approximation) Eq. (11) is used for calculating L(pr|p0, ψ0, μ0) and ultimately its mean when S < 200.

To understand when the bias is significant with respect to the statistical uncertainty on pr, the bias fraction is shown in Fig. 3. Here is derived (see Appendix D) using standard error propagation yielding (21)

It becomes clear that this bias cannot be ignored in the low statistics regime. For a measured polarization below MDP, i.e., when MDP∕pr > 1, the bias is more than 18% of the statistical uncertainty. Hence, one should not use Eq. (5) for estimating p0 in this regime.

A conclusion of this analysis is that significant errors are incurred when dividing the data into several data sets as is done for example by Dean et al. (2008) in order to estimate the statistical uncertainty. The smaller the data set, the bigger the bias, and thus, on average, the result of an integrated measurement will be lower than the mean of its constituent data sets. This is also relevant when fitting models to polarization fraction sub-divided with respect to energy or time. Binning the data will result in a higher reconstructed polarization fraction, thus biasing the fit and therefore physical conclusions should not be drawn from pr, as is done for example in Vadawale et al. (2018) where the evolution of polarization parameters throughout the Crab pulsar pulse phase is investigated.

thumbnail Fig. 2

Relative mean bias β as given by Eq. (15) (solid colored lines) and the approximation of Eq. (20) (black dashed line). Here p0= μ0 = 1 has been used to show the maximum difference between the different signal-to-background ratios R. A log-log plot is shown in the inset.

Open with DEXTER
thumbnail Fig. 3

Ratio of the absolute bias to the statistical uncertainty . The β in the approximation (black dashed line) is given by Eq. (20). Here p0 = μ0 = 1 has been used to show the maxi- mum difference between the different signal-to-background ratios R.

Open with DEXTER

3 Maximum a posteriori estimate

The previous section showed the bias arising when using Eq. (5) for estimating p0 independently of the polarization angle. It is not obvious if such a bias exists when using (pr, ψr) for the joint point estimate of (p0, ψ0). However, in this case it does not make sense to find a ⟨pr, ψr⟩ for a fixed (p0, ψ0) so here the word “bias” is interpreted as the difference between the Maximum A Posteriori (MAP) estimate, pMAP, and pr. Such an analysis requires a Bayesian approach.

3.1 Central Limit Theorem approximation of the posterior

The posterior P(p0, Δψ0|pr) is derived by applying the Bayes theorem to the likelihood given by Eq. (7) (22)

where is the normalization factor, P(p0, ψ0) is the prior and Δ ψ0 = ψ0ψr. Since in a Bayesian approach the parameters (p0, ψ0) vary and the data (pr, ψr) are fixed, L(pr, 0|p0, Δψ0) is a function of ψ0. It follows that σQσU and ρ≠ 0. This differs from the typical assumption of σQ = σU and no correlation between Q and U as done in other works (Simmons & Stewart 1985; Vaillancourt 2006; Quinn 2012; Maier et al. 2014).

The Jeffreys prior has the desirable property of being invariant under re-parametrization (Jeffreys 1939). In this case it is a uniform prior in the Cartesian coordinates (Q0, U0). It will result in a preference of high polarization fractions after transforming to polar coordinates, as discussed in Quinn (2012). This is unphysical because it is more difficult to make a highly polarized photon beam in nature since any disorder in the emission region will lower the polarization fraction. A more realistic approach is to instead take a uniform prior 0 ≤ p0 ≤ 1 in polar coordinates. The prior for ψ0 is also uniform due to symmetry.

Figure 4 shows the posterior as given by Eq. (22) for a low-statistics measurement MDP∕pr = 1. The asymmetry is apparent as the posterior broadens for low values of p0. The effect of the prior is illustrated by including the unbound prior scenario where p0 ≥ 0.

Since the posterior is derived using a likelihood which utilizes the CLT approximation, the posterior may, for certain combinations of parameters, be inaccurate. To check the magnitude of the effect, pairs of (p0, ψ0) are sampled from the prior and then used to generate a data set. Data sets falling within a narrow window of a chosen pr and ψr (pr ± 0.002 and ψr ± 0.36°) are selected andthe posterior for each data set is found using Eq. (11) as the likelihood. Finally, the intervals containing 90% of the posteriors marginalized over the polarization angle (for clarity) are shown in Fig. 5 along with the CLT approximation. The situation is similar for the posterior of the polarization angle marginalized over the polarization fraction.

The most important feature of Fig. 5 is that the posterior is not the same for fixed (pr, ψr) when the number of signal photons is low. Hence (pr, ψr) are not always sufficient statistics and the full data set {ψi} is required for deriving the posterior. However, the posterior converges quickly towards the CLT approximation when R < 0.5 since any meaningful measurement with such a low R requires a large number of photons. An effect not shown in the figure is that the difference between the full posterior and the CLT approximation increases as pr and μr increase requiring more photons for good convergence. However, few X-ray polarimeters have μ0 higher than 0.5 (Krawczynski et al. 2011; Kaaret 2014) and synchrotron emission, a proposed mechanism for many astrophysical sources, is not expected to produce a polarization fraction above ~ 0.6 (Lyutikov et al. 2003).

In conclusion, the full likelihood in Eq. (11) rather than the CLT approximation in Eq. (7) is required for sources where the signal-to-background ratio is high but the total number of photons is low (e.g., short-duration GRBs). A guideline is to use the CLT approximation only for data sets where S >103. As mentioned before, polarimeters foreseen to be active in the near future (Astrosat/CZTI, POLAR, IXPE, X-Calibur) have μ0 < 0.5 so this guideline translates to the CLT approximation being generally valid for MDP < 0.27.

thumbnail Fig. 4

Highest posterior density credibility regions for the posterior given by Eq. (22) for MDP ∕pr = 1, R = 0 and μ0 = 1. The contours correspond to 1σ, 2σ and 3σ probability content. The posterior is truncated by the prior at p0 = 1. The unbound prior scenario (black line) corresponds to the unnormalized prior p0 ≥ 0.

Open with DEXTER
thumbnail Fig. 5

Intervals containing 90% of the posteriors for (pr = 0.5, ψr = 22.5°) and the CLT approximation. As the number of signal photons increases, the posteriors converge towards the approximation. Here μ0 = μr = 0.5 and MDP∕pr = 1.

Open with DEXTER
Table 1

Relative MAP bias βMAP for MDP ∕pr = 2, different signal-to-background ratios R, sufficient statistic pr and modulation factor μ0.

3.2 Relative MAP bias

The MAP estimate is at pMAP > pr and ψ0 = ψr. The inequality occurs because σQ and σU depend on p0 but only measurements with high R and are significantly affected. In all other cases and pMAPpr.

The relative MAP bias is defined as (23)

Table 1 shows βMAP for the extreme case MDP∕pr = 2 (a monotonously decreasing function) for different polarization parameters. Only scenarios with a sufficient number (S > 884 based on the results shown in Fig. 5) of signal photons are considered. The results show that βMAP increases with pr and μ0 but its value is negligible for all cases where the CLT approximation is valid. Therefore, when (pr, ψr) are sufficient statistics of the data, they are a good estimate for the MAP.

The broadening of the joint posterior at low values of p0, as shown in Fig. 4, results inpMAP being a poor estimate for p0 when marginalizing over ψ0. More generally, this occurs for any two-dimensional distribution when marginalizing over the nuisance parameter if the estimated parameter has an asymmetric distribution.

thumbnail Fig. 6

Skewness of the marginalized posterior for the polarization fraction. The unbound prior scenario (black line) uses p0 ≥ 0 and pr = 1 (here the result is independent of pr). A positive skewness implies an asymmetric distribution with a longer tail on the right, while a negative skewness results in a tail on the left. In all cases R = 0 is assumed.

Open with DEXTER

4 Uncertainties on polarization parameters

There are two reasons why it is not always appropriate to use naive Stokes estimates for uncertainties of polarization parameters: non-Gaussianity and the prior. Equation (21) gives the correct uncertainty on the polarization fraction when statistics are high and the reconstructed fraction pr is well below 1. If these conditions are not fulfilled, the marginalized posterior becomes highly asymmetric either due to low statistics or because a part of the likelihood is truncated by the prior at p0 = 1 and is thus poorly approximated by a Gaussian.

The asymmetry of the marginalized posterior for the polarization fraction is shown in terms of the skewness in Fig. 6. The unbound prior scenario employs the unnormalized uniform prior p0 ≥ 0 and pr = 1 (here all values of pr produce the same result). It shows that when statistics are low, the marginalized posterior gets a longer tail on the right (positive skewness). This is because the peak moves left, which is equivalent to measuring a higher pr for a fixed p0, that is, the same interpretation as for the relative mean bias shown in Fig. 2.

Although the marginalized posterior for the polarization angle is always symmetric, it deviates from Gaussianity when statistics are low, for example, approaching the uniform distribution since all angles are equally likely. It is also affected by the prior when pr is sufficiently high, resulting in a distribution with longer tails because, as shown in Fig. 4, the prior only truncates the part of the likelihood contributing to the central part of the marginalized distribution. Therefore, the uncertainty given by (24)

(derived in Appendix D) is not always valid.

In a Bayesian approach, the posterior is used instead of a parameter estimate and its uncertainty since the posterior contains the full information about the parameter. The posterior cannot easily be described in text so a simplified description is necessary such as its peak and the region of Highest Posterior Density (HPD) containing the probability content corresponding to one Gaussian standard deviation.

It is possible to quantify when the HPD region must be derived from the posterior and when Eqs. (21) and (24) are good approximations by considering the ratios and as functions of MDP∕pr, which are shown in Figs. 7 and 8, respectively. In both figures, the unbound prior scenario uses the unnormalized uniform prior p0 ≥ 0 and pr = 1. Only scenarios with R = 0 are considered, meaning that the CLT approximation is valid in the entire range of MDP∕pr for any pr. Increasing R does not change the results shown in this section but may invalidate the CLT approximation. Such cases cannot be represented because pr is not a sufficient statistic. However, since the posterior shape does not change drastically (as shown in Fig. 5), the following discussion is a good qualitative description of its behavior for any R.

For ease of comparison, is defined as half of the region containing 1σ Gaussian probability. Additionally, the Gaussian interval is limited to account for the prior, for example, p0 = 0.8 ± 0.3 is the interval [0.5, 1.0] which gives an uncertainty of , and therefore it does not contain 68.3% (1σ) probability content.

The unbound prior scenario in Fig. 7 shows that for MDP∕pr ~ 1 the marginalized posterior has longer tails than a Gaussian. Additionally, the prior truncates the posterior for high pr or for low statistics. This can result in either a higher or alower depending on where exactly the distribution is truncated. Unless pr is high, is a good approximation for the magnitude of the uncertainty at MDP∕pr ~ 1 but it is important to remember that the posterior is asymmetric for such low statistics.

For the polarization angle, Fig. 8 shows that . The situation is simpler than for p0 because the angle has a symmetric posterior. The unbound prior shows the deviation from a Gaussian manifesting in longer tails. Figure 4 shows that when adding the prior 0 ≤ p0 ≤ 1, the part of the likelihood which extends past p0 = 1 is truncated. This part contributes to the density at the center of the marginalized posterior. Removing it further extends the tails. It follows that Eq. (24) underestimates the uncertainty on the polarization angle by relative 10–20% at MDP∕pr ~ 1.

This analysis shows that in the limit of low statistics, MDP∕pr ~ 1, the uncertainty on the polarization angle is not well-described by Eq. (24) and a Bayesian treatment is necessary. For high pr, such a treatment is required even for high-statistics measurements, MDP∕pr ~ 0.5, because of the asymmetry in the uncertainty on the polarization fraction.

To illustrate the effects described above, two examples are presented in Table 2. The first is a recent measurement of the Crab nebula by PoGO+ (Chauvin et al. 2017) and the second is a hypothetical measurement of a high reconstructed polarization fraction highlighting the importance of the prior 0 ≤ p0 ≤ 1. The largest differences are in the polarization fraction and the uncertainty on the polarization angle.

thumbnail Fig. 7

Ratio of the uncertainty derived from the HPD region of the marginalized posterior and the Gaussian uncertainty given by Eq. (21) for the polarization fraction. Since the posterior is asymmetric, an effective is defined as half of the region containing 1σ Gaussianprobability. The unbound prior scenario corresponds to the unnormalized prior 0 ≤ p0 and pr = 1 (here the result is independent of pr). In all cases R = 0 is assumed.

Open with DEXTER
thumbnail Fig. 8

Ratio of the uncertainty derived from the HPD region of the marginalized posterior and the Gaussian uncertainty given by Eq. (24) for the polarization angle. The unbound prior scenario corresponds to the unnormalized prior 0 ≤ p0 and pr = 1 (here the result is independent of pr). In all cases R = 0 is assumed.

Open with DEXTER
Table 2

Examples of the difference between a Bayesian approach and a Gaussian approximation.

5 Uncertainties on the modulation factor

The uncertainty σμ on the modulation factor μ0 can be minimized for most polarimeters designed for measuring point sources by increasing the statistics during calibration tests. However, for polarimeters measuring GRBs it is often impossible to make σμ arbitrarily small. The problem is that μ varies depending on the location of the GRB, typically having the highest values for on-axis GRBs but significantly lower values for GRBs located at a large angular separation from the detector axis. Determining the location of the GRB by using a polarimeter involves large uncertainties since it is often not optimized for the task. The uncertainty on the location propagates to a non-negligible σμ. Additionally, the primary spectrum of GRBs may not be reconstructed with sufficient precision by the polarimeter, introducing further uncertainties in the simulation required for deducing the μ0 for a particular GRB. If the GRB is simultaneously observed by a dedicated GRB monitor, the uncertainty on its location and spectrum will be smaller, but no GRB monitor has complete sky coverage or 100% duty-cycle, so it is inevitable that some GRBs will only be seen by the polarimeter.

An example of a GRB polarimeter is POLAR (Produit et al. 2005). For a typical bright GRB, POLAR is expected to have a σμμ of between 5 and 15% assuming that the burst occurs on-axis (Suarez-Garcia et al. 2010). The simplest way to account for this additional uncertainty is to propagate σμ through (25)

where M is the measured modulation. The total symmetric uncertainty is then given by (26)

To check for which parameters the symmetry is a good approximation, an additional prior can be added to the posterior Eq. (22). For simplicity, this prior is assumed to be Gaussian but it can vary depending on localization sensitivity. The nuisance parameter μ0 can then be marginalized over, yielding (27)

where the integral is taken between 0 and 1 because μ0 > 1 is unphysical. As an example, the posterior (marginalized over the polarization angle) for a signal-only measurement at MDP = 10% (CLT approximation is valid because S is large), pr = 0.4 and μ =0.4 is shown in Fig. 9 for different σμμ. Due to the high statistics, MDP∕pr = 0.25, the distribution is symmetric for low σμμ, however, the tail on the right grows rapidly as σμμ increases. The behavior is governed by the reciprocal Gaussian distribution which the posterior approaches in the limit of high photon statistics (28)

shown as an approximation in Fig. 9. If not for the prior 0 ≤ p0 ≤ 1, the moments of this distribution would be undefined.

Figure 9 demonstrates that for σμμ > 10% the shape of the posterior changes significantly and Eq. (27) is required. In the extreme case of σμμ = 20%, the half-width of the HPD region is 0.092, whereas Eq. (26) yields 0.086, implying not only that the uncertainty is asymmetric but also that it is significantly larger.

These results can be related to the instrument perfomance by studying the effect σμμ has on the MDP. A frequentist approach must be followed since MDP is a frequentist concept. MDP is derived by finding the 99% upper limit for the Rayleigh distribution – a special case of the Rice distribution where p0 = 0. However, since there is an uncertainty on μ, the Rayleigh distribution must be multiplied by the likelihood for μ (assumed tobe a Gaussian for simplicity) and integrated to yield (29)

Finally, f(pr|p0 = 0, μ0) can be integrated to find the 99% upper limit so that . Figure ?? shows the relative increase in the MDP, defined as the ratio MDPσ∕MDP where MDP is given by Eq. (12). Although the effect is small for low σμμ (1% at σμμ = 5%), it becomes significant at larger σμμ (14% at σμμ = 15%) and deteriorates the instrument performance for extreme values (80% at σμμ = 25%). The result is independent of the intial MDP.

thumbnail Fig. 9

Polarization fraction posterior density for MDP∕pr = 0.25, pr = 0.4 and μr = 0.4 for different σμμ. As σμμ increases, the distribution becomes wider and more asymmetric. The approximation is the reciprocal Gaussian distribution in Eq. (28).

Open with DEXTER
thumbnail Fig. 10

Relative increase in MDP, defined as the ratio MDPσMDP where MDPσ is the solution to as a function of the relative uncertainty on μ. The results are independent of the initial MDP.

Open with DEXTER

6 Conclusions

The results presented here provide a means of quantifying the errors incurred when using the simple estimators for polarization parameters as well as for their uncertainties. These errors are related to the well-established figure of merit MDP and the reconstructed polarization fraction, making the results easily applicable to any X-ray polarimeter. Unlike someprevious works, this analysis does not ignore the correlation between the Stokes parameters Q and U.

Additionally, the extent to which the reconstructed fraction pr and polarization angle ψr can be used as sufficient statistics is explored. In certain situations, such as for high polarization fraction and high signal-to-background ratio, Q and U are not Gaussian distributed and their likelihood is non-Gaussian, resulting in a significantly different posterior for the polarization parameters when the number of photons is low. In such cases, the Stokes parameters cannot be used at all because (pr, ψr) are not sufficient statistics.

The fact that the reconstructed polarization fraction is biased towards higher values implies that binning a high-statistics data set into smaller subsets to estimate the evolution of the polarization fraction will yield incorrect results when using the simple Gaussian estimator pr. This work provides a way to decide how coarsely the data must be binned in order to justify the use of Gaussian estimators.

The uncertainties on the estimated parameters are as important as the parameter estimates themselves. When statistics are high (MDP∕pr < 0.5) the errors are small and can usually be neglected (justifying Gaussian assumptions made when using simple estimators) unless the reconstructed polarization fraction is high, for example, > 0.8. However, in the statistics-limited regime (MDP∕pr > 1) the systematic error made from using such simple estimators is non-negligible compared to the statistical uncertainty. Additionally, it is shown how quickly the posterior becomes asymmetric as the statistical power decreases. This effect is strongly dependent on the reconstructed polarization fraction because of the prior 0 ≤ p0 ≤ 1.

Lastly, the effect of uncertainties on the modulation factor μ is studied. It is shown to be important once the relative uncertainty exceeds 10% and to dominate the performance of an instrument when it is above 20%. This is relevant for the optimization of future GRB polarimeters, since they tend to have large uncertainties on μ due to the difficulty of localizing bursts and measuring the primary GRB spectrum. It also shows that simple Gaussian estimators cannot be used in the high-photon-counts regime for GRB polarimeters when localization uncertainties are high.

Acknowledgements

This research was supported by the Swedish National Space Board. M. Kiss, M. Pearce, F. Xie and the Referee are thanked for providing constructive feedback on the manuscript.

Appendix A Stokes parameters

The Stokes parameters can be constructed from (A.1)

To find an expression for p0, the means ⟨q⟩ and ⟨u⟩ must be computed. (A.2) (A.3)

where f(ψ) is given by Eq. (1). Similarly (A.4)

The polarization fraction p0 is then given by (A.5)

where Q0 and U0 are the normalized Stokes parameters defined as (A.6)

The sufficient statistics for data generated from (Q0, U0) are (Qr, Ur) defined as (A.7)

where N = B + S. In polar coordinates this becomes (A.8)

Appendix B Uncertainties on Stokes parameters

The uncertainties on Q and U as well as theircovariance can be derived by computing their second moments. (B.1)

Similarly (B.2)

and the cross-term is (B.3)

Combining Eqs. (A.2) and (B.1) yields the standard deviation (B.4)

Here σq describes the dispersion of q but it is σQ which is of interest. It is given by (B.5)

where the division by S2 comes from the definition of normalized Stokes parameters. (B.6)

Similarly (B.7)

It is finally possible to compute the covariance and the correlation coefficient ρ (B.8) (B.9)

Appendix C Relative mean bias

An approximate expression for the relative mean bias can be derived from the Rice distribution (C.1)

where . The mean ⟨pr⟩ is given by (C.2)

where I0 and I1 are modified Bessel functions of the zeroth and first order, respectively. For high-statistics measurements (where p0 ≫ 2σ) the Bessel functions can be approximated by expanding them to second order. (C.3) (C.4)

Finally (C.5)

So in the limit of high statistics, the relative mean bias β is given by (C.6)

By using recursion, β can be expressed as a function of pr instead of p0 which is not known a priori. This simplifies to (C.7)

where x = MDP∕4.29pr and Cn is the nth Catalan number.

Appendix D Gaussian uncertainties on polarization parameters

The uncertainty on pr can be derived by using the standard uncertainty propagation formula (D.1)

yielding (D.2)

and similarly for ψr (D.3)

References

All Tables

Table 1

Relative MAP bias βMAP for MDP ∕pr = 2, different signal-to-background ratios R, sufficient statistic pr and modulation factor μ0.

Table 2

Examples of the difference between a Bayesian approach and a Gaussian approximation.

All Figures

thumbnail Fig. 1

The comparison of the CLT approximation to the full likelihood, as given by Eqs. (7) and (11), respectively. For p0 = 0.5, both pure signal and mixed scenarios are well described by the CLT approximation. For p0 = 1.0, the pure signal scenario deviates farther from the approximation than the scenario with background because the pure signal scenario has fewer photons. All scenarios use μ0 = 0.5 and MDP = 2.

Open with DEXTER
In the text
thumbnail Fig. 2

Relative mean bias β as given by Eq. (15) (solid colored lines) and the approximation of Eq. (20) (black dashed line). Here p0= μ0 = 1 has been used to show the maximum difference between the different signal-to-background ratios R. A log-log plot is shown in the inset.

Open with DEXTER
In the text
thumbnail Fig. 3

Ratio of the absolute bias to the statistical uncertainty . The β in the approximation (black dashed line) is given by Eq. (20). Here p0 = μ0 = 1 has been used to show the maxi- mum difference between the different signal-to-background ratios R.

Open with DEXTER
In the text
thumbnail Fig. 4

Highest posterior density credibility regions for the posterior given by Eq. (22) for MDP ∕pr = 1, R = 0 and μ0 = 1. The contours correspond to 1σ, 2σ and 3σ probability content. The posterior is truncated by the prior at p0 = 1. The unbound prior scenario (black line) corresponds to the unnormalized prior p0 ≥ 0.

Open with DEXTER
In the text
thumbnail Fig. 5

Intervals containing 90% of the posteriors for (pr = 0.5, ψr = 22.5°) and the CLT approximation. As the number of signal photons increases, the posteriors converge towards the approximation. Here μ0 = μr = 0.5 and MDP∕pr = 1.

Open with DEXTER
In the text
thumbnail Fig. 6

Skewness of the marginalized posterior for the polarization fraction. The unbound prior scenario (black line) uses p0 ≥ 0 and pr = 1 (here the result is independent of pr). A positive skewness implies an asymmetric distribution with a longer tail on the right, while a negative skewness results in a tail on the left. In all cases R = 0 is assumed.

Open with DEXTER
In the text
thumbnail Fig. 7

Ratio of the uncertainty derived from the HPD region of the marginalized posterior and the Gaussian uncertainty given by Eq. (21) for the polarization fraction. Since the posterior is asymmetric, an effective is defined as half of the region containing 1σ Gaussianprobability. The unbound prior scenario corresponds to the unnormalized prior 0 ≤ p0 and pr = 1 (here the result is independent of pr). In all cases R = 0 is assumed.

Open with DEXTER
In the text
thumbnail Fig. 8

Ratio of the uncertainty derived from the HPD region of the marginalized posterior and the Gaussian uncertainty given by Eq. (24) for the polarization angle. The unbound prior scenario corresponds to the unnormalized prior 0 ≤ p0 and pr = 1 (here the result is independent of pr). In all cases R = 0 is assumed.

Open with DEXTER
In the text
thumbnail Fig. 9

Polarization fraction posterior density for MDP∕pr = 0.25, pr = 0.4 and μr = 0.4 for different σμμ. As σμμ increases, the distribution becomes wider and more asymmetric. The approximation is the reciprocal Gaussian distribution in Eq. (28).

Open with DEXTER
In the text
thumbnail Fig. 10

Relative increase in MDP, defined as the ratio MDPσMDP where MDPσ is the solution to as a function of the relative uncertainty on μ. The results are independent of the initial MDP.

Open with DEXTER
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.