EDP Sciences
Free access
Volume 571, November 2014
Article Number A89
Number of page(s) 8
Section Astronomical instrumentation
DOI http://dx.doi.org/10.1051/0004-6361/201423921
Published online 14 November 2014

© ESO, 2014

1. Introduction

Astronomical polarimetry and spectropolarimetry, which have historically been under-utilized, are now routinely being used in many experiments to search for and test new physics. Besides the diverse application of polarimetric techniques in solar and stellar astronomy, two areas of particular note are polarimetric observations of the cosmic microwave background (CMB) and spectropolarimetric observations of supernovae (SNe).

Detection of inflation signatures on the CMB using polarimetric imaging were recently announced by the BICEP2 team (Ade et al. 2014b,a). This result would confirm one of astrophysics most cherished but experimentally elusive theories. After a review of the public BICEP2 data, some have questioned the interpretation of the BICEP2 data and suggested that the foreground dust contamination had been underestimated or that the signal could have been produced by other mechanisms such as “radio loops” (Liu et al. 2014). The BICEP2 team have stood by their results although they have moderated their language in the peer-reviewed version of their paper (Ade et al. 2014b). If valid, the BICEP2 discovery continues the trend set by all-sky mapping satellites like COBE, WMAP, and Planck and various ground and balloon surveys of the CMB being perhaps the single most fruitful source to test cosmological theory and sometimes even fundamental physics. Further results related to CMB polarization are expected in the near future from BICEP3 (Kuo & BICEP3 and POLAR1 Collaborations 2013), the Keck Array (Staniszewski et al. 2012), Polarbear (The POLARBEAR Collaboration 2014), and the Planck team (Planck Collaboration XIX 2014). It is clear that the CMB will continue to be a vital source of cutting edge science; but, as the controversy around the BICEP2 announcement indicates, polarization work is tricky and extreme care must be made to handle properly statistical and systematic error in the data.

Other research groups have been using spectropolarimetric observations to investigate the systematic behavior and statistical variation of thermonuclear (Type Ia) and core-collapse SNe (all other types) arising from asymmetries. (The Wang & Wheeler 2008 Annual Review is a good introduction.) Spectropolarimetry is the only known way to probe asymmetries of unresolved sources such as extra-galactic supernovae so it will continue to be an effective research tool. The detection of asymmetries has helped place much needed constraints on SNe explosion physics/progenitor scenarios (recent examples include Maund et al. 2010b; Tanaka et al. 2012; Patat et al. 2012; Zelaya et al. 2013; Maund et al. 2013) and environments (Mauerhan et al. 2014). Asymmetries of Type Ia SNe are of particular interest as high-redshift events were famously used to discover cosmic acceleration (Riess et al. 1998; Perlmutter et al. 1999), a result for which leads of two large teams shared the 2011 Nobel Prize. This acceleration implies the existence of so-called “dark energy”, whose physical nature is completely puzzling and is perhaps the biggest unsolved problem in astrophysics. As such, any possible correlations between metrics of asymmetry in Type Ia SNe and other observables (Leonard et al. 2005; Wang et al. 2007; Maund et al. 2010a) are of high importance because correlations may be used to identify contaminating Type Ia sub-classes (Wang et al. 2013) or apply statistical corrections to the maximum brightness, both of which could help refine the measurement of the cosmological constant (Maeda et al. 2011). Spectropolarimetry of SNe is a blooming field but, just as with the CMB, handling the statistical and systematic error of polarimetric data for SNe is challenging.

As both polarimetric imaging and spectropolarimetry require dividing incoming light into bins, researchers in these fields often find themselves in photon-starved situations. Indeed, in both subjects discussed above, photon-limited data at low signal-to-noise must be confronted. (Low signal-to-noise is often the hallmark of forefront science by its very nature.) This problem is compounded in polarimetric imaging when a narrow-band filter is needed. Working at low signal-to-noise will continue to be an issue for polarimetrists.

Polarization data is often measured in Stokes parameters but expressed in human-digestible quantities like degrees of polarization and angles of polarization. Despite this, there has been relatively little attention given to the surprisingly complicated statistics involved when making transformations to these quantities at low signal-to-noise. Most of the published literature of this fundamental, practical topic has focused on the analysis of linear polarization alone that involves marginalizations over a two-dimensional gaussian distribution over the Q-U plane cross section of the Poincaré sphere (the “Poincaré disk” is a useful expression although it is already in use in other mathematical contexts). Marginalization over the linear polarization angle results in the Rice distribution (Rice 1945) while marginalization over the degree of linear polarization results in another distribution given by Vinokur (1965). The authors refer to this marginalized approach as the “one-dimensional problem”. A small number of papers such as Wang et al. (1997), Vaillancourt (2006), and Quinn (2012) have used a non-marginalized distribution over the Poincaré disk (the “two-dimensional problem”), with the latter being the first to put the subject on a rigorous mathematical footing. To date, there has been little to no attention paid to the three-dimensional problem defined by a three-dimensional distribution over the Poincaré sphere. Progress on higher dimensional generalizations of the two-dimensional problem have been made, for instance, by members of the Planck Collaboration (Planck Collaboration XIX 2014; Montier et al. 2014a,b) by including intensity in the analysis (see also Viola et al. 2014) but researchers have thus far generally ignored circular polarization when measuring linear polarization in astronomical sources (and vice versa); however, the measurement of one is not actually independent of the other. This effect is generally negligible except when one is working at very small signal-to-noise ratios or with objects with polarization near 100% (see Quinn 2012 for details on how large polarization introduces complication). The aim of this paper is to elucidate some of this interdependence and to see how it relates to the more common ways of treating linear polarization measurements at low signal-to-noise via the Rice distribution or through Bayesian techniques as outlined in Vaillancourt (2006) and Quinn (2012). The results should be of interest to all polarimetrists and those those studying the CMB or SNe in particular.

2. The sampling distribution

A very practical way of encoding the polarization state of electromagnetic radiation is the Stokes parameter formalism invented by George Stokes in 1852. Stokes parameters consist of four quantities: I (intensity), Q and U (related to linear polarization), and V (related to circular polarization)1. They are convenient quantities to measure but it is easier for people to think about polarization in terms of percentages and angles. This may be done by introducing a spherical coordinate system (p,θ,φ) on points of the Poincaré sphere (a unit ball in Cartesian coordinates centered on the origin of , , and axes), which represents all possible polarization states. Accessible modern introductions to Stokes parameters and their theory are given in del Toro Iniesta (2003) and Landi Degl’Innocenti (2002).

If an astronomical source’s polarization state is measured with gaussian error IQUV) on each Stokes parameter (I,Q,U,V), how do the corresponding spherical coordinates (p,θ,φ) relate to the “true” polarization state (p0,θ0,φ0) determined uniquely from (I0,Q0,U0,V0)? (Unsubscripted variables will usually refer to measured values while “0”-subscripted values will refer to “true” values. Later, an overbar on some variables and distributions will indicate signal-to-noise related quantities.) As it turns out, a spherical coordinate system introduces non-trivial biases into the measured quantities and the measurement itself must be “corrected” to remove them. This paper extends Quinn (2012), which studied these effects in polar coordinates before discussing the Bayesian approach, to spherical coordinates. The reader is referred to that paper for a more extensive introduction on Stokes parameters.

Under the assumptions that the source intensity is large compared to the measurement error, i.e., I0 ≫ ΣI which implies II0 (Quinn 2012)2, and the error on a Stokes measurement is described by a three-dimensional uncorrelated ellipsoidal Gaussian distribution (with semi-major axes ΣQ, ΣU, ΣV), the normalized distribution, FC, of the measured Stokes parameters (Q, U, V) given the “true” Stokes parameters (Q0, U0, V0) is (1)These parameters are not usually worked with directly. Instead “normalized Stokes parameters” are defined as qQ/I0, uU/I0, vV/I0 (with also q0Q0/I0, u0U0/I0, and v0V0/I0), and σq ≡ ΣQ/I0, σu ≡ ΣU/I0, and σv ≡ ΣV/I0, yielding a new normalized distribution (2)Lastly, one most often works with “signal-to-noise” versions of the Stokes parameters defined as , , and which gives another distribution (3)which is normalized, . This distribution no longer explicitly depends on the σ’s and naturally handles ellipsoidal distributions. The σ’s do, however, play a minor role: they limit the possible values of q0, u0, and v0 so it is important not to forget their significance (Quinn 2012).

thumbnail Fig. 1

Illustration of the spherical coordinate system used in the paper. The variable θ is used for the azimuthal angle with a range of (−π,π ], and the variable ϕ is measured from the positive v-axis with a range of [ 0 ].

Open with DEXTER

We wish to now transform our equations to spherical coordinates as the Poincaré sphere suggests. The transformation equations3 are (4)which has inverse (5)where θ is the angle in the q-u plane and ϕ is the angle from the positive v-axis (see Fig. 1). The “true values”, p0, θ0, and ϕ0, are similarly related to q0, u0, and v0.

It is crucial not to confuse the degree of polarization, p, with the degree of linear polarization in the following. The degree of linear polarization is a separate quantity equal to or psin(ϕ) and, although it is certainly important, is not discussed in this paper.

We now restrict ourselves to the case where all three standard deviations are equal (ΣQ = ΣU = ΣV ≡ Σ) so that σq = σu = σvσ. The distribution in spherical coordinates is then (6)The p2sinϕ in the numerator of the factor out front is due to the Jacobian of the transformation, i.e., dq du dv = p2sinϕ dp dθ dϕ4. It is rather promising that the analytic form of Eq. (6) is similar to and not much more complicated than the two-dimensional polar version. Already one can see that the functional form of the equation is not independent of the true value of the circular polarization.

The barred version is (7)where and . This is still normalized, .

3. Classical estimators

Now that (the sampling distribution in signal-to-noise variables) is at hand, two important classical estimators may be calculated. These are the so-called “Most Likely” and “Most Probable” estimators. (Note that the prime is not indicating a derivative in but serving as a warning that we are using the q-u plane angle, θ, instead of the sky angle for linear polarization.)

3.1. The “Most Likely” solution

The “Most Likely” (ML) solution is obtained by maximizing with respect to , θ0, and ϕ0. Physically, this corresponds to finding the value of which makes the measured value of the most statistically likely, hence the name. A general solution may be found by solving the system , , and for , θ0, and ϕ0, which are then treated as estimators. Technically, a second derivative test must also be done to check that the point is actually a maximum. For functions of more than two variables, this test requires checking that all the eigenvalues of the function’s Hessian matrix are positive. (All negative would be a minima, while mixed values would be inconclusive.) The equations involved in the test are very large and cumbersome. From graphical investigation or intuition, it is obvious, however, that the solution to be found is a maximum.

From the azimuthal equation, , it is quickly deduced that θ0 = θ everywhere except for values that correspond to an origin (where or equals zero) or a circular polarization axis (where ϕ0 or ϕ equals zero or π) of the true and measured Poincaré spheres. Further tedious but straight-forward algebra allows the full solution to be found, which is just


This solution involves no “bias corrections” at all. What you measure is your best estimate for the “true” polarization.

This solution is consistent with the polar coordinate case (that is, the two-dimensional case corresponding to ϕπ/ 2, ϕ0π/ 2, and σv → 0) presented in Quinn (2012). It is only when the Rice distribution (a one-dimensional distribution arising from marginalization) is used that a non-trivial solution is found for the ML method (Simmons & Stewart 1985). As marginalization intentionally removes information present in the original distribution, the logical basis for the use of non-trivial ML estimators found from a marginalized distribution to perform a “bias correction” is questionable. This is underscored by the fact that the p0, θ0, and ϕ0 are not stochastic variables but input parameters in and should not be treated as such as is done in the ML approach. The logically correct way to “invert” the distribution is to use Bayes’ Theorem.

3.2. The “Most Probable” solution

The “Most Probable” (MP) solution is found by maximizing with respect to , θ, and ϕ. This corresponds to finding the point that produces a distribution of measured points with a maximum at the actual measured point . A general solution may be found by solving the system , , and for , θ0, and ϕ0. (As before, a second derivative test is technically necessary to determine if the solution is actually a maximum as opposed to a minimum or some point of mixed inflection but this test is similarly impractical to perform because the equations end up being quite large. Graphically it can be shown that the solutions are maximums.)

Again, one quickly finds θ0 = θ from the azimuthal equation, . Using this condition, the ϕ and equations yield and , respectively. This two-equation system is somewhat difficult to solve and the solution is best accomplished with a computer algebra system. The full analytic solution is


which is only valid for points satisfying the condition (10)where csc(x) is the cosecant function. Thus there are two coupled “bias corrections” that must be made to the data: one for and one for ϕ. A graphical representation of the preceding solution is given in Fig. 2. The left-most extension of the region consisting of points satisfying Eq. (10) (shown in blue in the figure) approaches the point . In the special case of ϕ = π/ 2 and , the correction is just ϕ0,MP = π/ 2 and .

thumbnail Fig. 2

Visualization of the bias correction field of Eq. (9). The tail of each arrow is located at a measured point and the head is attached to the corresponding “Most Probable” estimate of . This correction should only be applied to points lying in the blue region given by Eq. (10). See the text for discussion about points outside the blue region.

Open with DEXTER

thumbnail Fig. 3

Behavior of the bias correction field (Eq. (9)) on the boundary of the blue valid region (Eq. (10)). The left-most point on the boundary of the blue region is at .

Open with DEXTER

The “bias correction field” in Fig. 2 shows that the correction applied to gets larger as gets smaller (the actual magnitude is ϕ-dependent). This is consistent with the known behavior in the two-dimensional case. Figure 2 also shows that ϕ itself must be corrected: it should be made larger when ϕ>π/ 2 and smaller when ϕ<π/ 2. The magnitude of this correction also tends to be larger as gets smaller. The exact magnitude of the ϕ-correction at a given depends on ϕ itself with the magnitude being largest in the “mid-latitudes” of the Poincaré sphere and zero on the equator. Generally, -points not on the equator are “attracted” towards the poles, an effect most prominent at low signal-to-noise (between, say, ). The bias correction vector field diminishes as gets larger but for ϕ values near the poles is seen to remain significant even for , a regime typically thought of as having signal-to-noise high enough not to have to worry about such effects.

For points infinitesimally close to the boundary defined by Eq. (10), the bias correction for ϕ estimates ϕ0 = 0 or ϕ0 = π, whichever pole is nearest. Figure 3 illustrates this behavior. The special case of ϕ = π/ 2 “corrects” to ϕ0 = π/ 2. More specifically, points on the boundary with π/ 2 <ϕ<π all bias correct to ϕ0 = π while those with 0 <ϕ<π/ 2 all bias correct to ϕ0 = 0. The correction on the boundary acts typically and corrects more and more towards the origin as gets smaller.

From Figs. 2 and 3, it can be seen that every possible with and sinϕ0 ≠ 0 is associated with a point within the blue region and vice versa. Unfortunately, a closed form solution for the inverse of Eq. (9) was not found and may not exist.

Intuitively5, the correction may be understood by considering a value of zero. The measured value of cannot be negative and, due to scatter from measurement error, you expect that will almost always be positive (that is, there is a negligibly small chance of zero) even though . Thus, the measurement process “biases” to larger values by some amount. Therefore, a correction should be made that subtracts a bit from the measured value to obtain a good estimate of . Even for a non-zero , this biasing occurs although with diminishing effect as gets larger. The same line of reasoning can be used to gain an intuitive understanding of the ϕ bias correction. Consider a source in a state of pure circular polarization. This corresponds to a pole of the Poincaré sphere. Let’s focus on the purely right-circularly polarized case (ϕ0 = 0). Measurements of this state will be clustered around the pole (including outside the Poincaré sphere) but ϕ is never negative and is almost always positive. Similarly for the left-circularly polarized case (ϕ0 = π), measurements of ϕ will be clustered near that pole but cannot have ϕ greater than π and almost always will have ϕ less than π. Just as with , a correction is needed to remove this effect from ϕ. It is important to remark that both the and ϕ bias corrections are purely coordinate effects. It may seem at first as if the ϕ-correction is inconsistent with the symmetry of the system (from ΣQ = ΣU = ΣV) but it is not. While conventionally the v-axis is always chosen to isolate circular polarization, the effect would arise regardless of the orientation of the v-axis within the Poincaré sphere, which preserves the overall symmetry (i.e., the symmetry is spontaneously broken by introduction of the coordinate system).

It must be stressed that the solution of Eq. (9) only applies to measured points satisfying Eq. (10) (illustrated by the blue region in the figures). Presumably, the area outside the blue region divides into three other regions: one characterized by , another by and ϕ0,MP = 0, and still another by and ϕ0,MP = π. In the latter two cases, there could still be a bias correction at work. It is unclear how to determine mathematically a “MP solution” in the non-blue region of Fig. 2. A formal solution may simply not exist there. The authors’ best guess as to the best practice is that measurements within the “triangle” defined by points (0,0), (0), and correct to (0) if ϕ>π/ 2, (0,0) if ϕ<π/ 2, and (0/ 2) if ϕ = π/ 26. The remaining points could correct to the same value as the blue region boundary point whose “correction arrow” passes through the point. Another possibility for these later points is to translate the correction arrow from the boundary of the blue region with the same ϕ to the point and use the intersection of that arrow with the -axis as the corrected value. Formal solutions in these regions where not, however, found.

Equation (9) does indeed provide an estimate of the point that would produce a distribution of measured points with maximum at . It is however just a point estimate. The actual distribution for a small values of tends to be broad, shallow, and nearly flat, especially in the ϕ dimension. In other words, the errors bars associated with each estimate will be large so it is cautioned that the point estimate should not be taken too literally. Using the distribution, one could construct confidence regions. These regions are however not uniquely determined, even in the one-dimensional case (Simmons & Stewart 1985). Construction of confidence regions is reserved for future work.

4. Marginalized distributions

Experimenters often work with only a partial description of a physical system. This may be necessary when independent variables of a distribution of interest related to the system are not measurable, or desired when there are so-called nuisance variables of little concern. In a polarization study, for instance, it may be that only the degree of linear polarization is the focus and not the angle of linear polarization. In such situations, one can marginalize over “unwanted” independent variables to find a new distribution which no longer depends on them but at the cost of losing some information present in the original distribution.

The marginalization possibilities for f are to marginalize over (I) p only; (II) θ only; (III) ϕ only; (IV) p and θ; (V) p and ϕ; and (VI) θ and ϕ. The first and last cases are perhaps the most interesting and shall now be investigated.

4.1. The p-marginalized angular distribution

The p-marginalized angular distribution is (11)or, in signal-to-noise variables, (12)This equation can be simplified. Start with and use the definite integral (13)where erfc(x) is the complementary error function defined as7(14)and erf(x) is the error function to reduce this integral. (15)is obtained where (16)which may be viewed as a higher-dimensional analog of the angular distribution presented in Vinokur (1965; see also Clarke & Stewart 1986; Naghizadeh-Khouei & Clarke 1993; Quinn 2012).

Examining a few density plots of Eq. (15) for various values of , θ0, and ϕ0 reveals that the probability distribution is attenuated near the poles of the coordinate system, an effect that becomes broader for small values of . In particular, one should notice that even for , the probability density still varies with ϕ. This is due to the pole bias effect discussed earlier.

One would like to go further and find the two one-dimensional marginalized angular distributions (cases IV and V) but the integrations seem unable to be completely performed using elementary or even common special functions.

4.2. The angular-marginalized p-distribution

One is often primarily concerned with the degree of polarization, p, and not the value of the angular variables. Marginalization over the angular variables will therefore produce a new distribution, to be called Q ( in signal-to-noise variables) of high practical importance. In the polar coordinate case, marginalization over the (lone) angular variable, θ, produces the Rice distribution. In the spherical coordinate case, the marginalization must occur over both θ and ϕ. Thus, the function may be considered a higher-dimensional analog of the Rice distribution. Let (17)or similarly (18)Let’s calculate. (19)The θ-integral can be solved using , where 0(x) is the modified Bessel function of the first kind. 0 is an even, real function when its argument is real. (Because the range of integration is over a full period of θ, the value of θ0 is immaterial because it appears as a phase shift and does not affect the integral.) Once the θ-integral is performed, the distribution no longer explicitly depends on θ0. We shall therefore drop θ0 from the conditions. Continuing with the calculation, (20)The ϕ-integral in the last expression is non-trivial but it appears it can be solved. Let (21)and (22)where . No computer algebra system tested could solve the integral in Eq. (22) and no applicable integrals were found in tables of integrals. Manual attempts to solve it also failed. From graphs of Ω versus ϕ0 created numerically8 for several values of a, it is noticed that the function may actually be constant in ϕ0 for a given a. Unfortunately, it is not a simple matter to show but if is graphed for 0 ≤ ϕ0π at several sampled values of a it seems to always show residuals about zero with scatter consistent with numerical noise. The previous numerical observations lead one to suspect very strongly that Ω(a,ϕ0) is independent of the value of ϕ0. This may be surprising in light of the bias corrections previously discussed; however, the same marginalized formula seems like it should be found regardless of the rotational orientation of the coordinate system within the Poincaré sphere. If it is true that Ω(a,ϕ0) is constant in ϕ0, any value of ϕ0 may be chosen and it does not alter Ω(a,ϕ0)’s value at a given a. In that case, a value of ϕ0 may be judiciously chosen that simplifies the integral. Good choices are ϕ0 = 0 or π for which (23)is found using the common integral , where sinh(x) is the hyperbolic sine of x. Indeed, simple arguments modifying the factors in the integrand of Eq. (22) can prove this to be a lower limit for Ω(a,ϕ0) at a given a (see Appendix A). Proof that it is also the upper limit was elusive (except at ϕ0 = 0, π/ 2, and π). A Taylor series expansion of Ω(a,ϕ0) in ϕ0 at ϕ0 = 0 was found to be consistent with Eq. (23) out to 26-th order using computer algebra systems9. Based on physical suspicion corroborated by numerical results, Eq. (23) shall be assumed to be true henceforth.

Returning to the main distribution, , and using Eq. (23), it is proposed that (24)Since the value of ϕ0 is irrelevant, ’s dependence on ϕ0 has been dropped from the notation. This is the higher-dimensional analog of the Rice distribution. Surprisingly, it is a simpler function than the Rice distribution in the sense that it depends on the hyperbolic sine function rather than a modified Bessel function. The distribution is normalized, .

thumbnail Fig. 4

Contour plot of the angular-marginalized -distribution given by Eq. (24). The contours are at 0.1 intervals with the lighter shades corresponding to larger values. The dotted and dashed curves (Eqs. (25) and (26)) trace maximums along horizontal and vertical slices and end at the red points at and , respectively. Although the “shape” of the distribution is not affected by the value of σ, the plot assumes that σ ≤ 1/4 so that the plotted values for up to 4 (=1 /σmax) exist.

Open with DEXTER

Some general properties of can now be investigated. A contour plot of the function is shown in Fig. 4 with contours at 0.1 intervals. The function achieves a global maximum as approaches of . Tracing the maximums of horizontal slices (the dotted line in the figure) yields the equation or (25)This implicit curve intersects the -axis at the global maximum at and is the “Most Probable” estimator curve for . Tracing the maximums of vertical slices (dashed line) yields or (26)This curve intersects at the -axis at and is the “Most Likely” estimator curve for .

The MP estimator curve of Eq. (25) (dotted curve in Fig. 4) can be used for all . Yet it was stressed that the full MP solution given Eqs. (9) and (10) is only valid for some values with . Marginalization has thus hidden some of the complexity of the problem. The ML estimator given by Eq. (26) (dashed curve) now has non-trivial behavior unlike the full solution in Eq. (8). It is not easy to anticipate the effect that marginalization will have on various estimators.

It is interesting to compare the long form of these estimator curves and the corresponding curves based on the Rice distribution (Wardle & Kronberg 1974; Simmons & Stewart 1985; Quinn 2012). These curves are based on the hyperbolic trigonometric functions whereas the curves in the two-dimensional case are based on Bessel functions. Their points of intersection with the -axis have also been shifted. The differences arise because the Jacobian factor is proportional to p2 in spherical coordinates but only p for polar coordinates.

5. Discussion

How do these new results relate to the old results derived using polar coordinates for the “Poincaré disk”? First, the old results are not special cases of the new results when ϕπ/ 2 and ϕ0π/ 2; for instance, in the special case of ϕ = π/ 2, the equation of Eq. (9) reduces to rather than the Wang estimator (Wang et al. 1997; Quinn 2012) and does not reduce to angular distribution given by Vinokur (1965). Similarly, is altogether distinct from the Rice distribution. The new results are therefore not direct extensions of the old functions but they do have somewhat similar forms. The three versus two dimensional nature of the parameter space is responsible for this.

Particularly interesting is a comparison of Fig. 4 with Fig. 2a of Quinn (2012). The intercepts of all the estimator curves have undergone shifts of the form . This again is due to the increased dimensionality of the problem. Thus, when circular polarization has equal measurement error to the linear Stokes axes’ measurement error, it takes a bigger measurement of the magnitude of the polarization vector to be significant than the two-dimensional case suggests. The two-dimensional results can perhaps be recovered by some limiting procedure where σv → 0 while σ (=σq = σu) is held constant. This suggests that decreasing the measurement error on one of σq, σu, or σv should decrease the -threshold for a significant measurement. On the other hand, often an instrument measures only linear or only circular polarization. This may correspond to a scenario where the distribution of Stokes parameters is best modeled by assuming a flat (instead of gaussian) distribution for the variables of the undetected type. This would violate our starting assumptions in Eq. (1) and would produce rather different results. As a flat distribution is in an informal sense a “wide” distribution, it is probable that this would raise the detection threshold higher. It should be clear that interpreting polarization data at low signal-to-noise in “human digestible” quantities like degrees of polarization or angle of polarization is difficult.

6. Conclusion

The three-dimensional Stokes sampling distribution in spherical coordinates is given. From it the “Most Likely” and “Most Probable” classical (that is, non-Bayesian) estimators for the “true” Stokes parameters given measured values have been derived. Additionally, a two useful marginalizations of the sampling distribution have been calculated including a higher dimensional analog of the Rice distribution. These results are necessary stepping stones towards a full and proper treatment of polarization measurements.

It is cautioned that a deep understanding of the measurement of Stokes parameters in spherical coordinates like here or the polar coordinates usually used for linear polarization requires a Bayesian approach. The presented results may nonetheless have tolerable accuracy for many experiments.

While the setup of the problem and the presentation of the results has been framed in terms of the study of polarization, the solutions are in fact far more general. They are applicable to the measurement of any “source point” contained in a three-dimensional ball measured with Gaussian error in each Cartesian direction and presented in spherical coordinates. The results could be used to study the location of neutrino production within the Sun or the determination of the epicenter of an earthquake. The results could also influence the interpretation of cosmic microwave background polarization data. Extensions of the work removing the constraint that the error be equal in each direction or that involve correlated variables would be highly desirable.


This notation is fairly common in optical astronomy but other notations are also in use in different disciplines like solar astronomy or optical physics.


This somewhat technical assumption does not (especially at low signal-to-noise) imply QQ0, UU0, or VV0. It is a weaker condition used implicitly (but without recognition) by many important past papers on polarization statistics even though it is crucial for defining the reduced Stokes parameters as, for instance, qQ/I0 rather than the problematic qQ/I. Even more fundamentally, without the condition, studying a distribution over I, Q, U, and V instead of just Q, U, and V would be thrust upon us. That II0 does not imply the other approximations is easy to understand intuitively with back-of-the-envelope calculations by comparing the relative (Poisson) error on the intensity to the relative expected (Poisson) error on the Stokes parameters for some low degree of polarization source and some assumed moderate number of total counts for the measured intensity (like, for instance, a 1% source and 1000 total counts). The cited paper details the proper role of the condition in the theory.


In this set of equations, the arctangent function, arctan(y,x), is the two-dimensional version often supported in computer languages. Its value is assumed to vary continuously from π to + π on the unit circle starting from the negative x-axis counterclockwise again to the negative x-axis. Be extremely careful when implementing this function in computer code. Some languages switch the arguments so that it is arctan(x,y). Make sure your implementation gives valid angles for all quadrants and axes, especially if you use the single argument arctangent.


The function f is a scalar density and therefore gains a factor due to the Jacobian under a coordinate transform.


This argument, while “intuitive” for understanding the bias correction from a classical statistical perspective, can easily be misapplied in a Bayesian scenario. Nevertheless, when properly used, it is still an important tool to understand Bayesian results. The key difference is that in the classical approach, it applies to a single “true” point producing the measured points, whereas in the Bayesian approach it applies individually to all “possibly true” points (weighted by their likelihood).


It may seem strange to worry about the ϕ angle’s best estimate if but using a “proper” solution may have nicer continuity properties and be better for computer programs than, for instance, simply assuming or defining ϕ0,MP = π/ 2.


Some authors define erfc(x) without the factor.


This is a sensitive result and may require use of a program or library that allows the user to specify a higher level of precision than the native floating point size.


The zeroth-order term of the expansion is 2sinh(a) /a. If Eq. (23) is true, then it is the only non-zero term. It was proven that all the odd-order coefficients in the expansion are identically zero. It appears that the positive even-order expansion coefficients will also always be identically zero but this remains unproven.


Appendix A: Bounds on Ω(a, ϕ0)

It helps to have functional bounds on (A.1)to show, for instance, that it is at least non-infinite over the ϕ0-domain of interest. There are three factors (sin(ϕ), eacos(ϕ)cos(ϕ0), and I0(asin(ϕ)sin(ϕ0))) in the integrand, each of which is simple enough that they allow various bounds to be obtained for the total expression.

Appendix A.1: Lower bound

Under variation of ϕ0, the eacos(ϕ)cos(ϕ0) factor is minimized when cos(ϕ0) is the smallest, which occurs for ϕ0 = π, so that the factor becomes eacos(ϕ). Furthermore, the I0(asin(ϕ)sin(ϕ0)) factor is the smallest when its argument is zero, which also occurs at ϕ0 = π (also ϕ0 = 0) so that the factor become 0(0) = 1. After these substitutions, the integral for Ω can be performed explicitly: (A.2)For any given value of a, this sets a (constant) global lower bound on the integral: the value of integral must be greater than or equal to for all values of ϕ0. It is probably also the greatest lower bound for each ϕ0.

Appendix A.2: Upper bounds

The eacos(ϕ)cos(ϕ0) factor is maximized when its exponent is the largest while the I0(asin(ϕ)sin(ϕ0)) factor is the largest when the magnitude of its argument is as big as possible.

A constant global upper bound is found by modifying the factors in the obvious way: (A.3)A stricter (but non-constant) global upper bound can be found merely by allowing sin(ϕ0) → 1 in the Bessel function factor, (A.4)This last solution has functional dependence on ϕ0 and is not valid for ϕ0 = π/ 2 because of the sec(ϕ0) component. For ϕ0 = 0 this bound is and it decreases smoothly and approaches as ϕ0π/ 2+. Similarly, for ϕ0 = π it is and it decreases smoothly and approaches as ϕ0π/ 2. Of course, if ϕ0 = π/ 2, Eq. (A.1) can be accomplished directly: (A.5)so the discontinuity in Eq. (A.4) at ϕ0 = π/ 2 can be “removed”. Equation (A.1) can also be performed for ϕ0 = 0 and ϕ0 = π and in both cases results, so we have explicit upper bounds at those values too.

Appendix A.3: Best bounds

Combining the previous information, the best bounds are (A.6)which are finite for all ϕ0 at a given value of a.

All Figures

thumbnail Fig. 1

Illustration of the spherical coordinate system used in the paper. The variable θ is used for the azimuthal angle with a range of (−π,π ], and the variable ϕ is measured from the positive v-axis with a range of [ 0 ].

Open with DEXTER
In the text
thumbnail Fig. 2

Visualization of the bias correction field of Eq. (9). The tail of each arrow is located at a measured point and the head is attached to the corresponding “Most Probable” estimate of . This correction should only be applied to points lying in the blue region given by Eq. (10). See the text for discussion about points outside the blue region.

Open with DEXTER
In the text
thumbnail Fig. 3

Behavior of the bias correction field (Eq. (9)) on the boundary of the blue valid region (Eq. (10)). The left-most point on the boundary of the blue region is at .

Open with DEXTER
In the text
thumbnail Fig. 4

Contour plot of the angular-marginalized -distribution given by Eq. (24). The contours are at 0.1 intervals with the lighter shades corresponding to larger values. The dotted and dashed curves (Eqs. (25) and (26)) trace maximums along horizontal and vertical slices and end at the red points at and , respectively. Although the “shape” of the distribution is not affected by the value of σ, the plot assumes that σ ≤ 1/4 so that the plotted values for up to 4 (=1 /σmax) exist.

Open with DEXTER
In the text