Issue 
A&A
Volume 538, February 2012



Article Number  A65  
Number of page(s)  15  
Section  Astronomical instrumentation  
DOI  https://doi.org/10.1051/00046361/201015785  
Published online  03 February 2012 
Bayesian analysis of polarization measurements
Pontificia Universidad Católica de Chile, Departamento de Astronomía y
Astrofísica,
Vicuña Mackenna 4860, Macul
6904411
Santiago
Chile
email: jquinn@astro.puc.cl
Received:
19
September
2010
Accepted:
25
October
2011
Context. A detailed and formal account of polarization measurements using Bayesian analysis is given based on the assumption of gaussian error for the Stokes parameters. This analysis is crucial for the measurement of the polarization degree and angle at low and high signaltonoise. The treatment serves as a framework for customized analysis of data based on a particular prior suited to the experiment.
Aims. The aim is to provide a rigorous and selfconsistent Bayesian treatment of polarization measurements and their statistical error focused on the case of a single measurement.
Methods. Bayes Theorem is used to derive a variety of posterior distributions for polarization measurements.
Results. A framework that may be used to construct accurate polarization point estimates and confidence intervals based on Bayesian ideas is given. The results may be customized for a prior and loss function chosen for a particular experiment.
Key words: polarization / methods: data analysis
© ESO, 2012
1. Introduction
Measuring polarization presents several challenges. The most important are that a naive calculation of small polarization seems to be biased towards higher values (Serkowski 1958), the error bars of the measured polarization are nonsymmetric below a threshold signaltonoise (Simmons & Stewart 1985), and the angular distribution is nongaussian (NaghizadehKhouei & Clarke 1993). Given measured values of the Stokes parameters, what are the best estimates for the true values of the polarization degree and angle and how should error bars be assigned?
The polarization state of quasimonochromatic light may be stated in terms of I_{0}, Q_{0}, U_{0}, and V_{0} Stokes parameters which are defined as certain averages of the electric field along a pair of orthogonal axes perpendicular to the direction of the light’s propagation (see, e.g., del Toro Iniesta (2003) or Landi Degl’Innocenti (2002) for more information). I_{0} is the intensity, Q_{0} and U_{0} are related to linear polarization, and V_{0} is circular polarization. The Stokes parameters are subject to two conditions: (1)and (2)If incoming light is totally polarized, then the equality holds in Eq. (1), otherwise it is partially polarized or unpolarized. To keep the equations of manageable size, the rest of this paper makes the assumption that circular polarization is negligible (i.e., V_{0} = 0). It should be straightforward to generalize the results to include a nonzero V_{0}. This restriction is not very severe because circular polarization is small compared to linear polarization for many astrophysical sources. It will also be taken that the Stokes parameters are constant during the course of a single observation.
Stokes parameters are calculated from intensities that follow a Poisson distribution governed by photon counts. Clarke et al. (1983) and Maronna et al. (1992) have investigated this to make optimal estimates of them from repeated measurements. In general, however, a full Poisson treatment is computationally difficult when large intensities are involved. To avoid concerns about Poisson statistics, it is assumed that all intensities used to calculate Q and U (and also I) are large enough to be treated as having gaussian distributions.
The specifics regarding the calculation of I, Q, U and their errors (, , ) depend on the experimental design (Keller 2002). For example, if one is using a dualbeam polarimeter with a halfwaveplate, Patat & Romaniello (2006) present an optimal set of equations for calculating the Stokes parameters from the intensities of the ordinary and extraordinary beams at different waveplate angles. Since the assumption that these intensities are large enough to be approximated by a Gaussian is being made, this implies that one may use the usual Gaussian error propagation formula for , , and based on the errors for the individual intensities (square root of the counts) that went into their calculation. Regardless how I, Q, and U are actually calculated, they may be treated as logically separate (e.g., one could imagine one instrument measuring intensity with error in conjunction with completely separate instruments measuring Q and U with errors and , respectively). Hereon, the error on the measured total intensity, , is treated as being independent of and .
Most of the polarization error literature makes the further assumption that the measured Stokes parameters are normally distributed about their true values, Q_{0} and U_{0}, with standard deviation and . It is also usually assumed that the Q and U distributions are characterized by equal dispersion. Unfortunately, is not generally satisfied for arbitrary datasets. This assumption is a pragmatic one to avoid ellipsoidal distributions which would complicate the analysis. Careful design of the experiment can make it such that . Extension of the results to allow for unequal variance is reserved for future work. It is now assumed that and are equal and called Σ. The quantity only tangentially enters the analysis, leaving Σ as the lone important error quantity. In practice, Σ could be set equal to the average of and or the maximum value if one is more conservative.
The units on I, Q, U and , , are energy per time. Shortly, new variables will be defined that are unitless. They will be best interpreted as percentages or ratios and the issue of units vanishes.
2. Theoretical background
In this section, the various distributions that will be needed are derived. The distributions will be presented in several different coordinate systems because this is intended to serve as a reference and because it is exceedingly easy to miss factors introduced by the Jacobian.
2.1. The sampling distribution
2.1.1. Variable introductions and large intensity limit
The fundamental assumption of this paper is that the measurement of the (unnormalized) Stokes parameters, Q and U, which are assumed to be uncorrelated, is described by a twodimensional gaussian with equal standard deviation, Σ, in both directions. This distribution, F_{C}, is (3)(Most of the work for this paper will be done in polar coordinates. A Csubscript will be used for distributions in Cartesian coordinates.) The value of Σ is a positive constant assumed to be known precisely and, in practice, is estimated from the data. The parameters, Q_{0} and U_{0}, must be elements of a disk of radius I_{0} centered on the origin in the Q_{0}U_{0} plane. The range of both Q and U is (−∞,∞) due to measurement error although the probability of measuring a value outside a disk of radius I_{0} diminishes rapidly even for values of Q_{0} and U_{0} near the rim. Equation (3) is normalized ().
It is often helpful to work in normalized Stokes parameters. Define new variables q ≡ Q/I_{0}, u ≡ U/I_{0}, q_{0} ≡ Q_{0}/I_{0}, u_{0} ≡ U_{0}/I_{0}, and σ ≡ Σ/I_{0}. The scaled error, σ, is an admixture between the intensity and the mean QU error. It or its inverse, 1/σ =I_{0}/Σ, may be regarded as a measure of data quality (cf. , , and ). Small σ (large 1/σ) implies good data. The values for q_{0} and u_{0} are restricted to a unit disk centered on the origin. The new probability density, f_{C}, after the change of coordinates is (4)This equation is also normalized ().
The normalized Stokes parameters q and u present a minor problem. How can they be calculated if I_{0} is an unknown quantity? It might be thought that the definition of q and u should use the measured intensity, I, instead of the true intensity, I_{0}. If that alternative definition is used, however, the differentials of q and u are more complicated than just dq = dQ/I_{0} and du = dU/I_{0}. This makes a change of variables for the distribution difficult. It is better to keep the first definition and require that I_{0} ≈ I. This occurs when the total number of counts that went into the measurement of I is large, which also implies Σ_{I} ≪ I.
Another helpful set of variables are “signaltonoise” ratios. They are defined by , , , and . In these barred variables, the new distribution, , is (5)This time, the Jacobian causes the 1/σ^{2} leftover after the change of variables to disappear. This is normalized as well (); but, in this case, and are restricted to a disk of radius 1/σ.
2.1.2. Transformation to polar coordinates
The true polarization degree, p_{0}, and true angle on the sky, , may be defined in terms of the true (i.e., perfectly known) q_{0} and u_{0} Stokes parameters by (6)The true angle on the sky is related to the true angle in the q_{0} − u_{0} plane by the relation . Similarly, (7)and θ ≡ 2φ. The range of p is [0,∞) and the range of p_{0} is [0,1] . Define and . It is important to notice that the range of is now [0,1/σ] or [0,I_{0}/Σ] because in the Bayesian analysis, this will be the range of integration. The polarization angle defines the plane of vibration of the electric field on the sky. By convention, it is usually defined such that 0° corresponds to North and increases eastward such that East is φ = 90°.
Equation (4) may be converted to polar coordinates by inverting Eqs. (6) and (7). Using θ instead of φ, this yields q = pcosθ and u = psinθ and corresponding equations for the zerosubscripted true values. The normalized probability density distribution in polar coordinates (p,θ) is (8)A prime is used as part of the function’s name to serve as a warning that the angle in the q − u plane, θ, is being used as the angular variable. The factor of p in front enters because dq du = p dp dθ (i.e., due to the Jacobian of the transformation). Some authors choose to leave the factor of p off of the p, θdistribution and insert it manually when needed. Probability distributions transform as a scalar density under coordinate transformations so they should “pickup” the Jacobian factor. Caution is needed when reading the literature.
The distribution in terms of the sky angle is (9)and in barred variables, (10)The distribution is normalized under the support [0,∞) × (−π/2,π/2] (i.e., ). The origin has infinitesimal measure so usually it does not contribute finitely to an integral. In later sections, integration over delta functions (aka delta distributions) centered at the origin will occur. This requires some formality in the polar coordinate definition. The parameter space technically has (p = 0,φ) identified for all φ as the origin and (p,φ) is identified with (p,φ = φ + nπ) for all p > 0 and any integer n. The resulting quotient space has as a fundamental cell. Functions on this space are periodic in φ and they must satisfy . Here, “mod” is the modulus operator. The π/2 operations perform shifts needed because the angles are defined as instead of [0,π). This formality is sometimes important in calculations, like median estimates of φ_{0}, where it is natural to have the range of integration extend outside of the domain.
Fig. 1 Contour plots of (Eq. (10)) for a few different values of p_{0}/σ with . The dots indicate the maximums of the distributions. Contour spacings are at intervals of 0.02. 
Some intuition is gained by examining the distribution in Eq. (10) for differing values of p_{0}/σ. Figures 1a–c show the distributions for and p_{0}/σ = 0.0, 0.5 and 2.0, respectively. Choosing a different value of , for a given value of p_{0}/σ, only translates the distribution in the φdirection and does not change its shape; therefore, all plots simply use . When p_{0}/σ = 0 the angular distribution is flat and much of the probability is concentrated in a band around p/σ = 1. As p_{0}/σ increases, an ovalshaped “probability bubble” forms. This shape persists even at large values. The probability under the distribution to the left and right of the maximum is asymmetric, with more to the left. As p_{0}/σ continues to increase, the probability on each side of the maximum approaches 0.5. This is true even though the oval shape stays present.
The sampling distribution for N measurements is simply a product of those for individual measurements when the measurements are independent of one another, (11)In this notation, (p,φ)_{n} denotes the nth measurement and { p,φ } _{N} is the whole set of measurements. This paper focuses on the case of a single measurement, such that ℱ = f.
2.2. The “most probable” and maximum likelihood estimators
Two classical analytic estimators for the unknown parameters are easily found from the sampling distribution. These are the maximum likelihood and “most probable” estimators. Later new estimators will be introduced using the posterior distribution.
If is viewed as a function of p_{0} and for a fixed p and φ, it is called the likelihood function. Solving the system and yields the maximum likelihood estimate (ML), (12)This solution is not particularly useful as it does not correct for bias. There is potential for confusion here: if the same construction is applied to the sampling distribution marginalized over the angle (i.e., Rice distribution), the maximum likelihood estimator does correct for some bias. The Rice distribution will be covered in Sect. 2.3.
Wang et al. (1997) suggest simultaneously maximizing Eq. (8) with respect to p and θ to estimate p_{0} and . This yields another estimator sometimes called the “most probable” estimator (MP) in the literature (or Wardle and Kronberg estimator when applied to the Rice distribution). The system and has the following solution for a maximum: (13)where h(x) is the Heaviside step function. The Heaviside function can be motivated by contemplation of Fig. 1a. Equation (13) may be used to find the maximums in Fig. 1. It is worth noting that this formula is also expressible as , which may be more practical in data reduction pipelines, and in terms of the barred variables.
The goal is to replace these estimators with new ones based on Bayesian ideas and Decision Theory. The ML estimate continues to be relevant in that context because it is equal to some Bayesian estimators when a uniform prior is used.
2.3. The Rice distribution
The Rice distribution (Rice 1945), R, is the marginal distribution of p over f, (14)This integration may be accomplished using the integral so that (15)Here, ℐ_{0}(x) denotes the zerothorder modified Bessel function of the first kind^{1}. It is an even real function when its argument is real and ℐ_{0}(0) = 1. The Rice distribution does not depend on or φ after the integration over φ so they have been dropped from the notation.
Fig. 2 Two contour plots of the Rice distribution (Eq. (16)). The contours are spaced by 0.1 intervals. The value of σ is mostly unimportant for . It does not affect the shape of the distribution. The only purpose of mentioning it is to insure exists for values above 1/σ. 
In barred variables, the Rice distribution is (16)This is shown as a contour plot in Fig. 2a. In this plot, is plotted against and . The contours are at 0.1 intervals. There are two special points in the plot. A global maximum of 1/ is reached at (1,0) (black dot) and a critical point at (gray dot) is found if maximums along vertical slices are examined. Recall and . The maximum along a vertical slice in the interval lies on the p/σaxis while for values greater than it lies above the axis. Tracing the maximum of the vertical slices for implicitly defines a curve which is given by (17)This is the ML estimator of Simmons & Stewart (1985) (with a factor of i corrected). ℐ_{1}(x) is the firstorder modified Bessel function of the first kind. It is an odd real function when its argument is real. Similarly, horizontal slices define a curve via (18)This is the Wardle and Kronberg estimator (Wardle & Kronberg 1974). These are the two dashed curves in the plot. The curve for the horizontal slice maximums is the one that terminates at the global maximum. The red dotted line is the Wang estimator from Eq. (13). There is a significant difference between the Rice distributionbased estimators and the twodimensional estimator. For large values of , it estimates a smaller value of lower than the two other curves.
Figure 2b plots the Rice distribution with the full range of when σ = 1/8. Except near the origin, much of the probability is distributed in a diagonal linear band that continues uninterrupted until the maximum value of . Later it will be seen that the posterior distribution becomes nonlinear for large values of polarization.
2.4. The posterior distribution
The distribution given by Eq. (9) is the distribution of the measured values given true values as input parameters. In practice, one usually wishes to estimate the true values from the measured values. This is accomplished through the posterior distribution, B, given by Bayes Theorem provided one accepts some prior distribution, , for the model parameters. The posterior distribution (for a single measurement) is, (19)Care must be taken with the limits of the p_{0} integration and it must always be remembered if one is using the unnormalized, normalized, or barred variables. The barred version is (20)with . It is critical to notice the upper limit of the integration of is 1/σ, which equals I_{0}/Σ. It is not infinity as is sometimes used. In practice, however, many of the integrals over that occur in Bayesian polarization equations, tend to be extremely insensitive to the value of 1/σ so long as it is above a value of about 4. Thus, using infinity as the upper limit often produces a very reasonable approximation. This has the added advantage that integrals sometimes have explicit closed forms when otherwise they might not.
There is great freedom in defining a Bayesian prior. The function is ostensibly required to be a probability distribution. This necessitates that κ(p_{0},φ_{0}) be nonnegative everywhere; however, in most situations, the additional requirement that the distribution be normalizable may be relaxed as the normalization constant cancels. The choice of prior will be further discussed in Sect. 3.
Equation (20) is shown in Figs. 3 and 4 for I_{0}/Σ = 100 and two different choices of prior. Figure 3 uses and Fig. 4 uses . These priors are discussed in more detail in Sect. 3. Each plot shows several different values of p/σ. As before, changing the value of φ merely shifts the distribution along the φ_{0} axis so φ = 0 has been used. In Figs. 2.5 and 4a, p/σ = 0 and there is no preference for any over another. In Figs. 2.5 and 4b, p/σ = 0.5. Here a probability bubble is beginning to form. In Figs. 2.5 and 4c, p/σ = 2.0 and the probability bubble is now fairly mature. The panels of Fig. 4 are similar. Notice that Fig. 3 is nearly identical to the distribution plotted in Fig. 1. There is a very weak dependence of on σ that becomes larger for larger values of p_{0}. If p_{0} were plotted near the maximum, the plots would differ radically. When 1/σ → ∞, the plots do become identical. The similarity (and differences) will be discussed later in conjunction with the onedimensional, marginalized version of this plot.
2.5. Simple Bayesian estimators
Bayesian estimators based on the mean, median, and mode can be defined for the posterior distribution.
The mode estimator (usually called the MAP or maximum a posteriori estimate) is the maximum of the posterior distribution. If any delta function is used in the prior, this causes delta functions to appear in the posterior distribution, which trumps any finite maximum of the posterior distribution and renders the mode estimate somewhat useless without further modification. If no delta functions and a uniform prior is used, then the mode estimate is the same as the ML estimate (Eq. (12)).
The median estimator, , is the value such that (21)This estimator will usually have to be found numerically. A median estimator is usually easy to find from symmetry considerations but for our parameter space must be defined as (22)Under special circumstances, may not be unique but after a nonzero measurement of p, it will generally be the case that it is.
The mean estimators are the values (23)and (24)These three estimators will be different in general.
Fig. 3 Contour plots of (Eq. (20)) for a few different values of p/σ with φ = 0 assuming a Jeffreys prior () and I_{0}/Σ = 100. The dots indicate the maximums of the distributions. Contour spacings are at intervals of 0.02. This figure is nearly the same as Fig. 1 with the roles of p and p_{0} reversed. While for small values of p/σ and p_{0}/σ the plots appear identical, they are not. Their similarity diverges for larger values of signaltonoise, as they must since p_{0}/σ has a maximum value while p/σ does not. 
Fig. 4 Contour plots of (Eq. (20)) for a few different values of p/σ with φ = 0 assuming a uniform polar prior () and I_{0}/Σ = 100. The dots indicate the maximums of the distributions. Contour spacings are at intervals of 0.02. 
2.6. Bayesian decision theory
The posterior distribution contains all information about the relative likelihood of the model parameters. It represents what has been learned from the observation. For many problems it is usually desired to summarize the posterior distribution by a few statistics such as an estimate of the “best” value and some confidence interval (often called “credible sets” in this context). In Bayesian Decision and Estimation Theory determining the “best” estimate of the model parameters requires defining a loss function and a decision rule (there are many good standard texts offering much more detail such as Berger 1985; and Robert 1994). The loss function, L, assigns a weight to deviations from the true value such that measured values that are far away from it incur a greater penalty than those that are closer. The decision rule assigns an estimate of the true value based upon the data, which will usually be the estimate that minimizes the expected posterior probable loss.
Deciding upon a loss function is one of the toughest parts of a Bayesian analysis. The three most common are squarederror loss, absolute deviation loss, and the “0–1” loss. These loss functions have as their solutions for each variable the mean, median, and mode values of the posterior, respectively (Robert 1994).
The squareddeviation loss function for the polarization problem may be defined as the squared distance between the true value (q_{0},u_{0}) and the best estimate (q_{a},u_{a}) under a Euclidean metric (scaled by σ^{2}), (25)Transforming Eq. (25) to polar coordinates gives, (26)The posterior expected loss, Z, under B is (27)Loss can be minimized by solving and and checking that the solution is a minimum via the second derivative test. These two equations are difficult to solve completely but it can be shown that if κ(p_{0},φ_{0}) = κ(p_{0}) then (28)and (29)are a solution and when p_{a},p_{0} > 0 so the solutions are a minimum.
The absolute loss function is (30)It will be assumed that the usual median estimators in polar coordinates are in fact a solution to this loss function.
The “0–1” loss function is (31)It will also be assumed that the mode is the solution of this loss function. When the prior contains a delta function, this estimator is not very useful.
2.7. Posterior odds
An analog of hypothesis testing in Bayesian analysis is the comparison of posterior odds. In the simplest case, suppose one wishes to test the hypothesis that the model parameters lie in some subset Θ_{1} of the parameter space. This is accomplished by integrating the posterior distribution over the subset. If the probability is greater than 50%, the hypothesis is accepted. If it is less than 50%, it is rejected. The simplicity of these tests is one of the benefits of the Bayesian approach. The subset need not consist of more than a single point but in that case the associated probability will usually be infinitesimal unless distributions are allowed for priors. In the next section, priors will be introduced that use a delta function at the origin to represent the probability that a source has negligible polarization. After the measurement, one may compute the posterior odds for the origin (a single point) to test if the measurement is consistent with zero. One of the benefits of is that it naturally performs this test for a delta function at the origin.
Two probabilities are of interest, the probability that p_{0} is zero and the probability that p_{0} is greater than zero. In mathematical form, (32)and (33)Since the distribution is normalized, only Eq. (32) need actually be calculated because Prob(p_{0} = 0) + Prob(p_{0} > 0) = 1. If Prob(p_{0} = 0) > 1/2, then it is more likely the source has negligible than appreciable polarization. If Prob(p_{0} = 0) < 1/2, then it is more likely the source has appreciable than negligible polarization.
It is now time to discuss choosing a prior so that these equations may be used.
3. Elicitation of priors
In Bayesian statistics, the choice of the prior distribution is nontrivial and can be very controversial because it is frequently subjective and based on previous experience for the problem at hand. The purpose of the prior density is to represent knowledge of the model parameters before an experiment. One common situation is where one has no prior knowledge of their values and wishes to construct a prior to represent this ignorance. Such priors are called “objective”. Constructing objective priors can be extremely subtle. The Bertrand Paradox illustrates that the method of producing or defining a “random” value can have important consequences for the resulting probability distributions (Bertrand 1888). In general, the solution to the Bertrand Paradox is that “real randomness” requires certain translation and scaling invariance properties (Jaynes 1973; Tissier 1984; Di Porto et al. 2010). In practice, such objective priors may contain too little information to satisfactorily model the phenomena being studied.
In general, the polarization state of an unresolved source at a particular wavelength is described by a point of the Poincaré sphere, which is the threedimensional set of the physically possible values for the q, u, and v Stokes parameters given an intensity. An assumption was made in the introduction that circular polarization is negligible. This is a physical assumption which influences the meaning of the polarization state of a “random” source. The resulting q − u distribution of random sources is not to be confused with the expected q − u distribution of a random source whose v component was ignored. For a uniform distribution of points within the Poincaré sphere, this latter distribution would be the density distribution obtained by projection onto the q − u plane, which would have a larger density near the origin. This distribution is important and interesting but its threedimensional motivation is somewhat outside twodimensional scope of this work. A brief discussion of it is given in Appendix B.
Investigation is still required into the meaning of “randomness” even under just the V = 0 pointofview. It is necessary to understand what role the choice of coordinates plays and how “uniform” does not imply “random”.
3.1. Uniform prior in Cartesian coordinates (the Jeffreys prior)
The prior most likely to coincide with a person’s intuitive notion of a “random point” in the unit disk is a uniform distribution in q_{0}u_{0} coordinates is (34)In polar coordinates, this is equivalent to (35)The barred versions are always . Both distributions are normalized. Random points generated using this distribution have an expected density that is constant across the unit disk (see Fig. 5a). One aspect of this prior is that a randomly generated point will generally be more likely to have a strong polarization than a weak one because an annulus associated with a large radius has more area than for a smaller radius.
It can be shown that the prior given by Eq. (34) is the normalized Jeffreys prior for the polarization problem. The Jeffreys prior is an objective prior constructed from the square root of the determinate of the Fisher information matrix (Jeffreys 1939, 1946; Robert et al. 2009a,b). One of the features of the Jeffreys prior is that it is invariant under a reparametrization. As an objective prior, the Jeffreys prior is in some sense the prior desired when absolutely no previous knowledge of the model parameters is known.
While the objectivity of this prior is admirable, in practice, most astronomical sources are not expected to have very large polarizations (masers can be a notable exception); so, in many cases, it is undesirable that the polarization be more likely to be large than small. In this respect, this prior may not be a good choice for some astrophysical observations despite its objectivity: it simply contains too little information to be suitable and overestimates our lack of knowledge and previous experience.
Fig. 5 A comparison of “random” points generated with the Jeffreys prior and the uniform polar prior. 
3.2. Uniform prior in polar coordinates
The uniform prior in polar coordinates captures some features that are usually desired for astronomical sources than the previous case. Its (normalized) formula is (36)While this prior may be “uniform” in p_{0} − φ_{0} space (see Fig. 5b), it is definitely not uniform in q_{0} − u_{0} space. In fact, in that space, the density of points produced randomly by this prior increases near the origin, as is shown in Fig. 5c. This is a subjective prior in the sense that it prefers points closer to the origin.
3.3. The addition of a delta term at the origin
One subtle feature of the previous priors is that they do not allow for a finite probability that an object’s polarization is exactly zero. Instead there is only a finite chance that the polarization will be in some interval that includes zero. A Dirac delta function at the origin may be used to allow for the possibility that an object truly has zero polarization. “Exactly zero” polarization may be interpreted as equivalent to the statement that there is a large probability that the polarization is negligible. In this case, the delta function then becomes just a mathematical tool to replace a probability density that rises extremely sharply at the origin in such a way to bound an appreciable probability.
When one is investigating if some class of objects are polarized, it seems reasonable to assign a 50% chance to the probability that the object is unpolarized and a 50% chance to it being polarized. More generally, the percentage can be parametrized, such that the probability of zero polarization is A and greater than zero polarization is (1 − A). Let the prior consists of two components: a normalized delta function term at the origin, , and a normalized nondelta function term, . The general form for the prior can now be written as (37)or (38)The delta term must be (39)respectively. The two versions in Eq. (39) have the same form because the delta function transforms in a way that cancels the σ’s that typically will appear in the barred version. Notice too that the p_{0}delta function occurs at the lower limit of the p_{0} integration. This is a somewhat quirky situation but by remembering that delta functions are constructed as a limit of a sequences of functions it is easily seen that for α > 0. Taking into account the factors of onehalf that arise in this way, it can be shown that Eqs. (38) and (39) are both normalized properly^{2}. A delta function could also be placed at any other point in the disk of physically possible values but the origin is unique as it is the only case that preserves rotational symmetry.
The previous priors with the addition of a delta term become (40)and (41)These priors simplify to the more typical cases when A = 0. The delta term however is an interesting addition. It is only with the addition of the delta term and the use of the median estimator that one is allowed to state the source is consistent with no polarization from the posterior distribution. It seems most reasonable to assign A the value 0.5 when it is unknown if the source is polarized. Another interesting choice is . This value causes the Prob(p_{0} = 0) for the prior to be equal to the probability density of the nondelta function component. There is a temptation to say that it recaptures some translation invariance properties that are otherwise lost upon the introduction of the delta term. It is unclear to the author if this has any mathematical significance or indeed how to justify a particular choice of A in general. Using A = 0 is the most conservative choice and is probably the most appropriate in many experiments unless a nonzero A is explicitly required. A nonzero value of A, as will be seen, can cause the median estimator of to be zero below some critical value of . One way of interpreting this is that the median estimator determines what value of polarization must be measured to convince a person who is skeptical of polarization with degree A that a source is actually polarized. Ultimately, it is up to the researcher to decide the best prior for the experiment.
Under Eq. (38), the posterior distribution reduces to (42)Figures 3 and 4 were special cases of this formula.
3.4. Other constructions
Bayesian statisticians have invented a number of other techniques for constructing priors such as using conjugate families and building hierarchical models. These are not covered.
4. Marginal distributions of polarization magnitude
The marginal posterior distribution, S, of p_{0} over B is given by (43)The denominator of B is a function of p and φ only and may be pulled through the integral. The numerator becomes . No further simplifications can be made unless a prior is chosen. While not necessary, it will frequently be the case that priors for a first measurement are independent of angle, that is, κ(p_{0},φ_{0}) = κ(p_{0}). If so, then it is easy to show that (44)(An integration of the sampling function under consideration, , arrives at the same result, Eq. (15), regardless if the integration is over φ_{0} or φ due to the symmetry in those variables). The resulting new posterior, which may be treated as a prior for the next measurement, will not, in general, be independent of angle. The condition therefore does not extend to multiple measurements. Those are not the focus of this investigation and none of the κpriors examined so far vary with angle. Equation (44) turns out to be exactly the same form as the posterior Rice distribution, which is now examined.
4.1. Posterior Rice distributions
Vaillancourt (2006) uses a Bayesian approach to argue for the use of the posterior Rice distribution to calculate the maximum likelihood and error bars on measured polarization. This special case was the motivation for the more general theory of this paper.
Introduce new onedimensional prior τ(p_{0}). The posterior Rice distribution, ρ, is (45)The conversion to barred priors is given by . The barred version of the formula is (46)The objective prior in one dimension is τ(p_{0}) = 1 so technically the onedimensional Jeffreys prior and the uniform polar prior are the same. As can be seen from Eqs. (44) and (45), when κ(p_{0}) is independent of angle, S and ρ have the same form with κ(p_{0}) playing the role of τ(p_{0}). The “onedimensional” version of the twodimensional Jeffreys prior is τ(p_{0}) = 2p_{0} and the onedimensional version of the twodimensional uniform polar prior is τ(p_{0}) = 1. There is only an immaterial factor of 1/π difference between κ(p_{0}) and τ(p_{0}). In a slight abuse of terminology, τ(p_{0}) = 2p_{0} will still be called the “Jeffreys prior” despite switching to the onedimensional notation. The twodimensional theory is still within context because of Eq. (44). In barred variables, the priors are and , respectively.
4.1.1. Mean estimator for the posterior Rice distribution
The mean estimator, , is given by (47)Its value is always greater than zero for any given and will usually have to be found numerically.
4.1.2. Median estimator for the posterior Rice distribution
The median estimator, , for the posterior Rice distribution is the value of the upper integrand such that (48)This expression will also have to be solved numerically in most cases.
4.2. Selected plots of the posterior Rice distribution
Figure 6 shows two versions of the posterior Rice distribution using the Jeffreys prior, . Figure 6a shows the distribution near the origin and assuming 1/σ = 100. Figure 6b shows the full range of assuming 1/σ = 8. This figure exhibits several key results. For the Jeffreys prior, the posterior Rice distribution is similar to just the transpose of the Rice distribution. This similarity breaks down at values of near or exceeding the maximum value of . The mean (black) and median (blue) estimator curves, are very similar. They are also similar to the (transposed) onedimensional Rice distribution estimator curves but that similarity also breaks down at near the maximum value of .
Fig. 6 Two contour plots of the posterior Rice distribution (Eq. (46)) with and two values of σ. The contours are spaced by 0.1 intervals. 
Fig. 7 Two contour plots of the posterior Rice distribution (Eq. (46)) with and two values of σ. The contours are spaced by 0.1 intervals. 
Fig. 8 These four panels show the posterior distribution with prior for four combinations of value of A and σ. The blue dotted line is the median estimator. The black dashed line is the mean estimator. The red, dashed line at is a reminder of the finite amount of probability that exists along the axis due to the delta function term. No such line is needed in Figs. 6a and 6b because A = 0 there and the delta term does not contribute. This probability along the axis is plotted underneath each contour plot. 
Fig. 9 These four panels show the posterior distribution with prior for four combinations of value of A and σ. The blue dotted line is the median estimator. The black dashed line is the mean estimator. The red, dashed line at is a reminder of the finite amount of probability that exists along the axis due to the delta function term. No such line is needed in Figs. 7a and 7b because A = 0 there and the delta term does not contribute. This probability along the axis is plotted underneath each contour plot. 
Figure 7 uses and two different values of σ. Figure 7a is nearly the same plot as presented in Vaillancourt (2006), which used τ(p_{0}) = 1 and the unstated assumption that I_{0}/Σ → ∞, which allows direct integration of the denominator of Eq. (46) at the expense of having an unnormalizable prior. While this approximation will be acceptable most of the time when the signaltonoise is large, it is unphysical. The true value of cannot exceed 1/σ (=I_{0}/Σ), which is finite. This is best seen by plotting the full range of as is done in Fig. 7b for I_{0}/Σ = 8 (a small value). There it is seen that for values of that are near or greater than the maximum possible value of (8 in this case), the probability density function “crowds up” near the maximum value. This time the mean (black) and median (blue) curves are rather different. The median curve prefers value closer to . The median estimator seems to be a better statistic in this case as the mean estimator curve is affected too much by outlier possibility.
A delta term may also be added to the prior as was previously motivated. Let τ(p_{0}) = Aτ_{1}(p_{0}) + (1 − A)τ_{2}(p_{0}), where τ_{1}(p_{0}) is normalized a delta term at the origin and τ_{2}(p_{0}) is a normalized component without any delta terms. The normalized barred prior is . This time τ_{1}(p_{0}) = 2δ(p_{0}) and . Under this prior, Eq. (46) reduces to (49)Here, the function arguments have been dropped for space. For the twocomponent prior, the definition of the median estimator (Eq. (48)) leads to the following expression (50)Figures 8 and 9 show Eq. (49) for the Jeffreys and uniform priors, respectively, and four different combination of values of A and σ each. The red, dashed lines at are a reminder that Eq. (49) contains a delta term that cannot be plotted in the usual fashion. Finite probability is attached to each point along the red line and is graphed beneath each contour plot. This is critical to fully understand these plots. Several new features are seen for nonzero A. In Fig. 8 for the Jeffreys prior, the probability band seems to bend towards the axis (Fig. 8a) but in actuality abruptly turns and heads towards the axis (Fig. 8b). This behavior is however strongly damped by residual probability that the polarization is actually zero (as in Figs. 8c and d). Figure 9 for the uniform polar prior is similar except this time the probability band, after heading towards the axis as well, turns towards the origin if the behavior is not damped by the probability that the polarization is zero (as in Figs. 9c and d).
The mean and median estimator curves in Figs. 8 and 9 have changed with the introduction of a nonzero A. The mean curve is “pulled” more towards the axis at small values of for larger values of A or smaller values of σ. The median curve has a totally new feature: it intercepts the axis at a critical value, , and is zero for smaller values. This occurs when the righthand side of Eq. (50) equals zero. Tables of some critical values, including those corresponding to the figures, are presented in Appendix A.
When A = 0 and σ = 0.01, a contour plot of Eq. (49) for the Jeffrey prior reproduces Fig. 6a. This plot resembles the Rice distribution in Fig. 2a, only transposed! Figure 8b is a good example of the transitional form that occurs in the contour shape between the standard A = 0 form and the form seen for larger values of A and/or smaller values of σ. This is an important result. It means that the Rice distribution results can effectively be transposed to find best estimates of for small values of polarization when 1/σ → ∞.
When A = 0 and σ = 0.01, a contour plot of Eq. (49) for the uniform prior reproduces Fig. 7a. There is no delta function and the global maximum of this distribution occurs at the origin. The wrenchlike shape characteristic of this plot arises from small values of A for a given value of σ. Figure 9b is a good example of the transitional form.
All the usual mathematical machinery may now be applied to to produce estimators for given a measured value of .
Previously “bias corrections” were performed on measurements at low signaltonoise to make them closer to zero. As is seen in the figures with A = 0, the mean and median estimators seem to act for very small values of p like a “bias correction” that goes the wrong way! (The median estimator curves with paxis intercepts are an exception.) Bias corrections should not be used. A nonzero value for the estimate of p_{0} even when p = 0 is not a deficiency with the estimators. It is a result that can be understood intuitively when all possible values of p_{0} and φ_{0} that could have produced a given p and φ are considered.
5. Marginal distributions of polarization angle
The marginal posterior distribution, T, of φ_{0} is given by (51)As before, the denominator of B is a function of p and φ only and may be pulled through the integral. The numerator becomes . If the prior is a function of p_{0} to some nonnegative integer power n (i.e., , there appears to exist a family of solutions via repeated integration by parts. Explicit solutions were found up to n = 10 using the computer algebra system Mathematica. Beyond n = 1, these solutions quickly become impractically large.
The marginal distribution, G, of φ over f has been investigated by several authors (Vinokur 1965; Clarke & Stewart 1986; NaghizadehKhouei & Clarke 1993). Its defining formula in terms of the sky angle is Unfortunately, the solution to this equation in Clarke & Stewart (1986) has some sign mistakes and the derivation in Appendix B of NaghizadehKhouei & Clarke (1993) confuses the sky angle, φ, with the angle in the q − u plane, θ. It is worth explicitly stating both versions to avoid confusion: (52)and (53)where and . It is useful to notice that the distribution is symmetric about φ_{0} and using a value of φ_{0} different than zero only translates the distribution along the angular axis. Therefore when calculating confidence intervals, only φ_{0} = 0 need be used. Unlike the Rice distribution, which is independent of the angular variable, this distribution depends on p_{0}. G is normalized (i.e., ).
The distribution T cannot in general be related to G as S was to R through ρ, even if the prior is independent of angle, because f lacks the symmetry in p and p_{0} that it has for φ and φ_{0}. T and G may have different form. If however κ(p_{0},φ_{0}) ∝ p_{0}, as with the Jeffreys prior, it is easy to show that T → G as 1/σ → ∞.
Figure 10 compares T to G for the Jeffreys prior, i.e., , and I_{0}/Σ = 100 and three different signaltonoise values. As expected, the T distribution and the G distribution are almost identical because 1/σ is fairly large. The two sets of curves overlap in the figure but the T curves are not precisely identical to the G curves. The difference between the T and G curves is larger if 1/σ is small or for polarization measurements near 1/σ. In practice, the classical error bars for the angle based on G are perfectly acceptable under this prior.
Figure 11 compares T to G for the case of a uniform polar prior, i.e., , and I_{0}/Σ = 100. In the figure, T (blue) is plotted for φ = 0 and p/σ = 0.5 (dotted), 1.0 (solid), 2.0 (dashed) and G (red) is plotted for and p_{0}/σ = 0.5 (dotted), 1.0 (solid), 2.0 (dashed). The figure clearly demonstrates that using G, the nonBayesian formula, to compute errors bars results in errors bars that are too small. This is especially important when the polarization degree to σ ratio is about one.
If a delta term is added to the priors as was done with the polarization magnitude plots, the result can roughly be described by saying that the distribution is scaled by (1 − A) and it is then shifted upwards by a constant of A/π. This constant shift occurs because the origin has an indeterminate angle. The equation in barred variables is (54)When A = 1.0, the distribution is flat with value 1/π, as it must be to be normalized.
Fig. 10 This plot compares the classical angle distribution, G(φp_{0},φ_{0},σ), to the Bayesian distribution, T(φ_{0}p,φ,σ), with and I_{0}/Σ = 100. The Bayesian curves are virtually identical to the classical curves and overlap in the figure. Both plots simply use zero for the angle since the choice is immaterial. Three different values for the signaltonoise, i.e., p_{0}/σ (G) or p/σ (T), are used: 0.5 (dashed lines), 1.0 (solid lines), and 2.0 (dotted lines). 
Fig. 11 This plot compares the classical angle distribution, G(φp_{0},φ_{0},σ) (red), to the Bayesian distribution, T(φ_{0}p,φ,σ) (blue), with and I_{0}/Σ = 100. The Bayesian approach now suggests the classical approach underestimates the error bars. Both plots simply use zero for the angle since the choice is immaterial. Three different values for the signaltonoise, i.e., p_{0}/σ (G) or p/σ (T), are used: 0.5 (dashed lines), 1.0 (solid lines), and 2.0 (dotted lines). 
6. Summary of derivations
Figure 12 is a flow chart to help remember the relationships among the many distributions presented. All the main functions result from a change of variables, Bayes Theorem, or marginalization (denoted by “Marg.” in the figure). Attempts were made to remain consistent with the function names used in previous published papers when possible.
7. Confidence intervals (aka credible sets)
All the pieces needed to construct confidence intervals (known as “credible sets” in Bayesian statistics) for p_{0} and have been given. In general, these sets will be twodimensional. Usually they will be constructed so as to contain the “best” estimate ( and ) that was found. In practice, especially with polarimetric spectra, using twodimensional sets as “error bars” is impossible. In these cases, the marginal distributions, T and S, may be used to construct error bars.
Highquality confidence interval tables, suitable for use in data reduction pipelines, will be the subject of a future paper.
8. Conclusion
A foundation for a Bayesian treatment of polarization measurements has been presented based when V = 0. The treatment covers the regime where the measured polarization has large or small Stokes parameters but all the intensities that were used in the calculation of the Stokes parameters are large enough that Poisson statistics is unimportant. This justifies the use of a gaussian approach for the Stokes parameters’ error.
A Bayesian analysis of polarization allows the observer some freedom in choosing a prior and loss function. These should be stated and justified when presenting one’s results. In particular, the importance of choosing an appropriate prior has been stressed.
The transposed Rice distribution is intimately connected with the posterior distribution with a twodimensional Jeffreys prior. As 1/σ → ∞, the distributions are the same. The classical angular distribution is also the same as the posterior angular distribution in this limit. A uniform polar prior may be better for sources with small expected polarization. In this case, the posterior distributions are rather different than the classical ones.
Fig. 12 This figure is a flow chart to help remember the relationships between the distributions presented. “Marg.” stands for marginalization over the accompanying variable. 
Further advances may be made by generalizing the analysis to include ellipsoidal, perhaps even correlated, distributions for Q and U, Poisson statistics, circular polarization, or repeated measurements. Monte Carlo methods will likely be necessary to handle the integrations for more general approaches. Polarimetric spectra and time series data that involve measurements that are not independent and may be highly correlated; modeling the wavelengthdependence or timeevolution could further improve results in those situations.
The results presented have focused on the interpretation of the quantities p_{0} and φ_{0} as polarization degree and angle but the formulas are applicable to any application that uses a bounded radial function of the form , where x and y are measured with equal gaussian error.
Acknowledgments
The authors acknowledge support from the Millennium Center for Supernova Science (MCSS) through grant P10064F funded by “Iniciativa Científica Milenio”. The contour plots were generated using computer algebra system Mathematica (version 7.01.0). The authors wish to thank Alejandro Clocchiatti and Lifan Wang for helpful discussion and the anonymous referee for valuable comments.
References
 Berger, J. O. 1985, Statistical Decision Theory and Bayesian Analsyis, 2nd ed. (Springer) [Google Scholar]
 Bertrand, J. 1888, Calcul des probabilités (Paris: GauthierVillars) [Google Scholar]
 Clarke, D., & Stewart, B. G. 1986, Vist. Astron., 29, 27 [NASA ADS] [CrossRef] [Google Scholar]
 Clarke, D., Stewart, B. G., Schwarz, H. E., & Brooks, A. 1983, A&A, 126, 260 [NASA ADS] [Google Scholar]
 del Toro Iniesta, J. C. 2003, Introduction to Spectropolarimetry (Cambridge University Press) [Google Scholar]
 Di Porto, P., Crosignani, B., Ciattoni, A., & Liu, H. C. 2010 [arXiv:1008.1878] [Google Scholar]
 Jaynes, E. T. 1973, Foundations of Physics, 3, 477 [NASA ADS] [CrossRef] [Google Scholar]
 Jeffreys, H. 1939, Theory of probability (Oxford University Press) [Google Scholar]
 Jeffreys, H. 1946, Proc. Royal Society of London Series A, Mathematical and Physical Sciences, 186, 453 [Google Scholar]
 Keller, C. U. 2002, in Astrophysical Spectropolarimetry, ed. J. TrujilloBueno, F. MorenoInsertis, & F. Sánchez, 303 [Google Scholar]
 Landi Degl’Innocenti, E. 2002, in Astrophysical Spectropolarimetry, ed. J. TrujilloBueno, F. MorenoInsertis, & F. Sánchez, 1 [Google Scholar]
 Maronna, R., Feinstein, C., & Clocchiatti, A. 1992, A&A, 260, 525 [NASA ADS] [Google Scholar]
 NaghizadehKhouei, J., & Clarke, D. 1993, A&A, 274, 968 [NASA ADS] [Google Scholar]
 Patat, F., & Romaniello, M. 2006, PASP, 118, 146 [NASA ADS] [CrossRef] [Google Scholar]
 Rice, S. O. 1945, Bell Systems Tech. J., 24, 46 [CrossRef] [Google Scholar]
 Robert, C. P. 1994, The Bayesian Choice: A DecisionTheoretic Motivation (Springer) [Google Scholar]
 Robert, C. P., Chopin, N., & Rousseau, J. 2009a, Statist. Sci., 24, 141 [Google Scholar]
 Robert, C. P., Chopin, N., & Rousseau, J. 2009b, Statist. Sci., 24, 191 [CrossRef] [Google Scholar]
 Serkowski, K. 1958, Acta Astron., 8, 135 [NASA ADS] [Google Scholar]
 Simmons, J. F. L., & Stewart, B. G. 1985, A&A, 142, 100 [NASA ADS] [Google Scholar]
 Tissier, P. E. 1984, The Mathematical Gazette, 68, pp. 15 [Google Scholar]
 Vaillancourt, J. E. 2006, PASP, 118, 1340 [Google Scholar]
 Vinokur, M. 1965, Ann. d’Astrophys., 28, 412 [Google Scholar]
 Wang, L., Wheeler, J. C., & Hoeflich, P. 1997, ApJ, 476, L27 [NASA ADS] [CrossRef] [Google Scholar]
 Wardle, J. F. C., & Kronberg, P. P. 1974, ApJ, 194, 249 [NASA ADS] [CrossRef] [Google Scholar]
Appendix A: Critical values for the median estimator curves with the uniform polar prior
Tables A.1 and A.2 show some critical values of the median estimator curve for the Jeffreys and uniform polar prior and an array of choices for A and 1/σ (= I_{0}/Σ). Entries with “...” never intercept the axis. As is expected, for a given value of A, the critical value decreases as the data quality increases (i.e., 1/σ gets larger). Also as expected, the critical values increase as A increases for a given value of data quality. These tables are created using Eq. (50). The critical values occur when the righthand side of that equation is equal to zero.
Unbarred variables are used in the tables. The behavior in barred variables is somewhat counterintuitive and should be avoided to prevent confusion. As can be seen by comparing the median estimator intercepts for in Figs. 8c, d, 9c and d, the intercept is larger for σ = 0.01 than σ = 0.04 even though the data quality is better for σ = 0.01. This does not mean that, as the data gets better, a larger measured value of polarization is required to prove polarization is nonzero! There is a human tendency to interpret the barred plots as if σ = 1, but in practice σ is always less than 1. When converted to unbarred variables, the polarization behaves as expected.
More tables such as this are intended to be included in a future work.
Median estimator critical values of p such that Prob(p_{0} = 0) = 1/2 for the Jeffreys prior.
Median estimator critical values of p such that Prob(p_{0} = 0) = 1/2 for the uniform polar prior.
Appendix B: “Projected” prior
A third interesting prior was discussed in Sect. 3. This prior, called here the “projected” prior, is motivated by considering an instrument that only measures linear polarization but ignores circular polarization. While a distribution for q and u can be derived for this case and the machinery given in the main text can be applied to it, this is somewhat inappropriate since it violates the leading assumption that V = 0 upon which the subsequent equations were derived. Nevertheless, it is worth recording here some of the salient formulas.
The starting point is a normalized uniform Cartesian prior on the Poincaré sphere (B.1)Upon projection onto the q − u plane, the probability density function becomes (B.2)Equivalently in polar coordinates, this is (B.3)or (B.4)Notice that for small values of , Eq. (B.5) has the same form as Eq. (35) up to an irrelevant scale factor. Thus for small values of , the posterior distribution should be similar to that of the Jeffreys posterior. The large value behavior will be slightly different. Marginalization of over φ produces (B.5)which is plotted in Fig. B.1 and may be contrasted with the previous similar figures.
Fig. B.1 The posterior Rice distribution using the “projected” prior of Eq. (B.5) with I_{0}/Σ = 8 (a very small value). The distribution of the measured values near the physicallymaximum value is notably different than the previous corresponding figures for the Jeffreys and uniform polar prior but the smallvalue behavior is nearly exact to the Jeffreys case. 
All Tables
Median estimator critical values of p such that Prob(p_{0} = 0) = 1/2 for the Jeffreys prior.
Median estimator critical values of p such that Prob(p_{0} = 0) = 1/2 for the uniform polar prior.
All Figures
Fig. 1 Contour plots of (Eq. (10)) for a few different values of p_{0}/σ with . The dots indicate the maximums of the distributions. Contour spacings are at intervals of 0.02. 

In the text 
Fig. 2 Two contour plots of the Rice distribution (Eq. (16)). The contours are spaced by 0.1 intervals. The value of σ is mostly unimportant for . It does not affect the shape of the distribution. The only purpose of mentioning it is to insure exists for values above 1/σ. 

In the text 
Fig. 3 Contour plots of (Eq. (20)) for a few different values of p/σ with φ = 0 assuming a Jeffreys prior () and I_{0}/Σ = 100. The dots indicate the maximums of the distributions. Contour spacings are at intervals of 0.02. This figure is nearly the same as Fig. 1 with the roles of p and p_{0} reversed. While for small values of p/σ and p_{0}/σ the plots appear identical, they are not. Their similarity diverges for larger values of signaltonoise, as they must since p_{0}/σ has a maximum value while p/σ does not. 

In the text 
Fig. 4 Contour plots of (Eq. (20)) for a few different values of p/σ with φ = 0 assuming a uniform polar prior () and I_{0}/Σ = 100. The dots indicate the maximums of the distributions. Contour spacings are at intervals of 0.02. 

In the text 
Fig. 5 A comparison of “random” points generated with the Jeffreys prior and the uniform polar prior. 

In the text 
Fig. 6 Two contour plots of the posterior Rice distribution (Eq. (46)) with and two values of σ. The contours are spaced by 0.1 intervals. 

In the text 
Fig. 7 Two contour plots of the posterior Rice distribution (Eq. (46)) with and two values of σ. The contours are spaced by 0.1 intervals. 

In the text 
Fig. 8 These four panels show the posterior distribution with prior for four combinations of value of A and σ. The blue dotted line is the median estimator. The black dashed line is the mean estimator. The red, dashed line at is a reminder of the finite amount of probability that exists along the axis due to the delta function term. No such line is needed in Figs. 6a and 6b because A = 0 there and the delta term does not contribute. This probability along the axis is plotted underneath each contour plot. 

In the text 
Fig. 9 These four panels show the posterior distribution with prior for four combinations of value of A and σ. The blue dotted line is the median estimator. The black dashed line is the mean estimator. The red, dashed line at is a reminder of the finite amount of probability that exists along the axis due to the delta function term. No such line is needed in Figs. 7a and 7b because A = 0 there and the delta term does not contribute. This probability along the axis is plotted underneath each contour plot. 

In the text 
Fig. 10 This plot compares the classical angle distribution, G(φp_{0},φ_{0},σ), to the Bayesian distribution, T(φ_{0}p,φ,σ), with and I_{0}/Σ = 100. The Bayesian curves are virtually identical to the classical curves and overlap in the figure. Both plots simply use zero for the angle since the choice is immaterial. Three different values for the signaltonoise, i.e., p_{0}/σ (G) or p/σ (T), are used: 0.5 (dashed lines), 1.0 (solid lines), and 2.0 (dotted lines). 

In the text 
Fig. 11 This plot compares the classical angle distribution, G(φp_{0},φ_{0},σ) (red), to the Bayesian distribution, T(φ_{0}p,φ,σ) (blue), with and I_{0}/Σ = 100. The Bayesian approach now suggests the classical approach underestimates the error bars. Both plots simply use zero for the angle since the choice is immaterial. Three different values for the signaltonoise, i.e., p_{0}/σ (G) or p/σ (T), are used: 0.5 (dashed lines), 1.0 (solid lines), and 2.0 (dotted lines). 

In the text 
Fig. 12 This figure is a flow chart to help remember the relationships between the distributions presented. “Marg.” stands for marginalization over the accompanying variable. 

In the text 
Fig. B.1 The posterior Rice distribution using the “projected” prior of Eq. (B.5) with I_{0}/Σ = 8 (a very small value). The distribution of the measured values near the physicallymaximum value is notably different than the previous corresponding figures for the Jeffreys and uniform polar prior but the smallvalue behavior is nearly exact to the Jeffreys case. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.