A&A 487, 857-863 (2008)
DOI: 10.1051/0004-6361:20079125

The time variation of the fine structure constant: a statistical analysis of astronomical data

S. J. Landau - C. Simeone

Departamento de Física, FCEyN, Universidad de Buenos Aires, Ciudad Universitaria - Pab. 1, 1428 Buenos Aires, Argentina

Received 22 November 2007 / Accepted 4 June 2008

Aims. We analyze different data of the variation of the fine structure constant obtained with different methods to check their consistency.
Methods. We test consistency using the modified student test and confidence intervals. We split the data sets into smaller intervals. A criterion for this selection is proposed.
Results. Results show consistency for reduced intervals for each pair of data sets considered.
Conclusions. Results are at variance with the ones obtained considering mean values over the whole interval.

Key words: quasars: absorption lines - cosmology: observations - cosmology: miscellaneous

1 Introduction

The attempt to unify all fundamental interactions resulted in the development of multidimensional theories like string-motivated field theories (Damour et al. 2002a; Maeda 1988; Damour et al. 2002b; Wu & Wang 1986; Damour & Polyakov 1994; Barr & Mohapatra 1988), related brane-world theories (Brax et al. 2003; Youm 2001a,b; Palma et al. 2003), and (related or not) Kaluza-Klein theories (Overduin & Wesson 1997; Gleiser & Taylor 1985; Kaluza 1921; Weinberg 1983; Klein 1926). Among these theories, there are some in which the gauge coupling constants may vary over cosmological timescales. On the other hand, theoretical frameworks based on first principles were developed by different authors (Bekenstein 1982, 2002; Chamoun et al. 2001; Olive & Pospelov 2002; Barrow & Magueijo 2005; Barrow et al. 2002) to study the variation of the fine structure constant ($\alpha$) or the electron mass ($m_{\rm e}$).

Different versions of the theories mentioned above predict different time behaviors of the fundamental constants. Thus, bounds obtained from astronomical and geophysical data are an important tool to test the validity of these theories.

The experimental research can be grouped into astronomical and local methods. The latter include geophysical methods such as the natural nuclear reactor that operated about $1.8\times 10^9$ years ago in Oklo, Gabon (Damour & Dyson 1996; Fujii et al. 2000), the analysis of natural long-lived $\beta $ decayers in geological minerals and meteorites (Olive et al. 2004), and laboratory measurements such as comparisons of rates between clocks with different atomic numbers (Bize et al. 2003; Sortais et al. 2000; Fischer et al. 2004; Prestage et al. 1995; Marion et al. 2003; Peik et al. 2004). The astronomical methods are based mainly on the analysis of high-redshift quasar absorption systems. The relative magnitude of the fine splitting of resonance lines of alkaline ions is proportional to $\alpha^2 $. Murphy et al. (2001b) and Chand et al. (2005) have applied this method to SiIV doublet absorption line systems at different redshifts. An extension of this method was proposed by Bahcall et al. (2004), who use strong nebular emission lines of O III to constrain the variation of $\alpha$. Furthermore, this method was improved by comparing transitions of different species with widely differing atomic masses, which led to the first results consistent with a time-varying fine structure constant for a range of redshifts ($0.5 < z < 3.5$) (Webb et al. 1999, 2001; Murphy et al. 2003). This method is known in the literature as the many multiplet (MM) method. However, other recent independent analyses of similar data (Chand et al. 2004; Srianand et al. 2004; Quast et al. 2004; Chand et al. 2006) found no variation. On the other hand, the standard MM technique can be revised to avoid the deficiencies pointed out earlier in the literature (Bahcall et al. 2004; Quast et al. 2004). This improved method, known in the literature as the revised many multiplet (RMM) method, was applied by Quast et al. (2004) to a homogeneous sample of FeII lines at redshift z=1.15. Another method to test the cosmological variation of $\alpha$, based on pairs of Fe II lines observed in individual exposures from a high-resolution spectrograph, was proposed by Levshakov et al. (2005) (this method is known in the literature as SIDAM). The authors found no variation of $\alpha$ at z=1.84 and z=1.15 (Levshakov et al. 2006). However, a recent reanalysis of the spectrum of the quasar Q1101-264 found variability within 1$\sigma$ (Levshakov et al. 2007). Besides, by comparing optical and radio redshifts, a bound on $\alpha ^2g_{\rm p}\frac{m_{\rm e}}{m_{\rm p}}$ (where $g_{\rm p}$ is the proton g factor) can be obtained (Cowie & Songaila 1995; Tzanavaris et al. 2007). Furthermore, comparing molecular and radio lines provides a bound on $g_{\rm p} \alpha^2$, and the most stringent constraints were obtained by Murphy et al. (2001a). On the other hand, Darling (2004) reports bounds on the variation of $\alpha$ at z=0.2467 from the satellite 18 cm OH conjugate lines. Finally, Kanekar et al. (2005) compared the HI and OH main line absorption redshifts of the different components in the z=0.765 absorber and the z=0.685 lens toward B0218+357 to establish stringent constraints on changes in $F= g_{\rm p} \left({\frac{\alpha^2 m_{\rm e}}{m_{\rm p}}}\right)^{1.57}$. Besides, the time variation of the gauge coupling constants in the early universe can be constrained using data from the cosmic microwave background (CMB) (Rocha et al. 2003; Martins et al. 2002) and the primordial abundances of light elements (Nollet & Lopez 2002; Ichikawa & Kawasaki 2002).

In this paper, we focus on the discrepancy between the data on time variation of $\alpha$ from astronomical observations, following a purely statistical criterion. As described above, different methods are able to constrain the variation of $\alpha$. However, to achieve statistical consistency for each fixed redshift interval, the reported value should be the same. The usual assumption is that if there is any time variation of $\alpha$, such variation is constant over all observed redshifts. So, the reported results are means over a range of redshifts. However, the theories mentioned above predict different time evolutions of $\alpha$, yielding different variations for different times (Bekenstein 1982; Okada 1985; Marciano 1984). Therefore, to test the evolution of $\alpha$ predicted by those kinds of theories, not only a mean value over a range of redshifts, but each individual measurement should be considered. However, not all the reported results are consistent. The most important discrepancy in the literature is that between the results reported by Murphy et al. (2003) and those reported by Chand et al. (2004). It should also be mentioned that there is an important debate in the literature about the reliability of the Chand et al. (2004) results (Murphy et al. 2007; Srianand et al. 2007; Murphy et al. 2008). The aim of this paper is to test the consistency of different reported values for the variation of $\alpha$ for smaller intervals than the reported ones, in each case. In this way, we will be able to define a reduced interval where consistency can be assured and thus the theoretical prediction for the evolution of $\alpha$ with time within this interval can be tested. The selection of the redshift interval to be tested is not trivial, and a method for this selection is also proposed and described in Sects. 2 and 3. It is also important to note that observational errors are important and cannot be ignored in any statistical analysis. In Sect. 2 we describe the statistical tools (which include the observational errors) we use for testing consistency. In Sect. 3, we apply these tools to check consistency between the data on varying $\alpha$. Not all groups of data can be tested using the modified Student test because the requirements of the method for selecting the redshift intervals cannot be fulfilled by most groups of data. Therefore, in most cases we compute confidence intervals for one group of data and compare them with a single reported value of another author to check consistency. We find that consistency between pairs of data sets can be assured for a reduced interval in each case. In Sect. 4, we discuss our results and present our conclusions.

2 Statistical tools

The question to be answered is whether, for given redshift intervals, two experiments compared are consistent or not. The appropriate procedure is thus a test for the difference between two population means, which involves a statistic defined in terms of two sample means and two sample variances. However, in some cases, one of the experiments includes, for a given redshift interval, very little data, and therefore does not allow us to reasonably define a sample mean and a sample variance. In this situation, the procedure to be followed should rather involve a confidence interval constructed from the sample values of the experiment allowing for a statistical treatment. In what follows, we discuss in detail these two approaches as well as the choice of the sample size (which, consequently, determines the width of the redshift intervals).

2.1 Student test

Table 1: Groups of data considered for methods that constrain $\frac{\Delta \alpha}{\alpha}$.

Within a statistical framework, the null hypothesis (two experiments are consistent) can be formulated as

\begin{displaymath}H_0: \mu_1-\mu_2=0
\end{displaymath} (1)

where $\mu_1,~\mu_2$ are the (unknown) population means of each experiment for a definite redshift interval. Because the sample sizes for the available observational data are not expected to be large, and the true variances are not known, we must use a t test, which involves the sample variances, to test the hypothesis. Now, the usual t test is not robust to departures from normality or from equality of variances when the numbers of values within each sample are not equal. Thus, we adopt an approximate test starting from the statistic (Devore 1995):

\begin{displaymath}T=\frac{{\overline X_1}-{\overline X_2}}{\sqrt{\frac{S_1^2}{m}+\frac{S_2^2}{n}}}
\end{displaymath} (2)

where ${\overline X_1},~{\overline X_2}$ are the sample mean values for the given redshift interval, $m,~n$ are the numbers of data points in each sample, and $S_1^2,~S_2^2$ are, respectively, their sample variances corrected to include the observational errors $e_i$ (Brandt 1989):

\begin{displaymath}S^2=\sum p_i\left[e_i^2+\left(x_i-{\overline X}\right)^2\right],
\end{displaymath} (3)

\begin{displaymath}p_i=\frac{\frac{1}{e_i^{2}}}{\sum \frac{1}{ e_i^{2}}}.
\end{displaymath} (4)
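As an illustration of Eqs. (3) and (4), the corrected variance can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the weights $p_i$ are taken to be normalized inverse-variance weights (so that they sum to one), the sample mean is taken to be the plain arithmetic mean (a weighted mean would be an equally plausible reading), and the function name `corrected_variance` is hypothetical.

```python
def corrected_variance(values, errors):
    """Error-corrected sample variance S^2 = sum_i p_i [e_i^2 + (x_i - mean)^2]."""
    if len(values) != len(errors) or not values:
        raise ValueError("values and errors must be non-empty and of equal length")
    # Assumption: p_i are normalized inverse-variance weights, so sum(p_i) == 1.
    inv_var = [1.0 / e ** 2 for e in errors]
    total = sum(inv_var)
    weights = [w / total for w in inv_var]
    # Assumption: the sample mean is the unweighted arithmetic mean.
    mean = sum(values) / len(values)
    return sum(p * (e ** 2 + (x - mean) ** 2)
               for p, e, x in zip(weights, errors, values))
```

With equal errors the weights reduce to 1/n, so the expression becomes the mean squared deviation plus the common squared error.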

The rejection region (RR) for the two-tailed test is defined by

\begin{displaymath}RR: \left\{
\begin{array}{l}
T \leq - t_{ \frac{\lambda}{2},\nu}\\
T \geq \ \ t_{ \frac{\lambda}{2},\nu}
\end{array}\right.
\end{displaymath} (5)

where the number of degrees of freedom $\nu$ is given by the rounded value of

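\begin{displaymath}\nu=\frac{\left(\frac{S_1^2}{m}+\frac{S_2^2}{n}\right)^2}{\frac{\left(S_1^2/m\right)^2}{m-1}+\frac{\left(S_2^2/n\right)^2}{n-1}}
\end{displaymath} (6)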

and $\lambda$ is the (approximate) level of the test (Brownlee 1960). Thus, $\lambda$ is the approximate probability of type I error, that is, the probability of rejecting the null hypothesis when it is true. In practice, the algorithm used will yield a level value $\lambda^*$ such that the obtained value of the statistic lies within the associated rejection region (Eq. (5) with $\lambda^*)$. Therefore, at level $\lambda$ the null hypothesis should be rejected when $\lambda^*\leq\lambda$.
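The procedure of this subsection can be sketched as follows. The code is a minimal illustration, not the algorithm actually used for the results: the degrees of freedom use the standard textbook (Welch-Satterthwaite) expression for the approximate two-sample t test, which is assumed here to be the one meant for $\nu$; the level $\lambda^*$ is obtained by a simple Simpson integration of the t density; and all function names are hypothetical.

```python
import math

def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))

def two_sided_level(t_obs, nu, steps=2000):
    """lambda* = P(|T| >= |t_obs|), via Simpson's rule on [0, |t_obs|] (steps even)."""
    upper = abs(t_obs)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    central = 2.0 * total * h / 3.0  # P(-|t_obs| < T < |t_obs|)
    return max(0.0, 1.0 - central)

def modified_student_test(mean1, var1, m, mean2, var2, n):
    """Return (T, nu, lambda*) for two samples with corrected variances var1, var2."""
    a, b = var1 / m, var2 / n
    t_obs = (mean1 - mean2) / math.sqrt(a + b)
    # Assumption: Welch-Satterthwaite degrees of freedom, rounded to an integer.
    nu = max(1, round((a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))))
    return t_obs, nu, two_sided_level(t_obs, nu)
```

At level $\lambda$, the null hypothesis would then be rejected when the returned $\lambda^*$ satisfies $\lambda^*\leq\lambda$.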

2.2 Confidence intervals

In some cases, one of the experiments to be tested includes very few data and, therefore, does not allow us to define a sample mean and a sample variance for a given redshift interval. In other cases, the amount of data is too low for the results of the modified Student test to be considered reliable. For those cases, we introduce a different procedure. Suppose that for a given redshift interval a group of data 1 allows for a statistical treatment, while a group of data 2 does not. To test the consistency of a given observation of group 2 against group 1, we construct from the values of group 1 an interval I of confidence $100~P\%$. Then, if the null hypothesis is true, $P=1-\lambda$ is the probability that the result of an observation of group 2 lies within this confidence interval, and the null hypothesis should be rejected at level $\lambda$ when this is not the case. The confidence interval is centered at the mean value ${\overline X}$ of sample 1, and its width is determined by the complement of the rejection region of a two-tailed test. Thus, under the same hypotheses as in the preceding subsection, we have

\begin{displaymath}I=\left({\overline X}-t_{\frac{\lambda}{2},~n-1}\frac{S}{\sqrt{n}},~{\overline X}+t_{\frac{\lambda}{2},~n-1}\frac{S}{\sqrt{n}}\right)
\end{displaymath} (7)

where n is the number of values of sample 1, and $S^2$ is the sample variance (corrected as above to include the observational errors); as before, the choice of the t distribution is motivated by the size of the samples, which are not expected to be large. In practice, we choose a level $\lambda$ and the algorithm yields a confidence interval for this level. Then, we compare the confidence interval obtained from group 1 with each single reported value for the variation of $\alpha$ obtained from group 2.
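A confidence interval of this form could be computed as in the following sketch. The critical value is found by bisection on a numerically integrated t density, which is one possible choice and not necessarily the algorithm used for the results; the function names and the number of integration steps are hypothetical choices made for illustration.

```python
import math

def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))

def tail_probability(t, nu, steps=2000):
    """P(|T| >= t) for t >= 0, using Simpson's rule on [0, t] (steps even)."""
    if t == 0.0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 1.0 - 2.0 * total * h / 3.0

def t_critical(lam, nu):
    """Value t with two-sided tail probability lam, i.e. t_{lam/2, nu}."""
    lo, hi = 0.0, 1.0
    while tail_probability(hi, nu) > lam:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(60):  # bisection
        mid = 0.5 * (lo + hi)
        if tail_probability(mid, nu) > lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def confidence_interval(mean, variance, n, lam=0.025):
    """Interval mean -/+ t_{lam/2, n-1} * S / sqrt(n), with S^2 = variance."""
    half_width = t_critical(lam, n - 1) * math.sqrt(variance / n)
    return mean - half_width, mean + half_width
```

A single value reported by group 2 would then be judged consistent with group 1 at level $\lambda$ if it falls inside the returned interval.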

2.3 Sample size

From a statistical point of view, a possible criterion to estimate the appropriate sample sizes is to limit the probability of type II error, that is, the probability $\beta $ of not rejecting the null hypothesis when it is false. However, while the probability $\lambda$ of type I error can be fixed independent of the population or sample values, the calculation of $\beta $ requires the choice of a definite alternative hypothesis; that is, to determine $\beta $ the inequality $\mu_1\neq \mu_2$ must be specialized as a definite equality $\mu_1-\mu_2=\delta$. Then, once we have precisely determined a definite alternative hypothesis and chosen a level $\beta $, in some cases we can obtain a simple analytical expression of the required sample size n. For a t distribution no simple expression exists, which can easily be understood by recalling that this distribution includes as a parameter the number of degrees of freedom. However if one is only interested in an estimate, an approximate analytical expression is given in terms of a normal distribution; for a two-tailed test for the mean of a population (or for the associated confidence interval) we have (see Devore 1995):

 \begin{displaymath}n\simeq \left[\left(z_\frac{\lambda}{2}+ z_\beta\right)\frac {S}{\delta}\right]^2,
\end{displaymath} (8)

where $ z_\frac{\lambda}{2}$, ${\mathit z}_\beta$ are obtained by inverting a normal N(0,1) distribution (this would slightly underestimate the size n, because the t distribution is less peaked than the normal distribution, but this is not relevant if one is not interested in an exact result). In the case that we test the difference between the means of two populations, if the corresponding two samples are of equal (or at least similar) sizes, formula (8) can also be applied, with $S=\sqrt{S_1^2+S_2^2}$. In this approach, the choice of the approximate sample size is thus determined by the type II error that one is to admit for a given departure from the null hypothesis, this departure being measured by comparison with the sample variance. In practice, for a proposed level $\beta $ we take as a reasonable assumption an alternative hypothesis $\delta\sim S$ and obtain the corresponding sample size n; this proves to be consistent with usual choices of what is to be considered as a significant non null result (see the Discussion below).

Table 2: Groups of data considered for methods that constrain quantities related to $\frac{\Delta \alpha}{\alpha}$.

3 Results

Table 1 summarizes the bounds for direct measurements of $\frac{\Delta \alpha}{\alpha}$. Table 2 shows the details of data that constrain a combination of fundamental constants that include $\alpha$. Unifying schemes predict that the variation of fundamental constants is related, and that their relationship depends on the theoretical framework. However, we limit ourselves to studying the variation of $\alpha$ and do not consider the possible variation of other fundamental constants. We have excluded the data from quasar emission lines reported by Bahcall et al. (2004) and Grupe et al. (2005) because the individual errors of those data are 2 orders of magnitude above errors of other groups of data. On the other hand, the data reported by Darling (2004) and Murphy et al. (2001a) at z=0.24 could not be tested either because there are not enough data from other authors at similar redshift to build confidence intervals.

Figure 1: Results of the modified Student test comparing results from Chand et al. (2005) with Murphy et al. (2001b).

One of the major problems of this analysis lies in the selection of the redshift interval to be tested. The natural choice is a length equal to the observational error of the measured redshifts. However, at present there are not enough data available to use this criterion. Another possibility would be a bin size coming from theories that predict time variation of fundamental constants. However, in general, those models have free parameters, which are estimated from observational bounds on varying $\alpha$. Therefore, we decided to use a purely statistical criterion. As explained in Sect. 2.3, the amount of data can be determined from limits on Type II error. Now, we fix $\lambda =0.025$, and Eq. (8) gives $n \simeq 12, 15, 18$ for $\beta=0.1, 0.05, 0.025$ respectively.
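The sample sizes quoted above can be checked with a short calculation based on Eq. (8). The sketch below assumes $\delta = S$ (so that $S/\delta = 1$) and rounds the result to the nearest integer; the function name `approx_sample_size` is hypothetical.

```python
from statistics import NormalDist

def approx_sample_size(lam, beta, s_over_delta=1.0):
    """Normal-approximation sample size n ~ [(z_{lam/2} + z_beta) * S / delta]^2."""
    std_normal = NormalDist()
    z_half_lam = std_normal.inv_cdf(1.0 - lam / 2.0)  # upper lam/2 quantile
    z_beta = std_normal.inv_cdf(1.0 - beta)           # upper beta quantile
    return ((z_half_lam + z_beta) * s_over_delta) ** 2

# Assumption: delta = S, as adopted in Sect. 2.3.
sizes = [round(approx_sample_size(0.025, beta)) for beta in (0.1, 0.05, 0.025)]
print(sizes)  # [12, 15, 18]
```

The unrounded values are approximately 12.4, 15.1, and 17.7, so rounding to the nearest integer gives the quoted sizes.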

Table 3: Confidence intervals for different redshifts.

3.1 Student test

The selection of redshift intervals to be tested proceeds as follows. The first interval to which the test is applied starts at a redshift z=a and has width b, where b is the minimum length that includes n data points (n depends on the desired value of $\beta $, see above) for both groups of data, or for only one group of data in the case of confidence intervals. The i-th subsequent interval starts at redshift z=a+i*0.1, and again the value of b is chosen with the same criterion. For analyzing the results we also define a criterion: i) if all values of $\lambda^*$ are below the desired level for all intervals, we conclude that there is no consistency between the two groups of data; ii) if all values of $\lambda^*$ are above the desired level, we conclude that there is consistency between the two groups of data; and iii) if some values of $\lambda^*$ are below and some are above the desired level, we conclude consistency for a reduced interval. In this last case we exclude from the consistency interval all intervals for which $\lambda^* \le \lambda$, and we can assure consistency for the remaining interval.
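The interval selection and the three-way criterion can be sketched as follows. This is an illustrative reading of the procedure, assuming that the sequence of intervals stops as soon as one of the groups has fewer than n points beyond the starting redshift; the function names are hypothetical.

```python
def minimal_width(start, redshifts, n):
    """Smallest b such that [start, start + b] holds at least n redshifts, or None."""
    inside = sorted(z for z in redshifts if z >= start)
    if len(inside) < n:
        return None
    return inside[n - 1] - start

def sliding_intervals(z_group1, z_group2, n, a, step=0.1):
    """Intervals starting at a + i*step, each wide enough to hold n points of both groups."""
    intervals = []
    i = 0
    while True:
        start = a + i * step
        widths = [minimal_width(start, group, n) for group in (z_group1, z_group2)]
        if None in widths:  # assumption: stop when either group runs out of data
            break
        intervals.append((start, start + max(widths)))
        i += 1
    return intervals

def classify(levels, lam):
    """Apply criteria i)-iii) to the lambda* values obtained for all intervals."""
    rejected = [level <= lam for level in levels]
    if all(rejected):
        return "no consistency"
    if not any(rejected):
        return "consistency"
    return "consistency for a reduced interval"
```

Each interval returned by `sliding_intervals` would be passed to the modified Student test, and the resulting $\lambda^*$ values to `classify`.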

We applied the modified Student test to the data reported by Chand et al. (2005) and Murphy et al. (2001b) for n=12 (a higher n is not possible since the total number of data points from Chand et al. (2005) is 15). Results showing consistency for the whole interval are shown in Fig. 1. It should be noted that the modified Student test can be applied only for redshift intervals larger than 0.76 due to the requirement that n=12. Other groups of data could not be tested in this way due to two kinds of problems: i) the total number of data is lower than 12; ii) the number of data of one author for a fixed interval is much larger than the number of data of other authors for the same interval, which does not fulfill one of the assumptions needed to obtain Eq. (8).

Table 4: Confidence intervals for different redshifts.

3.2 Confidence intervals

Since the requirements to apply the modified student test are not always fulfilled by the available data, the calculation of confidence intervals is a useful tool to test consistency among data on varying $\alpha$. We choose $\lambda =0.025$ and build a confidence interval for a group of data and compare the results with each single reported value of another author. Again, we choose the bin size as the smallest interval centered at the reported value, which contains n data (n depends on the desired $\beta $ level). We also define a criterion for analyzing the results. If all confidence intervals overlap with the reported value of the other author, we conclude that there is consistency for the whole interval, while if none of them overlap, we conclude that there is no consistency. In case there are some confidence intervals that do not overlap with the respective reported interval, we exclude from the consistency interval the corresponding redshift interval. This is a conservative criterion, because we are probably overestimating the discarded intervals. The available data allow us to perform the analysis for n=18 ( $\beta=0.025$), and we show these results because the discarded intervals are larger than in the cases for higher values of $\beta $.
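The bin selection and the overlap criterion described above can be sketched as follows. The sketch assumes that each reported value is represented by an interval (value minus error, value plus error) and that the confidence intervals themselves have been computed separately; all function names are hypothetical.

```python
def centered_bin(z0, redshifts, n):
    """Smallest interval centered at z0 that contains n of the redshifts, or None."""
    distances = sorted(abs(z - z0) for z in redshifts)
    if len(distances) < n:
        return None
    half_width = distances[n - 1]  # distance to the n-th nearest point
    return z0 - half_width, z0 + half_width

def overlaps(interval_a, interval_b):
    """True if two closed intervals (low, high) share at least one point."""
    return interval_a[0] <= interval_b[1] and interval_b[0] <= interval_a[1]

def discarded_bins(bins, conf_intervals, reported_intervals):
    """Redshift bins whose confidence interval misses the reported interval."""
    return [b for b, ci, rep in zip(bins, conf_intervals, reported_intervals)
            if not overlaps(ci, rep)]
```

The consistency interval is then what remains of the tested redshift range after removing the bins returned by `discarded_bins`.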

Table 3 shows the confidence intervals calculated for the data of Murphy et al. (2003) for redshift intervals centered in each value of the Chand et al. (2004) data and containing 18 data points (we have also calculated the confidence intervals for other values of n and results are shown in Sect. 4). From Table 3 it follows that the confidence intervals calculated for z=1.277, z=1.541, and z=1.637 do not overlap with the corresponding reported intervals. Therefore, the redshift interval (1.14,1.93) should be discarded from the consistency interval because the data from Murphy et al. (2003) used to calculate the confidence intervals belong to this interval. Therefore, from the confidence intervals analysis we conclude that there is consistency over the intervals (0.452, 1.14) and (1.93,2.3), i.e., 11 data points from Chand et al. (2004) are consistent with 70 data points from Murphy et al. (2003) (only 103 of the 128 data points are eligible to be tested against Chand et al. (2004) data using the criterion n=18).

From Table 4, it follows that the data reported by Levshakov et al. (2007) are at variance with the confidence interval calculated with the data of Chand et al. (2004) at z=1.84. From Table 5, we obtain that the consistency interval for the data sets of Murphy et al. (2001b) and Murphy et al. (2003) is (2.01,2.09) $\cup$ (2.46,2.51). From Table 6, it follows that the consistency interval for the data sets of Chand et al. (2005) and Murphy et al. (2003) is (1.33,1.71) $\cup$ (2.1,3.11). The consistency interval for the data sets of Murphy et al. (2001b) and Chand et al. (2005) is (1.33,1.72) $\cup$ (2.09,3.02) (see Table 7). Finally, the consistency interval for the Tzanavaris et al. (2007) and Murphy et al. (2003) data sets calculated from Table 8 is (2.27,2.51). These results are at variance with the result obtained comparing mean values for the whole interval.

Table 5: Confidence intervals for different redshifts.

Table 6: Confidence intervals for different redshifts.

Table 7: Confidence intervals for different redshifts.

Table 8: Confidence intervals for different redshifts.

4 Discussion and conclusion

In Sect. 3 we have obtained two consistency redshift intervals, (0.45, 1.14) and (2.03, 2.3), when comparing the data sets of Murphy et al. (2003) and Chand et al. (2004). Now, we apply the modified Student test defined in Sect. 2.1 to the two consistency intervals. We obtain $\lambda^*=0.28$ and $\lambda^*=0.42$ respectively, which confirms the consistency of these two redshift intervals as analyzed using confidence intervals. In this case, we do not have to worry about the number of samples being equal, since the selection of the redshift interval is already done. It is also interesting to repeat the calculation of confidence intervals for different values of $\beta $ (or n). Table 9 shows that for a higher value of $\beta $, the consistency interval is larger than the interval obtained for a lower value of $\beta $. We also performed the modified Student test for the consistency intervals of Table 9 and obtained $\lambda^* > 0.25$. We have also calculated the consistency intervals for the other groups of data for larger values of $\beta $. We find that there is consistency between the confidence interval and the reported interval for $\beta=0.05$ and $\beta=0.1$ in all cases except the comparison of the data sets from Murphy et al. (2003) and Tzanavaris et al. (2007). For this case, the redshift consistency interval for $\beta=0.05$ is (0.238,1.78) $\cup$ (2.17,2.47) and for $\beta=0.1$ we obtain (0.238,1.79) $\cup$ (2.14,2.47). On the other hand, we would like to comment on the comparison between the Chand et al. (2005) and Murphy et al. (2001b) data sets. In this case, both the modified Student test and the calculation of confidence intervals were used to analyze consistency, with different results. However, it should be stressed that the modified Student test can only be applied to redshift intervals larger than 0.76, while the length of the confidence intervals lies between 0.26 and 0.53.

A statistical criterion has been given to estimate the appropriate sample sizes (which determine the width of the redshift intervals). The criterion is based on the probability of not rejecting the null hypothesis (consistency of two experiments compared) when a definite alternative hypothesis is supposed to be true. While being well justified on statistical grounds, this proposal involves a subtle point: The choice of a given alternative hypothesis is not provided by a theoretical framework. Rather, our choice is suggested by the results of Murphy et al. (2003), which motivated our analysis. In that work, the non null result for the variation of $\alpha$ was supported by a departure from the null hypothesis $\delta \sim 4.5 ~ \sigma = 4.5 ~S/\sqrt{n}$ (S the sample variance); our choice for the alternative hypothesis $\delta\sim S$ is consistent with this for the employed sample sizes.

We thus have analyzed the consistency between data sets on varying $\alpha$, using the statistical tools described in Sect. 2. Usually, results are taken to be means over a range of redshifts. Here, instead, we have analyzed smaller intervals. The criterion for the selection of intervals is based on limits on the probability of type II error (see Sect. 2.3). We have obtained two consistency intervals for the data set of Murphy et al. (2003) and Chand et al. (2004), while the mean values and errors of the whole interval show no consistency. We have also shown that the data of Tzanavaris et al. (2007) and Murphy et al. (2003) are not consistent when tested over small redshift intervals. We have also obtained consistency intervals for this case, which should be used when testing theories on varying $\alpha$. For the other data sets, the consistency depends on the desired value of $\beta $. The result for $\beta=0.025$ is also at variance with the one obtained calculating the mean values and errors over the whole interval.

Table 9: Consistency intervals obtained for different values of $\beta $ ( $\lambda =0.025$ for all cases).

Support for this work was provided by PIP 5284 CONICET. The authors would like to thank Amalia Meza for useful discussions about statistics. The authors would also like to thank Michael Murphy for useful discussions about astronomical data.



Copyright ESO 2008