Absence of anticorrelations and of baryon acoustic oscillations in the galaxy correlation function from the Sloan Digital Sky Survey data release 7
F. Sylos Labini^{1,2}, N. L. Vasilyev^{3}, Y. V. Baryshev^{3}, M. López-Corredoira^{4,5}
1 Centro Studi e Ricerche Enrico Fermi, via Panisperna 89 A, Compendio del Viminale, 00184 Rome, Italy
2 Istituto dei Sistemi Complessi CNR, via dei Taurini 19, 00185 Rome, Italy
3 Institute of Astronomy, St. Petersburg State University, Staryj Peterhoff, 198504 St. Petersburg, Russia
4 Instituto de Astrofísica de Canarias, C/vía Láctea, s/n, 38200 La Laguna, Tenerife, Spain
5 Departamento de Astrofísica, Universidad de La Laguna, 38205 La Laguna, Tenerife, Spain
Received 5 March 2009 / Accepted 23 July 2009
Abstract
Aims. One of the most striking features predicted by standard models of galaxy formation is the presence of anticorrelations in the matter distribution on large enough scales ($r > r_c$). Simple arguments show that the location of the length scale $r_c$, marking the transition from positive to negative correlations, is the same for any class of objects as for the full matter distribution; i.e. it is invariant under biasing. Models predict this scale to lie at about the same distance as the baryonic acoustic oscillation scale $r_{bao}$.
Methods. We test these predictions in the newest SDSS galaxy samples, where it is possible to measure correlations on 100 Mpc/h scales in both the main galaxy (MG) and the luminous red galaxy (LRG) volume-limited samples. We determine the redshift-space galaxy two-point correlation function by using three different estimators.
Results. We find that, in several MG samples, the correlation function remains positive on scales >250 Mpc/h, while it should be negative beyond $r_c \approx 120$ Mpc/h in the concordance LCDM. In other samples, the correlation function becomes negative on scales <50 Mpc/h. To investigate the origin of these differences, we considered in detail the propagation of errors on the sample density into the estimation of the correlation function. We conclude that these are important at large enough separations and that they are responsible for the observed differences between different estimators and for the measured sample-to-sample variations in the correlation function. We show that in the LRG sample the scale corresponding to $r_{bao}$ cannot be detected because fluctuations in the density fields are too large in amplitude. Previous measurements in similar samples have underestimated volume-dependent systematic effects.
Conclusions. We conclude that, in the newest SDSS samples, the large-scale behavior of the galaxy correlation function is affected by intrinsic errors and volume-dependent systematic effects that make the detection of correlations only an estimate of a lower limit of their amplitude, spatial extension, and statistical errors. We point out that these results represent an important challenge to LCDM models, as they differ largely from its predictions.
Key words: cosmology: observations - large-scale structure of Universe
1 Introduction
Standard models of galaxy formation (i.e., cold, warm and hot dark matter models) predict the two-point correlation function $\xi(r)$ of matter density fluctuations in the early universe, and they make a simple prediction for it at the present time in the regime of weak density perturbations, where fluctuations have only been linearly amplified by gravitational clustering in the expanding universe (Peebles 1980). The various models differ in the values of the characteristic length scales and in the particular scale behavior of $\xi(r)$. In general, $\xi(r)$ is characterized by three length scales and three different regimes. (i) On scales smaller than $r_0$, where $\xi(r_0) = 1$, the matter distribution is characterized by strong clustering, i.e. $\xi(r) \gg 1$, about which little is known analytically and which is generally constrained by N-body simulations, where $\xi(r)$ is typically found to follow a power law for $r < r_0$ (Springel et al. 2005). (ii) The second length scale $r_c$ is such that $\xi(r_c) = 0$, and it is located at $r_c \approx 120$ Mpc/h (Peebles 1993; Gabrielli et al. 2002). In the range of scales $r_0 < r < r_c$, $\xi(r)$ is characterized by positive correlations, which rapidly decay to zero as $r \to r_c$. This regime can easily be related to the early-universe correlation function by a simple rescaling of amplitudes given by the linear gravitational growth of small-amplitude perturbations in an expanding universe (Peebles 1980). The scale $r_c$ is an imprint of early-universe physics. It corresponds to the size of the Hubble horizon at the time of equality between matter and radiation, and it is fixed by the values of standard cosmological parameters, being proportional to $(\Omega_m h^2)^{-1}$ when expressed in Mpc, where $\Omega_m$ is the matter density parameter and h the normalized Hubble constant (Peacock 1999). The third length scale $r_{bao}$ is on the order of, but smaller than, $r_c$. This is the real-space scale corresponding to the baryon acoustic oscillations (BAO) at the recombination epoch. Its precise location depends on the matter density parameter, the baryon abundance and the Hubble constant (Eisenstein & Hu 1998). (iii) Finally, in the third range of scales, namely for $r > r_c$, $\xi(r)$ is characterized by a negative power-law behavior, i.e. $\xi(r) \sim -r^{-4}$ (Gabrielli et al. 2002, 2005). Positive and negative correlations are exactly balanced in such a way that
$$\int_0^{\infty} \xi(r)\, r^2\, dr = 0 .$$
This is a global condition on the system fluctuations, which corresponds to the matter distribution being super-homogeneous (Gabrielli et al. 2002, 2005), i.e. characterized by a sort of stochastic order and by fluctuations that are depressed with respect to a purely uncorrelated distribution of matter (i.e. white noise). This corresponds to the linear behavior of the matter power spectrum as a function of the wavenumber k for $k \to 0$ (named the Harrison-Zeldovich tail), and it characterizes not only the LCDM model but all models of density fluctuations in the framework of the Friedmann-Robertson-Walker metric (Gabrielli et al. 2002, 2005).
In the new samples provided by the Sloan Digital Sky Survey Data Release 7 (SDSS-DR7) (Abazajian et al. 2009), it is possible to estimate the galaxy correlation function on scales on the order of 100 Mpc/h and thus possibly to determine $r_c$ and $r_{bao}$. Some years ago, Eisenstein et al. (2005) determined the Landy & Szalay (1993, LS) estimator of the galaxy two-point correlation function in a preliminary luminous red galaxy (LRG) sample of the SDSS, claiming overall agreement with the LCDM prediction and a positive detection of the scale $r_{bao}$ at about 110 Mpc/h. More recently, Cabré & Gaztañaga (2008) measured the same estimator of the correlation function in the LRG-DR6 sample and Martínez et al. (2009) in the LRG-DR7 sample. Both found that the LRG correlation is positive up to 200 Mpc/h and that the shape of the correlation function around $r_{bao}$ is slightly different from the one measured by Eisenstein et al. (2005). While they claimed that the measured correlation function was compatible with the LCDM model, they did not discuss the fact that their detection implied that positive correlations extend to scales larger than the model-predicted $r_c$. In addition, we note that Eisenstein et al. (2005), Cabré & Gaztañaga (2008) and Martínez et al. (2009) did not discuss estimators other than the LS one.
In the present paper, we show that our results agree very closely with those of the above-mentioned papers as far as the amplitude, shape and statistical error bars are concerned, in the case of the LS estimator in the LRG-DR7 sample. However, we find that in the SDSS-DR7 main galaxy (MG) sample the two-point correlation function (LS estimator) remains positive at large separations, i.e. for r > 250 Mpc/h, showing a clear systematic volume-dependent behavior and a remarkable disagreement with the LCDM prediction. In addition, we find that there is a difference between the LS and the Davis & Peebles (1983, DP) estimators of the two-point correlation function in redshift space. Finally, we find that both estimators vary significantly in different sky regions. We interpret these results by studying the fluctuations in the sample density estimation.
The paper is organized as follows. We first define in Sect. 2 the estimators of the correlation function and a simple determination of its statistical errors that we use in the data analysis. Section 3 is devoted to the description of the sample selection, while in Sect. 4 we present our main results. The discussion of the behaviors we have found and their interpretation is presented in Sect. 5. The behavior of the two-point correlation function predicted by standard models of galaxy formation and the comparison with the results obtained are discussed in Sect. 6. Finally, we draw our main conclusions in Sect. 7.
2 Pairwise estimators
In what follows we determine the two-point correlation properties by using the LS, DP and Hamilton (H) (Hamilton 1993) estimators. These estimators may have a number of systematic biases when correlations are long range, as we discuss in Sect. 5. First, it is useful to discuss their properties and how they are determined.
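The LS estimator is defined as
$$\xi_{LS}(r) = \frac{N_R(N_R-1)}{N_D(N_D-1)}\,\frac{DD(r)}{RR(r)} - \frac{N_R-1}{N_D}\,\frac{DR(r)}{RR(r)} + 1 , \tag{1}$$
where DD(r), RR(r) and DR(r) are the numbers of data-data, random-random and data-random pairs at separation r (written here with each pair counted once), and $N_R$ and $N_D$ are the numbers of random and data points (we use $N_R = K N_D$ with K=3, and we have checked that the results do not significantly depend on K as long as this is larger than unity).
The DP estimator is defined as
$$\xi_{DP}(r) = \frac{2 N_R}{N_D-1}\,\frac{DD(r)}{DR(r)} - 1 , \tag{2}$$
and the H estimator can be written as
$$\xi_{H}(r) = \frac{4 N_D N_R}{(N_D-1)(N_R-1)}\,\frac{DD(r)\,RR(r)}{[DR(r)]^2} - 1 . \tag{3}$$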
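The three pairwise estimators can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used for the SDSS analysis: it assumes brute-force pair counting in a small toy cube, pair counts normalized by the total number of pairs of each kind, and hypothetical helper names (`pair_counts`, `estimators_from_counts`, `estimators`).

```python
import numpy as np

def pair_counts(a, b, edges, same=False):
    """Histogram pair separations between point sets a and b.

    With same=True, a and b are the same set and each pair is counted once.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    if same:
        d = d[np.triu_indices(len(a), k=1)]  # keep i < j only
    counts, _ = np.histogram(d.ravel(), bins=edges)
    return counts.astype(float)

def estimators_from_counts(dd, dr, rr):
    """LS, DP and H estimators from pair counts that are already
    normalized by the total number of pairs of each kind."""
    xi_ls = (dd - 2.0 * dr + rr) / rr
    xi_dp = dd / dr - 1.0
    xi_h = dd * rr / dr**2 - 1.0
    return xi_ls, xi_dp, xi_h

def estimators(data, rand, edges):
    n_d, n_r = len(data), len(rand)
    dd = pair_counts(data, data, edges, same=True) / (n_d * (n_d - 1) / 2)
    rr = pair_counts(rand, rand, edges, same=True) / (n_r * (n_r - 1) / 2)
    dr = pair_counts(data, rand, edges) / (n_d * n_r)
    return estimators_from_counts(dd, dr, rr)

# Toy check: uncorrelated "data" in a unit cube, with N_R = 3 N_D.
rng = np.random.default_rng(1)
data = rng.random((300, 3))
rand = rng.random((900, 3))
edges = np.linspace(0.1, 0.5, 5)
xi_ls, xi_dp, xi_h = estimators(data, rand, edges)
```

For an uncorrelated toy distribution all three estimates should scatter around zero; the differences between them only become relevant when the amplitude of the correlation is small compared to the uncertainty on the normalization.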
In general, to be valid, a statistical estimator $X_V$ of a statistical quantity X in a finite sample of volume V must satisfy the limit condition
$$\lim_{V\to\infty} X_V = \langle X \rangle ,$$
where the angle brackets denote the ensemble average (infinite volume limit). A stronger condition is that
$$\langle X_V \rangle = \langle X \rangle ,$$
i.e. that the ensemble average in a finite volume is equal to the ensemble average in the infinite volume limit. If this condition is not satisfied, the estimator is said to be biased (Gabrielli et al. 2005; Kerscher 1999). One wants to understand the bias and the variance of the various estimators, but this is possible only for some specific estimators and for distributions with simple correlation properties (e.g. Poisson). The effect of bias, i.e. finite-volume or finite-size effects, can be studied through the analysis of artificial simulations with known properties; in any case, the three estimators defined above are all biased (Sylos Labini & Vasilyev 2008; Kerscher et al. 2000; Kerscher 1999). It is worth noting that Kerscher (1999) showed that, in a real galaxy sample, the three estimators defined above use different finite-size corrections, leading to different results on large enough scales, where the correlation amplitude is small, while all of them agree on smaller scales, where the amplitude of the correlation is large enough.
It was shown (Landy & Szalay 1993) that the LS estimator has the minimal variance for a Poisson distribution, i.e. its variance decays as 1/N, faster than for the DP estimator. This fact, however, does not mean that its variance will be any more controllable for a wider class of distributions with more complex correlation properties than Poisson's (Gabrielli et al. 2005). Indeed, there is no formal proof that the DP estimator is less accurate than the LS one for a generally correlated point distribution, even though this conclusion has been reached by, e.g., Kerscher et al. (2000) by examining some specific properties of estimators in N-body simulations. They also concluded that the H estimator is equivalent to the LS one. In Sylos Labini & Vasilyev (2008), by studying finite-volume effects in the estimators, it was shown that the LS and H estimators are indeed indistinguishable, and that they are almost equivalent to the DP estimator when the underlying distribution is positively correlated.
Among the various ways to compute statistical errors (Sylos Labini & Vasilyev 2008), we use the jackknife (JK) estimate, whose variance is (Scranton et al. 2002)
$$\sigma^2_{JK}(r) = \sum_{i=1}^{N} \frac{DR_i(r)}{DR(r)} \left[\xi_i(r) - \xi(r)\right]^2 , \tag{4}$$
where the index i signifies that the value of the correlation function is computed each time in all the N subsamples of a given sample but one (the ith), and $DR_i(r)$ is the corresponding number of data-random pairs.
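A minimal sketch of the jackknife variance is given below. It assumes the pair-weighted form, in which each leave-one-out estimate is weighted by its share of data-random pairs; the function name and the toy numbers are hypothetical and serve only to illustrate the bookkeeping.

```python
import numpy as np

def jackknife_variance(xi_loo, xi_full, dr_loo, dr_full):
    """Jackknife variance per separation bin.

    xi_loo  : (N, n_bins) correlation function with the i-th subfield removed
    xi_full : (n_bins,)   correlation function of the full sample
    dr_loo  : (N, n_bins) data-random pair counts with the i-th subfield removed
    dr_full : (n_bins,)   data-random pair counts of the full sample
    """
    xi_loo = np.asarray(xi_loo, dtype=float)
    xi_full = np.asarray(xi_full, dtype=float)
    weights = np.asarray(dr_loo, dtype=float) / np.asarray(dr_full, dtype=float)
    return np.sum(weights * (xi_loo - xi_full) ** 2, axis=0)

# Toy numbers: two leave-one-out estimates in a single separation bin.
var_jk = jackknife_variance([[0.9], [1.1]], [1.0], [[50.0], [50.0]], [100.0])
```

Because every leave-one-out subsample shares most of its volume with the others, the scatter measured this way is taken around an almost fixed sample density.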
3 The samples
We have constructed several subsamples of the main-galaxy (MG) and the luminous-red-galaxy (LRG) samples of the spectroscopic catalog SDSS-DR7. For the MG sample, we have constrained the flags indicating the type of object so as to select only the galaxies from the MG sample. We then consider galaxies within a fixed redshift range, above a minimum redshift confidence, and with flags indicating no significant redshift determination errors. In addition, we apply the apparent magnitude filtering condition r < 17.77 (Strauss et al. 2002).
The angular region we consider is limited by cuts in the SDSS internal angular coordinates $(\eta, \lambda)$; the resulting solid angle is about 6000 deg$^2$. We do not use corrections for the redshift completeness mask or for fiber collision effects. Fiber collisions in general do not present a problem for measurements of large-scale galaxy correlations (Strauss et al. 2002). Completeness varies most near the current survey edges, which are excluded from our samples. The completeness mask takes into account that the fraction of observed galaxies is not the same in all the fields, because of both fiber collision effects and small variations in limiting magnitude. One can, under certain assumptions, take the completeness mask information into account in the statistical analysis. Alternatively, it is possible to make tests by varying the limits in apparent magnitude and to study the stability of the results obtained. We have followed this second approach and did not find significant variations in the measured statistical properties when the limit is lowered to r < 17.5 (Sylos Labini et al. 2009d). This conclusion is confirmed by the fact that our results for the LRG sample agree with those of Eisenstein et al. (2005), Cabré & Gaztañaga (2008) and Martínez et al. (2009), and for the MG sample with those of Zehavi et al. (2005a,b), who have explicitly taken into account the completeness mask of the survey in their analysis. As noted by Cabré & Gaztañaga (2008), the completeness mask could be the main source of systematic effects on small scales only, while we are interested in the correlation function at relatively large separations.
To construct volume-limited (VL) samples (see Table 1) we computed the metric distances using the standard values of the cosmological parameters $\Omega_M$ and $\Omega_\Lambda$, with $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$. We computed absolute magnitudes using Petrosian apparent magnitudes in the r filter corrected for Galactic absorption.
We checked that the main results we obtained in the MG sample do not depend on K-corrections and/or evolutionary corrections such as those used by Blanton et al. (2003). In this paper we use the standard K-correction from the VAGC data (see the discussion in Sylos Labini et al. 2009d, for more details).
Figure 1: Correlation function in the VL1 sample: both the LS and the DP estimators are reported. The solid line gives the LCDM prediction (from Eisenstein et al. 2005), linearly rescaled, according to the simplest biasing scheme (Kaiser 1984), to fit the amplitude at 10 Mpc/h. The inset panel shows the same behavior on a log-linear scale.
Table 1: Properties of the SDSS-DR7 VL samples.
Concerning the LRG sample, we have selected all the objects that are classified as ``galaxy'' and that belong to the ``Cut I'' subset of the Galaxy Red objects, with the same redshift quality criteria as for main galaxies. As this is only approximately a VL sample, we have applied cuts in absolute magnitude M and distance R to obtain a rectangular area in the M-R diagram. In addition, because evolutionary effects are small for LRG galaxies (Eisenstein et al. 2005), we have not applied further corrections to these data. Given that we have selected a truly VL sample, we did not apply a further redshift-dependent weighting to the data.
The subsamples used to measure the JK errors are made by dividing the survey angular region we considered into 30 subfields, each of area 200 deg$^2$. In this way there are a few thousand galaxies in each subsample.
4 Results
Figure 2: Same as Fig. 1 but for the VL2 sample.
Figure 3: Same as Fig. 1 but for the VL3 sample.
We find, in agreement with Eisenstein et al. (2005) and Zehavi et al. (2002, 2005a) in previous data releases of the SDSS, that the redshift-space correlation function in different samples shows a different amplitude but a similar shape on small scales (see Figs. 1-4). This is usually ascribed to the (physical) effect of selection, namely that brighter galaxies exhibit a larger clustering amplitude (Zehavi et al. 2002; Norberg et al. 2002; Zehavi et al. 2005a). However, this is not the only change: the larger the amplitude of the correlation function, the more extended the range of scales where there are detectable (i.e. signal larger than JK errors) positive correlations. Indeed, in the MG samples the transition from positive to negative correlations occurs at a scale that grows roughly in proportion to the sample size, and in the deepest samples it is located on rather large scales, i.e. r > 250 Mpc/h. In the VL1 sample, however, the correlation function becomes negative already below 50 Mpc/h, i.e. at less than half of the LCDM prediction.
To show that finite-volume effects are important at large separations, we consider a single sample (VL4) and cut it at different depths; in addition, we consider an angular cut of the LRG sample, for which the depth is fixed but the volume is reduced. In the latter case the whole angular region of 6000 deg$^2$ is cut into two non-overlapping sky regions, each of area 3000 deg$^2$, i.e. only 20% smaller than the sample considered by Eisenstein et al. (2005). As one may notice from Figs. 5-7, there is a clear volume dependence of the two-point correlation function on large scales. In particular, in the R1 subsample there is an evident difference between the data and the LCDM prediction. In addition, we note that in almost all cases the DP and LS estimators show, on large enough scales, a difference that can be larger than the statistical error bars.
Figure 4: Same as Fig. 1 but for the LRG sample.
Figure 5: Correlation function in the whole VL4 sample and in a subsample of it (VL4c) cut at a smaller depth. Jackknife errors are shown in both cases.
It is worth noting that our result for the LS estimator of the correlation function in the LRG sample agrees closely with the determination of Martínez et al. (2009), although these authors used a slightly different technique to take into account the survey completeness mask, as we commented above (see Fig. 8).
The LS estimator for the LRG sample is also very similar to the determination made by Eisenstein et al. (2005), although the signal is larger than the statistical error bars and positive up to 200 Mpc/h, as was also found by Martínez et al. (2009) in the same sample we considered. A similar trend was also seen in the analysis by Cabré & Gaztañaga (2008). In addition, our result for the MG sample agrees well with the determination of Zehavi et al. (2005b), although they limited their analysis to smaller scales than the ones considered here.
We note that Eisenstein et al. (2005) stated that the MG sample does not have a large enough volume to measure the correlation function on 100 Mpc/h scales, without giving a clear quantitative argument as to why statistical or systematic errors should prevent one from measuring the correlation function on those scales. Indeed, we find that the signal-to-noise ratio, when JK error estimates are used, is larger than unity even on scales larger than 150 Mpc/h. In this respect one may ask whether statistical errors computed in this way are meaningful.
In addition, we note that Martínez et al. (2009) also found that the correlation function becomes negative on scales of the order of 50 Mpc/h in a 2dFGRS sample, without however commenting on this fact. Actually, they even claimed that $r_{bao}$ is detectable when the correlation function is negative, without discussing that this is not what one expects in the context of the LCDM model, where the zero point of the correlation function must be at a single scale for any type of object (see below).
Finally, we find that, as discussed in Sect. 2, the LS and the H estimators of the correlation function are almost indistinguishable: this is shown in Fig. 9, where we plot the ratio quantifying their relative difference as a function of separation. This remains smaller than 5% on all the relevant scales.
Figure 6: Correlation function measured through the LS estimator (with jackknife errors) in the LRG subsample R1, one of the two non-overlapping angular regions of about 3000 deg$^2$ into which the LRG sample is cut. The solid line is the LCDM prediction.
Figure 7: Same as Fig. 6 but for the R2 angular region.
Figure 8: Determination of the correlation function for the LRG sample with the LS estimator (LRG), compared with the Eisenstein et al. (2005) (E05) and the Martínez et al. (2009) (Martinez) determinations. The solid line is the LCDM prediction.
Figure 9: Behavior of the ratio comparing the LS and H estimators as a function of separation in the different MG and LRG samples.
5 Fluctuations and volume-dependent systematic effects
In theoretical models, the matter density field is uniform on large scales and the average mass density $\langle n \rangle$ is provided by an average over an ensemble of realizations of a given stochastic process. In a finite sample of volume V, the average density $\overline{n}_S$ can be estimated in some way. In the limit in which the sample volume is infinite and the process is ergodic (Gabrielli et al. 2005), then
$$\lim_{V\to\infty} \overline{n}_S = \langle n \rangle ,$$
because in this limit the relative variance goes to zero if the distribution is uniform on large scales, i.e.
$$\lim_{V\to\infty} \sigma^2(V) = \lim_{V\to\infty} \frac{\langle N(V)^2\rangle - \langle N(V)\rangle^2}{\langle N(V)\rangle^2} = 0 ,$$
where N(V) is the mass in a volume V. In a finite volume $\sigma^2(V)$ is nonzero, and therefore in any finite volume $\overline{n}_S \neq \langle n \rangle$. In general, for a uniform stochastic point process, the relative mass variance in the ensemble-average sense can be written as
$$\sigma^2(V) = \frac{1}{V^2}\int_V\!\int_V \xi(|\mathbf{r}_1-\mathbf{r}_2|)\, d^3r_1\, d^3r_2 + \frac{1}{\langle N(V)\rangle} , \tag{6}$$
where $\xi(r)$ is the ensemble-average two-point correlation function. The r.h.s. of Eq. (6) is the sum of the contributions to the variance due to correlations and to Poisson noise, the Poisson term being always present in a point distribution.
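As a quick numerical illustration of the Poisson contribution to the relative variance, the sketch below draws counts for an uncorrelated point process and checks that the relative variance approaches one over the mean number of points. The mean count, the number of realizations and the function name are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def relative_variance(counts):
    """Relative variance (<N^2> - <N>^2) / <N>^2 of a set of counts."""
    counts = np.asarray(counts, dtype=float)
    return counts.var() / counts.mean() ** 2

rng = np.random.default_rng(0)
mean_n = 100.0                                   # assumed mean number of points per volume
counts = rng.poisson(mean_n, size=200_000)       # uncorrelated (Poisson) realizations
sigma2 = relative_variance(counts)               # expected to be close to 1 / mean_n
```

Any excess of the measured relative variance over the inverse of the mean count would then be attributable to the correlation term.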
Thus, in a finite sample, any determination of the average density has an intrinsic error, whose relative amplitude is given by the square root of Eq. (6). Given that the two-point correlation function measures the amplitude of correlations with respect to the sample density, it is natural to ask what error is introduced in the estimation of the correlation function by the uncertainty on the value of the sample density. A second question is which kind of statistical estimate of the correlation function errors in a finite sample is representative of the errors induced by the uncertainty on the average density.
5.1 Fluctuations in the determination of the sample density
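The two-point correlation function is defined as
$$\xi(r) = \frac{\langle n(r)\rangle_p}{\langle n\rangle} - 1 , \tag{7}$$
where $\langle n(r)\rangle_p$ is the conditional density, i.e. the average density at distance r from an occupied point. Because of the definition in Eq. (7), any estimator of $\xi(r)$ can be written as
$$\overline{\xi}(r) = \frac{\overline{n(r)}_p}{\overline{n}_S} - 1 , \tag{8}$$
where $\overline{n(r)}_p$ is the sample estimate of the conditional density and $\overline{n}_S$ is the sample estimate of the density. Note that, in general, to measure the conditional density, one performs an average over all points in the sample (Gabrielli et al. 2005). On the other hand, the estimation of the sample density does not involve the average operation. For instance, one can simply determine the sample density to be $\overline{n}_S = N/V$, where V is the sample volume and N is the number of objects in it.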
In addition, it is worth noting that the pairwise estimators introduced in Sect. 2 necessarily use a similar strategy, since in order to measure the average of the sample density one would need many samples of size V. Thus, the determination of the two-point correlation function requires the estimation of an average quantity and of a non-average quantity. The latter can introduce volume-dependent systematic effects in a nontrivial way.
Suppose that a certain estimator of the two-point correlation function uses the sample estimate $\overline{n}_S^{(1)}$ while another estimator uses $\overline{n}_S^{(2)}$: the difference between $\overline{n}_S^{(1)}$ and $\overline{n}_S^{(2)}$ is not due to the samples being different, but rather to the fact that the different estimators use different boundary conditions to measure the two-point correlation function, i.e. different ways of normalizing the data-data pairs to the data-random and random-random pairs. Thus they are subject to a different bias (Kerscher 1999). Alternatively, one can think of measuring the same estimator in two different samples of the same geometry and volume, in which the sample density takes a slightly different value.
We now show that different values of the sample density may result in a different measurement of the large-scale behavior of the correlation function. To this aim, let us assume that there is a small difference between the value of the sample density used by estimator 1 and by estimator 2, so that we can write
$$\overline{n}_S^{(1)} = (1+\delta)\,\overline{n}_S^{(2)} , \tag{9}$$
with $|\delta| \ll 1$. Let us also suppose that the two estimators measure exactly the same conditional density $\overline{n(r)}_p$. This is a simplifying but reasonable assumption, as the conditional density is averaged over many points placed in different parts of the sample volume. Under these conditions we may write that
$$\overline{n(r)}_p = \overline{n}_S^{(1)}\left[1+\overline{\xi}_1(r)\right] = \overline{n}_S^{(2)}\left[1+\overline{\xi}_2(r)\right] , \tag{10}$$
and thus, using Eqs. (8) and (9), we get
$$\overline{\xi}_2(r) = (1+\delta)\,\overline{\xi}_1(r) + \delta , \tag{11}$$
which makes explicit that a different determination of the sample density results in a variation of the estimated two-point correlation function.
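The effect of a small offset in the sample density can be illustrated numerically. The sketch below assumes that both estimators share the same conditional density and that the density used by estimator 1 is (1 + delta) times the one used by estimator 2, which gives xi_2 = (1 + delta) xi_1 + delta; the function names and the illustrative amplitudes are hypothetical.

```python
import numpy as np

def xi_from_densities(n_p, n_s):
    """Estimator of the form xi = n_p / n_s - 1 (conditional over sample density)."""
    return n_p / n_s - 1.0

def shifted_xi(xi_1, delta):
    """xi_2 when estimator 1 uses a density (1 + delta) times that of
    estimator 2, for the same conditional density (assumed relation)."""
    return (1.0 + delta) * xi_1 + delta

delta = 0.006                        # 0.6% difference in the sample density
xi_1 = np.array([1.0, 0.1, 0.01])    # illustrative amplitudes
xi_2 = shifted_xi(xi_1, delta)
relative_change = (xi_2 - xi_1) / xi_1

# Consistency check starting from the densities themselves.
n_2 = 1.0
n_1 = (1.0 + delta) * n_2
n_p = n_1 * (1.0 + xi_1)             # same conditional density for both estimators
xi_2_direct = xi_from_densities(n_p, n_2)
```

With these numbers the relative change grows from about 1% where the amplitude is unity to about 60% where the amplitude is 0.01, which is why the offset matters mostly at large separations.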
Figure 10: Taking the LS estimator in the LRG sample as estimator 1 in Eq. (11), we find that, for a density difference $\delta$ of 0.6%, the resulting estimator 2 (labeled as LS) almost perfectly agrees with the DP estimator in the same sample.
As an illustrative example, we can take estimator 1 to be the LS estimator for the LRG sample. We find that, for a density difference $\delta$ of 0.6%, estimator 2 in Eq. (11) almost perfectly agrees with the DP estimator in the same sample (see Fig. 10). It is thus clear that a small uncertainty in the value of the sample average (in this case 0.6%) can affect the large-scale behavior of the correlation function in the range of scales and amplitudes of interest, i.e. around 100 Mpc/h in the LRG sample. Therefore, we have to determine the error on the estimation of the sample density, and then we have to clarify how this changes the large-scale behavior of the correlation function. Is the above estimate of 0.6% representative of the true uncertainty on the large-scale average density?
Simply stated, the problem is the following: in order to measure the BAO we need an error of about $10^{-3}$ on the estimator of the correlation function. Indeed, for the LRG case, the correlation function at 100 Mpc/h has an amplitude of about $10^{-2}$, while the feature corresponding to the BAO (a slight local increase followed by a decrease) corresponds to a local variation of about $10^{-3}$ in the correlation function amplitude.
By error propagation, we find from Eq. (8) that
$$\delta\overline{\xi}(r) = \frac{\delta\overline{n(r)}_p}{\overline{n}_S} + \frac{\overline{n(r)}_p}{\overline{n}_S^{\,2}}\,\delta\overline{n}_S . \tag{12}$$
We again neglect the statistical error on the determination of the conditional density on scales smaller than the sample, i.e. the first term on the r.h.s. of Eq. (12). As discussed above, this approximation is reasonable in view of the fact that the conditional density is determined by averaging over many points. Then, by using Eq. (8) again, we can rewrite the previous equation as
$$\delta\overline{\xi}(r) \approx \left[1+\overline{\xi}(r)\right]\frac{\delta\overline{n}_S}{\overline{n}_S} \approx \frac{\delta\overline{n}_S}{\overline{n}_S} , \tag{13}$$
where we used that $\overline{\xi}(r) \ll 1$, as this is the regime we are interested in. From Eq. (13), it follows that the error on the correlation function estimate is of the same order as the relative error on the estimate of the sample density. Therefore the question is whether we really know the sample density with an error of the order of $10^{-3}$.
The typical fluctuation on the density estimate in a given sample, on scales of the order of the sample size, is $\delta\overline{n}_S/\overline{n}_S$. The problem is to constrain this quantity from the data. As mentioned above, it is not possible to average over many samples of volume V, as we have a single one, and thus we can determine the fluctuation only inside the sample itself, by considering several subsamples of it.
We have estimated the typical density fluctuation on the relevant scales as follows. We divide the sample into N independent (non-overlapping) angular fields and then determine the number of galaxies $N_i$ in each field. We then compute the average
$$\overline{N} = \frac{1}{N}\sum_{i=1}^{N} N_i$$
and the variance
$$\Sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(N_i - \overline{N}\right)^2 ,$$
and thus the standard deviation, in units of the average, as
$$\delta = \frac{\Sigma}{\overline{N}} .$$
As there is some arbitrariness in the choice of the number of fields N, we let it vary from a few, for which we have more than $10^{4}$ objects in each field, to some tens, so as to have at least several hundred galaxies in each field.
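The field-counting procedure can be sketched as follows. This is only an illustration: the galaxy counts per field are made up, the unbiased 1/(N-1) normalization of the variance is an assumption, and the function name is hypothetical.

```python
import numpy as np

def field_fluctuation(counts):
    """Mean, variance and relative standard deviation of galaxy counts
    in non-overlapping fields, plus the Poisson expectation 1/sqrt(mean)."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    var = counts.var(ddof=1)          # assumed 1/(N-1) normalization
    delta = np.sqrt(var) / mean       # typical relative fluctuation
    poisson = 1.0 / np.sqrt(mean)     # relative fluctuation for pure shot noise
    return mean, var, delta, poisson

# Made-up counts in four fields, for illustration only.
mean, var, delta, poisson = field_fluctuation([90, 110, 100, 100])
```

Comparing `delta` with `poisson` for several choices of the number of fields indicates whether the measured fluctuation is dominated by shot noise or by large-scale structure.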
From Fig. 11 we may note that in the LRG sample the typical fluctuation takes about the same value for almost any value of N, and that this is much larger than Poisson noise, i.e. almost a factor 100 larger than the error needed to measure the correlation function with a precision of the order of $10^{-3}$.
Figure 11: The typical fluctuation in the LRG sample average density is nearly independent of N in the range [4, 30], and it is much larger than Poisson noise.
Note that this value of the typical fluctuation is in agreement with that obtained in a smaller LRG sample by Hogg et al. (2004). For the MG samples we find that the typical fluctuation has about the same amplitude as for the LRG case (see Fig. 12). Thus, given that the error on the correlation function is of the same order as the relative error on the sample density [Eq. (13)], we conclude that in these samples we can obtain a statistically significant estimate of the correlation function only when its amplitude exceeds the typical density fluctuation, and thus any claim about smaller amplitudes is biased by overall volume-dependent systematic effects. This implies that, for the LRG sample, our estimate is statistically significant only on correspondingly small scales. To measure correlations of smaller amplitude, and thus on larger scales, we need samples in which the typical fluctuation of the average density is at least a factor of ten smaller than the present one.
Note that for the MG samples, and specifically for VL2 and VL3, the correlation function keeps a positive amplitude up to 250 Mpc/h. We stress, however, that one should also check whether the property of self-averaging is satisfied in these samples, and thus whether the determination of average quantities gives a meaningful estimate of intrinsic properties (Sylos Labini et al. 2009c,d).
Figure 12: Same as Fig. 11 but for the VL1, VL2 and VL3 samples.
While the above argument about error propagation strictly applies when we determine the correlation function by considering Eq. (8), we show in what follows that the above estimate also holds in the case of the DP and LS estimators. To show this, let us now compute statistical error bars in a different way than by the JK method.
5.2 Statistical errors
The errors on the correlation function can be determined in various ways, and the problem is to understand, in the case of the actual distribution, which method gives the most reliable error estimate. To this aim, let us consider in more detail the computation of JK errors: in practice one keeps the sample density almost fixed and computes the typical variation with respect to it. Indeed, each of the N subsamples used in the JK estimation is equal to the full sample minus a small subfield of angular area equal to 1/N of the full sample area. The different subsamples therefore strongly overlap: when large-scale correlations are not negligible, this method underestimates the errors in the correlation function estimate.
Figure 13: Jackknife errors and field-to-field errors computed with different numbers of fields, N=10, 20, 30, in the LRG sample. The solid line corresponds to the full-sample determination of the LS estimator.
We find that in the LRG sample the variation of the sample density in the N=30 subsamples used to compute the JK errors is smaller than what is estimated by computing the variance in non-overlapping subfields. This result does not show a particular dependence on the number of subfields used, as long as N>10.
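Field-to-field errors can quantify volume-dependent systematic effects due to large-scale variation of the sample density. They can be computed by dividing the sample into N non-overlapping subfields. The correlation function can be estimated by
$$\overline{\xi(r)} = \frac{1}{N} \sum_{i=1}^{N} \xi_i(r) \qquad (15)$$
where $\xi_i(r)$ is the estimate in the $i$-th subfield, and then the variance is
$$\sigma^2_\xi(r) = \sum_{i=1}^{N} \frac{\left(\xi_i(r) - \overline{\xi(r)}\right)^2}{N-1} \qquad (16)$$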
In Fig. 13 we show the behavior of the errors in the LRG sample, computed through Eqs. (4) and (16) and considering 10, 20 and 30 fields. One may note that (i) the field-to-field error is larger than the signal for Mpc/h, i.e. for scales larger the amplitude of the estimated correlation function is ; (ii) the field-to-field error is larger than the JK error on all scales by about a factor of five. Note that the JK errors are similar to those derived by Cabré & Gaztañaga (2008). The field-to-field errors are much larger, and they could be overestimates because the fields used are smaller than the full sample. To check whether this is the case, we can vary the number of subfields used to estimate the field-to-field fluctuations as we did, for instance, to compute the typical rms fluctuation of the average density (see Figs. 11, 12). Clearly, by reducing the number of fields N one has fewer determinations, while by increasing N one is eventually dominated by shot noise. For N in the range [10, 30] we do not notice any clear decrease in the field-to-field errors. Our conclusion is therefore that the JK error is not the complete error but only the sampling error, while the field-to-field fluctuations include the possible fluctuations due to the uncertainty in the sample density estimation and should therefore be larger than or equal to the JK errors. An additional problem, which we consider in the next section, is whether the statistical errors measured by considering non-overlapping fields are able to take into account the whole uncertainty on the sample average, i.e. whether they can take into account the bias of the estimators.
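The field-to-field statistics used above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used for Fig. 13: the function name `field_to_field`, the array layout (one row per subfield, one column per separation bin) and the numbers are hypothetical, and the variance is assumed to use the N-1 normalization.

```python
import numpy as np

def field_to_field(xi_fields):
    """Mean and rms scatter of xi(r) over N non-overlapping subfields.

    xi_fields: array of shape (N, n_bins), one row per subfield.
    """
    xi_fields = np.asarray(xi_fields, dtype=float)
    xi_mean = xi_fields.mean(axis=0)
    # ddof=1 gives the N - 1 normalization (an assumption of this sketch).
    xi_sigma = xi_fields.std(axis=0, ddof=1)
    return xi_mean, xi_sigma

# Hypothetical estimates in 3 subfields and 2 separation bins.
xi_fields = [[0.10, 0.01],
             [0.14, -0.01],
             [0.12, 0.03]]
xi_mean, xi_sigma = field_to_field(xi_fields)

# Bins in which the field-to-field scatter exceeds the signal amplitude.
noise_dominated = xi_sigma > np.abs(xi_mean)
print(xi_mean, xi_sigma, noise_dominated)
```

Repeating the call with the sample split into different numbers of subfields is one way to check, as done in the text for N in the range [10, 30], whether the scatter depends on the subfield size.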
Note that the behavior of the correlation function in the MG VL2 and VL3 samples on large enough scales, i.e. Mpc/h, is the same when considering both JK and field-to-field errors, thus showing that there are positive correlations on scales larger than the cutoff on Mpc/h predicted by the LCDM model, without statistically robust evidence for the scale on 110 Mpc/h.
5.3 Large-scale volume-dependent systematic effects
The simple estimation of field-to-field errors allows one to overcome the problem related to the JK method, in which the implicit assumption is that correlations on the scale of the sample are negligible. However, the field-to-field method is not able to take into account the full errors on the correlation function estimation. This is because the sample density is systematically different from the ensemble average density when the correlation function is non-zero at large scales. This introduces a well-known bias, i.e. a volume-dependent systematic effect. Let us discuss this further effect.
Most of the literature on correlation function measurements has focused on the determination of the statistical errors (Norberg et al. 2009; Zehavi et al. 2002; Eisenstein et al. 2005; Norberg et al. 2002; Zehavi et al. 2005a), while little attention has been devoted to understanding the distortions introduced by volume-dependent systematic effects. These depend on the precise type of estimator used, but they affect any estimator, in a finite sample of volume V, in some way at large enough scales (Sylos Labini & Vasilyev 2008).
For instance, an important volume-dependent systematic effect is related to the so-called integral constraint (Peebles 1980) and can be understood as follows. The estimator measures the amplitude and shape of conditional correlations normalized to the estimate of the sample mean instead of the ``true'' (ensemble or infinite-volume limit) average density (Sylos Labini & Vasilyev 2008). As long as the ``true'' correlation function is different from zero (e.g. in the case of LCDM, on all scales), any estimate of the average density in a finite sample differs from the ``true'' value. This situation introduces a systematic distortion of the estimator with respect to the ``true'' correlation function which, depending on the correlation properties of the underlying distribution, is manifested in (i) an overall difference in amplitude and (ii) a distortion of the shape for (Sylos Labini & Vasilyev 2008).
In other words, if the zero point of the correlation function is due to the boundary condition corresponding to the integral constraint, then it will be different for different sample sizes. If the zero point is real, as it should be in an LCDM model, then it should not change from sample to sample.
The range of scales in which this effect occurs depends on the precise estimator used. For instance, in the case of the full-shell (FS) estimator (Sylos Labini & Vasilyev 2008; Gabrielli et al. 2005) and for a spherical sample volume, our ignorance of the ``true'' average density value is explicitly present in the condition that $\int_V \overline{\xi}(r)\, d^3r = 0$, where the integral is performed over the whole sample volume V. Note that this condition holds for any V and it forces the estimator to become negative even if the ``true'' correlation function is always positive inside the given sample. The effect of this boundary condition is the following: as long as the ``true'' correlation function is positive, by enlarging the volume size the change of sign occurs at larger and larger scales (Sylos Labini & Vasilyev 2008). This effect may very well explain the behavior found in the MG VL samples discussed above, in which we noticed that the transition scale changes from Mpc/h for the smallest sample to more than 250 Mpc/h for the deepest sample we considered. Note that if the ``true'' correlation function is negative, then the distortion on large scales can be rather important (Sylos Labini & Vasilyev 2008).
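The dependence of the zero-crossing scale on the sample size can be illustrated with a toy calculation. The sketch below is not the analysis of the SDSS samples: it assumes a hypothetical, everywhere-positive power-law correlation function with arbitrary parameters `r0` and `gamma`, an idealized noise-free estimate around a central point in a sphere of radius `R`, and a density normalized to the mean measured inside that sphere. Under these assumptions the estimate integrates to zero over the sphere by construction, and its change of sign occurs at a scale proportional to `R`.

```python
import numpy as np

def xi_true(r, r0=5.0, gamma=1.0):
    """Hypothetical, everywhere-positive power-law correlation function."""
    return (r0 / r) ** gamma

def xi_estimated(r, R, r0=5.0, gamma=1.0):
    """Toy full-shell-like estimate in a sphere of radius R.

    The density is normalized to the sample mean, which for this power law
    exceeds the true mean by a factor 1 + 3/(3 - gamma) * (r0/R)**gamma.
    """
    mean_excess = 3.0 / (3.0 - gamma) * (r0 / R) ** gamma
    return (1.0 + xi_true(r, r0, gamma)) / (1.0 + mean_excess) - 1.0

def zero_crossing(R, gamma=1.0):
    """Scale at which the toy estimate changes sign."""
    return R * ((3.0 - gamma) / 3.0) ** (1.0 / gamma)

def constraint_integral(R, n=20000):
    """Midpoint-rule integral of xi_estimated(r) * r^2 over [0, R]."""
    edges = np.linspace(0.0, R, n + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    return np.sum(xi_estimated(mid, R) * mid ** 2) * (R / n)

for R in (100.0, 200.0, 400.0):
    # The sign change moves outwards with R; the integral stays close to 0.
    print(R, zero_crossing(R), constraint_integral(R) / R ** 3)
```

In this toy model the estimate is negative beyond the crossing scale even though `xi_true` is positive everywhere, which is the qualitative behavior described above for the FS estimator.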
While for the FS estimator one can analytically calculate the scale at which the systematic departure from the ``true'' shape occurs, for more complex estimators based on pair-counting, like the LS one, the ways in which this boundary condition affects the measured correlations can be understood only through numerical simulations. This complication is the price to pay for the advantage that these estimators can measure correlations on scales larger than those sampled by the FS estimator (Sylos Labini & Vasilyev 2008). For pair-counting estimators it has been numerically shown (Sylos Labini & Vasilyev 2008) that, when fluctuations in the sample density are small enough, ; the prefactor of this proportionality depends on the type of estimator and on the sample geometry. However, we note that large-scale fluctuations may alter this systematic behavior as a function of the sample volume in a non-trivial way (see e.g. Sylos Labini et al. 2009c,d,b,a).
Note that the simple computation of how the error in the average density propagates into the error on the correlation function does not explicitly take into account the situation in which the sample density itself can be a varying function of the sample size (we refer the interested reader to Sylos Labini et al. 2009c,d,a,b for a more complete discussion of this important point). Indeed, as mentioned above, the estimated sample average converges to the asymptotic average density at a rate determined by the decay of the two-point correlation function. When correlations are strong, there can be an important finite-volume dependence of the sample density, resulting in similar finite-size effects in the two-point correlation function (Sylos Labini & Vasilyev 2008).
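A first-order sketch of this propagation may help to fix ideas. It assumes, for illustration only, an estimator that normalizes the conditional density $\langle n(r) \rangle_p$ by a sample density $n_S = n_0 (1+\delta)$, where $n_0$ is the ``true'' average density and $\delta$ is the fractional error on it; the symbols $n_S$, $n_0$ and $\delta$ are notation introduced here, not necessarily that of the preceding sections.

```latex
\overline{\xi}(r) = \frac{\langle n(r) \rangle_p}{n_S} - 1
                  = \frac{1 + \xi(r)}{1 + \delta} - 1
                  \simeq \xi(r) - \delta \, [1 + \xi(r)] + O(\delta^2)
```

Where $|\xi(r)| \ll 1$ the offset is approximately $-\delta$, so that the correlation function can be measured only on scales where $|\xi(r)|$ is well above $|\delta|$. This expansion treats $\delta$ as a fixed number; if $\delta$ itself depends on the sample volume, the offset becomes volume dependent, which is the situation described above.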
6 Theoretical implications
To interpret these results theoretically, it is necessary to take into account an important complication that changes the predictions of the standard models described in the introduction. Indeed, these refer to the whole matter density field (dark and luminous), while we observe only a part of it in the form of luminous matter (i.e. galaxies). The relation between the galaxy and dark matter distributions is usually formulated in terms of bias: the former represents a certain (physical) sampling of the latter. There are two different relevant regimes. At nonlinear scales, where the distribution has strong clustering characterized by non-Gaussian fluctuations, this relation can be studied only through numerical models (Croton et al. 2006; Springel et al. 2005). Instead, on scales where perturbations are small and clustering is in the linear regime, there is a simple picture based on the threshold sampling of a Gaussian random field (Kaiser 1984). In the latter case one may derive analytically that the ``biased'' two-point correlation function is linearly amplified by threshold sampling (Kaiser 1984). This is found to occur also in the nonlinear regime but under different conditions, as shown by numerical N-body simulations (Croton et al. 2006; Springel et al. 2005): the effect of biasing is to linearly amplify the correlation function, while the simple threshold sampling of a Gaussian random field predicts a strongly scale-dependent amplification of the correlation function in the nonlinear regime (Gabrielli et al. 1999).
Therefore the prediction of the nonlinearity scale r_{0} for the full matter distribution (which, in current models, is Mpc/h) gives only an approximate estimate for that of galaxies of different luminosity. Indeed, this scale has been found to vary slightly in N-body simulations (Springel et al. 2005). On the other hand, the scale is not affected by biasing for the simple reason that it is located, in current models, on about Mpc/h, where fluctuations have low amplitude and thus where both biasing and gravitational clustering give rise to a linear amplification of the correlation function. Hence, given that for there are no positive correlations in the whole matter density field, these will not be present in the galaxy distribution, as they cannot be generated by a biasing mechanism. Thus the length scale is invariant with respect to biasing, i.e. it must be the same for any class of objects as for the whole matter density field. It is then a fundamental scale to be measured in the observed galaxy distribution in order to verify the class of models characterized by the Harrison-Zeldovich tail of the matter power spectrum. Finally, the third length scale in current models is the BAO scale, located at , and it is weakly affected by gravitational evolution and biasing (Eisenstein & Hu 1998).
That the scales and are invariant under biasing is shown by the analysis of the N-body simulations provided by the Horizon project (Kim et al. 2009), where it is found that these are the same for the whole matter distribution and for the subsample of particles corresponding to the LRG (see their Fig. 5).
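The invariance argument of this section can be written compactly. The following is a sketch under the assumption that, in the linear regime, biasing acts as a scale-independent amplification; the symbols $b$, $\xi_g$, $\xi_m$ and $r_\ast$ are notation introduced here for the bias factor, the galaxy and matter correlation functions, and a generic zero-crossing scale.

```latex
\xi_g(r) = b^2 \, \xi_m(r), \qquad b^2 > 0
\quad \Longrightarrow \quad
\mathrm{sign}\, \xi_g(r) = \mathrm{sign}\, \xi_m(r) \ \ \forall r
\quad \Longrightarrow \quad
\xi_g(r_\ast) = 0 \iff \xi_m(r_\ast) = 0
```

A positive multiplicative factor changes the amplitude of the correlation function but neither its sign nor the location of its zero, so any class of objects related to the matter field in this way shares the same transition scale.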
7 Conclusions
In the newest SDSS samples it is possible to measure the correlation function on 100 Mpc/h scales both in the main galaxy (MG) and in the luminous red galaxy (LRG) samples. We measured, in the former case, positive correlations extending up to a factor of two beyond the scale Mpc/h, at which, in the LCDM model, the correlation function should cross zero, becoming negative on larger scales. However, in nearby samples we found that positive correlations are detectable only up to 50 Mpc/h. Therefore we concluded that in these samples the correlation function shows a rather different behavior from the LCDM model prediction and that there is no statistically significant evidence for the scale corresponding to the baryonic acoustic oscillations (BAO). Moreover, we found that the estimated two-point correlation function in different MG VL samples shows a clear dependence on the sample volume. We concluded that the overall errors in the estimation of the correlation function cannot be simply evaluated by the computation of statistical error bars (e.g. JK), but can only be studied by making systematic tests in samples with different volumes.
In addition, we have shown that, in the LRG sample, the uncertainty on the sample density estimation does not allow one to measure the correlation function on scales of the order of 100 Mpc/h. Rather, it puts an upper limit on the estimation of correlations at about 50 Mpc/h. More specifically, the fluctuation in the estimation of the sample density for the LRG sample is of the order of . This is, as we have discussed, of the same order as the errors in the correlation function. We have pointed out that, in order to measure the small bump in the correlation function associated with the BAO scale, one would need samples in which the fluctuation in the estimated density is more than ten times lower than the value found in the LRG-DR7 sample.
For this reason we concluded that in the LRG sample there is no statistical evidence for the BAO and that previous measurements (Eisenstein et al. 2005; Martínez et al. 2009; Cabré & Gaztañaga 2008) have underestimated the error bars in the estimation of the correlation function and neglected the possible effect of the bias in the estimator. This is due to the fact that they measured statistical errors by means of the JK method, which computes the sample variance by considering different samples that are strongly overlapping. If large-scale correlations are not negligible, this method underestimates the errors in the correlation function. We have shown that a more reliable way to compute statistical error bars is given by the simple estimation of field-to-field fluctuations. However, we have pointed out that even this method is not able to properly take into account overall volume-dependent effects, i.e. the estimator's bias, related to our ignorance of the ensemble average density.
Determinations of correlations through measurements of the galaxy power spectrum (Cole et al. 2005) are affected by similar volume-dependent systematic effects (Sylos Labini & Amendola 1996). In addition, one must take into account that threshold sampling of a Gaussian field does change the shape of the power spectrum on large enough scales, i.e. at small enough wavenumbers (Durrer et al. 2003). A similar situation should occur in the case of the halo models (Gabrielli et al. 2005).
This situation represents an important challenge for models, especially in view of the fact that the galaxy distribution does not present the negative correlations predicted by models up to scales larger than 250 Mpc/h. Our conclusion is that, in view of the finite-volume effects, the estimation of correlations presented here must be regarded as a lower limit to the real correlations characterizing the large-scale distribution of galaxies. Future surveys, like the extended SDSS III project (Schlegel et al. 2009), may allow us to study the behavior of the galaxy correlation function on scales larger than those considered here. To understand how volume-dependent systematic effects perturb correlation measurements, and to test the volume stability of statistical quantities, it is necessary to consider a more complete statistical analysis that focuses on conditional fluctuations (Sylos Labini et al. 2009c,d,b,a).
Acknowledgements
F.S.L. is grateful to Andrea Gabrielli and Michael Joyce for interesting discussions. Y.V.B. is grateful for partial support from Russian Federation grants: Leading Scientific School 1318.2008.2 and RFBR 090200143. M.L.C. was supported by the Ramón y Cajal Program of the Spanish Science Ministry. We thank an anonymous referee for a list of suggestions and criticisms that has allowed us to improve the presentation. We acknowledge the use of the Sloan Digital Sky Survey data (http://www.sdss.org) and of the NYU Value-Added Galaxy Catalog (http://ssds.physics.nyu.edu/).
References
Abazajian, K., Adelman-McCarthy, J. K., Agüeros, M. A., et al. 2009, ApJS, 182, 543
Blanton, M. R., Hogg, D. W., Bahcall, N. A., et al. 2003, ApJ, 592, 819
Cabré, A., & Gaztañaga, E. 2009, MNRAS, 393, 1183
Cole, S., Percival, W. J., Peacock, J. A., et al. 2005, MNRAS, 362, 505
Croton, D. J., Springel, V., White, S. D. M., et al. 2006, MNRAS, 365, 11
Davis, M., & Peebles, P. J. E. 1983, ApJ, 267, 46
Durrer, R., Gabrielli, A., Joyce, M., & Sylos Labini, F. 2003, ApJ, 585, L1
Eisenstein, D. J., & Hu, W. 1998, ApJ, 496, 605
Eisenstein, D. J., Zehavi, I., Hogg, D. W., et al. 2005, ApJ, 633, 560
Gabrielli, A., Labini, F. S., & Durrer, R. 2000, ApJ, 531, L1
Gabrielli, A., Joyce, M., & Sylos Labini, F. 2002, Phys. Rev. D, 65, 083523
Gabrielli, A., Sylos Labini, F., Joyce, M., & Pietronero, L. 2005, Statistical Physics for Cosmic Structures (Berlin: Springer Verlag)
Hamilton, A. J. S. 1993, ApJ, 417, 19
Hogg, D. W., Eisenstein, D. J., Blanton, M. R., et al. 2005, ApJ, 624, 54
Kerscher, M. 1999, A&A, 343, 333
Kerscher, M., Szapudi, I., & Szalay, A. S. 2000, ApJ, 535, 13
Kaiser, N. 1984, ApJ, 284, L9
Kim, J., Park, C., Gott, J. R., & Dubinski, J. 2009, ApJ, 701, 1547
Landy, S. D., & Szalay, A. 1993, ApJ, 412, 64
Martínez, V. J., Arnalte-Mur, P., Saar, E., et al. 2009, ApJ, 696, L93
Norberg, P., Baugh, C. M., Hawkins, E., et al. 2002, MNRAS, 332, 827
Norberg, P., Baugh, C. M., Gaztañaga, E., & Croton, D. J. 2009, MNRAS, 396, 19
Peacock, J. A. 1999, Cosmological Physics (Cambridge University Press)
Peebles, P. J. E. 1980, The Large-Scale Structure of the Universe (Princeton University Press)
Peebles, P. J. E. 1993, Principles of Physical Cosmology (Princeton University Press)
Schlegel, D., White, M., & Eisenstein, D. 2009 [arXiv:0902.4680]
Scranton, E., Johnston, D., Dodelson, S., et al. 2002, ApJ, 579, 48
Springel, V., White, S. D. M., Jenkins, A., et al. 2005, Nature, 435, 629
Strauss, M. A., Weinberg, D. H., Lupton, R. H., et al. 2002, AJ, 124, 1810
Sylos Labini, F., & Amendola, L. 1996, ApJ, 468, L1
Sylos Labini, F., & Vasilyev, N. L. 2008, A&A, 477, 381
Sylos Labini, F., Vasilyev, N. L., & Baryshev, Yu. V. 2009a, Europhys. Lett., 85, 29002
Sylos Labini, F., Vasilyev, N. L., & Baryshev, Yu. V. 2009b, A&A, 496, 7
Sylos Labini, F., Vasilyev, N. L., Baryshev, Yu. V., & Pietronero, L. 2009c, Europhys. Lett., 86, 49001
Sylos Labini, F., Vasilyev, N. L., & Baryshev, Yu. V. 2009d, A&A, in press [arXiv:0909.0132]
Zehavi, I., Blanton, M. R., Frieman, J. A., et al. 2002, ApJ, 571, 172
Zehavi, I., Zheng, Z., Weinberg, D. H., et al. 2005a, ApJ, 630, 1
Zehavi, I., Eisenstein, D. J., Nichol, R. C., et al. 2005b, ApJ, 621, 22
All Tables
Table 1: Properties of the SDSS-DR7 VL samples.
All Figures
Figure 1: Correlation function in the VL1 sample: both the LS and the DP estimators are reported. The solid line gives the prediction of the LCDM model with (from Eisenstein et al. 2005), linearly rescaled, according to the simplest biasing scheme (Kaiser 1984), to fit the amplitude on 10 Mpc/h. The inset panel shows the same behavior on a log-linear scale.
Figure 2: Same as Fig. 1, but for the VL2 sample.
Figure 3: Same as Fig. 1, but for the VL3 sample.
Figure 4: Same as Fig. 1, but for the LRG sample.
Figure 5: Correlation function in the whole sample VL4 and in a subsample of it (VL4c) limited at Mpc/h. Jackknife errors are shown in both cases.
Figure 6: Correlation function measured through the LS estimator (with jackknife errors) in the LRG subsample (R1) which is limited by and , i.e. with solid angle steradians. The solid line is the LCDM prediction.
Figure 7: Same as Fig. 6, but for the R2 angular region, which is limited by and , i.e. with solid angle steradians.
Figure 8: Determination of the correlation function for the LRG sample with the LS estimator (LRG), compared with the Eisenstein et al. (2005) (E05) and the Martínez et al. (2009) (Martinez) determinations. The solid line is the LCDM prediction.
Figure 9: Behavior of the ratio as a function of separation in the different MG and LRG samples.
Figure 10: By taking the LS estimator in the LRG sample for in Eq. (11) we find that for the quantity (labeled as LS) almost perfectly agrees with the DP estimator in the same sample.
Figure 11: The typical fluctuation in the LRG sample average density is about for almost any value of N in the range [4, 30] and it is much larger than Poisson noise.
Copyright ESO 2009