A&A 508, 17-43 (2009)
Breaking the self-averaging properties of spatial galaxy fluctuations in the Sloan Digital Sky Survey - Data Release Six
F. Sylos Labini^{1,2}  N. L. Vasilyev^{3}  Y. V. Baryshev^{3}
1  Centro Studi e Ricerche Enrico Fermi, via Panisperna 89 A, Compendio del Viminale, 00184 Rome, Italy
2  Istituto dei Sistemi Complessi CNR, via dei Taurini 19, 00185 Rome, Italy
3  Institute of Astronomy, St. Petersburg State University, Staryj Peterhoff, 198504 St. Petersburg, Russia
Received 22 December 2008 / Accepted 27 August 2009
Abstract
Statistical analyses of finite sample distributions usually assume
that fluctuations are self-averaging, i.e. statistically similar
in different regions of the given sample volume. By using
the scale-length method, we test whether this assumption is satisfied
in several samples of the Sloan Digital Sky Survey Data Release Six. We
find that the probability density function (PDF) of
conditional fluctuations, if filtered on large enough spatial scales
(i.e., r>30 Mpc/h), shows relevant systematic variations in different subvolumes of the survey. Instead, for scales of r<30 Mpc/h,
the PDF is statistically stable, and its first moment presents scaling
behavior with a negative exponent around one. Thus, while up to
30 Mpc/h galaxy structures have well-defined power-law
correlations, on larger scales it is not possible to consider whole
sample average quantities as meaningful and useful statistical
descriptors. This situation stems from galaxy structures corresponding
to density fluctuations that are too large in amplitude and too
extended in space to be self-averaging on such large scales inside the
sample volumes: galaxy distribution is
inhomogeneous up to the largest scales, i.e.
Mpc/h probed by the SDSS samples. We show that cosmological corrections, such as K-corrections
and standard evolutionary corrections, do not qualitatively change the
relevant behaviors. We consider in detail the relation between several
statistical measurements generally used to quantify galaxy fluctuations
and the scale-length analysis by discussing how the breaking of
self-averaging properties makes it impossible to have a reliable
estimation of the average fluctuation amplitude, variance, and
correlations for r>30 Mpc/h. Finally we show that the
large-amplitude galaxy fluctuations observed in the SDSS samples
are at odds with the predictions of the standard CDM model of structure formation.
Key words: cosmology: observations - large-scale structure of Universe
1 Introduction
The statistical characterization of galaxy structures represents a central problem for our understanding of the large-scale universe. Once three-dimensional galaxy samples are provided by observations, one may think the problem is relatively simple; i.e., all that remains to do is to characterize the statistical properties of N points (galaxies) contained in a volume V. However, there are several issues that must be considered with great care; namely, (i) the definition of the statistical methods employed and the analysis of the assumptions implicitly used by them; (ii) the construction of the samples and the consideration of cosmological corrections; (iii) the comparison of results in galaxy catalogs with model predictions. Even though each of these issues requires a separate discussion, in the literature the reliability of statistical methods is sometimes obscured by the problems related to cosmological corrections and/or by the sampling (or biasing) of a given distribution, which is the problem to be considered when comparing results of observations with theoretical predictions and cosmological N-body simulations. In this way one is not able to properly disentangle the different problems and to ask the relevant questions at each step. For this reason, in what follows we try to discuss the three issues above by considering each in turn. In particular, only when there is agreement about the statistical methods used will it be possible to compare clearly results from different authors and to isolate the problems related to cosmological corrections and/or sampling.
There has been an intense debate about the most suitable statistical methods for characterizing galaxy properties, particularly galaxy correlations (see Sylos Labini et al. 2007; Wu et al. 1999; Baryshev & Teerikorpi 2006; Joyce et al. 2005; Gabrielli & Sylos Labini 2001; Davis 1997; Vasilyev et al. 2006; Sylos Labini et al. 2009a,b; Pietronero et al. 1997; Joyce et al. 1999a; Sylos Labini et al. 2009c; Hogg et al. 2004; Sylos Labini et al. 1998; Pietronero 1987). The most suitable statistical method for characterizing the properties of a given stochastic point process depends on the underlying correlations of the point distribution itself. There can be different situations for the statistical properties of any set of points (in the present case, galaxies) in a finite sample. Let us briefly consider four different cases. In the first two, the galaxy distribution inside a given sample is approximated by a uniform stochastic point process, or in other words, the average density inside the sample is well-defined. This means that the density, measured for instance in a sphere of radius r randomly placed inside the sample, has small fluctuations. In this situation the relative fluctuations between the average density estimator and the ``true'' density are less than unity. Density fluctuations may be correlated, and the correlation function can be (i) short-ranged (e.g., exponential decay) or (ii) long-ranged (e.g., power law). In other words, these two cases correspond to a uniform stochastic point process with (i) short-range and (ii) long-range correlations.
On the other hand, it may happen that the galaxy distribution is not uniform. In this situation, the density measured, for instance, in a sphere of radius r randomly placed inside the sample, has large fluctuations; i.e., it varies wildly in different regions of the sample. In this situation the point distribution can generally present long-range correlations of large amplitude. Then it may, case (iii), or may not, case (iv), present self-averaging properties, depending on whether measurements of the density in different subregions show systematic (i.e., not statistical) differences that depend, for instance, on the spatial positions of the specific subregions. When this is so, the considered statistics are not statistically self-averaging in space because the PDF systematically differs in different subregions and whole-sample average values are not meaningful descriptors. In general, such systematic differences may be related to two different possibilities: (i) the underlying distribution is not translationally and/or rotationally invariant; (ii) the volumes considered are not large enough for fluctuations to be self-averaging.
In determining statistical properties, a fundamental assumption is very often used in the finite-sample analysis: the sample density is supposed to provide a reliable estimate of the ``true'' space density, i.e., the point distribution is assumed to be well-represented by cases (i) or (ii) above. This corresponds to the assumption that the relative fluctuations between the average density estimator and the ``true'' density are smaller than unity. In general, this is a very strong assumption that may lead to underestimating finite-size effects in the statistical analysis.
For instance, let us suppose that the distribution inside the given sample is not uniform, i.e. cases (iii) and (iv) above. In this case the results of the statistical analysis are biased by important finite-size effects, so that all estimations of statistical quantities based on the uniformity assumption (i.e. the two-point correlation function and all quantities normalized to the sample average) are affected, on all scales, by this a priori assumption that is inconsistent with the data properties (Gabrielli et al. 2005). In addition, while for case (iii) one may consider a class of whole-sample averaged quantities, i.e. conditional statistics, in case (iv) these become meaningless.
For this reason, our first aim is to study whether the galaxy distribution is self-averaging by characterizing conditional fluctuations. If the distribution is self-averaging, then one can consider a whole-sample average quantity and study the possible transition from nonuniformity to uniformity by characterizing the behavior of, for instance, the conditional density. If the distribution is uniform, or becomes uniform on a certain scale smaller than the sample size, one can characterize the (residual) correlations between density fluctuations by studying the standard two-point correlation function. Therefore the consideration of the two-point correlation function is the last point on this list, and it is appropriate only if one has proved that the distribution is self-averaging and uniform inside the given sample.
These issues are relevant in studies of the galaxy distribution because in the past twenty years it has been observed that galaxy structures are organized in a complex network of clusters, filaments, and voids on scales up to hundreds of Mpc (see Einasto et al. 2006b; Gott et al. 2005; Geller & Huchra 1989; Broadhurst et al. 1990; Kirshner et al. 1983; Giovanelli & Haynes 1993; Einasto et al. 2006a). From the statistical point of view, the problem is whether these structures are compatible with the very small characteristic length scale of the galaxy distribution of about ten Mpc. This is the scale at which the two-point correlation function is equal to unity, and it has been measured to be in the range of 5-15 Mpc/h in different (angular and three-dimensional) catalogs (Zehavi et al. 2005; Norberg et al. 2002; Zehavi et al. 2002; Davis et al. 1988; Davis & Peebles 1983; Park et al. 1994; Totsuji & Kihara 1969; Norberg et al. 2001; Benoist et al. 1996). The essence of the problem is not whether these measurements have been properly made, as indeed they have been, but whether the statistical methods used to get this result are consistent with the properties of the galaxy distribution in these samples (see Gabrielli et al. 2005; Sylos Labini et al. 2009c).
By measuring the redshift-dependent luminosity function and the comoving radial density of galaxies in the Sloan Digital Sky Survey (SDSS) Data Release 1 (DR1), it has been found that the apparent number density of bright galaxies increases by a factor of 3 as redshift increases from z = 0 to z = 0.3 (Loveday 2004). To explain these observations, a significant evolution in the luminosity and/or number density of galaxies at redshifts z < 0.3 has then been proposed (Loveday 2004). However, an independent test has not been provided to support such a conclusion; in particular, the possible effect of large density fluctuations on the basic assumptions used in this analysis (i.e. large-scale uniformity of the density field) was not tested, although it was noticed that these results do not preclude significant density fluctuations in the local universe on very large scales. In what follows, we will carefully consider these results and present a different conclusion for these observations, namely that galaxy clustering on very large scales is certainly making an important contribution to the observed behaviors of galaxy counts.
Regardless of the origin of the big change in the spatial density found by Loveday (2004), we note that a density varying by a factor of three within the given sample implies that it is meaningless to derive amplitudes of fluctuations with respect to this quantity. Indeed, in this situation the estimation of the amplitude of fluctuations normalized to the sample density is biased by systematic effects, and whole-sample averaged quantities, such as the two-point correlation function and the power spectrum, are not meaningful and stable statistical descriptors. Another question we address here in more detail concerns the physical origin of the density growth. As mentioned, Loveday (2004) concluded that the density growth comes from evolution, leaving open, however, the question of the contribution of large-scale structures, whereas we concluded that it stems from large-scale fluctuations (Sylos Labini et al. 2009c). Here we show that galaxy evolution, even if relevant at such low redshifts, is not the main cause of the measured behaviors. This result is reached by performing several specific tests that include some rough determinations of the effect of evolution, as in Blanton et al. (2003) and Tegmark et al. (2004).
The paper is organized as follows. In Sect. 2 we give a brief overview of our statistical methods, stressing the role of assumptions and the properties of conditional and unconditional fluctuations. Then in Sect. 3 we discuss the procedure used for selecting the data from the SDSS-DR6 (Adelman-McCarthy et al. 2008) archive and the various corrections applied in constructing the samples used in the analysis. In Sect. 4 we discuss the main results of our statistical analysis, which concerns the study of conditional fluctuations in the SDSS samples and their PDF. Then, in Sect. 5 we compare the conditional fluctuations in the real galaxy samples with the predictions of theoretical models and with those measured in mock galaxy catalogs constructed from cosmological N-body simulations. These are the outcome of gravitational N-body simulations of a concordance model, i.e. a Cold Dark Matter (CDM) model (Springel et al. 2005), and represent the predictions of theoretical models for the correlation properties of nonlinear structures (Croton et al. 2006). Finally, in Sect. 6 we draw our main conclusions.
2 Overview of statistical methods
There are several a priori assumptions that are generally used in statistical studies of galaxy samples and that require detailed consideration (see Gabrielli et al. 2005). The galaxy distribution is considered to be a realization of a stationary stochastic point process. This means that it is assumed to be statistically translationally and rotationally invariant, thereby satisfying the conditions of statistical isotropy and homogeneity in order to avoid special points or directions. These conditions are enough to satisfy the Copernican principle, i.e., that there are no special points or directions; however, they do not imply spatial homogeneity. Indeed an inhomogeneous distribution can satisfy the Copernican principle even though it is characterized by large voids and structures (Sylos Labini 1994; Gabrielli et al. 2005; Joyce et al. 2000).
Figure 1: Four different realizations of the toy model discussed in the text. Upper-left panel: uniform distribution with short-range positive correlations (T1). Upper-right panel: uniform distribution with long-range positive correlations (T2). Bottom-left panel: nonuniform distribution with long-range positive correlations (T3). Bottom-right panel: nonuniform distribution with long-range positive correlations and non-self-averaging properties (T4).

2.1 A brief summary of the statistical properties
We now briefly discuss several properties of stochastic point processes (SPP) that are useful in the rest of the paper (Gabrielli et al. 2005).
 A stationary SPP (SSPP) satisfies the conditions of statistical translational and rotational invariance. It can be uniform (spatially homogeneous) or nonuniform (spatially inhomogeneous).
 An SSPP is ergodic if the ensemble average of a statistical quantity characterizing its properties equals its infinite volume average. In a finite volume, only volume-average determinations are defined (i.e. estimations of statistical quantities). The ergodicity of an SSPP is a necessary assumption when one wants to compare volume-average quantities with theoretical predictions. Denoting by $\langle F \rangle$ the ensemble average of a statistical quantity F and by $\overline{F}_V$ its average over a volume V, ergodicity reads
$$ \langle F \rangle = \lim_{V \to \infty} \overline{F}_V . \qquad (1) $$
When V in Eq. (1) is finite, $\overline{F}_V$ is a statistical estimator of $\langle F \rangle$ in a given sample. Therefore the assumption of ergodicity is necessary if we want to use a statistical estimator to verify a theoretical prediction, which is expressed in terms of ensemble averages.
 An SSPP is uniform if, in a finite but large enough sample, fluctuations in the density are small enough. For instance, the scale at which an SSPP becomes uniform can be defined as the scale at which the fluctuations of the density filtered on that scale are of the same order as the average density itself, and beyond which they become smaller. To test whether an SSPP is uniform one can use conditional properties, which are also defined when the SSPP is not uniform.
 A uniform SSPP inside a given sample has a well-defined average density, i.e. the sample determination is representative of the ensemble value within some relatively small errors. Alternatively, the amplitude of the two-point correlation function is small enough on large scales to guarantee that a positive average density exists. This is, however, a necessary but not a sufficient condition, as the amplitude of the estimator of this function can also be small for a nonuniform distribution in a finite sample. In the latter case, however, the amplitude is not a significant statistical measurement.
 An SSPP has a well-defined crossover to homogeneity if it is nonuniform on scales smaller than a scale $\lambda_0$ and uniform on larger scales $r > \lambda_0$. The length scale $\lambda_0$ marks the transition from the regime of large to small fluctuations. At scales $r > \lambda_0$, one-point statistical properties (i.e. unconditional properties) are well defined. To study the approach to uniformity one should consider conditional properties.
 A uniform SSPP can have long-range correlations, i.e. it can be characterized by a nonzero two-point correlation function at all scales. This is the case of an LCDM model, which is indeed characterized by large-scale super-homogeneity (Gabrielli et al. 2002). A system can be uniform and, at the same time, long-range correlated only if the amplitude of the two-point correlation function is small enough on large scales.
 The range of correlations for a uniform SSPP is measured by the functional behavior of the two-point correlation function $\xi(r)$. If the system has critical correlations, $\xi(r)$ is a power-law function of distance.
 An SSPP is nonuniform (or spatially inhomogeneous), inside a given sample, if the conditional density does not converge to a constant value. If the distribution is self-averaging (see below) and nonuniform, then the conditional density is a varying function of the distance. When it no longer changes as a function of distance, the distribution is uniform.
 To test whether a nonuniform SSPP is self-averaging in a finite volume and on a certain scale r, one may study the PDF of conditional fluctuations. If this is not statistically stable in different subvolumes of linear size r, then the self-averaging property is not satisfied.
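More formally, given the determination $\overline{F}$ of a statistical quantity F in a sample of linear size L, with ensemble average $\langle \overline{F} \rangle$ and variance
$$ \sigma_{\overline{F}}^2 = \langle \overline{F}^2 \rangle - \langle \overline{F} \rangle^2 , $$
a system is said to exhibit self-averaging if (Aharony & Harris 1996)
$$ \lim_{L \to \infty} \frac{\sigma_{\overline{F}}^2}{\langle \overline{F} \rangle^2} = 0 . $$
In such a case, a single large system is enough to represent the whole ensemble. When there are long-range correlations, the property of self-averaging is nontrivial, as self-averaging requires the size L of the sample to be larger than the range of correlations (Aharony & Harris 1996). The concepts of ergodicity and self-averaging refer to two different properties of a stochastic process; namely, ergodicity of the variable F implies Eq. (1), while the self-averaging property has to be ascribed to the variable $\overline{F}$, which is determined in a finite sample.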
Figure 2: Conditional density for the toy models shown in Fig. 1. The case of a Poisson point distribution is added as a reference. (The conditional density has been normalized to the number of points in the simulations.) The model T1 has a short-range correlation, which corresponds to a fast decay of the conditional density. The model T2 is still uniform on large scales, i.e. the conditional density is flat there. The models T3 and T4 have strong clustering up to the box size.

Finally it is worth noticing that, if the distribution is uniform, whether correlations are short- or long-ranged, any global (spatially averaged) observable of the system has Gaussian-type fluctuations, in agreement with the central limit theorem. When there are long-range correlations of large amplitude the central limit theorem does not hold, and global quantities usually have non-Gaussian fluctuations (see Antal et al. 2009, for a more detailed discussion).
2.2 A toy model
To further clarify the concepts previously illustrated, we discuss a simple toy model. We generate a stochastic point distribution as follows. We randomly distribute rectangular sticks in two-dimensional Euclidean space, and then the points within each stick. The center of each stick and its orientation are chosen randomly in a box of side L (for simplicity we fix L=1). The points of each stick are placed randomly within its area, which for simplicity we take to be ; these points have constant density within each stick. The stick length scale can vary, as can the number of sticks placed in the box. Different realizations of this toy model are shown in Fig. 1. The conditional average density (see Eq. (2) below), i.e. the average density computed in spheres whose center is a distribution point, is shown in Fig. 2 and the PDF of the conditional density in Fig. 3.
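The construction just described can be sketched in a few lines of Python. This is only an illustrative sketch: the stick length and width, the number of points per stick, and the choice to drop points falling outside the box are assumptions of this example (the stick area used in the text is not reproduced here), and `stick_toy_model` is a hypothetical name.

```python
import numpy as np

def stick_toy_model(n_sticks, stick_length, stick_width, points_per_stick,
                    box=1.0, seed=0):
    """Scatter rectangular sticks in a 2D box and fill each uniformly with points.

    All parameter values are illustrative assumptions, not those of the paper.
    """
    rng = np.random.default_rng(seed)
    # Random centre and orientation for every stick.
    centers = rng.uniform(0.0, box, size=(n_sticks, 2))
    angles = rng.uniform(0.0, np.pi, size=n_sticks)
    # Uniform positions in each stick's own frame (along and across the stick).
    u = rng.uniform(-0.5, 0.5, size=(n_sticks, points_per_stick)) * stick_length
    v = rng.uniform(-0.5, 0.5, size=(n_sticks, points_per_stick)) * stick_width
    cos_a = np.cos(angles)[:, None]
    sin_a = np.sin(angles)[:, None]
    # Rotate into the box frame and translate to the stick centre.
    x = centers[:, 0, None] + u * cos_a - v * sin_a
    y = centers[:, 1, None] + u * sin_a + v * cos_a
    pts = np.column_stack([x.ravel(), y.ravel()])
    # Assumption: points falling outside the box are simply discarded.
    inside = np.all((pts >= 0.0) & (pts <= box), axis=1)
    return pts[inside]

# Many short sticks: a candidate for the uniform, short-range correlated case.
points_short = stick_toy_model(n_sticks=50, stick_length=0.1,
                               stick_width=0.01, points_per_stick=20)
```

Increasing `stick_length` while lowering `n_sticks` moves the realization toward the strongly clustered cases discussed next.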
By taking the dimension of the sticks (in this case equal for all sticks) small enough and the number of sticks large enough, one generates a uniform distribution with positive correlations, i.e. a positive two-point correlation function, on small scales (Fig. 1, upper left panel, model T1). In this case the average conditional density (Fig. 2) rapidly decays to a constant value, and the PDF of fluctuations (Fig. 3) is approximated by a Gaussian function. When the dimension of the sticks is increased and their number is still large enough, the distribution is still uniform, but it is positively correlated on larger scales. In the example shown in Fig. 1 (upper right panel, model T2) the dimension of the sticks is about the box side, making it a uniform distribution with (weak) correlations extending up to the box size. In such a situation, the average conditional density reaches a constant value on a scale (the homogeneity scale) comparable to the box size (Fig. 2). Correspondingly, the PDF is Gaussian only when fluctuations are filtered on scales comparable to the homogeneity scale (Fig. 3).
We can then increase the dimension of the sticks further and decrease their number (Fig. 1, bottom left panel, model T3). In this case the distribution is not uniform, as there are holes as large as the sample. The density thus presents large fluctuations and it is not a well-defined quantity on the sample scale. This is clearly a positively correlated distribution, with long-range correlations (up to the sample size in this case) of large amplitude. This is shown by the behavior of the average conditional density (Fig. 2), which does not converge to a constant value inside the box. Therefore this is not a uniform distribution; indeed, the PDF of fluctuations (Fig. 3), filtered on large enough spatial scales, does not converge to a Gaussian function. To show whether the distribution is self-averaging inside the simulation box, one may compare the full PDF with the ones measured in the two halves of the box. One may see from Fig. 3 that, although there are differences, the shape of the PDF is similar in the two subsamples. In particular, the peak and the width of the three PDFs are approximately the same.
Finally we can take sticks with different length scales. In the example shown in Fig. 1 (bottom right panel, model T4), the length scale is the same for all sticks except one, whose length scale is larger than the sample size. As for the previous case, this is a strongly correlated distribution, which is not uniform inside the box. Indeed, the average conditional density does not flatten inside the box (Fig. 2). In addition, this distribution is not self-averaging. Indeed, by measuring the PDF of conditional fluctuations in different regions of the sample (say the upper and the bottom parts, see Fig. 3), one finds systematic (i.e., not statistical) differences. This is an effect of the strong correlations extending well over the size of the sample.
A quantitative measurement of the breaking of the self-averaging property is obtained, for instance, by determining the first and second moments of the PDF and by checking whether they are stable in different subregions of the samples. One may note from Fig. 3 that, for the model T4, both the peak and the width of the PDF are different when measured in different sample subregions or in the whole sample box, thus indicating the breaking of self-averaging.
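A minimal sketch of such a moment comparison is given below, for a two-dimensional point set in a unit box. The restriction to centres whose disc lies fully inside the box, the brute-force distance computation, and the function names are assumptions of this example. A Poisson distribution is used as input, for which the two halves are expected to agree within statistical scatter.

```python
import numpy as np

def conditional_counts(points, r, box=1.0):
    """Counts N_i(r) in discs of radius r centred on the points.

    Assumption: only centres whose disc lies fully inside the box are used.
    """
    interior = np.all((points >= r) & (points <= box - r), axis=1)
    centres = points[interior]
    # Brute-force squared distances between centres and all points.
    d2 = ((centres[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    counts = (d2 <= r * r).sum(axis=1) - 1  # do not count the centre itself
    return centres, counts

def half_box_moments(points, r, box=1.0):
    """Mean and standard deviation of N_i(r) in the whole box and in its two halves."""
    centres, counts = conditional_counts(points, r, box)
    upper = counts[centres[:, 1] > 0.5 * box]
    lower = counts[centres[:, 1] <= 0.5 * box]
    samples = (("all", counts), ("upper", upper), ("lower", lower))
    return {name: (c.mean(), c.std()) for name, c in samples}

rng = np.random.default_rng(1)
poisson_points = rng.uniform(0.0, 1.0, size=(1500, 2))
moments = half_box_moments(poisson_points, r=0.1)
for name, (mean, std) in moments.items():
    print(f"{name:>5}: mean = {mean:.1f}, std = {std:.1f}")
```

For a distribution like T4 one would expect the "upper" and "lower" entries to differ systematically rather than by statistical scatter alone.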
Figure 3: PDF of conditional fluctuations (black line) filtered at 1/10 of the sample size (i.e., r=0.1) for the toy models shown in Fig. 1. Also shown is the PDF computed in two parts of the box, i.e. for y>0.5 (red line) and y<0.5 (green line). Both the models T1 and T2 approach a Gaussian distribution (blue dashed lines), as these distributions are uniform although correlated. The PDF of the model T3 does not approach a Gaussian function, but it is self-averaging inside the box. Finally, the PDF of the model T4 is not Gaussian and does not show self-averaging properties.

2.3 Strategy for a statistical analysis of a finite-sample distribution
In a finite sample we need to set up a strategy for testing
the different assumptions used in the statistical analysis.
To this aim we have to make a clear distinction between
statistical quantities that are normalized to the sample average
density and those that are not. Given that the primary scope of our
study is to determine whether a statistically meaningful estimate of
the average density is possible in the given samples, we mainly use
statistical quantities that do not
require the assumption of homogeneity inside the sample and thus avoid
the normalization of fluctuations to the estimation of the sample
average. These are thus conditional quantities, such as the conditional
density n_{i}(r) from the i-th galaxy, which gives the density in a sphere of radius r centered on that galaxy.
Conditional quantities are well-defined both in the case of homogeneous
and inhomogeneous point distributions. If a distribution is
self-averaging inside a given sample or in the range of scales where
such a property is found to hold, then it is possible to consider the
whole sample average of the conditional density, which is determined by
computing
$$ \overline{n(r)} = \frac{1}{M} \sum_{i=1}^{M} n_{i}(r) \qquad (2) $$
with respect to the i=1,...,M galaxies contained in the given sample. When a distribution is nonuniform (i.e. inhomogeneous), the conditional variance, which quantifies the amplitude of conditional fluctuations, is such that
where the last equality corresponds to the fluctuations being persistent (Gabrielli & Sylos Labini 2001). On the other hand, for homogeneous distributions with any kind of small-amplitude correlations we find that (Gabrielli & Sylos Labini 2001)
(4) 
To test whether a distribution is self-averaging inside a given sample one may measure the PDF of conditional fluctuations and determine whether this is stable in different subregions of the given sample. Only when the statistical self-averaging property is satisfied may one consider determining whole-sample average quantities. Then only if the conditional density is roughly constant inside a given sample, and thus the distribution in that sample is approximately uniform, may one consider determining fluctuations in amplitude and their correlations normalized to the sample density.
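As an illustration of the whole-sample average of the conditional density n_{i}(r) discussed in this section, the sketch below averages the density measured in spheres of radius r centred on the sample points, at a few radii, for a three-dimensional point set in a unit cube. Keeping only centres whose sphere is fully contained in the sample, and the brute-force pair search, are assumptions of this example; the function name is hypothetical.

```python
import numpy as np

def average_conditional_density(points, radii, box=1.0):
    """Average over the M usable centres of n_i(r), for each radius in `radii`."""
    averages = []
    for r in radii:
        # Assumption: use only centres whose sphere lies entirely inside the box.
        interior = np.all((points >= r) & (points <= box - r), axis=1)
        centres = points[interior]
        d2 = ((centres[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        counts = (d2 <= r * r).sum(axis=1) - 1       # exclude the centre itself
        n_i = counts / (4.0 / 3.0 * np.pi * r ** 3)  # density in each sphere
        averages.append(n_i.mean())
    return np.array(averages)

rng = np.random.default_rng(2)
sample = rng.uniform(0.0, 1.0, size=(1500, 3))
radii = np.array([0.10, 0.15, 0.20])
n_p = average_conditional_density(sample, radii)
print(n_p / len(sample))  # close to 1 at every radius for a uniform sample
```

For a uniform (Poisson) sample the result is roughly independent of r; for a nonuniform but self-averaging sample one would instead expect a systematic dependence on r.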
Quantities like the two-point correlation function, whose estimator can generally be written as (Gabrielli et al. 2005)
measure the correlation amplitude of fluctuations with respect to the determination of the sample average. When the distribution is nonuniform the estimation of the sample average is ill-defined, even if the distribution is self-averaging inside the sample volume, resulting in systematic effects in determining the estimator (Gabrielli et al. 2005). Thus unconditional quantities are only well-defined for uniform distributions.
3 The samples
The SDSS (York et al. 2000) is currently the largest spectroscopic survey of extragalactic objects, and here we consider the data from the public data release six (SDSS DR6) (Adelman-McCarthy et al. 2008), containing redshifts for about 800 000 galaxies and 100 000 quasars. There are two independent parts of the galaxy survey in the SDSS: the main galaxy (MG) sample and the luminous red galaxy sample. We only discuss the former. The spectroscopic survey covers an area of 7425 square degrees on the celestial sphere. The Petrosian apparent magnitude limit with extinction corrections for the galaxies is 17.77 in the r-filter, and photometry for each galaxy is available in five different bands. A detailed discussion of the spectroscopic target selection in the SDSS MG sample can be found in Strauss et al. (2002).
3.1 The query from the SDSS database
We used the following criteria to query the SDSS DR6 database, in particular from the SpecPhoto view (Strauss et al. 2002; Adelman-McCarthy et al. 2008):
 We constrained the flags indicating the type of object so that we select only the galaxies from the MG sample, i.e. (specClass = 2 and (primTarget & 64) > 0 or (primTarget & 128) > 0 or (primTarget & 256) > 0).
 We constrained the redshift confidence parameter to be larger than 0.35, with flags indicating no significant redshift determination errors, i.e. zConf > 0.35 AND zWarning & 193821 = 0 AND NOT zStatus IN (0, 1, 2).
 We then considered galaxies in the redshift range $10^{-4} \le z \le 0.3$, i.e. z >= 0.0001 AND z <= 0.3. Given the low value of the lower redshift limit, nearby galaxies that are large enough may get ``shredded'' into smaller pieces by the SDSS automatic pipelines and may represent an unwanted contamination to the data. However, these are excluded by considering samples whose lower redshift limit is not too small and that do not contain extremely bright galaxies (see below).
 We applied the filtering condition for Petrosian apparent magnitudes with extinction corrections r < 17.77, thus taking the target magnitude limit for the MG sample in the SDSS DR6 into account, i.e. (petroMag_r - extinction_r) < 17.77.
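The four criteria above can be expressed as a boolean mask over catalog columns. The sketch below assumes the columns have already been downloaded into a dictionary of NumPy arrays keyed by the SpecPhoto column names quoted in the text; `main_galaxy_mask` and the three-row toy catalog are hypothetical. The target-flag condition is grouped exactly as printed in the first criterion (AND binding more tightly than OR); if the intended grouping is specClass = 2 AND (any of the three bits), the parentheses should be moved accordingly.

```python
import numpy as np

def main_galaxy_mask(cat):
    """Boolean mask applying the four selection criteria to a dict of arrays."""
    prim = cat["primTarget"]
    # Object type and target flags, grouped as printed in the text.
    target = (((cat["specClass"] == 2) & ((prim & 64) > 0))
              | ((prim & 128) > 0) | ((prim & 256) > 0))
    # Redshift confidence and warning/status flags.
    quality = ((cat["zConf"] > 0.35)
               & ((cat["zWarning"] & 193821) == 0)
               & ~np.isin(cat["zStatus"], (0, 1, 2)))
    # Redshift range.
    in_range = (cat["z"] >= 0.0001) & (cat["z"] <= 0.3)
    # Extinction-corrected Petrosian magnitude limit.
    bright_enough = (cat["petroMag_r"] - cat["extinction_r"]) < 17.77
    return target & quality & in_range & bright_enough

# Toy catalog (made-up values): row 0 passes, row 1 fails the redshift cut,
# row 2 fails the magnitude cut.
toy = {
    "specClass": np.array([2, 2, 2]),
    "primTarget": np.array([64, 64, 64]),
    "zConf": np.array([0.9, 0.9, 0.9]),
    "zWarning": np.array([0, 0, 0]),
    "zStatus": np.array([4, 4, 4]),
    "z": np.array([0.1, 0.5, 0.1]),
    "petroMag_r": np.array([17.5, 17.5, 18.5]),
    "extinction_r": np.array([0.1, 0.1, 0.1]),
}
selected = main_galaxy_mask(toy)
```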
3.2 The angular regions
We use the internal angular coordinates of the survey $(\eta, \lambda)$, which can be transformed into the usual equatorial angular coordinates by a simple rotation. The angular coverage of the survey is not uniform, as observations have been done in different disconnected sky regions. For this reason we have considered three rectangular angular regions in the SDSS internal angular coordinates: in this way we did not have to consider the irregular boundaries of the survey mask, as we cut such boundaries to avoid uneven edges of observed regions. In Table 1 we report the limits of the cuts, chosen using the internal coordinates of the survey $\eta$ and $\lambda$ (in degrees), and the sample solid angle in steradians. We did not use corrections for the redshift completeness mask or for fiber collision effects. Completeness varies most near the current survey edges, which are excluded in our samples. Fiber collisions in general do not present a problem for measurements of large-scale galaxy correlations (Strauss et al. 2002).
Table 1: Properties of the angular regions considered.
Let us add a comment on incompleteness, which we concluded does not play a major role in our results. This conclusion is reached by considering several facts.
(i) All statistical quantities we measured, such as counts of galaxies as a function of apparent magnitude, the redshift distribution in the magnitude-limited sample, and the measurements of the correlation function in volume-limited samples, agree very well with previous works that have taken into account the variation in completeness in the whole survey area (Zehavi et al. 2005, 2002). This implies that there are no major differences in the way we treated the data, while there is a substantial difference in the interpretation of the results of the statistical analysis, as we discuss below.
(ii) Some authors use the method of making a random catalog with the same selection function as the real sample, and to this aim the detailed information given by the survey completeness mask is used. The completeness mask takes (mainly) into account that the limiting magnitude has small variations in different fields, so that a small fraction of galaxies in the photometric catalog have not been observed. There is no way, free of a priori assumptions, to correct for such an incompleteness. Given that the detailed information of the real galaxy distribution is unknown, one has to make some assumptions on the statistical properties of such a distribution. On the other hand, an assumption-free way of checking the possible effects of incompleteness is to limit the selection of galaxies to more stringent limits in apparent magnitude, especially for faint magnitudes. By limiting the apparent magnitude to 17.5 instead of 17.77, we found no statistical difference from the results presented in what follows (Sylos Labini et al. 2009c). A similar conclusion on the survey incompleteness has been found in the two-degree field galaxy redshift survey (2dFGRS) (Sylos Labini et al. 2009b). In addition, it has been shown by Cabré & Gaztañaga (2009) that the completeness mask could be the main source of systematic effects only on small scales, while we are interested in the correlation function on relatively large separations.
3.3 The volume-limited samples
To construct volume limited (VL) samples that are unbiased for the
selection effect related to the cuts in the apparent magnitude, we
applied a standard procedure (see Zehavi et al. 2005).
Firstly we computed metric distances as (Hogg 1999)
where we used the cosmological parameters and for the concordance model. We checked that our results do not depend significantly on the choice of cosmological parameters when taken in a reasonable range of values. This is expected since the redshift involved in these studies is limited to and relativistic redshiftdistance corrections are generally small and linear to the redshift for z <1.
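The metric distance can be evaluated numerically. The following is a minimal Python sketch, assuming the standard line-of-sight comoving distance integral for a flat model (Hogg 1999) with Omega_M = 0.3 and Omega_Lambda = 0.7; these parameter values, the function name, and the Simpson integration are illustrative assumptions, not taken from the text.

```python
import math

C_OVER_H0 = 2997.92458  # c / (100 km/s/Mpc), i.e. the Hubble distance in Mpc/h

def metric_distance(z, omega_m=0.3, omega_l=0.7, steps=1000):
    """Comoving distance in Mpc/h for a flat model (assumed parameters)."""
    if z == 0:
        return 0.0

    def inv_e(y):
        # 1 / E(y) for matter plus cosmological constant
        return 1.0 / math.sqrt(omega_m * (1.0 + y) ** 3 + omega_l)

    n = steps if steps % 2 == 0 else steps + 1  # Simpson's rule needs an even n
    h = z / n
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * inv_e(k * h)
    return C_OVER_H0 * total * h / 3.0
```

Changing `omega_m` and `omega_l` within a reasonable range gives a quick way to see how weakly the distances depend on the cosmological parameters at low redshift.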
Second, the galaxy absolute magnitude was determined to be (see Zehavi et al. 2005)
where K_{r}(z) is the Kcorrection.
We determined the K_{r}(z) term from the NYU-VAGC data (Blanton et al. 2005): to calculate the K-correction, a template fit to observed galaxy fluxes was used (Blanton & Roweis 2007).
To match these data with the results of the query from the
SDSS data archive, we applied the following criteria:
(i) right ascension and declination must match within
1 arcsec; (ii) relative difference between redshifts should
be less than .
With these constraints we found 517 729 galaxies that were
successfully matched and 8084 galaxies that were not. For
unmatched galaxies we considered a polynomial approximation to K_{r}(z),
K_{r}(z) = a_{0} + a_{1} z + a_{2} z^{2}    (7)
where a_{0} = 0.006, a_{1}=0.847 and a_{2} = 1.232. The behavior of Eq. (7) corresponds to the average K-correction of matched galaxies.
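As an illustration, the fit and the absolute magnitude can be sketched as follows. This assumes that the polynomial is the quadratic a_{0} + a_{1} z + a_{2} z^{2} with the coefficients exactly as quoted, and that the absolute magnitude has the standard form M = m - 5 log10[R(1+z)] - 25 - K(z) with R in Mpc/h; both forms and all function names are assumptions of this sketch.

```python
import math

# Coefficients as quoted in the text (signs taken at face value).
A0, A1, A2 = 0.006, 0.847, 1.232

def k_correction_fit(z):
    """Assumed quadratic approximation to the average K-correction."""
    return A0 + A1 * z + A2 * z * z

def absolute_magnitude(m_r, z, distance_mpc_h, k_r=None):
    """Assumed standard form: M = m - 5 log10[R (1+z)] - 25 - K(z)."""
    if k_r is None:
        k_r = k_correction_fit(z)  # fall back to the polynomial fit
    lum_dist = distance_mpc_h * (1.0 + z)  # luminosity distance in Mpc/h
    return m_r - 5.0 * math.log10(lum_dist) - 25.0 - k_r
```

Passing `k_r=0.0` reproduces the case in which the K-correction term is dropped.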
As discussed above, the MG sample corresponds to the observations, in a certain sky area, of all galaxies with apparent magnitude in a given range. There is thus an intrinsic selection effect because faint galaxies can only be observed if they are close enough to us, while brighter galaxies can be observed both at low and high redshift. Thus to avoid this observational selection effect, a VL sample is defined by two cuts in distance and two in absolute magnitude, so that it covers a rectangular area in the Mz diagram (Zehavi et al. 2005). To define VL samples, we restricted apparent magnitudes to the range , with the bright limit imposed to avoid the small incompleteness associated with galaxy deblending (Zehavi et al. 2005). In Table 2 we report the limits of the five VL samples we have considered: , (in Mpc/h) are the chosen limits for the metric distance; define the interval for the absolute magnitude in each sample. In Table 3 we report the number of galaxies in each of the three angular regions for the five VL samples: in the second column there is the case where Kcorrections have been applied, the third column without K+Ecorrections and the fourth column with K+Ecorrections.
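The rectangular selection described above can be sketched in a few lines. The dictionary keys, the numerical limits, and the toy catalogue below are hypothetical and serve only to illustrate the two cuts in distance and the two cuts in absolute magnitude.

```python
def select_volume_limited(galaxies, r_min, r_max, mag_bright, mag_faint):
    """Keep galaxies inside a rectangle of the distance-magnitude plane.

    Each galaxy is a dict with keys "R" (metric distance, Mpc/h) and
    "M" (absolute magnitude); brighter objects have more negative M.
    """
    return [g for g in galaxies
            if r_min <= g["R"] <= r_max and mag_bright <= g["M"] <= mag_faint]

# Hypothetical limits and toy catalogue, for illustration only.
toy = [{"R": 100.0, "M": -20.5},   # inside both cuts
       {"R": 100.0, "M": -18.0},   # too faint
       {"R": 400.0, "M": -20.5}]   # too distant
vl = select_volume_limited(toy, 50.0, 300.0, -21.0, -20.0)
```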
Table 2: Main properties of the obtained VL samples with Kcorrections and without Ecorrections.
Table 3: Number of galaxies in each of the VL samples (VL1, ..., VL5) and in each region (R1, R2, R3).
In what follows we make a detailed study to understand the effects of K-corrections and of other redshift-dependent corrections. These corrections could play a role because they introduce a redshift-dependent behavior in secondary quantities (absolute magnitude and distance) when these are derived from primary quantities (redshift and apparent magnitude). As several statistical quantities we discuss in Sect. 4 show a distance (or redshift) dependence, one may ask whether these corrections have an effect. To constrain their possible effects, we discuss two different choices for them in Sects. 3.4 and 3.5. We refer to Appendix A for a discussion of the derivation and the role of the cosmological corrections.
3.4 Effect of K-correction
To study the effect induced by the Kcorrections on the correlation analysis discussed in what follows, we constructed a set of VL samples without applying the K_{r}(z) term in Eq. (6). This choice is clearly not justified from the physical point of view and can be interpreted as a way to introduce a general linear redshiftdependent correction to the absolute magnituderedshift relation. The limits in distance of the corresponding VL samples are the same as for the samples with Kcorrections, and the limits in absolute magnitude are all the same, except for VL2 and VL5 where there is a difference of 0.1 mag while the range in absolute magnitudes is the same (see Table 4). In Table 3 we report the number of galaxies for the five VL samples: one may note that the main changes occur in the deepest samples (i.e. VL4 and VL5) where the number of objects decreases by a factor of .
Table 4: The same as for Table 2 but for VL samples without Kcorrections and without Ecorrections.
3.5 Effect of E-correction
According to standard models of galaxy formation (Kauffmann et al. 2003), because of the evolution of stars, elliptical and spiral galaxies were more luminous in the past. To take this physical change in the galaxy properties into account, one should include some corrections to the measured luminosity. These corrections are generally unknown; i.e., there is no adequate model of evolution that allows for a proper calculation of the corrected absolute magnitudes. For this reason, and because small-scale clustering at low redshift is thought not to be affected by galaxy evolution, these corrections have frequently been neglected in the construction of VL samples (see, e.g., Zehavi et al. 2002). However, this omission is a reasonable working hypothesis only if one considers local (conditional) quantities. Indeed, as we discuss below, when one normalizes fluctuations to the sample average, one uses information concerning all scales in the sample, so that all statistical quantities derived by such a normalization are affected by the large-scale properties of the distribution inside the given sample.
As discussed in Appendix A, the formula E(z) = 1.6 z has been used more recently as a simple fit for the average evolution in galaxy luminosities in the recent past (see Zehavi et al. 2005; Tegmark et al. 2004). In this situation the E-corrected absolute magnitude is
The limits in distance for the samples with E+Kcorrections are the same as in the Kcorrected samples while the limits in absolute magnitude change (see Table 5). For this reason a rough comparison of the number of objects in each VL sample is not meaningful. In Table 3 we report the number of galaxies for the five VL samples.
Table 5: The same of Table 2 but for VL samples with E+Kcorrections and without (see text for details).
4 Scale-length analysis
As discussed in Sect. 2,
the main stochastic variable that we consider and for which we
determine statistical properties is the conditional number of points in
spheres.
That is to say, we compute for each scale r the
determinations of the number of points inside a sphere of radius r whose center is on the galaxy. The number of centers M, as we discuss in more detail below, depends on the sphere radius r, i.e. M=M(r). The random variable N_{i}(r) thus depends on scale r and on the spatial position of the sphere's center. We can express the sphere center coordinates with its radial
distance R_{i} and with its angular coordinates
.
Thus, in general, we can write
When we integrate over the angular coordinates for fixed radial distance R_{i}, we find that N_{i}(r)=N(r; R_{i}); i.e., it depends on two variables: the length scale of the sphere r and the distance scale of the sphere center R_{i}. For this reason the method has been called the scale-length analysis (Sylos Labini et al. 2009c).
4.1 Number of centers as a function of scale
The number of centers M(r) depends on the scale r for the following reason. The sample geometry is a spherical portion delimited by the minimal and maximal values of the radial distance and by the angular coordinates reported in Table 1. For the galaxy, with coordinates ( , we compute the six distances from the boundaries of the sample and consider the minimal one . By simple geometrical considerations these distances are
(10) 
and . The length scale corresponds to the radius of the largest sphere, which is centered on the position of the galaxy and which is fully contained in the sample volume. As in Eq. (9) we consider only fully enclosed spheres in the sample volume, then the galaxy will not be included in M(r) as long as the sphere radius is . In this situation for large sphere radii, M(r) decreases and the location of the galaxies contributing to M(r) is mostly placed at radial distances in the range , from the radial boundaries of the sample at , .
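The counting with fully enclosed spheres can be sketched as follows. This is a brute-force illustration, not the code used for the analysis: the minimal distance of each galaxy from the sample boundaries is assumed to be precomputed and passed in as `boundary_dist` (a hypothetical input), positions are Cartesian, and the central galaxy itself is not counted.

```python
def counts_in_full_spheres(points, boundary_dist, r):
    """Return the values N_i(r) for all centers whose sphere fits in the sample.

    points        : list of (x, y, z) positions in Mpc/h
    boundary_dist : minimal distance of each point from the sample boundaries
    r             : sphere radius in Mpc/h
    The length of the returned list is the number of centers M(r).
    """
    r2 = r * r
    counts = []
    for i, (p, d) in enumerate(zip(points, boundary_dist)):
        if d < r:  # the sphere would cross the sample boundary: skip this center
            continue
        n = 0
        for j, q in enumerate(points):
            if j != i and sum((a - b) ** 2 for a, b in zip(p, q)) <= r2:
                n += 1
        counts.append(n)
    return counts
```

Because centers with `boundary_dist < r` are discarded, the length of the output decreases as r grows, which is the scale dependence of M(r) discussed above. For real catalogues a spatial index would replace the double loop.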
One could also choose to consider incomplete spheres, i.e. spheres that are only partially contained in the sample volume. In this case, one could weight the number of points inside the incomplete sphere by the part of its volume contained in the sample, thus obtaining more robust statistics, especially on large scales. We avoid this for the following reason. Suppose that outside the sample there is a large-scale structure (or a deep underdensity): the weighting above will underestimate (or overestimate) the real number of points inside the full sphere with respect to the incomplete one. This inevitably introduces a bias in the measurements, which affects large-scale determinations. As it is precisely the scope of our study to determine the properties of large spatial fluctuations, we avoid using a method that implicitly assumes that these are irrelevant (Sylos Labini et al. 2009b; Gabrielli et al. 2005).
Given their different sizes, the number of centers as a function of scale M(r) is quantitatively different in each of the five VL samples. However, one may note from Fig. 4 that the behavior of M(r) is similar in the different cases. For small sphere radii almost all galaxies are included; i.e., M(r) is equal to the number of points contained in the sample. Instead, when the sphere radius becomes comparable to the size of the largest sphere radius which is fully contained in the sample volume, M(r) shows a fast decay. The scale at which this occurs, grows proportionally (taking the sample solid angle fixed) to the depth of the VL sample. The largest scales explored in this survey, i.e. Mpc/h, can only be reached with the deepest VL samples.
Figure 4: Number of centers M(r) as a function of scale in the five VL samples (see text for details). 

4.2 Probability distribution of conditional fluctuations
The main information about the statistical properties of the random variable N_{i}(r) is provided by its PDF, P(N,r). This gives the probability of finding N points in a spherical volume of radius r centered on a distribution point. Note that this differs from the PDF of unconditional fluctuations, which gives the probability that a spherical volume of radius r centered on an arbitrary point of space contains N points (Saslaw 2000). Only when unconditional properties are well-defined do the conditional and unconditional PDFs give similar results (Gabrielli et al. 2005).
The frequency distribution in bins of conditional fluctuations at fixed scale r
gives an estimation of the PDF at that scale. The error bars are
computed as the square root of the number of points in each bin.
To compare the behavior in different VL samples, which are
defined by different luminosity cuts so generally containing galaxies
of different absolute magnitudes, we define the normalized variable
and we determine its PDF, that is.
where P(N(r))=P(N,r) is the PDF of the variable N_{i}(r), is its estimated whole sample first moment and is the estimated standard deviation on the scale r.
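A minimal sketch of this estimate is given below. It assumes that the normalized variable is the deviation of N_{i}(r) from the whole-sample mean in units of the estimated standard deviation, and that the PDF is a histogram with Poisson error bars; the binning, the use of the unbiased variance, and the function name are illustrative choices.

```python
import math

def normalized_pdf(counts, n_bins=10):
    """Histogram estimate of the PDF of x = (N - mean) / sigma.

    Returns bin centers, PDF values and Poisson error bars (the square
    root of the number of points in each bin, in the same normalization).
    Assumes the counts are not all identical, so that sigma > 0.
    """
    m = len(counts)
    mean = sum(counts) / m
    sigma = math.sqrt(sum((n - mean) ** 2 for n in counts) / (m - 1))
    xs = [(n - mean) / sigma for n in counts]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for x in xs:
        k = min(int((x - lo) / width), n_bins - 1)  # put x = hi in the last bin
        hist[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(n_bins)]
    pdf = [h / (m * width) for h in hist]
    err = [math.sqrt(h) / (m * width) for h in hist]
    return centers, pdf, err
```

Since x is dimensionless, curves obtained from samples with different luminosity cuts can be overlaid directly.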
In Fig. 5 we show the PDF, estimated in the region R1 only, of the VL samples with K-corrections, of the samples where E+K corrections have been applied, and finally of the samples in which no corrections have been applied. In Fig. 6 we also show, but only for some cases, the PDF with the estimated Poisson error bars, together with the best fit obtained by a Gaussian function. The PDF is not affected by E and/or K corrections, even in the deepest samples such as VL4 and VL5. For this reason, and given that E-corrections are not well-defined, as discussed above, in what follows we mostly focus on the case where only K-corrections have been applied.
Figure 5: Conditional PDF on different scales for the 5 VL samples (each row corresponds to a VL sample; the scale r is reported in the caption) with K corrections (black), with K+E corrections (red) and without K+E corrections (green). 

Figure 6: The PDF in different samples (with K-corrections only) and for different sphere radii with the best-fit Gaussian function (see captions). Poisson error bars are reported as a reference.

It is interesting to compare results for r=5, 10, 20, 30 Mpc/h in different K-corrected VL samples (see Fig. 7): the PDFs collapse fairly well onto one another. The overall shape is characterized by a long (or fat) tail, slowly decaying for high x values, which makes it substantially different from a Gaussian function. This is the effect of the large structures (i.e. large fluctuations) contained in these samples. Similar behaviors have been found in the 2dFGRS (Sylos Labini et al. 2009b,a).
For both small (i.e., r<30 Mpc/h) and large (i.e., r>30 Mpc/h) sphere radii, the PDF does not converge to a Gaussian function. Actually, for the largest sphere radii (i.e. r=80, 100 Mpc/h), in the samples VL4 and VL5, the PDF shows a relatively long tail for low x values followed by a sharp cutoff at values higher than the peak of the PDF. We interpret this behavior as due to an intrinsic bias: given the finiteness of the sample volume, only a few structures can be contained in it, and thus this statistical measurement cannot give a reliable estimate of large-scale fluctuations. Already in the less distant samples (e.g., VL1, VL2, and VL3), the main trends discussed above are clearly present up to sphere radii Mpc/h. The distant samples (e.g., VL4 and VL5), where the effect of other cosmological corrections may be more important, allow us to reach the scales of Mpc/h.
To summarize the main results: (i) the PDF is not affected by E and/or K corrections. (ii) For scales on which conditional fluctuations are self-averaging and the PDF is stable in different sample subregions, i.e. for r<30 Mpc/h, the overall shape of the PDF is characterized by a long (or fat) tail that makes it substantially different from a Gaussian function. (iii) For r>30 Mpc/h, the PDF does not converge to a Gaussian function and it has a different shape in different samples. In the next section we present specific measurements to study the large-scale properties of conditional fluctuations in these samples, testing their self-averaging properties.
4.3 Test for statistical self-averaging
To study the origin of the differences in the behavior of the PDF in different VL samples, for large enough sphere radii, we can consider a specific test. This is useful for studying the selfaveraging properties of the distribution in a given sample. This test allows us to establish whether, inside a given sample, it is meaningful to derive, for instance, wholesample average quantities and whether we can consider that a certain estimator gives a reliable and stable measurement of the ensemble properties of the distribution.
We divide the sample volume into two non-overlapping subvolumes of the same size, a nearby one of volume V_{n} and a more distant one of volume V_{f}, and we determine whether statistical quantities are stable or show systematic differences in these subsamples. In principle the ideal test would be to compute the PDF in many different non-overlapping subvolumes, more than the two we use here. The limitation we face in doing this stems only from the data available in the SDSS-DR6 and the corresponding sample volumes. In future data releases, once the regions R1, R2, and R3 become contiguous, we will be able to consider more subvolumes of a single sample.
Given the two limits of the sample in radial distance, R_{min} and R_{max},
we computed the distance R_{h} at which V_{n} = V_{f}, thus obtaining
R_{h} = [(R_{min}^{3} + R_{max}^{3})/2]^{1/3}    (13)
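A quick numerical check of the equal-volume condition is sketched below. It assumes a sample with a fixed solid angle, so that the volume between two radial distances scales as the difference of their cubes; the names of the radial limits and the example values are illustrative.

```python
def half_volume_radius(r_min, r_max):
    """Radial distance splitting a spherical portion into two equal volumes.

    For fixed solid angle the volume between R1 and R2 is proportional to
    R2**3 - R1**3, so equal volumes require R_h**3 = (r_min**3 + r_max**3) / 2.
    """
    return ((r_min ** 3 + r_max ** 3) / 2.0) ** (1.0 / 3.0)

# Hypothetical radial limits in Mpc/h, for illustration only.
r_h = half_volume_radius(50.0, 300.0)
near = r_h ** 3 - 50.0 ** 3     # proportional to the nearby volume
far = 300.0 ** 3 - r_h ** 3     # proportional to the distant volume
```

Note that the dividing distance lies much closer to the far boundary than to the midpoint, because most of the volume is at large radial distances.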
To increase the statistics, for a large enough sphere radius r, we have allowed the center of a sphere of radius r to be at a distance d from R_{h} such that r>d. In this situation the sphere, whose center is placed, e.g., in the less distant subsample, has part of its volume in the more distant subsample and vice versa. Thus a certain overlap of the determinations of N_{i}(r) is allowed between the two half-regions. This method gives a conservative estimate of the actual fluctuations between the subsamples. Indeed, the overlap of different determinations clearly smooths out fluctuations between the two subsamples: thus any difference we find is certainly a genuine feature of the distribution.
In addition for each VL sample we consider the PDF determined by all the values, at fixed r, in all three sky regions. The determination of N_{i}(r) has to be done separately, for each VL sample, in the three different sky regions R1, R2, and R3 because of the geometrical constraints discussed above. This allows us to improve the statistics, although the R1 region contains about a factor 10 more galaxies than the other two regions and its larger volume allows many more determinations than in the two other regions. In particular for a large enough sphere radius only the values in the R1 region can be measured.
Results for K-corrected samples are shown in Fig. 8. The peak of the PDF in the two half volumes of the different VL samples is located approximately at the same N value for Mpc/h: although in this range of sphere radii a difference is sometimes detectable in the location of the peak (e.g., in the samples VL4 and VL5), the overall shape of the PDF does not substantially change in the two subvolumes. Instead, for r>30 Mpc/h, the whole PDF shows a systematic shift, because the shape is very sensitive to the different kinds of fluctuations (structures) present in each subvolume. In this situation the estimation of the first and second moments in the whole sample is affected by systematic effects that prevent one from extracting statistically meaningful information from them.
In all samples but VL2, the PDF is shifted more toward lower N values in the nearby part of the sample than in the more distant one. This occurs because fluctuations are generally wilder in the more distant part of the sample. This is an effect of the sample geometry: larger structures can only be found where the geometry of the sample volume allows it to contain them, and indeed this happens toward the far boundaries of the samples.
The sample VL2, for r>20 Mpc/h, shows a peculiar feature: the PDF in the nearby subvolume is shifted toward higher N values than that in the more distant one. In this case, there is a large underdense region for R>220 Mpc/h extending up to the limits of the sample at R=300 Mpc/h (see discussion below). The trend found in VL2 is interesting because it shows that large fluctuations do not occur only in the more distant part of the sample volume, which one might otherwise ascribe to a systematic selection effect rather than to structures. Instead, there is no such systematic trend common to all the samples.
This situation clearly agrees with the behavior of the whole sample PDF discussed in the previous section, particularly that there are, at the same sphere radius r, detectable changes in shape of the PDF in different VL samples. This implies that the sample volumes are not large enough to allow stable determination of the PDF and its moments for sphere radii r>30 Mpc/h.
Figure 7: Normalized PDF (see Eqs. (11), (12)) for r=5,10,20,30 Mpc/h in the five VL Kcorrected samples. 

As a final remark, to reach the important conclusion about the non-self-averaging properties of conditional fluctuations, when they are filtered on scales r>30 Mpc/h, it is enough to consider the nearby samples VL1, VL2, and VL3. In these samples, because of the narrow range of redshifts involved, any cosmological correction other than the ones considered here is expected to perturb our results only slightly.
Figure 8: PDF in the two subvolumes of the Kcorrected VL samples (each row corresponds to a VL sample): the black line marks the PDF in nearby subsample, and the red line in the more distant subsample. The xaxis reports the number of points N(r) (the scale r is reported in the caption) and the PDF P(N;r) is on the yaxis. 

4.4 Effect of K-corrections and evolutionary corrections
As illustrative examples of the situation in the samples with E+K corrections, and in those where no corrections are applied at all, we show in Fig. 9 the cases of VL3 and VL5. In the latter the corrections, because of the relatively high redshifts involved, are expected to modify the behaviors more. As one can see from these figures, there is no substantial change with respect to the case where only K-corrections are applied. Thus even in this case, the effect of K+E corrections represents a minor modification of the measured behaviors.
Figure 9: As Fig. 8 but for the K+E-corrected VL3 and VL5 samples (left) and for the same samples without K+E corrections (right).

4.5 Average in bins
Figure 10: Behavior of the local average of (see Eqs. (14), (15)) normalized to the whole sample average (see Eq. (19) below) in bins of thickness Mpc/h for sphere radius r=10 Mpc/h normalized to the whole sample average for the 5 VL samples with Kcorrection (K), with evolution and Kcorrection (E+K), and without evolution and Kcorrection (K0). The insert panel shows the number of centers, over which the average and variance are computed in each bin. In the bottom right panel we report the behavior of in bins of thickness Mpc/h for r=20 Mpc/h, normalized to the luminosity factors as explained in the text (see Sect. 4.6), for Kcorrected VL samples. 

To determine the features of galaxy structures on different scales, we now consider a local average of N_{i}(r)
computed in the following way. We divide the whole range of radial
distances in each VL sample, in bins of thickness
and we compute the average
where the sum is extended to the M_{b} determinations of N_{i}(r)=N(r;R_{i}) such that the radial distance of the center is in the interval range . Its variance can be estimated by
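The binned average can be sketched as follows. The variance is taken here to be the unbiased sample variance of the M_{b} determinations in each bin, which is an assumption of this sketch; the argument names are hypothetical.

```python
def average_in_bins(radial_dist, counts, r_lo, r_hi, delta_r):
    """Local average of N(r; R_i) in radial bins of thickness delta_r.

    radial_dist : radial distance R_i of each sphere center (Mpc/h)
    counts      : corresponding values N_i(r) at a fixed sphere radius r
    Returns (bin center, mean, variance, number of centers) for each
    non-empty bin covering [r_lo, r_hi).
    """
    n_bins = max(1, int(round((r_hi - r_lo) / delta_r)))
    bins = [[] for _ in range(n_bins)]
    for big_r, n in zip(radial_dist, counts):
        if r_lo <= big_r < r_hi:
            k = min(int((big_r - r_lo) / delta_r), n_bins - 1)
            bins[k].append(n)
    out = []
    for k, vals in enumerate(bins):
        m_b = len(vals)
        if m_b == 0:
            continue
        mean = sum(vals) / m_b
        # unbiased sample variance; set to zero when only one value is available
        var = sum((v - mean) ** 2 for v in vals) / (m_b - 1) if m_b > 1 else 0.0
        out.append((r_lo + (k + 0.5) * delta_r, mean, var, m_b))
    return out
```

The last element of each tuple is the number of centers contributing to the bin, which is useful for judging how reliable each local average is.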
To study the sequence of structures and voids present in the samples, we choose a relatively small radial bin, i.e. Mpc/h, and consider the sphere radius r=10 Mpc/h. It is clear that, as , there is some overlap in the determinations in contiguous bins resulting in an artificial smoothing of the signal. This means that the fluctuations we detect in this way represent a lower limit to the real ones. In Fig. 10 we show the behavior of Eq. (14) in bins of thickness Mpc/h for sphere radius r=10 Mpc/h normalized to the whole sample average (see Eq. (19) below) for the three sets of five VL samples (region R1) with different corrections, as defined in Sect. 3. In VL1, VL2, and VL3 the signal is completely unaffected by corrections, while in VL4 and VL5 there is a small effect that, however, does not change the main trends. In addition the inset panels of Fig. 10 show the number of centers that contribute to the average in each bin. The fact that this grows as a function of the radial distance reflects the limitations imposed by the sample geometry discussed in Sect. 4.1. Below we summarize the situation.
 In the VL1 sample there are fluctuations of . There is no well-defined radial-distance trend; instead the scatter in the measurements corresponds to the location of large-scale structures. The behavior is insensitive to the effect of the K and/or E corrections considered.
 In the VL2 sample there is a high overdensity in the radial distance range [180,220] Mpc/h which is followed by a sharp decay, signaling a relative underdensity for R>220 Mpc/h. Even in this case there is no detectable impact of K and/or E corrections considered.
 The high overdensity up to Mpc/h is also visible in the VL3 sample, and is followed by an underdensity in the range 220 < R < 270 Mpc/h. Beyond 300 Mpc/h there is another relative overdensity extending up to the sample boundaries. The effect of Ecorrections is to relatively amplify the overdensity at Mpc/h with respect to the underdensity on larger scales.
 The behavior in VL4 is similar to the one in VL3. Here the sharp fall in the average conditional density in bins at Mpc/h is followed by a relatively slow growth, which seems to saturate at about Mpc/h at about the same level as the fluctuation at Mpc/h. The effect of K and/or E corrections is to amplify the difference between the amplitude of fluctuations at short and long radial distances.
 Even in the case of the sample VL5 the average behavior is quantitatively but not qualitatively changed by the effect of K and/or E corrections. In this sample, as well as in VL4, there is a coherent trend over the whole sample volume, which is a signature of persisting largescale fluctuations.
4.6 Normalization of the behaviors in different VL samples
We can now normalize the behaviors of the radial density and of the average conditional density in bins discussed in Sect. 4.5
in the different VL samples. This is done by computing the
normalizing factors for the different VL samples assuming
Eq. (C.2) and by knowing the galaxy luminosity function (Joyce & Sylos Labini 2001). In this approximation the observed radial
density in the VL1 sample can be written as
(16) 
where L_{1} and L_{2} are respectively the limit at the faint and bright absolute luminosities of the sample VL1, and we have defined
(17) 
Clearly, the radial density, for instance, in the sample VL2, can be normalized to that of VL1 by computing
Hereafter, to compute the normalization factors we use the best-fit parameters of the luminosity function found in Appendix C. The normalization factor for VL5 is the most uncertain because the measured luminosity function deviates from the simple Schechter function fit for bright magnitudes.
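The normalization can be illustrated with a short sketch, assuming that the density of a VL sample is proportional to the integral of a Schechter luminosity function over the absolute-magnitude range of the sample. The parameter values used in the example (m_star and alpha) are placeholders, not the best-fit values of Appendix C, and the function names are hypothetical.

```python
import math

def schechter_mag(mag, m_star, alpha):
    """Schechter function per unit magnitude, up to the constant phi_star."""
    x = 10.0 ** (-0.4 * (mag - m_star))  # luminosity in units of L_star
    return 0.4 * math.log(10.0) * x ** (alpha + 1.0) * math.exp(-x)

def luminosity_factor(m_bright, m_faint, m_star, alpha, steps=2000):
    """Integral of the Schechter function between two magnitudes (midpoint rule)."""
    h = (m_faint - m_bright) / steps
    return h * sum(schechter_mag(m_bright + (k + 0.5) * h, m_star, alpha)
                   for k in range(steps))

def normalization(sample_range, reference_range, m_star, alpha):
    """Factor rescaling the density of one VL sample to a reference sample."""
    return (luminosity_factor(*reference_range, m_star, alpha)
            / luminosity_factor(*sample_range, m_star, alpha))

# Placeholder Schechter parameters and magnitude ranges, for illustration only.
factor = normalization((-22.0, -20.0), (-20.0, -18.0), m_star=-20.5, alpha=-1.0)
```

The factor is larger than one when the sample contains only galaxies that are rarer than those of the reference sample, as in the placeholder example.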
Figure 11 shows the distance behavior of the normalized radial counts of galaxies in the region R1. A persistent growth of the density for distances R>300 Mpc/h is found, while for smaller radial distances there is the fluctuating behavior already discussed in Sect. 4.5. This is very similar to Fig. 10 (bottom-right panel), where we considered the average in bins of the SL analysis, i.e. Eq. (14), as a function of the radial distance for r=10 Mpc/h with the same normalization factors as are used for the radial density. Indeed the same approximations used to derive the radial density normalization can be used to normalize the average SL data. The normalization factors obtained in this way allow us to produce a single behavior from 50 Mpc/h to 600 Mpc/h. The main features are again the overdensity at Mpc/h, the relative underdensity in the range [220,300] Mpc/h and the persistent growth for R>300 Mpc/h.
In this way we reach a completely different conclusion from that of Loveday (2004). Indeed, from the analysis of the luminosity function for galaxies selected in four redshift slices ( 0.001 < z < 0.1, 0.1 < z < 0.15, 0.15 < z < 0.2 and 0.2 < z < 0.3) and despite the uncertainties in the shape of the luminosity function in the redshift slices, Loveday (2004) concluded that there is clear evolution in the amplitude of the luminosity function, in the sense of an increasing amplitude (vertical shift) and/or luminosity (horizontal shift) with redshift. On the other hand, we conclude that the behavior of the radial counts of galaxies as a function of distance is consistent with the average conditional number of galaxies in spheres as a function of the radial distance i.e. Eq. (14). The behaviors of can be normalized simply by using the results obtained in the same samples for the luminosity function. Thus our conclusion is perfectly consistent with the measurements of the PDF presented in the previous section, and it does not imply that a strong evolution has occurred up to z=0.2. Rather, as discussed above for the behavior of average conditional density in bins, we can trace the various main structures in these samples: namely there are large fluctuations at about 200 Mpc/h followed by a large underdense region up to 400 Mpc/h, which is then followed by other coherent structures up to the sample limits.
4.7 The whole sample average and the variance
When Eq. (9) is averaged over the whole sample, it gives an estimate of the average conditional density
In Fig. 12 we show the whole-sample average conditional density in the different K-corrected VL samples, normalized by using Eq. (18). Contrary to the behavior of the radial number counts and of the SL statistics averaged in bins (Eq. (14)), in this case the behaviors of the average conditional density in different samples do not overlap in a satisfactory way. This is because the whole-sample average is biased by the lack of self-averaging properties and does not give a reliable estimate of the ensemble quantity. Regardless of its amplitude the quantity shows a power law behavior with exponent D=2.2 ± 0.1 up to Mpc/h. On larger scales, its determination is strongly affected by the non-self-averaging properties of conditional fluctuations discussed above. To reliably detect uniformity, the conditional density has to be flat over a wide enough range of scales, while in the data we measure a different scale dependence for r>20 Mpc/h than on small scales, but we cannot detect a clear flattening. However, in view of the large fluctuations detected by the complete PDF analysis and by the self-averaging test, we conclude that there is no crossover to uniformity up to 100 Mpc/h. We need to consider larger samples to properly constrain correlation properties for scales r>30 Mpc/h.
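The power-law behavior can be quantified with a least-squares fit in log-log space, as in the sketch below. The relation between the fitted slope and the exponent D (conditional density scaling as r^{D-3}) is the usual convention and is assumed here; the synthetic data are illustrative and are not the measurements of Fig. 12.

```python
import math

def fit_power_law(r_values, n_values):
    """Least-squares fit of n(r) = amplitude * r**slope in log-log space."""
    xs = [math.log(r) for r in r_values]
    ys = [math.log(n) for n in n_values]
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    amplitude = math.exp(y_bar - slope * x_bar)
    return slope, amplitude

# Synthetic conditional density with slope -0.8, for illustration only.
radii = [1.0, 2.0, 5.0, 10.0, 20.0]
density = [5.0 * r ** -0.8 for r in radii]
slope, amplitude = fit_power_law(radii, density)
dimension = 3.0 + slope  # assumed convention: density ~ r**(D - 3)
```

Restricting the fit to the range of scales where the PDF is stable avoids mixing in the scales affected by the lack of self-averaging.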
Figure 11: Radial counts in bins of thickness Mpc/h, normalized to the luminosity factors as explained in the text, for the Kcorrected VL samples. 

Figure 12: Whole-sample average conditional density in the different K-corrected VL samples in the region R1, normalized as explained in the text.

As mentioned above, for r<30 Mpc/h in all samples the PDF is stable with respect to the K+E-corrections, so there are no detectable differences in the three sets of VL samples with different corrections. Thus, while for scales r<30 Mpc/h the data show an approximate power-law behavior, for r>30 Mpc/h we are not able to draw a reliable conclusion, in these samples, about the behavior of this quantity, because conditional fluctuations do not exhibit self-averaging properties when filtered on scales r>30 Mpc/h. The large-scale inhomogeneity shown by the non-self-averaging conditional fluctuations is compatible with a continuation of power-law correlations, i.e. scaling properties, to scales larger than 30 Mpc/h.
4.8 The standard two-point correlation function
When determining the standard twopoint correlation function, we implicitly make two assumptions that, inside a given sample, (i) the distribution is selfaveraging and (ii) it is uniform. The first assumption is used when computing wholesample average quantities. For instance it is assumed when the wholesample average conditional density is measured, as discussed in the previous section. However, in that case, there is need to assume that the estimation of the sample average gives a fairly good estimation of the ensemble average density. This corresponds to the assumption (ii) above. When one of these assumptions, or both, is not verified then interpretation of the results given by the determinations of the standard twopoint correlation function must be reconsidered with great care as we discuss in what follows.
To measure the two-point correlation function, the most commonly used estimators are based on pair-counting algorithms, such as the Davis & Peebles (1983) (DP) and the Landy & Szalay (1993) (LS) estimators. These are relatively easy to implement in practice by generating random distributions in artificial samples with the same geometry as the real ones. In general, it is not straightforward to interpret the results obtained with these estimators at large enough scales; i.e., from around the scale r^{ps} at which the spherical shell of radius r^{ps}, centered on a typical distribution point, is only partially contained in the sample volume (see Gabrielli et al. 2005, 405). The scale r^{ps} is the one up to which one can calculate the so-called full-shell (FS) estimator, i.e. in which only complete spherical shells are considered (Kerscher 1999).
The FS estimator considers, similarly to the case of the conditional density estimator, a pair of points at distance r only if a sphere of radius r, centered on one of the points, is fully contained in the sample volume. Because this method requires fewer assumptions, it is the one we consider in more detail here. The FS estimator can be written as (Gabrielli et al. 2005)
$$\overline{\xi(r)} + 1 = \frac{\overline{N(r,\Delta r)}}{V(r,\Delta r)} \cdot \frac{V}{N} \qquad (20)$$
The first ratio on the r.h.s. of Eq. (20) is the average conditional density, i.e., the number of galaxies in shells of thickness $\Delta r$, averaged over the whole sample, divided by the volume of the shell. The second ratio on the r.h.s. of Eq. (20) is the inverse of the density estimated in a sample containing N galaxies, with volume V. Thus, the FS estimator requires determination of the distances of all points to the boundaries, as in the case of the conditional density (see Eqs. (9) and (19)). However, it should be stressed that, when measuring this function, we implicitly assume that, in a given sample, (i) fluctuations are self-averaging in different subvolumes and (ii) the linear dimension of the sample volume is much larger than the homogeneity scale (Gabrielli et al. 2005), i.e., the distribution has reached homogeneity inside the sample volume. If one of these assumptions, or both, is not verified in the actual data, then the amplitude and shape of the estimated $\overline{\xi(r)}$ will strongly depend on the sample volume. This finite-size dependence can be investigated by making specific tests, as we discuss in what follows. We stress that the most efficient way to test the above assumptions is the determination of the conditional fluctuations presented in the previous sections.
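The full-shell construction described above can be sketched as follows, assuming for simplicity a cubic sample volume so that the distance of a point to the boundary is trivial to compute. The function name `full_shell_xi`, the cubic geometry, and the toy data are assumptions of this example; a survey geometry would need its own boundary-distance computation.

```python
import math
import random


def full_shell_xi(points, box, r, dr):
    """Full-shell estimate of xi in the shell [r, r + dr) for points in [0, box]^3.

    Only centers whose complete shell lies inside the cube are used.
    Returns (xi, number_of_centers_used).
    """
    outer = r + dr
    shell_volume = 4.0 / 3.0 * math.pi * (outer ** 3 - r ** 3)
    counts = []
    for c in points:
        # Distance from the center to the nearest face of the cube.
        d_boundary = min(min(x, box - x) for x in c)
        if d_boundary < outer:
            continue  # the shell is only partially contained: skip this center
        n = sum(1 for p in points if p is not c and r <= math.dist(c, p) < outer)
        counts.append(n)
    if not counts:
        raise ValueError("no center has a fully contained shell at this scale")
    conditional_density = (sum(counts) / len(counts)) / shell_volume
    sample_density = len(points) / box ** 3
    return conditional_density / sample_density - 1.0, len(counts)


# Toy usage (hypothetical data): uncorrelated points, so xi should be close to zero.
rng = random.Random(2)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(600)]
xi, n_centers = full_shell_xi(points, box=1.0, r=0.1, dr=0.05)
```

The number of usable centers drops quickly as r approaches half the box size, which is the cubic-box analogue of the limiting scale r^{ps} discussed in the text.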
To show how non-self-averaging fluctuations inside a given sample bias the analysis, we consider the estimator
$$\overline{\xi(r;R,\Delta R)} + 1 = \frac{\overline{N(r,\Delta r)}}{V(r,\Delta r)} \cdot \frac{V(r^{*})}{\overline{N(r^{*};R,\Delta R)}} \qquad (21)$$
where the second ratio on the r.h.s. is now computed from the density of points in spheres of radius r^{*}, averaged over the galaxies lying in a shell of thickness $\Delta R$ around the radial distance R. If the distribution is homogeneous, i.e., r^{*} is larger than the homogeneity scale, and statistically stationary, Eq. (21) should be statistically independent of the range of radial distances considered.
Indeed, the two-point correlation function is defined as a ratio between the local conditional density and the sample average density: if both vary in the same way when the radial distance is changed, then its amplitude remains nearly constant. This does not imply, however, that the amplitude of $\xi(r)$ is meaningful, as the density estimated in subvolumes of size r^{*} can show large fluctuations, and this occurs with a radial-distance dependence. To show that the analysis gives a meaningful estimate of the amplitude of fluctuations, one has to test that this amplitude remains stable when changing the relative position of the subvolumes of size r^{*} used to estimate the local conditional density and the sample average density. This is achieved by using the estimator in Eq. (21). On the other hand, standard estimators are unable to test for such an effect, as the main contributions to both the local conditional density and the sample average density come from the same part of the sample (typically the more distant part, where the volume is larger).
For instance, we consider, in the VL3 sample, Mpc/h and R = 250 Mpc/h or R = 350 Mpc/h, with r^{*} = 60 Mpc/h. We thus find large variations in the amplitude of $\xi(r)$ (see Figs. 13, 14). This is simply an artifact generated by the large density fluctuations on scales close to the sample sizes. The result that the estimator of Eq. (20), or others based on pair counting (Sylos Labini & Vasilyev 2008; Gabrielli et al. 2005), has nearly the same amplitude in different samples (e.g., Zehavi et al. 2005; Norberg et al. 2002; Zehavi et al. 2002; Davis & Peebles 1983; Benoist et al. 1996; Park et al. 1994; Norberg et al. 2001), despite the large fluctuations of N_{i}(r;R), is simply explained by the fact that $\xi(r)$ is a ratio between the local conditional density and the sample average density. Both vary in the same way when the radial distance is changed, so the amplitude is nearly constant.
Figure 13: Standard two-point correlation function in the VL3 sample estimated by Eq. (21): the sample average density is computed in spheres of radius r^{*}=60 Mpc/h, considering all center points lying in a bin of thickness Mpc/h centered at different radial distances R: R_{1}=250 Mpc/h and R_{2}=350 Mpc/h. The case in which the sample average density is estimated as N/V is also shown, and it agrees with the FS estimator. The latter agrees with the measurements provided by the LS and DP estimators, which give essentially the same result. (For the sake of clarity, error bars are shown for the FS, DP and LS estimators; they are relatively small except at small and large r.)

Figure 14: The same as in Fig. 13, but for the VL5 sample. In this case the sample average density is computed in spheres of radius r^{*}=80 Mpc/h, considering all center points lying in a bin of thickness Mpc/h centered at different radial distances R: R_{1}=320 Mpc/h and R_{2}=450 Mpc/h.

To understand how large-scale fluctuations can be hidden in the analysis performed with the two-point correlation function, we consider the following simple example. Let us suppose that the catalog consists of two disconnected volumes: for simplicity we take them to be spherical, with radii R_{s}^{1} and R_{s}^{2}, respectively. Let us suppose that the average conditional density (assumed to be self-averaging at all scales considered) is a power law, i.e.,
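$$\overline{n^{i}(r)} = B^{i} r^{-\gamma}, \quad i = 1, 2 \qquad (22)$$
where $\gamma$ is the exponent of the power law, B^{1} is the amplitude in the volume V^{1}, and B^{2} is the amplitude in the volume V^{2}. The estimation of the sample density is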
(23) 
It is clear that if B^{1} and B^{2} differ substantially, there will be large fluctuations between the two volumes on scales close to the sample sizes. However, if the two volumes have the same radius, from Eq. (20) we find that the estimator of the two-point correlation function, Eq. (24), no longer depends on the different amplitudes of the conditional density. That is, despite the difference in the conditional density and in the whole-sample density in the two volumes (which depends on the ratio between B^{1} and B^{2}), the amplitude of the two-point correlation function does not reflect these (arbitrarily large) variations.
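The cancellation of the amplitude can be made explicit with a short worked calculation. This is a sketch under stated assumptions, not a reproduction of the paper's equations: we assume that the conditional density in a spherical volume of radius $R_s$ is $B\,r^{-\gamma}$ with $0 < \gamma < 3$ (the symbols $R_s$, $\gamma$ and $n_S$ are introduced here for illustration), and that the sample density is estimated from the counts inside that same volume.

```latex
\begin{align}
N &\simeq \int_0^{R_s} B\, r^{-\gamma}\, 4\pi r^{2}\, dr
   = \frac{4\pi B}{3-\gamma}\, R_s^{\,3-\gamma}, \\
n_S &= \frac{N}{V} = \frac{3B}{3-\gamma}\, R_s^{-\gamma}, \\
\xi(r) &= \frac{B\, r^{-\gamma}}{n_S} - 1
        = \frac{3-\gamma}{3} \left( \frac{r}{R_s} \right)^{-\gamma} - 1 .
\end{align}
```

Under these assumptions the amplitude $B$ drops out of the last line, and the estimate depends only on the exponent and on the sample size $R_s$.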
Similarly, when the two radii differ, the difference in amplitude between the estimates of the two-point correlation function in the two volumes, Eq. (25), is a relatively small factor when the two radii are comparable, even though the difference between B^{1} and B^{2} can be arbitrarily large. That is, even though the average density can fluctuate by an arbitrarily large factor, the amplitude of $\xi(r)$ may not show a similar variation. This does not imply, however, that the amplitude measures an intrinsic property of the distribution. Actually, in Eq. (25) the difference in the amplitude is related to the sample sizes. This means that the only unambiguous way to establish whether the average density is a well-defined quantity, and hence whether the results obtained by the standard correlation function analysis are meaningful, is the study of conditional fluctuations presented in the previous sections.
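The argument of this example can be checked numerically with a few lines of Python. The sketch below assumes a pure power-law conditional density `amplitude * r**(-gamma)` inside a sphere and a sample density estimated inside that same sphere; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def sphere_density(amplitude, gamma, radius):
    """Mean density in a sphere when the conditional density is amplitude * r**(-gamma)."""
    n_points = 4.0 * math.pi * amplitude * radius ** (3.0 - gamma) / (3.0 - gamma)
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    return n_points / volume


def xi_estimate(r, amplitude, gamma, radius):
    """Conditional density at r divided by the sample density of the sphere, minus one."""
    return amplitude * r ** (-gamma) / sphere_density(amplitude, gamma, radius) - 1.0


gamma = 1.0
# Same radius, amplitudes differing by a factor 50: the estimate does not change.
xi_a = xi_estimate(10.0, 1.0, gamma, 100.0)
xi_b = xi_estimate(10.0, 50.0, gamma, 100.0)
# Same amplitude, different radius: the estimate changes with the sample size only.
xi_c = xi_estimate(10.0, 50.0, gamma, 150.0)
density_ratio = sphere_density(50.0, gamma, 100.0) / sphere_density(1.0, gamma, 100.0)
```

Here the two sample densities differ by a factor 50 while `xi_a` and `xi_b` coincide, and `xi_c` differs from them only because the radius of the sphere has changed.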
By using different normalizations, which are all in principle equally valid if the distribution has a well-defined average density inside the sample, we have shown that the amplitude of the estimated correlation function varies in the SDSS samples. This occurs because neither of the assumptions on which the determination of the standard two-point correlation function is based is verified in these samples, and the homogeneity scale is certainly larger than the sample size.
Finally, we note that not only the amplitude but also the shape of the correlation function is affected by the normalization to a sample average that differs largely from the ensemble average. The shape, however, is strongly biased only at large separations, i.e., when the first term on the r.h.s. of Eq. (24) becomes comparable to the second one. We refer the interested reader to Sylos Labini et al. (2009d) for a more detailed discussion of the determination of the standard estimators (i.e., the LS and DP estimators) of the two-point correlation function in these samples.
4.9 The SDSS Great Wall and other structures
Figure 15: Three-dimensional representation of the SL analysis with r=10 Mpc/h for R3VL2. The x,z coordinates of the sphere center define the bottom plane, and on the vertical axis we display the intensity of the structures, i.e., the conditional number of galaxies N_{i}(r) contained in the sphere of radius r.

As mentioned above, the measurements of the M(r) values of N_{i}(r) on the scale r allow the derivation of many interesting properties of the structures in these samples. Beyond the statistical properties already described, it is interesting, for example, to consider the density profile derived from N(r;R_{i}). An example is shown in Fig. 16, which displays the behavior of N(r;R_{i}) in the sample VL2 (with K-corrections) in the three different regions for r=10 Mpc/h. In tracing large-scale galaxy structures, this analysis is more powerful than simple counting as a function of radial distance. Indeed, one may precisely describe the sequence of structures and voids characterizing the samples and, by changing the sphere radius r, one may determine the situation at different spatial resolutions. For instance, the distribution in the angular region R3 (see the bottom panel of Fig. 16) is dominated by a single large-scale structure, which is known as the SDSS Great Wall (Gott et al. 2005). In the R2 and R3 regions, one is also able to isolate structures well at different distances, while in the R1 region, which covers a solid angle about six times larger than the other two sky areas, the signal is determined by the superposition of different structures of different amplitudes and on different scales. In the latter case, it would be useful to divide the sample into smaller angular slices.
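The counting procedure behind such profiles can be sketched as follows. This is a toy version under simplifying assumptions: the sample is a cube (so the containment test is a distance to the nearest face), the observer is placed at the origin, and the names `conditional_counts` and `radial_profile` are hypothetical.

```python
import math


def conditional_counts(points, box, r):
    """Return (center, N_i(r)) for each point whose sphere of radius r lies inside [0, box]^3."""
    result = []
    for c in points:
        if min(min(x, box - x) for x in c) < r:
            continue  # sphere not fully contained in the sample volume
        n = sum(1 for p in points if p is not c and math.dist(c, p) <= r)
        result.append((c, n))
    return result


def radial_profile(counts, observer, edges):
    """Average N_i(r) in bins of the distance of the sphere center from the observer."""
    sums = [0.0] * (len(edges) - 1)
    nums = [0] * (len(edges) - 1)
    for center, n in counts:
        dist = math.dist(center, observer)
        for k in range(len(sums)):
            if edges[k] <= dist < edges[k + 1]:
                sums[k] += n
                nums[k] += 1
                break
    return [s / m if m else None for s, m in zip(sums, nums)]


# Toy usage with a handful of hypothetical points in a unit cube.
toy_points = [
    (0.5, 0.5, 0.5), (0.55, 0.5, 0.5), (0.5, 0.6, 0.5),
    (0.25, 0.5, 0.5), (0.1, 0.5, 0.5),
]
toy_counts = conditional_counts(toy_points, box=1.0, r=0.2)
toy_profile = radial_profile(toy_counts, observer=(0.0, 0.0, 0.0), edges=[0.0, 0.8, 2.0])
```

Repeating the count for several values of r gives the same picture at different spatial resolutions, as described in the text.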
Figure 16: Behavior of N(r;R_{i}) in the K-corrected VL2 sample in the three different regions for r=10 Mpc/h (R1 top, R2 middle, and R3 bottom).

In Fig. 17 we show the projection on the XZ plane of R3VL2, where the SDSS Great Wall is placed in the middle of the sample and is clearly visible as a coherent structure of large amplitude, similar to a mountain chain, extending over the whole sample. The information contained in the N(r;R_{i}) data allows a quantitative and unambiguous determination of the properties of this structure, as we discussed above. For instance, by a simple visual comparison of the profiles in the different angular regions we can conclude that, although the Great Wall is a particularly long filament of galaxies, it represents a typical persistent fluctuation in the volume of the samples.
Figure 17: Projection on the XZ plane of R3VL2. The SDSS Great Wall is the filament in the middle of the sample. 

In addition, it is interesting to consider the full N_{i}(r) = N(r; x_{i},y_{i},z_{i}) data, where (x_{i},y_{i},z_{i}) are the Cartesian coordinates of the center. To this aim we chose a three-dimensional representation in which the bottom plane gives the x,z Cartesian coordinates of the sphere center and the vertical axis displays the intensity of the structures, i.e., the conditional number of galaxies contained in the sphere of radius r. (In the y direction the thickness of the sample is small, i.e., Mpc/h.) This is shown in Fig. 15 for r=10 Mpc/h. One may note that the SDSS Great Wall is clearly visible as a coherent structure similar to a mountain chain, extending all over the sample. It is worth noting that profiles similar to those shown in Figs. 15, 16 have also been found in the 2dFGRS (Sylos Labini et al. 2009b,a), supporting the conclusion that the fluctuations we have identified in this catalog are typical of the galaxy distribution.
4.10 Role of spatial correlations
To show that the large-scale fluctuations in the galaxy density field we have detected are genuinely due to long-range spatial correlations and not to some selection effects, we performed the following test. In a given VL sample, we assigned to each galaxy a redshift drawn at random from the list of redshifts of the galaxies in the same sample. In this way the angular coordinates of each object are fixed and its redshift is randomized, while the redshift distribution in the sample is kept fixed. This operation washes out the intrinsic spatial correlations of the galaxy distribution, but conserves the main observational coordinates (i.e., angular positions and redshifts). Thus the result of this test may tell us whether fluctuations and structures are an effect of spatial correlations. The result is that the signal in N_{i}(r,R) is substantially washed out, as one may notice by comparing Figs. 16 and 18.
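A minimal sketch of this randomization is given below. It assumes that each galaxy is stored as a tuple (ra, dec, z) and that the redshifts are permuted among the galaxies, which keeps the redshift distribution exactly fixed; the function name `shuffle_redshifts` and the tuple layout are assumptions of this example.

```python
import random


def shuffle_redshifts(galaxies, seed=None):
    """Return a copy of the sample with redshifts permuted among the galaxies.

    galaxies: list of (ra, dec, z) tuples. Angular positions are left untouched,
    and the set of redshift values is preserved exactly.
    """
    rng = random.Random(seed)
    redshifts = [z for _, _, z in galaxies]
    rng.shuffle(redshifts)
    return [(ra, dec, z) for (ra, dec, _), z in zip(galaxies, redshifts)]


# Toy usage with hypothetical values.
sample = [(10.0, 1.0, 0.05), (20.0, 2.0, 0.10), (30.0, 3.0, 0.15), (40.0, 4.0, 0.20)]
randomized = shuffle_redshifts(sample, seed=3)
```

The randomized sample can then be passed through the same conditional-count analysis as the real one, so that any difference is attributable to the spatial correlations removed by the shuffle.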
Figure 18: As in Fig. 16 but for the randomized VL2 samples as described in the text. 

However, it should be stressed that, if there are structures of spatial extension comparable to the sample size, these will not be completely washed out by the adopted randomization, given that the redshift distribution is kept fixed. Indeed, this is the case for the SDSS Great Wall, contained in the sample R3VL2. In Fig. 18 the structure is almost completely washed out but, as it is as large as the sample, there is a residual in the randomized version. By means of the statistical analysis shown in Fig. 19, we find that the PDF of conditional fluctuations becomes very peaked in the randomized sample, i.e., it tends to a Gaussian function, while in the real sample it displays a long tail for high N values, corresponding, as discussed above, to the large fluctuations present in this sample. In addition, the conditional density (i.e., the conditional average number of points in spheres given by Eq. (19), divided by the volume of the sphere of radius r) becomes flat for the randomized sample, signaling the absence of correlations, while it is a power law in the real sample, with an exponent approximately equal to 0.8 ± 0.1. Because the redshift distribution is kept fixed in the randomized sample, its PDF will converge to that of the real sample when larger sphere radii are considered.
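One simple way to quantify whether a measured conditional density is flat or follows a power law is a least-squares fit in log-log space. The sketch below is only an illustration of that idea; the paper does not state how its exponent was fitted, and the function name and input values are hypothetical.

```python
import math


def power_law_exponent(scales, densities):
    """Least-squares slope of log(density) versus log(scale)."""
    xs = [math.log(r) for r in scales]
    ys = [math.log(n) for n in densities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy usage: an exact power law and a flat (uncorrelated) profile.
scales = [1.0, 2.0, 5.0, 10.0, 20.0]
slope_power_law = power_law_exponent(scales, [3.0 * r ** -0.8 for r in scales])
slope_flat = power_law_exponent(scales, [0.02 for _ in scales])
```

For these synthetic inputs the fitted slope is -0.8 in the first case and 0 in the second, corresponding to a decaying power law and to a flat conditional density respectively.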
Figure 19: Upper panels: the PDF of conditional fluctuations in spheres of radius r=5 Mpc/h (left) and r=10 Mpc/h (right) for the real sample (Real) and the randomized one (RND), as explained in the text. Bottom panel: conditional density as a function of scale.

5 Comparison with theoretical models
Let us now discuss the problem of comparing the statistical analysis of real galaxy samples with theoretical predictions. In this respect it is useful to remember that theoretical models predict that, by gravitationally evolving a density field compatible with observations of the cosmic microwave background radiation (CMBR) anisotropies, there is a maximum scale up to which nonlinear structures have formed at the present time. The precise value of this scale depends on the details of the initial correlations in the density field and on the values of the cosmological parameters, and it is roughly placed at about Mpc/h in the ΛCDM concordance model (see Springel et al. 2005). On larger scales the average density becomes well defined, as its fluctuations become small enough. As discussed in Sect. 2, this scale may be defined as the one at which the variance of the fluctuations is twice the square of the asymptotic (large-scale) average density. Then, for larger scales, the situation is simple: gravitational clustering has linearly amplified the initial fluctuations, and thus correlation properties reflect those at the initial time. The linear amplification factor can be easily computed by a perturbation analysis of the self-gravitating fluid equations (i.e., the Vlasov-Poisson equations) in an expanding universe.
Table 6: Main properties of the obtained mock VL samples.
There is no full analytical understanding of the properties of self-gravitating particles in the nonlinear phase occurring on smaller scales; for this reason, gravitational N-body simulations are generally the means used to study these structures. A gravitational N-body simulation follows the motion of particles (assumed in cosmology to be dark matter particles) moving under the effect of their self-gravity in an expanding universe. By normalizing the initial amplitude of fluctuations and the density correlations to the observations of the CMBR anisotropies, one finds that there is a well-defined time scale that allows one to define the present time, at which the simulation is stopped. In this context it is worth remembering that, in CDM-like models, gravitational clustering builds up nonlinear structures in a bottom-up way because of the small initial velocity dispersion.
An N-body simulation provides the distribution of dark matter particles and not that of galaxies. It is here that the problem of sampling, or biasing, is relevant. Indeed, galaxies are supposed to form on the highest peaks of the dark matter density field, so one has to define the rules for making a correlated sampling of the dark matter particles to identify "mock" galaxies. There are different sampling procedures in the literature, and they are the outcome of the so-called semi-analytic models of galaxy formation. Generally these sampling procedures only modify correlations at small scales, i.e., they are local samplings. Only nonlocal sampling procedures may give rise to different correlation properties on large scales. However, no form of currently known galaxy bias can produce the large-scale fluctuations we observe in the catalog. Indeed, the currently accepted theoretical model of biasing (Kaiser 1984) predicts that, when clustering is in the linear phase, threshold sampling of the highest peaks in a Gaussian density field gives rise to a simple linear amplification of fluctuations and of their correlations. This situation is expected to hold for scales r>10 Mpc/h, where density fluctuations in N-body simulations of the dark matter field are in the linear regime, the PDF of fluctuations is Gaussian, and thus biasing is linear. In these conditions, there is a simple relation between the correlation properties of mock galaxies and those of dark matter particles. Thus complications with respect to this simple picture are expected only on small scales.
We use a semi-analytic galaxy catalog constructed from the Millennium ΛCDM N-body simulation (Springel et al. 2005). To construct mock samples corresponding to the SDSS VL samples, we used the full version of the catalog in the filter system. The catalog contains about 9 million galaxies in a 500 Mpc/h cube. We used the absolute magnitudes in the r filter, as in the SDSS case, to construct the mock samples with the same limits in absolute magnitude as for the SDSS VL samples with K-corrections. In Table 6 we report the properties of the mock samples: the chosen limits for the metric distance (in Mpc/h), the interval in absolute magnitude in each sample, the limits in right ascension and in declination (in degrees), the number of objects N_{z} in the sample in redshift-space, and the same quantity N_{r} for the sample in real-space. We construct only the mock samples corresponding to VL1, VL3 and VL5. The volume of the samples is constrained to be the same as, or similar to, the volumes of the real SDSS samples. In particular, the sample region easily fits in the simulation cube in the case of VL1. For VL3 it has to be slightly reduced in declination, while for VL5 the range in declination is reduced significantly (see Table 6).
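The selection of a volume-limited sample from a catalog with known distances and absolute magnitudes can be sketched as a simple filter. The dictionary keys (`distance`, `abs_mag`, `ra`, `dec`), the function name, and the numerical limits below are hypothetical; they are not the limits of Table 6.

```python
def volume_limited(galaxies, r_min, r_max, m_bright, m_faint, ra_range, dec_range):
    """Keep galaxies inside the distance, absolute-magnitude and angular limits.

    galaxies: iterable of dicts with keys 'distance' (Mpc/h), 'abs_mag',
    'ra' and 'dec' (degrees). Magnitudes satisfy m_bright <= abs_mag <= m_faint,
    with brighter objects having more negative values.
    """
    selected = []
    for g in galaxies:
        in_distance = r_min <= g["distance"] <= r_max
        in_magnitude = m_bright <= g["abs_mag"] <= m_faint
        in_ra = ra_range[0] <= g["ra"] <= ra_range[1]
        in_dec = dec_range[0] <= g["dec"] <= dec_range[1]
        if in_distance and in_magnitude and in_ra and in_dec:
            selected.append(g)
    return selected


# Toy usage with made-up galaxies and limits.
catalog = [
    {"distance": 150.0, "abs_mag": -20.5, "ra": 150.0, "dec": 10.0},
    {"distance": 450.0, "abs_mag": -20.5, "ra": 150.0, "dec": 10.0},  # too distant
    {"distance": 150.0, "abs_mag": -18.0, "ra": 150.0, "dec": 10.0},  # too faint
    {"distance": 150.0, "abs_mag": -20.5, "ra": 150.0, "dec": 60.0},  # outside dec range
]
vl_sample = volume_limited(catalog, 100.0, 300.0, -22.0, -20.0, (130.0, 230.0), (0.0, 35.0))
```

Applying the same limits to the mock catalog in real-space and in redshift-space gives two samples with different numbers of objects, which is why the text distinguishes N_{r} from N_{z}.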
Figure 20: PDF of conditional fluctuations in the mock R1VL1, R1VL3, and R1VL5 samples in real-space (red line) and redshift-space (black line). (Each row corresponds to a VL sample; the scale r is reported in the caption.) The best-fit Gaussian function (green line) is also reported.

The PDF of conditional fluctuations (see Fig. 20) shows a clear departure from a Gaussian function for r<10 Mpc/h, while it rapidly approaches the Gaussian function for r>20 Mpc/h. For r>5 Mpc/h there is no detectable difference between the real-space and redshift-space cases. Additionally, for r<10 Mpc/h, the PDF exhibits a large-N tail that is the signature of the correlations present at those scales.
In addition, the analysis of the PDF in two disconnected subvolumes of the mock samples does not show any trend toward non-self-averaging behavior: the two PDFs coincide in a statistical sense (see Fig. 21). That is, in contrast to the case of real galaxy samples, the simulations are self-averaging.
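A comparison of the distributions of N_{i}(r) in two subvolumes can also be summarized by a single number. The sketch below uses the two-sample Kolmogorov-Smirnov distance, which is our own choice for illustration: the paper compares the PDFs directly and does not state that it uses this statistic. The function name and the toy values are hypothetical.

```python
def ks_distance(sample_a, sample_b):
    """Maximum distance between the empirical cumulative distributions of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past all values equal to the current threshold.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


# Toy usage: counts from two hypothetical subvolumes.
counts_s1 = [12, 15, 9, 14, 11, 13]
counts_s2 = [13, 11, 14, 9, 15, 12]   # same values: distance 0
counts_s3 = [40, 42, 38, 45, 41, 39]  # systematically larger: distance 1
d_same = ks_distance(counts_s1, counts_s2)
d_diff = ks_distance(counts_s1, counts_s3)
```

A distance close to zero is what one expects for a self-averaging distribution, whereas values of order unity signal that the two subvolumes sample systematically different fluctuations. Note that the N_{i}(r) values of overlapping spheres are correlated, so standard significance levels for this statistic would not apply directly.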
Figure 22 shows the behavior of the conditional average density in the mock samples. The main difference between real-space and redshift-space occurs at scales r<5 Mpc/h, where the redshift-space exponent is systematically smaller than the real-space one. For the case of real galaxy samples we cannot make the real-space analysis, because galaxy peculiar velocities are not known. We noted that the same finite-size effects that perturb the redshift-space reduced two-point correlation function may affect the projected one, and thus the whole method used to infer the real-space correlation function from the redshift-space one (Vasilyev et al. 2006). In addition, both in real-space and in redshift-space, the exponent is smaller when the average galaxy luminosity increases, a trend that is not as well defined in the real data, as shown by Fig. 12. In the mock catalogs, the power-law behavior extends up to Mpc/h, beyond which there is a well-defined crossover, which corresponds to the scale where the PDF of conditional fluctuations approaches the Gaussian function. Thus, while the exponent of the conditional density is closer in redshift-space, up to Mpc/h, to what is observed in the real galaxy data, the mock samples show a clear difference for r>20 Mpc/h, in that the crossover to homogeneity is well defined and the distribution does not present large-scale fluctuations similar to those characterizing the SDSS galaxy distribution. In addition, redshift-space and real-space properties are indistinguishable for r>10 Mpc/h.
As a final remark, we note that Einasto et al. (2006a) find that the fraction of very luminous (massive) superclusters in real samples extracted from the 2dFGRS and from the SDSS (Data Release 4) is more than ten times greater than in simulated samples constructed from the Millennium simulations; see also Einasto et al. (2006b, 2008). Our results are compatible with these findings.
6 Conclusion
We have studied the statistical properties of the galaxy distribution in the SDSS-DR6 sample. This is a brief summary of our results:
 The probability density function (PDF) of spatial conditional fluctuations, in volume-limited samples and filtered on spatial scales r<30 Mpc/h, shows a long tail, which is the signature of the large-scale structures in these samples.
 The PDF of conditional fluctuations does not converge to a Gaussian function even for large sphere radii (i.e., r > 30 Mpc/h).
 The PDF of conditional fluctuations is unaffected by K-corrections and (standard) evolutionary corrections.
 The PDF of conditional fluctuations, filtered on spatial scales r>30 Mpc/h, does not show self-averaging properties when it is computed in two nonoverlapping samples of equal volume.
 The whole-sample averaged conditional density shows scaling properties up to Mpc/h, the largest scales where this statistic shows self-averaging properties (i.e., where the PDF is statistically stable inside the sample).
 The normalization of the luminosity-redshift function and of the two-point correlation function are affected by systematic effects and thus do not provide meaningful information.
 The PDF of conditional fluctuations in mock galaxy catalogs rapidly converges to a Gaussian function for r>10 Mpc/h, so that the structures predicted by theoretical models are at odds with observations.
 The above behaviors are compatible with galaxy counts as a function of apparent magnitude and of redshift in the magnitude-limited sample. Indeed these show fluctuations on large spatial scales of , which are persistent in the sample volume.
Figure 21: The self-averaging test for the mock catalogs: analysis of the PDF in two disconnected subvolumes (s_{1} and s_{2}) of the mock sample R1VL3 in redshift-space.
Figure 22: Conditional density in the mock R1VL1, R1VL3, and R1VL5 samples in real-space (VL1r, VL3r, VL5r) and redshift-space (VL1z, VL3z, VL5z). The bottom-left panel compares the behaviors in the different samples in redshift-space, where it is evident that the exponent becomes steeper for brighter objects. The normalization is chosen to give the same large-scale density.
We interpret this as caused by a systematic effect, in that the sample volumes are not large enough for conditional fluctuations, filtered on such large scales, to be self-averaging, i.e., to contain enough structures and voids of large size to allow a reliable determination of average (conditional) quantities. This result implies, for instance, that the average behaviors of both magnitude and redshift distributions in the magnitude-limited sample are biased by large spatial fluctuations, and thus that their variance only represents a lower limit to the real intrinsic variance. Furthermore, we discussed that K-corrections and standard evolutionary corrections (Blanton et al. 2003) do not qualitatively affect these behaviors. We pointed out the problems related to the estimation of the amplitude of fluctuations and of correlation properties from statistical quantities that are normalized to the estimation of the sample average. As long as a distribution inside the given sample is not self-averaging, hence not homogeneous, the estimation of the two-point correlation function is necessarily biased by strong finite-size effects. Our results are compatible with a continuation of the power-law behavior of the conditional density on scales larger than 30 Mpc/h and incompatible with homogeneity on scales smaller than Mpc/h. Only the availability of larger samples will allow average correlation properties to be determined on scales larger than Mpc/h.
Our results, because they imply that the galaxy distribution is inhomogeneous on scales of Mpc/h, are perfectly compatible with a "Copernican" principle. Indeed, any statistically stationary inhomogeneous point distribution is compatible with the principle that there is no special point or direction in the universe. If, for instance, there were a "local hole" or a particularly large structure around us, this would not imply that the "Copernican" principle is violated, but simply that the distribution is spatially inhomogeneous (Joyce et al. 2000). These are two different properties that are sometimes confused in the literature (Clifton et al. 2008; Ellis 2008).
Finally, we found that fluctuations in mock galaxy catalogs are Gaussian for r>20 Mpc/h, implying that our results are at odds with the predictions of the concordance ΛCDM model of galaxy formation. This result remains the same when considering redshift-space fluctuations (as for the real data) or real-space ones. Indeed, we find that the main difference in going from real-space to redshift-space occurs for scales smaller than Mpc/h, where the exponent of the conditional density passes from 1.8 to about 1.
Our results are compatible with a series of analyses of galaxy number counts in different catalogs, e.g., APM (Maddox et al. 1990; Shanks 1990), 2dFGRS (Busswell et al. 2003), 2MASS (Frith et al. 2003), and a sample of galaxies in the H band (Frith et al. 2006). In all those surveys, count fluctuations not normalized to the sample average have been considered, and it was concluded that there are local fluctuations of extending over scales of Mpc/h, which are at odds with ΛCDM predictions. Furthermore, our results are compatible with the results by Loveday (2004) on the SDSS-DR1 sample, although their interpretation is different.
Similar persistent spatial fluctuations in the galaxy density field have been found in the 2dFGRS by Sylos Labini et al. (2009b,a): this shows that these fluctuations are quite typical of the galaxy distribution.
In addition, the comparison of the model predictions with real galaxy data, through the analysis of mock galaxy catalogs that we have discussed, agrees with that of Einasto et al. (2006b, 2008, 2006a), who found, for instance, that superclusters in real samples extracted from the 2dFGRS and from the SDSS are more than ten times larger than in simulated samples constructed from the Millennium simulations. A similar conclusion was reached by Sylos Labini et al. (2009b,a) on the 2dFGRS.
Acknowledgements. We thank Tibor Antal, Michael Joyce, Andrea Gabrielli, and Luciano Pietronero for useful remarks and discussions. We are grateful to Michael Blanton, David Hogg and Martin López-Corredoira for interesting comments. We warmly thank an anonymous referee for a list of suggestions and criticisms that allowed us to improve the presentation. Y.V.B. acknowledges the support by Russian Federation grants Leading Scientific School  1318.2008.2 and RFBR090200143. We acknowledge the use of the Sloan Digital Sky Survey data (http://www.sdss.org), of the NYU Value-Added Galaxy Catalog (http://ssds.physics.nyu.edu/), and of the Millennium run semi-analytic galaxy catalog (http://www.mpa-garching.mpg.de/galform/agnpaper/).
Appendix A: Cosmological corrections
In this Appendix we discuss in some detail the problem of the cosmological corrections to be applied to the data. For each galaxy, the observed quantities include the angular coordinates, the redshift z, and the apparent magnitude m_{r}. From these data we aim to construct three-dimensional samples that are not affected by observational selection effects. It is observationally established that the galaxy redshift is linearly proportional to its distance, i.e., the Hubble law (Hubble 1929) R = (c/H_{0}) z, where c is the speed of light and H_{0} is the Hubble constant. In the framework of the Friedmann solutions of the Einstein field equations, the linearity of the Hubble law (Peebles 1980) is verified only for very low redshifts. In general, the (metric) distance R depends on the values of cosmological parameters such as the mass density $\Omega_M$ and the cosmological constant $\Omega_\Lambda$, so that $R = R(z; \Omega_M, \Omega_\Lambda)$ (see Hogg 1999). These formulas introduce second-order corrections to the linear law that are generally unimportant at low redshifts, e.g., z<0.2, such as the ones we consider in what follows.
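To give a feeling for the size of these corrections, the sketch below compares the linear Hubble law with the comoving distance obtained by integrating 1/E(z), following the standard expressions summarized by Hogg (1999). It assumes a spatially flat universe; the default values 0.3 and 0.7 for the density parameters are illustrative assumptions, not values quoted in this section, and the function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def hubble_distance(z, h0=100.0):
    """Linear Hubble law R = (c / H0) z; with H0 = 100 km/s/Mpc the result is in Mpc/h."""
    return C_KM_S / h0 * z


def metric_distance(z, omega_m=0.3, omega_l=0.7, h0=100.0, steps=1000):
    """Comoving distance for a flat universe (omega_m + omega_l = 1), in Mpc/h.

    Integrates dz / E(z) with the trapezoidal rule, where
    E(z) = sqrt(omega_m * (1 + z)**3 + omega_l).
    """
    def inv_e(x):
        return 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + omega_l)

    dz = z / steps
    total = 0.5 * (inv_e(0.0) + inv_e(z))
    for k in range(1, steps):
        total += inv_e(k * dz)
    return C_KM_S / h0 * total * dz


# Ratio between the two distances at the largest redshift considered in the text.
ratio_at_z02 = metric_distance(0.2) / hubble_distance(0.2)
```

With these assumed parameters the two distances differ by a few percent at z = 0.2 and by a negligible amount at z = 0.01, in line with the statement that the corrections are of second order in the redshift.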
In order to reconstruct the absolute magnitude from the apparent one, we need to determine the so-called K-correction. This correction must be applied because galaxies observed at different redshifts are sampled, by any particular instrument, at different rest-frame frequencies. The transformations between observed and rest-frame broad-band photometric measurements involve terms known as K-corrections (Humason et al. 1956; Hogg et al. 2002). In general, if the galaxy spectrum is known, the K-correction can be calculated precisely. While in the past (see Joyce et al. 1999b) these corrections were known only in an average way, in the case of the SDSS it is possible to reconstruct the K-correction for each object from the measurements of galaxy magnitudes in different frequency bands (Blanton et al. 2005).
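The relation between apparent and absolute magnitude can be sketched with the standard distance-modulus formula, m = M + 5 log10(d_L / 10 pc) + K(z), with the luminosity distance d_L = (1 + z) R for a flat universe (Hogg 1999). The function name and the numerical inputs below are hypothetical, and the K-correction is simply passed in as a number rather than computed from a spectrum.

```python
import math


def absolute_magnitude(m_app, z, metric_distance_mpc, k_correction=0.0):
    """Absolute magnitude from apparent magnitude, redshift and metric distance.

    metric_distance_mpc is the metric (comoving) distance in Mpc, or in Mpc/h,
    in which case the result is M - 5 log10(h). A flat universe is assumed,
    so the luminosity distance is (1 + z) times the metric distance.
    The constant 25 converts Mpc to the 10 pc reference distance.
    """
    d_lum = (1.0 + z) * metric_distance_mpc
    return m_app - 5.0 * math.log10(d_lum) - 25.0 - k_correction


# Toy usage with made-up numbers: a galaxy at z = 0.1 and R = 290 Mpc/h.
m_abs = absolute_magnitude(17.0, 0.1, 290.0, k_correction=0.1)
```

An evolutionary correction, where adopted, would enter as a further additive term; its sign convention and functional form vary between authors and are deliberately left out of this sketch.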
Another correction that has to be considered when determining the absolute magnitude from the apparent one is related to the way galaxies have evolved from high to low redshifts (Blanton et al. 2003; Kauffmann et al. 2003). We expect this to be a relatively minor problem in our studies, because the maximum redshift we consider is z=0.2 and evolutionary corrections (or E-corrections) are generally believed to be small and linearly proportional to the redshift. Indeed, as we discuss in what follows, these corrections may only play a role for very bright objects, which can be observed far away from us. There is no well-accepted model for galaxy evolution, and in what follows we adopt the corrections that are usually used in the literature (see Blanton et al. 2003; Tegmark et al. 2004). Being applied in an average way, these corrections have the disadvantage of not taking the galaxy type into account: spiral, elliptical and irregular galaxies should in principle have different star-formation histories and thus different corrections (Yoshii & Takahara 1988).
It is, however, worth commenting on the derivation of the average evolution corrections by Blanton et al. (2003). These have been derived by assuming that the space density is constant (i.e., uniformity), by including the effect of large-scale fluctuations in some ad hoc parameters of a phenomenological luminosity-redshift function, and by assuming that unknown evolutionary factors may explain the residual behaviors that are not accounted for by those parameters; see Eq. (5) in Blanton et al. (2003) and Lin et al. (1999). The estimated amount of evolution thus rests on very strong assumptions, with the following consequence: any large-scale deviation from uniformity that is not properly described by the assumed phenomenological luminosity-redshift function is interpreted as a sign of galaxy evolution. That is, galaxy evolution corrections were not measured in a way that is free of a priori assumptions.
Because there is no well-defined way to describe E-corrections, we use the same type of average functional behavior that other authors (see Tegmark et al. 2004) have adopted to reach, in the same samples we consider, conclusions that are substantially different from ours. We find that the results for the PDF of conditional fluctuations are basically unaffected. Although this does not strictly imply that evolution plays no role, it does imply that galaxy fluctuations are not self-averaging and that the galaxy distribution is not uniform in these samples, at least not in the range of scales that we define properly below.
Thus the question remains open of whether more detailed evolutionary corrections can qualitatively change our results. The basic point in this respect is that we mainly focus on the PDF of conditional fluctuations. While the E-corrections may change average behaviors as a function of scale, it is unlikely that they can produce the large-amplitude fluctuations of large spatial extension that we observe. In what follows we present specific tests that compute the effect of average evolution corrections on the relevant statistical quantities we measure.
Appendix B: Number counts in the magnitude limited sample
The advantage of using the magnitude-limited (ML) sample is that one only considers directly observed quantities, i.e., apparent magnitude and redshift, without the K- and E-corrections that introduce additional hypotheses about the shape of the galaxy spectrum and the evolution process. Here we determine galaxy counts as a function of apparent magnitude and the redshift distribution in the ML sample, together with their typical fluctuations. Given the spread in the galaxy luminosity function, it is not straightforward to derive precise information on spatial fluctuations and their correlations from these measurements.
B.1 Magnitude counts
The analysis of galaxy counts as a function of apparent magnitude provides an estimate of fluctuations that is independent of those based on the three-dimensional analysis of the VL samples. However, this analysis does not allow us to disentangle luminosity selection effects from spatial fluctuations. By studying the variance of counts we are only able to estimate real-space fluctuations indirectly.
We first divide the angular region of the survey into N_{f}=20 angular subregions of almost equal solid angle. In each subregion we compute the differential counts of galaxies per unit solid angle in magnitude bins of fixed size. We then compute the average of these counts over the N_{f} subregions (Eq. (B.1)), estimate their variance around this average (Eq. (B.2)), and normalize the variance to the average (Eq. (B.3)).
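The subregion estimate described above can be sketched as follows for a single magnitude bin. This is only an illustration: the function and variable names are hypothetical, and the 1/(N_{f}-1) normalization of the variance and the normalization by the squared average are assumptions of the sketch, since other choices are possible.

```python
def count_statistics(counts, solid_angles):
    """Average, variance, and normalized variance of counts per unit solid angle.

    counts       : galaxy counts in one magnitude bin, one entry per
                   angular subregion (N_f entries).
    solid_angles : solid angle of each subregion (e.g. in deg^2).
    """
    n_f = len(counts)
    # Counts per unit solid angle in each subregion.
    densities = [n / omega for n, omega in zip(counts, solid_angles)]
    mean = sum(densities) / n_f
    # Unbiased sample variance (assumed normalization, see lead-in).
    variance = sum((d - mean) ** 2 for d in densities) / (n_f - 1)
    normalized = variance / mean ** 2
    return mean, variance, normalized


if __name__ == "__main__":
    # Toy numbers, not survey data: three subregions of 10 deg^2 each.
    print(count_statistics([90, 100, 110], [10.0, 10.0, 10.0]))
```

Repeating the call for every magnitude bin gives the average counts and their normalized fluctuations as functions of apparent magnitude.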
Figure B.1 shows the average differential number counts of galaxies as a function of apparent magnitude, which agree nicely with the determination by Strauss et al. (2002). In particular, in the magnitude range [14, 17] the counts grow as 10^{\alpha m_{r}}, where the fitted exponent \alpha has an uncertainty of 0.01.
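A fit of this kind reduces to a straight-line fit of the logarithm of the counts against magnitude. The sketch below shows an ordinary least-squares version on synthetic counts; the function name and the test data are made up for illustration, and the slope 0.6 used to generate them is the value expected for a homogeneous distribution in Euclidean space, not the value measured in the survey.

```python
import math


def fit_count_slope(mags, counts):
    """Least-squares slope and intercept of log10(counts) versus magnitude."""
    logs = [math.log10(c) for c in counts]
    n = len(mags)
    m_mean = sum(mags) / n
    l_mean = sum(logs) / n
    sxx = sum((m - m_mean) ** 2 for m in mags)
    sxy = sum((m - m_mean) * (l - l_mean) for m, l in zip(mags, logs))
    slope = sxy / sxx
    intercept = l_mean - slope * m_mean
    return slope, intercept


if __name__ == "__main__":
    # Synthetic counts following 10^(0.6 m - 8) in the range [14, 17].
    mags = [14.0 + 0.5 * i for i in range(7)]
    counts = [10.0 ** (0.6 * m - 8.0) for m in mags]
    print(fit_count_slope(mags, counts))
```

On real counts one would typically weight each bin by its uncertainty; the unweighted fit is kept here only to show the structure of the calculation.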
Figure B.1: Average differential number counts of galaxies as a function of apparent magnitude m_{r}, per magnitude bin and per unit solid angle (in deg^{2}). The best fit of the form 10^{\alpha m_{r}} is also shown (uncertainty on the fitted exponent: 0.01).
Figure B.2: Behavior of the normalized fluctuations of the counts (i.e., Eq. (B.3)) as a function of apparent magnitude, per magnitude bin and per unit solid angle (in deg^{2}).
In Fig. B.2 we present the behavior of the normalized fluctuations as a function of apparent magnitude. Their fast decay at bright magnitudes (i.e., m_{r} < 14) comes from the dominance of Poisson noise over the intrinsic variance of the distribution. It is in fact simple to show that, for a perfectly Poisson distribution of galaxies (i.e., without any spatial correlation), the normalized fluctuations follow the scaling with magnitude given by Eq. (B.4), which depends on a single parameter. This parameter is in general determined by the decay of spatial correlations. For correlated distributions the decay is slower than in the Poisson case, but it is not straightforward to relate the parameter to the exact value of the correlation exponent in real space (Gabrielli et al. 2005).
One may note from Fig. B.2 that, at faint magnitudes, fluctuations persist up to the faintest magnitude limit of the survey, i.e., m_{r}=17.77: they are thus present up to the deepest scales observed. This result is not unexpected, because for many years relatively large fluctuations have been detected by different authors in many different catalogs. For instance, in a study of the POSS-II photographic plates, fluctuations in the surface galaxy density were observed in the magnitude range 16.5-19 in the r filter (Picard 1991), although calibration and systematic errors could affect the photometric determinations from photographic plates (Weir et al. 1995).
Furthermore, a deficiency of bright galaxies around the south galactic pole was first examined by Shanks (1990) and then by Maddox et al. (1990), who observed a large deficit in the number counts at B=16 and B=17 over a 4000 deg^{2} solid angle. More recently, a CCD survey of bright galaxies within the northern and southern strips of the 2dFGRS found conclusive evidence of fluctuations in galaxy counts as a function of apparent magnitude (Busswell et al. 2004). In addition, Frith et al. (2003), using the bright galaxy counts from the 2 Micron All Sky Survey (2MASS), found results indicating a very large ``local hole'' in the Southern Galactic Cap (SGC), extending to >150 Mpc/h and with a large linear size across the sky. This suggests the presence of a potentially huge contiguous void stretching from south to north, and indicates the possible presence of significant correlations on scales of the order of 300 Mpc/h. Similarly, by studying H-band number counts over 0.30 deg^{2} to H=19, as well as H<14 counts from 2MASS, Frith et al. (2006) concluded that these counts represent a 4.0 sigma fluctuation, implying a local hole that extends over the entire local galaxy distribution and is at odds with CDM predictions. In Sect. 4 we investigate, by using the real-space analysis, the relation between these measurements and fluctuations in real space, trying to determine whether the above estimate of the normalized variance is a reliable statistical measurement of the intrinsic variance of the distribution or whether there is a systematic effect that may reduce, or enlarge, the fluctuations measured in this way (see Sylos Labini et al. 2009a,b,c).
It is worth noticing that Yasuda et al. (2001) measured bright galaxy number counts in two independent stripes of imaging scans along the celestial equator, one toward the north and the other toward the south galactic cap, covering about 230 and 210 square degrees respectively, from imaging data taken during the commissioning phase of the SDSS. They find that the counts from the two stripes differ by about 30% at magnitudes brighter than 15.5. Despite the presence of these large fluctuations, they concluded that the shape of the number counts-magnitude relation brighter than m_{r}=16 is characterized by the relation expected for a homogeneous galaxy distribution in a ``Euclidean'' universe, for which the counts grow as 10^{0.6 m_{r}} (Peebles 1980). This result is probably affected by the small number of objects at the bright end of the counts, which indeed does not exceed a few hundred galaxies; see Tables 2 and 6 of Yasuda et al. (2001). In addition, they notice that in the magnitude range 16 < m_{r} < 21 the galaxy counts from both stripes agree very well and follow the prediction of the no-evolution model, although the data do not exclude a small amount of evolution. This conclusion thus contrasts with that of Loveday (2004) who, as mentioned, instead invokes a substantial amount of galaxy evolution to explain the radial counts.
Moreover, it should be noticed that, by measuring the rms scatter of galaxy number counts in the SDSS DR1 in different parts of the sky after correcting for Galactic extinction, Fukugita et al. (2004) find that this scatter is consistent with what is expected from the angular two-point correlation function integrated over circular areas. They did not analyze the behavior of the rms scatter as a function of apparent magnitude, i.e., Eq. (B.3). Their results show that angular correlations are compatible with count fluctuations, but they do not uniquely constrain spatial correlations. Indeed, angular correlations may be degenerate with respect to three-dimensional properties (see Montuori & Sylos Labini 1997; Durrer et al. 1997).
B.2 Redshift distribution
The analysis of the counts of galaxies as a function of redshift in the full magnitude-limited survey is complementary to the study of the counts as a function of apparent magnitude. As in the former case, it is difficult to extract clear information about the correlation properties of the galaxy distribution. However, the analysis of the redshift distribution of galaxies in different regions of the sky is a useful instrument for obtaining first qualitative information about the positions, sizes, and amplitudes of the spatial fluctuations in the number of galaxies.
For instance, by studying the redshift distribution in the Durham/UKST Galaxy Redshift Survey, fluctuations were found in the observed radial density function (Ratcliffe et al. 1998; Busswell et al. 2004). In a similar way, in the 2dFGRS two clear ``holes'' in the galaxy distribution were detected in the ranges 0.03<z<0.055 and 0.06<z<0.1 (Busswell et al. 2004). These two underdensities, detected in particular in the 2dFGRS southern galactic cap (SGC), are also clear features in the Durham/UKST survey. Given that the 2dFGRS SGC field is entirely contained within the areas of sky observed for the Durham/UKST survey, the similarities in the two redshift distributions are evidence of the same features in the galaxy distribution.
Figure B.3: Differential number counts as a function of redshift, per redshift bin and per unit solid angle, in the 3 angular regions R1, R2 and R3.
In Fig. B.3 we show the differential number counts, per redshift bin and per unit solid angle, as a function of redshift in the three angular regions R1, R2, and R3. Although the three angular regions cover different solid angles (in particular, R1 has a solid angle six times larger than R2 and R3), it is interesting to note that in R3 there is a very large fluctuation which, as we discuss in Sect. 2, corresponds to the famous SDSS Great Wall (Gott et al. 2005). Other structures of smaller amplitude are visible in R2 and R3, and we present a more detailed analysis below. Apart from the fluctuations, the behavior of the counts as a function of redshift involves a convolution with the luminosity selection of the survey. Thus it generally displays an asymmetric bell-shaped behavior, where the peak corresponds to the maximum of the luminosity selection of the survey (see Busswell et al. 2004).
In Fig. B.4 we show the average differential number counts per redshift bin and per unit solid angle. These are computed similarly to the average counts as a function of apparent magnitude described above. We divide the angular sky region of the survey into N_{f}=20 independent and non-overlapping angular regions, each with its own solid angle, and average over these regions the counts per unit solid angle measured in each of them (Eq. (B.5)). In the resulting average counts, shown in Fig. B.4 as a function of redshift, the fluctuations again trace large-scale structures, and a peak corresponding to the SDSS Great Wall (Gott et al. 2005) is visible.
The variance of the redshift counts is given by Eq. (B.6), and the normalized variance by Eq. (B.7). In general, the variance of a point distribution is the sum of the intrinsic variance due to correlations and of the Poisson noise. Here we subtract the Poisson term, so we only consider the intrinsic variance due to correlations. In Fig. B.5 we present the normalized (intrinsic) standard deviation for different choices of the redshift bin size. Even when the redshift bin is increased, fluctuations remain, and they persist at the different scales sampled by the survey, in agreement with the results obtained from the apparent magnitude counts analysis and with the analysis of other galaxy redshift surveys (Sylos Labini et al. 2009b; Ratcliffe et al. 1998; Busswell et al. 2004).
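The subtraction of the Poisson term can be illustrated with the standard counts-in-cells relation, in which the variance of the counts in equal cells is the sum of a shot-noise term equal to the mean count and of an intrinsic term proportional to the squared mean. The sketch below assumes regions of equal solid angle and an unbiased variance estimator; both are simplifications made for this example, and the function name is hypothetical.

```python
def intrinsic_normalized_variance(counts):
    """Normalized variance of counts with the Poisson term removed.

    counts : galaxy counts in one redshift bin, one entry per region,
             all regions assumed to have the same solid angle.

    Uses Var(N) = <N> + <N>^2 * sigma_intr^2, where <N> is the Poisson
    (shot-noise) contribution, and solves for sigma_intr^2.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return (variance - mean) / mean ** 2


if __name__ == "__main__":
    # Toy numbers, not survey data.
    sigma2 = intrinsic_normalized_variance([80, 100, 120])
    print(sigma2, sigma2 ** 0.5)
```

For sparse bins the estimate can come out negative, which signals that the measured scatter is compatible with pure shot noise; the standard deviation is then undefined for that bin.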
Figure B.4: Average differential number counts, per redshift bin and per unit solid angle, as a function of redshift.
Figure B.5: Standard deviation of the differential number counts, per redshift bin and per unit solid angle, as a function of redshift. Poisson noise has been subtracted, so this only contains the contribution due to galaxy correlations.
Appendix C: The luminosity function
One of the main problems in the study of galaxy structures is to disentangle the spatial properties of galaxies from their luminosity distribution. Thus an important quantity to be determined is the galaxy luminosity function \phi(L), where \phi(L) dL provides the probability that a galaxy has luminosity in the range [L, L+dL]. In general, the assumption is made that the ensemble average number of galaxies per unit volume and unit luminosity can be written as the product of the ensemble average density and the ensemble average luminosity function (Eq. (C.1)). This implies the independence between space and luminosity distributions, i.e., that galaxy positions are independent of their luminosities. Although there is clear evidence of a correlation between them (for instance, the brightest elliptical galaxies are found in the centers of rich galaxy clusters), it has been tested that this is nevertheless a reasonable assumption in the galaxy catalogs available so far (see Gabrielli et al. 2005). To go beyond this assumption one should use the multifractal formalism as in Sylos Labini & Pietronero (1996).
An additional, much stronger, assumption often adopted is that the space density is a constant, i.e., independent of position. This assumption is, for instance, at the basis of the so-called standard minimum variance estimator (Davis & Huchra 1982; Blanton et al. 2003; Loveday 2004). We clearly want to avoid making this further assumption, because we want to test whether the space density is (or can be approximated by) a simple constant. It is also evident that, if this assumption is inconsistent with the properties of the sample data, all results derived from methods encoding it are intrinsically biased.
To determine the shape of the luminosity function, the so-called inhomogeneity-independent method is commonly employed (Blanton et al. 2003; Loveday 2004). It uses a modified version of Eq. (C.1), namely that the number of galaxies per unit volume and unit luminosity is the product of n(R), the density as a function of the radial (metric) distance R, and the luminosity function \phi(L) (Eq. (C.2)). This can be a useful working hypothesis. Under this approximation, in a VL sample the luminosity function can be written in terms of the total number N of galaxies in the VL sample and of its solid angle (Eq. (C.3)). In this way, even when n(R) is highly fluctuating, one may recover the shape of \phi(L), as spatial inhomogeneities cancel out in the ratio given in Eq. (C.3). Thus, by making the normalized histogram of the number of galaxies in luminosity bins in each VL sample, we get an estimate of \phi(L). We then look for the best fit in all the VL samples with the Schechter function (Schechter 1976), Eq. (C.4).
For this determination we used VL samples other than those listed in Table 2; namely, we constructed VL samples each spanning a range of only one magnitude. We then find (see Fig. C.1), in the K-corrected catalog with no E-corrections, best-fit Schechter parameters given by an exponent with an uncertainty of 0.02 and by M^{*}=-20.63 \pm 0.02, in good agreement with previous determinations (see Loveday 2004).
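For reference, the Schechter function can be evaluated per unit absolute magnitude as in the sketch below, which uses its standard magnitude form. The function name is hypothetical, the amplitude phi_star is left at an arbitrary default because only the shape is constrained by the normalized histogram, and the value of alpha in the usage lines is a placeholder chosen for illustration, not the best-fit value quoted above.

```python
import math


def schechter_mag(m_abs, m_star, alpha, phi_star=1.0):
    """Schechter function per unit absolute magnitude.

    phi(M) = 0.4 ln(10) phi* x^(alpha + 1) exp(-x),  x = 10^(0.4 (M* - M)),
    where x is the luminosity in units of L*.
    """
    x = 10.0 ** (0.4 * (m_star - m_abs))
    return 0.4 * math.log(10.0) * phi_star * x ** (alpha + 1.0) * math.exp(-x)


if __name__ == "__main__":
    # alpha = -1.0 is a placeholder; M* = -20.63 is the value quoted in the text.
    for m_abs in (-22.5, -21.5, -20.63, -19.5, -18.5):
        print(m_abs, schechter_mag(m_abs, m_star=-20.63, alpha=-1.0))
```

A fit to the normalized histogram would vary alpha and m_star to minimize the residuals between this function, renormalized over the sampled magnitude range, and the histogram in each VL sample.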
Figure C.1: Luminosity function in the SDSS K-corrected catalog and its best-fit estimation with a Schechter function.
To conclude this discussion, we note that, while the effect of inhomogeneities is fairly taken into account in Eqs. (C.1), (C.2), the amplitude of the luminosity function is usually estimated under the assumption that it is a constant proportional to the average density. We have seen that this assumption is not satisfied by the data; i.e., when the density has a clear scale dependence, the amplitude of the luminosity function gives a systematically biased estimate of the average density.
References
Adelman-McCarthy, J. K., Agüeros, M. A., Allam, S. S., et al. 2008, ApJS, 175, 297
Aharony, A., & Harris, B. 1996, Phys. Rev. Lett., 77, 3700
Antal, T., Sylos Labini, F., Vasilyev, N. L., & Baryshev, Yu. V. 2009, Europhys. Lett., in press [arXiv:0909.1507]
Baryshev, Yu., & Teerikorpi, P. 2006, Bull. Spec. Astr. Obs., 59, 92
Benoist, C., Maurogordato, S., da Costa, L. N., Cappi, A., & Schaeffer, R. 1996, ApJ, 472, 452
Blanton, M. R., & Roweis, S. 2007, AJ, 133, 734
Blanton, M. R., Hogg, D. W., Bahcall, N. A., et al. 2003, ApJ, 592, 819
Blanton, M. R., Schlegel, D. J., Strauss, M. A., et al. 2005, AJ, 129, 2562
Broadhurst, T. J., Ellis, R. S., Koo, D. C., & Szalay, A. S. 1990, Nature, 343, 726
Buchert, T. 2008, Gen. Rel. Grav., 40, 467
Busswell, G. S., Shanks, T., Frith, W. J., et al. 2004, MNRAS, 354, 991
Cabré, A., & Gaztañaga, E. 2009, MNRAS, 393, 1183
Clifton, T., Ferreira, P. G., & Land, K. 2008, Phys. Rev. Lett., 101, 131302
Croton, D. J., Springel, V., White, S. D. M., et al. 2006, MNRAS, 365, 11
Davis, M. 1997, in Proc. Conf., Critical Dialogues in Cosmology, ed. N. Turok (Singapore: World Scientific), 12 [arXiv:astro-ph/9610149]
Davis, M., & Huchra, J. 1982, ApJ, 254, 437
Davis, M., & Peebles, P. J. E. 1983, ApJ, 267, 465
Davis, M., Meiksin, A., Strauss, M. A., da Costa, L. N., & Yahil, A. 1988, ApJ, 333, L9
Durrer, R., Eckmann, J. P., Sylos Labini, F., Montuori, M., & Pietronero, L. 1997, Europhys. Lett., 40, 491
Durrer, R., Gabrielli, A., Joyce, M., & Sylos Labini, F. 2003, ApJ, 585, L1
Einasto, J., Einasto, M., Saar, E., et al. 2006, A&A, 459, 1
Einasto, J., Einasto, M., Saar, E., et al. 2007, A&A, 462, 397
Einasto, J., Saar, E., Martínez, V. J., et al. 2008, ApJ, 685, 83
Ellis, G. F. R. 2008, in Proc. Conf., Dark Energy and Dark Matter [arXiv:0811.3529v1]
Freedman, W. L., Madore, B. F., Gibson, B. K., et al. 2001, ApJ, 553, 47
Frith, W. J., Busswell, G. S., Fong, R., Metcalfe, N., & Shanks, T. 2003, MNRAS, 345, 1049
Frith, W. J., Metcalfe, N., & Shanks, T. 2006, MNRAS, 371, 1601
Fukugita, M., Yasuda, N., Brinkmann, J., et al. 2004, AJ, 127, 3155
Gabrielli, A., & Sylos Labini, F. 2001, Europhys. Lett., 54, 1
Gabrielli, A., Joyce, M., & Sylos Labini, F. 2002, Phys. Rev. D, 65, 083523
Gabrielli, A., Sylos Labini, F., Joyce, M., & Pietronero, L. 2005, Statistical Physics for Cosmic Structures (Berlin: Springer Verlag)
Geller, M., & Huchra, J. 1989, Science, 246, 897
Giovanelli, R., & Haynes, M. P. 1993, AJ, 105, 1271
Gott, J. R. III, Juric, M., Schlegel, D., et al. 2005, ApJ, 624, 463
Hogg, D. W. 1999 [arXiv:astro-ph/9905116v4]
Hogg, D. W., Baldry, I. K., Blanton, M. R., & Eisenstein, D. J. 2002 [arXiv:astro-ph/0210394v1]
Hogg, D. W., Eisenstein, D. J., Blanton, M. R., et al. 2005, ApJ, 624, 54
Hubble, E. 1929, PNAS, 15, 168
Humason, M. L., Mayall, N. U., & Sandage, A. R. 1956, AJ, 61, 97
Joyce, M., & Sylos Labini, F. 2001, ApJ, 554, L1
Joyce, M., Montuori, M., & Sylos Labini, F. 1999a, ApJ, 514, L5
Joyce, M., Sylos Labini, F., Montuori, M., & Pietronero, L. 1999b, A&A, 344, 387
Joyce, M., Anderson, P. W., Montuori, M., Pietronero, L., & Sylos Labini, F. 2000, Europhys. Lett., 50, 416
Joyce, M., Sylos Labini, F., Gabrielli, A., Montuori, M., & Pietronero, L. 2005, A&A, 443, 11
Kaiser, N. 1984, ApJ, 284, L9
Kauffmann, G., Heckman, T. M., White, S. D. M., et al. 2003, MNRAS, 341, 33
Kerscher, M. 1999, A&A, 343, 333
Kirshner, R. P., Oemler, A., Schechter, P. L., & Shectman, S. A. 1983, AJ, 88, 1285
Landy, S. D., & Szalay, A. 1993, ApJ, 412, 64
Lin, H., Yee, H. K. C., Carlberg, R. G., et al. 1999, ApJ, 518, 533
Loveday, J. 2004, MNRAS, 347, 601
Maddox, S. J., Sutherland, W. J., Efstathiou, G., & Loveday, J. 1990, MNRAS, 243, 692
Massey, R., Rhodes, J., Ellis, R., et al. 2007, Nature, 445, 286
Montuori, M., & Sylos Labini, F. 1997, ApJ, 487, L21
Norberg, P., Baugh, C. M., Hawkins, E., et al. 2001, MNRAS, 328, 64
Norberg, P., Baugh, C. M., Hawkins, E., et al. 2002, MNRAS, 332, 827
Park, C., Vogeley, M. S., Geller, M. J., & Huchra, J. P. 1994, ApJ, 431, 569
Peebles, P. J. E. 1980, The Large-Scale Structure of the Universe (Princeton: Princeton University Press)
Picard, A. 1991, AJ, 102, 445
Pietronero, L. 1987, Physica A, 144, 257
Pietronero, L., Montuori, M., & Sylos Labini, F. 1997, in Proc. Conf., Critical Dialogues in Cosmology, ed. N. Turok (Singapore: World Scientific), 24
Ratcliffe, A., Shanks, T., Parker, Q. A., & Fong, R. 1998, MNRAS, 293, 197
Saslaw, W. C. 2000, The Distribution of the Galaxies (Cambridge: Cambridge University Press)
Schechter, P. 1976, ApJ, 203, 297
Shanks, T. 1990, IAUS, 136, 269
Spergel, D. N., Verde, L., Peiris, H. V., et al. 2003, ApJS, 148, 175
Springel, V., White, S. D. M., Jenkins, A., et al. 2005, Nature, 435, 629
Strauss, M. A., Weinberg, D. H., Lupton, R. H., et al. 2002, AJ, 124, 1810
Sylos Labini, F. 1994, ApJ, 433, 464
Sylos Labini, F., & Pietronero, L. 1996, ApJ, 469, 28
Sylos Labini, F., & Vasilyev, N. L. 2008, A&A, 477, 381
Sylos Labini, F., Montuori, M., & Pietronero, L. 1998, Phys. Rep., 293, 61
Sylos Labini, F., Vasilyev, N. L., & Baryshev, Yu. V. 2007, A&A, 465, 23
Sylos Labini, F., Vasilyev, N. L., & Baryshev, Yu. V. 2009a, Europhys. Lett., 85, 29002
Sylos Labini, F., Vasilyev, N. L., & Baryshev, Yu. V. 2009b, A&A, 496, 7
Sylos Labini, F., Vasilyev, N. L., Pietronero, L., & Baryshev, Yu. V. 2009c, Europhys. Lett., 86, 49001
Sylos Labini, F., Vasilyev, N. L., Baryshev, Yu. V., & López-Corredoira, M. 2009d, A&A, 505, 981
Tegmark, M., Blanton, M. R., Strauss, M. A., et al. 2004, ApJ, 606, 702
Totsuji, H., & Kihara, T. 1969, PASJ, 21, 221
Vasilyev, N. L., Baryshev, Yu. V., & Sylos Labini, F. 2006, A&A, 447, 431
Weir, N., Djorgovski, G. S., & Fayyad, U. M. 1995, AJ, 110, 1
Wu, K. K., Lahav, O., & Rees, M. 1999, Nature, 397, 225
Yasuda, N., Fukugita, M., Narayanan, V. K., et al. 2001, AJ, 122, 1004
York, D. G., Adelman, J., Anderson, J. E., Jr., et al. 2000, AJ, 120, 1579
Yoshii, Y., & Takahara, F. 1988, ApJ, 326, 1
Zehavi, I., Blanton, M. R., Frieman, J. A., et al. 2002, ApJ, 571, 172
Zehavi, I., Zheng, Z., Weinberg, D. H., et al. 2005, ApJ, 630, 1
Footnotes
... statistics
Conditional statistics are not normalized to the sample density estimation (see Sect. 2).
... simulations
In general, particles in cosmological N-body simulations are supposed to represent a coarse-grained distribution of the microscopic dark matter particles. From the N-body dark matter particles one constructs the galaxy density field by using certain procedures, which can generally be thought of as a sampling mechanism. The key element of this selection is that galaxies are supposed to form on the highest density peaks of the underlying dark matter field.
... (Aharony & Harris 1996)
Equivalently, if the PDF P(F,L) tends to a Dirac delta function for L \rightarrow \infty, then the system is said to exhibit self-averaging properties.
...
We remind the reader that the previous equation is also valid in the infinite volume limit. The various estimators that can be found in the literature use different methods to treat boundary conditions, and hence to estimate both the numerator and the denominator of the previous relation (Gabrielli et al. 2005).
... (Adelman-McCarthy et al. 2008)
See www.sdss.org
... data
See NYU Value-Added Galaxy Catalog, 2008 http://ssds.physics.nyu.edu/
... )
Note that these authors use E(z)=1.6(z-0.1) to construct the absolute magnitude M_{0.1r}, corresponding to the SDSS band shifted to match its rest-frame shape at z = 0.1, from the apparent magnitude m_{r} and redshift z by using Eq. (8). In this case the K_{r}(z) term is the K-correction from the r band of a galaxy at redshift z to the ^{0.1}r band. Because here we use K_{r}(r) at z = 0 instead of at z = 0.1, the evolution correction has to be shifted by 0.1 in redshift.
... spheres
In Appendix B we discuss how several properties of galaxy fluctuations can be measured in the magnitude-limited sample by considering galaxy counts as a function of apparent magnitude and the redshift distribution. This study has the advantage of using direct observational quantities. It is interesting to note that these studies are compatible with the results presented in this section.
... another
The PDF of VL1 for r=30 Mpc/h is not as regular as in the other cases because of poor statistics.
...
In Appendix C we discuss the determination of the luminosity function and two important, commonly used assumptions: that the space density is constant, and that space and luminosity distributions are independent. We emphasize that the latter can also be used when the density field is inhomogeneous, while the former corresponds to the strict assumption of spatial homogeneity.
... similar
Although these behaviors look very similar, they refer to two different measurements which in principle are not expected to give the same behavior.
... large
For a sample of arbitrary geometry, this scale is defined as the radius of the largest sphere fully contained in the sample volume (Gabrielli et al. 2005).
... sample
We are grateful to David Hogg for interesting suggestions about this test.
... cube
See http://www.mpa-garching.mpg.de/galform/agnpaper/ for semi-analytic galaxy data files and description, and see http://www.mpa-garching.mpg.de/millennium/ for information on the Millennium LCDM N-body simulation.
... real-space
The difference between real and redshift space is due to peculiar velocities.
... constant
In what follows we denote the Hubble constant as H_{0}=100 h km s^{-1} Mpc^{-1}, where h is a parameter in the range 0.5<h<0.75 according to observations (Freedman et al. 2001).
All Tables
Table 1: Properties of the angular regions considered.
Table 2: Main properties of the obtained VL samples with K-corrections and without E-corrections.
Table 3: Number of galaxies in each of the VL samples (VL1, ..., VL5) and in each region (R1, R2, R3).
Table 4: The same as Table 2 but for VL samples without K-corrections and without E-corrections.
Table 5: The same as Table 2 but for VL samples with E+K-corrections and without (see text for details).
Table 6: Main properties of the obtained mock VL samples.
All Figures
Figure 1: Four different realizations of the toy model discussed in the text. Upperleft panel: uniform distribution with shortrange positive correlations (T0). Upperright panel: uniform distribution with longrange positive correlations (T1). Bottomleft panel: nonuniform distribution with longrange positive correlations (T3). Bottomright panel: nonuniform distribution with longrange positive correlations and non selfaveraging properties (T4). 

Open with DEXTER  
In the text 
Figure 2: Conditional density for the toy models shown in Fig. 1. The case of a Poisson point distribution is added as a reference. (The conditional density has been normalized to the number of points in the simulations.) The model T1 has a shortrange correlation, which corresponds to a fast decay of . The model T2 is still uniform on large scales, i.e. is flat. The models T3 and T4 have strong clustering up to the box size. 

Open with DEXTER  
In the text 
Figure 3: PDF of conditional fluctuations (black line) filtered at 1/10 of the sample size (i.e., r=0.1) for the toy models shown in Fig. 1. Also shown is the PDF computed in two parts of the box, i.e. for y>0.5 (red line) and y<0.5 (green line). Both the models T1 and T2 approach to a Gaussian distribution (blue dashed lines), as these distributions are uniform although correlated. The PDF of the model T3 does not approach a Gaussian function but it is selfaveraging inside the box. Finally the PDF of the model T4 is not Gaussian and it does not show selfaveraging properties. 

Open with DEXTER  
In the text 
Figure 4: Number of centers M(r) as a function of scale in the five VL samples (see text for details). 

Open with DEXTER  
In the text 
Figure 5: Conditional PDF on different scales for the 5 VL samples (each row corresponds to a VL sample; the scale r is reported in the caption) with K corrections (black), with K+E corrections (red) and without K+E corrections (green). 

Open with DEXTER  
In the text 
Figure 6: The PDF in different samples (with Kcorrections only) and for different sphere radius with the bestfit Gaussian function (see captions). Poisson error bars are reported as a reference. 

Open with DEXTER  
In the text 
Figure 7: Normalized PDF (see Eqs. (11), (12)) for r=5,10,20,30 Mpc/h in the five VL Kcorrected samples. 

Open with DEXTER  
In the text 
Figure 8: PDF in the two subvolumes of the Kcorrected VL samples (each row corresponds to a VL sample): the black line marks the PDF in nearby subsample, and the red line in the more distant subsample. The xaxis reports the number of points N(r) (the scale r is reported in the caption) and the PDF P(N;r) is on the yaxis. 

Open with DEXTER  
In the text 
Figure 9: As Fig. 8 but for the K+Ecorrected VL3 and VL5 samples ( left) and for the same samples without K+E corrections ( right). 

Open with DEXTER  
In the text 
Figure 10: Behavior of the local average of (see Eqs. (14), (15)) normalized to the whole sample average (see Eq. (19) below) in bins of thickness Mpc/h for sphere radius r=10 Mpc/h normalized to the whole sample average for the 5 VL samples with Kcorrection (K), with evolution and Kcorrection (E+K), and without evolution and Kcorrection (K0). The insert panel shows the number of centers, over which the average and variance are computed in each bin. In the bottom right panel we report the behavior of in bins of thickness Mpc/h for r=20 Mpc/h, normalized to the luminosity factors as explained in the text (see Sect. 4.6), for Kcorrected VL samples. 

Figure 11: Radial counts in bins of thickness Mpc/h, normalized to the luminosity factors as explained in the text, for the K-corrected VL samples.

Figure 12: Whole-sample average conditional density in the different K-corrected VL samples in the regions R1, normalized as explained in the text.

Figure 13: Standard two-point correlation function in the VL3 sample estimated by Eq. (21): the sample average density is computed in spheres of radius r^{*}=60 Mpc/h, considering all center-points lying in a bin of thickness Mpc/h centered at different radial distances R: R_{1}=250 Mpc/h ( ) and R_{2}=350 Mpc/h ( ). The case in which we have used the estimate of the sample average N/V () is also shown, and it agrees with the FS estimator. The former agrees with the measurements provided by the LS and DP estimators, which give essentially the same result. (For the sake of clarity, error bars are shown for the FS, DP and LS estimators; they are relatively small except at small and large r.)

Figure 14: The same as in Fig. 13 but for the VL5 sample. In this case the sample average density is computed in spheres of radius r^{*}=80 Mpc/h, considering all center-points lying in a bin of thickness Mpc/h centered at different radial distances R: R_{1}=320 Mpc/h ( ) and R_{2}=450 Mpc/h ( ).

Figure 15: Three-dimensional representation of the SL analysis with r=10 Mpc/h for R3VL2. The x,z coordinates of the sphere center define the bottom plane, and on the vertical axis we display the intensity of the structures, i.e. the conditional number of galaxies N_{i}(r) contained in the sphere of radius r.

Figure 16: Behavior of N(r;R_{i}) in the K-corrected VL2 sample in the three different regions for r=10 Mpc/h (R1 top, R2 middle, and R3 bottom).

Figure 17: Projection on the XZ plane of R3VL2. The SDSS Great Wall is the filament in the middle of the sample. 

Figure 18: As in Fig. 16 but for the randomized VL2 samples as described in the text. 

Figure 19: Upper panels: the PDF of conditional fluctuations in spheres of radius r=5 Mpc/h (left) and r=10 Mpc/h (right) for the real sample (Real) and the randomized one (RND), as explained in the text. Bottom panel: conditional density as a function of scale.

Figure 20: PDF of conditional fluctuations in the mock R1VL1, R1VL3, and R1VL5 samples in real space (red line) and redshift space (black line). (Each row corresponds to a VL sample; the scale r is reported in the caption.) The best-fit Gaussian function (green line) is also reported.

Figure 21: Self-averaging test for the mock catalogs: the PDF is computed in two disconnected subvolumes (s_{1} and s_{2}) of the mock sample R1VL3 in redshift space.

Figure 22: Conditional density in the mock R1VL1, R1VL3, and R1VL5 samples in real space (e.g., VL1r, VL3r, VL5r) and redshift space (e.g., VL1z, VL3z, VL5z). The bottom left panel compares the behaviors in the different samples in redshift space, where it is evident that the exponent becomes steeper for brighter objects. The normalization is chosen so that the samples have the same large-scale density.

Figure B.1: Average differential number counts of galaxies as a function of apparent magnitude m_{r} in bins of per unit solid angle in deg^{2}. The best fit with a behavior of type is shown for 0.01. 

Figure B.2: Behavior of as a function of apparent magnitude (i.e., Eq. (B.3)) in bins of size per unit solid angle in deg^{2}. 

Figure B.3: Differential number counts as a function of redshift, in bins of , for unit solid angle, in the 3 angular regions R1, R2 and R3. 

Figure B.4: Average differential number counts, in redshift bins of size , for unit of solid angle, as a function of redshift. 

Figure B.5: Standard deviation of the differential number counts, in redshift bins of size , for unit of solid angle, as a function of redshift. Poisson noise has been subtracted so this only contains the contribution due to galaxy correlations. 

Figure C.1: Luminosity function in the SDSS K-corrected catalog and its best-fit estimation with a Schechter function.

Copyright ESO 2009