A&A 443, 11-16 (2005)
M. Joyce1 - F. Sylos Labini2,3 - A. Gabrielli3,4 - M. Montuori3,4 - L. Pietronero3,4
1 - Laboratoire de Physique Nucléaire et des Hautes Énergies, Université de Paris VI, 4 place Jussieu, Tour 33, Rez de Chaussée, 75252 Paris Cedex 05, France
2 - "Enrico Fermi Center", via Panisperna 89 A, Compendio del Viminale, 00184 Rome, Italy
3 - "Istituto dei Sistemi Complessi" CNR, via dei Taurini 19, 00185 Rome, Italy
4 - Statistical Mechanics and Complexity Center - Istituto Nazionale Fisica della Materia, Unità di Roma 1, and Dipartimento di Fisica, Università di Roma "La Sapienza", P.le A. Moro 2, 00185 Rome, Italy
Received 20 June 2005 / Accepted 18 July 2005
We discuss some of the basic implications of recent results on galaxy correlations published by the SDSS collaboration. In particular we focus on the evidence recently presented for the scale and nature of the transition to homogeneity in the galaxy distribution, and on results describing the dependence of clustering on luminosity. The two questions are in fact closely entangled, as the stability of the measured amplitude of the correlation function depends on the scale at which the mean density becomes well defined. We note that the recent results, which indicate convergence to well defined homogeneity in a volume equivalent to that of a sphere of radius 70 Mpc/h, place in doubt previous detections of "luminosity bias" from measures of the amplitude of the correlation function. We emphasize that the way to resolve these issues is first to use, in volume limited samples corresponding to different ranges of luminosity, the unnormalized two-point statistics to establish the scale (and value) at which the mean density becomes well defined. We note also that the recent SDSS results for these statistics are in good agreement with those obtained by us through analyses of many previous samples, confirming in particular that the galaxy distribution is well described by a fractal dimension $D \approx 2$ up to a scale of at least 20 Mpc/h. We discuss critically the agreement of these new data with current theoretical models.
Key words: cosmology: observations - cosmology: large-scale structure of Universe
The most striking feature of the large scale distribution of galaxies is their organization in structures such as clusters, super-clusters and filaments around large volumes of space empty of visible matter, the voids. Since the first observations of the three-dimensional distribution of galaxies (Kirshner et al. 1983) evidence has accumulated for the existence of such large agglomerations of matter. In the eighties the "Great Wall", a giant filament with an extension of about 200 Mpc/h connecting many galaxy groups and clusters, was discovered (De Lapparent et al. 1989). Currently the largest known structure in the local universe is the recently discovered "Sloan Great Wall" (Gott et al. 2005), roughly twice as long as the Great Wall. Its discovery has been made possible by new data from the Sloan Digital Sky Survey (SDSS; York et al. 2000), one of the most ambitious observational programs ever undertaken in astronomy. It will measure about 1 million redshifts, giving a complete mapping of the local universe up to a depth of several hundred Mpc. In this paper we discuss the results, and implications, of some recently published studies of the correlation properties of the galaxy distribution based on new data from the partially completed SDSS.
An accurate statistical characterization of galaxy structures is evidently a key element for any physical theory explaining their origin. The earliest observational studies, from angular catalogs, produced the primary result (Totsuji & Kihara 1969) that the reduced two-point correlation function $\xi(r) \equiv \langle n(\mathbf{r})\, n(\mathbf{0})\rangle / \langle n\rangle^2 - 1$ (where $n$ is the density of points) is well approximated, in the range of scales from about 0.1 Mpc/h to 10 Mpc/h, by a simple power law, $\xi(r) \simeq (r/r_0)^{-\gamma}$.
The variation of the amplitude of the measured correlation function from sample to sample, which has been interpreted as "luminosity bias", may, however, be explained, entirely or partially, as a finite-size effect, i.e. as an artifact of statistical analysis in finite samples. The explanation is as follows (see Gabrielli et al. 2004, hereafter GSLJP, for a detailed discussion and original references). The reduced correlation function $\xi(r)$ can be written as $\xi(r) = \langle n(r)\rangle_p / \langle n\rangle - 1$, where $\langle n(r)\rangle_p$ is the conditional density of points, i.e. the mean density of points in a spherical shell of radius $r$ centered on a galaxy (the subscript $p$ indicates the condition that the density is measured from an occupied point). The latter is generally a very stable local quantity, the reliable estimation of which at a given scale $r$ requires only a sample large enough to allow a reasonable number of independent estimates of the density in a shell. The mean density $\langle n\rangle$, on the other hand, is a global quantity. The size of a sample in which it is estimated reliably is not known a priori, but depends on the properties of the underlying distribution. Specifically the sample must be large enough so that the mean density estimated in it has a sufficiently small fluctuation with respect to the true asymptotic average. If $\xi_1(r)$ and $\xi_2(r)$ are estimates of $\xi(r)$ in two samples with, respectively, mean densities $n_1$ and $n_2$, the stability of the conditional density implies $n_1 [1 + \xi_1(r)] = n_2 [1 + \xi_2(r)]$, so that a difference between the estimated mean densities translates directly into a difference between the measured amplitudes.
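To illustrate how strong this finite-size effect can be, consider an idealized case, which is an assumption of this sketch and not a result quoted from the data: a distribution whose conditional density is a pure power law, $\langle n(r)\rangle_p = B r^{D-3}$ with $D<3$, observed in a spherical sample of radius $R_s$ centered on an occupied point, with the mean density $n_s$ estimated as the number of points in the sample divided by its volume. The symbols $R_s$ and $n_s$ are introduced here only for this illustration.

```latex
\begin{align}
N(R_s) &= \int_0^{R_s} 4\pi r^2 \, B \, r^{D-3} \, dr = \frac{4\pi B}{D}\, R_s^{D}, \\
n_s &= \frac{N(R_s)}{\tfrac{4}{3}\pi R_s^{3}} = \frac{3B}{D}\, R_s^{D-3}, \\
\xi(r) &= \frac{\langle n(r)\rangle_p}{n_s} - 1
        = \frac{D}{3}\left(\frac{r}{R_s}\right)^{D-3} - 1, \\
\xi(r_0) = 1 \;&\Longrightarrow\; r_0 = \left(\frac{D}{6}\right)^{1/(3-D)} R_s .
\end{align}
```

In this idealized case $r_0$ is a fixed fraction of the sample size ($r_0 = R_s/3$ for $D=2$), so that its value reflects the size of the sample rather than any intrinsic property of the clustering.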
It has been pointed out by Pietronero (1987) that, when analyzing a point distribution which, like the galaxy distribution, is characterized by large fluctuations, one should first establish the existence of a well defined mean density (and ultimately the scale at which it becomes well defined and independent of the sample size, if it does) before a statistic like $\xi(r)$, which measures fluctuations with respect to such a mean density, is employed. Further, the existence of power-law correlations, which are clearly present in the galaxy distribution, is typical of fractal distributions, which are asymptotically empty. In such distributions the mean density is always strongly sample dependent, with an average value decreasing as a function of sample size. Given the observation of such correlations in the system, and the instability of the amplitude of the correlation function estimated in different samples, special care should be taken in establishing first the scale (if any) at which homogeneity becomes a good approximation. The simplest way to do this is to measure the conditional density $\langle n(r)\rangle_p$ or, alternatively, $\langle n^*(r)\rangle_p$, the integrated conditional density, where $n^*(r)$ represents the density in a sphere of radius $r$ about an occupied point. These quantities are generally well defined, and give a characterization of the two-point correlation properties of the distribution, irrespective of whether the underlying distribution has a well defined mean density or not. A simple power-law behavior $\langle n(r)\rangle_p = B r^{-\gamma}$ is characteristic of scale-invariant fractal distributions, with the exponent giving the fractal dimension through $D = 3 - \gamma$. The pre-factor $B$ is, in this case, simply related to the lower cut-off of the distribution (GSLJP). If the distribution has a well defined mean density, one has, asymptotically, $\langle n(r)\rangle_p = \mathrm{const.} > 0$ (i.e., $D=3$ in the previous formula).
Measurement of these quantities can thus characterize both (i) the regime of strong clustering and (ii) the scale and nature of a transition to homogeneity. Only once the existence of an average density within the sample size is established in this manner does it make sense to use $\xi(r)$. In that case, the analysis with $\langle n(r)\rangle_p$ can also determine the minimal size of samples required to make the use of this quantity meaningful.
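The measurement described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the estimator used in any of the cited analyses: the sample is taken to be a cube of side `box_size` (real volume limited samples are cones in redshift space), pair distances are computed by brute force, and the function names are hypothetical. Only points whose sphere of radius $r$ lies entirely inside the sample are used as centers, so no assumption is made about the distribution outside the sample.

```python
import numpy as np


def integrated_conditional_density(points, radii, box_size):
    """Average density in spheres of radius r centred on occupied points.

    Assumes a cubic sample [0, box_size]^3 (an illustrative simplification).
    A point is used as a centre at radius r only if the whole sphere fits
    inside the cube.
    """
    points = np.asarray(points, dtype=float)
    # Distance from each point to the nearest face of the cube.
    margin = np.minimum(points, box_size - points).min(axis=1)
    # Squared pair distances, brute force (adequate for a few thousand points).
    sq = (points ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.maximum(d2, 0.0, out=d2)      # remove tiny negative rounding errors
    np.fill_diagonal(d2, np.inf)     # the centre itself is not counted
    density = []
    for r in radii:
        centres = margin >= r
        if not centres.any():
            density.append(np.nan)   # no sphere of this radius fits
            continue
        counts = (d2[centres] <= r * r).sum(axis=1)
        density.append(counts.mean() / (4.0 / 3.0 * np.pi * r ** 3))
    return np.array(density)


def fitted_dimension(radii, density):
    """Dimension D from a log-log fit of density ~ r^(D-3)."""
    slope, _ = np.polyfit(np.log(radii), np.log(density), 1)
    return 3.0 + slope


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.random((1000, 3))               # homogeneous test sample
    r = np.array([0.1, 0.15, 0.2])
    n_star = integrated_conditional_density(sample, r, box_size=1.0)
    print(n_star, fitted_dimension(r, n_star))
```

For a homogeneous (Poisson) sample the estimated density is flat in $r$ and the fitted dimension is close to 3; a fractal sample would instead give a decreasing power law. The largest usable radius is set by the largest sphere fully enclosed in the sample, which is the geometrical constraint that limits such analyses in real surveys.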
In studies of galaxy data from numerous different surveys using the conditional density (see GSLJP and references therein, in particular Joyce et al. 1999) it has been found that, at scales up to roughly 20 Mpc/h, where the statistics are very robust (i.e. where a reasonable number of independent spherical shells can be fully inscribed in the sample volume), the galaxy distribution is well described by a simple fractal scaling with $D \approx 2$. At larger scales (up to 100 Mpc/h, or even greater) it has been argued on the basis of these analyses that there is weaker statistical evidence for the continuation of a fractal scaling, and no clear evidence for homogeneity. These analyses not only place in question the physical meaning of the amplitude of $\xi(r)$, as discussed above, but also produce a different value of the exponent characterizing galaxy clustering at small scales. Indeed, as $\langle n(r)\rangle_p \approx \langle n\rangle\, \xi(r)$ at scales where $\xi(r) \gg 1$, the two statistics should agree at these scales. The discrepancy can be explained (see GSLJP) as a result of the way in which the dimension $D$ is normally estimated from $\xi(r)$: it is determined by fitting a power law in a log-log plot around the scale $r_0$ defined by $\xi(r_0) = 1$. If there is a strong break from simple power-law behavior in $\xi(r)$ around this scale, the estimated dimension is systematically smaller than the one obtained from the conditional density. For example, Hawkins et al. (2003) report different values of the exponent when fitting the correlation function in the regime $r<r_0$ and at scales of order $r_0$. On the other hand, Tikhonov et al. (2003) measured the exponent by studying the conditional density in different galaxy samples; in addition they found evidence for a crossover to homogeneity at scales larger than 100 Mpc/h. Similar results were found by Baryshev & Bukhmastova (2004) in some early SDSS samples, by considering the two-point conditional density.
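The size of this systematic effect can be illustrated with a short calculation. It assumes, purely for illustration, that the conditional density is an exact power law $\langle n(r)\rangle_p = B r^{-\gamma}$ up to the scale $r_0$ and that $\xi(r)$ is normalized by a fixed mean density $\langle n\rangle$; the constant $A \equiv B/\langle n\rangle$ is introduced here for convenience.

```latex
\begin{align}
\xi(r) &= A\, r^{-\gamma} - 1, \\
\frac{d \ln \xi}{d \ln r} &= -\gamma\, \frac{A\, r^{-\gamma}}{\xi(r)}
                           = -\gamma\, \frac{1 + \xi(r)}{\xi(r)}, \\
\left.\frac{d \ln \xi}{d \ln r}\right|_{r = r_0} &= -2\gamma
   \qquad \text{since } \xi(r_0) = 1 .
\end{align}
```

For $\xi(r) \gg 1$ the local slope tends to $-\gamma$, as expected, but at $r_0$ it is twice as steep. In this idealized case a fit made exactly at $r_0$ for a distribution with $\gamma = 1$ ($D = 2$) would return a slope of $-2$, i.e. an apparent dimension of 1; a fit over a finite range of scales below and around $r_0$ gives intermediate values.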
In Hogg et al. (2005, hereafter HEB3GS) the integrated conditional density $\langle n^*(r)\rangle_p$ has been estimated in the "luminous red galaxy" (LRG) sample of the SDSS survey. It is an approximately volume limited (VL) sample with very precise photometric calibration, and is by far the largest such sample ever considered for such an analysis. Its size allows the robust estimation of this statistic up to a scale of order 100 Mpc/h, as at this scale the sample contains a considerable number of independent (non-overlapping) spheres centered on galaxies. The results can be summarized as follows: (i) a simple power-law scaling corresponding to a fractal dimension $D=2$ gives a very good fit to the data up to at least 20 Mpc/h, over approximately a decade in scale; (ii) at larger scales $\langle n^*(r)\rangle_p$ continues to decrease, but less rapidly, until about 70 Mpc/h, above which it flattens up to the largest scale probed by the sample (100 Mpc/h). The transition between the two regimes is slow, in the sense that the integrated conditional density at 20 Mpc/h is about twice the asymptotic mean density.
Let us consider the conclusions which can be drawn from this analysis in view of the discussion in the previous section. Firstly, the results are highly consistent with the claim that galaxy correlations, up to approximately 20 Mpc/h, are well characterized by a simple power-law scaling corresponding to a fractal dimension $D \approx 2$. The LRG sample probes the scaling of $\langle n^*(r)\rangle_p$ only from about 3 Mpc/h, and evidently only for the brighter galaxies represented by this sample. The same exponent has, however, been observed to describe well the behavior of $\langle n(r)\rangle_p$ in our analysis of many previous galaxy samples over a large range of luminosities, and down to scales an order of magnitude smaller (see GSLJP and references therein).
Secondly, the analysis provides good statistical evidence for homogeneity at larger scales, with a flattening of $\langle n^*(r)\rangle_p$ apparent above about 70 Mpc/h. Future data, and in particular the even larger forthcoming samples from SDSS, will confirm or refute this very important result, extending the range of scale between the detected homogeneity scale and the sample size. If one takes this scale to be a reliable determination of the scale of homogeneity, it implies that $\xi(r)$ is a well defined statistical quantity. Further, one can infer the size of samples in which one can usefully employ this statistic to characterize the statistical properties of the galaxy distribution. Here "usefully" means that the finite size effects which can systematically offset the amplitude of the estimator of $\xi(r)$ from that of the true $\xi(r)$ are under control, so that a physical significance can be attributed to this amplitude (to a degree of accuracy which can be determined). For this to be the case one requires that the sample be large enough so that (i) the sample mean density approximates sufficiently well the true asymptotic mean density, and (ii) the sample to sample fluctuations (i.e. variance) in the density are sufficiently small. The requirement on the sample size imposed by (i) is determined by comparing the conditional density at the depth of the sample with the measured asymptotic density, and the results of HEB3GS indicate that a sample of depth greater than 70 Mpc/h is sufficient to make such systematic effects small. To determine the constraint imposed by (ii) one needs a measure of fluctuations in the density in such a sample about the (now well defined) mean density. HEB3GS provides, beyond the measure of the conditional density (which gives information about the average and not the fluctuations), one measure of such fluctuations in the LRG sample: for $0.2 < z < 0.35$, field-to-field fluctuations are measured in ten disjoint regions of given volume, after subtraction of a Poisson noise term.
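A field-to-field fluctuation measurement of this kind can be sketched as follows. This is one common way of removing the shot-noise contribution, assumed here for illustration and not necessarily the procedure of HEB3GS: the relative variance of the counts in regions of equal volume is computed, and the Poisson term $1/\bar{N}$ expected for uncorrelated points is subtracted. The function name and the example counts are hypothetical.

```python
import numpy as np


def excess_rms_fluctuation(counts):
    """Relative rms fluctuation of counts in equal-volume regions,
    after subtracting the Poisson (shot-noise) variance 1/mean.

    Returns 0.0 when the measured variance does not exceed the Poisson term.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    relative_variance = counts.var(ddof=1) / mean ** 2   # total, unbiased
    poisson_variance = 1.0 / mean                        # shot noise only
    return np.sqrt(max(relative_variance - poisson_variance, 0.0))


if __name__ == "__main__":
    # Hypothetical galaxy counts in ten disjoint regions of equal volume.
    example_counts = [3050, 2890, 3210, 2760, 3120, 2980, 3300, 2840, 3010, 2940]
    print(excess_rms_fluctuation(example_counts))
```

With only ten regions the variance estimate is itself noisy, which is one reason why such a measurement constrains the sample-to-sample fluctuations only approximately.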
Such fluctuations, in general, depend not only on the volume of the region, but also on its geometry. However, if the sample is sufficiently large to include several spheres of radius of order the homogeneity scale, we can suppose that regions of equal volume are equivalent in this respect. Using the estimations of HEB3GS we would thus expect that, to reduce the finite size fluctuations in the amplitude of $\xi(r)$ to of order ten percent, we need (i) samples including complete spheres of radius 70 Mpc/h, and (ii) a sufficiently large equivalent depth.
Let us consider how these two requirements compare with the characteristics of redshift surveys prior to SDSS: in the 2dF survey (e.g. Peacock et al. 2001) the largest enclosed sphere in a VL sample is of radius 30 Mpc/h and the greatest depth of such a sample is considerably smaller than that required; in the ESP catalog (Guzzo et al. 2000) the radius of the largest enclosed sphere is 20 Mpc/h, while the largest VL sample has a volume approximately one hundred times smaller than the volume to which the density fluctuations measured in the LRG sample correspond. The determinations of HEB3GS imply that these, and indeed all, previous measures of $\xi(r)$ in redshift catalogs prior to SDSS are expected to be severely affected by the finite size effects we have discussed.
In another recent paper (Zehavi et al. 2005a) the SDSS collaboration reports a study of the correlation properties of a sample of 200 000 galaxies with the standard $\xi(r)$ method of estimation. The paper focuses on the dependence of this quantity on galaxy properties, finding in particular that the amplitude of the measured two-point correlation function rises continuously with luminosity over the range of absolute magnitudes sampled, with the most rapid increase occurring above the characteristic luminosity $L^*$. The scale $r_0$ (determined by $\xi(r_0) = 1$) varies in the range [2.7, 10.0] Mpc/h.
As we have discussed above, the results of HEB3GS indicate that one expects finite size effects in the amplitude of $\xi(r)$ to be reduced to of order ten percent under two conditions: the samples (i) fully enclose a sphere of radius 70 Mpc/h, and (ii) have a sufficient depth. These criteria are in fact satisfied only by the largest VL samples considered in Zehavi et al. (2005a), and so there is a priori evidence that such finite size effects, and not a real physical effect, may be, wholly or partially, responsible for the measured variation in $r_0$. Zehavi et al. (2005a) do actually consider this possibility indirectly, and report a test of this hypothesis in which the correlation functions are measured in adjacent luminosity samples cut to cover the same spatial volumes. Of the four pairs of samples there is reasonable stability in only one case; in the three others the result clearly favors the hypothesis of volume dependence. The authors argue that their results indicate the opposite conclusion, and ascribe the observed variation between the different volumes a posteriori to "anomalous" fluctuations of the density in specific samples due to the presence of the "Sloan Great Wall". Rather than being anomalous, such fluctuations are simply indicative of the intrinsic correlations of the galaxy distribution.
In a subsequent paper by the SDSS collaboration (Zehavi et al. 2005b) a standard $\xi(r)$ analysis is applied to the same LRG sample of galaxies considered in HEB3GS. In this case HEB3GS provides a justification for the use of this statistic, as it has shown that the sample does contain many independent spheres with radius larger than that at which the conditional density flattens. What is found in Zehavi et al. (2005b) is that $\xi(r)$ shows a considerable degree of stability in different samples. In particular the full correlation function of the LRG galaxies in redshift space is, within the estimated error bars, in very good agreement with that of the most luminous galaxies (i.e. largest volume samples) measured in Zehavi et al. (2005a). This appears to give a quite robust determination of the scale $r_0$, the stability of which it will be interesting to see confirmed by forthcoming larger samples of SDSS. On breaking the LRG sample into three sub-samples by luminosity, however, Zehavi et al. (2005b) find some variation in the amplitude of $\xi(r)$, which is attributed again to "luminosity bias". The variation in the amplitude is, however, small compared to the factor of almost ten between the brightest and faintest samples in Zehavi et al. (2005a). It is at a level comparable to the fluctuations in the mean density found at the scale of the sample in HEB3GS, and so this "detection" of luminosity bias appears highly questionable. Indeed Zehavi et al. (2005b) measure $\xi(r)$ in three different subsamples of the LRG sample, corresponding to three redshift bins, and find variations in amplitude at the same level. In this case they conclude that this variation is "likely to reflect large-scale structure variations", i.e. precisely the finite size effects we are discussing.
To determine clearly whether the observed variation in amplitude is a manifestation of a real difference between galaxies of different luminosity, or a volume effect, for which we have argued there is a better prima facie case, requires, we believe, a much more systematic statistical analysis. The simplest way to perform such an analysis is via the conditional density $\langle n(r)\rangle_p$ (or alternatively via $\langle n^*(r)\rangle_p$), measured for galaxies with different properties (and, in particular, luminosities) in VL samples. By doing so, one both obtains a characterization of the two-point correlation properties (and in particular the exponent or exponents) in the strongly clustered regime, and determines the scale at which homogeneity is established. A fundamental question is then whether galaxies of different luminosity show the same behavior of $\langle n(r)\rangle_p$ and, more specifically, whether they are characterized by the same scaling dimension in the regime of power-law correlations, and the same homogeneity scale. The considerable amount of data we have analyzed in the past (for discussion and references, see GSLJP) indicated that $\langle n(r)\rangle_p$ has approximately the same simple power-law behavior in the ranges of scales probed, with only a very weak dependence on luminosity. Once such a systematic analysis has been performed, and the homogeneity scale (or scales) established, a $\xi(r)$ analysis can be used to characterize correlations in the regime of small amplitude fluctuations. Note that a linear amplification of $\xi(r)$ as a function of luminosity, with no distortion, would correspond in the conditional density analysis to an observed dependence of the homogeneity scale on galaxy luminosity.
We conclude with some remarks on other related observational and theoretical questions about galaxy correlations.
We have underlined that the results of HEB3GS for the conditional density confirm the characterization, to a very good approximation, of the galaxy distribution by a single exponent up to 20 Mpc/h. The most straightforward interpretation of such an observation is that it indicates a scale-invariant fractal behavior over the corresponding range of scale. However, other explanations are possible which invoke an underlying distribution which is not fractal (see GSLJP, and references therein). One such possibility is a distribution of spherically symmetric density profiles described by a rapidly decaying power law about their centers. Indeed this corresponds to what is observed in cosmological N-body simulations of dark matter and is used in phenomenological models ("halo models", see Cooray & Sheth 2002) for the formation of structure by gravity. One can perform further statistical tests to differentiate the two qualitatively different distributions, notably by more direct determinations of the dimension by box-counting methods, or of the behavior of the conditional variance as a function of scale (Bottaccio et al. 2004; GSLJP). Results obtained by some of us (Coleman & Pietronero 1992; Sylos Labini et al. 1998) support the fractal interpretation, but with limited statistical confidence. With the data now becoming available from SDSS it should be possible to determine the answer to this basic question about the nature of the galaxy distribution more definitively.
Beyond this specific basic question about the nature of the galaxy distribution at scales well below the homogeneity scale, i.e. in the regime of strong fluctuations, a more general question concerns the agreement between current cosmological models and the data on the galaxy distribution over all scales probed by current data. HEB3GS states that the measured correlation properties of LRG galaxies are compatible with those predicted in a standard CDM model, with a "bias factor" of order 2. In that paper the evidence for this conclusion comes solely from the comparison, which gives the quoted bias factor, of the amplitude of the measured fluctuations in the volume equivalent to a sphere of radius 175 Mpc/h with the prediction of this theoretical model, rather than from a comparison of the model with the data over a range of scales. In Fig. 1 we show on a single plot the results of HEB3GS and the integrated conditional density calculated in a large CDM simulation of dark matter. The theoretical $\langle n^*(r)\rangle_p$ of the dark matter shows an approximately power-law behavior, followed by a rapid cross-over to homogeneity. Also shown are the results from the analysis of the CfA2 survey reported in Joyce et al. (1999).
Figure 1: Behavior of $\langle n^*(r)\rangle_p$ for "CDM particles" in a CDM simulation run by the Virgo consortium (Jenkins et al. 1998), together with the determination by HEB3GS and those in several samples of the CfA2 catalog reported in Joyce et al. (1999). Note that (i) the power-law index in the simulated data is quite different from that in the observations (Baertschiger et al. 2002); (ii) the homogeneity scale detected in the LRG sample, about 70 Mpc/h, is very significantly larger than that in the simulated data. See text for discussion.
What is implicitly assumed by HEB3GS is that the formation of LRGs can be described as a sampling procedure on the underlying dark matter, which leads, on the one hand, to a linear amplification of the fluctuations by a factor of 2 at very large scales, while also producing the correlation properties observed in the LRGs from the very different ones of the underlying dark matter field. So-called "halo models" (see e.g. Cooray & Sheth 2002) currently provide the framework to describe such a sampling: the distribution of dark matter is described as a set of halos, to which formation probabilities for the different kinds of objects are ascribed a posteriori to produce the observed correlation properties. Given that the properties of the two distributions are so different, we believe it is fair to say that the statement that these observations accord with the theory is very weak: any set of observations would probably accord with theory given this criterion.
Given that the notion of "bias", which has its origin in the measurements of variations in the amplitude of $\xi(r)$, is so central in these theoretical constructions, it is evidently crucial to understand correctly the role of finite size effects in determinations of amplitudes of correlation functions, using the method we have outlined. Only once this is done can the theoretical problem in the regime of strong fluctuations be properly addressed. The simplicity of the correlation properties of visible matter over a range of two decades in scale, well described by a simple power-law scaling from 0.1 Mpc/h to approximately 20 Mpc/h, cannot but suggest that the relegation by current theories of visible matter to an epiphenomenon of the underlying dark matter may be mistaken.
On larger scales, where fluctuations become small, the comparison between theory and data becomes in principle much cleaner, as the theoretical description of this regime becomes simpler. Measurement of the signal, which is of very low amplitude, becomes, however, much more difficult, and indeed it is only with data from SDSS now emerging that it is becoming feasible to make such comparisons. Indeed a very recent paper published by the SDSS collaboration (Eisenstein et al. 2005) gives an analysis of galaxy correlations in this regime and claims a statistically significant detection of a "bump-like" feature in the real space correlation function at the location predicted by the same CDM model considered in HEB3GS (for a discussion of these real space properties of standard models, see GSLJP, Chap. 6). Evidently such findings, if confirmed by other galaxy data, provide dramatic evidence in favor of the standard model of structure formation. We note (Gabrielli et al. 2002; GSLJP) that a very characteristic prediction in this regime of standard models is the existence of a negative power-law tail with $\xi(r) \sim -r^{-4}$ at the very largest scales which the SDSS survey is now beginning to probe. Such a behavior is the direct signal of the "primordial" form of the fluctuations in standard cosmological models, with a power spectrum which is linear in $k$ at small $k$. Interestingly the simple behavior of the power spectrum is destroyed by the "sampling" of the CDM field (i.e. by a prescription for how to infer galaxy correlations from those of dark matter), while that of the real space correlation function remains intact (Durrer et al. 2003; GSLJP). This behavior can be understood simply in terms of the very specific "super-homogeneous" properties of the primordial fluctuation field, which are also encountered (Gabrielli et al. 2003) in certain systems widely studied in statistical physics.
We warmly thank Daniel Eisenstein and David Hogg for very interesting discussions on some of the issues addressed here. Moreover, it is a real pleasure to acknowledge collaborations and discussions with Yurij Baryshev, Thierry Baertschiger, Helene Di Nella-Courtois, Ruth Durrer, Bill Saslaw, and Pekka Teerikorpi.