A&A 381, 428-439 (2002)
R. Gil-Merino - L. Wisotzki - J. Wambsganss
Universität Potsdam, Am Neuen Palais 10, 14469 Potsdam, Germany
Received 27 July 2001 / Accepted 25 October 2001
We present a new determination of the time delay of the gravitational lens system HE 1104-1805 ("Double Hamburger'') based on a previously unpublished dataset. We argue that the previously published value of years was affected by a bias of the employed method. We determine a new value of years (2 confidence level), using six different techniques based on non interpolation methods in the time domain. The result demonstrates that even in the case of poorly sampled lightcurves, useful information can be obtained with regard to the time delay. The error estimates were calculated through Monte Carlo simulations. With two already existing models for the lens and using its recently determined redshift, we infer a range of values of the Hubble parameter: (2) for a singular isothermal ellipsoid (SIE) and (2) for a constant mass-to-light ratio plus shear model ( ). The possibly much larger errors due to systematic uncertainties in modeling the lens potential are not included in this error estimate.
Key words: gravitational lensing - time delays - quasars: HE 1104-1805 - general: data analysis
The double quasar HE 1104-1805 at a redshift of was originally discovered by Wisotzki et al. (1993). The two images with (original) B magnitudes of 16.70 and 18.64 are separated by (Kochanek et al. 1998). The spectral line ratios and profiles turned out to be almost identical between the two images, but image A has a distinctly harder continuum. Wisotzki et al. (1995) report about a dimming of both components over about 20 months, accompanied by a softening of the continuum slope of both images. The lensing galaxy was discovered by Courbin et al. (1998) in the NIR and by Remy et al. (1998) with HST. The authors tentatively identified the lensing galaxy with a previously detected damped Lyman alpha system at z=1.66 (Wisotzki et al. 1993; Smette et al. 1995; Lopez et al. 1999). This identification, however, was disputed by Wisotzki et al. (1998). Using FORS2 at the VLT, Lidman et al. (2000) finally determined the redshift of the lensing galaxy to .
A first determination of the time delay in this system was published by Wisotzki et al. (1998), based on five years of spectrophotometric monitoring of HE 1104-1805, in which the quasar images varied significantly, while the emission line fluxes appear to have remained constant. The Wisotzki et al. (1998) value for the time delay was years (no formal error bars were reported), but they cautioned that a value as small as 0.3 years could not be excluded.
HE 1104-1805 shows strong and clear indications of gravitational microlensing, in particular based on the continuum variability with the line fluxes almost unaffected (Wisotzki et al. 1993; Courbin et al. 2000).
Here we present an analysis of previously unpublished photometric monitoring data of HE 1104-1805. First the data and light curves are presented (Sect. 2), then a number of numerically techniques are described and discussed and, as the scope of this paper is a comparison of different techniques in the case of poorly sampled data, we finally applied to this data set, in order to determine the time delay (Sect. 3). A discussion of the results and the implications for the value of the Hubble constant based on this new value of the time delay and on previously avalaible lens models are given in Sect. 4. In Sect. 5 we present our conclusions.
Between 1993 and 1998, we obtained a B band lightcurve of HE 1104-1805
at 19 independent epochs, mostly in the course of a monitoring
campaign conducted at the ESO 3.6 m telescope in service
mode. The main intention of the programme was to follow the spectral
variations by means of relative spectrophotometry, but at each
occasion also at least one frame in the B band was taken.
A continuum lightcurve, derived from the spectrophotometry,
and a first estimate of the time delay were presented by Wisotzki et al. (1998,
details of the monitoring will be given in a forthcoming paper
(Wisotzki & López, in preparation). Here we concentrate on the broad band
photometric data which have not been published to date.
Images were taken typically once a month during the visibility period.
The instrument used was EFOSC1 with 512512 pixels Tektronix
CCD until June 1997, and EFOSC2 with a 2K2K Loral/Lesser
chip afterwards. The B band frames (which were also used as
acquisition images for the spectroscopy) were always exposed for
30 s, which ensured that also the main comparison stars were
unsaturated at the best recorded seeing of
more than one exposure was made, enabling us to make independent
estimates of the photometric uncertainties. A journal of the
observations is presented together with the measured lightcurve
in Table 1.
|Figure 1: The new photometric dataset running from 1993 to 1998. The zero point for the relative photometry of HE 1104-1805 is the first data point of component A (see Table 1 for error estimates).|
|Open with DEXTER|
A first estimation for the time delay in this system resulted in a value of
years (W98), using the dispersion spectra method
developed by Pelt et al. (1994, 1996; hereafter P94 and P96, respectly). Note
that we will express the time delay as
), since B leads the variability
(see Fig. 1), and thus there appears a minus sign in the result.
We shall demonstrate below that the dispersion spectra method is not bias-free.
To facility a better understanding of this claim, we first briefly describe the
method in the following: the two time series Ai and Bj can
be expressed, using the P96 notation, as
The new dataset used here has the same sampling as the one used for the first estimation of the time delay in W98. As the errorbars for individual points are also very similar, one should expect to obtain a similar time delay. And in fact this is exactly what happens when applying the dispersion spectra method as described above. The original dataset is plotted in Fig. 1. There are 19 observational points for each component. We apply the dispersion spectra method (P94, P96): the result is years, i.e., the same value as the first published estimation.
Since W98 did not provide a formal error estimate, we now investigate the goodness of this value and try to estimate the uncertainty, and we also want to check the self-consistency of the method in this case. For this purpose we do a test based on an iterative procedure: after having applied the dispersion spectra method to the whole data set, we make a selection of the data trying to avoid big gaps between the epochs and considering points in both lightcurves that fall in the same time interval once one has corrected the time shift with the derived time delay. This will avoid the so-called border effects, and a time delay close to the initial one should result when the dispersion spectra are recalculated for the selected data. We do this in the next subsection.
We first consider
years as a
first rough estimate of the time delay, in agreement with W98.
It is obvious that using this time delay, the first
point of the whole dataset (epoch 1993.19) of component B has no
close partner in component A. Eliminating this point means avoiding
the big gap of almost two years at the beginning of the lightcurves. Once
this is done, the last five points of the lightcurve B and the first two ones of A
(after eliminating the epoch 1993.19) are not useful anymore for a time delay
determination since they do not cover the same intrinsic time interval.
We also eliminate these points. Now we
have a "clean'' dataset with 16 points from component A and 13 points from component B. The situation is illustrated in Fig. 2,
|Figure 2: The first point of the whole dataset has been removed and then the points that do not fall in the same time interval once we have shifted the A lightcurve with the value of the first time delay estimation, years. Thus component A has now 16 points and the component B 13 points. If the procedure were self-consistent and the first time delay estimation right, we would naturaly expect a confirmation of this value in a second measurement of the delay by using the new dataset.|
|Open with DEXTER|
Now we again apply the dispersion spectra method to the "clean'' data set, i.e. a second
iteration is made. The
result is surprising:
years. The technique should
converge to a value near to that of the first result, if the previous estimation was correct and
the technique is self-consistent. For consistency, we repeat this
analysis assuming a time delay of -0.38 years, i.e., a third iteration.
The result is again unexpected: we recover the previous
value of -0.73 years. These results can be seen in Fig. 3,
|Figure 3: Dispersion spectra: the upper panel shows the result when all the points are taken into account. In the middle panel, the result after correcting borders with the first estimation of the time delay, i.e. years. In the bottom panel we use a correction of -0.38 years obtained in the middle panel. We recover the previous value for the time delay of years, showing the inconsistency of the method. In each subfigure, two curves are plotted for two different values of the decorrelation length: solid for years and broken for years.|
|Open with DEXTER|
This clearly means that the method is not self-consistent when applying it to the current data set. The dispersion spectra method is very sensitive to individual points, and in poorly sampled sets such as this one, these points are critical. It is obvious that we need better techniques for the determination of the time delay. But these techniques must not be interpolating ones because the lightcurves have lots of variability and wide gaps, and any simple interpolation scheme might introduce spurious signals.
Many authors have applied different versions of the DCF since it was introduced by Edelson & Krolik (1988, hereafter EK88). Here we have selected five of them. These techniques take into account the global behavior of the curves, rather than "critical points''. But in order to well apply all these methods one has to eliminate border effects and gaps as described previously. If one does not do this, one will lose signal in the peak of the DCF and secondary peaks could appear, which can bias the final result. We will demonstrate this last point later (Fig. 7, described in Sect. 3.3.4, is used for this purpose).
First we apply the usual form of the DCF to the data set. We briefly recall
the expression, following EK88:
A modification of this method was suggested by Lehár et al. (1992).
A parabolic fit to the peak of the function was proposed to solve the
problem of not resolving the peak. Doing this fit, we obtain a time
years. These results are shown in Fig. 4.
|Figure 4: The standard DCF and an added fit are shown in this figure. The peak is located at -0.89 years (-0.91 without fit) using a bin semisize 0.07 years. The continuous lines are the noise levels and the zero level is also plotted. Only two points defining the peak are outside the noise band.|
|Open with DEXTER|
The locally normalized discrete correlation function was also proposed by Lehár
et al. (1992). Its main difference to the simple DCF is that it computes the means and
variances locally (i.e. in each bin):
|Figure 5: The LNDCF is evaluated with a 0.07 years bin semisize and the peak is fitted with a parabolic law. The result is a time delay years (-0.91 years without the fit). A secondary peak appears at -0.35 years, although with larger error bars. This peak was the feature that "confused" the dispersion spectra.|
|Open with DEXTER|
In any case, the poorly defined peak means the technique is again quite sensitive to our poor sampling. We look for a method less sensitive to this problem. The two following techniques are two different ways of trying to solve the problem of not having many points around the prominent peak.
The continuously evaluated bins discrete correlation function was introduced by Goicoechea et al. (1998a). The difference to the standard way of computing the DCF in this method is that the bins are non disjoint (i.e. each bin ovelaps with other adjacent bins, see Sect. 3.3.2 where the bins do not overlap each other). One has to fix the distance between the centers of the bins in addition to their width. In this way it is possible to evaluate the DCF in more points, having a more continuously distributed curve. We will have also more significant points around the peak, i.e. above the noise level, and there is no need for fitting.
Selecting the distance between the centers of the bins is again a matter of
compromise: increasing the distance means needing wider bins and, thus, losing
resolution. The adopted time resolution should depend on the sampling; it seems
reasonable to select a value slightly less then the inverse of the highest frequency
of sampling (
years). We choose 0.05 years as the best value for the
distance between bin centers and two values for bin semisizes:
years. The overlapping between bins allows us to consider slightly wider bin sizes.
We plot the results in Fig. 6,
|Figure 6: The CEDCF is a DCF more continuously evaluated. Top panel: using a bin semize of years we obtain a peak at -0.85 years with a good signal-to-noise ratio equal to 3.9. Bottom panel: with a bin semisize equal to years, the peak is at -0.80 years. Although it seems that the function is better defined, i.e. with more points, the signal-to-noise ratio at the peak is 3.8. The continuous lines are the noise levels in both panels (cf. also Fig. 7).|
|Open with DEXTER|
Now we need a good reason for preferring one over the other bin size. This reason could be the signal-to-noise ratio of the peak: in the first case years, S/N = 3.9, and in the second yrs, S/N=3.8. Clearly, the difference of these two values is not high enough to conclude that one of them is the best.
In spite of the insignificant difference in this case, we notice that the
signal-to-noise ratio is an important aspect and it
is here where we justify the need for using "clean'' data sets, i.e.
border effects and gaps corrected. In Fig. 7 we plot the CEDCF
for the original dataset (without any correction):
|Figure 7: Not eliminating borders can be crucial in DCF-based methods. Here the CEDCF has been computed with the original data set, i.e. using all points. There is a peak at -0.90 years, with a signal-to-noise value of 1.95. Other points around a secondary peak located at time zero describe another feature. The great amount of information lost in the main peak is obvious.|
|Open with DEXTER|
To our knowledge, this technique has not been applied before, but it seems a natural step as a combination of the two former techniques (i.e., the LNDCF and the CEDCF). From the one side, we use Eq. (7) for computing the DCF, i.e., it is a locally normalized discrete correlation function. From the other side, we use the idea of overlapping bins described in Sect. 3.3.4. Thus, the final result is a "continuously evaluated bins and locally normalized discrete correlation function'' (CELNDCF). Again, we fix the distances between the bins and also their width. The result will be a function similar in shape to the LNDCF in Fig. 5 but with more points evaluated.
The method was applied for three different values of the bin semisize: 0.07,
0.14 and 0.21 years. The first value is not a good choice, it gives
relatively large errorbars for the points of the CELNDCF, since the number of
points per bin is low.
Selecting the last two values, i.e.
we obtain Fig. 8.
|Figure 8: Top panel: the CELNDCF is evaluated with years bin semisize and a distance between bin centers of 0.05 years. The result is a time delay years. Bottom panel: the CELNDCF computed with years bin semisize. The distances between the bin centers is also 0.05 years. The peak is obtained at -0.75 years where it is assumed to be the time delay.|
|Open with DEXTER|
The following method, called ,
was introduced by Goicoechea et al. (1998b)
and Serra-Ricart et al. (1999). Its expression is
We have selected component B for computing the DAC, because component A has more
variability (presumably due to microlensing). We compute
different values of the bin
semisize. Adopting a bin semisize
the function shows some
features and reaches its minimum at -0.85 years (see Fig. 9,
|Figure 9: The function for three different values of the bin semisize : solid line 0.14 years, short dashed 0.21 years and long dashed 0.28 years. Since , the features in for years is unlikely to be an artifact (see text for more details).|
|Open with DEXTER|
|Figure 10: Upper panel: both DCC (filled circles) and DAC (open circles) are plotted. The bin semisize is years and the DAC has been shifted by -0.85 years, which is the value for the time delay obtained with the technique. Middle panel: the bin semisize is now years. DAC (open circles) has now been shifted by -0.80 years, which is the value obtained with the technique. The bin semisize is now years. Bottom panel: for , again, so the DCC (filled circles) is shifted by that value. In the three subfigures the solid lines indicate the noise levels. The best agreement between DCC and DAC is for years (upper pannel).|
|Open with DEXTER|
In order to better study the features in the
function, we plot it
normalized to its minimum in Fig. 11.
|Figure 11: The minimum of the function gives the time delay between the components. We have normalized it with its minimum. A secondary peak is present around -0.55, a value also considered by W98. The trend of the main feature is asymmetric, favoring values in the range [-0.9, -0.7], including several best estimates of the time delay from other techniques or binning.|
|Open with DEXTER|
To obtain an estimate for the formal error of this method, we used 1000
Monte Carlo simulations. For each simulation we did the following: for each
epoch ti we associated a value in magnitudes
xi is the observed value and
is a Gaussian random variable with
zero mean and variance equal to the estimated measurement error. The histogram
is presented in Fig. 12.
|Figure 12: Histogram of time delays obtained in 1000 Monte Carlo simulations, using the technique.|
|Open with DEXTER|
|Figure 13: The original dataset with the component A shifted by the new adopted time delay, years.|
|Open with DEXTER|
From the tour through the different techniques we can learn several useful things. First of all, when only one technique is selected for deriving a time delay between two signals, it is important to check the internal consistency of the method and its behaviour with a given data set. We have demonstrated in Sect. 3.2 that dispersion spectra does not pass this test at least in this case (see Fig. 3). We have then applied and discussed the discrete correlation function and several of its modifications. The standard DCF (Fig. 4) had problems to properly define the peak in the case of very poorly sampled lightcurves; although a fit was proposed to solve this problem, there were only two points above the noise level in the best case and the fit was not very plausible. The LNDCF (Fig. 5), based on locally normalized bins, had a similar behaviour and although the error bars of each point are smaller, the peak is not well defined either. The CEDCF (Fig. 6) and the CELNDCF (Fig. 8) worked better under these circunstances, but we found the problem of selecting the bin size; in the case of the CEDCF the difference between the two selected bin sizes was smaller than in the case of the CELNDCF. Finally applying the technique, we found a good reason for selecting one bin size: the match between the DAC and the DCC. The resulting estimate and its uncertainty include, as a "byproduct'', the results of the rest of the techniques for the same bin size (except the dispersion spectra method which was not self-consistent). This fact is not the same as computing all the techniques and doing some statistics to obtain an uncertainty. This frequently appears in the time delay determination literature, although it is not at all clear which was the weight of each technique when computing the final result. We note that for consistency we should apply a correction to the original data set with the final adopted time delay of -0.85 years. Due to the (very) sparse sampling of our data set, this correction gives a reduced data set identical to the previous "clean'' data set obtained with a correction of -0.73 years, so we do not need to repeat the whole process. The procedure is self-consistent.
It is important to notice that we have not meant to establish any general hierarchy between all these techniques. The hierarchy is valid in our particular case study. Nevertheless, the idea, not new, of correcting border effects in the signals with first estimations has been proved to be a good procedure in DCF based techniques.
In some of the techniques we have discussed and applied here for the data of HE 1104-1805, there appear secondary peaks/dips located at different values for the time lags (see Figs. 5, 8 and 11). Here we investigate two obvious effects that might cause such behaviour, namely microlensing and sampling. We do this only as a case study for the technique, but assume that our conclusions can be generalized to the other methods as well.
Microlensing affects the two quasar lightcurves differently. That means that the two lightcurves will not be identical copies of each other (modulo offsets in magnitude and time), but there can be minor or major deviations between them. On the other hand, experience from other multiple quasar systems tells us that microlensing cannot dominate the variability, because otherwise there would be no way to determine a time delay at all. In any case, microlensing is a possible source of "noise'' with respect to the determination of the time delay.
A complete analysis of microlensing on this system is beyond the scope of this article, and will be addressed in a forthcoming paper. Here we present a simple, but illustrative, approach to the way microlensing can effect the determination of the time delay, and in particular its effect on the technique. An "extreme'' view of microlensing was investigated by Falco et al. (1991), who showed for the Q0957+561 system that it is very unlikely that microlensing can mimic "parallel'' intrinsic fluctuations causing completely wrong values for the time delay correlations. But strong microlensing clearly affects the features of the cross-correlation function (Goicoechea et al. 1998a). Depending on the exact amplitude and shape of the microlensing event, the main and secondary peaks of this function can be distorted, possibly inducing wrong interpretations.
In order to study this effect here, we do the following:
we consider the lightcurve of component B
(assumed to reflect only intrinsic quasar variability)
and a copy of it, shifted by 0.85 years, which we shall call B'.
Obviously, any technique will give a time delay value
years between B and B'.
In the case of the
technique, a very sharp minimum is located
at this lag.
Now we introduce artificial "microlensing'' as a kind of
Gaussian random process with zero mean and a certain standard deviation
to the lightcurve B'.
We consider three cases:
mag, 0.075 mag and 0.100 mag.
Although microlensing is in general obviously not a random process (it depends
a bit on the sampling),
we use this simple approximation in order to study whether and
how secondary peaks can appear in time delay determinations.
The resulting -functions can be seen in
|Figure 14: We calculate the time delay between the lightcurves B and B' with the technique. B' is a copy of B, shifted 0.85 years and with a Gaussian random process added. Thin solid line: the Gaussian random process has a standard deviation of 0.05 mag. There are no secondary peaks. Dashed line: if the standard deviation of the Gaussian random process is 0.075 mag, some secondary features appear. Thick solid line: the normalized function is much more distorted, but the technique can calculate the shifted value of 0.85 years.|
|Open with DEXTER|
To make sure that this is not a chance observational effect of this particular selected lag, we repeat this exercise for an assumed shift of -0.5 years between the observed lightcurve and its shifted copy, plus added "artificial microlensing'' with mag. Again, the correct value is clearly recovered in all realisations. This is particularly convincing because a lag of 0.5 years is the "worst case scenario'' with minimal overlap between the two lightcurves. To summarize, moderate microlensing can be a cause of distortions of the time delay determination function, but it is unlikely that microlensing dominates it completely.
In order to study the effect of sampling on the shape of the
function, we proceed as follows:
again, we consider the lightcurve of component B and an identical
copy of it shifted by 0.85 years, lightcurve B'.
Now we remove some points from lightcurve B'.
functions are shown
in Fig. 15 for three cases.
|Figure 15: We analyse the sampling effect in the technique. We use lightcurves B and B', being B' a copy of B shifted 0.85 years and removing a number of points. Thin solid line: we remove 2 random points in the component B'. Dashed line: when removing 4 random points, it appears secondary structure in the function. Thick solid line: if 3 selected points are remove, the normalized function is very similar to the one computed with lightcurves A and B (see Fig. 11).|
|Open with DEXTER|
As above, we also want to check whether the particular value of the time lag plays an important role, and we again repeat the simulation exercise with an assumed lag of -0.5 years, and 4 randomly selected points removed. The result is again years, recovering the assumed lag in all cases.
Summarizing, we can state that both microlensing and sampling differences affect the shape of the time delay determination function. However, moderate microlensing will have only small effects on these curves, whereas moderate (and unavoidable!) differences in the sampling for the two lightcurves can easily introduce effects like secondary minima. The primary minimum of the method in all cases considered was still clearly representing the actual value of the time delay. Applied to HE 1104-1805, this means that most likely microlensing does not affect much the time delay determination, the features in the time delay determination function can be easily explained by the sampling differences, and the primary minimum appears to be a good representation of the real time delay.
If one wants to use the time delay to estimate the Hubble parameter H0, one needs to know the geometry and mass distribution of the system. Accurate astrometry is available from HST images presented by Lehár et al. (2000). There are also several models for the lens in the literature. In W98, two models are described: a singular isothermal sphere with external shear and a singular isothermal ellipsoid without external shear. The first model is similar to Remy et al. (1998) and Lehár et al. (2000). Courbin et al. 2000 also present two models: a singular ellipsoid without external shear and a singular isothermal ellipsoid plus an extended component representing a galaxy cluster centered on the lens galaxy.
The redshift of the lens in this system has been establish by Lidman et al. (2000) to be . Note that HE 1104-1805 is somehow atypical, in the sense that the brightest component is closer to the lens galaxy. We use the most recents models by the CASTLES group (Lehár et al. 2000), described by a singular isothermal ellipsoid (SIE) and a constant mass-to-light ratio plus shear model ( ). The derived value for the Hubble constant using the first model (SIE) is with confidence level. A ( ) model gives (), both for . The formal uncertainty in these values are very low, due to the low formal uncertainties both in the time delay estimation and in the models. Nevertheless, the mass distribution is not well constrained, since a sequence of models can fit the images positions (Zhao & Pronk 2000). We note that other models in Lehár et al. (2000) will give very different results for H0, but we did not use them because no error estimate was reported for them. Moreover, the angular separation is big enough to expect an additional contribution to the potential from a group or cluster of galaxies (Muñoz 2001, priv. comm.).
We have shown that the existing data allow us to constrain the time delay of HE 1104-1805 with high confidence between 0.8 and 0.9 years, slightly higher than the one available previous estimate. We have demonstrated that the six different techniques employed in this study were not equally suited for the available dataset. In fact, this case study has demonstrated that a very careful analysis of each technique is needed when applying it to a certain set of observations. Such an analysis becomes even more important in the case of poorly sample lightcurves. In this sense, the technique showed the best behaviour against the poor sampling: unless the lack of information due to sampling is so severe that it prevents the determination of a well defined DAC and DCC, the minimum of the function will be a robust estimator for the time delay.
Our improved time delay estimate yields a value of the Hubble parameter which now depends mostly on the uncertainties of the mass model. The degeneracies inherent to a simple 2-image lens system such as HE 1104-1805 currently preclude to derive very tight limits on H0. We note, however, that there are prospects to improve the constraints on the model e.g. by using the lensed arclet features visible from the QSO host galaxy. Even now, there seems to be a remarkable trend in favour of a relatively low value of H0, consistent with other recent lensing-based estimates (Schechter 2000).
We are especially grateful to Dr. L. J. Goicoechea (Universidad de Cantabria, Spain) for indicating us the best way of applying the technique and for many comments to a first version of this paper. We also thank Dr. L. Tenorio (MINES, USA) for his useful comments on Monte Carlo and bootstrap simulations and Dr. J. A. Muñoz (IAC, Spain) for enlightening our understanding of lens modeling.