A&A, Volume 572, A97 (December 2014), 4 pages
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/201424180
Published online: 3 December 2014

© ESO, 2014

1. Introduction

In astrophysics it is often crucial to determine the duty cycle (DC) of a source, or of a class of sources, both to understand their central engines and to plan additional observing campaigns aimed at studying them in the best possible way. Generally, the DC is defined as the fraction of time, usually expressed as a percentage, during which the source is active,
\begin{equation}
\label{sfxtsims:eq:DC}
{\rm DC} = T_{\rm active}/T_{\rm Tot},
\end{equation}
(1)
where $T_{\rm active}$ is the time spent above some instrumental threshold or some scientifically interesting flux value, and $T_{\rm Tot}$ is the total exposure. In the case of periodic sources, such as classical X-ray binaries, $T_{\rm active}$ is generally the time during which an n-σ detection is achieved (n being 3 or 5, depending on the detection method), and $T_{\rm Tot}$ is the orbital period $P_{\rm orb}$ or the spin period $P_{\rm spin}$ (e.g. Henry & Paik 1969; Fragos et al. 2009; Knevitt et al. 2014).

For active galactic nuclei (AGNs), the DC is often defined as the fraction of time a source spends in a flaring state, that is, at $n$ times the average flux $\overline{F}$, with $n$ being a small number, depending on the purpose of the study (e.g. Jorstad et al. 2001; Vercellone et al. 2004; Ackermann et al. 2011). For example, in Vercellone et al. (2004) the DC is defined as $\chi = \tau/(\tau + T)$, where $T$ is the time spent in a low flux level (off state) and $\tau$ is the time spent in a high flux level (on state), defined by $HSN = \sum_{i=1}^{n} C_{i}$, where $C_{i} = 1$ if $F_{i} \ge 1.5 \times \overline{F}$ and $C_{i} = 0$ otherwise.
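As an illustration only (not the authors' code), a minimal R sketch of this counting scheme, assuming a hypothetical vector `flux` of flux measurements from independent pointings of equal exposure, so that the on and off times are proportional to the number of pointings in each state:

```r
## Sketch of a Vercellone et al. (2004)-style flaring duty cycle.
## `flux` is a hypothetical set of fluxes from equal-exposure, independent pointings.
flux <- c(1.2, 0.8, 2.5, 0.9, 3.1, 1.0, 0.7, 1.9)  # example values

F_mean <- mean(flux)              # average flux
on     <- flux >= 1.5 * F_mean    # C_i = 1 if F_i >= 1.5 x mean flux, 0 otherwise
chi    <- sum(on) / length(flux)  # chi = tau / (tau + T), in units of pointings
chi
```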

Alternatively, when a source shows a very large dynamical range (a few orders of magnitude), more can be inferred about its nature by considering the inactivity duty cycle (IDC, Romano et al. 2009), defined as the fraction of time a source remains undetected down to a certain flux limit $F_{\rm lim}$,
\begin{equation}
\label{sfxtsims:eq:IDC}
{\rm IDC} = \Delta T_{\Sigma}/[\Delta T_{\rm tot} \, (1-P_{\rm short})] \, ,
\end{equation}
(2)
where $\Delta T_{\Sigma}$ is the sum of the exposures accumulated in all observations where only a 3σ upper limit was achieved, $\Delta T_{\rm tot}$ is the total accumulated exposure, and $P_{\rm short}$ is the percentage of time lost to short observations that need to be discarded in order to distinguish non-detections due to lack of exposure from non-detections due to a true low flux state.
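For concreteness, a hedged R sketch of Eq. (2); the variable names (`expo`, `upper_limit`, `too_short`) are illustrative and not taken from the paper's own code, and the treatment of the discarded short observations follows our reading of the definition:

```r
## Sketch of Eq. (2) with hypothetical per-observation data.
expo        <- c(1000, 950, 1200, 300, 1100, 980)        # exposure per observation (s)
upper_limit <- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)   # only a 3-sigma upper limit?
too_short   <- expo < 500                                # observations discarded as too short

dT_tot   <- sum(expo)                             # Delta T_tot: total accumulated exposure
P_short  <- sum(expo[too_short]) / dT_tot         # fraction of time lost to short observations
dT_sigma <- sum(expo[upper_limit & !too_short])   # Delta T_Sigma: exposure with non-detections only

IDC <- dT_sigma / (dT_tot * (1 - P_short))        # Eq. (2)
IDC
```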

Since DCs (and IDCs) are integral quantities that depend on the total observing time and on the total time spent above (or below) a given flux threshold, they implicitly depend on the instrumental sensitivity, the observing coverage, and the characteristic source variability timescales. The implicit assumption is that, in order to obtain a meaningful DC, the observations used to calculate it are independent, that is, each observation is not triggered by the previous ones. This is the case, for example, for monitoring programmes whose pace and exposures are defined a priori and do not depend on the source state.

In this paper we determine the theoretical expectation value of the DC and its error. We then consider one specific case, the IDCs measured from ten Swift (Gehrels et al. 2004) X-ray Telescope (XRT, Burrows et al. 2005) observing campaigns on supergiant fast X-ray transients (SFXTs), a subclass of high-mass X-ray binaries known for their rapid hard X-ray flaring behaviour and large dynamical range (up to 5 orders of magnitude), and compare the theoretical expectations with both the observed values and those obtained from Monte Carlo simulations. We also evaluate how the IDC varies as a function of the number of observations available and estimate how many observations are required to obtain an IDC within a desired accuracy. Finally, we supply the reader with useful R–language (R Core Team 2014), IDL, and C–language procedures to calculate several confidence intervals (c.i.) on the DC estimate for a given source.

Table 1

Source sample properties and comparison of measured IDCs with Bayesian estimates and Monte Carlo simulations.

Fig. 1

Distribution of IDC values derived from 10^4 bootstrap simulations (red), each drawn from a sample of size N. The solid vertical line marks the simulated sample mean from Eq. (12). The dashed (green) lines are the curves described by Eq. (7) in the case of an uninformative prior (a = b = 1).

2. Statistical estimate of the duty cycle

We consider one source for which N independent observations were collected and for which the DC was calculated as described in Sect. 1. In the following we estimate the DC, which we hereafter denote μ; the formalism is unchanged for the case of the IDC, which we consider in Sect. 3. In all generality, the state of the source can be seen as a discrete random variable that can take only one of two possible, mutually exclusive values, inactive (off) and active (on), so that μ is the probability of finding the source active in a given random pointing. After N observations, the probability of finding the source active m times is given by the binomial distribution
\begin{equation}
\label{sfxtsims:eq:binomial}
{\rm Bin}\,(m \,|\, N, \mu) = \binom{N}{m} \, \mu^{m} \, (1-\mu)^{N-m} \, ,
\end{equation}
(3)
with expectation value $E\{m\} = \mu N$ and variance ${\rm var}\{m\} = N\mu\,(1-\mu)$. Once N and m are known, where $m = N\mu_{\rm est}$ and $\mu_{\rm est}$ is the DC measured from the N observations, the problem becomes estimating the statistical offset of μ from $\mu_{\rm est}$. From the central limit theorem, $\mu_{\rm est}$ is normally distributed in the limit of large values of m and N−m, with $E\{\mu_{\rm est}\} = \mu$ and $\sigma\{\mu_{\rm est}\} = \sqrt{\mu\,(1-\mu)/N}$.
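A quick numerical check of this normal approximation (a sketch of ours, not part of the paper) can be made in base R by drawing binomial counts with an assumed true μ and comparing the scatter of $\mu_{\rm est} = m/N$ with $\sqrt{\mu(1-\mu)/N}$:

```r
## Check that mu_est = m/N scatters as sqrt(mu*(1-mu)/N) for large N.
set.seed(1)
mu <- 0.7   # assumed true duty cycle
N  <- 100   # number of independent observations

m      <- rbinom(1e4, size = N, prob = mu)  # 10^4 realizations of the "active" count m
mu_est <- m / N

sd(mu_est)                # empirical scatter of the estimator
sqrt(mu * (1 - mu) / N)   # sigma{mu_est} from the normal approximation
```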

Hereafter, we adopt a Bayesian treatment, in which μ is treated as a random variable whose probability density function (PDF) depends on the observed values of N and m. From Bayes' theorem, the posterior distribution is proportional to the product of the likelihood and the prior, and it is to all intents and purposes a PDF of the random variable μ given the observed values of N and m,
\begin{equation}
\label{sfxtsims:eq:posterior}
P\,(\mu \,|\, N, m) \propto P\,(m \,|\, N, \mu) \cdot p\,(\mu) ,
\end{equation}
(4)
where $P\,(m \,|\, N, \mu)$ is the likelihood, given by Eq. (3) and regarded as a function of μ, and $p\,(\mu)$ is the prior. Apart from a normalization term, the likelihood is the Beta distribution of μ given N and m,
\begin{equation}
\label{sfxtsims:eq:beta}
{\rm Beta}\,(\mu \,|\, N, m) = \frac{\Gamma\,(N+2)}{\Gamma\,(m+1)\,\Gamma\,(N-m+1)} \, \mu^{m} \, (1-\mu)^{N-m} .
\end{equation}
(5)
The convenient choice (Bishop 2006, Sect. 2.2.1) for the prior is a conjugate distribution, namely the Beta distribution with parameters a and b,
\begin{equation}
\label{sfxtsims:eq:prior}
p\,(\mu \,|\, a, b) = {\rm Beta}\,(\mu \,|\, a, b) = \frac{\Gamma\,(a+b)}{\Gamma\,(a)\,\Gamma\,(b)} \, \mu^{a-1} \, (1-\mu)^{b-1} .
\end{equation}
(6)
After proper normalization, the posterior in Eq. (4) becomes
\begin{equation}
\label{sfxtsims:eq:posterior2}
P\,(\mu \,|\, m, N, a, b) = \frac{\Gamma\,(N+a+b)}{\Gamma\,(m+a)\,\Gamma\,(N-m+b)} \, \mu^{m+a-1} \, (1-\mu)^{N-m+b-1} .
\end{equation}
(7)
From Eq. (7) the expectation value and variance of μ are

\begin{eqnarray}
\label{sfxtsims:eq:expvar1}
E\{\mu\} &=& \frac{m+a}{N+a+b} = \frac{\mu_{\rm est} + a/N}{1+(a+b)/N} , \\
{\rm var}\{\mu\} &=& \frac{(m+a)\,(N-m+b)}{(N+a+b)^2\,(N+a+b+1)} = \frac{(\mu_{\rm est} + a/N)\,(1-\mu_{\rm est} + b/N)}{(1+(a+b)/N)^2\,(N+a+b+1)} .
\end{eqnarray}
(8), (9)
The case of an uninformative prior is easily recovered for a = b = 1. We note that in the asymptotic limit of large values of N,
\begin{eqnarray}
\label{sfxtsims:eq:expvar2}
E\{\mu\} &\simeq& \mu_{\rm est} , \\
{\rm var}\{\mu\} &\simeq& \frac{\mu_{\rm est}\,(1-\mu_{\rm est})}{N} ,
\end{eqnarray}
(10), (11)
in agreement with the asymptotic limit of a normal distribution.
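The posterior moments in Eqs. (8) and (9) are straightforward to evaluate numerically; a minimal R sketch (illustrative only, with assumed values for N, m and the prior parameters):

```r
## Posterior mean and variance of mu (Eqs. (8)-(9)) for illustrative inputs.
N <- 100; m <- 70   # assumed number of observations and of "active" detections
a <- 1;   b <- 1    # uninformative prior

E_mu   <- (m + a) / (N + a + b)
var_mu <- (m + a) * (N - m + b) / ((N + a + b)^2 * (N + a + b + 1))

c(mean = E_mu, sd = sqrt(var_mu))
```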

For a class of sources consisting of a small number of individuals (Nsrc ≲ 50) the prior p (μ) is unknown, so only an uninformative prior can be used in Eq. (4). Such is the case of SFXTs (Nsrc = 10), which will be detailed in Sect. 3, and for which Eq. (7) can only be used with a = b = 1.

To this end, we provide R–language, IDL, and C–language programs that, given N and the DC calculated according to Eq. (2), return the 68.3%, 95.4%, and 99.7% c.i. for the theoretical distribution (Eq. (7)).
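We do not reproduce the distributed procedures here; the following R sketch only illustrates the kind of calculation they perform, namely equal-tailed confidence intervals of the posterior Beta(m + a, N − m + b) of Eq. (7), via the base-R quantile function qbeta. The function name and inputs are hypothetical, and the actual R/IDL/C programs accompanying the paper may differ in detail:

```r
## Equal-tailed c.i. of the posterior Beta(m + a, N - m + b) of Eq. (7).
dc_ci <- function(N, DC, a = 1, b = 1,
                  levels = c(0.6827, 0.9545, 0.9973)) {
  m <- round(N * DC)   # number of observations in the measured state
  t(sapply(levels, function(cl) {
    c(level = cl,
      lower = qbeta((1 - cl) / 2, m + a, N - m + b),
      upper = qbeta((1 + cl) / 2, m + a, N - m + b))
  }))
}

dc_ci(N = 100, DC = 0.7)   # example call with hypothetical inputs
```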

In contrast, when Nsrc > 50 the prior can be obtained from the observed distribution of the DCs of all sources by fitting it with the Beta function in Eq. (6), with free parameters a and b. In this case, Eq. (6) turns out to be particularly useful for newly discovered sources, even when relatively few observations are available. Through the prior, one can further constrain the DC of a new source by exploiting the information (the fitted values of a and b) acquired on the DC distribution derived from previously observed sources of the same class.
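One possible way to obtain a and b from the class distribution is a simple method-of-moments fit of the Beta function in Eq. (6); the paper does not prescribe a specific fitting method, so the sketch below, with a hypothetical vector `dc_class` of measured DCs, is only one option among others (e.g. a maximum-likelihood fit):

```r
## Method-of-moments estimate of the Beta prior parameters (a, b) in Eq. (6),
## assuming `dc_class` holds the DCs measured for a well-populated class of sources.
set.seed(3)
dc_class <- rbeta(60, 4, 2)   # placeholder DC values for ~60 sources

mu_bar <- mean(dc_class)
v      <- var(dc_class)

k     <- mu_bar * (1 - mu_bar) / v - 1   # from E{mu} = a/(a+b), var{mu} = ab/((a+b)^2 (a+b+1))
a_hat <- mu_bar * k
b_hat <- (1 - mu_bar) * k

c(a = a_hat, b = b_hat)   # these values can then be used in Eq. (7) for a new source
```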

3. Evaluating duty cycles with Monte Carlo bootstrap simulations

Once the best available measurement, DC(N), has been obtained from a set of N independent observations, one needs to assess its associated error. The DC determinations obtained by accumulating increasing observing time are not independent; therefore, the dataset cannot be used to directly determine the error on the DC. Furthermore, the datasets can be so poor that the hypothesis of normal errors does not apply. The standard approach, which also validates a posteriori our derivation in Sect. 2, is to perform Monte Carlo simulations.

As a test case, we consider the Swift/XRT monitoring campaigns on the ten SFXTs reported in Table 1, discussed in full by Romano et al. (2014), who calculate the IDCs according to Eq. (2). Table 1 (Cols. 1–5) reports the main properties of the sample. The data were divided into i) yearly campaigns (Y), a casual sampling of the X-ray light curve of an SFXT at a resolution of $P_{\rm samp}$ ~ 3–4 d over a ~1–2 yr baseline (for these, $P_{\rm samp}$ is unrelated to $P_{\rm orb}$); and ii) orbital campaigns (O), which sample the light curve intensively, with $P_{\rm samp} \ll P_{\rm orb}$, so that the phase space is uniformly observed within one (or a few) $P_{\rm orb}$. Further details can be found in Romano et al. (2014).

In order to determine the expectation value of the IDC and its error, we performed Monte Carlo bootstrap simulations (Efron 1979, 1994). We created M = 10^4 simulated datasets, drawn from the observed sample of size N with simple sampling (with replacement, i.e. uniform probability). We calculated M values of the IDC (the simulated sample) according to Eq. (2). The simulated sample mean and variance (Table 1, Col. 9) are

\begin{eqnarray}
\label{sfxtsims:eq:IDCsim}
\overline{{\rm IDC}_{\rm sim}} &=& \frac{1}{M}\sum_{k=1}^{M}{\rm IDC}_{\rm sim}(k) , \\
\label{sfxtsims:eq:IDCs}
s^2_{\rm sim} &=& \frac{1}{M-1}\sum_{k=1}^{M}\left({\rm IDC}_{\rm sim}(k)-\overline{{\rm IDC}_{\rm sim}}\right)^2 .
\end{eqnarray}
(12), (13)
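A compact, self-contained R sketch of this bootstrap, with hypothetical data in place of the Swift/XRT observations (short observations are assumed to have been discarded already, so $P_{\rm short} = 0$ here):

```r
## Bootstrap estimate of the IDC and its error (Eqs. (12)-(13)) on hypothetical data.
set.seed(2)
N <- 40
expo        <- runif(N, 800, 1500)   # hypothetical exposures (s)
upper_limit <- runif(N) < 0.7        # hypothetical non-detection flags (true IDC ~ 0.7)

idc <- function(e, ul) sum(e[ul]) / sum(e)   # Eq. (2) with P_short = 0

M <- 1e4
idc_sim <- replicate(M, {
  k <- sample.int(N, N, replace = TRUE)      # resample the N observations with replacement
  idc(expo[k], upper_limit[k])
})

idc_mean <- mean(idc_sim)   # Eq. (12)
s_sim    <- sd(idc_sim)     # square root of Eq. (13)
c(IDC = idc_mean, s = s_sim)
```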

In Fig. 1 we show, superposed on the simulated sample distributions (solid red curves), the simulated sample mean $\overline{{\rm IDC}_{\rm sim}}$ (vertical line), and the theoretical expectations (dashed green curves) described by Eq. (7). We find that $s_{\rm sim}$ = 2.9–6% for the yearly campaigns and $s_{\rm sim}$ = 8.0–10.4% for the orbital ones.

The standard c.i. are defined through the cumulative probability function of the simulated distribution,
\begin{equation}
\label{sfxtsims:eq:CPF}
F(x) = \int_{-\infty}^{x} {\rm IDC}_{\rm sim}(x^\prime) \, {\rm d}x^\prime ,
\end{equation}
(14)
and can be calculated from
\begin{eqnarray}
\label{sfxtsims:eq:IDCsim_confidence}
F(x_{1,1}) = \frac{1-c_1}{2}; \quad F(x_{2,1}) = \frac{1+c_1}{2}; \quad c_1 = 0.6827, \\
\label{sfxtsims:eq:IDCsim_confidence2}
F(x_{1,2}) = \frac{1-c_2}{2}; \quad F(x_{2,2}) = \frac{1+c_2}{2}; \quad c_2 = 0.9545, \\
\label{sfxtsims:eq:IDCsim_confidence3}
F(x_{1,3}) = \frac{1-c_3}{2}; \quad F(x_{2,3}) = \frac{1+c_3}{2}; \quad c_3 = 0.9973.
\end{eqnarray}
(15)–(17)
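In practice, Eqs. (15)–(17) amount to taking the corresponding quantiles of the simulated IDC values; a sketch reusing the hypothetical `idc_sim` vector from the bootstrap example above:

```r
## Equal-tailed c.i. of the simulated distribution (Eqs. (15)-(17)).
cls <- c(0.6827, 0.9545, 0.9973)
ci  <- sapply(cls, function(cl) quantile(idc_sim, c((1 - cl) / 2, (1 + cl) / 2)))
colnames(ci) <- paste0(100 * cls, "%")
ci
```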

3.1. IDC as a function of sample size

We can now determine the expected IDC value for a given observed sample size via additional Monte Carlo bootstrap simulations. For each of the sources monitored with yearly campaigns, we created M = 10^4 datasets drawn from the first S = 10, 20, 30, ..., N observed points, with simple sampling (with replacement, i.e. uniform probability). The simulated sample mean $\overline{{\rm IDC}_{S}}$ and the standard deviation $s_{S}$ were calculated similarly to Eqs. (12) and (13).
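A sketch of this subsample bootstrap in R, again with the hypothetical `expo`, `upper_limit`, `idc()` and N defined in the earlier bootstrap example:

```r
## Bootstrap mean and standard deviation of the IDC as a function of sample size S.
S_grid <- seq(10, N, by = 10)   # S = 10, 20, 30, ..., N
M <- 1e4

idc_vs_S <- t(sapply(S_grid, function(S) {
  sims <- replicate(M, {
    k <- sample.int(S, S, replace = TRUE)   # resample among the first S observed points
    idc(expo[k], upper_limit[k])
  })
  c(S = S, IDC_S = mean(sims), s_S = sd(sims))
}))
idc_vs_S
```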

Fig. 2

Simulated sample means and their errors, $\overline{{\rm IDC}_{S}} \pm s_{S}$, as a function of sample size S (points) for the yearly sample. The last point (filled triangle) is the simulation for N points. The shaded areas mark the 68.3% (red), 95.4% (orange), and 99.7% (yellow) confidence intervals for IDC_sim (see Sect. 3) and those from Eq. (7) (dark green, green, and light green, respectively; see Table 1) in the case of an uninformative prior (a = b = 1).

Figure 2 shows $\overline{{\rm IDC}_{S}} \pm s_{S}$ as a function of the sample size S. The last point (filled triangle) is the simulation for N points, for which $\overline{{\rm IDC}_{N}} = \overline{{\rm IDC}_{\rm sim}}$ and $s_{N} = s_{\rm sim}$ (Eqs. (12) and (13)). The red-orange-yellow bands mark the 68.3%, 95.4%, and 99.7% c.i. for the simulated distribution as derived from Eqs. (15)–(17). We note the excellent correspondence between the 68.3% c.i. (red band) and the simulated sample standard deviation $s_{N}$ (the error bar on the simulation for N points), as expected from a normal distribution. The green bands (from dark to light green) mark the 68.3%, 95.4%, and 99.7% c.i. for the theoretical distribution in Eq. (7), also reported in Table 1, Cols. 6–8.

We define $S_{\rm a}$ as the minimum S value for which IDC(S) is considered acceptable, that is, the number of observations required to satisfy both conditions:
\begin{eqnarray*}
&&\overline{{\rm IDC}_{S}} \in \left[\overline{{\rm IDC}_{\rm sim}}-s_{\rm sim}, \, \overline{{\rm IDC}_{\rm sim}}+s_{\rm sim}\right] \\
&&\overline{{\rm IDC}_{S}} \pm s_{S} \in \left[\overline{{\rm IDC}_{\rm sim}}-2\,s_{\rm sim}, \, \overline{{\rm IDC}_{\rm sim}}+2\,s_{\rm sim}\right] .
\end{eqnarray*}
The values of $S_{\rm a}$ thus determined are reported in Table 1, Col. 10; they range between 40 and 80 observations, depending on the source.
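The acceptance criterion translates directly into code; a sketch of ours (not the authors' procedure), using the hypothetical `idc_vs_S` table and the full-sample `idc_mean` and `s_sim` from the sketches above:

```r
## Smallest S satisfying both acceptance conditions for IDC(S).
cond1 <- abs(idc_vs_S[, "IDC_S"] - idc_mean) <= s_sim                       # first condition
cond2 <- abs(idc_vs_S[, "IDC_S"] - idc_mean) + idc_vs_S[, "s_S"] <= 2 * s_sim  # second condition
ok    <- cond1 & cond2

S_a <- if (any(ok)) min(idc_vs_S[ok, "S"]) else NA
S_a
```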

Similarly, for each of the sources monitored with orbital campaigns, we created M = 10^4 datasets drawn from S = 5, 10, 15, 20, ..., 70 observed points, thus also extrapolating beyond the observed sample to determine how many additional observations are required to significantly lower the uncertainty $s_{S}$. We find that for about 70 observations $s_{S}$ = 3.6–5.8%, comparable to the values found for the yearly monitoring campaigns.

These findings can easily be used for planning future observations.

4. Conclusions

As an alternative to, and considerably less time-consuming approach than, Monte Carlo bootstrap simulations, we derived the theoretical Bayesian expectation value for a duty cycle and its error based on a finite set of independent observational data points. We have applied our findings to the specific case of the inactivity duty cycle of SFXTs, as one of the available examples of two-state sources. For SFXTs we have compared the theoretical expectations with both the observed values and the IDCs and their errors obtained from Monte Carlo simulations, as an a posteriori validation of the Bayesian treatment.

Our treatment, however, is more general than the simple case we considered and can be applied to all independent observations of two-state sources, such as AGNs, X-ray binaries, etc., suitable for a meaningful DC determination. In particular, the strength of this approach becomes evident when considering a well-populated class of sources (Nsrc ≥ 50), for which the parameters a and b can be obtained by fitting the distribution of the observed DCs of all sources in the class with the Beta function in Eq. (6), thus fully characterizing the prior. Then, whenever a new source of the same class is observed, even with relatively few observations, the knowledge of the prior derived from the whole class can be used to further constrain the DC of this still poorly studied individual source by adopting the a and b of the class.

Acknowledgments

We thank A. Stamerra, P. Esposito, V. Mangano, and E. Bozzo for helpful discussions. C.G. acknowledges the PRIN MIUR project on “Gamma Ray Bursts: from Progenitors to Physics of the Prompt Emission Process”, PI: F. Frontera (Prot. 2009 ERC3HT). L.D. thanks Deutsches Zentrum für Luft- und Raumfahrt (Grant FKZ 50 OG 1301). We also thank the referee for comments that helped improve the paper. The Swift/XRT data were obtained through target of opportunity observations (2007-2012; contracts ASI-INAF I/088/06/0, ASI-INAF I/009/10/0) and through contract ASI-INAF I/004/11/0 (2011-2013, PI P. Romano).

References

  1. Ackermann, M., Ajello, M., Allafort, A., et al. 2011, ApJ, 743, 171
  2. Bishop, C. M. 2006, Pattern Recognition and Machine Learning, eds. M. Jordan, J. Kleinberg, & B. Scholkopf (Springer)
  3. Burrows, D. N., Hill, J. E., Nousek, J. A., et al. 2005, Space Sci. Rev., 120, 165
  4. Efron, B. 1979, Annals of Statistics, 7, 1
  5. Efron, B. 1994, An Introduction to the Bootstrap (New York: Chapman & Hall)
  6. Fragos, T., Kalogera, V., Willems, B., et al. 2009, ApJ, 702, L143
  7. Gehrels, N., Chincarini, G., Giommi, P., et al. 2004, ApJ, 611, 1005
  8. Henry, G. R., & Paik, H.-J. 1969, Nature, 224, 1188
  9. Jorstad, S. G., Marscher, A. P., Mattox, J. R., et al. 2001, ApJ, 556, 738
  10. Knevitt, G., Wynn, G. A., Vaughan, S., & Watson, M. G. 2014, MNRAS, 437, 3087
  11. R Core Team 2014, R: A Language and Environment for Statistical Computing (Vienna: R Foundation for Statistical Computing)
  12. Romano, P., Sidoli, L., Cusumano, G., et al. 2009, MNRAS, 399, 2021
  13. Romano, P., Ducci, L., Mangano, V., et al. 2014, A&A, 568, A55
  14. Vercellone, S., Soldi, S., Chen, A. W., & Tavani, M. 2004, MNRAS, 353, 890

