ISPY -- NACO Imaging Survey for Planets around Young stars: Survey description and results from the first 2.5 years of observations

The occurrence rate of long-period giant planets around young stars is highly uncertain since it is not only governed by the protoplanetary disc structure and planet formation process, but also reflects dynamical re-structuring processes after planet formation as well as possible capture of planets not formed in-situ. Direct imaging is currently the only feasible method to detect such wide-orbit planets and constrain their occurrence rate. We carry out a large L'-band high-contrast direct imaging survey for giant planets around young stars with protoplanetary or debris discs using the NACO instrument at the ESO Very Large Telescope on Cerro Paranal in Chile. We use very deep angular differential imaging observations with typically>60 deg field rotation, and employ a vector vortex coronagraph where feasible to achieve the best possible point source sensitivity down to an inner working angle of about 100mas. This paper introduces our NACO Imaging Survey for Planets around Young stars ("NACO-ISPY"), its goals and strategy, the target list, and data reduction scheme, and presents preliminary results from the first 2.5 survey years. We achieve a mean 5 sigma L' contrast of 6.4mag at 150mas and a background limit of 16.5mag at>1.5". Our detection probability is>50\% for companions with 8\,M$_{\rm Jup}$\ at semi-major axes 80-200au. It thus compares well to the detection space of other state-of-the-art high-contrast imaging surveys. We have contributed to the characterisation of two new planets originally discovered by VLT/SPHERE, but we have not yet independently discovered new planets around any of our target stars. We report the discovery of close-in low-mass stellar companions around four young stars and show L'-band scattered light images of the discs around eleven stars, six of which have never been imaged at L'-band before.


Introduction
We are currently experiencing a golden era of exoplanet research that has led us from the first discovery of a planet  orbiting another Sun-like star (Mayor & Queloz 1995) 1 to the realisation that planetary systems are a natural by-product of star formation. Indeed, our immediate Galactic environment is richly populated with stars that harbour planetary systems.
We are already aware of many stars with known exoplanets within only a few parsecs around the Sun, such as for example, Proxima Centauri at 1.3 pc (Anglada-Escudé et al. 2016), Barnard's Star at 1.8 pc (Ribas et al. 2018), Teegarden's Star at 3.8 pc (Zechmeister et al. 2019), Fomalhaut at 7.7 pc (Kalas et al. 2008) 2 , Trappist-1 at 12 pc (Gillon et al. 2017), or β Pictoris at 19 pc (Lagrange et al. , 2019. Most stars in Sun-like environments 3 seem to host planets. We have also learnt that the variety of planets and the architecture of planetary systems can be very different from what we know from our own Solar System. However, significant gaps remain in our knowledge of the occurrence rate and architecture of planetary systems in general and of the origin and evolution of our own Solar System in particular. The most successful exoplanet discovery techniques thus far are the radial velocity (RV) and transit methods (see also Fig. 1). Although very successful in terms of discoveries and providing statistics, these methods nevertheless have non-negligible detection biases and limitations meaning that important questions remain unanswered. Both methods are intrinsically biased toward short-period planets and both usually avoid young stars because their intrinsic photospheric and chromospheric activity can mask the subtle signals induced by the presence of a planet (Saar & Donahue 1997, Schrijver & Zwaan 2008, Barnes et al. 2017. Astrometry on the other hand, and in particular with the anticipated final release of the Gaia data (individual measurements), will provide us with new discoveries that are expected to reveal the still incompletely known giant planet (GP) population in the 3 -5 au separation range (Casertano et al. 2008;Perryman et al. 2014) in the near future.
However, we still know very little about the frequency of rocky planets in the habitable zone or the occurrence rate of GPs at orbital separations beyond 5 au. Both are important pieces of the puzzle needed for constraining the uncertain ends in planet formation and evolution models (e.g. Baraffe et al. 2003, Fortney et al. 2008, Spiegel & Burrows 2012, Mordasini et al. 2012, Allard et al. 2013). The occurrence rate of long-period GPs is only poorly constrained since it is not only governed by the structure of the protoplanetary discs in which they form and by the physics of the planet formation process, that is, classical core accretion (Pollack et al. 1996;Ida & Lin 2004), gravitational instability (Boss 1997), or pebble accretion (Johansen & Lacerda 2010, Ormel & Klahr 2010, Bitsch et al. 2015, but also reflects dynamical re-structuring processes taking place well after planet formation. Both migration processes during the gas-rich protoplanetary disc phase and dynamical interactions between planets well after the clearing of the disc can change the architecture of planetary systems dramatically between birth and maturity (e.g. Davies et al. 2014;Morbidelli 2018). Furthermore, an unknown fraction of the giant planet population on wide orbits may actually not originate from the system they are in now, but could have been captured free-floating in space, for example in dense star clusters (Perets & Kouwenhoven 2012).
Therefore, if we want to understand how mature planetary systems like our own are formed, there is an explicit scientific demand to find and characterise GPs in wide orbits around young stars, which cannot be satisfied by the successful indirect detection techniques mentioned above. Because of the long orbital timescales involved, the only currently feasible way to explore the GP population in wide orbits is by direct imaging (DI).
Direct imaging with the currently most advanced instruments allows us to probe exoplanets at separations from the host star down to a few astronomical units (1 -10 au, depending on distance and other parameters) and planet masses down to about 1 M Jup (see Fig 1). In terms of separations, DI is thus truly complementary to the RV and transit methods. However, DI is currently only sensitive to gaseous, self-luminous (i.e. young) GPs, while smaller (and rocky) planets are too faint to be detectable with current instrumentation in the glare of their host stars.
Nevertheless, DI with high-contrast techniques has already provided us with some spectacular discoveries (e.g. Marois et al. 2008Marois et al. , 2010Lagrange et al. 2010;Rameau et al. 2013b; Kuzuhara et al. 2013;Macintosh et al. 2015;Keppler et al. 2018). Moreover, the recent advancement of very large high-contrast DI planet surveys with several hundred target stars (see Table 7) has provided us with first constraints on the occurrence rate and distribution of massive (1-13 M Jup ) GPs in wide orbits (20-300 au) around young stars. The derived values for the substellar companion frequency still show a large scatter as they range from < 1% up to 10%, depending on sample parameters and models used (e.g. Lafrenière et al. 2007;Biller et al. 2013;Brandt et al. 2014b;Chauvin et al. 2015;Nielsen et al. 2016Nielsen et al. , 2019Bowler 2016;Galicher et al. 2016;Vigan et al. 2017;Meshkat et al. 2017;Stone et al. 2018). Overall, these results show that such wide-orbit GPs are rare.
However, most of these surveys operate at wavelengths λ ≤ 2.2 µm and are thus optimised for hot (young and massive) planets. Very deep observations at longer wavelengths are needed to also reveal both very young and still embedded (i.e. reddened) planets and protoplanets as well as more evolved (i.e. cooler) planets around somewhat older (few hundred Myr) stars (Fig. 2). Furthermore, it is also important to probe physical separations down to 5 -10 au to bridge the gap between the RVconstrained GP population around mature stars and the very wide-orbit GP population around very young stars revealed by these DI surveys.
To address these needs, we initiated a large (≈ 200 stars) Lband Imaging Survey for Planets around Young stars with NACO (see Sect. 4) at the ESO Very Large Telescope (VLT) on Cerro Paranal in Chile ("NACO-ISPY"). The survey was launched in December 2015, carried out as guaranteed time observations (GTO) with a total budget of 120 observing nights, and was completed after 4 years in late 2019. The NACO-ISPY survey is complementary to the largest and most recent other L -band imaging survey, the Large Binocular Telescope (LBT) Interferometer Exozodi Exoplanet Common Hunt (LEECH; Skemer et al. 2014;Stone et al. 2018), because it covers the southern sky as opposed to the northern sky accessible from the LBT located in Arizona. It is also complementary to most of the other large imaging surveys listed in Table 7 because it is carried out in the L -band (3.8 µm) as opposed to the most widely used H-and K-bands. Last but not least, our survey is also different from most of the other surveys in that we focus exclusively on nearby stars that are surrounded by either protoplanetary transition discs (PPDs) or debris discs (DEBs).
In this paper, we give a general overview of the NACO-ISPY survey, its goals, strategy and targets, and present preliminary results from the first 2.5 survey years. In Section 2 we describe the scientific goals and strategy of the survey. Section 3 gives an overview of the target selection, the master target sample, and the list and properties of targets observed during ESO periods 96 through 100. Observations and data reduction are described in Sections 4 and 5. The first survey results are presented in Section 6 and discussed in Section 7. Section 8 summarises and concludes the paper. Fig. 1: Distribution of planet mass vs. orbital separation of known exoplanets and candidates as listed on exoplanet.eu (Schneider et al. 2011). The main detection methods are marked by different colours. Solar system planets are marked by red letters. The dashed horizontal line marks the approximate deuterium burning mass limit. The grey-shaded area marks the parameter space probed by our NACO-ISPY survey (10% detection probability, cf. Fig. 10).

Survey goals and strategy
Our NACO-ISPY survey is tailored to characterise the wideseparation GP population around nearby young stars during and shortly after the planet formation phase. We exploit the L -band capability of NACO (see Sect. 4) and focus on young stars, thus making NACO-ISPY complementary to most other current large DI surveys which mostly employ the H-band. In particular, we address the following main scientific questions: 1. Can we detect and characterise GPs during their formation phase within PPDs, where indirect methods for shorterperiod planets fail? 2. What are the properties (luminosity and temperature), location (separation), and occurrence rate of such planets? 3. What is the occurrence rate and luminosity and/or separation distribution of long-period GPs in DEBs? 4. What is the relation between DEB properties and the presence of wide-separation GPs? Can we constrain stirring models and establish disc properties as planet tracers (see below)? 5. How frequent are GPs at separations ≥ 5 au around young ( 100 Myr) solar-type stars, i.e. closer to what has already been partially constrained by previous imaging surveys? 6. Does the distribution change with age and at what age do the most significant changes occur? 7. Are the occurrence and properties of wide-orbit GPs correlated with shorter-period GPs such that in-situ formation and outward scattering can be disentangled?
Consequently, our survey is focused on only two types of nearby young stars with circumstellar discs: (a) young (<10 Myr) stars with gas-rich PPDs with direct or indirect evidence for gaps that could be carved by (proto)-planets (transition discs), and (b) somewhat older (up to a few hundred Myr) stars with well-characterised DEBs, where the primordial dust has been processed completely, and the current dust population results from the collisional grinding of larger planetesimals (Wyatt 2008).
We plan to observe approximately 200 stars in total over the planned survey duration of 4 years and within the time budget of 120 nights (which includes weather losses and second-epoch observations). This total number of survey targets is a compromise between the goal to go deeper than previous surveys, the need for a sufficiently large target list to ensure a robust statistical interpretation of the expected low-number discovery outcome, and the GTO time budget. The entire survey is carried out with the NACO instrument (Rousset et al. 2003;Lenzen et al. 2003, see Sect. 4) and the L filter at the ESO VLT on Cerro Paranal in Chile. Every target star is observed once with very deep angular differential imaging (ADI; Marois et al. 2006) observations (2 -4 hr with typically > 60 • field rotation) and a coronagraph employed (where feasible) to achieve the best possible point-source sensitivity down to the smallest possible separations (see Sect. 4). Second epoch observations are only scheduled where interesting companion candidates discovered in the first epoch image require confirmation. For the young PPD stars, we exploit the advantage of the L -band for embedded planets with possible large circumplanetary extinction over the shorter-wavelength near-infrared (NIR) bands. Furthermore, the suspected circumplanetary discs were shown to be very bright at λ > 3 µm (e.g. Zhu 2015; Eisner 2015; Szulágyi et al. 2019), further helping L -band observations even where the embedded planets may not be directly detectable. Indeed, L -band observations were key for detecting the few forming planet candidates that we know of today, namely LkCa 15 b (Kraus & Ireland 2012), HD 100546 b , A&A proofs: manuscript no. 37000corr HD 169142 b (Biller et al. 2014;Reggiani et al. 2014), MWC 758 b (Reggiani et al. 2018), and PDS 70 b , even though all but the last one are still debated (Currie et al. 2019;Rameau et al. 2017;Ligi et al. 2018;Wagner et al. 2019). Further advantages of the L -band are, for example, that scattering from circumstellar disc material, which increases the noise and the probability of false positives, and contamination by background objects are less severe than at shorter wavelengths.
For the older DEB stars, we also exploit the L -band advantage, but here for cooler, that is, lower-mass and/or older planets. More importantly, our NACO-ISPY survey is clearly distinguished from the SPHERE-SHINE survey (Chauvin et al. 2017a) for example, in that we explicitly target a large sample of well-characterised DEB stars and employ a target selection and survey strategy that will allow us to constrain the relation between DEB properties (e.g. signatures of dynamical excitation, see below) and the occurrence of GPs.

Target selection
The targets for the ISPY survey were selected from various databases and the literature. In particular, the PPD targets were compiled from catalogues and studies of Herbig Ae/Be stars (e.g. The et al. 1994;Menu et al. 2015) and then complemented with additional lower-mass objects for which high-spatial resolution observations showed substructures in their circumstellar discs possibly indicative of ongoing (gas giant) planet formation (e.g. Andrews et al. 2011Andrews et al. , 2018. Here, we could only add objects that were not blocked from other ongoing observing programs. The primary focus on Herbig Ae/Be stars was motivated by the idea that in these systems (more massive) gas giant planets might be forming at larger radial separations (cf. Quanz 2015). Indeed, thus far most candidates for young, embedded planets have been detected around such objects (e.g. Quanz et al. 2013Quanz et al. , 2015Brittain et al. 2014;Biller et al. 2014;Currie et al. 2017;Reggiani et al. 2014Reggiani et al. , 2018, although not all of these candidates were confirmed in follow-up studies and the origin of some of the detected signals has recently been questioned (e.g. Rameau et al. 2017;Cugno et al. 2019a;Ligi et al. 2018;Huélamo et al. 2018).
The DEB targets were mainly compiled from the Spitzer IRS catalogue of Chen et al. (2014), the first comprehensive DEB catalogue based on spectroscopic data. To ensure that we select only stars with significant DEB signals, we re-compiled and re-evaluated the complete spectral energy distributions (SEDs) of all target candidates (see Sect. 3.3), and applied primary cutoffs for the fractional disc luminosity at L IR /L bol > 3 × 10 −6 and for the blackbody temperature of the disc excess emission at T d1 < 220 K. The threshold values are chosen to avoid outliers with non-significant excess or suspiciously high fit temperatures. We also inspected all SED fits individually to identify suspicious cases that could be affected by confusion with background or other objects. The new SED fits were also used to re-derive other disc and stellar parameters (see Sect. 3.3 and Fig. 7). Additional DEB targets were selected from the results of the Herschel DEB surveys DUNES (Eiroa et al. 2013;Maldonado et al. 2015) and DEBRIS (Matthews et al. 2010;Booth et al. 2013). In particular, we selected those targets which show indications not only of cold outer debris belts, but also of an additional hot exozodiacal dust belt closer to the host star. The motivation for selecting the latter derives from the speculation that hot inner debris belts are re-lated to the formation of rocky planets and that in such systems more massive gas giant planets might also be forming at larger radial separations. Most of the approximately 30 DEB targets older than 1 Gyr (see Fig. 4) originate from the Herschel detections.
In addition to the specific selection criteria mentioned above, we use the following general selection criteria for compiling our master list of target candidates: 1. −70 deg ≤ DEC ≤ +15 deg, to ensure sufficient observability from the location of Paranal observatory (> 4 hr at air mass < 1.5; with a few exceptions); 2. distance ≤ 160 pc (DEB) and ≤ 1000 pc (PPD) to achieve reasonable spatial resolution; 3. avoid extreme (later than M4, earlier than B8) and uncertain spectral types; 4. K < 10 mag to ensure good adaptive optics (AO) correction for the NACO NIR wavefront sensor; 5. no known close (<1 ) binaries and multiples around bright stars (data archives, catalogues, and literature checked) that could hamper centring of the coronagraph (see Sect. 4); 6. no existing and sufficiently deep ADI L -band observations available (various archives searched).
With these selection criteria, we compiled a master list consisting of 90 PPD and 300 DEB stars. Distances were inferred from Gaia DR2 parallaxes (Gaia Collaboration et al. 2016, 2018 with the method described by Bailer-Jones et al. (2018) and retrieved through VizieR (Ochsenbein et al. 2000), unless stated otherwise in Tables 1 and 2. Age estimates were compiled from the literature. Many stars have multiple partially contradicting published age estimates derived with different methods. In Tables 1 and 2, we list only the age value that we consider to be the most reliable one. The listed age reference may not necessarily point to the original age estimate, but can also refer to a paper that compiles and discusses various age estimates. L magnitudes and their uncertainties are derived from the WISE photometry (Cutri & et al. 2013) and black body interpolation between WISE filters W1 (3.35 µm) and W2 (4.6 µm) to L (3.8 µm).
Since the total time budget for our survey of 120 observing nights permits deep observations of only ≈ 200 stars (see Sect. 4), the target candidates in this master list were separated into three priority categories: priority 1 targets that will be observed, priority 2 targets that will be observed if time permits, and priority 3 targets that will not be scheduled for observations, unless needed as fillers.
Among the 90 PPD target candidates, we gave the highest priority to nearby (< 200 pc) Herbig Ae/Be stars and to transition discs with known gaps and cavities with separations large enough so that L -band observations with VLT/NACO can probe for embedded sources. Intermediate priority was given to Herbig Ae/Be stars at distances > 200 pc and to lower-mass (T Tau) objects with less pronounced or more poorly characterised transition disc signatures. The lowest priority was given to objects where high-contrast L observations were already available or published or where the discovery space was limited for other reasons. This resulted in a list of 43 priority 1 targets, 33 priority 2 targets, and 14 targets that were discarded from the observing list and not even used as fillers.
To prioritise the DEB target candidates in such a way that we maximise the detection probability (success-oriented) while at the same time preserving an (relatively) unbiased sample, we evaluated what the achievable planet mass detection limit at a projected physical separation of 20 au would be. For this purpose, we used mean 5 σ contrast curves achieved during the first few observing runs (Tab. 3) under good weather conditions with our standard observing procedures with and without the coronagraph. These contrast curves were then scaled to the L magnitudes of each target candidate in such a way that at small angular separations, the contrast is preserved independently of the stellar brightness (∆L ≈ 6.4 mag at ∆r = 150 mas), and at large separations (> 3 ), the mean background limit of ∆L ≈ 16.7 mag is preserved (see Sects. 4 and 6.1 and Fig. 9). The resulting contrast curves are then converted to planet-mass detection limits using the L magnitudes, ages, and distances of the stars (tabs. 1, 2, 4, and 5) together with BT-Settl evolutionary models (Allard 2014;Baraffe et al. 2015).
The highest priority was given to DEB stars around which the predicted planet-mass detection limit at 20 au projected physical separation was ≤ 10 M Jup . We also gave high priority to DEBs with detectable gas (CO) content (e.g. Zuckerman et al. 1995;Moór et al. 2011Moór et al. , 2015aMarino et al. 2016Marino et al. , 2017Greaves et al. 2016;Lieman-Sifry et al. 2016), although it is not a priori clear if the CO gas is of primary or secondary origin, in particular in the discs around A stars (Moór et al. 2017). However, in some cases and in particular in the few later-type CO-rich discs, the gas is likely to originate from colliding or evaporating icy planetesimals (Kral et al. 2016;Matrà et al. 2019). Intermediate priority was given to DEB stars with predicted 20 au detection limits between 10 and 20 M Jup . Lowest priority was given to pre-selected target candidates with predicted mass detection limits > 20 M Jup , which were only observed if needed as fillers. This selection process resulted in a final list of 177 DEB targets, of which 87 are priority 1, the same number are priority 2, plus three stars for which we re-classified the IR excess as not significant only after they were observed early in the survey.

Target list
Our final survey target list thus consists of 253 stars, of which 76 stars are surrounded by young PPDs, 177 stars are surrounded by DEBs, and (including the three targets mentioned above for which the IR excess was later classified as not significant). Of these 253 targets, we assigned 130 stars as priority 1 (43 PPDs and 87 DEBs). Figure 3 shows the sky distribution of all 253 ISPY GTO target candidates. Figure 4 shows the distribution of spectral types, ages, distances, and L magnitudes. Spectral types range from early M to late B, corresponding to T eff ≈ 3000 − 10000 K. Ages of the PPD stars range from about 2 to 10 Myr. The majority of the DEB targets have ages of between 10 and 100 Myr with a distribution tail out to a few gigayears (the older ones being mostly Herschel detections). Distances to the PPD stars are mostly between 100 and 170 pc, while the DEB targets are located between 3 and 150 pc, with two distribution peaks around 10 -50 pc and 100 -140 pc. L magnitudes range from 2.5 to 8.5 mag with very few outliers towards the brighter and fainter ends. Figure 5 shows the colour-magnitude diagram (CMD) of our target stars with the stellar ages colour-coded and Siess et al. (2000) isochrones overplotted. The zero-age main sequence (ZAMS) is clearly visible as well as the many redder (cooler) and over-luminous (because of their larger radii) premain sequence (PMS) stars. Figure 6 shows the fractional disc-related excess luminosity L disc /L * = f d (see Sect.3.3) versus stellar age for all ISPY tar- Fig. 3: Sky distribution of all 253 ISPY targets. PPD targets are shown in red, DEB targets in blue. Targets observed in P 96 through P 100 are marked as large filled circles. Targets observed later and with the analysis not yet included in this paper are marked as small empty circles. The grey-shaded area and dotted contours outline the Milky Way disc and bulge as traced by the COBE-DIRBE band 2 (K) zodi-subtracted all-sky map (Hauser et al. 1989). Dotted and dashed horizontal lines mark the approximate declinations for field rotations of 60 • and 50 • , respectively, achievable with a total observing time of 3 hr scheduled around meridian passage of the star.  (Cox 2000). We note that 25 of the PPD targets have distances > 200 pc and are thus not contained in the distance distribution histogram. The vertical dashed line in the lower right panel marks the approximate brightness limit up to which the AGPM can be used. gets, illustrating the dominance of the disc emission over the stellar contribution for the young PPDs and the relatively faint disc contribution from the more evolved DEBs. Error bars in both dimensions can be substantial, in particular for the ages and for the fractional disc luminosities of the young PPD targets (see below). As ages are compiled from the literature and derived with different methods and assumptions, we have no means of deriving consistent uncertainties for all sources and therefore we refrain from showing error bars here. The PPD and DEB targets appear as two clearly separated groups in this diagram with f PPD d > 0.01 > f DEB d , albeit with some age overlap in the range 5 -20 Myr. Because stellar luminosities derived from the SED  (Cox 2000). Stellar ages are colour-coded: red: <8 Myr, light blue: 8 -30 Myr, dark blue: >30 Myr. Isochrones (Siess et al. 2000) for four stellar ages (counting from the birthline) and for metallicity Z = 0.02 are overplotted. Fig. 6: Fractional disc-related excess luminosity vs. stellar age for all ISPY targets. PPD targets are shown in red, DEB targets in light blue. The large dark blue asterisk marks HD 203 for which we show the SED in Fig. 7. We note that stellar luminosities were not corrected for extinction such that fractional disc luminosities are overestimated and appear unphysically large for some of the very young and embedded PPD stars (> 1, light red).
fitting (see Sect. 3.3) were not corrected for extinction, fractional disc luminosities for some of the very young and embedded PPD stars are significantly overestimated and appear unphysically large (>1). For the older DEB stars with typically optically thin discs, where extinction should not play a significant role, the diagram suggests a trend of fractional disc luminosity declining with age as log( f d ) = −0.5 × log(age) − 3, albeit with the note of caution that we introduced a selection bias by applying a lower limit to f d > 3 × 10 −6 . Nevertheless, this trend also holds for the upper envelope of the distribution, which is not affected by the selection bias. Figure 7 shows the flux density distribution of a typical DEB survey target (here HD 203).  (HD 203,cf. Fig. 6). Photometric data are shown as red circles, and the Spitzer IRS spectrum is shown as a black solid line with uncertainties in grey. The best-fit model (orange line) has stellar (blue) and blackbody (green, for the DEBs) components (see Sect. 3.3 for details).

Targets observed during ESO periods 96 through 100
In this section, we describe the 112 target stars (34 PPDs and 78 DEBs) observed during ESO periods 96 throughout 100, which comprise approximately the first half of the survey. The analysis of the complete survey will be presented in a forthcoming paper. Tables 1 and 2 list the names, coordinates, and other parameters for these stars. The observations are described in more detail in Sect. 4.
Distances and mean uncertainties for all but four targets (marked in Table 1) are derived by inverting Gaia parallaxes 4 (Gaia Collaboration et al. 2018). Stellar luminosities and T eff of PPD targets are also taken from Gaia DR2 (Gaia Collaboration et al. 2018), although their values have rather large uncertainties owing to circumstellar extinction, which was not taken into account. Ages and outer disc radii of PPD targets are compiled from the literature with references are given in Table 1. Stellar luminosities, effective temperatures, and disc radii of DEB targets are derived by fitting stellar (PHOENIX; Husser et al. 2013) and blackbody models to observed photometry and spectra. The photometry is obtained from multiple catalogues and publications, including 2MASS, APASS, Hipparcos/Tycho-2, Gaia, AKARI, WISE, IRAS, Spitzer, Herschel, JCMT, and ALMA. In some cases, photometry has been excluded, for example due to saturation or confusion with background or other objects. The fitting method uses synthetic photometry of grids of models to find the best-fitting models with the MultiNest code (Feroz et al. 2009), and an example is shown in Fig. 7. We first fit star + single disc component models, but in a few cases the fits were significantly improved by adding a second blackbody component, which we interpret as an indication that the star harbours debris belts at multiple radii (Kennedy & Wyatt 2014). The blackbody radius of the debris belts, R BB , is then given by (Pawellek & Krivov 2015): An estimate of the 'true' disc radius, R disc , is then obtained by applying a stellar luminosity-dependent correction factor, Γ, which accounts for the radiation pressure blowout grain size, (Pawellek & Krivov 2015), but using the new coefficients given in Pawellek (2016), namely a = 7.0 and b = −0.39, and limiting Γ to Γ max = 4.0 for stars with L * 4 L . Fig. 8: Debris disc sizes (outer belt radii) obtained via SED fitting and corrected for the luminosity-dependent blowout grain size (Pawellek & Krivov 2015) vs. directly measured disc radii derived from spatially resolved observations (values and references for the stars observed in ESO periods P 96 through 100 are in Table 2). Disc sizes derived from VIS/NIR or FIR/submm observations are marked in dark blue. Disc sizes derived from MIR observations are marked in light blue. The dashed line marks equality. Marked in light grey at R disc (obs) = 1 au are those targets that do not have spatially resolved observations.
In addition, we also compile and list in Table 2 disc radii directly obtained from spatially resolved observations where available. These directly measured disc sizes originate from various different tracers and methods for which properly derived uncertainties are not always available and may therefore not always reflect the 'true' outer disc radius. While VIS and NIR observations tracing scattered light and FIR and submm images tracing thermal emission from cold dust should be reliable tracers of the total DEB extent, mid-IR (MIR) observations may trace only hot dust and miss the cold outer dust belts. Nevertheless, these observations provide the best measure we can currently obtain to evaluate the robustness and reliability of the model disc sizes obtained via SED fitting and available for all targets. Figure 8 compares the SED fit-based model radii to the directly observed disc sizes for those 48 targets for which there are known and published disc size measurements available. As can be seen, there is general good agreement between SED-derived and observed disc sizes over a size range of more than two magnitudes. The mean relative discrepancy is ≈ 50% with the most significant outlier showing a discrepancy of a factor of seven. However, this outlier (HD 38678, see Table 2) is one of the five stars for which the disc radius was derived from marginally resolved MIR observations which do not trace the cold outer dust belt and thus underestimate the disc size. We therefore conclude that the SED-derived disc radii are, in general, a good proxy for the actual sizes of the cold outer debris belts.

Observations
During 13 observing campaigns and 64 observing nights (all in Visitor mode) between December 2015 and March 2018 (see Ta-ble 3), we observed and obtained useable data for 112 out of the 253 NACO-ISPY target candidates. In total, we obtained 140 datasets of which 105 were obtained with a coronagraph (see below).
For all observations in this survey, we use the AO assisted imager NACO, which is mounted in the Nasmyth A focus at UT1 (Antu) of ESO's VLT on Cerro Paranal in Chile. NACO is equipped with the AO front end, NAOS (Rousset et al. 2003), and the NIR imaging camera CONICA (Lenzen et al. 2003). All stars are observed with the L27 camera of NACO, which provides a pixel scale of 27.2 mas/pix, and with the L filter (λ 0 = 3.80 µm, ∆ λ = 0.62 µm). The pixel scale corresponds to a sampling of ∼3.5 pix per λ/D, the diffraction-limited full width at half maximum (FWHM).
To achieve high contrast between the star and its immediate surroundings, we carry out ADI observations in pupil tracking mode. For ADI, the total time on source time is governed by the need for a field rotation large enough to efficiently remove the speckle noise. Therefore, we observe each star for 2 -4 hr around its meridian passage, thus typically achieving field rotations of between 70 • and 100 • . In Figure A.1 we present histograms displaying the total time on target, field rotation, seeing, and coherence time for all data sets.
To further improve the contrast at small angular separations from bright stars, we use the annular groove phase mask (AGPM) vector vortex coronagraph (Mawet et al. 2013;Absil et al. 2014) for all stars that are bright enough to centre the coronagraph (L 6.5 mag). The AGPM effectively suppresses the on-axis starlight by re-directing it outside the pupil, where it is blocked by the Lyot stop. The typical detector integration time (DIT) with AGPM is 0.35 s. To model and correct the thermal background emission, we switch to an offset sky position every eight minutes (13 exposures (NEXPO), 100 detector integration times (NDIT)). We use NACO's cube mode to store each individual image frame for frame selection and sky reconstruction. To properly correct for the off-axis AGPM transmission, we measured the AGPM throughput as described in Appendix B and obtained the radial throughput curve shown in Fig. B.1.
For targets fainter than L ≈ 6.5 mag, the AGPM cannot be used because they are too faint for precise centring behind the AGPM. However, we use the same total integration time to achieve similar sensitivity at larger separations. The typical detector integration time (DIT) without AGPM is 0.2 s. The star is positioned at the centre of one quadrant of CONICA and after one exposure (NEXPO=1, NDIT=126) is moved to the centre of the next quadrant using fixed offsets. This allows for continuous observation of the star without applying additional telescope offsets for sky frames.
With this approach, we reach a star-planet 5 σ contrast of typically ∆L ∼7 mag at 0 . 2 and a background detection limit of   Table 4. (e) log(L * ) and T eff (rounded to full ten) adopted from Gaia-DR2 (Gaia Collaboration et al. 2018), unless indicated otherwise (references 31 through 32). (f) Ages compiled here are taken from the literature and derived with different methods and different treatment of uncertainties. Some ages refer to references which only summarise different other age estimation attempts. (g) Outer disc radii are compiled from the literature and originate from different methods. They are corrected for new Gaia-DR2 distances where necessary. (h) References 1 through 3 refer to distance, 11 through 26 refer to age, 31 through 32 refer to L * and T eff , and 41 through 61 refer to observed disc radius. (i) The Gaia G magnitude is listed for Elias 2-27 because no V mag is available. ∼16 mag at > 1.5 (see also Sect. 6.1 and Fig. 9). With this strategy for very deep and high-contrast observations and a smallest possible inner working angle (IWA) of ≈ 100 mas, we are able to observe two to three targets per night on average.
The lower left quadrant of CONICA is strongly affected by bad columns with large constant offset and low sensitivity (see also Sect. 5.1). Until June 2017 every eigth column was affected, but since then, the situation has deteriorated and 38% of this quadrant is affected. Therefore, the lower left quadrant is no  longer used in the observing cycle without the AGPM, while the less-severe bad-column problem of a second affected detector quadrant could be handled with post-processing (Sect. 5.1).
The ADI observation is bracketed by unsaturated flux measurements. To avoid saturation, the DIT of CONICA is adjusted depending on the brightness of the star in L -band, and the star is cycled through the three detector quadrants that are not affected by the bad columns.
An astrometric field is observed at least once per observing run with and without the AGPM to monitor and measure the pixel scale and true north orientation of the detector. The fields are 47 Tucanae and the region around the Orion Trapezium Cluster. The analysis of the astrometric fields is described in Appendix C, and the resulting mean astrometric parameters are listed in Table C.2.

Data reduction and analysis
All data taken by the ISPY survey are homogeneously reduced using the GRAPHIC pipeline (Hagelberg et al. 2016a). While a summary of the philosophy of the pipeline is explained in Hagelberg et al. (2016a), substantial changes have been made to many of the steps to optimise them for NACO-ISPY data. Since two types of data are taken during the survey (with and without the AGPM coronagraph), two reduction pathways are used. These steps converge after the data have been cleaned, and the final reduction steps are performed in the same way.

Cleaning of coronagraphic data
The NACO detector has two quadrants with defective columns. For the first of these, three in every eight columns show a constant value. To avoid problems in subsequent reduction steps, we first correct for these by replacing them with the mean of the four closest non-affected pixels from the same row. Dark frames are then made by median-combining each cube of sky frames. We use the PCA 5 -based sky-subtraction scheme outlined in Hunziker et al. (2018) to remove the background flux, subtracting the first five PCA modes from each image. We then correct for the second quadrant with defective pixels. In this quad-5 Principal Component Analysis rant, one in every eight columns has a dark current offset that varies in time. We correct for this by subtracting the median of each affected column from that column.
To calculate the position of the star behind the AGPM, we use a novel centring routine applied individually to each ∼ 0.35s frame. This procedure is summarised here, while a more extensive discussion and a description of its performance will be the subject of a subsequent publication (Godoy et al., in prep.). We perform a two-component fit to the area around the AGPM centre, consisting of a Gaussian for the star and a second profile for the subtraction by the AGPM, which we represent by a Gaussian with negative flux. To improve the stability of the fit, we fix the FWHM and position of the AGPM to the value obtained as described below.
Since the position of the AGPM coronagraph moves on the NACO detector, we use the edges of the 15" circular aperture around the AGPM to calculate its centre in each image, and use a fixed offset between the centre of this aperture and the position of the AGPM mask calculated from the sky frames. We find that the AGPM moves within a 50x10 pixel region, and its position often changes on hour timescales by several pixels, making this a crucial step in the reduction process. Due to the movement of the AGPM and the presence of significant jitter on short timescales with NACO, we find that this centring procedure outperforms other methods used in the literature.
We remove frames from the reduction that show 5σ outlying values in several criteria using the median absolute deviation as a robust estimator of the standard deviation. First, we use the distance between the measured star position and AGPM position, and then we use the total flux measured in an annulus around the star between 40 and 190 mas, and the scatter of the values in the same annulus.

Cleaning of non-coronagraphic data
In the same way as for the AGPM coronagraphic data, we first correct the defective columns in the first quadrant as described in the previous section. To subtract the sky background for noncoronagraphic data, we use the three-position dithering pattern used in our observations. Often, the stellar point spread function (PSF) is not clearly visible in the raw frames due to strong darkcurrent patterns on the detector and so an iterative approach to finding the star position must be taken. We first calculate rough sky frames by taking the mean of the frames in each three-dither sequence. We use the same PCA-based sky subtraction routine as for coronagraphic data, removing the first five modes. We then calculate the star position by performing a Gaussian fit to the PSF, recording the model parameters for each individual exposure.
Using the calculated star positions, we build a robust set of sky frames by again taking the mean of the frames in each threedither sequence, masking out an area of 40×40 pixels around the calculated star position in each exposure. We then apply the sky subtraction routine a second time using these frames, and repeat the centring procedure to ensure a robust determination of the star position.
We apply automatic criteria to remove bad frames from our data, by removing 5σ outliers in the flux and FWHM measured from the Gaussian fit, calculated across all frames of the entire observing sequence. We also reject 5σ outliers in the star position calculated within each cube (i.e. each dither is treated separately).

Angular differential imaging and point-source detection limits
The remainder of the reduction procedure is common to both coronagraphic and non-coronagraphic observations. Data are binned by centring each individual exposure and taking the median over each original data cube (roughly 100 frames, or 35 seconds). We apply a phase ramp to the Fourier transform of each image and take the inverse transform in order to recentre the data. We then use a PCA-based algorithm to subtract the flux from the star (Soummer et al. 2012;Amara & Quanz 2012). We apply PCA in annuli with a width of 2 FWHM. For each frame, we build a reference library using those frames where the change in parallactic angle is sufficiently high that a companion would have moved by more than 0.75 FWHM. We perform a number of reductions by subtracting 10-50% of the available PCA modes, using the 30% reduction as our baseline for the detection limit calculation. The frames are then derotated using Fourier transforms and median-combined to produce a final image. By using Fourier transforms to shift and derotate the images, the amplitude and structure of the noise in each image is preserved as well as the amplitude and shape of any potential signals and the stellar PSF. In addition, both steps are reversible when done in this way. This stands in contrast to interpolation-based methods, which have the effect of a low-pass filter and are nonreversible. These benefits were explored by Larkin et al. (1997) and Hagelberg et al. (2016b) among others.
Before calculating the detection limits, we first subtract large-scale structures by subtracting the median within a 20 × 20 pixel box from each pixel. We estimate a 1D noise curve by taking the standard deviation of the pixel values in annuli of 1×FWHM in width. This is then converted to a detection limit curve using the peak stellar flux corrected for small sample statistics and for the throughput of the PCA algorithm. For correcting for small sample statistics, we follow the approach proposed by Mawet et al. (2014) such that a constant false positive probability of 2.9 × 10 −7 (equivalent to 5 σ for Gaussian noise) is maintained. The throughput of the PCA algorithm is estimated by injecting the mean stellar PSF into the cleaned, binned images with a signal-to-noise ratio (S/N) of 7 and repeating the PCA reduction. The ratio of input and output flux is then recorded. This is repeated ten times in each annulus, with the azimuth of the in-jected PSF changed each time. The mean of these values is used to correct the 1D detection limit. The transmission of the AGPM coronagraph was measured on-sky (see Appendix B) and a corresponding correction applied where applicable. Fig. 9: Left: 5 σ L contrast curves for all targets observed during P96 through P100 (light grey, corrected for AGPM throughput where applicable; see Sect. 4) and median contrast (thick red). Right: 5 σ L detection limit curves.

Overview and detection thresholds
Tables 4 and 5 list the main observing parameters, L magnitudes, and achieved 5 σ contrast values at six angular separations for all 33 PPD and 78 DEB targets observed during ESO periods 96 through 100. We use the classical 5 σ approach here, corrected for both small sample statistics  and AGPM throughput, to characterise our detection thresholds and discovery space in a comparable way. At this stage of the survey, we do not employ an automated detection algorithm and therefore do not quantify our detection limits in terms of 95% completeness like, for example, Wahhaj et al. (2013b) or Stone et al. (2018). This must be kept in mind when comparing the detection spaces of different surveys.
We achieve a median 5 σ L contrast at 150 mas of ∆L = 6.4 ± 0.1 mag with best and worst values of ∼8 mag and ∼4 mag, respectively. This contrast close to the IWA is to first order independent of the brightness of the star. The scatter seen in Fig. 9 is mainly caused by variations in the observing conditions. At separations > 3 , we achieve a slightly stellar brightness-dependent 5 σ detection limit of L bg = (15.7 ± 0.2) + L * × (0.16 ± 0.03) mag with best and worst values of 17.5 mag and 15 mag, respectively (Fig. 9).
To evaluate the companion detection space of our survey, we first convert the 5 σ contrast curves (Fig. 9) to planet mass detection limit curves using the L magnitudes, ages, and distances of the stars (Tables 1, 2, 4, and 5) together with the CIFIST2011bc BT-Settl evolutionary model isochrones for the NACO passbands (Allard 2014; Baraffe et al. 2015). The BT-Settl models adopt a 'hot start' and solar abundances and employ a cloud model, which puts them in between the two models assuming atmospheric extremes with respect to dust and cloud opacity, COND (no photospheric dust opacity, Baraffe et al. 2003) and DUSTY (maximal dust opacity, Chabrier et al. 2000). We then run Monte-Carlo (MC) simulations in which we assign each star one planet with random mass, semi-major axis (SMA), orbit orientation, and orbital phase, and verify if this planet is above or below the mass detection threshold (5 σ) at the respective projected separation. To average down the MC noise, we per- form 10 6 mock survey runs. If probability distribution functions (PDF) for planet occurrence rate, masses, and SMA are invoked, the same setup can be used to predict how many planets (and BDs) we should have detected given an assumed underlying PDF. Eccentricities are currently set to zero, but assumptions on the eccentricity distribution will be taken into account for the complete statistical analysis of the entire survey. Figure 10 shows our resulting detection space in terms of achieved detection probability versus companion mass and SMA 6 . Our detection probability is ≥50% for companions with M ≥ 8 M Jup in the SMA range 80 -200 au and M > 13 M Jup at 30 -250 au, but reaches at the 10% level down to 3 M Jup at 40 -500 au and out to 5 -1000 au for companions with M > 13 M Jup .

Characterisation of known companions
Within the survey, we have observed and characterised a few stars with previously known companions, have detected and confirmed a number of previously unknown companions, have detected several discs at L -band, and have identified several companion candidates for which analysis and confirmation is still ongoing. For this reason, we also cannot yet provide a statistical analysis with constraints on the underlying population of GPs and BDs.
Early results on individual targets have been published in separate papers. The discovery of a planetary-mass companion within the gap of the PPD around the pre-main sequence star PDS 70 (V * V1032 Cen) by VLT/SPHERE observations along with L characterisation by VLT/NACO observations was pub-lished by Keppler et al. (2018) and Müller et al. (2018). A detailed characterisation of the SPHERE-discovered warm, dusty giant planet around the young A2 star HIP 65426 via L NACO and other observations was published by Cheetham et al. (2019).

New and confirmed companions
The discovery of a close (18.7 au) low-mass stellar companion to the young (1-3 Myr) PPD star R CrA was published by Cugno et al. (2019b). This companion was independently and simultaneously also discovered by SPHERE (Mesa et al. 2019). Another close (11 au) low-mass stellar companion residing within the gap between the host star and its DEB was discovered around HD 193571 by Musso Barcucci et al. (2019).
Around at least two other stars (HD 72660 and HD 92536), we find previously unmentioned close (< 1 ) low-mass stellar companions, both shown in Fig. 11. For both companions, we can already astrometrically reject the background hypothesis and confirm co-motion. HD 72660 is a ∼200 Myr old A0 star with no significant IR excess at a distance of 98.8 ± 0.1 pc (Table 2). We find a previously unmentioned secondary source at a mean angular separation of ρ ≈ 270 mas and P.A. ≈ 55 • with ∆L ≈ 5.5 mag (Fig. 11). The background star hypothesis could be rejected and co-motion confirmed via second-epoch observations (Table 6). Using BT-Settl evolutionary models (Allard 2014;Baraffe et al. 2015) and adopting the distance of the primary, the L brightness and astrometry correspond to a 0.44 ± 0.04 M (M1) companion at a projected separation of R = 24.4 ± 0.5 au. We also detect significant relative motion between primary and secondary with ∆ρ ≈ 13.5 mas/yr and ∆ P.A. ≈ 1.1 • /yr. Since we only have two astrometric measurements covering only a small orbital arc without any constraint on curvature, we use the prescription provided by Pearce et al. (2015) to ver- 12.4 (a) L magnitudes and their uncertainties are derived by interpolation between WISE bands W1 (3.35 µm) and W2 (4.6 µm) to 3.8 µm (Cutri & et al. 2013). (b) Corrected for AGPM throughput where applicable (see Sect. 4).  (Chen et al. 2014), the astrometry given in Table 6, and the Gaia distance, we derive 7 the dimensionless parameter B = V sky /V esc = 0.14 +0.14 −0.09 (Pearce et al. 2015), indicating that the companion is very likely bound provided the SMA of its (possibly eccentric) orbit is larger than a min = R/2 (1 − B) −1 ≈ 14.7 au. 8 No meaningful constraints can be put on other orbital parameters such as eccentricity or inclination at this point, except that an edge-on orbit can be ex-7 http://drgmk.com/imorbel/ 8 B > 1 would indicate the companion is unbound. cluded (i < 86 • ). A chance projection of an unbound object cannot be strictly ruled out as long as there is no constraint on the curvature of motion, for example by third epoch astrometry. However, this is extremely unlikely since no other point source is detected in this field, that is, the local stellar density must be small.
HD 92536 is a ∼46 Myr old B8 star at 157.3 ± 1.1 pc with a debris belt at r DD = 13 ± 3 au (Table 2). We find a previously unmentioned secondary source at ρ ≈ 818 mas and P.A. ≈ 276.5 • with ∆L ≈ 5.36 mag (Fig. 11). The background star hypothesis could be rejected and co-motion confirmed via second-epoch proper motion and parallax of the primaries, including their uncertainties (which are however too small to be noticeable).
observations (Table 6). Using again the BT-Settl models, the L brightness and astrometry correspond to a 0.33±0.05 M (M2-4) companion at a projected separation of ∼ 129 ± 1 au. We also detect significant relative motion between primary and secondary of ∆ρ ≈ −3.9 mas/yr and ∆ P.A.  (Pearce et al. 2015), indicating that also this companion is very likely bound provided a > a min ≈ 76 au. No meaningful constraints can be put on other orbital parameters at this point, except that an edge-on orbit can be excluded (i < 85 • ).

Disc detections
Although our ADI observations were not optimised to detect extended emission from circumstellar discs, we have detected scattered light at L -band from discs around 11 of the 112 target stars observed in P 96 through 100 (Fig. 12). All of these discs except R CrA have been imaged at millimetre wavelengths with ALMA. Five of the eleven discs have already been well-studied and also detected and imaged at L -band (and shorter NIR bands). These are the discs around the 20 Myr-old A0 DEB star HD 32297 at 133 pc (Schneider et al. 2005;Rodigas et al. 2014;Bhowmik et al. 2019), the 3.8 Myr-old B9 PPD star HD 100546 at 110 pc (Quanz et al. 2013Avenhaus et al. 2014;Currie et al. 2014;Sissa et al. 2018), the 5.4 Myr old K5 PPD star PDS 70 at 113 pc (Hashimoto et al. 2012;Keppler et al. 2018), the 4.7 Myrold B9 PPD star HD 141569 at 111 pc (Currie et al. 2016;Mawet et al. 2017;Perrot et al. 2016), and the 2 Myr-old F6 PPD star HD 142527 at 157 pc (Rameau et al. 2012;Canovas et al. 2017;Avenhaus et al. 2017).
In this paper, we only show the L -band images of the detected discs without attempting any kind of further analysis, which would require data at other wavelengths and radiative transfer simulations. Since the L -band scattered light does not necessarily trace the discs in their entirety and our ADI observing mode and data reduction filters out significant parts of the emission, we also do not attempt to derive disc sizes from the images.

Detection limits and preliminary statistical analysis
Based on the 5-σ contrast curves shown in Fig. 9, stellar properties (Tables 1 and 2), and the BT-Settl evolutionary models (Allard 2014;Baraffe et al. 2015), we calculate planet mass detection limit curves for all stars and derive a map of the raw detection probabilities in companion mass -SMA space (Fig. 10). With the same approach, but assuming an underlying companion mass function (CMF) for GPs and BDs from Reggiani et al. (2016) together with a normalisation based on the newest analysis of all existing DI survey results from Galicher et al. (2016), we evaluate how many companions we should have already detected provided the assumptions for the underlying CMF holds for our target stars and provided we would have completed all our ongoing candidate confirmation efforts. We run MC simulations of our survey in which we assign every star a companion (one or none) according to the CMF, place the companion at a random position on a randomly oriented (circular) orbit, and count the expected detections. We find that we should have detected 2.0 +3.7 −1.8 companions in the GP-BD mass regime (0.5 +1.1 −0.3 GP and 1.5 +1.6 −0.5 BD). While the poisson noise accounts for an uncertainty of the predicted number of companion detections of about ±1.3, the largest part of the uncertainties stated above derives from the uncertainty in our current knowledge of the underlying CMF (see Galicher et al. 2016).
Assuming both the findings reported by Meshkat et al. (2017) as well as their assumption on the functional shape of the underlying CMF, we should have detected 8 +3.4 −2.2 companions in the GP-BD mass regime (2.0 +1.6 −0.9 GP and 6 +3 −2 BD). Since verification efforts for several BD as well as GP candidates are still ongoing, we cannot yet draw any robust conclusions on occurrence rates or distinguish between the two different predictions. At the end of our survey (including companion verification efforts), we plan a statistical analysis that does not only include the detection thresholds and detections of our survey, but includes that of other surveys as well.

Comparison to other surveys
A list of deep imaging surveys of nearby (<100 pc) stars carried out between 2003 and 2013 is conveniently provided by Chauvin et al. (2015). Stone et al. (2018) list L -band surveys carried out between 2007 and 2017. Bowler (2016) provides a comprehensive overview and description of previous and ongoing large imaging surveys. To put our NACO-ISPY survey into context, we compile in Table 7 a condensed overview of previous and ongoing large DI surveys dedicated to exoplanet searches, which is not complete and does not include many of the early and smaller surveys. In the context of this large variety of big surveys, our NACO-ISPY survey is the largest L -band survey and the largest survey (including shorter wavelengths) that exclusively targets young stars with either PPDs or DEBs. A di- rect comparison between different surveys is not straightforward since there is no uniform metric, the detail of information provided by the various publications differs largely, detection limits (if provided) are derived in different ways, and a correction for small-sample statistics  is not always applied. Harmonising the results regarding the two latter points would require uniform re-reduction of all data for direct comparison.
We focus our comparison here on the following surveys: (1) Rameau et al. (2013a), a deep L survey with NACO and with the largest target overlap, (2) LEECH, the newest and deepest L survey with a small target overlap (northern sky, Stone et al. 2018), the two largest DEB samples from (3) the Gemini/NICI planet-finding campaign (H-band, Wahhaj et al. 2013b) and (4) SEEDS (H-band, Janson et al. 2013), and (5) the first 300 stars from GPIES, which did not explicitly targets DEB stars but still has a large overlap with our DEB sample (H-band, Nielsen et al. 2019). The purpose of the following mostly qualitative comparisons is to present our NACO-ISPY survey in the context of existing, comparable large DI surveys. For the reasons mentioned above and below, performance numbers can mostly not be compared directly and serve only to describe the approximate detection space of the different surveys.
Our NACO-ISPY survey shares the same instrument and filter and the largest overlap (30 targets) with the survey of young, nearby, and dusty stars conducted by Rameau et al. (2013a). In contrast to our observing strategy, Rameau et al. (2013a) did not employ a coronagraph and restricted the time per target to 90 min, irrespective of the field rotation achieved. Their median field rotation is of the order of 25 • , while we achieve ∼85 • (Fig. A.1b). Contrast curves in Rameau et al. (2013a) are given in the classical 5 σ scheme, but are not corrected for small-sample statistics. Nevertheless, a direct comparison between the published curves reveals that we achieve on average 0.5-0.7 mag better contrasts at small angular separations of <300 mas, that is, inside the wings of the PSF where the contrast is almost unaffected by the background limit (which the latter authors do not mention, but which is probably similar to what we achieve). Planet mass detection spaces as published appear similar between NACO-ISPY (Fig. 10) and Rameau et al. (2013a, their Fig. 6), but numbers cannot be compared directly because of different calibrations, target star properties, and evolutionary models used (COND vs. BT-Settl in ISPY). For our anticipated final statistical analysis, we will incorporate the Rameau et al. (2013a) data after re-evaluating the Imaging modes: Sat-I: saturated imaging, ADI: angular differential imaging, Cor-ADI: coronagraphic ADI, ASDI: angular and spectral differential imaging. (b) In brackets the number of targets overlapping with NACO-ISPY. (c) A single number represents the median, otherwise the range is given.  Marchis et al. 2016;Macintosh et al. 2018), (12) SpHere Infrared survEy, ongoing (Vigan et al. 2016;Chauvin et al. 2017a,b;Keppler et al. 2018;Müller et al. 2018;and others). (13) DEB stars, (14) Young moving group stars, (15) Young B and A stars. significance of the DEB excess and coherent re-reduction of all data.
The most recently completed and probably deepest Lband survey, albeit with only a small overlap (eight targets), is the LEECH) survey conducted with the LBT on the northern sky (Skemer et al. 2014;Stone et al. 2018). LEECH also did not employ a coronagraph, but was able to benefit from the excellent LBTI AO system (Bailey et al. 2014), deformable secondary mirrors, and the simultaneous but not interferometric use of two 8.4 m mirrors. LEECH also performed ADI and achieved a mean field rotation of ∼65 • . Stone et al. (2018) show two sets of contrast curves: the classical 5 σ contrasts without small sample correction and small-sample-corrected contrast curves based on 95% completeness, which they use for the analysis. The 'modern' 95% completeness curves produced by these latter authors are on average 0.28 mag less sensitive than their classical 5 σ curves. A direct comparison between the LEECH 5 σ curves and our small-sample-corrected median 5 σ contrasts (Fig. 9) indicates that LEECH achieved significantly better L contrasts at small angular separations (∆L ∼ 9.0 mag at ∆r = 300 mas vs. ∼ 7.6 mag for ISPY), but has an IWA of ∼250 mas, while the coronagraphic NACO-ISPY observations go down to 100-150 mas. Contrasts at larger angular separations are incomparable since they depend on the contrast between target star brightness and background limit (which we do not know for LEECH). The planet mass detection space of the entire LEECH survey as published by Stone et al. (2018) covers a similar SMA/projected separation range to ours, but the depth cannot be directly compared without putting all data into the same model. Although LEECH did not explicitly target stars with DEBs and observed in the other hemisphere, our two surveys have eight targets in common. For our final statistical analysis, we incorporate the LEECH data not only of these eight stars, but also of other (northern) targets which prove to have a significant DEB.
The NICI (Wahhaj et al. 2013b) and SEEDS (Janson et al. 2013) DEB surveys targeted 57 and 50 stars, respectively, of which 31 and 22 are in common with NACO-ISPY. The NICI observing campaign employed a semi-transparent flat-topped Gaussian focal plane mask to reduce scattered light from the central star, which provided an effective IWA of 0 . 32 for faint companions. Wahhaj et al. (2013b) achieve a mean 95% completeness contrast (which for NICI agrees for most stars well with the traditionally used 5 σ contrast; see Wahhaj et al. 2013a) of ∆H ≈ 10.5 mag at 0 . 36. In terms of planet mass detection limit (BT-Settl models, Allard 2014; Baraffe et al. 2015), this compares well with the median 5 σ contrast of ∆L ≈ 8.5 mag which we achieve with NACO-ISPY at this angular separation (cf. Figs. 2 and 9). However, with our coronagraphic NACO-ISPY observations, we achieve a significantly smaller IWA of 100-150 mas, which makes the two surveys truly complementary. The other two sub-surveys of the NICI campaign (Biller et al. 2013;Nielsen et al. 2013) that also have certain target overlap with NACO-ISPY have similar detection limits. The SEEDS observations were done in saturated ADI mode with the saturation extending typically out to a radius of 0 . 3 and contrasts that typically stay below those achieved by NICI. Nielsen et al. (2019) recently published a statistical analysis of the first 300 stars observed by the Gemini Planet Imager Exoplanet Survey (GPIES). Although GPIES did not explicitly target DEB stars, the survey has 60 targets (mostly DEB) in common with NACO-ISPY (full sample). GPIES achieves a coronagraphic IWA at H-band of 0 . 12, which is very similar to what NACO achieves at L -band. Nielsen et al. (2019) quantify the achieved contrasts in terms of standard deviation (8 σ at < 0 . 3 and 6 σ further out) based on the matched-filter algorithm described by Ruffio et al. (2017) and employ the BT-Settl models to convert these to planet detection thresholds. The detection probability Nielsen et al. (2019) infer for GPIES with the aferomentioned thresholds and models (see their Fig. 4) is ≥50% for companions with ≥ 8 M Jup in the SMA range 10 -100 au and reaches down to 3 M Jup at 3 -200 au at the 10% level. This compares well to our NACo-ISPY detection space albeit reaching somewhat lower SMA owing to the smaller mean distance of their targets.
Other very large ongoing exoplanet imaging surveys that have not yet published summary papers and for which we therefore cannot compare overlap and detection limits with our NACO-ISPY survey include SPHERE-SHINE (Chauvin et al. 2017a) and and Project 1640 (Hinkley et al. 2011). Our sensitivity map shown in Fig. 10 agrees very well (50% contour) with the mean sensitivity map for FGK stars derived by Bowler (2016) from the meta-analysis of 384 stars with published high-contrast imaging observations.

Summary and outlook
We present an overview of the NACO-ISPY DI survey for planets around young stars, its scientific goals, observation strategy, targets, and data-reduction scheme. Below we summarise the performance and preliminary results from the first 2.5 years of observations.
-With NACO-ISPY, we target ≈ 200 young (median age 30 Myr) and nearby (median distance 60 pc) stars that are either surrounded by a gas-rich PPD with indications for inner holes or gaps (∼50 stars), or by somewhat older wellcharacterised DEBs (∼150 stars). -During the first 2.5 years of the survey, from December 2015 through February 2018 (ESO periods 96 through 100), we observed 112 target stars (34 PPDs and 78 DEBs). -All observations are carried out with the NACO L27 camera at the VLT, with L filter and in pupil-tracking ADI mode. For brighter stars (L 6.5 mag), the AGPM vector vortex coronagraph is used; fainter stars are observed in saturated mode. The thermal sky background is derived by switching to an offset sky position every 8 min when the AGPM is used. Otherwise, a dither pattern is used. The ADI observation is enclosed by unsaturated flux measurements of the target star itself. -We typically spend 2-4 hr on one source around meridian passage and achieve field rotations of typically 90 • ± 20 • . With this approach, we reach a mean planet-star 5 σ contrast of ∆L ∼ 7 mag at 0 . 2 and a background detection limit of ∼16.5 mag at >1.5 . -All data are homogeneously reduced using the GRAPHIC pipeline (Hagelberg et al. 2016a), which was optimised for NACO-ISPY data. After initial cleaning and corrections for defective detector columns and pixels, a PCA-based skysubtraction scheme with five modes is used to remove the background from each image. For the coronagraphic data sets, we employ a novel centring procedure to determine a posteriori the (drifting) position of the AGPM on the detector and the position of the star behind the AGPM. The star position for the non-coronagraphic data is calculated by performing a Gaussian fit to the PSF in each individual exposure. Automatic criteria are used to iteratively remove bad frames and outliers. The data are then centred and medianbinned (∼100 frames or 35 sec), and a PCA-based algorithm is used to subtract the star. Frames are then derotated using Fourier transforms and median combined to produce the final image. -One-dimensional noise curves are estimated from the standard deviation of the pixel values in 1 FWHM-wide annuli and converted to 5 σ contrast curves using the peak stellar flux from the unsaturated flux images. The contrast curves are corrected for small sample statistics, for the transmission of the AGPM coronagraph measured on-sky, and for the throughput of the PCA algorithm. -We evaluate the planet mass detection space of our survey by combining the achieved 5 σ ∆L contrast curves with actual stellar parameters, BT-Settl evolutionary models, and MC survey simulations. The mean detection probability of our survey is >50% for companions with M 8 M Jup in the SMA range 80 -200 au and M > 13 M Jup at 30 -250 au and compares well to the detection space of other state-of-the-art high-contrast imaging surveys (Bowler 2016). -While we have not yet been able to independently discover and confirm a new planet in our target sample, our observations have already contributed to the characterisation of two new planets originally discovered by SPHERE: HIP 65426 B (Chauvin et al. 2017b;Cheetham et al. 2019) and PDS 70 B Müller et al. 2018). We discovered two new close-in low-mass stellar companions around the young PPD star R CrA (Cugno et al. 2019b) and within the DEB around HD 193571 (Musso Barcucci et al. 2019). Around at least two other stars (HD 72660 and HD 92536), we find and astrometrically confirm previously unmentioned close (< 1 ) low-mass co-moving stellar companions. -We detected scattered light at L -band around at least nine PPD stars and two DEB stars. Six of these discs had never before been imaged at L -band. -As data-reduction improvements, candidate identification, and follow-up characterisation are still ongoing, further discoveries may be revealed even from the data already presented in this paper. -The NACO-ISPY GTO survey observations with a total budget of 120 nights will be completed in late 2019 with a total of approximately 200 stars imaged at L -band. A statistical analysis of the entire survey and the synthesis with complementary observations (see below) will be presented in an upcoming paper.
To complement our GP discovery space towards smaller separations (<1-2 au), we are currently carrying out a complementary RV survey for planets around the DEB stars from the NACO-ISPY list (RVSPY, Zakhozhay et al., in prep.). On a timescale of 2-3 years, the synergy of these two surveys will provide us with the most complete census of GPs around young DEB stars to date, although there will still be a mass sensitivity gap for lower-mass GPs in the DI-probed region ( 10 au) and a general sensitivity gap in the 2-10 au region. This is where astrometry will come into play. The anticipated final full release of the Gaia data (individual measurements) will provide us in the near future (no specific release time has yet been given) with the necessary data to reveal the still incompletely known GP population in the 3 -5 au separation range (e.g. Casertano et al. 2008). The data of both stars are combined after normalising the stellar fluxes (see text for explanation). The error bars of the flux measurements are smaller than the size of the symbols. The exponential fit on simulated data by Mawet et al. (2005) is shown as a dashed grey line for comparison.
To measure the on-sky off-axis AGPM transmission, two bright stars were observed for comparison. HD 146624 was observed during the night of May 19, 2017, and HIP98421 was observed during the night of August 31, 2017. The instrument setup and observation are similar to standard AGPM observations. The star was placed manually in vertical and horizontal direction around the AGPM with the goal of a one-pixel sampling. Sampling at different position angles is not necessary because of the perfect circular symmetry of the AGPM. For HD 146624 and HIP 98421, 45 and 30 data cubes were recorded, respectively, each containing 400 frames. Data cubes recorded with the star centred behind the AGPM or of bad data quality were rejected in the analysis, which leaves 40 useful cubes for HD 146624 and 29 for HIP 98421. A sky observation was taken after every five cubes.
The sky frames were cosmetically reduced including dark subtraction, flat field, and bad pixel correction. This was necessary for determining the position of the AGPM centre on the detector. In the sky frames, the AGPM centre appears similar to a (faint) stellar source due to thermal emission. The AGPM centre was fitted by a Moffat function for each frame in each cube. The final position of the AGPM centre per cube was retrieved by computing the median and the standard deviation of the individual measurements.
The data cubes with the object were cosmetically reduced including sky subtraction, flat field, and bad pixel correction. The star in each frame and cube was fitted by a Moffat function to get its position and ultimately its radial distance from the AGPM centre. To measure the flux of a star, aperture photometry was performed using an aperture radius of 4×FWHM 10 for the source, and an annulus of 6-10×FWHM for the background. The final flux value for each cube is the weighted average of the individual flux measurements.
The off-axis transmission of the AGPM, T (r), has a theoretical profile of the form 1 − exp(−2 · r 2 ), where r is the radial distance of the AGPM centre in resolution elements λ/D (Mawet et al. 2005). For measuring the empirical transmission curve, T * (r), we therefore use the form T * (r) = 1 − exp(−a · r b ) . (B.1) As both the position and flux have uncertainties, an MC approach was used to determine the fit parameters a and b. For both stars, the measured flux as a function of radial distance from the AGPM centre was fitted using eq. B.1 with the additional fluxscaling parameter c, but without weighting of the data points or taking into account uncertainties. Instead, 10 5 new data sets were generated by randomly drawing new measurements from each data point taking its uncertainty into account. The final fit value for each variable was obtained by computing the bootstrap mean of the 10 5 measurements by generating 10 5 bootstrap samples. The uncertainty in the fit parameters is the standard deviation of all measurements. The explained fitting procedure on the individual stars was used only to determine the fit parameter c, which denotes the stellar flux. This is required before combining both data sets because the stars have different fluxes and c normalises the fluxes of both stars to one. The fitting procedure was then applied to the combined normalised data set (Fig. B.1). The final values and uncertainties of the fit parameters are: a = 0.586 ± 0.022, b = 1.330 ± 0.059. Measurements of the plate scale and the true north orientation on a regular basis are essential to warrant accurate and precise astrometric values of identified candidate companions. The main stellar field used for astrometric calibration is 47 Tuc as it usually provides more than 100 stars under good atmospheric conditions in the field of view of NaCo. An alternative field is the Orion Trapezium Cluster, but with significantly fewer sources visible in L'.
Astrometric fields are observed since the start of the survey in December 2015 with and without (satPSF) the AGPM. Table C.1 lists typical values used for the instrument setup. The actual values can vary because of the brightness of the sky background. In addition, we refined the observing template with time. The goal during each observing run is to observe an astrometric field with and without the AGPM in both field and pupil tracking mode, respectively.
The image reduction steps for the AGPM and satPSF mode are similar and include: bad pixel and flat field correction, and sky subtraction. Since we are recording up to six sky cubes, the frames and cubes are median-combined to a single image to remove any stellar sources present. Each object cube (the actual astrometric field) is treated individually. The final analysis is based on the average image for each individual cube. If the data were recorded in pupil tracking mode, the frames are derotated before averaging them.
After the cosmetic reduction, stellar sources in the image are identified with a version of DAOPHOT implemented in IDL. Marginal Gaussians are used to find centroids. To refine the result and reject bad detections, each initially found source is fitted with a Gaussian function. The fitted FWHM values of both directions and the fitted position are used to reject non-stellar sources. For the remaining sources, the x and y pixel coordinates are converted into RA and DEC using the WCS information written in the fits header and the separation and position angles between all stars are computed. To compare these values with a reference, we use the catalogue of McLaughlin et al. (2006) for 47 Tuc and the catalogue of Close et al. (2012) for the Trapezium. The catalogue entries (RA, DEC) are corrected for proper motion if present and converted to x,y-coordinates using the WCS information written in the fits header and the separation and position angles between all catalogue stars are computed. This allows us to identify the detected stars in the image with their corresponding catalogue entry. The derived value for the plate scale between one pair of stars is then the distance from the catalogue pair in milliarcseconds divided by the measured distance of the same pair detected in the image. The true north value is computed from the difference between measured position angle of a stellar pair and its respective catalogue entry. For 100 stars detected in the image, there are therefore over 4400 individual measurements available. The final values for plate scale and true north including their uncertainties are listed in Table C.2 and were produced by computing the two-sigma-clipped resistant mean.