High-redshift quasar selection from the CFHQSIR survey

Being observed only one billion years after the Big Bang, z ~ 7 quasars are a unique opportunity for exploring the early Universe. However, only two z ~ 7 quasars have been discovered in near-infrared surveys: the quasars ULAS J1120+0641 and ULAS J1342+0928 at z = 7.09 and z = 7.54, respectively. The Canada-France High-z Quasar Survey in the Near Infrared (CFHQSIR) has been carried out to search for z ~ 7 quasars using near-infrared and optical imaging from the Canada-France Hawaii Telescope (CFHT). Our data consist of $\rm{\sim 130\,deg^{2}}$ of Wide-field Infrared Camera (WIRCam) Y-band images up to a 5{\sigma} limit of $\rm{Y_{AB}}$ ~ 22.4 distributed over the Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) Wide fields. After follow-up observations in J band, a first photometric selection based on simple colour criteria led us to identify 36 sources with measured high-redshift quasar colours. However, we expect to detect only ~ 2 quasars in the redshift range 6.8<z<7.5 down to a rest-frame absolute magnitude of $\rm{M_{1450}}$ = -24.6. With the motivation of ranking our high-redshift quasar candidates in the best possible way, we developed an advanced classification method based on Bayesian formalism in which we model the high-redshift quasars and low-mass star populations. The model includes the colour diversity of the two populations and the variation in space density of the low-mass stars with Galactic latitude, and it is combined with our observational data. For each candidate, we compute the probability of being a high-redshift quasar rather than a low-mass star. This results in a refined list of the most promising candidates. Our Bayesian selection procedure has proven to be a powerful technique for identifying the best candidates of any photometrically selected sample of objects, and it is easily extendable to other surveys.


Introduction
Quasars reside at the centres of active galactic nuclei (AGNs) and are believed to be powered by mass accretion onto a supermassive black hole (SMBH). Thanks to their strong intrinsic luminosity, quasars are bright enough (L ∼ 10 14 L for the most luminous quasar ever discovered at z = 6.30, Wu et al. 2015) to be detected at high redshifts, where they can be used to explore the early Universe. Along with other cosmological probes, high-redshift quasars (hereafter, "high-redshift" refers to redshifts z 5.6) have proved to be powerful tools for studying not only the epoch of cosmic reionisation (e.g. Fan et al. 2006;Jiang et al. 2008;Becker et al. 2015a), but also for investigating the formation and evolution of primordial SMBHs (e.g. Willott et al. 2010a).
The relations between luminosity, black hole mass, and broad emission line velocity have been demonstrated at low redshifts by reverberation mapping studies (e.g. Kaspi et al. 2000;Vestergaard 2002;Bentz et al. 2009). Spectroscopy of highredshift quasars (high-z quasars) can therefore be exploited in order to measure SMBHs masses, estimate their mass function, and place solid constraints on SMBH formation models, under the assumption that these local relations are still valid at high redshifts. Using this technique, it has been possible to confirm black holes with masses exceeding M BH 10 9 M at redshifts z 6 ( Mortlock et al. 2011;Wu et al. 2015;Bañados et al. 2018). How such massive objects could have formed so quickly in less than 1 Gyr after the Big Bang is still a fundamental question arising from these discoveries (see reviews by Volonteri 2010;Haiman 2013;Smith et al. 2017). Several scenarios for the formation of SMBHs seeds, including, for instance, remnants of Population III stars (e.g. Madau & Rees 2001;Volonteri et al. 2003) or a direct collapse of gas in atomic cooling halos (DCBH, e.g. Visbal et al. 2014;Smidt et al. 2017) have been proposed, but they are widely debated, partly because known bright high-z quasars are likely to be the tip of an iceberg that is mainly composed of fainter quasars.
The cosmic reionisation that ended the so-called dark ages during which the first sources of the Universe ionised the hydrogen in the intergalactic medium (IGM) is a major event in the Universe's history, and many questions related to the onset, duration, topology, and the sources responsible for this process remain unsolved. High-redshift quasars can be used as background sources whose UV radiation is absorbed at the resonant Lyman lines by intervening neutral hydrogen along the line of sight. Their spectra are therefore reliable tools for absorption studies since they show spectral signatures of the IGM state. The hydrogen neutral fraction x HI can indeed be determined by making a range of measurement on high-z quasar spectra, such as the Gunn-Peterson test (e.g. Gunn & Peterson 1965;Fan et al. 2006;Becker et al. 2015b), near-zone measurement (e.g. Venemans et al. 2015;Mazzucchelli et al. 2017), dark gaps and dark pixels statistics (e.g. McGreer et al. 2015), and Lyman-α damping wing reconstruction (e.g. Greig et al. 2017). Measurements of the quasar luminosity function (QLF) at z ∼ 6 have shown that the quasar ionising flux was not sufficient to keep the Universe ionised, with a photon rate density lower by between 20 and 100 times than required (Willott et al. 2010b). Even though the faint end of the QLF remains poorly constrained, it is now generally agreed that AGNs do not contribute significantly to the required ionising photon budget at z ∼ 6 (Onoue et al. 2017).
If there are compelling reasons to search for high-z quasars, the quest is no less challenging because of their rarity: Willott et al. (2010b) predicted a number of quasars of the order of ∼ 0.1 deg −2 brighter than H AB 24 in the redshift range 6.5 < z < 7.5. In the past few years, substantial progress has been made in this field, where more than 100 high-z quasars have been identified (Bañados et al. 2016). Wide area surveys greatly contributed to this success, with surveys first carried out in optical bands, such as the Sloan Digital Sky Survey (York et al. 2000, SDSS ) and the Canada-France High-z Quasar Survey (CFHQS; Willott et al. 2005Willott et al. , 2007Willott et al. , 2009Willott et al. , 2010b, providing more than 50 quasars up to redshifts z 6.4. At higher redshifts, the Lyman-α emission line is shifted into the near-infrared (NIR) and becomes undetectable at observed wavelengths λ obs 0.9 µm because of IGM absorption; this makes the use of near-IR bands necessary. Ongoing near-IR surveys such as the United Kingdom Infrared Telescope (UKIRT) Infrared Deep Sky Survey (UKIDSS; Lawrence et al. 2007), the Visible and Infrared Survey Telescope for Astronomy (VISTA) Kilo-degree Infrared Galaxy(VIKING; Emerson et al. 2004 (Miyazaki et al. 2012;Matsuoka et al. 2016) have started to increase the number of known z 6.5 quasars by employing filters centred on 1 µm. The use of such data allowed Mortlock et al. (2011) to discover the previous redshift record holder, ULAS J1120+0641 at z = 7.09, before it was superceded by ULAS J1342+0928 at z = 7.54, for which Bañados et al. (2018) mined data from the UKIDSS Large Area Survey (Lawrence et al. 2007), the Wide-field Infrared Survey Explorer (ALL-WISE, Wright et al. 2010), and the DECam Legacy Survey (DE-CaLS 1 ). However, despite all these efforts, these two quasars are the only ones found at redshift z > 7 to date. The Canada-France High-z Quasar Survey in the Near Infrared (CFHQSIR) has been designed to find more z ∼ 7 quasars that can be used to constrain the reionisation epoch as well as the initial growth of SMBHs. The inevitable contamination of any photometric sample by Galactic low-mass stars (main-sequence stars and brown dwarfs) increases the difficulty of finding such rare objects. Colourcolour cuts are commonly used for high-z quasars searches to separate the two populations of sources (e.g. Willott et al. 2005;Venemans et al. 2013). However, the photometric noise implies that the resulting candidate list is likely to be still dominated 1 http://legacysurvey.org/decamls/ by low-mass stars, which are much more numerous than high-z quasars. Here we adopt a powerful classification technique that is based on Bayesian inference. It was first developed by Mortlock et al. (2012) but was also applied by Matsuoka et al. (2016), who discovered 39 quasars in the redshift range 5.7 ≤ z ≤ 6.9 (Matsuoka et al. 2016(Matsuoka et al. , 2018. The technique allows assigning to each candidate a probability of being a high-z quasar rather than a low-mass star by combining all the information (observations and prior knowledge) available for each source in an optimal way. Although this approach has not been frequently used, it has proved to be a powerful method since Mortlock et al. (2011) successfully found the second most distant quasar at redshift z = 7.09. This paper describes an improved method for searching for z ∼ 7 quasars, and we report our initial results. In the next section we present the CFHQSIR data and our colour selection criteria. In Sect. 3 we describe our photometric candidate selection and detail our Bayesian method of classifying candidates. Our spectroscopic observations and initial results are presented in Sect. 4. Section 5 summarises these results and presents our conclusions. All magnitudes in the optical and near-IR bands in this paper are in the AB system. Cosmological parameters of H 0 = 68 km s −1 Mpc −1 , Ω M = 0.31 and Ω Λ = 0.69 are assumed throughout (Planck Collaboration et al. 2016).

Searching for high-z quasars in the CFHQSIR survey
At high redshifts, neutral hydrogen in the IGM induces a complete absorption of the quasar flux blueward of the Ly-α line at λ Lyα = 1216 Å. A z ∼ 7 quasar has a clear photometric signature: it is relatively bright in the NIR above 1 µm and is totally extinguished in all optical bands. We therefore searched for z ∼ 7 quasars through Y-band imaging of the CFHTLS Wide fields for which deep optical data exist. We briefly describe our CFHQSIR Y-band data in Sect. 2.1. We discuss our colour selection criteria in Sect. 2.2 .

CFHQSIR survey
The CFHQSIR is a CFHT Large Program carried out at the 3.6m CFH telescope with the Wide field IR camera (WIRCam) from 2011 to 2013. It aims to extend the highly successful 5.8 < z < 6.5 CFHQS survey (Willott et al. 2005(Willott et al. , 2007(Willott et al. , 2009(Willott et al. , 2010b to higher redshifts. The CFHQSIR covers ∼ 130 deg 2 over the four CFHTLS Wide fields up to a 5 σ limiting magnitude of Y lim ≈ 22.4 for point sources and is characterised by an average image quality of 0.7 . The data were obtained in two epoch observations separated by at least 20 days in order to discard transient sources and slow-moving objects. We refer to Pipien et al. (2018) for further details about the CFHQSIR data. Assuming the quasar luminosity function derived by Willott et al. (2010b) at z = 6 and considering a pure density evolution with redshift, we estimated the expected number of high-z quasars in the CFHQSIR survey. About 2.3 quasars are expected in the redshift range 6.78 < z < 7.48 down to M 1450 = −24.6. A completeness correction in magnitude and redshift, as computed in Pipien et al. (2018) (Fig. 12) and in Sect. 2.2.2 ( Fig. 2), respectively, was applied to derive the predicted number. This number decreases from 2.3 to 1.4 when considering the quasar luminosity function and evolution from Jiang et al. (2016), for which a steeper redshift evolution of the quasar density between z ∼ 5 and z ∼ 6 was demonstrated. Given this small number, it is therefore necessary to adopt a reliable selection procedure to efficiently identify and classify high-z quasar candidates. This is further described in Sects. 2.2 and 3.

Simulating the colours of quasars and low-mass stars
Our selection of z ∼ 7 quasar candidates is based on a combination of our CFHQSIR NIR (Y band) and optical data from the 7th and final release (T0007) of the CFHTLS (u-, g-, r-, i-, z-bands) produced by Terapix. At redshifts z 6.0, the strong break across the Ly-α line known as the Gunn-Peterson effect makes colour selection the most efficient way for selecting highz quasars over large areas. At redshifts z 6.5, quasars can therefore be selected as z-band dropouts as their spectrum is totally extinguished in the u, g, r, and i bands. As demonstrated by Venemans et al. (2013) for the VIKING survey and by Mortlock et al. (2011) for UKIDSS, the most efficient way of distinguishing high-z quasar and low-mass stars, their main source of contamination, is to use J-band photometry. In order to optimise our colour selection criteria, we performed simulations of the expected optical and near-IR colours of quasars and lowmass stars as measured with the CFHT MegaCam (z filter) and WIRCam cameras (Y and J filters). The simulations are detailed in the following subsections.  Table 2) of the distribution of their weights to simulate quasar mock spectra. We decided to consider only the first four components since 95 % of the variance affecting the 78 quasar spectra are represented by these components (I. Pâris, private communication). We assumed that the spectral index, equivalent width, and intensity of the quasar emission lines at redshifts z ∼ 3 are representative of the z ∼ 7 quasar population. The intrinsic quasar luminosity plays an important role in validating this hypothesis since this quantity is anti-correlated with the line equivalent widths (Baldwin effect, Baldwin 1977). As observed with the two most distant quasars ever discovered at redshift 7.1 and 7.5 (Mortlock et al. 2011;Bañados et al. 2018), quasar spectra at low-and high-z match remarkably well: the intrinsic luminosities of high-z quasars coincide with those observed at lower redshifts and thus justify the use of z ∼ 3 quasar spectra. We generated 1 000 rest-frame quasar spectra, redshifted them in the redshift range 6.2 ≤ z ≤ 7.6, and introduced IGM absorption assuming the transmission function of Meiksin (2006), where the transmitted flux at z 6.5 was set to zero blueward of the Lyα line due to the Gunn-Peterson trough. We then convolved the 1 000 simulated quasar spectra with the z (MegaCam), Y, and J (WIRCam) CFHT filters including the atmospheric transmission. Fig. 1 shows the (z − Y) and (Y − J) colours obtained for all simulated quasar spectra. As expected, the quasar track shows a strong evolution of the (z − Y) colour with redshift, reflecting the movement of the Ly-α break through the z filter. At redshifts z 7.0, the Ly-α emission line is in the z band, and it enters the Y band at z ∼ 7. At higher redshifts, quasars become rapidly fainter in the z band and therefore redder in (z − Y). At redshifts z 7.4, the Ly-α break is redward of the z filter so that these quasars are entirely extinguished in this band. in the CFHT filters for simulated quasars and observed low-mass stars from the SpeX Prism library (triangles in yellow, brown, and red show M, L, and T dwarfs, respectively). The blue curve represents the colours of the mean quasar derived by Pâris et al. (2011) for which IGM absorption is included. The grey curves show the tracks in colourcolour space of the 1 000 quasars redshifted from z = 6.2 to z = 7.7. The blue dashed lines correspond to the colour criteria chosen to select quasars of redshift z ∼ 7.

Low-mass stars
We determined the expected colours of main-sequence stars and brown dwarfs (M, L, and T spectral types) based on a sample of observed spectra from the SpeX Prism library 2 . Only spectra covering the z (MegaCam), Y, and J (WIRCam) filters were used. More than 1 300 spectra of sources with spectral types from M3 to T9 were exploited. The resulting colours computed in these filters are shown in Fig. 1. M and L-dwarfs represent a minor source of contamination compared to the much cooler T-dwarf population, which shows very similar (z − Y) colours to z ∼ 7 quasars. However, the spectral distribution of T-dwarf peaks in the J band, leading to a significantly redder (Y − J) colour than quasars and thus allows a clear separation between the two populations of objects. The box bounded by the blue dashed lines in Fig. 1 indicates our colour-colour high-z quasar selection criteria: (1)

Colour-selection completeness
To study the quasar selection efficiency as a function of redshift more precisely, we followed the approach developed by Willott et al. (2005) by calculating the fraction of artificial quasars that satisfies the criteria of Eq. 1 as a function of redshift. In order to take into account photometric errors affecting the measurement of the (z − Y) and (Y − J) colours, each simulated quasar colour was described by a Gaussian probability density distribution with a standard deviation noted σ c . We note, however, that To take into account the correlated noise, each photometric error was multiplied by a corrective excess noise factor derived for each image (see Pipien et al. 2018, for further details). Figure 2 allows us to define the redshift range probed by our survey for a given completeness. Adopting σ c = 0.5 as representative of the errors in colours for our sources, the CFHQSIR redshift range corresponding to a completeness greater than 50% is 6.78 < z < 7.48.

Initial candidate selection
Using the SExtractor software (Bertin & Arnouts 1996), we produced catalogues of objects detected in the Y-band images. We first ran SExtractor in dual-image mode for source detection in Y-band images and photometry in all bands (u, g, r, i, z, and Y). We set the detection threshold to 1.5 σ above the background and required at least four connected pixels for a measurement. This double analysis of our near-IR and optical data reveals a total of ∼ 6 million sources detected in Y band. We then applied some automatic cuts including geometric and photometric criteria in order to keep the reddest point-like sources and remove the faintest objects in Y band. We set the following criteria: no detections in the u, g, r, and i filters, (z − Y) > 1.0, Y ≤ 23.0 (M 1450 −24.0), and a signal-to-noise ratio (S/N) Y 4. All magnitudes were measured using the SExtractor MAG_AUTO parameter. In total, ∼ 54 000 objects satisfied these criteria and were selected. They were then inspected by eye in order to discard artefacts. We note that the main source of non-physical contamination comes from persistences. These effects occur in IR detectors when a saturated star observed in a previous exposure leaves a remnant image in the following exposures. These spurious sources appear in two different forms in our Y-band images. When the persistence is generated during an exposure with no connection with our observations, a single spot appears in our Yband image. Since our observations were split into two epochs separated by at least 20 days, these spurious sources were easily spotted by inspecting our individual Y-band images before stacking (for more details, see Pipien et al. 2018). The second form of persistence contaminating our data arises directly during our observations between two dithered exposures. This type of persistence is also easily identified since Y-band images show bright spots aligned according to the dither pattern. After eliminating these artefacts from our candidate list, we repeated the photometric measurements using SExtractor in single-image mode, which optimises the detection threshold and recentres the apertures. This process allowed us to remove sources with proper motion. We then again applied the same colour criteria as previously ((z − Y) > 1.0 and no detection in the u, g, r, and i filters) with these new measurements and selected 228 sources in this way. As discussed in Sect. 2, J-band photometry is necessary to distinguish high-z quasar candidates from low-mass stars. Jband follow-up observations were carried out at CFHT using the WIRCam camera, but also the SOFI (son of Isaac) and LIRIS (Long-slit Intermediate Resolution Infrared Spectrograph) cameras mounted on the 3.6m New Technology Telescope (ESO NTT) and the 4.2m William Herschel Telescope (WHT), respectively. The detectors used by SOFI and LIRIS are HgCdTe 1024 × 1024 Hawaii arrays with pixel scales of 0 .288. and 0 .251, respectively. LIRIS data were reduced using the Image Reduction and Analysis Facility (IRAF 3 ), and SOFI images were reduced with the SOFI pipeline 4 . The photometric calibration for both datasets was performed following a method similar to that presented in Pipien et al. (2018). We astrometrically calibrated the LIRIS images with the SCAMP 5 software and used the online service Astrometry.net for SOFI images. We combined our follow-up observations with existing data when available, coming from the Canada France Brown Dwarfs Survey (CFBDSIR, J band; Delorme et al. 2008), the Visible and Infrared Survey Telescope for Astronomy (VISTA-VHS, J-, H-, and K s -bands; McMahon et al. 2013), and the Vimos Public Extragalactic Redshift Survey (VIPERS-Multi-Lambda Survey, K s band; Moutard et al. 2016). This allowed us to apply the colour cuts defined in Eq. 1 and to confirm the reality of these objects. We identified a total of 36 high-z quasar candidates that are represented in the (z − Y) versus (Y − J) diagram of Fig. 3. Magnitudes were measured in 2.5 apertures, and aperture corrections were applied to give the total magnitudes. Non-detected objects in the z and/or J band (arrow points in Fig. 3) show colours outside the dashed selection region. The colour cuts defined in Eq. 1 were not applied to these sources, as they only have a 5 σ magnitude lower limit in these bands. Two sources in the W4 field showed blueer (Y − J) colours ((Y − J) < −0.5) than the other candidates. Although they were not detected in the z or J filters, they are not spurious objects and remain candidates because they were both detected in the VIPERS K s band. As shown in Fig. 1, in the absence of photometric noise, high-z quasars and low-mass stars are expected to occupy a well-defined region in colour-colour space so that it is, in principle, possible to distinguish the two populations of objects by applying simple colour cuts. However, because high-z quasars are outnumbered by brown dwarfs and large photometric errors affect the faintest sources, this implies that most of our candidates are likely brown dwarfs scattered in the selection box defined in Eq. 1. In the next section, we present an alternative approach to the application of hard cuts based on Bayesian inference, which allowed us to classify our candidates in the best possible way according to their probability of being a high-z quasar rather than a brown dwarf.

Bayesian classification method
The Bayesian classification method allows introducing priors (e.g. the relative number of high-z quasars compared to the number of brown dwarfs), as opposed to the posterior distribution, which also includes observational data. The strength of this method is that a ranking of all candidates can be optimally established in a coherent statistical framework by combining all the information available for each object. In this section, we adapt and extend the Bayesian approach developed by Mortlock et al. (2012) to the CFHQSIR survey. We assumed that the most important contamination comes from the low-mass stars and that artefacts and any other sources of contamination have been removed from our candidates list. We therefore considered only two population types: the high-z quasars, and the low-mass stars, for which we developed parametric models (see Sects. 3.2.1 and 3.2.1). Given a photometric dataset, we need to define the probability that a source detected in the CFHQSIR survey is a high-z quasar rather than a low-mass star. The answer is given by the conditional posterior probability, which can be estimated with the Bayes theorem by comparing parametric models for high-z quasars and low-mass stars. For a detected source with photometric measurements − → d , this probability is defined by where the weighted evidence terms are given by Here the subscripts q and s denote the high-z quasars and lowmass stars, respectively. The vector − → p corresponds to the parameters used to model the quasar and stars populations. The quantity ρ q/s ( − → p ) represents the surface density of quasars or lowmass stars as a function of their intrinsic properties − → p . The functions Pr(det | − → p , q/s) and Pr( − → d | − → p , q/s) are the probability that the source (quasar or star) is detected (det) and the probability that the source is observed with the photometric measurements − → d , each as a function of − → p .
As pointed out in Mortlock et al. (2012), because the probability of a source to be a high-z quasar cannot be determined without at least some observational information (e.g. colour measurements), the assumption is justified that the source has to be detected in the survey (in at least the Y band in our case). This constraint does not, however, exclude the possibility that some candidates may be not detected in one or more photometric bands. This is especially important for high-z quasars, which can have negligible flux blueward of their Ly-α emission lines. Furthermore, a non-detection can lead to negative flux; this effect di-rectly results from the photometric noise and is described in detail in Mortlock et al. (2012). Given that such measurements cannot be converted into traditional logarithmic magnitudes (Pogson 1856), we decided to follow the approach developed by Mortlock et al. (2012) by working in terms of flux units. In the following, for each population, stars or quasars, we therefore considered the observational dataset given by − → d = F z obs , F Y obs , F J obs , where F z obs , F Y obs , and F J obs are the observed flux of the candidates in the z, Y, and J bands, respectively. In the case of nondetections (in z or J), a flux is measured at the exact position where the object has been detected in Y band, using the SExtractor software in double-image mode.

Quasar population
Our prior knowledge is given by the surface density of quasars ρ q ( − → p ), which can be calculated with the quasar luminosity function of Willott et al. (2010b), noted Φ(M 1450 , z). This function suggests the use of two parameters for the quasar population modelling: the rest-frame absolute magnitude M 1450 , and the redshift z. However, we chose to work with the Y-band apparent magnitude instead of the absolute magnitude for practical reasons. The parameter set that describes the high-z quasar population is thus given by − → p = {Y mod , z}, where Y mod represents the modelled Y-band magnitude of the quasar. In the following, the subscript mod refers to fluxes and magnitudes estimated for the high-z quasar and low-mass star models. The surface density in mag −1 sr −1 redshift −1 can be written as a function of these two parameters: where µ is the distance modulus defined as a function of the luminosity distance D L by µ = 5log 10 D L /10 pc , K corr (z) is the K-correction that converts the absolute magnitude at rest-frame 1450 Å into the observed Y-band magnitude, and 1/4π × dV c /dz is the comoving volume element per steradian and per redshift interval dz.
For a given high-z quasar spectrum q i , the weighted evidence in mag −3 deg −2 is Since the Y band is the detection band of our survey, we modelled the probability Pr(det|Y mod , z, q i ) with the completeness rate derived in Pipien et al. (2018), which corresponds to a function varying between 0 and 1 and is characterised by a magnitude limit at 80 % completeness of the order of Y lim ∼ 22.5. In the case of faint sources observed photometrically in the NIR, the Poissonian photon noise can reasonably be neglected. As most of our candidates are relatively faint (Y ∼ 21.0 − 23.0), their photometric errors are dominated by background noise uncertainties, which are typically described by a normal distribution in flux. Ignoring the inter-band correlations, the probability Pr(z obs , Y obs , J obs |Y mod , z, q i ) is then given by a product of three The arrow points correspond to a 5 σ magnitude lower limit in the case of a non-detection in the z and/or J band.
Gaussians, each associated with one filter (z, Y or J), which, after conversion into magnitude units, is defined as where the index b refers to the three bands considered here, b = (z, Y, J), and N b is the number of bands used. σ b represents the photometric errors following a Gaussian distribution in flux units. The Jacobian used to convert the probability from flux units into magnitude units is where the model flux F b mod in the b band is given as a function of the zero-point flux noted F 0 , We refer to Appendix A2 of Mortlock et al. (2012) for a complete description of the magnitude -flux conversion for probability densities.
In Eq. 5, the surface density ρ q i (Y mod , z) and the probability Pr(z obs , Y obs , J obs |Y mod , z, q i ) are estimated for one single quasar template noted q i . We modelled the diversity of intrinsic properties of high-z quasars by considering 80 quasar templates, generated according to the method described in Sect. 2.2.1. After assigning an equivalent weight to each quasar template q i , we Article number, page 6 of 14 S. Pipien et al.: High-redshift quasar selection from the CFHQSIR survey write the final weighted evidence as the average of the evidences W q i over the N temp = 80 templates: W qi (z obs , Y obs , J obs , det). (9)

Brown dwarf population
Given the colour criteria applied previously (z − Y ≥ 1.5, Y − J ≤ 0.5 and no detections in the u, g, r, and i filters), we determined the dominant source of contamination. To do this, we used a spectroscopic sample of M, L, and T stars from the SpeX Prism library covering the i, z, Y, and J bands. Most of the main-sequence stars are far bluer than high-z quasars, so that they do not represent a significant source of contamination. Conversely, T dwarfs are undetected in the i band, have z − Y colours mimicking z ∼ 7 quasars, and noise can scatter their Y − J colours inside the selection box of Fig. 1. In M and L dwarfs, sources with i − z i lim − z lim = 0.9 (i lim = 24.8 and z lim = 23.9 are the 80% completeness limits for point sources in the i and z bands respectively, from the CFHTLS-T0007 release 6 .) can be undetected in the i band, and therefore they are likely to be scattered into our candidate list. This limit is represented by a vertical dashed line in Fig. 4, where the observed colours of low-mass stars and of our z-band detected candidates are shown. Here again, noise can scatter z − Y colours into our z − Y ≥ 1.5 criterion, and we therefore chose to adopt a more conservative limit for the contamination by low-mass stars of z − Y > 1.0. Figure 4 shows that this approximately corresponds to an L0 spectral type. For this reason, our brown dwarf model is based on a spectroscopic sample of 633 L dwarfs and 180 T dwarfs from the SpeX Prism library, with spectral types from L0 to T9. Hereafter, we assume that these observed spectra are representative of the colour distribution of the entire L-and T-dwarf populations.
We chose two parameters to model the brown dwarf population, the apparent magnitude Y mod , and the spectral type spt: − → p = Y mod , spt .
We computed the brown dwarf surface density ρ s ( − → p ) using the Galactic spatial density model of late-L and T dwarfs developed by Caballero et al. (2008). With this model, we calculated the spatial density distribution of late-type dwarfs at a given Galactic coordinate as a function of luminosity for each spectral type. A complete description of the spatial variation of the density of late dwarfs is given in Caballero et al. (2008), so we here report only the main calculation steps. We considered the standard Galactic model and followed the same approach as Caballero et al. (2008) by only taking into account the Galactic thin disc, which can be modelled by a double exponential law (e.g. Chen et al. 2001). As discussed in detail in Caballero et al. (2008), brown dwarfs from the Galactic thick disc and halo can be neglected for two reasons. First, the thick disc and the halo are more rarified and extended than the thin disc. Secondly, thick-disc and halo stars are thought to be very old (ages 10 Ga, see e.g. Feltzing et al. 2003): they are therefore extremely faint and hardly detectable, according to the general cooling theory for brown dwarfs (see e.g. Baraffe et al. 2003). The space density for an object of spectral type spt 6 http://www.cfht.hawaii.edu/Science/CFHTLS/T0007/ at Galactic coordinates (l, b) and heliocentric distance d is given by where Z is the height of the Sun relative to the Galactic plane, h R and h Z are the length and height scales of the thin disc, respectively, and n 0,spt is the space density at the Galactic plane (Z = 0) and at the distance to the Galactic centre (R = R ). The values of the main parameters of the Galaxy thin disc are listed in Table 1.
Since CFHQSIR observations correspond to extragalactic fields, the Galactocentric distance of the object R(d, l, b) can be approximated as follows, under the assumption R d: In this approximation, Eq. 10 can be rewritten as where the sign convention indicates if d sin(b) is greater or lower than Z , that is, whether the object is above or below the Galactic plane. At null heliocentric distance, the local spatial density of an object of spectral type spt, noted n , is then Fig. 5 shows n as a function of spectral type. For L0-T7 dwarfs, we used the predicted densities reported in Caballero et al. (2008) that we took from Burgasser (2007). The densities for the T8 and T9 spectral types are extrapolated values.  Fig. 4: Observed colours of low-mass stars (spectra from the SpeX Prism library) of spectral types M3 (blue) to T8 (red). Colours of our high-z quasar candidates detected in the z band are also shown as magenta points. The box bounded by the dashed lines defines the spectral types that are likely to contaminate our candidate list. with Based on Eqs. 12 and 14 of Caballero et al. (2008), we computed the brown dwarf surface density per spectral type ρ s (d, spt) (in sr −1 spt −1 ) as a function of the maximum heliocentric distance d with where d * = −Z / sin(b), and NGP and SGP refer to the north and south Galactic poles, respectively. We finally computed the maximum heliocentric distance d beyond which a source with an absolute magnitude M Y mod can be detected, by solving the following equation: where A Y mod corresponds to the attenuation in Y band. Table  2 presents an example of the obtained values for the distance d, considering a brown dwarf with an apparent magnitude Y mod = 22.5. Using Eq. 17 and 18, we estimated the brown dwarf surface density ρ s (Y mod , spt) as a function of the Y-band apparent magnitude for each spectral type in mag −1 deg −2 spt −1 units.
The weighted evidence for the brown dwarf population in mag −3 deg −2 is given by where ρ s (Y mod , spt) is the surface density previously estimated, and the probabilities Pr(det|Y mod , spt) and Pr(z obs , Y obs , J obs |Y mod , spt) are defined in the same way as for the quasar population. We note with N s,spt the number of brown dwarf spectra available for a spectral type spt. We decided not to reduce one spectral type to a unique pair of colour (z mod − Y mod ; Y mod − J mod ) but to consider instead each brown dwarf spectrum individually. This allowed us to estimate and account for the dispersion within a single spectral type. We first computed the weighted evidence associated with the i-th brown dwarf of spectral type spt, in mag −3 deg −2 spt −1 : The probability Pr(z obs , Y obs , J obs |Y mod,i , spt) was calculated in the same way as for the high-z quasar population: using Eq. 6 and associating the modeled flux with the observed brown dwarf colours. Each brown dwarf i contributes the same to the weighted evidence associated with the spectral type spt, so that we can write w s,spt (z obs , Y obs , J obs , det) = 1 N s,spt N s,spt i=1 w s,spt,i (z obs , Y obs , J obs , det). Notes. d is the maximum heliocentric distance for which a brown dwarf of a given spectral type and with an apparent magnitude Y mod = 22.5 can be detected. The absolute magnitudes M Y mod were computed by combining the absolute magnitude in the 2MASS J filter M J 2MASS derived by Dupuy & Liu (2012) for each spectral type, with the observed colours Y mod − J 2MASS measured on the 813 brown dwarf spectra used in this study. We also reference the local spatial density n .
We based our Bayesian classification method on the formalism proposed by Mortlock et al. (2012) and improved the quasar and low-mass star modelling in several aspects that we describe below. Mortlock et al. (2012) used 12 different quasar templates spanning four line-strengths and three continuum slopes to account for the intrinsic diversity of the quasar population, while we based our high-z quasar model on 80 simulated quasar spectra, each with different spectral indices, emission line widths, and intensities. Furthermore, we developed an enhanced brown dwarf model that better reflects their properties. We included the variation in space density with Galactic coordinates, while Mortlock et al. (2012) adopted a model corresponding to an average of the brown dwarf population over the range of Galactic latitudes covered by the UKIDSS LAS. Finally, we included the intrinsic colour spread in Y − J for L and T dwarfs, whereas Mortlock et al. (2012) attributed only one Y − J colour to each spectral type. Our Bayesian probabilistic algorithm is similar to the model developed by Matsuoka et al. (2016) for the SHELLQs project, but has two notable differences. First, Matsuoka et al. (2016) needed to include M dwarfs in their model in addition to L and T dwarfs since they probed a lower redshift range (5.7 < z < 6.9). Secondly, they used a unique quasar spectral energy distribution (SED) from the stacking of 340 bright SDSS quasar spectra at z ∼ 3, while our model uses 80 different spectra created from a principal component analysis of 78 SDSS spectra at z ∼ 3 (see Sects. 2.2.1 and 3.2.1).

Probability of being a high-z quasar: simulations
In this section, we explore the dependence of the probability P q on photometric measurements. For this purpose, we simulated a set of sources uniformly distributed in the z obs − Y obs versus Y obs − J obs diagram with CFHQSIR-like z-, Y-, and J-band measurements. Figure 6 shows the probability P q obtained for sources observed with an apparent magnitude Y obs = 21.5 ± 0.14 (left panel) and Y obs = 22.0 ± 0.22 (right panel). As expected, P q increases when the object approaches the characteristic high-z quasar colours. The comparison between the two panels clearly shows that P q strongly depends on the observed magnitude Y obs , and its high-probability region (in black) is much more prononced for the brightest objects (left panel). This effect is mostly related to the photometric errors: bright objects are measured with a sufficiently high S/N for there to be no ambiguity about their nature. However, faint objects are more likely to be stars measured with high-z quasar colours because of a lower S/N and greater number of stars compared to high-z quasars. This dependency is also demonstrated in Fig. 7, where the probability P q for sources with quasar colours simulated from the mean quasar spectrum discussed in Sect. 2.2.1 at four different Fig. 6: Probability P q of being a high-redshift quasar for simulated sources with an observed magnitude Y obs = 21.5 ± 0.14 (left panel) and Y obs = 22.0 ± 0.22 (right panel). This probability reaches from white (P q = 0) to black (P q = 1). The coloured points represent brown dwarf colours ranging from blue (L0) to red (T9). The black curve is the mean quasar spectrum derived by Pâris et al. (2011), for which we included IGM absorption. redshifts is represented as a function of their observed Y-band magnitude. Well-detected sources have a probability P q = 1, and their nature is unambiguously established. However, for fainter objects, P q drops sharply. As mentioned above for the previous figure, the fact that brown dwarfs outnumber the quasars combined with lower S/N implies that these sources are more easily construed as brown dwarfs scattered into the quasar loci. The redshift dependency lies in the slight variation of the Y mod − J mod quasar colours. As shown in Fig. 1, the Y mod − J mod colour of a z = 7.2 quasar is much closer to the brown dwarf locus (with Y mod − J mod ≈ 0.3) than that of a z = 7.0 quasar, for which Y mod − J mod ≈ 0.1 because the Ly-α line enters the Y band. In the next section, we present our high-z quasar candidate probabilities and discuss our results.

Results
We applied the Bayesian formalism developed in the previous section to our list of 51 high-z quasar candidates. We calculate the P q value for each candidate and report the resulting distribution in Fig. 8. As expected, the main result is that most of our Fig. 8: Histogram of the high-z quasar candidate probabilities. Fewer than 20% of our colour-selected candidates have a probability P q > 0.1.
candidates have a low probability (P q ≈ 10 −2 ). Only 6 of 36 candidates have a chance greater than 10% (P q > 0.1) of being high-z quasars. Three of them have a probability P q > 0.6. This result arises because high-z quasars are outnumbered by low-mass stars and the vast majority of our candidates has a relatively faint Y-band magnitude (Y obs ∼ 22.4 on average). Figure 9 corresponds to the same colour-colour diagrams as shown in Fig. 3, but with the difference that the probabilities P q are indicated according to a colour-code from red (low probability) to blue (high probability). Stars refer to candidates observed spectroscopically. In total, 17 candidates have been followedup in spectroscopy in the W1, W2, and W4 fields using the infrared spectrograph SOFI at NTT (GBF grism blue) and the optical spectrograph FORS2 at the Very Large Telescope (VLT; GRIS_600z+23 grism). Positions, photometry, probabilities, and spectroscopic informations of these sources are given in Table 3. Of these, 14 with P q 0.2 were consistent with being late brown dwarfs. The remaining 3 candidates observed with FORS2 at the VLT need to have further spectroscopic observations in the infrared because their spectral S/N is too low to confidently confirm their exact nature at this moment. However, as shown in Fig. 9, some candidates with a high probability have not yet been observed spectroscopically (one candidate in the W1 field with P q ∼ 0.6 and another in the W4 field with P q ∼ 1.0) because they were judged to be too faint in the z band to be detected in optical spectroscopy with FORS2.

Discussion and conclusions
We have presented a complete method for selecting and classifying z ∼ 7 quasar candidates using optical and NIR photometric data as part of the CFHQSIR survey. After visual inspection and removal of artefacts, we selected 228 candidates with red z − Y colours. The sample was eventually culled to 36 sources by considering candidates with Y − J colours consistent with being high-z quasars after NIR follow-up in J band. We extended and refined the Bayesian formalism developed by Mortlock et al. (2012) in order to improve our selection of high-z quasar candidates. This robust statistical approach allowed us to classify our candidates in the best possible way according to their probability of being a high-redshift quasar. Applying a Bayesian method is indeed very efficient to identify the most promising candidates in any photometrically selected sample since it allows combining the prior knowledge of the brown dwarf and quasar populations and the filter and noise properties of the observational data. Moreover, we demonstrated the importance of acquiring sufficiently precise photometric measurements to clearly determine the nature of the objects. The application of the Bayesian formalism revealed a promising list of six candidates with a probability P q > 0.1. Only three of these have a chance higher than 60% of being a high-z quasar. Even though one of them has been observed spectroscopically with the FORS2 instrument, we were unable to draw any conclusions about its nature because its spectral S/N is low. Our immediate goal is to complete the spectroscopic follow-up of our most promising candidates, according to their probability of being a high-z quasar. The full analysis of our search for z ∼ 7 quasars will be reported in a future paper, including the analysis of our spectroscopic data. A non-detection indicates that the quasar density at z ∼ 7 is at a 90% and 75% confidence level lower than the density inferred from the QLF of Willott et al. (2010b) and Jiang et al. (2016), respectively. Conversely, the discovery of one high-redshift quasar in CFHQSIR would be 23% and 35% consistent with the QLF of Willott et al. (2010b) and Jiang et al. (2016), respectively.
Our Bayesian method can be consolidated in several aspects, however. Early-type galaxies at z ∼ 2 may appear compact at faint magnitudes and therefore represent a second source of contamination that would also need to be modelled precisely, in addition to the low-mass stars. Furthermore, the brown dwarf population could be modelled more accurately by considering the stars from the Galactic thick disc and the halo, in addition to the thin disc. There is also further room for improvement concerning the local spatial density of brown dwarfs. Using observed spatial density measurements instead of model-predicted ones would be interesting (see e.g. Reylé et al. 2010;Kirkpatrick et al. 2011). As the number of known ultracool dwarfs continues to rise, more precise estimates of their spatial densities will be essential ingredients of an improved Bayesian model. Our results may also be biased as they are based on a heterogeneous collection of L-and T-dwarf spectra coming from various surveys, provided by the SpeX Prism library. In particular, our fundamental assumption that the measured colours of these sources are representative of the colours of the L and T populations may no longer be valid when con-sidering unresolved binaries and peculiar sources. The unusual colours of these rare sources, as well as the uncertainty in their spectral classification, can introduce an intrinsic scatter in the colours that is not taken into account in our model. Unresolved binaries can be an important source of uncertainty, as they could represent ∼ 40% of the dwarf population (Liu et al. 2006;Burgasser 2007). To finish, the newly discovered spectral type, the ultracool Y-dwarf class (e.g. Cushing et al. 2011), should also be included in our model, although these sources are likely to be too faint at our limiting magnitude to represent a real source of contamination. Nevertheless, the Bayesian technique presented in this paper remains a powerful approach to efficiently prioritize high-z quasar candidates, and it is moreover easily adaptable to any other highz object surveys, whether they are dedicated to the search for quasars or not (e.g. high-z galaxies in deep fields). This technique will be critical to the discovery of many more distant quasars by future NIR surveys such as the Euclid-wide imaging survey (Laureijs et al. 2011) or the WFIRST High Latitude Survey (HLS, Spergel et al. 2013). Thanks to the high sensitivity of these upcoming surveys, high-z quasars will be identified in an unprecedented number. Scheduled to be launched in 2021 and in the mid-2020s, respectively the Euclid space mission is expected to discover ∼ 30 quasars at redshifts z > 8.1 with J < 22.0 over 18 000 deg 2 , and WFIRST will find ∼ 500 z 8 quasars over 2 000 deg 2 above 10 σ (mag ≈ 26 limit, Spergel et al. 2013). A great challenge for these high-depth surveys will be the increasing number of contamination sources. In addition to the L and T dwarfs, a significant number of Y dwarfs will be detected, and possibly free-floating planets as well (see e.g. Han et al. 2004, for free-floating planets detected by microlensing). The increased number of contamination populations will require careful modeling in Bayesian frameworks such as the one presented in this work. This will be an essential requirement for culling the number of high-z candidates from these surveys before spectroscopic observations with the Extremely Large Telescope (ELTs 7 ) or the James Webb Space Telescope (JWST, Gardner et al. 2006) can be attempted.
Acknowledgements. We thank the anonymous referee for valuable comments and suggestions. Based on observations obtained with MegaPrime/MegaCam, a joint project of CFHT and CEA/IRFU, at the Canada-France-Hawaii Telescope (CFHT), which is operated by the National Research Council (NRC) of Canada, the Institut National des Science de l'Univers of the Centre National de la Recherche Scientifique (CNRS) of France, and the University of Hawaii. This work is based in part on data products produced at Terapix available at the Canadian Astronomy Data Centre as part of the Canada-France-Hawaii Telescope Legacy Survey, a collaborative project of NRC and CNRS. Based on observations made with ESO Telescopes at La Silla Paranal Observatory under programmes ID 095.A-0349, 097.A-0650, 098.A-0524, 099.A-0358 and 0100.A-0061. The WHT is operated on the island of La Palma by the Isaac Newton Group of Telescopes in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofísica de Canarias. The LIRIS photometry was obtained as part of OPT/2015B/26 and OPT/20184/007. This research has benefitted from the SpeX Prism Library and SpeX Prism Library Analysis Toolkit, maintained by Adam Burgasser at http://www.browndwarfs.org/spexprism. This research makes use of the VIPERS-MLS database, operated at CeSAM/LAM, Marseille, France. This work is based in part on observations obtained with WIRCam, a joint project of CFHT, Taiwan, Korea, Canada and France. The CFHT is operated by the National Research Council (NRC) of Canada, the Institut National des Science de l'Univers of the Centre National de la Recherche Scientifique (CNRS) of France, and the University of Hawaii. This work is based in part on observations made with the Galaxy Evolution Explorer (GALEX). GALEX is a NASA Small Explorer, whose mission was developed in cooperation with the Centre National d'Etudes Spatiales (CNES) of France and the Korean Ministry of Science and Technology. GALEX is operated for NASA by the California Institute of Technology under NASA contract NAS5-98034. This work is based in NTT/SOFI GB Grism Blue

Notes.
No quasar was confirmed after these spectroscopic observations. The detailed analysis of the spectroscopic results will be reported in a future paper.
Article number, page 12 of 14 S. Pipien et al.: High-redshift quasar selection from the CFHQSIR survey Fig. 9: Colour-colour diagram that represents our high-z quasar candidates according to a colour-code from red to blue, corresponding to low and high probability P q . Stars indicate spectroscopically observed candidates (NTT or VLT).
part on data products produced at TERAPIX available at the Canadian Astronomy Data Centre as part of the Canada-France-Hawaii Telescope Legacy Survey, a collaborative project of NRC and CNRS. The TERAPIX team has performed the reduction of all the WIRCAM images and the preparation of the catalogues matched with the T0007 CFHTLS data release. Based on observations obtained as part of the VISTA Hemisphere Survey, ESO Progam, 179.A-2010 (PI: McMahon).