Kernel-phase Detection Limits : Hypothesis Testing and the Example of JWST NIRISS Full Pupil Images

The James Webb Space Telescope will offer high-angular-resolution observing capability in the near-infrared through aperture masking interferometry on NIRISS and coronagraphic imaging on NIRCam and MIRI. Full-aperture kernel-phase-based interferometry complements these observing modes, probing for companions at small separations while preserving the telescope throughput. Our goal is to derive both theoretical and operational contrast detection limits for the kernel-phase analysis of JWST NIRISS full-pupil observations, using tools from hypothesis testing theory applied to observations of faint brown dwarfs with this instrument; the tools and methods introduced here are, however, applicable in a wide variety of contexts. We construct a statistically independent set of observables from aberration-robust kernel phases. Three detection tests based on these observables are designed and analysed, all guaranteeing a constant false alarm rate for small phase aberrations. One of these tests, the likelihood ratio or Neyman-Pearson test, provides a theoretical performance bound for any detection test. The operational detection method considered here is shown to exhibit only a marginal power loss with respect to this bound. In principle, for a test set to a false alarm probability of 1%, companions at contrasts reaching 10^3 at separations of 200 mas around objects of magnitude 14.1 are detectable. With JWST NIRISS, contrasts of up to 10^4 at separations of 200 mas could ultimately be achieved, barring significant wavefront drift. The proposed detection method is close to the ultimate bound and offers guarantees on the probability of making a false detection for binaries, as well as on the error bars for the estimated parameters of the binaries detectable by JWST NIRISS. This method is applicable not only to JWST NIRISS but to any imaging system with adequate sampling.


Introduction
In the past few years, many nearby brown dwarfs have been discovered thanks to the Wide-field Infrared Survey Explorer (WISE) sky survey (Wright et al. 2010;Cushing et al. 2011;Schneider et al. 2015). These newly discovered objects present an observational challenge due to their intrinsically low luminosities. Some of these brown dwarfs have been observed by the Hubble Space Telescope (HST), mostly for proper motion and parallax measurements (e.g. Marsh et al. 2013). While previous studies have searched for companions, they lack the sensitivity in the optical and the near infrared to achieve high enough contrasts to detect very low mass companions (e.g. Fontanive et al. 2018). High-angular resolution observations are also possible from the ground using either adaptive optics or optical interferometry. Cool dwarfs are however intrinsically faint objects and therefore fall short of the requirements of either technique, unless it is assisted by laser guide stars (Bernat et al. 2010).
Issues limiting the quality of ground-based observations, such as sky background or atmospheric perturbations, can be alleviated by observing from space. When launched, the James Webb Space Telescope (JWST, Gardner et al. 2006) will be the largest ever space telescope, and will provide unparalleled sensitivity for studying faint, cool dwarfs. With a 6.5-meter primary mirror and an instrument suite covering the 0.6-25.5 µm wavelength range, the theoretical angular resolution of the telescope ranges from 20 to 800 mas accordingly. For a nearby object located less than 20 pc away, this translates into the ability to resolve structures present within a few astronomical units (AU) of the central source.
However, even for instruments capable of very high angular resolution, the glare from an object can drown out the light of faint surrounding structures. This issue is usually addressed by coronagraphy, and JWST's instrumentation offers several coronagraphs inside NIRCam and MIRI, with inner working angles ranging from 300 to 800 mas. To probe the innermost parts of nearby systems, inside the inner working angles of the coronagraphs, interferometry offers a viable alternative. In that scope, on board JWST, NIRISS offers the aperture masking interferometer (AMI) observing mode (Sivaramakrishnan et al. 2012) with a non-redundant mask (NRM) located in the instrument pupil wheel. AMI enables the detection of objects at lower contrasts, but at narrower separations, compared to what can be achieved by JWST's coronagraphs. AMI is expected to have sufficient performance to address yet unanswered questions in the fields of active galactic nuclei (Ford et al. 2014), planetary formation and exoplanets (Artigau et al. 2014), as well as follow-ups of astrometry measurements from the GAIA mission or of ground-based extreme adaptive optics (AO) surveys. In the case of binary point sources in non-coronagraphic modes, contrast ratios as high as 10 magnitudes (10^4) for the brightest companions at 130 mas can be attained using AMI (Sivaramakrishnan et al. 2012; Greenbaum et al. 2015, 2018).
AMI achieves its best performance by taking advantage of self-calibrating observable quantities called closure phases (Jennison 1958). This technique, first developed for radio interferometry and later ported to the optical regime (Baldwin et al. 1986), was adapted to single-dish telescopes using a non-redundant aperture mask. Initially used in seeing-limited observing conditions (Nakajima 1989), the technique eventually took advantage of the development of AO (Tuthill et al. 2006), allowing for stabilised longer exposure modes and the ability to observe fainter objects. NRM interferometry is now routinely used and has led to a variety of studies, e.g. Sallum et al. (2015); Kraus et al. (2008, 2011). Kernel-phase generalises the idea of closure-phase to apertures of arbitrary shapes, and can reliably be used when aberrations are smaller than about one radian (Martinache 2010). This method can therefore be used on images acquired with any instrument on board JWST, provided that the instrument pupil geometry is accurately modelled. It is therefore usable on full-pupil images as well as on AMI/NRM closure phases. The kernel method has already been used successfully to uncover new brown dwarf binaries in HST observations, as reported by Pope et al. (2013). Full-aperture kernel-phase and AMI closure-phase cover the same parameter space, but with its lower throughput (∼15 %), AMI is suited for the observation of bright targets that would otherwise saturate the instrument, as well as for observations where aberrations are too large to fall into the linear regime covered by the kernel method.
Kernel- and closure-phase rely on exploiting the phase of the Fourier transform (also referred to as the complex visibility) of the image. The image must satisfy the Nyquist sampling requirement (plate scale smaller than 0.5 λ/D), although small-grid dithering allows observers to reconstruct a Nyquist-sampled image for other filters. Saturation should be avoided, although recovery is still possible (Laugier et al. 2019). For a filter to be fully exploitable, its shortest wavelength must respect the sampling criterion, which, for the 6.5 m diameter of the primary mirror of JWST, restricts the set of filters compatible with a kernel-phase analysis. This paper aims at setting a general statistical framework for the theoretical and operational detection limits of the kernel approach, with a focus on guarantees for the actual false alarm rate of the implemented detection method. As for the practical results, we investigate various aspects of the detection limits achievable for full-aperture NIRISS observations in the F480M filter. Section 2 recalls how kernel phases are constructed, presents the corresponding statistical model and introduces three statistical tests that will later be used to determine contrast detection limits. Section 3 shows how the method is applied to images simulated for JWST NIRISS. For several objects representative of the Y dwarfs discovered by WISE, this part highlights the need for estimating the noise covariance matrix, compares the performance of the proposed detection tests and analyses the statistical uncertainty resulting from the estimation of the parameters of the detected binaries. For the remainder of this paper, an italicised lowercase letter such as a will denote a real or complex number, a bold lowercase italicised letter such as a will denote a vector, a bold italicised uppercase letter such as A will denote a matrix, and a hat, as in b̂, will denote the maximum likelihood estimate (MLE) of an unknown parameter b.

The Kernel Approach
The kernel framework introduced by Martinache (2010) describes diffraction-dominated images produced by the mostly continuous aperture of a telescope as if they were the interference pattern formed by a discrete array of virtual sub-apertures laid out on a regular grid of finite step. Although any pupil model can in principle yield kernel phases, using a regularly spaced grid makes it possible to encode simply and effectively the redundancy of the filled aperture. The fidelity of the discrete representation of the continuous aperture increases with the density of the grid. In practice however, the size of the grid step (s) translates into a cut-off frequency λ/s that is matched to the field of view over which the diffractive signal is recorded. The discrete representation of the JWST entrance aperture, along with an image of the theoretical point spread function (PSF) of the original aperture, is shown in Fig. 1. The effective pupil is a combination of the telescope entrance pupil and of an additional pupil-plane mask, CLEARP.
Kernel phases are formed from a linear combination of the phases measured in the Fourier transform of the image. For a given wavelength, the discrete grid describing the original aperture also defines the sampling of the Fourier space via the coordinates and redundancies of the different baselines. The values of the Fourier transform of an image for the selected spatial frequencies are collected in the complex visibility vector v. The phase vector φ is defined as the argument of the complex visibility:

φ := ∠v. (1)

In the optical path of a diffraction-limited instrument, unknown and potentially evolving aberrations result in a variable PSF that degrades the image quality. According to Martinache (2010), in the small aberration regime and for simple (i.e. non-coronagraphic) images, a linear model relates the phase φ measured in the Fourier space to the true phase of the observed object φ0 and to the aberration phase ϕ present across the aperture:

φ ≈ φ0 + A·ϕ, (2)

where A is a phase transfer matrix, encoding how the aberrations in each sub-aperture propagate to the Fourier phase of the image. Its properties depend on the discrete representation of the aperture. The discrete model of JWST's aperture featured in Fig. 1 is made of m = 452 virtual sub-apertures, placed on a grid with a step size of 20 cm, that form n = 1363 distinct baselines, resulting in a full-rank phase transfer matrix A of dimensions 1363 × 452. The kernel matrix K is defined as a p × n matrix that verifies

K·A = 0. (3)

The kernel matrix cancels phase perturbations to the first order (Ireland 2013). With the chosen model, this matrix makes it possible to form a vector of kernel phases k of size p × 1, with p = 887, defined as:

k := K·φ. (4)

The kernel matrix K represents the left-nullspace of the transfer matrix A, and is computed from its singular value decomposition.
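The construction of K as the left-nullspace of A via the singular value decomposition can be sketched in a few lines of numpy. The small random matrix below is only a stand-in for the real 1363 × 452 phase transfer matrix that XARA builds from the discrete pupil model; the construction of K is the same.

```python
import numpy as np

# Hypothetical small phase transfer matrix A (n baselines x m sub-apertures).
# A random full-rank stand-in for the real 1363 x 452 JWST matrix.
rng = np.random.default_rng(0)
n, m = 12, 5
A = rng.normal(size=(n, m))

# Singular value decomposition: A = U S Vt. The left-nullspace of A is
# spanned by the columns of U beyond the rank of A.
U, S, Vt = np.linalg.svd(A, full_matrices=True)
rank = int(np.sum(S > 1e-10))
K = U[:, rank:].T                 # p x n kernel matrix, p = n - rank

print(K.shape)                    # (7, 12): p = n - m for a full-rank A
print(np.allclose(K @ A, 0))      # K cancels aperture-phase terms: True
```

For the full-rank JWST model this yields p = n − m = 1363 − 452 = 911 rows; the p = 887 quoted in the text reflects the details of the actual pupil model rather than this toy example.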
The discrete representation of the aperture, the associated phase transfer matrix A and the kernel matrix K can be generated using a specially designed Python package called XARA (available at http://github.com/fmartinache/xara/), which also offers the basic tools to extract kernel phases from images.

Statistical modelling and hypothesis tests
Given a data image, how likely is it that a companion is present?
The present study proposes to tackle this question through statistical hypothesis testing. A hypothesis test compares a test statistic (noted T) to a threshold (ξ), and has the general form

T(y) ≷ ξ, (5)

where y is the data under test (obtained from the image) and the test statistic T(y) is a real random variable. In (5), the null hypothesis H0 (noise only) is claimed if T(y) < ξ and the alternative hypothesis H1 (noise + companion) is claimed otherwise. If the distribution of T is known, the probability of false alarm can be controlled by the value of the test threshold ξ. The performance of a detection test is given by its probability of false alarm (P_FA, the probability that a detection occurs under H0) and its probability of detection (P_DET, the probability that a detection occurs under H1):

P_FA = Pr(T(y) ≥ ξ | H0),  P_DET = Pr(T(y) ≥ ξ | H1). (6)

The power of a test is its P_DET at a given P_FA: the higher the P_DET for a given P_FA, the more powerful the test. It can be conveniently represented as a receiver operating characteristic (ROC) curve, i.e. P_DET as a function of P_FA.
Turning back to our detection problem, in the absence of noise the kernel phases can take the values k = 0 if the target is centrosymmetric, or k = K·φ0 if the target presents asymmetries.
The noises affecting the images propagate into the Fourier phases and consequently into the kernel phases. As we shall see in the next section, the noise on the kernels can be modelled by a correlated Gaussian distribution with a covariance noted Σ. If this matrix is known, we can construct a vector y of 'whitened' kernel phases, which are decorrelated (hence independent), and similarly a vector x of whitened theoretical kernel phases corresponding to the signature of the target:

y := Σ^(−1/2)·k,  x := Σ^(−1/2)·K·φ0.

This leads to the following statistical hypotheses:

H0 : y = ε,  H1 : y = x + ε, (10)

where ε is a p × 1 noise vector with independent and identically distributed Gaussian entries (ε ∼ N(0, I), thanks to the whitening), and N(0, I) denotes the standard normal distribution (the covariance of ε is the identity matrix, I).
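The whitening step can be illustrated numerically. The sketch below builds Σ^(−1/2) from an eigendecomposition of a hypothetical covariance matrix (a small random stand-in, not the actual NIRISS covariance) and checks that the whitened samples are indeed decorrelated with unit variance.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4

# Hypothetical covariance of the kernel-phase noise (symmetric positive definite).
B = rng.normal(size=(p, p))
Sigma = B @ B.T + p * np.eye(p)

# Whitening matrix W = Sigma^(-1/2), built from the eigendecomposition.
w, V = np.linalg.eigh(Sigma)
W = V @ np.diag(w ** -0.5) @ V.T

# Correlated kernel-phase noise samples -> whitened samples y = W k.
k = rng.multivariate_normal(np.zeros(p), Sigma, size=100_000)
y = k @ W.T

# The empirical covariance of y should be close to the identity matrix.
print(np.allclose(np.cov(y.T), np.eye(p), atol=0.05))   # True
```

Any matrix square root works here (Cholesky factors are a common alternative); the eigendecomposition has the advantage of yielding the symmetric Σ^(−1/2).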

Known signature in white Gaussian noise
For the problem defined in (10), the most powerful test is the likelihood ratio (LR), or Neyman-Pearson (NP) test (Neyman & Pearson 1933). For this test, the companion signature x must be known. The NP test is defined as

ℓ(x; y) / ℓ(0; y) ≷ η, (11)

where ℓ(x; y) is the likelihood of the signature x given the data y and η an adjustable threshold. For the Gaussian, white noise considered here, the likelihood is (Scharf & Friedlander 1994):

ℓ(x; y) = (2π)^(−p/2) exp(−½ (y − x)ᵀ(y − x)), (12)

p being the length of the kernel-phase vector. Similarly for H0, with x = 0. The LR test in Eq. (11) becomes

exp(xᵀy − ½ xᵀx) ≷ η. (13)

Taking the logarithm of Eq. (13) and noting ξ := ln η + ½ xᵀx leads to the test

T_NP(y) := xᵀy ≷ ξ. (14)

Hence, the NP test amounts to comparing the dot product of the data with the signature to a threshold. The distribution of T_NP can be analytically determined under H0 and H1:

T_NP ∼ N(0, xᵀx) under H0,  T_NP ∼ N(xᵀx, xᵀx) under H1. (15)

Denoting by N(0, 1) a standard normal variable and by F_N its cumulative distribution function (CDF), from Def. (6) and Eq. (15) the P_FA and P_DET can be derived as

P_FA = 1 − F_N(ξ / ‖x‖),  P_DET = 1 − F_N((ξ − xᵀx) / ‖x‖), (16)

which, for the purpose of plotting ROC curves, combine to

P_DET = 1 − F_N(F_N⁻¹(1 − P_FA) − ‖x‖). (17)

This test is the most powerful for the considered model, and will serve as the benchmark against which any other detection test can be evaluated.
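The NP test performance expressions above are easy to verify by Monte Carlo simulation. The sketch below uses an arbitrary whitened signature with ‖x‖ = 3 (a toy value, not a NIRISS signature), sets the threshold for P_FA = 1%, and compares the empirical detection rate to the theoretical prediction.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
p, trials = 50, 100_000

# Hypothetical whitened signature x with ||x|| = 3.
x = np.zeros(p)
x[0] = 3.0
nx = np.linalg.norm(x)

# T_NP(y) = x^T y under both hypotheses.
t_h0 = rng.normal(size=(trials, p)) @ x          # y = eps
t_h1 = (x + rng.normal(size=(trials, p))) @ x    # y = x + eps

# Threshold for P_FA = 1%: from Eq. (16), xi = ||x|| * F_N^{-1}(1 - P_FA).
xi = nx * norm.ppf(0.99)
pfa = np.mean(t_h0 > xi)
pdet = np.mean(t_h1 > xi)

# Theoretical ROC point, Eq. (17).
pdet_th = 1 - norm.cdf(norm.ppf(0.99) - nx)
print(abs(pfa - 0.01) < 0.003, abs(pdet - pdet_th) < 0.01)   # True True
```

For this toy signature the theoretical P_DET is about 75%, and the empirical rates match the closed-form expressions to within Monte Carlo noise.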
Implementing the NP test (Eq. (14)) requires knowing the target signature x (namely, contrast and position if x corresponds to a companion). In practical situations however, x is often partially or even fully unknown. This leads to considering the statistical model

H0 : y = ε,  H1 : y = x + ε, x ∈ X, (18)

where X is a space describing some prior information about x. We will consider below two cases: a completely unknown signature (X = R^p) and the signature of a binary with unknown contrast and separation (X is then the space spanned by all possible binary signatures). A classical approach when some parameters describing the target x are unknown is to inject its maximum likelihood estimate (MLE, noted x̂) in place of x in the likelihood ratio (11). The MLE is defined by

x̂ := arg max_{z∈X} ℓ(z; y), (19)

and injecting the MLE in the LR leads to the so-called generalised likelihood ratio (GLR), defined as

max_{z∈X} ℓ(z; y) / ℓ(0; y). (20)

Completely unknown x signature
If we assume as a worst-case situation that nothing is known about the signature x, we have X = R^p. The likelihood in (12) is maximised for x̂ = y, and injecting this value in Eq. (20) yields

exp(½ yᵀy) ≷ η, (21)

with η a threshold. Taking the logarithm of Eq. (21), we obtain the test:

T_E(y) := yᵀy ≷ ξ. (22)

This test uses the measured squared norm of the signal as a test statistic and is called an energy detector (hence T_E). Its statistic is distributed as:

T_E ∼ χ²_p under H0,  T_E ∼ χ²_p(‖x‖²) under H1. (23)

Denoting by F_{χ²_p(λ²)} the CDF of a χ²_p(λ²) random variable with p degrees of freedom and non-centrality parameter λ², we obtain:

P_FA = 1 − F_{χ²_p}(ξ),  P_DET = 1 − F_{χ²_p(‖x‖²)}(ξ). (24)

We note that the test T_E in Eq. (22) was previously used in the literature, e.g. in Zwieback et al. (2016) and Le Bouquin & Absil (2012) (although not identified as a GLR), with the P_FA reported in Eq. (24). The expressions above combine into:

P_DET = 1 − F_{χ²_p(‖x‖²)}(F_{χ²_p}⁻¹(1 − P_FA)). (25)

This test does not exploit any prior knowledge of the structure of the object to be detected and can thus be seen as providing a lower bound on the detection performance.

Signature of a binary
Repeated observations of gravitationally interacting multiple systems are the only means by which unambiguous dynamical masses can be determined. Because they make it possible to resolve asymmetries near or even slightly below the diffraction limit, which translates into small orbital distances, NRM closure-phase (Kraus et al. 2008; Huélamo et al. 2011; Lacour et al. 2011) and full-aperture kernel-phase (Pope et al. 2013; Laugier et al. 2019) are particularly suited to the observation of unequal-brightness, low-mass binary systems. At any instant, a binary system is characterised by three parameters: the angular separation ρ of the companion relative to the primary, its position angle θ, and a contrast c, defined here as the luminosity ratio of the primary over the secondary. Our simulations assume that the position angle is measured in the image relative to the axis pointing up (represented by a blue arrow in the right-hand panel of Fig. 1), and increases counterclockwise. Actual observations will also have to take into account the orientation of the telescope to project the apparent position angle onto the celestial sphere when combining observations at multiple epochs.
As an intermediate step, it is also convenient to use a Cartesian coordinate system in which the location of the secondary is given by (α, β). If the binary system is made of two individually unresolved point sources, its intensity distribution O can be modelled as a pair of Dirac distributions:

O(l, m) = δ(l, m) + (1/c) δ(l − α, m − β). (26)

The complex visibility v associated with this object is the 2D Fourier transform of Eq. (26) (van Cittert 1934; Zernike 1938):

v(u, w) = 1 + (1/c) exp(−i2π(uα + wβ)). (27)

Recall that in the alternative hypothesis defined in Eq. (18), the whitened signature x is a function of the binary parameters. This leads to the parametric hypotheses:

H0 : y = ε,  H1 : y = x(α, β, c) + ε. (28)

Under H1, there are three free parameters, α, β and c, so the MLE is now

x̂ = arg min_{α, β, c} ‖y − x(α, β, c)‖². (29)

Finding the MLE is equivalent to minimising the argument of the exponential. This minimisation cannot be done analytically, but numerical methods can be used to compute x̂, as explained below. Injecting (29) in (20) gives the test

ℓ(x̂; y) / ℓ(0; y) ≷ η, (30)

equivalent to

T_B(y) := 2 x̂ᵀy − x̂ᵀx̂ ≷ ξ. (31)

Note that this detection problem is similar to case VII of Scharf & Friedlander (1994), where the detection procedure also relies on the ML estimation of the signal of interest. In that reference, however, the signature x is assumed to live in a linear subspace (independent from the nuisance subspace), which is not the case here. As mentioned above, the MLE x̂ must be found numerically. Fig. 2 illustrates, for one realisation of ε, an example of the value of the likelihood for a fixed contrast as a function of the position parameters α and β. It is apparent that the likelihood function is multimodal, so the minimisation strategy must be able to avoid local minima. A brute-force search on a finely discretised grid of the parameter space is possible but comes at a large computational cost. Efficient numerical methods for solving multimodal problems exist, such as Markov chain Monte Carlo methods with simulated annealing (Andrieu et al. 2003) or nested sampling (Skilling 2004).
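The phase of the binary visibility in Eq. (27) can be sketched directly in numpy. The baseline coordinates below are hypothetical stand-ins (random points spanning roughly a 6.5 m aperture at 4.8 µm, expressed in wavelengths), not the actual NIRISS Fourier sampling, and the whitening by Σ^(−1/2)·K is omitted for brevity.

```python
import numpy as np

def binary_phase(uv, alpha, beta, c):
    """Fourier phase of a binary: primary at the origin, companion at
    (alpha, beta) in radians on sky, c = primary/secondary flux ratio.
    uv: (n, 2) array of baseline coordinates in units of wavelengths."""
    cvis = 1.0 + (1.0 / c) * np.exp(
        -2j * np.pi * (uv[:, 0] * alpha + uv[:, 1] * beta))
    return np.angle(cvis)

# Hypothetical baselines and a companion at 200 mas, contrast c = 100.
rng = np.random.default_rng(3)
uv = rng.uniform(-1.3e6, 1.3e6, size=(40, 2))    # ~6.5 m / 4.8 um
MAS = np.pi / (180 * 3600 * 1000)                # one milliarcsecond in radians
phi0 = binary_phase(uv, 200 * MAS, 0.0, 100.0)

# A companion at the origin gives a centrosymmetric target: zero phase.
print(np.allclose(binary_phase(uv, 0.0, 0.0, 100.0), 0.0))   # True
print(np.max(np.abs(phi0)) > 0)                              # True
```

Feeding such a phase model through K and the whitening matrix produces the signature x(α, β, c) whose parameters the MLE search explores.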
Because the distribution of T_B involves the unknown distribution of the MLE x̂, it cannot be characterised analytically. However, as we shall see in the next section, this distribution can be estimated by Monte Carlo simulations, making it possible to establish accurately the relationship between the false alarm probability P_FA of this test and the threshold ξ in (31). As an important final remark, we underline that the false alarm probabilities of the considered tests are independent of the power of the phase perturbations ϕ (at least as long as the linear model in Eq. (2) holds, that is, for phase perturbations below ≈ 1 radian). This is clear from expressions (16) and (24) for tests T_NP and T_E; this is also the case for test T_B because the phase perturbation is cancelled by the operator K and does not affect the test statistic. This means that the false alarm rate of these tests remains constant in the case of fluctuating aberrations, which is a desirable feature in practice.

Likelihoods, likelihood ratios and χ 2 intervals
The test statistic T_B can be interpreted in terms of χ²-derived intervals as follows. Let x̂ be some model obtained by some fit on the data y. The χ² score corresponding to this fit is

T_χ²(y) := (y − x̂)ᵀ(y − x̂). (32)

Considering the likelihood in Eq. (12), this shows that if y is Gaussian with mean x̂, the score in Eq. (32) is indeed a χ²_p random variable. Now, the test statistic T_B can be rewritten as

T_B(y) = yᵀy − (y − x̂)ᵀ(y − x̂), (33)

which shows that T_B can be interpreted as the reduction in the sum of squared residuals when comparing the null hypothesis to the considered model. For the sake of accurately controlling the false alarm rate, note, however, that T_χ² in Eq. (32) may not be distributed as a χ²_p variable because x̂ is a random variable. Actually, the true distribution of T_χ² may not be known analytically, and a Monte Carlo procedure (such as that mentioned in Sec. 2.2.3 for the estimation of the correspondence between P_FA and the threshold for T_B) is required.

Results
The tests and the performance analyses presented in Section 2 are very general: considering a different aperture and instrumental noise simply amounts to replacing A, K and Σ in the equations. We now focus on their specific application to JWST NIRISS full-pupil images.

Dataset and considered targets
We will apply the three detection tests previously introduced to a series of simulated JWST/NIRISS datasets, replicating the observing scenario of archetypal ultracool Y-type brown dwarfs. While their multiplicity rate is currently unknown, over 25 such objects have been discovered less than 20 pc away, mostly by the WISE mission. At 20 pc, the theoretical angular resolution of JWST at λ = 4.8 µm translates into an orbital distance of 3 AU: interferometric observations will make it possible to probe within the first few AU of most known Y dwarfs.
JWST NIRISS images of Y dwarfs are simulated to evaluate the performance of the detection tests, using the ami_sim package (Greenbaum et al. 2016), corresponding to a 40-minute integration on target and a 40-minute integration on a perfect calibrator. Frames are simulated in full-pupil mode, using the F480M filter, for two different W2 magnitudes: 15.4 and 14.1. The W2 magnitude is the apparent magnitude in the band selected by the W2 (λ = 4.6 µm) WISE filter (Wright et al. 2010). For these objects, companions are placed at a single position angle θ = 315° (materialised by the orange arrow in the PSF shown in Fig. 1). The simulated companions lie at separations of ρ = 73 mas (≈ 0.5 λ/D at λ = 4.8 µm) or ρ = 147 mas (≈ λ/D at λ = 4.8 µm), and have contrasts c = 10, 20, 50 or 100, leading to a total of 8 possible signatures.
For any given target, a calibration frame is simulated and we assume no calibration error (stable wavefront, calibrator with the same spectrum and brightness as the Y dwarf). To reproduce a realistic situation, kernel phases are not extracted directly from the simulated image: the frames are recentered, cropped to a size of 64 × 64 pixels and apodised by a super-Gaussian mask (see Eq. 2 of Laugier et al. 2019) of radius 30 pixels to weigh down the edges of the image.
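The apodisation step can be sketched as follows. The exact functional form and exponent are those of Eq. 2 of Laugier et al. (2019); the generic super-Gaussian below (exponent 4 chosen for illustration) only conveys the idea of a window that is flat near the centre and decays smoothly toward the edges.

```python
import numpy as np

def super_gaussian(size, radius, exponent=4):
    """Generic super-Gaussian apodisation window. The exponent and radius
    used in the paper follow Laugier et al. (2019); these are illustrative."""
    y, x = np.indices((size, size)) - size // 2
    r = np.hypot(x, y)
    return np.exp(-(r / radius) ** exponent)

# 64 x 64 window of radius 30 pixels, as applied to the recentered frames.
win = super_gaussian(64, 30)
# The window is ~1 at the centre and strongly attenuates the corners.
print(round(win[32, 32], 3), win[0, 0] < 0.05)   # 1.0 True
```

Multiplying the cropped frame by such a window before the Fourier transform limits the edge discontinuities that would otherwise contaminate the extracted phases.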

Modelling the errors
Two types of errors affect kernel phases and the outcome of the statistical tests described in Sec. 2. First are statistical errors induced by random noises, whose overall impact can be captured in the acquisition or the synthesis of a global covariance matrix. Second are systematic errors resulting from the imperfect modelling by the kernel framework of the broadband, long-exposure, and diffractive nature of images. The subtraction of kernel phases acquired on a point source theoretically accounts for this systematic error; in practice, however, wavefront drifts between observations will result in unaccounted residual errors referred to as systematic errors (Ireland 2013).
To estimate the potential impact of wavefront-drift-induced systematic errors, we rely on Perrin et al. (2018), who predict that over a timescale of two hours, JWST drifts will result in at most a 16 nm RMS wavefront error across the entire pupil. We used the 10 OPD maps distributed with the webbpsf package, scaled down to match the predicted RMS, to produce images resulting in 10 distinct kernel-phase realisations. The dispersion of kernel phases across these realisations was used to estimate the magnitude of the calibration residual. In the bright-target scenario (W2 mag = 14.1) introduced in Sec. 3.1, this calibration residual accounts for about 14% of the total noise variance. As will be shown later, this systematic error has a small impact when observing faint targets. (ami_sim is available at https://github.com/agreenbaum/ami_sim.)

Covariance estimation
Whereas the simulated images used in the analysis include all the previously listed noises, experience has shown us that, barring calibration residuals, the covariance matrix can be accurately estimated using the three dominant noises: photon, readout and dark current. Fig. 3 indeed shows that after whitening by this simpler covariance, the distribution of kernel phases is indistinguishable from a normal distribution of standard deviation 1. The effect of the whitening is further illustrated in Fig. 4, which shows how previously noise-correlated kernel phases (left panel) are made statistically independent (right panel). The whitened observables can thus be reliably used as input for the different statistical tests introduced in Sec. 2.2.
In practice, the covariance Σ is estimated using Monte Carlo simulations. An accurate estimation requires a number of simulated frames much greater than the total number of kernels; we used 10^5 frames for 887 kernels in our case.
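The Monte Carlo covariance estimation reduces to a sample covariance over the simulated kernel-phase realisations. The sketch below uses white Gaussian rows as a scaled-down stand-in for the 10^5 simulated frames and 887 kernels of the text; in practice each row would be the kernel-phase vector extracted from one noisy simulated image.

```python
import numpy as np

rng = np.random.default_rng(4)
# Scaled-down stand-in for the actual setup (10^5 frames, 887 kernels).
n_frames, p = 5000, 50

# Each row stands in for the kernel phases extracted from one noisy frame.
samples = rng.normal(size=(n_frames, p))

# Sample covariance estimate Sigma_est (rows are observations).
Sigma_est = np.cov(samples, rowvar=False)
print(Sigma_est.shape)   # (50, 50)

# Calibrated kernel phases (target minus a calibrator of equal flux) have
# twice the single-frame covariance.
Sigma_cal = 2 * Sigma_est
```

With unit-variance stand-in noise, the diagonal of Sigma_cal converges to 2, mirroring the doubling of the covariance discussed in the next paragraph.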
Calibrated kernel phases are obtained by subtracting the kernel phases of a calibrator from those of the target in order to remove kernel-model imperfections. Since the same flux is assumed for both observations, they share the same covariance. The covariance of the calibrated kernel-phase vector is therefore twice the covariance Σ_est estimated from the MC simulations.
To account for the unknown calibration errors reported in NRM interferometry as well as in full-aperture kernel-phase, which result in a kernel-phase bias, one commonly used solution has been to artificially inflate the experimental variance by adding a term whose overall magnitude is adjusted during the model fit (e.g. Martinache et al. 2009). The OPD maps introduced in Sec. 3.2 make it possible to estimate the magnitude of this bias a priori. Proper treatment of the calibration would require the subtraction of an estimate of the calibration term, using either the POISE algorithm of Ireland (2013) or the Karhunen-Loève decomposition approach described by Kammerer et al. (2019), which relies on the observation of multiple calibration sources. Here we estimate the impact of an unaccounted calibration error on the contrast detection limits by adding the residual determined after analysis of the simulation that included the OPD maps to the diagonal of the covariance. Capturing the possibly correlated effects would require the computation of a distinct covariance matrix from a large number of distinct realisations of telescope drifts. For the faint brown dwarf case that motivates this study, the impact of the calibration error is small, so we chose not to pursue the non-diagonal terms.

Parameter estimation
Detecting a companion using the operational binary test T_B requires the determination of the MLE x̂, which means estimating the parameters ρ, θ and c from the whitened kernel phases y in Eq. (28). The distribution of the parameters can be estimated by generating, for each considered signature, a large number of noisy kernel-phase realisations and estimating the parameters on each. In practice, a global optimisation algorithm can be used. For the purpose of running a large number of simulations, we assume that the algorithm has localised the region in which the global minimum is situated (the darkest region in Fig. 2). In this setting, the minimum can be found by a gradient descent algorithm.
In the following, we use the algorithm described by Branch et al. (1999), as implemented in scipy.optimize.least_squares, which uses the local gradient and optimises the descent direction and step size. The initialisation of the algorithm corresponds to the parameters of the injected companion. This method is suited to the determination of contrast limits thanks to its speed. We checked that we obtained very similar results with a (computationally more expensive) systematic grid search, which would typically be used in practice. Fig. 5 shows the recovered kernel phases as a function of the kernel phases of simulated images for different separation, contrast and flux regimes. The fit remains fairly consistent in each case, with a scatter that predictably grows as the signal-to-noise ratio decreases (the signal-to-noise ratio is affected by the contrast, the separation and the total flux in the image).
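The local fitting step can be sketched with scipy.optimize.least_squares on a toy binary-phase model. The baselines, noise level and initialisation offset below are hypothetical; as in the text, the fit starts close to the injected parameters and refines them by gradient descent.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(5)
# Hypothetical baselines in wavelengths (~6.5 m aperture at 4.8 um),
# standing in for the actual NIRISS Fourier sampling.
uv = rng.uniform(-1.3e6, 1.3e6, size=(60, 2))
MAS = np.pi / (180 * 3600 * 1000)        # one milliarcsecond in radians

def model(params, uv):
    """Fourier phase of a binary; position in mas, c = primary/secondary."""
    alpha, beta, c = params
    arg = -2j * np.pi * (uv[:, 0] * alpha + uv[:, 1] * beta) * MAS
    return np.angle(1 + (1 / c) * np.exp(arg))

# Noisy 'observed' phases for a companion at (150, -80) mas with c = 50.
truth = np.array([150.0, -80.0, 50.0])
y = model(truth, uv) + 1e-4 * rng.normal(size=len(uv))

# Local least-squares fit initialised near the injected parameters.
fit = least_squares(lambda p: model(p, uv) - y, x0=truth * 1.05)
alpha_fit, beta_fit, c_fit = fit.x
print(abs(alpha_fit - 150) < 1, abs(c_fit - 50) < 2)
```

Repeating this fit over many noise realisations yields the empirical parameter distributions discussed below.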
All of the signatures presented in Fig. 6 are detectable by T_B with P_FA < 10^-3. The shape of the two-dimensional distribution of the estimated separation ρ and contrast c reproduces what was reported for instance by Pravdo et al. (2006) in the context of NRM observations: at angular separations smaller than λ/D, the estimates of the contrast and the angular separation are strongly correlated. Fig. 6 also shows that two regimes can be distinguished. For a companion at ρ ≈ λ/D (for JWST, λ/D = 152 mas at λ = 4.80 µm), all parameters are well constrained, while for a companion at ρ < λ/D, the contrast and the angular separation cannot be well constrained simultaneously. In practice, this means that the estimation of the position of a companion using kernel phases when the expected angular separation is smaller than λ/D can be further constrained by an independent measurement of the luminosity of the companion at a different epoch, when ρ > λ/D. This property can be particularly useful in the case of objects with high eccentricities or inclinations. A study of the consequences of the parameter uncertainties and correlations on the orbits that can be fitted using the kernel method on NIRISS images is beyond the scope of this paper; this should be the object of future work, along with recommendations on optimal observing strategies with regard to the uncertainties on the measured orbital parameters.

Detection and contrast performance
First, we validate the theoretical relations predicting the performance of the NP test T_NP (Eq. (17)) and of the energy detector T_E (Eq. (24)), and determine the actual performance of T_B (Eq. (31)). For that purpose, we perform Monte Carlo simulations consisting of 2000 realisations of y under H0 and under H1 for a given signature x (cf. Eq. (27)).
All of the detection limits are shown for P FA = 1% and P DET = 68%. In terms more frequently encountered in astronomy publications, this is equivalent to having a 68% chance of making a ≈ 2.3σ detection.
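The ≈2.3σ figure quoted above is simply the one-sided Gaussian threshold associated with a 1% false alarm probability, which can be checked directly:

```python
from scipy.stats import norm

p_fa = 0.01
# One-sided Gaussian detection threshold for a 1% false alarm probability
n_sigma = norm.isf(p_fa)   # inverse survival function of N(0, 1)
print(f"{n_sigma:.2f} sigma")  # prints "2.33 sigma"
```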
For each realisation, we apply each of the three tests using the kernel operator K and the covariance matrix Σ estimated as in Sec. 3.3.
We present in Fig. 7 the results in the form of ROC curves, which provide a graphical representation of the power of each test. The dashed curves representing the theoretical ROCs accurately match the solid lines corresponding to the performance achieved in practice. As expected, T NP appears as the most powerful of the three tests (this test corresponds to the upper performance bound) and T E as the least powerful (this test uses no prior information on the target signature and can be seen as a lower bound). The performance of T B logically lies in between, but much closer to the upper than to the lower bound.
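As a schematic illustration of such a Monte Carlo ROC comparison (not the actual NIRISS simulation), one can pit a matched filter, which is the NP statistic for a known signature in whitened Gaussian noise, against the energy detector; the dimension, signature amplitude and noise model below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 30          # number of (whitened) kernel-phase observables
n_mc = 2000     # realisations per hypothesis, as in the text
x = rng.normal(0.0, 0.35, p)   # assumed companion signature (whitened units)

# y ~ N(0, I) under H0 and N(x, I) under H1 (after whitening by the covariance)
y0 = rng.normal(size=(n_mc, p))
y1 = x + rng.normal(size=(n_mc, p))

def roc(stat0, stat1):
    """Empirical (P_FA, P_DET) pairs obtained by sweeping the threshold."""
    thresholds = np.sort(np.concatenate([stat0, stat1]))[::-1]
    pfa = np.array([(stat0 > t).mean() for t in thresholds])
    pdet = np.array([(stat1 > t).mean() for t in thresholds])
    return pfa, pdet

# Matched filter (NP test for a known signature) vs energy detector
pfa_np, pdet_np = roc(y0 @ x, y1 @ x)
pfa_e, pdet_e = roc((y0**2).sum(axis=1), (y1**2).sum(axis=1))

# Power of each test at P_FA = 1%: the NP test dominates the energy detector
i_np = np.searchsorted(pfa_np, 0.01)
i_e = np.searchsorted(pfa_e, 0.01)
print(pdet_np[i_np], pdet_e[i_e])
```

The gap between the two powers at fixed P_FA is the graphical distance between the two ROC curves, which is what Fig. 7 displays for the actual NIRISS observables.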
The detection limits for the three tests T NP, T E and T B are represented in Fig. 8 across a range of contrasts and separations, for a fixed position angle θ = 315°. The dashed and dotted lines correspond to a perfectly stable JWST, leading to a perfect calibration of the systematic errors, while the solid lines correspond to 16 nm RMS of wavefront error. The theoretical performance, validated for a single companion signature in Fig. 7, does hold true over a large range of contrasts and separations, and the detection limit of T B remains close to the bound provided by T NP.
The detection limits further depend on θ, because the PSF of JWST NIRISS is not centrosymmetric (as visible in Fig. 1). Fluctuations of these limits are shown in Fig. 9 for three position angles. The figure also indicates the signal-to-noise ratio at the corresponding positions in the image (computed here as the maximal pixel value of a noiseless image containing only the companion, divided by the standard deviation of the considered noise), showing that the detection limits follow the overall noise level in the image. In terms of performance, the detectable contrast ratios are of the order of 10^3 at 200 mas, with some variations between the two flux levels considered.

Fig. 7. ROC curves for the three tests. The closer a curve lies to the black diagonal line, the less powerful the corresponding test. The higher flux regime is shown in the top panel and the lower flux regime in the bottom panel. The performance of T NP and T E is accurately described by the theoretical expressions in Eq. (17) and Eq. (24). T NP presents the highest performance, T B is the next best-performing test, and T E has the lowest performance of the three. The power of all tests clearly improves as the flux (and thus the SNR) increases.

Mass limits for WISE 1405+5534
WISE 1405+5534 is a Y-type brown dwarf with a W2 magnitude of 14.1 that was used as the reference target to produce the contrast detection limits featured in Fig. 8. The raw observational detection limit curve of contrast as a function of angular separation can be converted into an astrophysical detection limit curve of companion mass as a function of orbital separation.

Fig. 8. Detection limits at a fixed position angle θ = 315°: contours below which P DET falls below 68% for a fixed P FA of 1%, represented as a function of the separation and contrast of the companions, for T E (green), T B (blue) and T NP (orange). The dashed lines represent the theoretical detection limits for T E and T NP (Eq. (17) and Eq. (24)) and the dotted lines the limits actually achieved in the MC simulations. The solid lines represent the detection limits for T B (blue) and T NP (red) with a calibration residual corresponding to a 16 nm RMS wavefront drift. T NP provides ideal detection limits for a kernel treatment of a JWST NIRISS image, and the practical test T B (dotted blue) has contrast detection limits within a factor of 2.5 of the theoretical maximum.
Whereas the 129 ± 19 mas parallax measured by Dupuy & Kraus (2013) directly allows the conversion of the angular separation into a projected orbital distance, the contrast-to-mass conversion requires a model. We use the mass-luminosity relations given by the AMES-Cond model of Baraffe et al. (2003) for an age of 1 Gyr, together with the mass estimate of 30 M J for the primary given by Cushing et al. (2011).
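The geometric part of this conversion is straightforward; a minimal sketch using the quoted parallax follows (the contrast-to-mass step, which requires interpolating the AMES-Cond evolutionary grids, is not reproduced here):

```python
# Convert the measured parallax and an angular separation into a projected
# orbital distance, using the values quoted in the text for WISE 1405+5534.
parallax_mas = 129.0
distance_pc = 1000.0 / parallax_mas               # ≈ 7.8 pc

sep_mas = 200.0                                   # example angular separation
# Small-angle relation: separation (arcsec) × distance (pc) = separation (AU)
projected_au = (sep_mas / 1000.0) * distance_pc   # ≈ 1.55 AU
print(f"{distance_pc:.1f} pc, {projected_au:.2f} AU")
```

This is consistent with the 1.5 AU separation quoted below for a companion at 200 mas.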
The detection limits obtained for WISE 1405+5534 are shown in Fig. 10. At P FA = 1% and P DET = 68%, a 1 M J companion can be detected at separations greater than 1.5 AU. An orbit with this semi-major axis would have a period of 40 years; a quarter of an orbit could thus be captured with repeated observations over the expected service life of JWST.

Bright limits
For the faint Y-dwarf targets considered thus far, the contrast detection limits are dominated by the dark current and the readout noise rather than by the photon noise of the central object. To complete the description of the properties of our approach, we now consider a bright-target scenario that features a different behaviour, exhibiting the contribution of the photon noise.
The saturation limit for full pupil JWST NIRISS, using the F480M filter and a 64×64 pixel subarray, is 7.6 mag. We consider a shorter observation sequence, with a total of 20 minutes spent on the target of interest and 20 minutes on a calibrator of similar brightness. The detection limits for this observation, using the operational test T B, are shown in Fig. 11 at P FA = 1% and P DET = 68%.
Unlike the contrast detection limits obtained on the faint targets, the curves now clearly reveal two different regimes. Up to an angular separation of ≈500 mas, where the photon noise is expected to dominate, the detection limit improves with angular separation. Beyond this point, it reaches a plateau, as the detection is once again dominated by the spatially homogeneous dark current and readout noise.
In this bright scenario, calibration errors induced by a drift comparable to what was described in Sec. 3.5 will have a stronger impact on the weak signal of a high-contrast companion. Sallum […] into account. Under the hypothesis introduced in Sec. 3.2, the calibration error here accounts for 85% of the total noise variance of the kernels and therefore degrades the performance by a factor of ≈10, as shown by the dashed curve in Fig. 11.

Fig. 11. Detection limits for the brightest target observable without saturation with JWST NIRISS. Solid lines: detection limits for T B at P FA = 1% and P DET = 68%, applied to the image with the greatest possible dynamic range, with 20 minutes of total integration time. For the brightest images, the kernel method with the test T B ideally allows contrasts of up to 10^5 to be detected beyond 500 mas. The dashed orange line represents the detection limits in the presence of a 16 nm wavefront drift.

Conclusion
This paper provides a theoretical and numerical analysis of the performance of various detection tests based on the kernel method. The approach provides an upper bound for the achievable detection limits, and an operational detection test whose performance is close to this upper bound. Furthermore, the false alarm rate of these tests is not affected by fluctuating aberrations and can be tuned a priori.
The kernel-based detection approach presented in this paper is not specific to NIRISS, to the F480M band, to the full-pupil imaging mode, or to JWST itself. The method only requires weak wavefront perturbations and appropriate sampling (i.e., a plate scale small enough compared to λ/D). In particular, the statistical treatment proposed in this study can also be used for NRM data.
For JWST NIRISS in the F480M band, we have shown that medium (≈10^2) to high (≈10^3) contrast detections can realistically be achieved for separations down to half of λ/D on ultracool brown dwarf primary targets. In practice, this means that an 80-minute observation sequence can allow for the detection of a 1 M J companion situated 1.5 AU away from a 30 M J Y-type brown dwarf at a distance of 8 pc. On brighter targets, a kernel-phase analysis combined with the methods presented in this paper can reveal companions at contrasts of ≈10^3 down to 0.3 λ/D.
The detection results presented in this paper rely on up-to-date simulations of JWST NIRISS frames that take into account all the noise sources expected to contribute to kernel-phase uncertainties. These results can be affected by several effects that are not yet accounted for, the most critical probably being calibration errors. Instrumental drifts in the range of a few tens of nanometres, as predicted by Perrin et al. (2018), are not expected to degrade performance significantly for Y dwarfs. Another limitation may come from the algorithmic efficiency of the determination of the MLE x in Eq. (29) for the test T B. Grid searches that are too coarse, or algorithms that are too sensitive to local minima, will lead to a loss in detection power and to increased uncertainties on the estimated parameters.
The performance reported in this work can therefore be seen as the ideal contrast performance achievable using kernel phases for JWST NIRISS images. The method can in principle be improved upon by exploiting the full information available in the image (present not only in the phase but also, to a lesser extent, in the amplitude of the complex visibility). Even working solely with the phase, the calibration problem can be mitigated by using a more accurate and less idealised representation of the instrument. A significant fraction of the calibration error comes from the use of a necessarily approximate discrete model to represent the continuous phenomenon of diffraction. The results reported in this work used a dense aperture model to mitigate this discretisation error; the representation is, however, not yet optimal. One avenue for improving the overall fidelity would be to take into account a variable local transmission function, so as to describe the aperture more accurately with the same grid density. The study of general aperture-modelling prescriptions will be the object of future work.
The XARA package is regularly updated in the context of the KERNEL project.