Faint end of the $z \sim 3-7$ luminosity function of Lyman-alpha emitters behind lensing clusters observed with MUSE

We present the results obtained with VLT/MUSE on the faint end of the Lyman-alpha luminosity function (LF) based on deep observations of four lensing clusters. The precise aim of the present study is to further constrain the abundance of Lyman-alpha emitters (LAEs) by taking advantage of the magnification provided by lensing clusters. We blindly selected a sample of 156 LAEs, with redshifts in the range $2.9 \le z \le 6.7$ and magnification-corrected luminosities in the range $39 \lesssim \log L_{Ly_{\alpha}}$ [erg s$^{-1}$] $\lesssim 43$. The price to pay to benefit from magnification is a reduction of the effective volume of the survey, together with a more complex analysis procedure. To properly take into account the individual differences in detection conditions (including lensing configurations, spatial and spectral morphologies) when computing the LF, a new method based on the 1/Vmax approach was implemented. The LAE LF has been obtained in four different redshift bins with constraints down to $\log L_{Ly_{\alpha}} = 40.5$. From our data alone, no significant evolution of the LF mean slope can be found. When performing a Schechter analysis, including data from the literature to complete the present sample, a steep faint-end slope was measured, varying from $\alpha = -1.69^{+0.08}_{-0.08}$ to $\alpha = -1.87^{+0.12}_{-0.12}$ between the lowest and the highest redshift bins. The contribution of the LAE population to the star formation rate density at $z \sim 6$ is $\lesssim 50$% depending on the luminosity limit considered, which is of the same order as the Lyman-break galaxy (LBG) contribution. The evolution of the LAE contribution with redshift depends on the assumed escape fraction of Lyman-alpha photons, and appears to slightly increase with increasing redshift when this fraction is conservatively set to one. (abridged)


Introduction
Reionization is an important change of state of the universe after recombination, and many resources have been devoted in recent years to understanding this process. The formation of the first structures, stars, and galaxies marked the end of the dark ages. Following the formation of the first structures, the density of ionizing photons was high enough to allow the ionization of the entire neutral hydrogen content of the intergalactic medium (IGM). It has been established that this state transition was mostly completed by z ∼ 6 (Fan et al. 2006; Becker et al. 2015). However, the identification of the sources responsible for this major transition, and their relative contribution to the process, is still a matter of substantial debate.
Although quasars were initially considered as important candidates owing to their ionizing continuum, star-forming galaxies presently appear as the main contributors to reionization (see e.g. Robertson et al. 2013, 2015; Bouwens et al. 2015a; Ricci et al. 2017). However, a large uncertainty remains on the actual contribution of quasars, as the faint quasar population at high redshift is still poorly constrained (see e.g. Willott et al. 2010; Fontanot et al. 2012; McGreer et al. 2013). There are two main signatures currently used for the identification of star-forming galaxies around and beyond the reionization epoch. The first is the Lyman "drop-out" in the continuum blueward of Lyman-alpha, resulting from the combined effect of interstellar and intergalactic scattering by neutral hydrogen. Different redshift intervals can be defined to select Lyman-break galaxies (LBGs) using the appropriate colour-colour diagrams or photometric redshifts. Extensive literature is available on this topic since the pioneering work by Steidel et al. (1996) (see e.g. Ouchi et al. 2004; Stark et al. 2009; McLure et al. 2009; Bouwens et al. 2015b, and references therein). The second method is the detection of the Lyman-alpha line to target Lyman-alpha emitters (hereafter LAEs). The "classical" approach is based on wide-field narrow-band (NB) surveys, targeting a precise redshift bin (e.g. Rhoads et al. 2000; Kashikawa et al. 2006; Konno et al. 2014). More recent methods make efficient use of 3D/IFU spectroscopy in pencil-beam mode with the Multi-Unit Spectroscopic Explorer (MUSE) at the Very Large Telescope (VLT; Bacon et al. 2015), a technique presently limited to z ∼ 7 in the optical domain.
Based on LBG studies, the UV luminosity function (LF) evolves strongly at z ≥ 4, with a depletion of bright galaxies with increasing redshift on the one hand, and a steepening of the faint-end slope on the other (Bouwens et al. 2015b). This evolution is consistent with the expected evolution of the halo mass function during the galaxy assembly process. Studies of LAEs have found a deficit of strongly emitting ("bright") Lyman-alpha galaxies at z ≥ 6.5, whereas no significant evolution is observed below z ∼ 6 (Kashikawa et al. 2006; Pentericci et al. 2014; Tilvi et al. 2014); this trend is attributed to an increase in the fraction of neutral hydrogen in the IGM, to an evolution of the parent population, or to both. LBGs and LAEs constitute two different, partly overlapping observational approaches to selecting star-forming galaxies. The prevalence of Lyman-alpha emission in well-controlled samples of star-forming galaxies is also a test of the reionization history. However, a complete and "as unbiased as possible" census of ionizing sources can only be achieved through 3D/IFU spectroscopy without any photometric preselection.
As pointed out by different authors (see e.g. Maizy et al. 2010), lensing clusters are more efficient than blank fields for detailed (spectroscopic) studies at high redshift, and for exploring the faint end of the LF. In this respect, they are complementary to observations in wide blank fields, which are needed to set reliable constraints on the bright end of both the UV and LAE LFs. Several recent results in the Hubble Frontier Fields (HFF; Lotz et al. 2017) fully confirm the benefit expected from gravitational magnification (see e.g. Laporte et al. 2014, 2016; Atek et al. 2014; Infante et al. 2015; Ishigaki et al. 2015; Livermore et al. 2017).
This paper presents the results obtained with MUSE (Bacon et al. 2010) at the ESO VLT on the faint end of the LAE LF, based on deep observations of four lensing clusters. The data were obtained as part of the MUSE consortium Guaranteed Time Observations (GTO) programme and the first commissioning run. The final goal of our project in lensing clusters is to set strong constraints on the relative contribution of the LAE population to cosmic reionization. As shown in Richard et al. (2015) for SMACSJ2031.8-4036, Bina et al. (2016) for A1689, Lagattuta et al. (2017) for A370, Caminha et al. (2016) for AS1063, Karman et al. (2016) for MACS1149, and Mahler et al. (2018) for A2744, MUSE is ideally designed for the study of lensed background sources, in particular LAEs at 2.9 ≤ z ≤ 6.7. The MUSE instrument provides a blind survey of the background population, irrespective of whether the associated continuum is detected. This instrument is also a unique facility capable of deriving the 2D properties of "normal" strongly lensed galaxies, as recently shown by Patricio et al. (2018). An important point for this project is that MUSE allows us to reliably recover a greater fraction of the Lyman-alpha flux of LAEs, as compared to usual long-slit surveys or even NB imaging.
The precise aim of the present study is to further constrain the abundance of LAEs by taking advantage of the magnification provided by lensing clusters to build a blindly selected sample of galaxies which is less biased than current blank field samples in redshift and luminosity. By construction, this sample of LAEs is complementary to those built in deep blank fields, whether observed by MUSE or by other facilities, and makes it possible to determine in a more reliable way the shape of the LF towards the faintest levels and its evolution with redshift. We focus on four well-known lensing clusters from the GTO sample, namely Abell 1689, Abell 2390, Abell 2667, and Abell 2744. In this study we present the method and we establish the feasibility of the project before extending this approach to all available lensing clusters observed by MUSE in a future work.
In this paper we present the deepest study of the LAE LF to date, combining deep MUSE observations with the magnification provided by four lensing clusters. In Sect. 2, we present the MUSE data together with the ancillary Hubble Space Telescope (HST) data used for this project, as well as the observational strategy adopted. The method used to extract LAE sources from the MUSE cubes is presented in Sect. 3. The main characteristics of, and references for, the four lensing models used in this article are presented in Sect. 4; the present MUSE data were also used to identify new multiply-imaged systems in these clusters, and therefore to further improve the mass models. The selection of the LAE sample used in this study is presented in Sect. 5. Section 6 is devoted to the computation of the LF. In this section we present the complete procedure developed for the determination of the LF based on IFU detections in lensing clusters; some additional technical points and examples are given in Appendices A-D. This procedure includes novel methods for masking, effective volume integration, and (individual) completeness determination, using as far as possible the true spatial and spectral morphology of LAEs instead of a parametric approach. The parametric fit of the LF by a Schechter function, including data from the literature to complete the present sample, is presented in Sect. 7. The impact of the mass model on the faint end and the contribution of the LAE population to the star formation rate density (SFRD) are discussed in Sect. 8. Conclusions and perspectives are given in Sect. 9. Throughout this paper we adopt the following cosmology: $\Omega_{\Lambda} = 0.7$, $\Omega_{m} = 0.3$, and $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$. Magnitudes are given in the AB system (Oke & Gunn 1983). All redshifts quoted are based on vacuum rest-frame wavelengths.

Notes on Table 1: The A2744 field was split in two (part a and part b) because of the additional pointing covering the centre of the 2 × 2 MUSE mosaic. For A1689 and A2390, the seeing was measured on the white-light image obtained from the final datacube. For A2667 and A2744, the seeing was obtained by fitting a MUSE reconstructed F814W image with a seeing-convolved HST F814W image (see Patricio et al. 2018 for A2667 and Mahler et al. 2018 for A2744).

MUSE observations
The sample used in this study consists of four different MUSE cubes of different sizes and exposure times, covering the central regions of well-characterized lensing clusters: Abell 1689, Abell 2390, Abell 2667, and Abell 2744 (hereafter A1689, A2390, A2667, and A2744, respectively). These four clusters already had well-constrained mass models before the MUSE observations, as they benefited from previous spectroscopic campaigns. The reference mass models can be found in Richard et al. (2010; LoCuSS) for A2390 and A2667, in Limousin et al. (2007) for A1689, and in Richard et al. (2014) for the Frontier Fields cluster A2744.
The MUSE instrument has a 1′ × 1′ field of view (FoV) and a spatial pixel size of 0.2″; it covers the wavelength range from 4750 Å to 9350 Å with a 1.25 Å sampling, effectively making the detection of LAEs possible between redshifts of z = 2.9 and 6.7. The data were obtained as part of the MUSE GTO programme and the first commissioning run (for A1689 only). All the observations were conducted in the nominal WFM-NOAO-N mode of MUSE. The main characteristics of the four fields are listed in Table 1. The geometry and limits of the four FoVs are shown on the available HST images in Fig. 1.
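As a quick sanity check, the quoted redshift window follows directly from the wavelength coverage and the rest-frame Lyman-alpha wavelength (1215.67 Å); a minimal sketch (the function name and defaults are ours, not from the paper):

```python
LYA_REST = 1215.67  # rest-frame Lyman-alpha wavelength [Angstrom]

def lya_redshift_range(lambda_min=4750.0, lambda_max=9350.0):
    """Redshift window in which Lyman-alpha falls inside the MUSE
    wavelength coverage: z = lambda_obs / lambda_rest - 1."""
    return lambda_min / LYA_REST - 1.0, lambda_max / LYA_REST - 1.0
```

With the MUSE limits this gives z ≈ 2.91 to 6.69, matching the z = 2.9-6.7 range quoted above.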
A1689. Observations were already presented in Bina et al. (2016) from the first MUSE commissioning run in 2014. The total exposure was divided into six individual exposures of 1100 s. A small linear dither pattern of 0.2″ was applied between exposures to minimize the impact of the structure of the instrument on the final data. No rotation was applied between individual exposures. A2390, A2667, and A2744. The same observational strategy was used for all three cubes: the individual pointings were divided into exposures of 1800 s. In addition to a small dither pattern of 1″, the position angle was incremented by 90° between individual exposures to minimize the striping patterns caused by the slicers of the instrument. A2744 is the only mosaic included in the present sample; the strategy there was to completely cover the multiple-image area. For this cluster, the four different FoVs received 3.5, 4, 4, and 5 h of exposure, plus an additional 2 h at the centre of the cluster (see Fig. 1 in Mahler et al. 2018 for the details of the exposure map). For A2390 and A2667, the centre of the FoV was positioned on the central region of the cluster, as shown in Table 1 and Fig. 1.

MUSE data reduction
All the MUSE data were reduced using the MUSE ESO reduction pipeline (Weilbacher et al. 2012, 2014). This pipeline includes bias subtraction, flat fielding, wavelength and flux calibrations, basic sky subtraction, and astrometry. The individual exposures were then assembled to form a final data cube or a mosaic. An additional sky-line subtraction was performed with the Zurich Atmosphere Purge software (ZAP; Soto et al. 2016). This software uses principal component analysis to characterize the residuals of the first sky-line subtraction and further remove them from the cubes. Even though the line subtraction is improved by this process, the variance in the wavelength layers affected by sky lines remains higher, making source detection more difficult in these layers. For simplicity, we hereafter use the term layer to refer to the monochromatic images in MUSE cubes.

Complementary data (HST)
For all MUSE fields analysed in this paper, complementary deep HST data are available. They were used to help the source-detection process in the cubes, but also to model the mass distribution of the clusters (see Sect. 4). A brief list of the ancillary HST data used for this project is presented in Table 2. For A1689 the data are presented in Broadhurst et al. (2005). For A2390 and A2667, a very thorough summary of all the available HST observations is presented in Richard et al. (2008), and more recently in Olmstead et al. (2014) for A2390. A2744 is part of the HFF programme, which comprises the deepest observations performed by HST on lensing clusters. All the raw data and individual exposures are available from the Mikulski Archive for Space Telescopes (MAST), and the details of the reduction are addressed in the articles cited above.

Source detection
The MUSE instrument is very efficient at detecting emission lines (see for example Bacon et al. 2017; Herenz et al. 2017). By contrast, deep photometry is better suited to detecting faint objects with weak continua, with or without emission lines. To build a complete catalogue of the sources in a MUSE cube, we combined a continuum-guided detection strategy based on deep HST images (see Table 2 for the available photometric data) with a blind detection in the MUSE cubes. Many sources end up being detected by both approaches, and the catalogues are merged at the end of the process into a single master catalogue. The detailed method used for the extraction of sources in A1689 and A2744 can be found in Bina et al. (2016) and Mahler et al. (2018), respectively. The general method used for A2744, which contains the vast majority of sources in the present sample, is summarized below. The presence of diffuse intra-cluster light (ICL) makes the detection of faint sources difficult in the cluster core, in particular for multiple images located in this area. A running median filter computed in a window of 1.3″ was applied to the HST images to remove most of the ICL. The ICL-subtracted images were then weighted by their inverse variance maps and combined into a single deep image. The final photometric detection was performed by SExtractor (Bertin & Arnouts 1996) on the weighted and combined deep images.
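The ICL subtraction and weighted combination described above can be sketched as follows (a minimal illustration, assuming numpy/scipy arrays; the 0.06″/pixel HST-like scale and the function names are our assumptions, not values from the paper):

```python
import numpy as np
from scipy.ndimage import median_filter

def subtract_icl(image, window_arcsec=1.3, pixscale=0.06):
    """Remove diffuse intra-cluster light with a running median filter.

    The 1.3" window from the text is converted to pixels (pixscale is
    an assumed HST-like value); the median image keeps only the
    large-scale diffuse light, which is then subtracted.
    """
    size = int(round(window_arcsec / pixscale))
    return image - median_filter(image, size=size)

def weighted_combine(images, variances):
    """Inverse-variance weighted combination of ICL-subtracted images."""
    weights = [1.0 / v for v in variances]
    stacked = sum(w * im for w, im in zip(weights, images))
    return stacked / sum(weights)
```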
For the blind detection in the MUSE cubes, the Muselet software was used (MUSE Line Emission Tracker, written by J. Richard). This tool relies on SExtractor to detect emission-line objects in MUSE cubes. It produces spectrally weighted, continuum-subtracted narrow-band (NB) images for each layer of the cube. Each NB image is the weighted average of five wavelength layers, corresponding to a spectral width of 6.25 Å. These images form an NB cube, in which only the emission-line objects remain. SExtractor is then applied to each of the NB images. At the end of the process, the individual detection catalogues are merged, and sources with several detected emission lines are assembled into one single source.
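The NB-cube construction can be illustrated schematically (a sketch only: the 1.25 Å sampling and the 5-layer/6.25 Å width come from the text, but the spectral weights and the median continuum window are our assumptions, not the actual Muselet kernel):

```python
import numpy as np

def narrowband_cube(cube, weights=(0.5, 1.0, 1.0, 1.0, 0.5), cont_half=25):
    """Build a continuum-subtracted NB image for each layer of a cube.

    cube: (n_wave, ny, nx) array sampled at 1.25 A per layer.
    Each NB image averages 5 consecutive layers (~6.25 A) with the
    (assumed) weights; a local continuum, estimated as the median over
    +/- cont_half layers, is subtracted so that only emission-line
    objects remain.
    """
    n = cube.shape[0]
    w = np.asarray(weights, dtype=float)
    nb = np.empty(cube.shape, dtype=float)
    for k in range(n):
        lo, hi = max(0, k - 2), min(n, k + 3)
        ww = w[2 - (k - lo):2 + (hi - k)]          # truncate kernel at the edges
        line = np.tensordot(ww / ww.sum(), cube[lo:hi], axes=1)
        clo, chi = max(0, k - cont_half), min(n, k + cont_half + 1)
        cont = np.median(cube[clo:chi], axis=0)
        nb[k] = line - cont
    return nb
```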
After building the master catalogue, all spectra were extracted and the redshifts of the galaxies were measured. For A1689, A2390, and A2667, 1D spectra were extracted using a fixed 1.5″ aperture. For A2744, the extraction area is based on the SExtractor segmentation maps obtained from the deblended photometric detections described above. At this stage, the extracted spectra are only used for the redshift determination. The precise measurement of the total line fluxes requires a specific procedure, described in Sect. 3.2. Extracted spectra were manually inspected to identify the different emission lines and accurately measure the redshift.
A system of confidence levels was adopted to reflect the uncertainty in the measured redshifts, following Mahler et al. (2018), which has some examples that illustrate the different cases. All the LAEs used in the present paper belong to the confidence categories 2 and 3, meaning that they all have fairly robust redshift measurements. For LAEs with a single line and no continuum detected, the wide wavelength coverage of MUSE, the absence of any other line, and the asymmetry of the line were used to validate the identification of the Lyman-alpha emission. For A1689, A2390, and A2667 most of the background galaxies are part of multiple-image systems, and are therefore confirmed high redshift galaxies based on lensing considerations.
In total, 247 LAEs were identified in the four fields: 17 in A1689, 18 in A2390, 15 in A2667, and 197 in A2744. The large difference between the number of sources found in the different fields results from a well-understood combination of field size, magnification regime, and exposure time, as explained in Sect. 5.

Flux measurements
The flux measurement is part of the main procedure developed and presented in Sect. 6 to compute the LF of LAEs in lensing clusters observed with MUSE. We discuss it in this section because the selection of the final sample of galaxies used to build the LF relies partly on the results of the flux measurements.

Notes: The complete catalogue of MUSE sources detected by G. Mahler in A2744 is publicly available at http://muse-vlt.eu/science/a2744/. Muselet is publicly available as part of the Python MPDAF package (Piqueras et al. 2017): http://mpdaf.readthedocs.io/en/latest/muselet.html.
For each LAE, the flux in the Lyman-alpha line was measured on a continuum-subtracted NB image containing the whole Lyman-alpha emission. For each source, we built a sub-cube centred on the Lyman-alpha emission, plus adjacent blue and red sub-cubes used to estimate the spectral continuum. The central cube is a 10″ × 10″ square, and its spectral range depends on the spectral width of the line. To determine this width and the precise position of the Lyman-alpha emission, all sources were manually inspected. The blue and red sub-cubes are centred on the same spatial position, with the same spatial extent, and are 20 Å wide in the wavelength direction. A continuum image was estimated as the average of the blue and red sub-cubes, and this image was subtracted pixel by pixel from the central NB image. For sources with a large full width at half maximum (FWHM), the NB image used for the flux measurement can regroup more than 20 wavelength layers (equivalently, more than 25 Å).
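The per-source measurement image can be sketched as follows (illustrative only; the layer indexing and the 16-layer ≈ 20 Å continuum windows are our reading of the text):

```python
import numpy as np

def source_nb_image(cube, k_min, k_max, cont_layers=16):
    """Continuum-subtracted NB image for one LAE.

    cube: (n_wave, ny, nx) sub-cube centred on the source.
    [k_min, k_max): wavelength layers covering the full Lyman-alpha
    line (set by manual inspection in the paper). The continuum per
    pixel is the mean of 20 A-wide (16-layer at 1.25 A) windows
    blueward and redward of the line, subtracted pixel by pixel from
    each line layer before summing.
    """
    line = cube[k_min:k_max]
    blue = cube[max(0, k_min - cont_layers):k_min]
    red = cube[k_max:k_max + cont_layers]
    cont = np.concatenate([blue, red]).mean(axis=0)
    return (line - cont).sum(axis=0)
```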
Because SExtractor with FLUX_AUTO is known to provide a good estimate of the total flux of sources at the ∼5% level (see e.g. the SExtractor manual, Sect. 10.4, Fig. 8), it was used to measure the flux and the corresponding uncertainties on the continuum-subtracted images. The FLUX_AUTO routine is based on Kron's first-moment algorithm and is well suited to account for the extended Lyman-alpha haloes found around many LAEs (see Wisotzki et al. 2016 for the extended nature of the Lyman-alpha emission). In addition, the automated aperture properly accounts for the distorted images often found in lensing fields. As our sample contains faint, low surface brightness sources, and given that the NB images are not designed to maximize the signal-to-noise ratio (S/N), it is sometimes challenging to extract sources with faint or low surface brightness Lyman-alpha emission. In order to measure their flux, we forced the extraction at the position of the source: the SExtractor detection parameters were progressively loosened until a successful extraction was achieved. An extraction was considered successful when the source was recovered within a certain matching radius (r_m ∼ 1″) of the original position given by Muselet. Such an offset is sometimes observed between the peak of the UV continuum and the Lyman-alpha emission in cases of high magnification. A careful inspection was needed to make sure that no errors or mismatches were introduced in the process.
Other automated alternatives to SExtractor exist to measure the line flux (see e.g. LSDCat, Herenz et al. 2017; NoiseChisel, Akhlaghi & Ichikawa 2015; or the curve-of-growth approach developed in Drake et al. 2017). A comparison between these different methods is encouraged in the future, but is beyond the scope of the present analysis.

Lensing clusters and mass models
In this work, we used detailed mass models to compute the magnification of each LAE and the source-plane projections of the MUSE FoVs at various redshifts. These projections were needed when performing the volume computation (see Sect. 6.1). The mass models were constructed with Lenstool, using the parametric approach described in Kneib et al. (1996), Jullo et al. (2007), and Jullo & Kneib (2009). This approach relies on analytical dark-matter (DM) halo profiles to describe the projected 2D mass distribution of the cluster. Two main contributions are considered by Lenstool: one for each large-scale structure of the cluster, and one for each massive cluster galaxy. The parameters of the individual profiles are optimized through Markov chain Monte Carlo (MCMC) minimization. Lenstool minimizes the cumulative distance in parameter space between the positions of multiple images predicted by the model and the observed positions. The presence of several robust multiple systems greatly improves the accuracy of the resulting mass model. The use of MUSE is therefore a great advantage, as it allowed us to confirm multiple systems through spectroscopic redshifts and to discover new ones (e.g. Richard et al. 2015; Bina et al. 2016; Lagattuta et al. 2017; Mahler et al. 2018). Some of the models used in this study are based on the new constraints provided by MUSE. An example of a source-plane projection of the MUSE FoVs is provided in Fig. 2.
Because of the large number of cluster members, the optimization of each individual galaxy-scale clump cannot be achieved in practice. Instead, Lenstool uses a relation combining the constant mass-luminosity scaling relation described in Faber & Jackson (1976) with the fundamental plane of elliptical galaxies. This assumption reduces the parameter space explored during the minimization process, leading to better-constrained mass models; an individual parameterization of the clumps would lead to an extremely degenerate, and therefore poorly constrained, final result. The analytical profiles used were double pseudo-isothermal elliptical potentials (dPIEs), as described in Elíasdóttir et al. (2007). The ellipticity and position angle of these elliptical profiles were measured for the galaxy-scale clumps with SExtractor, taking advantage of the high spatial resolution of the HST images.
Because the brightest cluster galaxies (BCGs) lie at the centres of clusters, they undergo numerous merging processes and are not expected to follow the same light-mass scaling relation. They are therefore modelled separately, so as not to bias the final result. In a similar way, galaxies close to the multiple images or critical lines are sometimes optimized manually because of the significant impact they can have on the local magnification and the geometry of the critical lines.
The present MUSE survey has allowed us to improve on the reference models available in previous works. Table 3 summarizes their main characteristics. For A1689, the model used is an improvement of the model of Limousin et al. (2007), previously presented in Bina et al. (2016). For A2390, the reference model is presented in Pello et al. (1991) and Richard et al. (2010), with recent improvements in Pello et al. (in prep.). For A2667, the original model was obtained by Covone et al. (2006) and updated in Richard et al. (2010). For A2744, the gold model presented in Mahler et al. (2018) was used; as a novelty, it includes two background galaxies (NorthGal and SouthGal) in the mass model, because they could have a local influence on the position and magnification of multiple images.

Selection of the final LAE sample
To obtain the final LAE sample used to build the LF, only one source per multiple-image system was retained. The ideal strategy would be to keep the image with the highest S/N, which often coincides with the image with the highest magnification. However, for the needs of the LF determination, it is more secure to keep the sources with the most reliable flux measurement and magnification determination. In practice, this means that we often chose the least distorted and most isolated image. The flux and extraction of all sources among multiple systems were manually reviewed to select the best one to be included in the final sample.

Notes on Table 3: The quoted RMS values are computed from the positions of multiply imaged galaxies in the image plane; n_const and n_free correspond to the number of constraints passed to Lenstool and the number of free parameters to be optimized, respectively. The coordinates Δα and Δδ are in arcsec with respect to the following reference points: A1689: α = 197°52′23″, δ = −1°20′28″; A2390: α = 328°24′12″, δ = 17°41′45″; A2667: α = 357°54′51″, δ = −26°05′03″; A2744: α = 3°35′11″, δ = −30°24′01″. The ellipticity e is defined as (a² − b²)/(a² + b²), where a and b are the semi-major and semi-minor axes of the ellipse. The position angle θ gives the orientation of the semi-major axis of the ellipse, measured counterclockwise from the horizontal axis. Finally, r_core, r_cut, and σ_0 are the core radius, cut radius, and central velocity dispersion, respectively.

All the sources for which the flux measurement failed, or that were too close to the edge of the FoV, were removed from the final sample. One extremely diffuse, low surface brightness source (Id: A2744, 5681) was also removed, as it was impossible to properly determine its profile for the completeness estimation in Sect. 6.2.1. The final sample consists of 156 lensed LAEs: 16 in A1689, 5 in A2390, 7 in A2667, and 128 in A2744.
Out of these 156 sources, four are removed at a later stage of the analysis for completeness reasons (see Sect. 6.2.2), leaving 152 to compute the LFs. The large difference between the clusters in the number of sources detected is expected, for two reasons:
- The A2744 cube is a 2 × 2 MUSE FoV mosaic and is deeper than the three other fields: on average four hours of exposure time for each quadrant, whereas all the others have two hours or less of integration time (see Table 1).
- The larger FoV allows us to reach further away from the critical lines of the cluster, thereby increasing the probed volume as we get close to the edges of the mosaic. This makes the effective volume of universe explored in the A2744 cube much larger (see the end of Sect. 6.1.2) than in the three other fields combined.
It is therefore not surprising to find most of the sources in this field. This volume dilution effect is most visible when looking at the projection of the MUSE FoVs in the source plane (see Fig. 2). Even though this difference is expected, we also seem to be affected by an over-density of background sources at z = 4, as shown in Fig. 3. This overdensity is currently being investigated as a potential primordial group of galaxies (Mahler et al., in prep.). The complete source catalogue is provided in Table E.1, and the Lyman-alpha luminosity distribution corrected for magnification can be found in the lower panel of Fig. 3. The corrected luminosity $L_{Ly\alpha}$ was computed from the detection flux $F_{Ly\alpha}$ with

$L_{Ly\alpha} = \frac{4 \pi D_L^2 \, F_{Ly\alpha}}{\mu}$,

where $\mu$ and $D_L$ are the magnification and luminosity distance of the source, respectively. In this section and in the rest of this work, a flux-weighted magnification is used to better account for extended sources and for sources detected close to the critical lines of the clusters, where the magnification gradient is very strong.
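The magnification-corrected luminosity can be evaluated with a short sketch, using the cosmology adopted in this paper (Ω_m = 0.3, Ω_Λ = 0.7, H_0 = 70 km s⁻¹ Mpc⁻¹) and a simple trapezoidal integration for the luminosity distance (the function names and integration scheme are ours):

```python
import math

C_KMS = 2.99792458e5         # speed of light [km/s]
CM_PER_MPC = 3.0857e24       # centimetres per megaparsec
H0, OM, OL = 70.0, 0.3, 0.7  # cosmology adopted in the paper

def lum_distance_cm(z, n=10000):
    """Luminosity distance [cm] in a flat LCDM cosmology,
    D_L = (1+z) * (c/H0) * integral dz'/E(z'), by trapezoidal rule."""
    dz = z / n
    inv_e = [1.0 / math.sqrt(OM * (1 + i * dz) ** 3 + OL)
             for i in range(n + 1)]
    comoving = (C_KMS / H0) * dz * (sum(inv_e) - 0.5 * (inv_e[0] + inv_e[-1]))
    return (1 + z) * comoving * CM_PER_MPC

def lya_luminosity(flux_cgs, z, mu):
    """Magnification-corrected luminosity: L = 4 pi D_L^2 F / mu,
    with F in erg/s/cm^2 and mu the (flux-weighted) magnification."""
    return 4.0 * math.pi * lum_distance_cm(z) ** 2 * flux_cgs / mu
```

For example, a detected flux of 1e-18 erg s⁻¹ cm⁻² at z = 4 with µ = 10 corresponds to log L ≈ 40.2, i.e. the faint regime probed here.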
This magnification is computed by projecting a segmentation of each LAE into the source plane with Lenstool, measuring a magnification for each of its pixels, and taking the flux-weighted average. A full probability density of magnification, P(µ), is also computed for each LAE and used, in combination with the uncertainties on $F_{Ly\alpha}$, to obtain a realistic luminosity distribution when computing the LFs (see Sect. 6.3). Objects with the highest magnifications are affected by the strongest uncertainties and tend to have very asymmetric P(µ), with a long tail towards high magnifications. Because of this effect, LAEs with log L < 40 should be considered with great caution. Figure 4 compares our final sample with the sample used in the MUSE HUDF LAE LF study (Drake et al. 2017, hereafter D17). The MUSE HUDF (Bacon et al. 2017), with a total of 137 h of integration, is the deepest MUSE observation to date. It consists of a 3 × 3 MUSE FoV mosaic, each quadrant being a 10 h exposure, with an additional 30 h pointing (udf-10) overlaid on the mosaic. The population selected in D17 is composed of 481 LAEs found in the mosaic and 123 in the udf-10, for a total of 604 LAEs. The upper panel of the figure shows the luminosity of the different samples versus redshift: using lensing clusters, the redshift selection tends to be less affected by luminosity bias, especially at higher redshift. The lower panel presents the normalized distributions of the two populations. The strength of the study presented in D17 lies in the large number of sources selected; however, a sharp drop is observed in the distribution at log L ∼ 41.5. Using the lensing clusters, with ∼25 h of exposure time and a much smaller lens-corrected survey volume, a broader luminosity selection was achieved.
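The flux-weighted magnification can be written compactly (a sketch; in the paper the per-pixel magnifications come from the Lenstool source-plane projection, whereas here they are simply an input array):

```python
import numpy as np

def flux_weighted_mu(flux_pixels, mu_pixels):
    """Flux-weighted magnification over the pixels of a source
    segmentation: mu_w = sum(f_i * mu_i) / sum(f_i)."""
    f = np.asarray(flux_pixels, dtype=float)
    mu = np.asarray(mu_pixels, dtype=float)
    return float((f * mu).sum() / f.sum())
```

For instance, two equally bright pixels with µ = 2 and µ = 4 give µ_w = 3; with flux weighting, bright pixels near a critical line dominate the average when the magnification gradient is strong.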
As discussed in the following sections, despite a smaller number of LAEs compared to D17, the sample presented in this paper is more sensitive to the faint end of the LF by construction.

Computation of the luminosity function
Because of the combined use of lensing clusters and spectroscopic data cubes, it is extremely challenging to adopt a parametric approach to determine a selection function. By construction, the sample of LAEs used in this paper includes sources detected under very different conditions, from intrinsically bright emitters with moderate magnification to highly magnified galaxies that could not have been detected far from the critical lines. To properly take these differences into account when computing the LF, we adopted a non-parametric approach that allows us to treat the sources individually: the 1/V$_{\rm max}$ method (Schmidt 1968; Felten 1976). We present in this section the four steps developed to compute the LFs:
(i) The flux computation, performed for all the detected sources. This step was already described in Sect. 3.2, as the selection of the final sample relies partly on the results of the flux measurements.
(ii) The volume computation for each of the sources included in the final sample, presented in Sect. 6.1.
(iii) The completeness estimation using the real source profiles (both spatial and spectral), presented in Sect. 6.2.
(iv) The computation of the points of the differential LF, using the results of the volume computation and the completeness estimations, presented in Sect. 6.3.

Volume computation in spectroscopic cubes in lensing clusters
The V$_{\rm max}$ value is defined as the volume of the survey where an individual source could have been detected. The inverse value, 1/V$_{\rm max}$, is used to determine the contribution of one source to the numerical density of galaxies. Because this survey consists of several FoVs, the V$_{\rm max}$ value for a given source must be determined from all the fields that are part of the survey, including the fields in which the source is not actually present. The volumes were computed in the source plane to avoid multiple counting of the parts of the survey that are multiply imaged. For that, we used Lenstool to get the projection of the MUSE fields in the source plane and then used these projections to compute the volume (see Fig. 2 for an example of source plane projection). In this analysis, the volume computation was performed independently from the completeness estimation, focussing on the spectral noise variations of the cubes only. The detectability of each LAE needs to be evaluated over the entire survey to compute V$_{\rm max}$. This task is not straightforward, as the detectability depends on many different factors:
- The flux of the source: the brighter the source, the higher the chances of detection. For a given spatial profile, brighter sources have higher V$_{\rm max}$ values.
- The surface brightness and line profile of the source: for a given flux, a compact source has a higher surface brightness than an extended one, and is therefore easier to detect. This aspect is especially important as most LAEs have an extended halo (see Wisotzki et al. 2016).
- The local noise level: to first approximation, it depends on the exposure time. This point is especially important for mosaics, in which noise levels differ from one part of the mosaic to another, as the noisier parts contribute less to the V$_{\rm max}$ values.
- The redshift of the source: the Lyman-alpha line profile of a source may be affected by the presence of strong sky lines in its close neighbourhood. The cubes themselves have strong variations of noise level caused by the presence of these sky emission lines (see e.g. Fig. 5).
- The magnification induced by the cluster: where the magnification is too small, the faintest sources could not have been detected.
- The seeing variation from one cube to another.
This shows that to properly compute V$_{\rm max}$, each source has to be considered individually. The easiest method to evaluate the detectability of sources is to simply mask the brightest objects of the survey, assuming that no objects could be detected behind them. This can be achieved from a white light image, using a mask generated from a SExtractor segmentation map. The volume computation can then be done on the unmasked pixels, and only where the magnification is high enough to allow the detection of the source. However, as shown in Appendix C, this technique has some limitations in accounting for the 3D morphologies of real LAEs. For this reason, a method to precisely determine the detectability map (referred to as detection mask, or simply mask, hereafter) of individual sources has been developed. As the detection process in this work is based on 2D collapsed images, we adopted the same scheme to build the 2D detection masks, and from these, built the 3D masks in the source plane adapted to each LAE of the sample. Using these individual source plane 3D masks, and as previously mentioned, the volume integration was performed on the unmasked pixels only, where the magnification is high enough. In the paragraphs below, we quickly summarize the method adopted to produce masks for 2D images and explain the reasons that led to the more complex method detailed in Sects. 6.1.1 and 6.1.2.
The basic idea of our method for producing masks for 2D images is to mimic the SExtractor source detection process. For each pixel in the detection image, we determine whether the source could have been detected, had it been centred on this pixel. For this pseudo-detection, we fetch the values of the brightest pixels of the source (hereafter Bp) and compare them pixel-to-pixel to the background root mean square map (RMS map) produced by SExtractor from the detection image. The pixels where this pseudo-detection is successful are left unmasked, and the pixels where it fails are masked. Technical details of the method for 2D images can be found in Appendix A. The detection masks produced in this way are binary masks showing where the source could have been detected. We use the term "covering fraction" to refer to the fraction of a single FoV covered by a mask. A covering fraction of 1 means that the source could not be detected anywhere on the image, whereas a covering fraction of 0 means that the source could be detected on the entire image.
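As an illustration, the pixel-to-pixel pseudo-detection can be sketched as follows. This is a minimal numpy sketch, not the actual implementation: the function and parameter names are ours, and the detection criterion (at least MIN_AREA bright pixels above DETECT_THRESH times the local RMS) is modelled on the SExtractor settings quoted later in Sect. 6.2.1.

```python
import numpy as np

def detection_mask(bp, rms_map, detect_thresh=1.3, min_area=6):
    """Binary mask: True where a source with bright-pixel profile `bp`
    (flux values, any order) would NOT be detected if centred there.

    The pseudo-detection at a pixel succeeds when at least `min_area`
    of the bright pixels exceed `detect_thresh` times the local RMS.
    """
    bp = np.sort(np.asarray(bp, dtype=float))[::-1]  # brightest first
    # number of bright pixels beating the threshold at each position
    n_det = (bp[:, None, None] > detect_thresh * rms_map[None, :, :]).sum(axis=0)
    return n_det < min_area                          # True = masked

def covering_fraction(mask):
    # 1.0 -> source detectable nowhere, 0.0 -> detectable everywhere
    return mask.mean()
```

A bright profile over a flat RMS map yields a covering fraction of 0 (detectable everywhere), while a profile below the threshold yields 1.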
This method of producing the detection masks from 2D images is precise and simple to implement when the survey consists of 2D photometric images. However, when dealing with 3D spectroscopic cubes, its application becomes much more complicated owing to the strong variations of the noise level with wavelength in the cubes. Because of these variations, the detectability of a single source through the cubes cannot be represented by a single mask duplicated along the spectral axis to form a 3D mask. An example of the spectral variations of the noise level in a MUSE cube is provided in Fig. 5. These spectral variations are very similar for the four cubes. "Noise level" is used to refer to the average level of noise on a single layer. It is determined from the RMS cubes, which are created by SExtractor from the detection cube (i.e. the Muselet cube of NB images). For a layer $i$ of the RMS cube, the noise level corresponds to the spatial median of the RMS layer over a normalization factor:
$\mathrm{noise}(i) = \langle \mathrm{RMS}_i \rangle_{x,y} / \langle \mathrm{RMS}_{\mathrm{median}} \rangle_{x,y}$, (2)
where $\langle \cdot \rangle_{x,y}$ is the spatial median operator. The 2D median RMS map, $\mathrm{RMS}_{\mathrm{median}}$, is obtained from a median along the wavelength axis for each spatial pixel of the RMS cube. The normalization is the spatial median value of this median RMS map. The main factor responsible for the high frequency spectral variations of the noise level is the presence of sky lines affecting the variance of the cubes. To properly account for the noise variations, the detectability of each source has to be evaluated along the spectral direction of the cubes by creating a series of detection masks from individual layers. These masks are then projected into the source plane for the volume computation. This step is the severely limiting factor, as it would take an excessive amount of computation time. For a sample of 160 galaxies in four cubes, sampling different noise levels at only ten different wavelengths, we would need 6400 Lenstool projections.
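The per-layer noise level defined above can be sketched as follows (illustrative numpy code; the wavelength-first axis ordering of the cube is an assumption on our part):

```python
import numpy as np

def noise_levels(rms_cube):
    """Per-layer noise level of an RMS cube, following
    noise(i) = median_{x,y}(RMS_i) / median_{x,y}(RMS_median),
    where RMS_median is the wavelength-axis median of the cube.
    `rms_cube` has shape (n_lambda, ny, nx)."""
    rms_median = np.median(rms_cube, axis=0)      # 2D median RMS map
    norm = np.median(rms_median)                  # its spatial median
    return np.median(rms_cube, axis=(1, 2)) / norm
```

A layer affected by a sky line (higher RMS) then stands out as a peak in the returned curve.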
This represents more than 20 days of computation on a 60 CPU computer, and it is still not representative of the actual variations of noise level versus wavelength. To circumvent this difficulty, we developed a new approach that allows for a fine sampling of the noise level variations while drastically limiting the number of source plane reconstructions. A flow chart of the method described in the next sections is provided in Fig. 6.

Masking 3D cubes
The general idea of the method is to use a S/N proxy for individual sources instead of comparing their flux to the actual noise. In other words, the explicit computation of the detection mask for every source, wavelength layer, and cube is replaced by a set of pre-computed masks for every cube, covering a wide range of S/N values, in such a way that a given source can be assigned the mask corresponding to its S/N in a given layer. Two independent steps were performed before assembling the final 3D masks. First, the evolution of the S/N value through the spectral dimension of the cubes is computed for each LAE; these are referred to as the S/N curves hereafter. Second, for each cube, a series of 2D detection masks is created for an independent set of S/N values. These two steps are detailed below. The final 3D detection masks were then assembled by successively picking the 2D mask that corresponds to the S/N value of the source at a given wavelength in a given cube. This process was done for all sources individually.
For the first step, the S/N value of a given source is defined from its bright-pixel profile and a RMS map, by comparing the maximum flux of the bright-pixel profile, max(Bp), to the noise level of that RMS map. For each layer of the RMS cube, we computed the S/N value the source would have had at that spectral position in the cube. We point out that this is not a proper S/N value (hence the use of the term "proxy"), as the normalization used to define the noise levels in Eq. (2) depends on the cube. For a layer $i$ of the RMS cube, the corresponding $S/N_i$ value is given by
$S/N_i = \max(Bp) / \mathrm{noise}(i)$. (3)
An example of a S/N curve defined this way is provided in Fig. 7. For a given source, this computation was done on every layer of every cube that is part of the survey. When computing the S/N of a given source in a cube different from its parent cube, the seeing difference (see Table 1) is accounted for by a convolution or deconvolution procedure that brings the detection image of the LAE to the resolution of the cube considered. As a result, three additional images are produced for each LAE. The four images (the original detection image plus the three simulated ones) are then used to measure the values of the brightest pixels in all four seeing conditions. For the deconvolution, a Python implementation of a Wiener filter from the Scikit-image package (van der Walt et al. 2014) was used. The deconvolution algorithm itself is presented in Orieux et al. (2010), and for all these computations the PSF of the seeing is assumed to be Gaussian.
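Combining the two definitions above, the S/N curve of a source through a cube can be sketched as (illustrative code; max(Bp) is passed as a scalar measured beforehand in the appropriate seeing condition):

```python
import numpy as np

def snr_curve(bp_max, rms_cube):
    """S/N proxy of a source along the spectral axis:
    S/N_i = max(Bp) / noise(i), with noise(i) the per-layer noise level
    normalised by the median RMS map. `rms_cube`: (n_lambda, ny, nx)."""
    rms_median = np.median(rms_cube, axis=0)
    noise = np.median(rms_cube, axis=(1, 2)) / np.median(rms_median)
    return bp_max / noise
```

Layers with a high noise level (e.g. under a sky line) translate into dips of the S/N curve, as in Fig. 7.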
For the second step, 2D masks are created from a set of S/N values that encompasses all the possible values for our sample. To produce a single 2D mask, two inputs are needed: the list of bright pixels of the source, Bp, and the RMS maps produced from the detection image (in our case, the NB images produced by Muselet). To limit the number of masks produced, two simplifications were introduced. The main one is that all RMS maps of a same cube present roughly the same pattern down to a certain normalization factor. This is equivalent to saying that all individual layers of the RMS cube can be approximately modelled and reproduced by a properly rescaled version of the same median RMS map. The second simplification is the use of four generalized bright-pixel profiles (hereafter Bp$_g$). To be consistent with the seeing variations, one profile is computed for each cluster, taking the median of all the individual LAE profiles computed from the detection images simulated in each seeing condition (see Fig. A.1 for an example of a generalized bright-pixel profile, also including the effect of seeing). These profiles are normalized in such a way that max(Bp$_g$) = 1. For each value of the S/N set defined, a mask is created for each cluster from its median RMS map and the corresponding Bp$_g$, meaning that the 2D detection masks are no longer associated with a specific source, but with a specific S/N value. Using the definition of S/N adopted in Eq. (3), the four Bp$_g$ are rescaled to fit any $S/N_j$ value of the S/N set and to obtain profiles that are directly comparable to the median RMS maps:
$S/N_j = \max(c_j \, Bp_g) / \mathrm{noise}_{\mathrm{median}}$,
where $c_j$ is the scaling factor. According to Eq. (2), the noise level of the median RMS maps is just 1, and as mentioned above max(Bp$_g$) = 1, so the scaling factor is simply $c_j = S/N_j$. Therefore the four sets of bright-pixel profiles $S/N_j \times Bp_g$ and the corresponding median RMS maps are used as input to produce the set of 2D detection masks.

Fig. 6. Flow chart of the method used to produce the 3D masks and to compute V$_{\rm max}$. The key points are shown in red and the main path followed by the method is indicated in blue. All the steps related to the determination of the bright pixels are shown in grey. The steps related to the computation of the S/N of each source are indicated in green. The numbered labels in light blue refer to the bullet points in Appendix D that briefly sum up all the different steps of this figure.
After the completion of these two steps, the final 3D detection masks were assembled for every source individually. For this purpose, a subset of wavelength values (or equivalently, a subset of layer index) drawn from the wavelength axis of a MUSE cube was used to resample the S/N curves of individual sources. For each source and each entry of this wavelength subset, the procedure fetches the value in the S/N set that is the closest to the measured value, and returns the associated 2D detection mask, effectively assembling a 3D mask. An example of this 2D sampling is provided in Fig. 7. To each of the red points resampling the S/N curve, a pre-computed 2D detection mask is associated, and the higher the density of the wavelength sampling, the higher the precision on the final reconstructed 3D mask. The important point is that to increase the sampling density, we do not need to create more masks and therefore it is not necessary to increase the number of source plane reconstructions.
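The assembly of a 3D mask from the pre-computed set can be sketched as follows (illustrative names; `masks_2d[k]` is assumed to be the pre-computed 2D mask associated with `snr_set[k]`):

```python
import numpy as np

def assemble_3d_mask(snr_of_source, snr_set, masks_2d):
    """Assemble a 3D detection mask by picking, for each sampled
    wavelength, the pre-computed 2D mask whose S/N value is closest
    to the source's S/N at that wavelength."""
    snr_set = np.asarray(snr_set)
    src = np.asarray(snr_of_source)
    idx = np.abs(snr_set[None, :] - src[:, None]).argmin(axis=1)
    return np.stack([masks_2d[k] for k in idx])  # (n_lambda, ny, nx)
```

Refining the wavelength sampling only grows the first axis of the output: no new 2D mask (and hence no new source plane reconstruction) is needed, which is the point of the method.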

Volume integration
In the previous section we presented the construction of 3D masks in the image plane for all sources with a limited number of 2D masks. For the actual volume computation, the same was achieved in the source plane by computing the source plane projection of all the 2D masks and combining these masks with the magnification maps. Thanks to the method developed in the previous subsection, the number of source plane reconstructions only depends on the length of the S/N set initially defined and on the number of MUSE cubes used in the survey. It depends neither on the number of sources in the sample nor on the accuracy of the sampling of the S/N variations. For the projections, we used PyLenstool, which allows for an automated use of Lenstool.

Fig. 7. To each of these red points, a mask with the closest S/N value is associated. The short and long dashed black lines represent the S/N levels for which a covering fraction of 1 (detected nowhere) and 0 (detected everywhere) is achieved, respectively. For all the points between these two lines, the associated masks have a covering fraction ranging from 1 to 0, meaning that the source is always detectable in some regions of the field.
Reconstruction of the source plane was performed for different redshift values to sample the variation of both the shape of the projected area and the magnification. In practice, these variations are very small with redshift, and we reduced the redshift sampling to z = 3.5, 4.5, 5.5, and 6.5. In a very similar way to what is described at the end of the previous section, 3D masks were assembled and combined with magnification maps in the source plane. In addition to the closest S/N value, the procedure also looks for the closest redshift bin, in such a way that for a given point $(\lambda_k, S/N_k)$ of the resampled S/N curve, the redshift of the projection is the closest to $z_k = \lambda_k / \lambda_{Ly\alpha} - 1$. The last important aspect to take into account when computing V$_{\rm max}$ is to limit the survey to the regions where the magnification is such that the source could have been detected. The condition is
$\mu > \mu_{\mathrm{lim}} = \mu \, \delta F_d / F_d$,
where $\mu$ is the flux-weighted magnification of the source, $F_d$ the detection flux, and $\delta F_d$ the uncertainty on the detection, which reflects the local noise properties. This condition simply states that $\mu_{\mathrm{lim}}$ is the magnification that would allow for a S/N of 1, below which the detection of the source would be impossible. It is complex to find a S/N criterion that would be coherent with the way Muselet works on the detection images, since the images used for the flux computation are different and of variable spectral width compared to the Muselet NB images. Therefore, this criterion for the computation of $\mu_{\mathrm{lim}}$ is intentionally conservative, to avoid overestimating the steepness of the faint-end slope.
To be consistent with the differences in seeing and in exposure time from cube to cube, $\mu_{\mathrm{lim}}$ is computed for each LAE and for each MUSE cube (i.e. four values for a given LAE). A source only detected because of a very high magnification in a shallow cube with bad seeing (e.g. A1689) would need a much smaller magnification to be detected in a deeper cube with better seeing (e.g. A2744). For the exposure time difference, the ratio of the median RMS values of the entire cubes is used, and for the seeing, the ratio of the squared seeing values is used. In other words, the limiting magnification in A2744 for a source detected in A1689 is given by
$\mu_{\mathrm{lim}}^{A2744} = \mu_{\mathrm{lim}}^{A1689} \times \dfrac{\langle \mathrm{RMS}^{A2744} \rangle_{x,y,\lambda}}{\langle \mathrm{RMS}^{A1689} \rangle_{x,y,\lambda}} \times \left( \dfrac{s^{A2744}}{s^{A1689}} \right)^2$,
where $\langle \cdot \rangle_{x,y,\lambda}$ is the median operator over the three axes of the RMS cubes and $s$ is the seeing. The exact same formula can be applied to compute the limiting magnification of any source in any cube. This simple approximation is sufficient for now, as only the volumes of the rare LAEs with very high magnification are dominated by the effects of the limiting magnification. The volume integration is performed from one layer of the source plane projected (and masked) cubes to the next, counting only pixels with $\mu > \mu_{\mathrm{lim}}$. For this integration, the following cosmological volume formula was used:
$\mathrm{d}V = \dfrac{c}{H_0} \, \omega \, \dfrac{D_L^2}{(1+z)^2} \, \dfrac{\mathrm{d}z}{E(z)}$,
where $\omega$ is the angular size of a pixel, $D_L$ is the luminosity distance, and $E(z)$ is given by
$E(z) = \sqrt{\Omega_m (1+z)^3 + \Omega_\Lambda}$.
In practice, and for a given source, a stable value of the volume is reached when using more than 300 points to resample the S/N curve along the spectral dimension (i.e. less than 5% of variation with respect to a sampling of 1000 points). A comparison is provided in Appendix C between the results obtained with this method and the equivalent findings when a simple mask based on SExtractor segmentation maps is adopted instead.
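Numerically, the volume integral can be sketched as follows. This is a simplified sketch for a single solid angle $\omega$ between two redshifts, with illustrative default cosmological parameters; the actual computation runs layer by layer over the masked source plane pixels with $\mu > \mu_{\mathrm{lim}}$, with one such integral per unmasked pixel.

```python
import numpy as np

def comoving_volume(omega, z1, z2, h0=70.0, om=0.3, ol=0.7):
    """Comoving volume [Mpc^3] subtended by solid angle `omega` [sr]
    between z1 and z2 in flat LCDM:
    dV = (c/H0) * omega * D_L(z)^2 / (1+z)^2 * dz / E(z),
    with E(z) = sqrt(om*(1+z)^3 + ol) and D_L = (1+z) * D_C."""
    c = 299792.458  # speed of light [km/s]
    z = np.linspace(0.0, z2, 20001)
    ez = np.sqrt(om * (1.0 + z) ** 3 + ol)
    f = (c / h0) / ez  # dD_C/dz [Mpc]
    # comoving distance by cumulative trapezoidal integration
    dc = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(z))))
    dl = (1.0 + z) * dc                                    # luminosity distance
    dv = omega * (c / h0) * dl ** 2 / (1.0 + z) ** 2 / ez  # dV/dz
    sel = z >= z1
    zs, dvs = z[sel], dv[sel]
    return float(np.sum(0.5 * (dvs[1:] + dvs[:-1]) * np.diff(zs)))
```

Note that $(c/H_0)\,D_L^2/(1+z)^2/E(z)$ is simply $D_C^2 \, \mathrm{d}D_C/\mathrm{d}z$, so over the full sky the integral reduces to the usual $\frac{4\pi}{3} D_C^3$.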
The maximum comoving volume explored between 2.9 < z < 6.7, accounting for magnification, is about 16 000 Mpc$^3$, distributed as follows among the four clusters: ∼900 Mpc$^3$ for A1689, ∼800 Mpc$^3$ for A2390, ∼600 Mpc$^3$ for A2667, and ∼13 000 Mpc$^3$ for A2744.

Completeness determination using real source profiles
Completeness corrections account for the sources missed during the selection process. Applying this correction is crucial for the study of the LF. The procedure used in this article separates the contribution to incompleteness due to S/N effects across the detection area, on the one hand, from the contribution due to masking across the spectral dimension, on the other (see V$_{\rm max}$ in Sect. 6.1).
The 3D masking method presented in the previous section aims to map precisely the volume where a source could be detected. However, an additional completeness correction was needed to account for the fact that a source does not have a 100% chance of being detected on its own wavelength layer. In the continuity of the non-parametric approach developed for the volume computation, the completeness was determined for individual sources. To better account for the properties of sources, namely their spatial and spectral profiles, simulations were performed using their real profiles instead of parameterized realizations. Because the detection of sources was done in the image plane, the simulations were also performed in the image plane on the actual masked detection layer of a given source (i.e. the layer of the NB image cube containing the peak of the Lyman-alpha emission of the source). The mask used on the detection layer was picked using the same method as described in Sect. 6.1.1, leaving only the cleanest part of the layer available for the simulations.

Estimating the source profile
To get an estimate of the real source profile, we used the Muselet NB image that captures the peak of the Lyman-alpha emission (called the max-NB image hereafter). Using a method similar to that presented in Sect. 3.2, the extraction of sources on the max-NB images was forced by progressively loosening the detection criterion. The vast majority of our sources were successfully detected on the first try using the original parameters used by Muselet for the initial detection of the sample: DETECT_THRESH = 1.3 and MIN_AREA = 6.
To recover the estimated profile of a source, the pixels belonging to the source were extracted on the filtered image according to the segmentation map. The filtered image is the convolved and background-subtracted image that SExtractor uses for the detection. The use of filtered images allowed us to retrieve a background-subtracted and smooth profile for each LAE. Figure 8 presents examples of source profile recovery for three representative LAEs.
A flag was assigned to each extracted profile to reflect the quality of the extraction, based on a predefined set of parameters (detection threshold, minimum number of pixels, and matching radius) used for the successful extraction of the source. A source with flag 1 is extremely trustworthy, and was recovered with the original set of parameters used for initial automated detection of the sample. A source with flag 2 is still a robust extraction and a source with flag 3 is doubtful and is not used for the LF computation. Of the LAEs, 95% were properly recovered with a flag value of 1. The summary of flag values is shown in Table 5.
The three examples presented in Fig. 8 have a flag value of 1 and were recovered using DETECT_THRESH = 1.3, MIN_AREA = 6, and a matching radius of 0.8″. Objects with flag > 1 represent less than 5% of the total sample. For the few sources with an extraction flag above 1, several possible explanations can be found, listed by order of importance as follows:
- The image used to recover the profiles (30″) is smaller than the entire max-NB image. As the SExtractor background estimation depends on the size of the input image, this may slightly affect the detection of some objects. This is most likely the predominant reason for a flag value of 2.
- There is a small difference between the recovered position and the listed position. This may be due to a change in morphology with wavelength or bandwidth. By increasing the matching radius to recover the profile, we obtained a successful extraction but also increased the value of the extraction flag.
- The NB image used does not actually correspond to the NB image that led to the source being detected. By picking the NB image that catches the maximum of the Lyman-alpha emission, we do not necessarily pick the layer with the cleanest detection. For example, the peak could fall in a very noisy layer of the cube, whereas the neighbouring layers would provide a much cleaner detection.
- The source is extremely faint and was actually detected with relaxed detection parameters or detected manually. We checked that we did not include LAEs that were only expected at a certain position as part of a multiple-image system; that is, we did not select the noisiest images in multiple-image systems.

Recovering mock sources
Once a realistic profile was obtained for all LAEs, source recovery simulations were conducted. For this step, the detection process was exactly the same as that initially used for the sample detection. However, since we limited the simulations to the max-NB images (see Sect. 6.2.1) and not the entire cubes, we did not need to use the full Muselet software. To save computation time, we only ran SExtractor on the max-NB images, using the same configuration files as Muselet, to reproduce the initial detection parameters. In this section, the set of parameters was also DETECT_THRESH = 1.3 and MIN_AREA = 6.
To create the mock images, we used the masked max-NB images. Each source profile was randomly injected many times on the corresponding detection max-NB image, avoiding overlapping. After running the detection process on the mocks, the recovered sources were matched to the injected sources based on their position. The completeness values were derived by comparing the number of successful matches to the number of injected sources. The process was repeated 40 times to derive the associated uncertainties.
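The injection-recovery loop can be sketched as follows. This is a simplified sketch: the SExtractor detection step is abstracted as a user-supplied `detect` function, the overlap avoidance is omitted for brevity, and all names are illustrative.

```python
import numpy as np

def completeness(profile, image, detect, n_inject=100, n_repeat=40, rng=None):
    """Injection-recovery completeness: repeatedly inject the real source
    `profile` at random positions of the masked max-NB `image`, run
    `detect(image) -> list of (y, x)` and count recoveries within 2 pixels
    of each injected centre. Returns (mean, std) over the repeats."""
    rng = np.random.default_rng(rng)
    ph, pw = profile.shape
    fractions = []
    for _ in range(n_repeat):
        mock = image.copy()
        pos = [(rng.integers(0, image.shape[0] - ph),
                rng.integers(0, image.shape[1] - pw))
               for _ in range(n_inject)]
        for y, x in pos:
            mock[y:y + ph, x:x + pw] += profile  # inject the real profile
        found = detect(mock)
        n_ok = sum(any(abs(fy - (y + ph // 2)) <= 2 and
                       abs(fx - (x + pw // 2)) <= 2 for fy, fx in found)
                   for y, x in pos)
        fractions.append(n_ok / n_inject)
    return float(np.mean(fractions)), float(np.std(fractions))
```

With a bright profile and a trivial threshold detector, the recovered fraction is 1 by construction; real completeness values come from the interplay between the faint profile wings and the local noise.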
The completeness results obtained for each source of the sample are shown in Fig. 9. The average completeness value over the entire sample is 0.74 and the median value is 0.90. The values are this high because we used masked NB images, effectively performing the source recovery simulations on the cleanest part of the detection layer only. As seen in this figure, there is no well-defined trend between completeness and detection flux. At a given flux, a compact source detected on a clean layer of the cube has a higher completeness than a diffuse source with the same flux detected on a layer affected by a sky line. Four LAEs, with a flag value of 3 or with a completeness value less than 10%, are not used for the computation of the LFs in Sect. 6.3.
A more common approach to estimating the completeness would be to perform heavy Monte Carlo (MC) simulations for each of the cubes in the survey to get a parameterized completeness (see Drake et al. 2017 for an example). The classical approach consists of injecting sources with parameterized spatial and spectral morphologies and retrieving the completeness as a function of redshift and flux. This method is extremely time consuming, in particular for IFUs, where the extraction process is lengthy and tedious. The main advantage of computing the completeness based on the real source profile is that it allows us to accurately account for the different shapes and surface brightnesses of individual sources. And because the simulations are done on the detection image of the source in the cubes, we are also more sensitive to the noise increase caused by sky lines. As seen in Fig. 10, apart from the obvious flux-completeness correlation, it is difficult to identify correlations between completeness and redshift or sky lines. This tends to show that the profile of the sources is a dominant factor when it comes to estimating the completeness properly. The same conclusion was reached in D17 and in Herenz et al. (2019). A non-parametric approach to completeness is therefore better suited in the case of lensing clusters, where a proper parametric approach is almost impossible to implement because of the large number of parameters to take into account (e.g. spatial and spectral morphologies including distortion effects, lensing configuration, and cluster galaxies).

Determination of the luminosity function
To study the possible evolution of the LF with redshift, the population of 152 LAEs was subdivided into several redshift bins: z1: 2.9 < z < 4.0, z2: 4.0 < z < 5.0, and z3: 5.0 < z < 6.7. In addition to these three LFs, the global LF for the entire sample, z$_{all}$: 2.9 < z < 6.7, was also determined. For a given redshift and luminosity bin, the following expression was used to build the points of the differential LFs:
$\Phi(L_i) = \dfrac{1}{\Delta \log L_i} \sum_j \dfrac{1}{C_j \, V_{\mathrm{max},j}}$, (9)
where $\Delta \log L_i$ corresponds to the width of the luminosity bin in logarithmic space, $j$ is the index running over the sources falling in the bin indexed by $i$, and $C_j$ stands for the completeness correction of the source $j$. To properly account for the uncertainties affecting each LAE, MC iterations are performed to build 10 000 catalogues from the original catalogue. For each LAE in the parent catalogue, a random magnification is drawn from its P(µ), and random flux and completeness values are also drawn, assuming a Gaussian distribution with a width fixed by their respective uncertainties. A single value of the LF was obtained at each iteration following Eq. (9). The distribution of LF values obtained at the end of the process was used to derive the average in linear space and to compute asymmetric error bars. The MC iterations are well suited to account for LAEs with poorly constrained luminosities. This happens for sources close to, or even on, the critical lines of the clusters. Drawing random values from their probability density and uncertainties for magnification and flux results in a luminosity distribution (see Eq. (1)), which allows these sources to have a diluted contribution across several luminosity bins.
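The MC construction of a single LF point can be sketched as follows (simplified: only a Gaussian perturbation of log L is drawn here, whereas the actual procedure also draws the magnification from P(µ) and perturbs the flux and completeness values; all names are illustrative):

```python
import numpy as np

def lf_point(logL, vmax, comp, lo, hi, sigma_logL, n_mc=2000, rng=0):
    """Monte-Carlo estimate of one differential-LF point, Eq. (9):
    Phi_i = (1 / Dlog L_i) * sum_j 1 / (C_j * Vmax_j), summed over the
    sources whose perturbed log-luminosity falls in the bin [lo, hi).
    Returns the mean and the 16th/84th percentiles over the draws."""
    r = np.random.default_rng(rng)
    logL, vmax, comp, sigma_logL = map(np.asarray, (logL, vmax, comp, sigma_logL))
    vals = np.empty(n_mc)
    for k in range(n_mc):
        ll = logL + r.normal(0.0, sigma_logL)  # perturb the luminosities
        sel = (ll >= lo) & (ll < hi)
        vals[k] = np.sum(1.0 / (comp[sel] * vmax[sel])) / (hi - lo)
    return vals.mean(), np.percentile(vals, [16, 84])
```

A source with a large luminosity uncertainty then contributes, over the draws, to several adjacent bins in a diluted way, which is exactly the behaviour described above for sources near the critical lines.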
For the estimation of the cosmic variance, we used the cosmic variance calculator presented in Trenti & Stiavelli (2007). Lacking other options, a single compact geometry made of the union of the effective areas of the four FoVs is assumed and used as input for the calculator. The blank field equivalent of our survey is an angular area of about 1.2′ × 1.2′. Given that a MUSE FoV is a square of 1′ on a side, the observed area of the present survey is roughly 7 square arcminutes. Our survey is therefore roughly equivalent to a bit more than a single MUSE FoV in a blank field. The computation is done for all the bins, as the value depends on the average volume explored in each bin as well as on the intrinsic number of sources. The uncertainty due to cosmic variance on the intrinsic counts of galaxies in a luminosity bin typically ranges from 15% to 20% for the global LF and from 15% to 30% for the LFs computed in redshift bins. For $\log(L) \lesssim 41$, the total error budget is dominated by the MC dispersion, which is mainly caused by objects with poorly constrained luminosity jumping from one bin to another during the MC process. The larger the bins, the smaller this effect, because a given source is less likely to jump outside of a larger bin. For $41 \lesssim \log(L) \lesssim 42$, the Poissonian uncertainty is slightly larger than the cosmic variance but does not completely dominate the error budget. Finally, for $42 \lesssim \log(L)$, the Poissonian uncertainty is the dominant source of error owing to the small volume and therefore the small number of bright sources in the survey.
The data points of the derived LFs and the corresponding error bars are listed in Table 6. These LF points provide solid constraints on the shape of the faint end of the LAE distribution. In the following sections, we elaborate on these results and discuss the evolution of the faint end slope as well as the implications for cosmic reionization.

Parametric fit of the luminosity function
The differential LFs are presented in Fig. 11 for the four redshift bins. Some points of the LFs, shown as empty squares, are considered unreliable and are presented for comparison purposes only. Therefore, they are not used in the subsequent parametric fits. An LF value is considered unreliable when it is dominated by the contribution of a single source with either a small $V_{\max}$ or a low completeness value, due to luminosity and/or redshift sampling. These unreliable points are referred to as "incomplete" hereafter. The remaining points are fitted with a straight line as a visual guide; the corresponding 68% confidence regions are represented as shaded areas. For $z_3$, the exercise is limited owing to the large uncertainties and the lack of constraints on the bright end. The measured mean slopes for the four LFs are: $\alpha = -1.79^{+0.10}_{-0.09}$ for $z_{all}$, $\alpha = -1.63^{+0.13}_{-0.12}$ for $z_1$, $\alpha = -1.61^{+0.08}_{-0.08}$ for $z_2$, and $\alpha = -1.76^{+0.40}_{-0.40}$ for $z_3$. These values are consistent with no evolution of the mean slope with redshift.
In addition, and because the integrated value of each LF is of great interest regarding the constraints it can provide on the sources of reionization, the different LFs were fitted with the standard Schechter function (Schechter 1976) using the formalism described in Dawson et al. (2007). The Schechter function is defined as

Φ(L) dL = Φ^{*} (L/L^{*})^{α} exp(−L/L^{*}) d(L/L^{*}),

where Φ^{*} is a normalization parameter, L^{*} a characteristic luminosity that defines the position of the transition from the power law to the exponential law at high luminosity, and α is the slope of the power law at low luminosity. In logarithmic scale the Schechter function is written as

Φ(log L) = ln(10) Φ^{*} (L/L^{*})^{α+1} exp(−L/L^{*}).

This function represents the numerical density per logarithmic luminosity interval. The fits were done using the Python package Lmfit (Newville et al. 2014), which is specifically dedicated to nonlinear optimization and provides robust estimations of confidence intervals. We define an objective function, accounting for the strong asymmetry of the error bars, which is minimized in the least-squares sense using the default Levenberg-Marquardt method provided by the package. The results of this first minimization are then passed to an MCMC process that uses the same objective function. The uncertainties on the three parameters of the Schechter function (α, L^{*}, Φ^{*}) are recovered from the resulting individual posterior distributions. The minimization in the least-squares sense is an easy way to fit our data but is not guaranteed to give the most probable parameterization of the LFs. A more robust method would be the maximum-likelihood method. However, because of the nonparametric approach used in this work to build the points of the LF, taking into account the specific complexity of the lensing regime, the implementation of a maximum-likelihood approach such as those developed in D17 or in Herenz et al. (2019) could not be envisaged.
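The asymmetric-error objective described above can be sketched as follows. The authors used Lmfit; this stand-alone illustration uses SciPy's `least_squares` instead, and all names and data values are hypothetical (a synthetic LF, not the measured points). The key idea shown is the treatment of asymmetric error bars: the upper error is used when the model lies above the data point, the lower one otherwise.

```python
import numpy as np
from scipy.optimize import least_squares

def log10_schechter(logL, logphi_s, logL_s, alpha):
    """log10 of the Schechter function per dex:
    Phi(logL) = ln(10) * Phi* * (L/L*)^(alpha+1) * exp(-L/L*)."""
    x = 10.0 ** (logL - logL_s)
    return (np.log10(np.log(10.0)) + logphi_s
            + (alpha + 1.0) * (logL - logL_s) - x / np.log(10.0))

def residual(theta, logL, logphi_obs, err_lo, err_up):
    """Objective with asymmetric error bars: weight by the upper error
    when the model lies above the data point, by the lower one otherwise."""
    model = log10_schechter(logL, *theta)
    sigma = np.where(model > logphi_obs, err_up, err_lo)
    return (model - logphi_obs) / sigma

# Synthetic LF points (illustrative values only, not the measured LF)
logL = np.arange(40.5, 43.1, 0.5)
truth = (-3.0, 42.7, -1.75)           # (log Phi*, log L*, alpha)
logphi = log10_schechter(logL, *truth)
elo = eup = np.full_like(logL, 0.1)

fit = least_squares(residual, x0=(-2.5, 42.5, -1.5),
                    args=(logL, logphi, elo, eup))
# fit.x approximates (log Phi*, log L*, alpha)
```

In the actual analysis the least-squares solution is then used to seed an MCMC exploration (Lmfit supports this via its `emcee` method) from which the posterior distributions of the three parameters are drawn.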
Because of the use of lensing clusters, the volume of Universe explored is smaller than in blank field surveys. The direct consequence is that we are not very efficient in probing the transition area around L^{*} and the high luminosity regime of the LF. Instead, the lensing regime is more efficient in selecting faint and low luminosity galaxies and is therefore much more sensitive to the slope parameter. To properly determine the three best parameters, additional data are needed to constrain the bright end of the LFs. To this aim, previous LFs from the literature are used and combined into a single average LF with the same luminosity bin size as the LFs derived in this work. This last point is important to ensure that the fits are not dominated by the literature data points, which are more numerous and have smaller bin sizes and uncertainties. In this way we determine the three Schechter parameters while properly sampling the covariance between them.
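The rebinning of heterogeneous literature points onto the coarser bins of this work can be sketched with a simple inverse-variance average. This is a hypothetical illustration (the function name and weighting choice are assumptions, and the paper does not specify the exact averaging scheme):

```python
import numpy as np

def rebin_literature(logL_lit, phi_lit, err_lit, bin_edges):
    """Average literature LF points into the coarser luminosity bins used
    in this work (inverse-variance weighting), so that the Schechter fit
    is not dominated by the more numerous, finely binned literature points."""
    centers, phi_avg, err_avg = [], [], []
    idx = np.digitize(logL_lit, bin_edges) - 1   # bin index of each point
    for i in range(len(bin_edges) - 1):
        sel = idx == i
        if not np.any(sel):
            continue  # skip empty bins
        w = 1.0 / err_lit[sel] ** 2
        centers.append(0.5 * (bin_edges[i] + bin_edges[i + 1]))
        phi_avg.append(np.sum(w * phi_lit[sel]) / np.sum(w))
        err_avg.append(1.0 / np.sqrt(np.sum(w)))
    return np.array(centers), np.array(phi_avg), np.array(err_avg)
```

The averaged points then enter the fit on the same footing as the points derived in this work.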
The choice of the precise data sets used for the Schechter fits is expected to have a significant impact on the results, including possible systematic effects. To estimate the extent of this effect and its contribution to uncertainties, different series of data sets were used to fit the LF, among those available in a given redshift interval (see Fig. 13). The best-fit parameters recovered are found to be always consistent within their own error bars.
In addition, the error bars do not account for the error introduced by the binning of the data. To further test the robustness of the slope measurement and to recover more realistic error bars, different bins were tested for the construction of the LF. The exact same fit process was applied to the resulting LFs. The confidence regions derived from these tests are shown in Fig. 12 for z_1 and z_3. The bins used hereafter to build the LFs are identified in this figure as black lines. We estimate that these bins are amongst the most reliable possibilities, and in the following they are referred to as the "optimal" bins. They were determined in such a way that each bin is properly sampled in both redshift and luminosity, and has a reasonable level of completeness. Figure 12 shows that α is very stable for z_1 and that all the posterior distributions are very similar. Because we are able to probe luminosity regimes far below L^{*} and benefit from increased statistics, the effect of binning on the measured slope is negligible for z_all. As redshift increases, the effect of binning on the measured slope increases as a consequence of lower statistics and higher uncertainties. For z_2 the LF is affected by a small overdensity of LAEs at z ∼ 4, resulting in a higher dispersion on the faint end slope value when testing different binnings. It was ensured that the optimal binning allowed this fit to be consistent with the fit made for z_all: in both cases the points at 41.5 ≲ log L ≲ 42, affected by the same sources at z ∼ 4, are treated as a small overdensity with respect to the Schechter distribution. Finally, for z_3, the lack of statistics seriously limits the binning options to test. The only viable options are the two presented in the right panel of Fig. 12: in both cases the quality of the fit is poor compared to the other redshift bins, but the measured slopes are consistent within their own error bars.
Fig. 11. Luminosity function points computed for the four redshift bins: 2.9 < z < 6.7 (line fit: α = −1.79^{+0.10}_{−0.09}), 2.9 < z < 4.0 (α = −1.63^{+0.13}_{−0.12}), 4.0 < z < 5.0 (α = −1.61^{+0.08}_{−0.08}), and 5.0 < z < 6.7 (α = −1.76^{+0.40}_{−0.40}). Points not used for the fits are shown as empty symbols. Each LF was fitted with a straight dotted line and the shaded areas are the 68% confidence regions derived from these fits. For the clarity of the plot, the confidence area derived for z_all is not shown and a slight luminosity offset is applied to the LF points for z_1 and z_3.
The LF points from the literature used to constrain the bright end are taken from Blanc et al. (2011) and Sobral et al. (2018) for z_all and z_1; Dawson et al. (2007), Zheng et al. (2013), and Sobral et al. (2018) for z_2; and finally Ouchi et al. (2010), Santos et al. (2016), Konno et al. (2018), and Sobral et al. (2018) for z_3. The goal is to extend our own data towards the highest luminosities using available high-quality data with enough overlap to check the consistency with the present data set. The best fits and the literature data sets used for the fits are shown in Fig. 13 as full lines and lightly coloured diamonds, respectively. The dark red coloured regions indicate the 68% and 95% confidence areas of the Schechter fits. The best Schechter parameters are listed in Table 7. In addition, this table contains the results obtained when the exact same method of LF computation is applied to the sources of A2744 as an independent data set. This is done to assess the robustness of the method and to see whether or not the addition of low volume and high magnification cubes adds significant constraints on the faint end slopes. All four fits made using the complete sample are summarized in Fig. 14, which shows the evolution of the confidence regions for α, Φ^{*}, and L^{*} with redshift. Table 7 shows that the results are very similar for z_1 and z_3 when considering A2744 only or the full sample. For z_all and z_2 the recovered slopes exhibit a small difference at the 2σ level. This difference is caused by a single source with 40.5 ≲ log L ≲ 41, which has a high contribution to the density count. When adding more cubes and sources, the contribution of this LAE is averaged down because of the larger volume and the contribution of other LAEs. This argues in favour of a systematic underestimation of the cosmic variance in this work. Using the results of cosmological simulations to estimate a proper cosmic variance is beyond the scope of this paper.
For the highest redshift bin, even though the same slope is measured when using only the LAEs of A2744, the analysis can only be pushed down to log L = 41 (instead of log L = 40.5 for the other redshift bins or when using the full sample). This shows the benefit of increasing the number of lensing fields to avoid a sudden drop in completeness at high redshift; this effect will be addressed in a future article in preparation. In the following, only the results obtained with the full sample are discussed. The values measured for L^{*} are in good agreement with the literature (e.g. log(L^{*}) = 43.04 ± 0.14 in Dawson et al. (2007) for z ∼ 4.5, log(L^{*}) = 43.25^{+0.09}_{−0.06} in Santos et al. (2016) for z ∼ 5.7 and a fixed value of α = −2.0, and log(L^{*}) = 43.3^{+0.5}_{−0.9} in Hu et al. (2010) for z ∼ 5.7 and a fixed value of α = −2.0), and these values tend to increase with redshift. This is not a surprise, as this parameter is most sensitive to the data points from the literature used to fit the Schechter functions. Given the large degeneracy and therefore large uncertainty affecting the normalization parameter Φ^{*}, a direct comparison with previous studies is difficult and of limited relevance. Regarding the α parameter, the Schechter analysis reveals a steepening of the faint end slope with increasing redshift, which in itself means an increase with redshift in the observed number of low luminosity LAEs with respect to the bright population. However, this is a ∼1σ trend that can only be seen in the light of the Schechter analysis, with a solid anchoring of the bright end, and cannot be seen using only the points derived in this work (see e.g. Fig. 11).
Taking advantage of the unprecedented level of constraints on the low luminosity regime, the present analysis has confirmed a steep faint end slope, varying from α = −1.58^{+0.11}_{−0.11} at 2.9 < z < 4 to α = −1.87^{+0.12}_{−0.12} at 5 < z < 6.7. The result for the lower redshift bin is not consistent with the value α = −2.03^{+1.42}_{−0.07} measured using the maximum-likelihood technique in D17. At higher redshift, the slopes measured in D17 are upper limits, which are consistent with all the values in Table 7. The points in purple in Fig. 13 are the points derived with the V_max method in this same study. A systematic difference can be seen, increasing at lower luminosity, for z_all, z_1, and z_2. This difference, taken at face value, could be evidence for a systematic underestimation of the cosmic variance both in this work and in D17. This aspect clearly requires further investigation in the future. Faint end slope values of α = −2.03^{+0.4}_{−0.3} for z = 5.7 were found in Santos et al. (2016), and α = −2.6^{+0.6}_{−0.4} for z ∼ 5.7 (α = −2.5^{+0.5}_{−0.5} for z ∼ 6.6) in Konno et al. (2018). These values are reasonably consistent with our measurement for z_3. In this case again, the comparison with the literature is quite limited, as the faint end slope is often fixed (see e.g. Dawson et al. 2007; Ouchi et al. 2010) or the luminosity range probed is not adequate, leading to poor constraints on α.
From Fig. 13, we see that the Schechter function provides a relatively good fit for z_all, z_1, and z_2. The overdensity in number counts at z ∼ 4 for 41.5 ≲ log L ≲ 42 is indeed seen as an overdensity with respect to the Schechter distribution. For z_3, however, the fit is not as good, with one point well above the 1σ confidence area. The final goal of this work is not the measurement of the Schechter slope in itself, but to provide a solid constraint on the shape of the faint end of the LF. Furthermore, it is not certain that such a low luminosity population is expected to follow a Schechter distribution. Some studies have already explored the possibility of a turnover in the LF of UV-selected galaxies (e.g. Bouwens et al. 2017; Atek et al. 2018), and the same possibility is not to be excluded for the LAE population. For the specific needs of this work, it remains convenient to adopt a parametric form, as it makes the computation of proper integrations with correct error transfer easier (see Sect. 8) and facilitates the comparison with previous and future works. Regarding integrated LFs, any reasonable deviation from the Schechter form is of little consequence as long as the fit is representative of the data. In other words, as long as no large extrapolation towards low luminosity is made, our Schechter fits provide a good estimation of the integrated values.

Fig. 12. Areas of 68% confidence derived for the Schechter parameters when testing different binnings. Left panel: results for 2.9 < z < 4.0; right panel: results for 5.0 < z < 6.7. The legends indicate, from left to right, log(L)_min, log(L)_max, and the number of bins considered for the fit between these two limits. The black lines show the results obtained from the optimal bins adopted in this work.

Discussion and contribution of LAEs to reionization
In this section, before going to the integration of the LFs and the constraints and implications for reionization, we discuss the uncertainties introduced by the use of lensing. As part of the HFF programme, several good quality mass models were produced and made publicly available by different teams, using different methodologies. The uncertainties introduced by the use of lensing fields when measuring the faint end of the UV LF are discussed in detail in Bouwens et al. (2017) and Atek et al. (2018) through simulations. A more general discussion of the reasons why mass models of the same lensing cluster may differ from one another can be found in Priewe et al. (2017). Finally, a thorough comparison of the mass reconstructions produced by different teams with different methods from simulated lensing clusters and HST images is presented in Meneghetti et al. (2017). The uncertainties are of two types:
- The large uncertainties for high magnification values. This aspect is properly treated in this work through the use of P(µ), which allows any source to have a diluted and very asymmetric contribution to the LF over a large luminosity range. This aspect was already addressed in Sect. 5.
- The possible systematic variation from one mass model to another. This aspect is more complex, as it has an impact on both the individual magnification of sources and on the total volume of the survey.
Figure 15 illustrates the problem of the variation of individual magnifications from one mass model to another, using the V4 models produced by the GLAFIC team (Kawamata et al. 2016; Kawamata 2018), Sharon & Johnson (Johnson et al. 2014), and Keeton, which are publicly available on the HFF website. Since we are restricted to the HFF, this comparison can only be done for the LAEs of A2744. The figure shows the Lyman-alpha luminosity histograms obtained when using, alternatively, the individual magnifications provided by these three additional models.
The bin size is ∆log L = 0.5, which is the bin size used in this work for the LFs at z_1, z_2, and z_3. For log L > 40.5 the highest dispersion is of the order of 15%. This shows that even though there is a dispersion in the magnifications predicted by the four models, the underlying luminosity population remains roughly the same. Regarding the needs of the LF, this is the most important point. Figure 10 of Atek et al. (2018) shows an example of the variation of the volume probed with rest-frame UV magnitude using different mass models for the lensing cluster MACS1149. This evolution is very similar for the models derived by the Sharon and Keeton teams and, in the worst case, implies a factor of ∼2 difference among the models compared in this figure. These important variations are largely caused by the lack of constraints on the mass distribution outside of the multiple-image area: a small difference in the outer slope of the mass density affects the overall mass of the cluster and therefore the total volume probed. However, unlike other lensing fields of the HFF programme, A2744 has an unprecedented number of good lensing constraints at various redshifts thanks to the deep MUSE observations. These constraints were shared between the teams and are included in all the V4 models used for comparison in this work. The four resulting mass models are robust and coherent, at the state of the art of what can be achieved with the current facilities. It has also been shown by Meneghetti et al. (2017), based on simulated cluster mass distributions, that the methodologies employed by the CATS (the CATS model for A2744 is the model presented in Mahler et al. 2018) and GLAFIC teams are among the best at recovering the intrinsic mass distribution of galaxy clusters. To test the possibility of a systematic error on the survey volume, the surface of the source-plane reconstruction of the MUSE FoV is compared at z = 4.5 using the same four models as in Fig. 15.
The surfaces are (1.23′)², (1.08′)², (1.03′)², and (0.94′)² using the mass models of Mahler, GLAFIC, Keeton, and Sharon, respectively. The strongest difference is observed between the models of Mahler and Sharon and corresponds to a relatively small difference of only 25%. Given the complex nature of the MUSE data combined with the lensing cluster analysis, precisely assessing the effect of a possible total volume bias is nontrivial and beyond the scope of this paper. From this discussion it seems clear that the use of lensing fields introduces an additional uncertainty on the faint end slope. However, the luminosity limit below which this effect becomes dominant remains unknown, as all the existing simulations (Bouwens et al. 2017; Atek et al. 2018) were done for the UV LF only, for which the data structure is much simpler.
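The comparison of magnification-corrected luminosity distributions across mass models (as in Fig. 15) can be sketched as follows. This is a hypothetical illustration, not the authors' code: function name, model names, and numerical values are assumptions.

```python
import numpy as np

def luminosity_histograms(flux_obs, four_pi_dl2, mu_by_model, bins):
    """Compare the de-magnified Lyman-alpha luminosity distributions
    obtained from different cluster mass models.

    flux_obs    -- observed (lensed) line fluxes [erg s^-1 cm^-2]
    four_pi_dl2 -- 4 * pi * D_L^2 factors [cm^2] for each source
    mu_by_model -- dict: model name -> array of magnifications
    bins        -- luminosity bin edges in log10 L
    """
    hists = {}
    for name, mu in mu_by_model.items():
        logL = np.log10(flux_obs * four_pi_dl2 / mu)  # de-magnified luminosity
        hists[name], _ = np.histogram(logL, bins=bins)
    return hists
```

Comparing the resulting histograms bin by bin gives the model-to-model dispersion quoted in the text (of the order of 15% for log L > 40.5).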
In order to estimate the contribution of the LAE population to cosmic reionization, its SFRD was computed. From the best parameters derived in the previous section, the integrated luminosity density ρ_Lyα was estimated. The SFRD produced by the LAE population can then be estimated using the prescription of Kennicutt (1998), assuming case B recombination (Osterbrock & Ferland 2006):

SFRD_Lyα [M⊙ yr⁻¹ Mpc⁻³] = ρ_Lyα / (1.1 × 10⁴² erg s⁻¹). (12)

This equation assumes an escape fraction of the Lyman-alpha photons (f_Lyα) of 1 and is therefore a lower limit on the SFRD. Uncertainties on this integration were estimated with MC iterations, by perturbing the best-fit parameters within their rescaled error bars, neglecting the correlations between the parameters. The values obtained for SFRD_Lyα and ρ_Lyα are presented in Table 7 for a lower limit of integration of log(L) = 40.5, which corresponds to the lowest luminosity points used to fit the LFs (i.e. no extrapolation towards lower luminosities). The value log(L) = 44 is used as the upper limit for all integrations. The upper limit has virtually no impact on the final result because the LF drops steeply at high luminosity.
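The integration and MC error propagation described above can be sketched analytically: for α > −2 the luminosity-weighted integral of a Schechter function reduces to upper incomplete gamma functions. This is an illustrative sketch, not the authors' code; the parameter values and error bars are hypothetical, and the conversion constant follows the Kennicutt (1998) calibration combined with case B as in the text.

```python
import numpy as np
from scipy.special import gamma, gammaincc

def rho_lya(logphi_s, logL_s, alpha, logL_min, logL_max=44.0):
    """Analytic integration of L * Phi(L) over [L_min, L_max]:
    rho = Phi* L* [Gamma(alpha+2, x_min) - Gamma(alpha+2, x_max)],
    with x = L / L*; valid for alpha > -2, as is the case here."""
    a = alpha + 2.0
    x_min = 10.0 ** (logL_min - logL_s)
    x_max = 10.0 ** (logL_max - logL_s)
    incomplete = (gammaincc(a, x_min) - gammaincc(a, x_max)) * gamma(a)
    return 10.0 ** logphi_s * 10.0 ** logL_s * incomplete

# MC propagation of (symmetrized) parameter errors, neglecting correlations,
# as described in the text; the parameter values here are illustrative only
rng = np.random.default_rng(0)
alpha_draws = np.clip(rng.normal(-1.75, 0.10, 10000), -1.99, None)  # keep alpha > -2
draws = rho_lya(rng.normal(-3.0, 0.20, 10000),
                rng.normal(42.7, 0.15, 10000),
                alpha_draws, logL_min=40.5)
rho_med, rho_err = np.median(draws), np.std(draws)
sfrd = rho_med / 1.1e42   # Kennicutt (1998) + case B, f_Lya = 1 assumed
```

The insensitivity to the upper limit noted in the text is visible here: the Γ(α+2, x_max) term is exponentially small for log L_max = 44.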
We show in Fig. 16 the results obtained using different lower limits of integration and how they compare to previous studies of both LBG and LAE LFs. The yellow area corresponds to the 1σ and 2σ SFRD needed to fully reionize the universe, as estimated from the cosmic ionizing emissivity derived in Bouwens et al. (2015a). The cosmic emissivity was derived using a clumping factor of 3, and the conversion to UV luminosity density was done assuming log(ξ_ion f_esc) = 24.50, where f_esc is the escape fraction of UV photons and ξ_ion is the Lyman-continuum photon production efficiency. Finally, the conversion to SFRD was done with the following relation: SFRD [M⊙ yr⁻¹ Mpc⁻³] = ρ_UV/(8.0 × 10²⁷) (see Kennicutt 1998; Madau et al. 1998). Because all the slopes are shallower than α = −2 (for α < −2 the integral of the Schechter parameterization diverges), the integrated values increase relatively slowly when decreasing the lower luminosity limit. On the same plot are shown the SFRD values computed from the integration of the LFs derived in Bouwens et al. (2015b), down to their observational limit and down to M_UV = −13, which is thought to be the limit of galaxy formation (e.g. Rees & Ostriker 1977; Mac Low & Ferrara 1999; Dijkstra et al. 2004). From this plot, and with f_Lyα = 1, we see that the observed LAE population alone is not enough to fully reionize the universe at z ∼ 6, even with a large extrapolation of 2 dex down to log L = 38.5. However, a straightforward comparison is dangerous: an escape fraction f_Lyα ∼ 0.5 would be roughly enough to match the cosmic ionizing emissivity needed for reionization at z ∼ 6. Moreover, in this comparison, we implicitly assumed that the LAE population has the same properties (log(f_esc ξ_ion) = 24.5) as the LBG population in Bouwens et al. (2015b).

Fig. 15. Comparative Lyman-alpha luminosity histograms obtained using the magnifications resulting from different mass models. The grey area represents the completeness limit of this work.
A recent study of the typical values of ξ_ion and its scatter for typical star-forming galaxies at z ∼ 2 by Shivaei et al. (2018) has shown that ξ_ion is highly uncertain as a consequence of galaxy-to-galaxy variations in stellar population and UV dust attenuation, while most current estimates at high redshift rely on (too) simple prescriptions from stellar population models. The SFRD obtained from LAEs when no evolution in f_Lyα is introduced remains roughly constant as a function of redshift when no extrapolation is made, and slightly increases with redshift when integrating down to log L = 38.5. Figure 16 shows, in green and blue, the SFRD_Lyα values derived in previous studies of the LAE LF, namely Ouchi et al. (2008; hereafter O08), Cassata et al. (2011; hereafter C11), and D17. In C11, a basic correction for IGM absorption was performed assuming f_Lyα varying from 15% at z = 3 to 50% at z = 6, using a simple radiative transfer prescription from Fan et al. (2006). This correction can easily explain the clear trend of increasing SFRD with redshift and the discrepancy with our points at higher redshift. At lower redshifts, the IGM correction is smaller and the points are in relatively good agreement. The points in O08 result from a full integration of the LFs with a slope fixed at α = −1.5 and are in reasonable agreement for all redshift domains. The two higher redshift points derived in D17 are inconsistent with our measurements. This is not a surprise, as the slopes derived in D17 are systematically steeper and inconsistent with this work.
The use of an IFU (MUSE) in D17, in Herenz et al. (2019; hereafter H19), and in this survey ensures that we better recover the total flux, even though we may still miss the faintest part of the extended Lyman-alpha haloes (see Wisotzki et al. 2016). This is not the case for NB (e.g. O08) or slit-spectroscopy (e.g. Cassata et al. 2011) surveys, in which a systematic loss of flux is possible for spatially extended sources or broad emission lines because of the limited aperture of the slits or the limited spectral width of the NB filters. It is noted in H19 that the 3.2 < z < 4.55 LF estimates of C11 tend to be lower than most literature estimates (including those in H19). One possible explanation would be a systematic loss of flux, which results in a systematic shift of the derived LF towards lower luminosities. Interestingly, when assuming point-like sources to compute the selection function, H19 recovers the results of C11 for this redshift domain very well. It is also interesting to see that, as luminosity decreases, the LF estimates of C11 become more and more consistent with the points and Schechter parameterization derived in this work. For z_3, the C11 LF is even fully consistent with the Schechter parameterization across the entire luminosity domain (see Fig. 13). The following line of thought could explain the concordance of this work with the C11 estimates at lower luminosity and higher redshift: at lower luminosity and higher redshift, a higher fraction of the detected LAEs are point-like sources, making the C11 LFs more consistent with our values; at higher luminosity and lower redshift, more extended LAEs are detected, and a more complex correction is needed to obtain a realistic LF estimate.
The second advantage of using an IFU is linked to the selection of the LAE population. The O08 authors used an NB photometric selection of sources with spectroscopic follow-up to confirm the LAE candidates. This results in an extremely narrow redshift window and is likely to lead to a lower completeness of the sample owing to the two-step selection process. The studies by D17 and H19 adopt the same approach as this work: a blind spectroscopic selection of sources. In addition, as shown in Fig. 4 and stated in Sect. 7 when discussing the differences in slope between A2744 alone and the full sample, the use of highly magnified observations allows for a more complete source selection at increasing redshift. The sample used in the present work could therefore have a higher completeness level than those of previous studies.
Notes to Table 7. The last two columns list the Lyman-alpha luminosity density and the SFRD as a function of redshift, obtained from the integration of the LFs derived in Sect. 7. The errors on the parameters of the Schechter function correspond to the 68% confidence interval. The values of ρ_Lyα are computed using a lower integration limit log(L) = 40.5, which is considered to be the completeness limit of this work. For each redshift bin, the Schechter parameters are measured both from the LFs computed from the entire sample and from the LAEs of A2744 only.

To summarize the above discussion, the observational strategy adopted in this study, combining the use of MUSE and lensing clusters, has allowed us to:
- reach fainter luminosities, providing better constraints on the faint end slope of the LF, while still taking advantage of the previous studies to constrain the bright end;
- recover a greater fraction of the flux of all LAEs;
- cover a large window in redshift and flux;
- reach a higher level of completeness, especially at high redshift.
A steepening of the faint end slope is observed with redshift, which follows what is usually expected. This trend can be explained by a higher proportion of low luminosity LAEs observed at higher redshift, owing to the higher dust content at lower redshift. On the other hand, the density of neutral hydrogen is expected to increase across the 5 < z < 6.7 interval, reducing the escape fraction of Lyman-alpha photons, a trend affecting LAEs in a different way depending on the large-scale structure. While an increase of the SFRD with redshift is observed, the evolution of the observed SFRD_Lyα is also affected by f_Lyα. From the literature, the expected evolution of f_Lyα is an increase with redshift up to z ∼ 6−7, followed by a sudden drop at higher redshift (see e.g. Clément et al. 2012; Pentericci et al. 2014). For z < 6, the increase of f_Lyα is generally explained by the reduced amount of dust at higher redshift.
For z ∼ 6−7 and above, we start to probe the reionization era: owing to the increasing amount of neutral hydrogen and the resonant nature of the Lyman-alpha line, the escape fraction is expected to drop at some point. It has been suggested in Trainor et al. (2015) and Matthee et al. (2016) that the escape fraction decreases with increasing SFRD. This would only increase the significance of the observed trend, as it means that the points with the higher SFRD would receive a larger correction.
Furthermore, the derived LFs and the corresponding SFRD values could be affected by bubbles of ionized hydrogen, especially in the last redshift bin. In our current understanding of the phenomenon, reionization is not a homogeneous process (Becker et al. 2015; Bosman et al. 2018). It could be that the expanding areas of ionized hydrogen develop faster in the vicinity of large structures with a high ionizing flux, leaving other areas of the universe practically untouched. There is increasing observational evidence of this effect (see e.g. Stark et al. 2017).
It was shown in Matthee et al. (2015), using a simple toy model, that an increased amount of neutral hydrogen in the IGM could produce a flattening of the faint end of the LF. The same study also concluded that the clustering of LAEs has a large impact on the individual escape fractions, which makes it difficult to estimate a realistic correction, as the escape fraction should be estimated on a source-to-source basis.
As previously discussed, it is neither certain nor expected that the LAE population alone is enough to reionize the universe at z ∼ 6. However, at face value, the LBG and LAE populations have roughly the same level of contribution to the total SFRD. Depending on the intersection between the two populations, the observed LAEs and LBGs together could produce enough ionizing flux to maintain the ionized state of the universe at z ∼ 6.
This question of the intersection is crucial in the study of the sources of reionization. Several authors have addressed the prevalence of LAEs among LBGs and have shown that the fraction of LAEs increases for low luminosity UV galaxies up to z ∼ 6, whereas the LAE fraction strongly decreases towards z ∼ 7 (see e.g. Stark et al. 2010; Pentericci et al. 2011). The important point, however, is to precisely determine the contribution of the different populations of star-forming galaxies within the same volume, which is a problem that requires the use of 3D/IFU spectroscopy. As a preliminary result, we estimate that ∼20% of the sample presented in this study has no detected counterpart in the deep images of the HFFs. A similar analysis is being conducted on the deepest observations of MUSE in the Hubble Ultra Deep Field (Maseda et al. 2018).

Conclusions
The goal of this study was to set constraints on the sources of cosmic reionization by studying the LAE LF. Taking advantage of the great capabilities of the MUSE instrument and using lensing clusters as a tool to reach lower luminosities, we blindly selected, behind four lensing clusters, a population of 156 spectroscopically identified LAEs with 2.9 < z < 6.7 and magnification-corrected luminosities 39 ≲ log L ≲ 43.
Given the complexity of combining the spectroscopic data cubes of MUSE with gravitational lensing, and taking into account that each source needs an appropriate treatment to properly account for its magnification and representativity, the computation of the LF needed a careful implementation, including some original developments. For these needs, a specific procedure was developed, including the following new methods. First, a precise V_max computation for the sources found behind lensing clusters, based on the creation of 3D masks. This method allows us to precisely map the detectability of a given source in the MUSE spectroscopic cubes. These masks are then used to compute the cosmological volume in the source plane. This method could easily be adapted for use in blank field surveys. Second, a completeness determination based on simulations using the real profile of the sources. Instead of performing a heavy parametric approach based on MC source injection and recovery simulations, which is not ideally suited for lensed galaxies, this method uses the real profile of the sources to estimate their individual completeness. The method is faster, more flexible, and accounts in a better way for the specificities of individual sources, both in the spatial and spectral dimensions.
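Once a 3D detectability mask exists, the V_max step reduces to a weighted sum over the cube. The following is a minimal sketch of that final step only (the construction of the mask itself and the magnification correction are assumed done upstream; the function name and array layout are hypothetical):

```python
import numpy as np

def vmax_from_mask(mask_3d, dvol_per_slice):
    """Vmax of one source from its 3D detectability mask.

    mask_3d        -- boolean cube (n_wave, ny, nx), True where the source
                      would still be detected if placed at that position
    dvol_per_slice -- comoving volume [Mpc^3] of one source-plane pixel in
                      each wavelength slice, so that the volume element
                      varies with redshift along the spectral axis
    """
    # broadcast the per-slice volume element over the spatial axes
    return np.sum(mask_3d * dvol_per_slice[:, None, None])
```

The per-slice volume element would carry the source-plane (de-magnified) pixel area, which is where the lensing configuration enters.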
After applying this procedure to the LAE population, the Lyman-alpha LF has been built for different redshift bins using 152 of the 156 detected LAEs. Four LAEs were removed because their contribution was not trustworthy. Because of the observational strategy, this study provides the most reliable constraints on the shape of the faint end of the LFs to date and, therefore, a more precise measurement of the integrated SFRD associated with the LAE population. The results and conclusions can be summarized as follows:
- The LAE population found behind the four lensing clusters was split into four redshift bins: 2.9 < z < 6.7, 2.9 < z < 4.0, 4.0 < z < 5.0, and 5.0 < z < 6.7. Because of the lensing effect, the volume of universe probed is greatly reduced in comparison to blank field studies. The estimated average volumes of universe probed in the four redshift bins are ∼15 000 Mpc³, ∼5000 Mpc³, ∼4000 Mpc³, and ∼5000 Mpc³, respectively.
- The LAE LF was computed for the four redshift bins. By construction of the sample, the derived LFs efficiently probe the low luminosity regime, and the data from this survey alone provide solid constraints on the shape of the faint end of the observed LAE LFs. No significant evolution in the shape of the LF with redshift is found using these points only.
These results have to be taken with caution given the complex nature of the lensing analysis on the one hand, and the small effective volume probed by the current sample on the other hand. Our results argue towards a possible systematic underestimation of cosmic variance in the present and other similar works.
-A Schechter fit of the LAE LF was performed by combining the LAE LF computed in this analysis with data from previous studies to constrain the bright end. As a result of this study, a steep slope was measured for the faint end, varying with redshift between α = −1.58 ± 0.11 at 2.9 < z < 4.0 and α = −1.87 ± 0.12 at 5.0 < z < 6.7.
-The SFRD Ly α values were obtained as a function of redshift by integrating the corresponding Lyman-alpha LF, and were compared to the levels needed to ionize the universe as determined in Bouwens et al. (2015a). No assumptions were made regarding the escape fraction of the Lyman-alpha photons, and the SFRD Ly α derived in this work corresponds to the observed values. Because of the well-constrained LFs and a better recovery of the total flux, we estimate that the present results are more reliable than those of previous studies. Even though the LAE population undoubtedly contributes a significant fraction of the total SFRD, it remains unclear whether this population alone is enough to ionize the universe at z ∼ 6; the results depend on the actual escape fraction of Lyman-alpha photons.
-The LAEs and the LBGs have a similar level of contribution to the total SFRD of the universe at z ∼ 6. Depending on the intersection between the two populations, the union of the LAE and LBG populations may be enough to reionize the universe at z ∼ 6.
Through this work, we have shown that the capabilities of the MUSE instrument make it an ideal tool to determine the LAE LF. Being an IFU, MUSE allows for a blind survey of LAEs, homogeneous in redshift, with a better recovery of the total flux as compared to classical slit facilities.
The selection function is also better understood as compared to NB imaging.
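The SFRD integration described above can be sketched in a few lines: for a Schechter LF, the luminosity density down to a limit L_min is φ* L* Γ(α+2, L_min/L*), which a luminosity-to-SFR conversion then turns into a star formation rate density. The conversion constant below (SFR ≈ L_Lyα / 1.1e42 erg s^-1, a commonly used case-B value) and the φ*, L* values are illustrative assumptions, not the fitted parameters of this work.

```python
import numpy as np
from scipy.special import gamma, gammaincc

def lyman_alpha_sfrd(phi_star, l_star, alpha, l_min,
                     sfr_per_erg=1.0 / 1.1e42):
    """Integrate L * phi(L) of a Schechter LF down to l_min and
    convert the luminosity density into an SFRD.

    rho_L = phi_star * l_star * Gamma(alpha + 2, l_min / l_star),
    which converges only for alpha > -2.
    """
    a = alpha + 2.0
    if a <= 0:
        raise ValueError("luminosity density diverges for alpha <= -2")
    # gammaincc is the regularized upper incomplete gamma Q(a, x)
    upper_inc_gamma = gammaincc(a, l_min / l_star) * gamma(a)
    rho_l = phi_star * l_star * upper_inc_gamma   # erg s^-1 Mpc^-3
    return rho_l * sfr_per_erg                    # Msun yr^-1 Mpc^-3

# Illustrative numbers only (the slope is of the order measured here):
sfrd = lyman_alpha_sfrd(phi_star=1e-3, l_star=10**42.5,
                        alpha=-1.87, l_min=10**40.5)
```

With a slope this steep, the result is sensitive to l_min, which is why the luminosity limit of the integration matters when quoting the LAE contribution.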
About 20% of the present LAE sample has no identified photometric counterpart, even in the deepest surveys to date, i.e. the HFF. This is an important point to keep in mind, as it is a first element of response regarding the intersection between the LAE and LBG populations; further investigation is needed to better quantify this intersection. The extension of the method presented in this paper to other lensing fields should also make it possible to improve the determination of the Lyman-alpha LF and to make the constraints on the sources of the reionization more robust.

Appendix C: Comparison of the different volume computation methods

In this appendix we compare the results obtained when computing V max with the method adopted in this study to a classical integration based on a unique mask. We present in Fig. C.2 the comparison between the V max values obtained from these two methods. The first (on the y-axis), used in this project, is based on 3D masks following the noise variation through the MUSE cubes. The second (on the x-axis) uses a mask generated from a unique SExtractor segmentation map, replicated across the spectral dimension. An example of such a mask is provided in Fig. C.1; it is mostly efficient at masking the brightest sources and haloes on the image. Comparing this mask to the masks presented in Fig. B.1, we see that they are completely different: whereas the 3D masks adopted in this paper follow the differences in exposure time while encoding the instrumental noise patterns, the simple masks provide a unique pattern for all sources, irrespective of their S/N values. This results in the following effects, as seen in Fig. C.2. First, a unique mask translates into a unique V max value for a large number of sources, as only the lensing effects play a role in the determination of V max; this corresponds to the vertical pattern on the right-hand side of Fig. C.2. Second, with the adaptive mask method, systematically lower V max values are obtained. Third, and more interestingly, for sources in A1689, A2390, and A2667 the differences are less pronounced (or even not significant for some sources) than for the sources in the A2744 mosaic.
To explain the first point, it is important to understand that when using a single mask, the only factor that can influence the V max value is the limiting magnification µ lim (see Sect. 6.1.2). A source with a higher µ lim value ends up with a smaller V max, as the area of the survey with large magnification is smaller. For the brightest sources of the sample, the computed µ lim may fall below the lowest magnification reached in the survey; for those sources, the volume was integrated over the entire survey area. With the 3D mask method, µ lim still plays a role, but it is no longer the only factor affecting the final volume value, and the local noise level is properly taken into account.

Fig. C.2. Comparison of the results of the V max computation using the average mask obtained from a unique SExtractor segmentation map (x-axis) and the 3D masks adopted in this paper, following the evolution of noise through the MUSE cubes (y-axis). See text for details.
To explain the second point and to illustrate the systematic difference between the two methods, we can consider a faint source detected in one of the deepest parts of the A2744 mosaic. When comparing this source to the noise level in the rest of the mosaic, the quadrants with the lower integration time end up completely masked; the contribution of the three other cubes is zero, as they have even less integration time. In that case, only a small portion of the mosaic contributes significantly to V max, which results in a low V max value. In contrast, all sources detected in A1689, A2390, or A2667 could have been detected anywhere in the A2744 mosaic. Because the A2744 FoV accounts for 80% of the total volume, only µ lim affects the final contribution of A2744, and the contribution of the smaller fields is not as significant. This explains the correlation between the two methods for the sources detected in the three shallower fields.
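The µ lim cut discussed above can be sketched with a toy magnification map: the survey area available to a source shrinks as its limiting magnification rises, and a µ lim below the minimum magnification of the field leaves the whole footprint available. The map values and function name below are invented for illustration.

```python
import numpy as np

def usable_area_fraction(mu_map, mu_lim):
    """Fraction of the survey footprint where the magnification is
    high enough (mu >= mu_lim) for the source to be detectable."""
    return float(np.mean(mu_map >= mu_lim))

# Toy magnification map: magnification rises towards the cluster core
mu_map = np.array([[1.5,  2.0,  4.0],
                   [2.0,  6.0, 12.0],
                   [4.0, 12.0, 30.0]])

# A bright source (low mu_lim) keeps the whole field available ...
full = usable_area_fraction(mu_map, mu_lim=1.0)
# ... while a faint source (high mu_lim) is restricted to the core.
core = usable_area_fraction(mu_map, mu_lim=10.0)
```

This is why, with a unique mask, bright sources whose µ lim lies below the field minimum all share the same V max: the cut removes nothing.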

Appendix D: Detailed procedure for volume computation in lensed MUSE cubes
In this appendix, we provide an overview and a brief description of all the steps needed to compute V max. The details are explained in the main text; the goal of this section is to provide a synthetic view of the method. The numbers in the notes below refer to the steps listed in Fig. 6, as follows:
-(0) The NB cubes consist of all the NB images produced by Muselet. All LAEs were detected on these NB images. Details on the NB images are provided in Sect. 3.1.
-(1.1) Background RMS maps produced separately by SExtractor and assembled into an RMS cube. The RMS cubes are noise cubes used to track the spectral evolution of the noise level through the cubes.
-(1.2) Median of the RMS cubes along the spectral axis. One median RMS image is obtained per cube; they are used to mock the 2D SExtractor detection process.
-(1.4) Using a generalized bright-pixel profile (see Fig. A.1) and the median RMS maps, a 2D detection mask is built for each value of the S/N set and for each cube; the method is described in Appendix A.
-(1.5) Redshift values used to sample the evolution of the source-plane projections and magnification maps.
-(1.6) Source-plane projections of the set of 2D masks, combined with the magnification maps for the different redshifts.
-(1.7) For each LAE, the final 3D survey masks are assembled from the set of source-plane projections. The procedure browses the S/N curves (see Fig. 7) and picks the precomputed 2D source-plane projection corresponding to the correct S/N value and the appropriate redshift value. Details can be found in Sects. 6.1.1 and 6.1.2.
-(1.8) Minimal magnification allowing the detection of a given LAE in its parent cube. This first value is computed from the error on the flux detection, which is indicative of the local noise level. See the definition in Eq. (5).
-(1.9) A rescaled limiting magnification (see the definition in Eq. (6)) is computed for each LAE and for the three additional cubes. This is done to account for the differences in both seeing and exposure time. All the details about the limiting magnification are explained in Sect. 6.1.2. For each LAE, the four µ lim values are used to restrict the volume computation to the areas of the source-plane projection with a magnification high enough to allow the detection of this LAE.
-(1.10) Volume of the survey where a given source could have been detected. For one LAE, this volume is computed from the source-plane projected 3D masks, on the pixels with a high enough magnification.
-(2.1) For each LAE, the NB image containing the maximum of its Lyman-alpha emission is selected. The cleanest detection was obtained on this slice of the NB cube.
-(2.2) Filtered map produced with SExtractor. See Appendix A for details.
-(2.3) From the original filtered map produced for each LAE in its parent cube, three additional images are produced, matched to the resolution of the cubes the LAE does not belong to, using convolution or deconvolution.
-(2.4) Individual bright-pixel profiles are retrieved for the four different seeing conditions from the filtered image and the three additional images produced in the previous step. The bright-pixel profiles contain the information related to the spatial profile of the LAEs.
-(2.5) The four generalized bright-pixel profiles are the medians of the individual bright-pixel profiles computed for each seeing condition (see Fig. A.1). These generalized profiles are used to limit the number of masks computed and to simplify the production of the 3D masks.
-(3.1) The noise level in a cube is an average measure of the noise in a given slice of that cube. It is defined in Eq. (3), and an example is provided in Fig. 5.
-(3.2) Combining the definition of the noise levels and the individual bright-pixel profiles, the evolution of the S/N of individual sources through the cubes is computed with Eq. (4) (see Sect. 6.1.1 and Fig. 7).

Notes. F Ly α is the detection flux of the LAE, expressed in 10^-18 units; µ is the flux-weighted magnification of the source, and the error bars correspond to the 68% asymmetric errors computed from P(µ); log L Ly α is the Lyman-alpha luminosity corrected for magnification. No error bars are associated with the luminosity value, as this uncertainty is accounted for during the MC iterations needed to build the LFs. "Comp" is the completeness expressed as a percentage. The V max values given in this table are computed for 2.9 < z < 6.9.
A&A 628, A3 (2019)
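Steps (3.1) and (3.2) can be sketched as follows. The exact forms of Eqs. (3) and (4) are given in the main text; the sketch below only illustrates the idea with a schematic S/N estimate, in which the total flux of a source's N brightest pixels is compared to the slice-averaged noise. The function names and the exact combination rule are assumptions for illustration, not the paper's equations.

```python
import numpy as np

def noise_level(rms_slice):
    """Average noise in one spectral slice of a cube (schematic
    stand-in for the paper's Eq. (3))."""
    return float(np.mean(rms_slice))

def sn_through_cube(bright_pixel_profile, rms_cube):
    """Schematic S/N of a source as a function of spectral slice
    (stand-in for the paper's Eq. (4)): total flux of the N
    brightest pixels over the noise summed in quadrature."""
    n_pix = len(bright_pixel_profile)
    total_flux = float(np.sum(bright_pixel_profile))
    return np.array([total_flux / (noise_level(s) * np.sqrt(n_pix))
                     for s in rms_cube])

# Toy inputs: a 4-pixel bright-pixel profile and a 3-slice RMS cube
profile = np.array([5.0, 3.0, 2.0, 1.0])        # summed flux = 11
rms_cube = np.array([np.full((2, 2), 0.5),      # quiet slice
                     np.full((2, 2), 1.0),
                     np.full((2, 2), 2.0)])     # noisy slice
sn = sn_through_cube(profile, rms_cube)
```

The resulting S/N curve falls in the noisier slices, which is precisely what step (1.7) browses to pick the appropriate precomputed 2D mask at each wavelength.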