An Atlas of MUSE Observations towards Twelve Massive Lensing Clusters

Spectroscopic surveys of massive galaxy clusters reveal the properties of faint background galaxies, thanks to the magnification provided by strong gravitational lensing. We present a systematic analysis of integral-field-spectroscopy observations of 12 massive clusters, conducted with the Multi Unit Spectroscopic Explorer (MUSE). All data were taken under very good seeing conditions (0.6") in effective exposure times between two and 15 hrs per pointing, for a total of 125 hrs. Our observations cover a total solid angle of ~23 arcmin$^2$ in the direction of clusters, many of which were previously studied by the MACS, Frontier Fields, GLASS and CLASH programs. The achieved emission line detection limit at 5$\sigma$ for a point source varies between (0.77--1.5)$\times$10$^{-18}$ erg\,s$^{-1}$\,cm$^{-2}$ at 7000\AA. We present our developed strategy to reduce these observational data, detect sources and determine their redshifts. We construct robust mass models for each cluster to further confirm our redshift measurements using strong-lensing constraints, and identify a total of 311 strongly lensed sources producing 930 multiple images. The final redshift catalogs contain more than 3300 robust redshifts, of which 40\% are for cluster members and $\sim$30\% for lensed Lyman-$\alpha$ emitters. 14\% of all sources are line emitters not seen in the available HST images, even at the depth of the FFs ($\sim29$ AB). We find that the magnification distribution of the lensed sources in the high-magnification regime ($\mu{=}$ 2--25) follows the theoretical expectation of $N(z)\propto\mu^{-2}$. The quality of this dataset, number of lensed sources, and number of strong-lensing constraints enables detailed studies of the physical properties of both the lensing cluster and the background galaxies. The full data products from this work are made available to the community. [abridged]


Introduction
Strong gravitational lensing by massive galaxy clusters leads to the magnification of sources lying behind them, and this amplifi-Article number, page 1 of 47 arXiv:2009.09784v1 [astro-ph.GA] 21 Sep 2020 A&A proofs: manuscript no. ms cation can reach very large factors in the cluster cores (µ∼5-10, Kneib & Natarajan 2011), and even higher for images in the vicinity of the critical lines (Seitz et al. 1998). For this reason, massive clusters are sometimes referred to as Nature's telescopes, as the combined power of large diameter telescopes and gravitational magnifications provide us with the best views of background galaxies in the distant Universe. Since the first spectroscopic confirmation of a giant gravitational arc was reported (Soucail et al. 1988), high resolution images from the Hubble Space Telescope (HST) have significantly contributed to the success of lensing clusters, with the discovery of a high density of multiple images in deep observations (e.g., Kneib et al. 1996;Broadhurst et al. 2005;Jauzac et al. 2015). Indeed, multiply-imaged systems give us the most precise constraints on the mass distribution in the cluster cores, and consequently the magnification factors.
Multi-object spectrographs on 8-10m class telescopes have helped start large spectroscopic campaigns to confirm the lensed nature of very distant galaxies and the identification of multiple images (e.g., Campusano et al. 2001;Bayliss et al. 2011). However these large spectroscopic campaigns were largely limited by the crowding of galaxy clusters in their central regions, which leads to strong contamination between cluster galaxies and background sources as well as an inefficient use of multi-object spectrographs. In addition, the redshift distribution of lensed sources peaks at z > 1.5 in the redshift desert, where only the brightest UV-selected galaxies can be confirmed in the optical ). Because of these limitations, typically only a small number of multiply-imaged systems (typically < 10) have been spectroscopically confirmed in a given cluster with such instruments (e.g., Richard et al. 2010). Several observing campaigns have focused on near-infrared spectroscopy to avoid the redshift desert and complement optical observations (e.g. HST grism Treu et al. 2015, or Keck/MOSFIRE Hoag et al. 2015. The advent of the Multi Unit Spectroscopic Explorer (MUSE, Bacon et al. 2010) on the Very Large Telescope (VLT) has revolutionised the study of strong lensing galaxy clusters. MUSE is a panoramic integral field spectrograph, fully covering a 1 arcmin 2 field of view with spectroscopy in the optical range (475-930nm). Together with its very high throughput (40% end-to-end including telescope and atmosphere), medium resolution (R∼3000) and fine spatial sampling (0 . 2), MUSE is very well-suited for crowded field spectroscopy (Roth et al. 2018), and more specifically in galaxy cluster cores. Its pairing with the Ground Layer Adaptive Optics (AO) system in 2017 (Leibundgut et al. 2017) has improved its observing efficiency even further.
The versatile capabilities of MUSE on galaxy cluster fields were demonstrated almost immediately: in commissionning , science verification (Karman et al. 2015) and regular observations (e.g., Jauzac et al. 2016;Grillo et al. 2016;Caminha et al. 2017a;Lagattuta et al. 2017;Mahler et al. 2018;Lagattuta et al. 2019). Most notably, it has been very successful to follow-up the Frontier Field clusters (hereafter FFs; Lotz et al. 2017), a program initiated by STScI to get deep observations of six massive clusters with the Hubble (HST) and Spitzer space telescopes. But MUSE was also very successful to follow-up clusters with shallower HST images (e.g., Jauzac et al. 2019;Mahler et al. 2019;Rescigno et al. 2020).
This success has pushed several teams to analyse MUSE observations of known massive clusters such as the Cluster Lensing and Supernova Survey with Hubble (CLASH, Postman et al. 2012) program (Rexroth et al. 2017;Caminha et al. 2019a;Jauzac et al. 2020). Indeed, the richness of the MUSE spectroscopic datasets have a strong legacy aspect, for example to cross-match with multiwavelength observations of the same fields (e.g., with ALMA; Laporte et al. 2017;Fujimoto et al. 2020 submitted).
In this paper, we present a full analysis of 12 lensing clusters, totalling more than 125 hours of exposure time with the MUSE instrument. These observations are in majority taken as part of the MUSE Guaranteed Time Observations (GTO) program, but are complemented by additional MUSE datasets publicly available on the same clusters or following a similar target selection. We have benefited from many years of developments in MUSE analysis tools as part of the MUSE GTO programs Inami et al. 2017;Piqueras et al. 2019) to improve the data reduction, source detection and analysis. The results of this analysis are made available in the form of a public data release.
This paper is organized as follows. Section 2 presents the target selection, observations and data reduction for the MUSE and ancillary datasets used in our analysis. Section 3 describes the construction of the spectroscopic catalogs contained in the data release and Sect. 4 describes the mass models we use to estimate the magnification and source properties. We provide an overview of the full spectroscopic catalog and a few science highlights in Sect. 5, and give our conclusions in Sect. 6.
Throughout the paper, we assume a standard Λ-CDM cosmological model with Ω M = 0.3, Ω Λ = 0.7, and H 0 = 70 km s −1 Mpc −1 whenever necessary. At the typical redshift of the lensing clusters (z = 0.4), 1 covers a physical distance of 5.373 kpc. All magnitudes are given in the AB system.

Cluster sample
The observations used in our analysis focus on the cores of massive galaxy clusters from both the MUSE Lensing Cluster GTO program and available archival programs that target similar clusters. Cluster fields in the MUSE GTO program were chosen based on their strong-lensing efficiency at magnifying background sources, in particular Lyman-α Emitters (LAEs) at z > 3 expected to be detected within the MUSE spectral range.
We compile our sample from a master list of massive Xray luminous clusters, mainly from the ROSAT Brightest Cluster Sample (BCS, Ebeling et al. 1998Ebeling et al. , 2000 and the MAssive Cluster Survey (MACS, Ebeling et al. 2001), along with its southern counterpart, SMACS. Valuable follow-up imaging with the HST was performed in particular for MACS and SMACS clusters for HST SNAPshot programs GO-10491, -10875, -11103, -12166, and -12884 (PI Ebeling), allowing the identification of stronglensing features in the form of arcs and arclets in almost every single target (e.g., Ebeling et al. 2007;Repp et al. 2016).
To build the sample, we chose targets based on the following selection criteria: cluster redshift 0.2 < z cl < 0.6, to ensure that the main spectral signatures (K,H absorption lines and 4000 break) of cluster members are located in a low-background region of the MUSE spectrum; a wide range in Right Ascension (R.A.), to allow easier scheduling with respect to the rest of MUSE GTO observations; transit at low airmass (< 1.25) as seen from Cerro Paranal Observatory, corresponding to declinations −60 < Dec < +10 degrees; at least one existing high-resolution broad-band HST image (in either the F606W or F814W filter to ensure overlap with the MUSE spectral range), that shows bright arcs and multiple Article number, page 2 of 47 images. These images are critical to pinpoint the location of sources sufficiently bright in the continuum; a preliminary mass model based on HST images and spectroscopic confirmation of at least one multiply imaged system to roughly estimate the total mass in the cluster core. From this crude map the MUSE observations could be designed to efficiently cover the critical lines and multiple-image region.
For clusters observed in the framework of the MUSE GTO, the choice of the precise center for the pointing, or the mosaic configuration, were determined in such a way that the observed area included as far as possible the tangential critical lines, in order to maximize the number of strongly magnified and multiple images (see also Sect. 3.6).
We present here the results for a set of 12 clusters selected through this process, all of which were analysed in a uniform manner. The combination of the aforementioned criteria makes our selection similar to the one used in other cluster surveys such as CLASH and the FFs. It is therefore not surprising that half of the clusters we selected overlap with these two programs.
The main properties of the 12 selected clusters are summarised in Table 1. Additional clusters observed as part of the GTO program will be analysed following the same procedure as described in this paper, and included in a future data release.

MUSE GTO Survey
The MUSE Lensing Cluster project (PI: Richard) is a multisemester program, which has been running since ESO semester P94 starting in Fall 2014. It targets the central regions of the massive lensing clusters introduced in the previous section, with an effective exposure time of 2 − 10 hrs per pointing. Data are acquired in Wide Field Mode (WFM), using both standard (WFM-NOAO-N) and adaptive-optics (WFM-AO-N) observations -the latter configuration having been available since Fall 2017, following the commissioning of the ground layer Adaptive Optics (AO) correction. Although the AO mode improves the seeing, we note that WFM-AO-N datasets have a gap in the 5800-5980 Å wavelength range due to the AO notch filter. Additionally, while nearly all of the data are acquired in Nominal mode, covering the 4750 -9350 Å wavelength range, one exception is the Bullet cluster, for which observations were taken in Extended mode (WFM-NOAO-E) covering bluer wavelengths (See Sect. 2.5).
The location and orientation of each MUSE pointing has been chosen to maximise the coverage of the multiply-imaged regions ( Fig. 1), and also to guarantee the availability of suitable Tip-Tilt stars for observations taken in WFM-AO-N mode. In the case of FF and CLASH clusters, we adopted a coverage of larger mosaics of multiple contiguous MUSE pointings to increase the legacy value of these datasets, in coordination with non-GTO programs (see below).
The observations were split into blocks of ∼1-1.15 hr execution time. Each observing block consists in either 2×1800 sec., 3×900 sec or 3×1000 sec. exposures. We included a small spatial dithering box (<0 . 3) as well as 90 degrees rotations of the instrument between each exposure. Indeed this strategy has been shown to help reduce the systematics due to the IFU image slicer (Bacon et al. 2015). A summary of all exposures taken is provided in Table 2.
Standard calibrations have been used for this program, including day-time instrument calibrations as well as standard star observations. All exposures taken after the 2nd MUSE commissioning in June 2014 (i.e. all cluster fields presented here except for SMACS2031) include single internal flat-field exposures taken with the instrument as night calibrations. These short (0.35 sec.) exposures are used for an illumination correction and taken every hour, or whenever there is a sudden temperature change in the instrument. These calibrations are important to correct for time and temperature dependence on the flat-field calibration between each slitlet throughout the night. In addition, twilight exposures are taken every few days and are used to produce an on-sky illumination correction between the 24 channels.

MUSE Archival Data
The importance of getting deep MUSE exposures on the FFs has led us to coordinate a joint effort between the GTO program and additional programs towards MACS0416 and Abell 370 (ESO programs 094.A-0525 and 096.A-0710 respectively, PI: Bauer). Overall, the same strategy as for the GTO campaign has been followed for these observations, and were combined with GTO data when overlapping. Similarly, additional exposure time in the northern part of the FF cluster MACS0416 has been obtained as part of ESO program 0100.A-0764 (PI: Vanzella, Vanzella et al. 2020b), and combined with the GTO observations. Finally, we include in our analysis two CLASH clusters observed as part of ESO programs 095.A-0525, 096.A-0105, 097.A-0909 (PI: Kneib) on MACS0329 and RXJ1347, again with very similar science goals and observing strategy to the GTO program. A first analysis on these datasets with only partial exposure time was presented in Caminha et al. (2019a) and we present here a more detailed analysis of the full datasets after homogeneous reduction as for the other fields.

Ancillary HST data
We make use of the available high-resolution WFPC2, ACS/WFC and WFC3-IR images in the optical / near-infrared covering the MUSE observations. Six clusters in our sample are included in the CLASH (Postman et al. 2012) and FF (Lotz et al. 2017) surveys, for which High Level Science Products (HLSP) incorporating all observations taken in 12 and 6 filters respectively have been aligned and combined. We use the HLSP images provided by the Space Telescope at the respective repositories for CLASH 1 and FFs 2 .
For the remaining six clusters, Snapshot and GO programs on HST were obtained as part of the MACS survey (PI: Ebeling) as well as follow-up HST programs (PIs: Bradac, Egami). The details of the HST programs and available bands are given in Table 3, and were used in previous strong-lensing work. We make use of the reduced HST images available in the Hubble Legacy Archive (HLA).
Following the MUSE observations on MACS0940, we have obtained as part of HST program 15696 (PI: Carton) new images with ACS and WFC3-IR at a higher resolution and going much deeper than the previous WFPC2 snapshot in F606W. Three orbits were obtained in F814W (totalling 7526 sec), one orbit in F125W (totalling 2605 sec) and 1.5 orbits in F160W (totalling 3900 sec). We have aligned and drizzled the individual calibrated exposures for each band using the astroDrizzle and TweakReg utilities (Koekemoer et al. 2011).
All HLSP datasets were already calibrated for absolute astrometry. The HST images for the remaining 6 clusters were aligned against one another and with respect to star positions in J2000 selected from the Gaia Data Release 2 (Gaia Collaboration

MUSE Data reduction
For consistency, data from each cluster are reduced using a common pipeline, which processes the raw exposures retrieved from the ESO archive into a fully-calibrated combined datacube. This sequence largely follows the main standard steps described in Weilbacher et al. (2020) as well as the MUSE Data Reduction Pipeline User Manual 3 . However, we make some modifications due to the crowded nature of lensing cluster fields, which contain extended bright objects. We summarise each step below. While the specific version of the Data Reduction Pipeline used on each cluster ranged between v2.4 to v2.7, there are only minor changes between these versions, and these do not affect the quality of the resulting datacubes. (2) 1 exposure stopped after 740 sec.
(3) 1 exposure stopped after 1405 sec. (4) MUSE commissioning run .  Table 3. Details of the HST programs and filters used for the 6 clusters which are not part of the FF or CLASH samples.
The first step in the pipeline provides basic calibrations. Raw calibration exposures are combined and analysed to produce a master bias, master flat and trace table (which locates the edges of the slitlets on the detectors), as well as the wavelength solution and Line Spread Function (LSF) estimate for each observing night. These calibrations are then applied on all the raw science exposures to produce a pixel table propagating the information on each detector pixel without any interpolation. A bad pixel map is used to reject known detector defects, and we make use of the geometry table created once for each observing run to precisely locate the slitlets from the 24 detectors over the MUSE FoV. Twilight exposures and night-time internal flat calibrations (when available) are used for additional illumination correction.
The second step of the data reduction makes use of the muse_scipost module of the data reduction pipeline to process science pixel tables. The calibrations performed in this step include flux calibration using standard star exposures taken at the beginning of the night, telluric correction, auto-calibration (detailed below), sky subtraction, differential atmospheric refraction correction, relative astrometry and radial velocity correction to barycentric velocity. In the case of WFM-AO-N observations with adaptive optics, laser-induced Raman lines are also fit and subtracted.
Each calibrated pixel table is then resampled onto a datacube with associated variance, regularly sampled at a spatial pixel (hereafter spaxel) scale of 0 . 2 and a wavelength step of 1.25 Å, between 4750 (4600 for WFM-NOAO-E) and 9350 Å. The drizzling method from muse_scipost is used in the resampling process, which also performs a rejection of cosmic-rays. A white light image is constructed for each datacube by inverse-variance weighted averaging all pixel values along the spectral axis. Individual white light images are used to measure spatial offsets between individual exposures. This is done either through the use of the muse_exp _align task of the data reduction pipeline, or Article number, page 5 of 47 A&A proofs: manuscript no. ms by locating point sources against an HST image in the F606W or F814W filter, in the case of large MUSE mosaics. The measured offsets are then used to produce fully WCS-aligned datacubes for each exposure.
Once resampled to the same WCS pixel grid, all datacubes are finally combined together outside of the pipeline using the MPDAF 4 (Piqueras et al. 2019) software task CubeList.combine, or its equivalent CubeMosaic.pycombine for large mosaics. This allows to perform an inverse-variance weighted average over a large number of datacubes. A 3-5 σ rejection (depending on the number of exposures) of the input pixels was applied in the average, to remove remaining defects and cosmic rays, totalling ∼5-10% of all pixels in each exposure. We visually inspected each individual white light image and manually rejected very few obvious cases of light contamination, issues in telescope tracking, etc. from the combination. We also masked 4 cases of satellite and asteroid trails by selecting and masking the relevant spaxels across the cubes. We estimated variations in atmospheric transmission by comparing the total flux of bright isolated sources between individual exposures and provided the correction factors as input to CubeList.combine or CubeMosaic.pycombine.
Low level systematics due to flat-field residuals (at the ∼1% level) remain between each slitlet after applying the internal flatfield calibrations and nightime illuminations. They appear prominently as a weaving pattern over the FoV in the white light image (Fig. 2, left panel). In v2.4 (and subsequent versions) of the data reduction pipeline, an optional self-calibration can be performed as part of muse_scipost, which makes use of the overall sky signal within each slitlet to correct for these flat-field systematics. This procedure has been shown to work accurately in MUSE deep fields where very bright continuum sources are scarce . However the very bright extended cluster light haloes in the MUSE lensing fields, in particular in the vicinity of the Brightest Cluster Galaxies (BCGs) strongly bias this measurement when the procedure is applied automatically (Fig. 2), due to the extent of the Intra-Cluster light (ICL). We have the possibility to give as input to the pipeline a sky mask for the selfcalibration procedure, to help the algorithm identify spaxels with a clean background. We constructed this mask in two steps, by thresholding a deep HST continuum image (smoothed by a ∼0 . 6 FWHM gaussian filter) or a MUSE white-light image created from a first combined datacube, and then applying it to produce the self-calibrated exposures. The threshold was determined by inspecting the mask to avoid using a too small fraction of the spaxels in a given slitlet for the background level estimation. In a few very crowded regions we make use of the multiple rotations in the observations to compute the self-calibration corrections only in the most favourable cases, and provide them as a user table (see more details in Weilbacher et al. 2020). When deemed necessary, another iteration on the sky mask was performed on the new combined cube to improve the self-calibration further. An illustration of the resulting combined white light image and sky mask is presented in the middle panel of Fig. 2. In the case of the SMACS J2031 cluster field, which is the only cluster lacking for night-time illumination corrections during the commissioning run, the improvements due to self-calibration are even more significant (see Claeyssens et al. 2019).
The white-light image of the self-calibrated combined datacube shows some imperfections in the sky background, in the form of negative 'holes' in the interstacks between the MUSE slices, in particular within the top channels of the instrument. These can typically be corrected in empty fields by producing a super-flat calibration (Bacon et al. in prep.) combining all exposures in the instrument frame of reference while masking objects. However this method is not suited for crowded cluster fields. Instead, we aligned all the individual white light exposures in the instrument referential frame by inverting all WCS offsets and rotations between exposures, then detected deviant spaxels with strongly negative flux over the average combined image. This produces a small mask containing ∼ 0.6% of the FoV, which we use to mask pixels in the datacubes at all wavelengths prior to combination. This results in the removal of the majority of the imperfections as seen in the white-light image (Fig. 2, right panel).
The combined datacube is finally processed using the Zurich Atmospheric Purge (ZAP, Soto et al. 2016) v2, which performs a subtraction of remaining sky residuals based on a Principal Component Analysis (PCA) of the spectra in the background regions of the datacube. This has been shown to remove most of the systematics in particular towards the longer wavelengths (see e.g., Bacon et al. 2017 in the UDF). We provide as input to ZAP an object mask by inverting the sky mask described above. Although the number of eigenvectors removed is automatically selected by ZAP, we have performed multiple tests adjusting this number to check that the chosen value did not affect the extracted spectra of faint continuum and line emitting sources.

MUSE Data Quality
The reduced datacube is given as a FITS file with two extensions, containing the f λ and associated variance over a regular 3D grid. The spectral resolution of the observations varies from R=2000 to R=4000, with a spectral range between 4750 and 9350 Å. To ensure that cubes are properly Nyquist sampled, we set the wavelength grid to 1.25 Å pixel −1 . The final spatial sampling is 0.2 arcsec pixel −1 in order to properly sample the Point Spread Function (PSF).
We have cross-checked the relative astrometry between the MUSE cubes and the HST images presented in Sect. 2.4, by crossmatching bright sources in the two datasets. We measure a global rms of ∼0 . 12, i.e., about half a MUSE spaxel.
In order to estimate the spatial Point Spread Function, which varies as a function of wavelength, we have followed the MUSE Hubble Ultra Deep Field approach, described by Bacon et al. (2017). From the MUSE datacube we produce pseudo-HST images in the bandpasses matching the HST filters (Table 3) and overlapping with the MUSE spectral range. These images are compared with pseudo-MUSE images created from HST and convolved with a Moffat (1969) PSF at fixed β = 2.5. The FWHM of the Moffat function is optimised by minimising the residuals in Fourier space. This method allows to use all objects in the field to measure the PSF.
The wavelength dependence on the PSF is then adjusted by assuming a linear relation: The slope C of this relation depends on the use of Adaptive Optics, we find values ranging between −1.9 × 10 −5 and −3.6 × 10 −5 arcsec Å −1 , which are similar to the ones from the MUSE Ultra Deep Fields ) and MUSE-Wide (Urrutia et al. 2019). We report for each dataset the FWHM of the PSF at 7000 Å in Table 2. In general the datacubes have been taken in very good seeing conditions, translating into a typical FWHM 7000 value between 0 . 55 and 0 . 65. The only exception is SMACS2031 (FWHM 7000 =0 . 79) which was observed during MUSE commissioning. The same MUSE vs HST comparison allows us to cross-check the photometric accuracy of the MUSE cubes, and we find an agreement within 5-10% between the two datasets.
The variance extension of the cube is estimated during the data reduction and gives an estimate of the noise level which is generally underestimated due to correlation in the noise introduced during the final 3D resampling process (see e.g., Herenz et al. 2017;Urrutia et al. 2019;Weilbacher et al. 2020). To properly assess the noise properties of each cube as a function of wavelength, we make use of objects-free regions (as defined with the sky masks mentioned in Sect.2.5) and measure the variance of the flux level integrated over 1 square apertures. When compared with the direct estimate from the variance extension, the two measurements show a simple constant correction factor between 1.3 and 1.5 on the standard deviation as a function of wavelength (e.g., Fig. 3). We apply this constant correction factor to the noise estimation of the datacube.
The relative noise levels presented in Fig. 3 are very similar in all datacubes (with the exception of the AO filter region which depends on the relative fraction of AO vs NOAO exposure time), and roughly scale with the square root of the effective exposure time. The 5σ surface brightness limits (averaged in a 1 arcsec 2 region) between skylines at 7000 Å is 4.8×10 −19 and 2.5×10 −19 erg s −1 cm −2 Å arcsec −2 for cubes with 2 hrs and 8 hrs exposure time respectively. This translates into a 5σ line flux limit of 1.5×10 −18 and 7.7×10 −19 erg s −1 cm −2 at 7000 Å when averaged over a 5×5 spaxels aperture and 3 wavelength planes (6.25 Å).

Overview
The sheer number of volume pixels (voxels) in a 1 arcmin 2 MUSE datacube (∼ 4 × 10 8 ) makes it challenging to locate and identify all the extragalactic sources; this is especially true when observing crowded fields like galaxy clusters. Prior knowledge of source locations from HST images provides crucial external information to help distinguish and deblend overlapping sources. Our goal is to extract a spectrum and estimate the redshift for all sources down to a limiting signal-to-noise ratio (described in subsequent subsections), either in the stellar continuum or via emission lines. Additional prior information on the redshift of background sources can be obtained for strongly-lensed multiple images, where well-constrained models allow. We have therefore  developed a common process to analyse the MUSE datacubes incorporating all these elements. The general concept of this procedure has been described previously in Mahler et al. (2018) and Lagattuta et al. (2019), but we have further refined and partially automated the methods for the subsequent cluster observations presented here, in order to create more uniform and robust catalogs. We summarise each step of the process in the flowchart shown in Fig. 4, and provide details on each step in the following subsections.

HST prior sources
We make use of all the available high-resolution HST imaging in each field (see Sect. 2.4) to produce input photometric catalogs of continuum sources that overlap with the MUSE FoV to identify regions for subsequent spectral extraction. Since our goal is to measure the spectra of typically faint compact background sources, usually embedded within the bright extended haloes of cluster members, we first produce a median-subtracted version of each image by subtracting a 1 . 5×1 . 5 running median filter box.
Article number, page 7 of 47 A&A proofs: manuscript no. ms The WCS-aligned, median-subtracted HST images are cropped to the common MUSE area and further combined into an inversevariance-weighted detection image, using SWarp (Bertin et al. 2002).
This procedure was used in previous works searching for faint high-redshift dropouts (Atek et al. 2015(Atek et al. , 2018 in deep Hubble FF images, and significantly improves the detection of background sources while still keeping the central cores of all bright cluster members. The negative impact of the median subtraction is a reduction of the isophotal area for bright extended sources in the segmentation map, which reduces the total flux in their extracted MUSE spectra. However, as these sources all have high signalto-noise ratio in the MUSE datacube, this has no impact on the spectroscopic redshift measurements, and we adjusted the size of the median-box filter in order to mitigate this effect. The detection image is given as input to the SExtractor software (Bertin & Arnouts 1996) used in dual mode to measure the photometry over each HST band (aligning each band image with Montage) 5 . We make use of the HST weight maps to account for variation of depth over the FoV, and selected typical SExtractor parameter values detect_minarea=10-15 pixels and detect_thresh=1.2-1.5. These values were optimized for each cluster to account for variations in depth and pixel scale of the HST images.
The photometry performed by SExtractor in each band is merged together into a photometric catalog of HST sources, dubbed prior sources through the rest of the paper. SExtractor also produces a segmentation map where each spaxel in the MUSE FoV is flagged with a source ID number.
During the detection process, SExtractor tends to split large clumpy sources such as foreground disk galaxies or very elongated arcs into multiple entries in the catalog. Such cases are visually identified as part of the source inspection and redshift estimation process (see Sect. 3.5), and the corresponding photometry and segmentation map are merged together to produce a single prior source located at the HST light barycenter.

Line-emitting sources
Independent from the HST continuum sources, we produced a catalog of line-emitter candidates identified directly from the MUSE datacubes. This was performed by running the muselet software, which is part of the MPDAF python package (Piqueras et al. 2019). muselet first creates a 'pseudo-narrow-band' MUSE datacube by replacing each wavelength plane of the cube with a continuum-subtracted narrow-band image. To generate the narrow-band emission for a given wavelength, we take a weighted average of five planes of the cube (the original wavelength plane and its four nearest neighbours), using a fixed FWHM Gaussian weight function corresponding to 150 km s −1 for a line centered at 7000Å. This weighting scheme improves the signal-to-noise ratio of the narrow-band at the center of a typical emission line of similar FWHM. The subtracted continuum is estimated by an inverse-variance weighted average of two 25 Å wide wavelength windows located bluewards and redwards of the narrow-band. We have found that these parameters are generally appropriate for an automatic subtraction of the local continuum.
The full set of narrow-band images is run through SExtractor to detect candidate spectral lines at each wavelength. Because we expect some galaxies to show diffuse extended emission, especially Lyman-α emitters, we tune our SExtractor detection parameters (detect_minarea=12 pixels; detect_thresh=1.2), and spatial smoothing filter (a 5×5 pixel tophat with FWHM=0 . 8) to better identify these objects. By construction, all bright lines will be detected in adjacent wavelength planes, and therefore they are automatically merged into a master list of individual line candidates. We specifically masked three 4Å-wide wavelength regions in the cube corresponding to bright sky lines at 5577 and 6300 Å as well as the strongest Raman line at 6825 Å, to prevent artefacts in the detections. Sets of lines spatially coincident (within 1 seeing disk) are then grouped together into muselet sources and a segmentation map is produced. A first redshift estimate is also provided by muselet based on an automatic match with a list of bright emission lines.
The MPDAF v2.0 of the muselet software was used for a first set of clusters (Abell 2744, Abell 370, MACS1206, MACS0416S). MPDAF v3.1 introduces improvements in the merging and weighting schemes of the muselet sources, and was used for the rest of the clusters.

Spectral extraction
In order to characterise the detected prior and muselet sources, we adopt the following procedure to extract 1D spectra, which optimises the signal-to-noise of the detections while being adapted to crowded fields. A weight image is created for each source by taking the flux distribution over the segmentation maps produced as part of the detection process. In the case of prior sources this flux distribution is taken from the combination of all HST filters, and the 2D weight map is convolved by the MUSE PSF model described in Sect. 2.6. To account for the wavelength variation of the PSF (Eq. 1), weight images are created for each source at 10 wavelength planes (i.e. at ∼ 500 Å intervals) over the MUSE spectral range and interpolated into a 3D weight cube. In the case of muselet sources, the segmentation map used is the one of the brightest emission line of the object. Since we do not account for wavelength variations in the intrinsic flux profiles of galaxies (e.g., colour gradients), the weighted extractions, while optimal for the distant compact sources, are not spectrophotometrically accurate for the extended cluster members. For this reason we also extract unweighted spectra.
The weight extraction is computed following the optimal extraction algorithm from Horne (1986): where C x,y,λ and V x,y,λ respectively correspond to the pixel flux and associated variance at a specific location in the cube, and W x,y,λ is the optional weight cube.
In addition, we estimate a local background spectrum around each source, produced by averaging the spectra from MUSE spaxels outside of all detected sources in the field. This local background estimate removes large-scale contamination from bright sources, such as stars and cluster members, as well as potential systematics in the background level remaining from the data reduction. We define spaxels containing sky/intra-cluster light by convolving the combined HST 2D weight map with the central wavelength MUSE PSF. From this, the darkest 50% of spaxels are considered as background-spaxels. However, for each object extracted we compute the local background, by aggregating the nearest ∼500 background-spaxels surrounding it (with a minimum Manhattan distance of 0 . 4 from the object). To retain locality, we do not aggregate beyond 4 . 0, even if we have not Article number, page 8 of 47 reached the desired ∼500 spaxel target. This ensures a good tradeoff between aggregating sufficient sky spaxels to achieve a high signal-to-noise estimate of the background, without aggregating so many spaxels that they are no longer locally relevant.
Therefore, for every object we compute four sets of spectra: weighted (with and without) local background subtraction and unweighted (with and without) local background subtraction. In general weighted spectra with local background subtraction are those most optimal for source identification and redshift measurement. But for the bright very extended cluster members, the unweighted spectra without local background subtraction would provide superior spectrophotometric accuracy.

Source inspection
To determine the redshift of extracted sources, we adopt a semiautomatic approach similar to the one used to produce the redshift catalog in the MUSE observations over the Hubble Ultra Deep Field (Inami et al. 2017). We make use of the Marz software (Hinton 2016) which looks for the best redshift solutions by performing a cross-correlation with a set of pre-defined spectral templates. The templates used in our analysis are described in detail in Appendix B of Inami et al. (2017). They include a variety of stellar spectra from Baldry et al. (2014) and deep MUSE spectra from blank fields (Bacon et al. 2015. All muselet sources and every prior source down to a limiting (continuum-level) SNR have been inspected individually by at least 3 coauthors. Based on our first assessments, we adjust the prior magnitude limit for each cluster as a function of MUSE exposure time, given by F814W AB =25+1.25 log(T/2.0), with T the maximum exposure time over the field in hours. This allows us to avoid reviewing sources with too low signal, and corresponds to a typical median SNR of ∼1.5 per pixel over the MUSE spectral range. Any prior source in our final catalog below this limit is therefore measured from a corresponding muselet detection.
To inspect each source in detail, we use a graphical interface tool designed within the MUSE collaboration (Bacon et al. in prep.) that lets a user simultaneously view all HST and MUSE white light images of the object, the extracted spectrum (and its variations) compared to features of the five best Marz redshift solutions, and zoom-ins on specific spectral lines and narrowband images. The tool allows us to navigate quickly between all neighbouring sources, to check for contaminations or object blending. This tool has been adapted to the needs of the lensing cluster fields.
More specifically, the following information is inspected for each source: • Spectroscopic redshift and confidence. All Marz solutions are assessed, and the user can also choose a different value manually. When a redshift solution is found, a confidence level (zcon f ) is selected according to its level of reliability: -Confidence 1: redshift based on a single low-SNR or ambiguous emission line, or several very low SNR absorption features. These sources are given in the public catalog for comparison with other measurements, but are not used in the subsequent analysis. -Confidence 2: redshift based on a single emission line without additional information, several low SNR absorption features, or whose redshift confidence is increased by the identification of multiply-imaged systems (see Sect. 3.6). -Confidence 3: redshift based on multiple spectral features, or with additional information on a high SNR emission line (e.g., very clear asymmetry of the line, HST nondetection blueward of the line). • Association between prior and muselet sources. The two detection methods will most of the time identify the same sources, and we keep the prior sources by default when a clear match is found with muselet emission lines. Note that significant spatial offsets can be found between the two sources, for example in the case of star-forming/photoionised gas trailing behind infalling galaxies in the cluster, or Lyman-α emission which is physically offset from the underlying UV continuum emission. • Merging of sources. Clumpy star-forming galaxies or very elongated lensed arcs tend to be separated into multiple prior sources. We identify them and perform a new extraction of the merged source as in Sect. 3.4 by aggregating their segmentation maps and combining their photometry. • Defects and artefacts. We identify stellar spikes or cosmic ray residuals in the prior catalog, as well as artefacts found by muselet in the cube due to strongly varying spectral continuum (typically low-mass stars or cluster members).
The resulting inspection information is stored in a database, and since each spectrum was reviewed independently by at least 3 users, the results are cross-matched during a consolidation phase, where a consensus is reached in case of disagreement between the users, and the entry is written in the catalog.

Assessment of multiple images
All sources with a reliable redshift in the catalog are tested with the corresponding Lenstool mass model of the cluster (see Sect. 4) to predict multiple images. As the precision of the lens model is typically 0 . 5 − −0 . 9 over the area of multiple images, and the density of objects in a given redshift plane is low, we easily find good matches for systems of images at the same redshift which are predicted to arise from the same source. We manually inspect the corresponding spectra of each matching candidate, as we expect precisely the same redshift and spectral features from multiple images. This assessment serves two purposes: (a) identifying multiple images in order to pinpoint truly independent sources, and (b) cross-checking the redshift measurements for background lensed sources.
Indeed, the main spectral features identified for background galaxies through the redshift inspection are typically bright emission lines ([O ii], C iii], C iv, Lyman-α) or absorption features (rest-frame UV ISM absorptions or Lyman-α break). Ambiguous redshifts are typically limited to an uncertainty between [O ii] and Lyman-α as a single emission line, which produce very different lensing configurations between z ∼ 0.5-1.5 and z ∼ 3-7. The multiple-image assessment allows us to correct a very small number of erroneous redshifts (typically less than 3 images per cluster), and more importantly to boost the redshift confidence level to zcon f = 2 for faint line emitters with zcon f = 1 showing a clear matching counterimage predicted by the model.
As new multiple images help us refine and improve the lensing model, this is an iterative process: we start by using only spectroscopically confirmed images from the most secure systems (having confidence levels 2 or 3) as constraints in the model, and make further predictions from the refined model. We systematically search for spectral features in all predicted counter-images, even when they are not included in the original catalog, solving all possible inconsistencies. This process allows us to identify a small number (up to 5-10 per cluster) of images as faint emission lines in the narrow-band datacube, which were not identified originally by muselet. We include these images as additional entries Article number, page 9 of 47 A&A proofs: manuscript no. ms in the catalog by manually forcing their extraction, adopting a point source extraction aperture in the procedure described in Sect. 3.4. A typical example of faint counter image discovered during the assessment is presented in Fig. 5.
Remaining strongly-lensed sources not matched to a given system are zcon f = 1 objects for which we are able to explain the lack of counterimage by either (a) its much lower magnification, (b) its strong contamination by a cluster member or foreground source, or (c) it being outside the MUSE FoV. Conversely, when the counterimage is predicted to be bright and isolated, we remove the zcon f = 1 entry from the catalog.
Multiple images are identified in a dedicated column (MUL) of the final spectroscopic catalog: we adopt the usual notation X.Y where X identifies a system of multiple images and Y is a running number identifying all images for a given system.

Cross-checks with public spectroscopic catalogs
We performed a comparison of our spectroscopic catalogues with available spectroscopy from VLT and HST in the literature, including some of the same MUSE observations included in our sample.

Abell 2744 and Abell 370
First versions of the MUSE spectroscopic catalogs, using the same datasets but based on an earlier version of our data reduction and analysis process, were presented in Mahler et al. (2018) for Abell 2744 and in Lagattuta et al. (2019) for Abell 370. Both catalogs were cross-checked with available observations from the GLASS (Schmidt et al. 2014;Treu et al. 2015). The datacubes included in these publications were reduced and analysed using an earlier version of the reduction pipeline and MPDAF, which did not include autocalibration or the interstack masks, and which used older versions of ZAP and MUSELET. In light of the newly processed datacube and source inspection catalogs, we reviewed all of the low-confidence sources (zcon f = 1 or 2) from the public catalogs and rejected 18 sources. We corrected the redshifts for 9 sources, typically faint [O ii] and Lyman-α emitters or sources in the redshift desert.

SMACS2031
An early catalog from the MUSE commissioning data taken on this cluster was presented in Richard et al. (2015) together with a cluster mass model. In view of the significant improvements in the data reduction since these observations were taken (see also Claeyssens et al. 2019) we fully re-reduced and analysed the MUSE observations following our common process. We identify three additional multiply-imaged systems as faint Lyman-α emitters (forming a total of nine images), and also find two counterimages for existing systems. One published redshift has been corrected.

MACS0416
A first analysis of the MUSE datasets in MACS0416 was presented by Caminha et al. (2017a) together with the CLASH/VLT redshifts from Balestra et al. (2016). The data they analysed includes the full MUSE coverage in the southern part (MACS0416S, program ID 094.A-0525) but only the GTO coverage in the northern part (MACS0416N, program ID 094.A-0115). In the southern part, we find small redshift differences for five sources in common and at low confidence (zcon f = 1). One confirmed Lyman-α emitter (CLASHVLTJ041608.03-240528.1 at z = 4.848) was not detected with muselet in our catalog but we confirmed it from the narrow-band cube at low confidence. In the same region, we identify nine new multiply-imaged systems producing 22 images in the MUSE data, which were not used in previous published models.
In the northern part, the MUSE data used in our analysis is much deeper and we find no discrepancy for the sources in common with (Caminha et al. 2017a). Vanzella et al. (2020b) present one source at z = 6.63 from the deep observations, which was also identified in our muselet catalog. The deeper data provide us with additional redshifts (not used in previous published models) for 15 systems and 33 multiple images in that region, including 11 systems newly discovered with MUSE as faint Lyman-α emitters 6 .
Finally, as in Sect. 3.7.1, we cross-check the agreement between our redshift measurements and the GLASS catalog from (Hoag et al. 2016) for the sources in common, and find no redshift discrepancies.

MACS1206
Caminha et al. (2017b) presented an analysis of the same MUSE data taken in the field of the cluster MACS1206 as in our sample. Their full spectroscopic catalog is not publicly available but we have cross-checked the redshifts and multiply-imaged systems with our own analysis. We recover all of the systems at the same spectroscopic redshifts as used in their strong-lensing model. In addition, we identify eight new systems (named 29 to 36) of two to four images each (total 21 images) in our analysis of the datacubes. These 21 images are in their large majority Lymanα emitters, generally diffuse and extended, that were found by muselet. The redshifts were further confirmed by predictions from the well-constrained lens model (rms model = 0 . 52).

RXJ1347
Caminha et al. (2019a) presented an analysis of early observations taken in the RXJ1347, restricted to the upper-left quadrant of the mosaic taken in WFM-NOAO-N. We compare our spectroscopic redshifts with the public catalog available on VizieR, as well as the constraints presented in their lens model, based on 8 multiply-imaged systems including 4 with spectroscopic confirmation (totalling 9 images). All the published sources are in common with our spectroscopic catalog and there are no redshift discrepancies. The full dataset on this cluster gives us a significant increase in the number of spectroscopic redshifts and multiple systems: we identify 25 additional systems with a spectropic redshift, producing a total of 119 images. We compared our sample with the public redshift catalog from GLASS on this cluster and did not identify any discrepancies between common redshifts. We boosted the confidence level for 2 MUSE sources with matching lines with GLASS observations. 3.7.6. MACS0329 Caminha et al. (2019a) presented an analysis of the same MUSE data taken in the field of the cluster MACS0329. We compare our spectroscopic redshifts with the public catalog available on VizieR, and identify one source at z = 2.919 with diffuse Lymanα emission which we did not detect in our muselet catalog. We confirm from our lensing predictions that it is indeed a multiply imaged candidate associated with the halo of our muselet source 129 showing the same spectral line. We treat these images as a multiply-imaged candidate, but their precise location is too uncertain to be included as multiple images in our lensing model.
In addition, we identify two clear pairs of multiply imaged Lyman-α emitters which we include as strong lensing constraints (our systems 4 and 7) but which were not used in the strong lensing model from Caminha et al. (2019a). In both cases, one image from the system was in common between the two spectroscopic catalogs. We also identify 11 low confidence (zcon f = 1) Lyman-α emitters not present in their catalog, which either do not predict or for which we cannot rule out the presence of a counterimages, as well as a high confidence (zcon f = 3) [O ii] emitter at z = 0.963.

Spectral analysis
Due to their different physical origins, the spectral features identified in each source as part of the cross-correlation (e.g., K,H Ca lines in cluster members, nebular emission lines, ISM absorption lines, Lyman-α emission) or through manual redshift measurement do not provide uniform constraints on the accuracy and precision of the redshift estimates. In addition, the crosscorrelation carried out by Marz can be systematically off, and the results do not provide us with a reliable error on the redshift measurement.
Therefore, to properly distinguish between the various redshift estimates and get a better estimate on the redshift error, we make use of the pyplatefit tool developed for the MUSE deep fields (Bacon et al. in prep.) on the weighted, sky-subtracted 1D spectra. pyplatefit is a simplified python version of the original Platefit IDL routines developed by Tremonti et al. (2004) and Brinchmann et al. (2004) for the SDSS project. It performs a global continuum fitting of the spectrum, and then fits individual emission and absorption lines using a Gaussian profile (or asymmetric Gaussian for the Lyman-α line). Multiple redshift estimates are measured for each family of spectral features (nebular emission lines, Balmer absorption lines, Lyman-α, ..), allowing for small velocity offsets (within 150 km/s, but up to 500 km/s for Lyman-α emission).
The line profile fitting is performed using the lmfit python module. The best values and errors on the redshift and spectral line parameters (flux, signal-to-noise ratio, equivalent width) are computed using a bootstrap technique. Two examples of this spectral line fitting are presented in Fig. 6.

Full redshift catalog
We provide a summary of all spectroscopic measurements for each cluster, in the form of two tables available online (examples for the first entries are presented in Tables A.1 and A.2). A main table gives the relevant information for each source: location, redshift and confidence, as well as photometric measurements, magnification and multiple image identification. A companion table lists the measurements obtained by the pyplatefit spectral fits for each source. Figure 7 presents the spectroscopic catalogs overlaid on the color HST images of each cluster, showing the spatial distribution of each redshift category with respect to the MUSE Fields as well as the region where we expect the multiple images. Table 4 summarises the redshift measurements both for all sources and for the high-confidence (zcon f > 1) sources. There is a clear trend for clusters having shallower MUSE data (e.g., MACS0329, Bullet) to have a lower fraction of secure redshifts (zcon f > 1). We present a redshift histogram for all sources in Fig. 8, which is discussed further in Sect. 5. 14% of all redshift measurements from the MUSE datacubes (and 12% of high confidence detections, with zcon f > 1) are sources purely de-Article number, page 11 of 47 A&A proofs: manuscript no. ms tected from their line emission with muselet, i.e., they cannot be securely associated with an HST source from the photometric catalogs. While the number of such line-only sources in the final catalog strongly depends on the depth of the HST images (which varies from 26-29 AB for a point source depending on the field / filter), we have identified a few such sources even at the depth of the HST FF images (∼ 29 AB). They correspond to high equivalent width Lyman-α emitters, comparable to sources discovered in the MUSE deep fields Hashimoto et al. 2017;Maseda et al. 2018). Low-redshift line-only sources are typically associated stripped gas and jellyfish galaxies in the clusters, or high equivalent width emission lines from compact galaxies.  Cluster Nz z min -z max z < z min z min < z < z max z max < z <  Table 4. Summary of all spectroscopic measurements. For each cluster we provide the total number of redshifts measured as well as in separate redshift bins: in the foreground, in the cluster, and in the background. The cluster redshift limits are defined from the velocity distributions and given

Mass models
In order to interpret the measured spectrophotometric properties of lensed background sources in a more physical way, we must first account for (and correct) both the magnification factors and the general shape distortions that massive cluster cores impart to the observations. To do this, we generate parametric models of each cluster's total mass distribution, using the numerous multiple images identified in the MUSE catalogs as constraints. In this section, we describe the procedure we use for this analysis, which is similar to the process employed by the CATS (Clusters As Telescopes) team to generate mass models of the FF clusters (e.g., Richard et al. 2014;Jauzac et al. 2014Jauzac et al. , 2015Jauzac et al. , 2016Lagattuta et al. 2017;Mahler et al. 2018;Lagattuta et al. 2019) and makes use of the latest version (v7.1) of the public Lenstool software 7 (Kneib et al. 1996;Jullo et al. 2007).
To reduce uncertainty, we use the spatial positions of only high-confidence multiple-image systems as constraints for a given model. Specifically, we include systems that are either confirmed spectroscopically (having at least one image with a spectroscopic redshift; called gold systems), or bright HST sources without a spectroscopic redshift, but with an obvious, unambiguous lensing configuration, similar to the ones reported as silver systems in previous works. Here, gold and silver are based on the notation used in previous FF studies (e.g., Lotz et al. 2017). We note that the vast majority of known systems will have new spectroscopic redshifts thanks to MUSE, so this limitation does not strongly bias the final model. In fact, this conservative approach allows us to reach a sufficient precision such that we can assess additional multiple images in the spectroscopic catalogs, as discussed in Sect. 3.6. We also make use of the preliminary mass models and known HST arcs from references listed in Table 1 as a starting point for our strong-lensing analysis.

Model parametrisation
Lenstool uses a Monte Carlo Markov Chain (MCMC) to sample the posterior probability distribution of the model, expressed as a function of the likelihood defined in Jullo et al. (2007). In practice, we minimize the distances in the image plane: ( 3) with θ (i, j) obs and θ (i, j) pred respectively representing the observed and predicted vector positions of multiple image j in system i. σ pos is a global error on the position of all multiple images, which we fix at 0 . 5. This value corresponds to the typical uncertainty in reproducing the strongly-lensed images, which is affected by the presence of mass substructures within the cluster or along the line of sight (Jullo et al. 2010). The best model found (which minimises the χ 2 value) also minimises the global rms between θ (i, j) obs and θ (i, j) pred (herafter rms model ).
In order to estimate all the values of θ (i, j) pred , Lenstool inverts the lens equation with a parametric potential which we assume to be a combination of double Pseudo Isothermal Elliptical mass profiles (dPIE, Elíasdóttir et al. 2007;Suyu & Halkola 2010). dPIE potentials are isothermal profiles which are characterised by a central velocity dispersion σ 0 and include a core radius r core (producing a flattening of the mass distribution at the centre), and a cut radius r cut (producing a drop-off of the mass distribution on large scales). Together with the central position (x c ,y c ) and elliptical shape (ellipticity e, angle θ), dPIE potentials are fullydefined by 7 parameters.
For each cluster, we include a small number of cluster-scale dPIE profiles (up to three) to account for the smooth large-scale components of the mass distribution. Additionally, we assign individual dPIE potentials to each cluster member, which represent the galaxy-scale substructure. We fix the cut radius of clusterscale mass components to 1 Mpc (e.g. Limousin et al. 2007), as it is generally unconstrained by strong lensing, but allow the other parameters to vary freely in the fit.
Cluster members used as galaxy-scale potentials are selected through the red sequence of elliptical galaxies, which is wellidentified in a color-magnitude diagram based on the HST photometry. More details on this selection are provided in, e.g., Richard et al. (2014) for the FF clusters. For the newest lens models and most recent HST observations, we make use of the F606W-F814W vs F160W color-magnitude diagram to select cluster members down to 0.01 L * , where L * is the characteristic luminosity at the cluster redshift based on the luminosity function from Lin et al. (2006). As a further refinement to this process, we cross-check the selected members with existing redshifts in our MUSE spectroscopic catalogue.
To reduce the number of free parameters in the model, we assume that the shape parameters (x c , y c , e, θ) of the galaxy-scale potentials follow the shape of their light distribution, and use the following scaling relations on the dPIE parameters with respect to their luminosity L: which follow from the Faber & Jackson (1976) relation of elliptical galaxies and the assumption of a constant mass-to-light ratio. The fixed value for r core is negligible and follows from the discussion in Limousin et al. (2007). The mass models are constructed through an iterative process, using a single cluster-scale dPIE profile at first to fit the most confident sets of multiply imaged systems. Other systems are tested and included as additional constraints, and a second or third cluster-scale dPIE profile is added when it has a significant effect in reducing rms model . While most galaxy-scale mass components follow the general scaling relations presented in Equation 4, we do fit the mass parameters of some individual galaxies separately, in special cases where we can constrain these values using additional information outside of the scaling relation. This is particularly the case for (a) BCGs, which are known not to follow the usual scaling relations of elliptical galaxies, (b) cluster members lying very close (<∼ 5 ) to multiple images, which locally influence their location (and sometimes produce galaxygalaxy lens systems), and (c) additional galaxies located slightly in the foreground or background of the cluster, which produce a local lensing effect on their surroundings but will not follow the same scaling relations.
Finally, for the most complex clusters (A370, MACS1206, RXJ1347) we have tested the addition of an external shear potential. Typically used when modelling galaxy-galaxy lensed systems (e.g., Desprez et al. 2018), this potential accounts for an unknown effect of the large scale mass environment surrounding the cluster, Article number, page 18 of 47 in the form of a constant shear γ ext at an angle Φ ext . However it does not bring any additional mass to the model (see also the discussions in Mahler et al. 2018 andLagattuta et al. 2019).

Results of the strong lensing analysis
In Table 5, we summarise the number of strong-lensing constraints (systems and images) which were confirmed by our strong lensing analysis for each cluster, along with the rms from the best fit models. The complete list of multiple images and the best-fit parameters of the mass models are detailed in Appendix B.
Overall, these models are among the most constrained (in terms of number of independent multiply imaged systems) stronglensing cluster cores to date. We stress that this occurs not only for clusters with deep HST imaging, such as those available in the FFs, but even clusters in our sample observed with snapshot HST images are densely constrained; for example, MACS0257 has 25 systems with spectroscopic redshifts, producing 81 total multiple images over a single 1 arcmin 2 MUSE field. This is entirely dominated by the population of multiply-imaged Lymanα emitters, which are very faint in HST but are easily revealed by the MUSE IFU.
Such a high density of constraints allows for a precise measurement of the mass profile, with a typical statistical error of < 1% (e.g., Jauzac et al. 2014;Caminha et al. 2019b). The improvement is typically a factor of ∼5 with respect to the models prior to MUSE observations (e.g. Richard et al. 2015). This has many applications, allowing us for example to test different parametrisations of the mass models and line of sight effects (e.g., Chirivì et al. 2018) to improve the rms model even further and reduce systematics. Some discussion on these effects were presented for the clusters Abell 2744 and Abell 370 in Mahler et al. (2018) and Lagattuta et al. (2019), respectively. One route to improve the models further is to combine the parametric mass models with a non-parametric (grid-like) mass distribution using a perturbative approach (Beauchesne et al. submitted).
Another application of these very-constrained models is to make use of the high density of constraints at the center of the cluster, and probe the inner slope of the mass distribution to test predictions from Λ-CDM simulations (Caminha et al. 2019a). Several clusters in our sample show so-called 'hyperbolic-umbilic' lensing configurations (Orban de Xivry & Marshall 2009), which are rarely seen and provide us with very important constraints in the cluster core (Richard et al. 2009). Finally, multiply-imaged systems appearing in the same regions of the cluster but originating from very different source plane redshifts are important for strong-lensing cosmography studies (Jullo et al. 2010;Caminha et al. 2016a;Acebron et al. 2017).
A full discussion of the 2D mass distribution of individual clusters is beyond the scope of this paper. However, in Table 5 we provide the main cluster parameters derived from our lens model which characterise their lensing power: namely the equivalent Einstein radius θ E at z = 4 (defined from the area within the critical curve Σ, as θ E = √ Σ/π), and the enclosed projected mass within this radius M < θ E . We selected z = 4 as it is the average DLS/DS (the lensing efficiency factor, ratio of the distance between the lens and the source over the distance to the source) for Lyman-α emitters which contribute to the majority of constraints.

Redshift distribution
The complete spectroscopic dataset of the 12 clusters amount to more than 3200 high-confidence redshifts, consisting of galaxies either in the foreground, in the cluster itself, or in the background. Overall, cluster members dominate the redshift distribution; this is clearly apparent in Table 4, where cluster members account for ∼40% of the total spectroscopic sample. We display a histogram of the global redshift distribution combining all clusters in Fig. 8. The left panel shows the distribution of all and high confidence redshifts when limited to unique sources, i.e., removing the additional images of strongly lensed systems. The right panel shows the distribution of line-only sources (i.e. sources detected purely from line emission without any HST counterpart).
These histograms reveal a number of important features in the data, both intrinsic to our sample and unique to the MUSE instrument. Notably, since all of the clusters are located within a relatively narrow redshift range, there is a prominent peak around z = 0.4 representing the cluster overdensities. At the same time, we see that multiply imaged lensed sources start to represent a significant fraction (37%) of all galaxies at z > 1.8, as evidenced by the increasing discrepancy between the "all" (grey) and "unique" (purple) distributions in the left panel. Furthermore, the MUSE redshift desert (1.5< z <2.9), a region where no strong emission emission lines ([O ii], Lyman-α) are present in the MUSE wavelength range, is also clearly visible, with a significant deficit of redshifts (especially for line-only sources) in those bins. Finally, the redshift histogram at z > 2.9 is dominated by the population of Lyman−α emitters, which are more easily detected between sky lines. This produces small gaps in the redshift distribution at wavelengths where the sky is brighter, in particular at z ∼ 4.6 and z > 5.8.
Compared to the total redshift histograms, individual clusters tend to show more prominent peaks at specific redshifts. This is a known effect of galaxy clustering in the small area probed behind lensing clusters (e.g., Kneib et al. 2004). Specific grouplike structures are seen at z ∼ 4 behind Abell 2744 (Mahler et al. 2018) and z ∼ 1 behind Abell 370 (Lagattuta et al. 2019).

Kinematics of the cluster cores
The high velocities of galaxies in cluster cores appear clearly in the redshift distribution when zooming-in around the systemic cluster redshift (Fig. 9). The shape of the distribution generally follows a single normal distribution, with the exception of Abell 2744 showing a clear bimodal distribution. For that particular cluster, Mahler et al. (2018) demonstrated that the two velocity components seen in the inner 400 kpc radius persist out to much larger distances, as they are associated (and in good agreement) with sparser spectroscopic data covering > 1 Mpc scales (Owers et al. 2011).
Although a full analysis of the cluster kinematics is clearly limited by the spatial coverage of MUSE-only spectroscopy in the cluster core, the large number of cluster redshifts measured in the very central core (R < 300 kpc) still provides insight into the overall velocity dispersion in the different cluster components seen in the distribution. We fit either 1 or 2 Gaussian components per cluster to the velocity distribution and estimate the global velocity dispersion σ dynamics as the quadratic sum of their σ. We independently estimate a global velocity dispersion σ model from our strong lensing model, by taking the quadratic sum of the σ 0 parameters in the cluster-scale dPIE components. We include a   1.3× correction factor between the line-of-sight velocity dispersion and the dPIE σ 0 parameters, as discussed in Elíasdóttir et al. (2007) for the typical values of core and cut radii in our cluster potentials. Figure 9 (right panel) directly compares the two velocity dispersion estimators for the 12 clusters. Taken at face value, we find that the clusters span a large range in σ, but there is a generally good agreement between the two estimators for most of the clusters. This shows that despite the strong differences in the dynamical state of the clusters, the mass components included in our mass models account for a similar amount of mass as seen in the cluster velocity distribution, although we cannot directly associate between the large-scale clumps of the lens model and the velocity components. We observe a large dispersion on σ dynamics , which is certainly due to the limited coverage of the velocity measurements and line-of-sight projection effects due to the overall geometry of the cluster. Moreover, clusters showing very high σ dynamics (close to ∼2000 km/s, as for Abell 370 and MACS1206) are most likely formed of multiple lower σ components which we cannot isolate individually. RXJ1347 is the most discrepant point in this diagram, with σ dynamics ∼ 1100 km/s despite it having the largest mass of the sample within its Einstein radius. This could be explained if multiple cluster structures are present with a low velocity difference, enhancing its lensing power while maintaining a low σ dynamics . Regardless, we find an average ratio < σ dynamics /σ model >=0.92±0.14. Even though the cluster dynamics only probe the inner regions (< 200 kpc), we do not see a significant under-or overestimation of the global velocity dispersion in the velocity measurements.

Magnification distribution and survey volume at high redshift
One of the direct outputs of the mass models is an estimate of the magnification factor at a given source redshift and image position. These values are crucial in order to correct absolute physical measurements of lensed sources, such as the stellar mass, luminosity, and star-formation rate, etc. We include this estimated magnification µ at the central location of each source in the spectroscopic catalog (Table A.1). These values are generally well-constrained within the multiply imaged regions, but tend to be overestimated in the vicinity of the critical lines (typically µ > 25), where the emergence of lensed pairs and the resolved sources limit very strong magnification factors. We present the magnification distribution of the full set of lensed images in Fig.10 Magnification distribution for all background sources. We overlay a theoretical µ −2 relation to guide the eye. We observe a significant drop in the number of sources at low magnifications (µ < 1.5), due to the limits set by the FoV of the MUSE observations, and at very high magnifications (µ > 25) due to the very small statistics and possibly the resolved nature of Lyman-α emitters. each observed image individually (i.e., multiply-imaged systems are not combined into a single entry). The minimum magnification is set by the limits of the MUSE coverage in each cluster, but is typically µ ∼ 1.5 − 2. The distribution N(µ) at µ > 2 follows a clear power-law, with an exponent µ −2.02±0.09 . This value is very close to the theoretical prediction of optical depth for strong lensing magnification, τ ∝ µ −2 (Blandford & Narayan 1986) in the case of smooth mass distributions like the ones used in our modelling. This shows that our detection process of MUSE lensed sources is not strongly affected and missing additional sources at very high magnifications.
Another important measurement we derive from the lens model is the equivalent source-plane area covered by the MUSE observations. This differs from the image-plane area (i.e., the MUSE footprint) due to the strong lensing effect, with the effective source-plane area reduced by the same amount as the magnification factor; unsurprisingly this also diminishes the associated survey volume behind the cluster. Figure 11 presents the source-plane area for each cluster at a typical source redshift z = 4, with the colorscale representing the greatest magnification factor at a given source position (recall that sources lying within the multiply-imaged region will have more than one magnification solution). In the figure, the strongly lensed region itself appears in the form of the caustic lines in the source plane.
Compared to the total observed area of ∼ 23.5 arcmin 2 on sky covered with MUSE, the total effective area covered in the source plane at high redshift is 4 arcmin 2 , a factor 6× smaller. From these source plane areas we compute the effective volume covered at source redshifts 2.9 < z < 6.7 when detecting lensed Lyman-α emitters, as a function of lensing magnification. This is important knowledge when probing the luminosity function of Lyman-α emitters behind clusters (Bina et al. 2016;de La Vieuville et al. 2019) compared to blank fields (Drake et al. 2017;Herenz et al. 2019). We stress that there are strong cluster-to-cluster variations in the volume surveyed depending on the geometry of the caustics and the MUSE coverage, with a maximum volume ranging between 600 and 10000 comoving Mpc 3 in individual clusters at any magnification. While a full volume computation would require more precise completeness measurements accounting for spatial variations of exposure time and the evolution of the noise as a function of wavelength, these variations can easily explain the cluster-to-cluster differences in the observed number counts of LAEs, as found by de La Vieuville et al. (2019).

Resolved properties of high redshift galaxies
Despite the effective surface reduction in the source plane, one of the strong benefits from the magnification is to increase the apparent size of lensed galaxies, allowing us to reach much smaller intrinsic scales. Combined with MUSE integral field spectroscopy, this magnification gives access to resolved spectral properties in particular from the nebular line emission. Some of the brightest and most extended arcs at 0.5 < z < 1.5 in the present sample have been studied in Patrício et al. (2018) to measure the detailed gas kinematics at sub-kpc scales, in relation with the presence of Article number, page 21 of 47 A&A proofs: manuscript no. ms star-forming clumps and signficant resolved metallicity gradients as a function of radius (Patrício et al. 2019). These bright extended arcs are rare cases of very strong magnification in typical L * galaxies. But the measurement of resolved kinematics can be pushed to fainter masses / luminosities and larger samples to study the Tully-Fisher relation and morpho-kinematics of lensed sources reconstructed in the source plane. This allows spatially resolved analyses which are complementary to blank field studies, the latter being more strongly limited by the spatial resolution (Contini et al. 2016;Girard et al. 2020, Abril-Melgarejo et al. submitted).
At higher redshifts (z > 2.9) IFU observations have now established that the Lyman-α emission of distant galaxies is typically more spatially extended than the stellar UV continuum, illuminating the surrounding circumgalactic medium (Wisotzki et al. 2016;Leclercq et al. 2017). The origin of this Lyman-α emission is complex, and only the brightest Lyman-α haloes are sufficiently extended to allow spatially resolved analyses in the absence of any lensing effect (Erb et al. 2018;Leclercq et al. 2020) and test predictions from simulations. MUSE observations on lensing clusters offer the power to identify extended gas around high redshift galaxies (once magnified), allowing detailed studies of the spectral line properties within multiple regions, in particular through bright Lyman-α extended haloes (Patrício et al. 2016;Smit et al. 2017;Caminha et al. 2016b;Vanzella et al. 2017;Claeyssens et al. 2019). Figure 12 shows a very clear example taken from our sample of extended emission detected over the z = 4.086 source originally identified by Cohen & Kneib (2002) in RXJ1347. This galaxy appears as a very compact source in the HST image (FWHM<0 . 3, limited by the PSF), while the Lyman-α emission reaches an extension of 13 in the form of an arc. The critical line at z = 4.086 straddles the arc at the location of the UV source, which appears as a bright H ii knot coincidentally located over the caustic line in the source plane. This example illustrates the capability of MUSE observations to clearly disentangle between stellar and nebular emission at these high redshifts. We identified 2-3 such magnified (spanning >5 ) Lyman-α emitters per cluster, for which we will be able to retrieve detailed physical properties at scales better than typically ∼1 kpc (Claeyssens et al. in prep.).

Summary and conclusion
We have presented a spectroscopic survey of MUSE/VLT observations towards 12 massive clusters selected from the MACS, CLASH, and FF samples. The data represent ∼23 arcmin 2 on sky as single pointings or mosaics, with an effective exposure time ranging between 2 and 15 hrs and a grand total of > 125 hrs.
Thanks to improvements in the data reduction and the tools developed for source detection and inspection in the datacubes, our final spectroscopic catalog presents > 3200 spectroscopic redshifts with high confidence (zcon f = 2 or 3). Improvements include careful self-calibration and specific masking to remove instrumental systematics and other artefacts in the datacube, a better estimate of the noise properties, and the establishment of a clear process for spectral analysis.
Deep (> 8 hrs per pointing) MUSE observations reveal a very high density of faint lensed LAEs, which dominate the population of background galaxies with spectroscopic redshifts. A very significant fraction of high-redshift galaxies are multiply-imaged systems, which we use to precisely constrain mass models in the cluster cores. The number of strong lensing constraints in a single cluster, reaching 70 systems producing 189 multiple images in the case of MACS0416, and the fraction of spectroscopically confirmed images (> 90% in the models used in our analysis) are among the highest in any known cluster including the FFs. These parametric mass models are fully included in our redshift assessment process to help confirm or reject some multiple images.
Based on our cluster mass models, we assess the source plane survey area at high redshift (z > 3) and find that the effective surface area is ∼ 6 arcmin 2 , a factor of 6 smaller than in the image plane due to the lensing magnification. This shows that the number density of LAEs is very high at the low luminosities probed through lensing magnification. The magnification distribution N(µ) is compatible with the theoretical predictions in the very strong magnification regime (2< µ < 25), showing that our detection process for lensed sources is not missing additional sources in the high magnification regime. We note that, in the area covered by MUSE observations, multiple images dominate the detected sources at z > 1.8, while the fraction of multiple images is much lower at z < 1.5 (Fig. 8). This difference is related to the magnification bias (Broadhurst et al. 1995), which affects the source populations differently depending on the redshift and therefore intrinsic luminosity. The redshift range 1.8 < z < 6.7 is therefore ideal to identify a large number of multiply imaged sources and makes MUSE very efficient for this purpose at 3 < z < 6.7. The future BlueMUSE instrument ) will complement this redshift range by targetting LAEs at 1.9 < z < 4, where the density of multiply-imaged LAEs is expected to be very high.
We make the results of this analysis publicly available in the form of a first data release 8 , including the reduced datacubes, spectrophotometric catalogs, extracted spectra and mass models. Additional MUSE observations of lensing clusters taken under the GTO program and archival observations (including lower redshift clusters such as Abell 1689, 2390 and 2667) will be inspected in a similar manner and their data released in subsequent catalogs. Overall, this very rich dataset has a strong legacy value, which a large variety of statistical studies from cluster physics to very high redshift galaxies. The sample of magnified Lymanα emitters will be particularly suitable for follow-ups at longer wavelength with the James Webb Space Telescope (JWST) and the Atacama Large Millimeter/submillimeter Array (ALMA) to better constrain the resolved star formation and clump properties through ISM diagnostics.  A. and ∆Dec.), dPIE shape (ellipticity and orientation), velocity dispersion, core and cut radius. The final row is the generic galaxy mass at the characteristic luminosity L * , which is scaled to match each of cluster member galaxies. Parameters in square brackets are fixed a priori in the model.