Calibration and Performance of the NIKA2 camera at the IRAM 30-meter Telescope

NIKA2 is a dual-band millimetric continuum camera of 2900 Kinetic Inductance Detectors (KID), operating at $150$ and $260\,\rm{GHz}$, installed at the IRAM 30-meter telescope. We present the performance assessment of NIKA2 after one year of observation using a dedicated point-source calibration method, referred to as the \em{baseline} method. Using a large data set acquired between January 2017 and February 2018 that span the whole range of observing elevations and atmospheric conditions encountered at the IRAM 30-m telescope, we test the stability of the performance parameters against time evolution and observing conditions. We assess the robustness of the performance results using the \em{baseline} method against systematic effects by comparing to results using alternative methods. We report an instantaneous field of view (FOV) of 6.5' in diameter, filled with an average fraction of $84\%$ and $90\%$ of valid detectors at $150$ and $260\,\rm{GHz}$, respectively. The beam pattern is characterized by a FWHM of $17.6'' \pm 0.2''$ and $11.1''\pm 0.1''$, and a beam efficiency of $77\% \pm 2\%$ and $55\% \pm 3\%$ at $150$ and $260\,\rm{GHz}$, respectively. The rms calibration uncertainties are about $3\%$ at $150\,\rm{GHz}$ and $6\%$ at $260\,\rm{GHz}$. The absolute calibration uncertainties are of $5\%$ and the systematic calibration uncertainties evaluated at the IRAM 30-m reference Winter observing conditions are below $1\%$ in both channels. The noise equivalent flux density (NEFD) at $150$ and $260\,\rm{GHz}$ are of $9 \pm 1\, \rm{mJy}\cdot s^{1/2}$ and $30 \pm 3\, \rm{mJy}\cdot s^{1/2}$. This state-of-the-art performance confers NIKA2 with mapping speeds of $1388 \pm 174$ and $111 \pm 11 \,\rm{arcmin}^2\cdot \rm{mJy}^{-2}\cdot \rm{h}^{-1}$ at $150$ and $260\,\rm{GHz}$.


Introduction
Sub-millimetric and millimetric domains offer a unique view of the Universe from nearby astrophysical objects, including planets, planetary systems, galactic sources, nearby galaxies, to high-redshift cosmological probes, e.g.distant dusty starforming galaxies, clusters of galaxies, Cosmic Infrared Background (CIB), Cosmic Microwave Background (CMB).
Ground-based continuum millimeter experiments have made spectacular progress in the past-two decades thanks to the advent of large arrays of high-sensitivity detectors (Wilson et al. 2008;Siringo et al. 2009;Swetz et al. 2011;Bleem et al. 2012;Monfardini et al. 2011;Holland et al. 2013;Dicker et al. 2014;Adam et al. 2017a).This fast growth will continue as experimental efforts are driven by two challenges: improving the sensitivity to the polarisation to detect the signature of the end-of-inflation gravitational waves in the CMB, and reaching arcminute angular resolution to exploit the CMB secondary anisotropies as cosmological probes, and sub-arcminute resolution to unveil the inner structure of faint or complex astrophysical objects and to map the early universe down to the confusion limit.Furthermore, new generation of sub-arcminute millimetric experiments, like NIKA2, which combines high angular resolution with a high mapping speed and a large coverage in the frequency domain, will achieve a breakthrough in our detailed understanding of the formation and evolution of structures throughout the Universe.
The New IRAM KID Array Two, NIKA2, is a subarcminute-resolution high-mapping speed camera observing simultaneously a 6.5' diameter field of view (FOV) in intensity in two frequency channels centered at 150 and 260 GHz and in polarisation at 260 GHz (Adam et al. 2018).NIKA2 was installed at the IRAM 30-m telescope in October 2015 with a partial readout electronics, and operated in the final instrumental configuration since January 2017.After a successful commissioning phase that ended in October 2017, NIKA2 is open to the community for science-purpose observations for the next decade.NIKA2 will provide key observations both at the galactic scale and at high redshifts to address a plethora of open questions, including the environment impact on dust properties, the star formation processes at low and high redshifts, the evolution of the largescale structures and the exploitation of galaxy clusters for accurate cosmology.
At the galactic scale, progress in understanding the star formation process relates to an accurate characterization of dust properties in the interstellar medium (ISM).NIKA2 will provide the high-resolution high-mapping speed dual-wavelength millimeter observations that are required for the determination of the mass and emissivity of statistically significant samples of dense, cold, star-forming molecular clouds (Rigby et al. 2018).Deep multi-wavelengths surveys of nearby galaxies and of large areas of the galactic plane also allows for setting constraints on environmental-related variations of the dust properties.Furthermore, NIKA2 observations are needed for a detailed study of the inner molecular cloud filamentary structure that hosts Solarmass star progenitors (Bracco et al. 2017), to understand the evolution process that culminates in star formation (see e.g.André et al. (2014) for a review).Ultimately, these observations are also helpful to understand planet formation within proto-planetary disks.
For cosmology, NIKA2 observations will have two major implications.On the one hand, they represent a unique opportunity to study the evolution of the galaxy cluster mass calibration with redshift and dynamical state for their accurate exploitation as cosmological probe.Galaxy clusters are efficiently de-tected via the thermal Sunyaev-Zel'dovich (tSZ) effect (Sunyaev & Zeldovich 1970) up to high redshifts (z < 1.5), as was recently proven by CMB experiments (Hasselfield et al. 2013;Reichardt et al. 2013;Ade et al. 2016a).The exploitation of the vast SZselected galaxy cluster catalogues is currently the most powerful approach for cosmology with galaxy clusters (Ade et al. 2016b).However, the accuracy of the tSZ cluster cosmology relies on the calibration of the relation between the tSZ observable and the cluster mass and on the assessment of both its redshift evolution and the impact of the complex cluster physics on its calibration.Previous arcminute resolution experiments only allowed detailed studies of the intra cluster medium spatial distribution for low redshift clusters (z < 0.2).Sub-arcminute resolution high mapping speed experiments are required to extend our understanding of galaxy cluster towards high redshift (Mroczkowski et al. 2019).The first high-resolution tSZ mapping of a galaxy cluster with NIKA2 has been reported in Ruppin et al. (2018).Furthermore, NIKA2 capabilities for the characterization of highredshift (0.5 < z < 0.9) galaxy clusters have been verified using numerical simulation (Ruppin et al. 2019b), and their implication for cosmology has been discussed in Ruppin et al. (2019a).
On the other hand, in-depth mapping of large extragalactic fields with sub-arcminute resolution with NIKA2 will provide unprecedented insight on the distant universe.Indeed, hundreds of dust-obscured optically-faint galaxies will be detected up to high redshift (z ≈ 5) during their major episodes of star formation.For this purpose, the high-angular resolution is key to reduce the confusion noise, which is the ultimate limit of cosmological surveys (Bethermin et al. 2017), while the high mapping speed allows this irreducible noise level to be reached in the extragalactic field map in a short integration time.This will help quantifying the star formation at z ≈ 5, probing the whole star formation evolution across cosmic ages and clarifying the role of the dusty galaxies in the reionization of the universe (Mancuso et al. 2016).Moreover, NIKA2 observations will be critical for our understanding of galaxy formation and evolution.
The current generation of sub-arcminute resolution experiments also include the Large APEX Bolometer Camera (LABOCA (Siringo et al. 2009)) at the Atacama Pathfinder Experiment (APEX) 12-meter telescope, which covers a 12' diameter FOV at 345 GHz at an angular resolution of about 19"; AzTEC at the 50-meter Large Millimeter Telescope, which operates with a single bandpass centered at either 143, 217 or 270 GHz (Wilson et al. 2008), and which has a beam FWHM of 5, 10 or 18 , respectively; the Submillimeter Common User Bolometer Array Two (SCUBA-2 (Holland et al. 2013;Dempsey et al. 2013)) on the 15-meter James Clerk Maxwell Telescope, which simultaneously images a FOV of about 7' at 353 GHz and 666 GHz with a main beam FWHM of 13" and 8" in the two frequency channels, respectively ; MUSTANG-2 at the 100-meter Green Bank telescope, which maps a 4.35' FOV at 90 GHz with 9" resolution (Dicker et al. 2014;Stanchfield et al. 2016).Therefore, NIKA2 is unique in combining an angular resolution better than 20", an instantaneous FOV of a diameter of 6.5' and multiband observation capabilities at 150 and 260 GHz.
Most of the other millimeter instruments consist of bolometric cameras.By contrast, NIKA2 is based on the Kinetic Inductance Detectors (KID) technology (Day et al. 2003;Doyle et al. 2008;Shu et al. 2018).This concept has been first demonstrated with a pathfinder instrument, NIKA (Monfardini et al. 2010(Monfardini et al. , 2011)).Installed at the IRAM 30-m telescope until 2015, NIKA demonstrated state-of-the-art performance (Catalano et al. 2014), and obtained breakthrough results (see e.g., Adam et al. (2014); Adam et al. (2017b).NIKA has been crucial in optimizing the NIKA2 instrument and data analysis.
A thorough description of the NIKA2 instrument is presented in Adam et al. (2018), along with the results of the commissioning in intensity based on the data acquired during the two technical campaigns of 2017.In the present paper, we propose a standard calibration method, which is referred to as the baseline calibration, to go from raw data to stable and accurate flux density measurements.Furthermore we assess the performance in intensity using a large amount of data acquired between January 2017 and February 2018.Regarding the performance of the polarization capabilities, their assessment will be addressed in a forthcoming paper.To achieve a reliable and high-accuracy estimation of the performance, we perform extensive testing of the stability with respect to both the analysis methodological choices and to the observing conditions.First, the methodological choices and hypothesis may have an impact on the performance results and the systematic errors.At each step of the calibration procedure and for each performance metrics, we compare the results obtained using the baseline method to alternative approaches to ensure the robustness against systematic effects.Second, we check the stability of the results using a large number of independent data sets corresponding to various observing conditions.Specifically, most of the performance assessment relies on data acquired during the February 2017 technical campaign (N2R9) and the October 2017 (N2R12) and January 2018 (N2R14) first and second scientific-purpose observation campaigns.These observation campaigns are referred to as the reference observation campaigns.Each campaign consists of about 1300 observation scans lasting between two and twenty minutes for a total observation time of about 150 hours.
This paper constitutes a review of NIKA2 calibration and performance assessment in intensity.It is intended to be a reference for observations with NIKA2, which will last at least for ten years.The outline of the paper is as follows: Sect. 2 to 4 give short summaries of the instrumental set up, the observational modes and the data analysis methods that have been used for the calibration and the performance assessment.Sect. 5 to 10 detail the dedicated calibration methods, extract the key characteristic results and discuss their accuracy and robustness.The field-ofview reconstruction and the KID selection for science purpose are discussed in Sect. 5.The beam pattern is characterized in Sect.6, along with the main beam full-width at half maximum and the beam efficiency.Sect.7 is dedicated to the derivation of the atmospheric opacity.The methods that we have proposed to calibrate are gathered in Sect.8, while Sect.9 presents the validation of these methods and the calibration accuracy and stability assessment.The noise characteristics and the sensitivity are discussed in Sect.10.Finally, Sect.11 summarizes the main measured performance characteristics and briefly describes next steps for future improvements on NIKA2.

General view of the instrument
NIKA2 simultaneously images a FOV of 6.5' in diameter at 150 and 260 GHz.It also has polarimetric capabilities at 260 GHz, which are not discussed here.To cover the 6.5'-diameter FOV without degrading the telescope angular resolution, NIKA2 employs a total of around 2 900 KIDs split over three distinct arrays, one for the 150 GHz band and two for the 260 GHz band.
A detailed description of the instrument can be found in Adam et al. (2018) and Calvo et al. (2016).We briefly present here the main sub-systems focusing on the elements that are spe-cific to NIKA2 or that drive the development of dedicated procedures for the data reduction or calibration.

Cryogenics
KIDs are superconducting detectors, which in the case of NIKA2 are made of thin-aluminium films deposited on a silicon substrate (Roesch et al. 2012).For an optimal sensitivity, they must operate at a temperature of around 150 mK, that is roughly one order of magnitude lower than the aluminium superconducting transition temperature.For this reason, NIKA2 employs a custom-built dilution fridge to cool down the focal plane, and the refractive elements of the optics.Overall, a total mass of around 100 kg is kept deeply in the sub-Kelvin regime.Despite the complexity and huge size of the system, the operation of NIKA2 does not require external cryogenic liquids and can be fully operated remotely.

Optics
The NIKA2 camera optics include two cold mirrors and six lenses.The filtering of unwanted (off-band) radiation is provided by a suitable stack of multi-mesh filters as thermal blockers placed at all temperature stages between 150 mK and room temperature.Two aperture stops, at a temperature of 150 mK, are conservatively designed to limit the entrance pupil of the optical system to the inner 27.5 m diameter of the primary mirror.An air-gap dichroic plate splits the 150 GHz (reflection) from the 260 GHz (transmission) beams.This element, which is made of a series of thin micron-like membranes separated by calibrated rings and mounted on a native ring in inox, has been designed to resist to low temperature-induced deformation.As discussed in Sect.8.2, the air-gap technology has been proven to be efficient in preserving the planarity of the dichroic, but shows sub-optimal performance in transmission.Moreover, a grid polariser ensures the separation of the vertical and horizontal components of the linear polarizations on the 260 GHz channel.Band-defining filters, custom-designed to optimally match the atmospheric windows while ensuring robustness against average atmospheric condition at 260 GHz, are installed in front of each array.A half-wave polarization modulator is added at room temperature when operating the instrument in polarimetric mode.
Hereafter, the detector array illuminated by the 150 GHz (2 mm) beam is named Array 2 (A2), while in the 260 GHz (1.15 mm) channel, the array mapping the horizontal component of the polarization is referred to as Array 1 (A1) and the one mapping the vertical component is called Array 3 (A3).The 150 GHz observing channel is referred to as the 2 mm band and the 260 GHz channel as the 1 mm band.

KIDs and electronics
Array 2 consists of 616 KIDs, arranged to cover a 78 mm diameter circle.Each pixel has a size of 2.8 × 2.8 mm 2 , which is the maximum pixel size allowed not to degrade the theoretical 30m telescope angular resolution.In the case of the 260 GHz band detectors, the pixel size is 2 × 2 mm 2 , to ensure a comparable sampling of the focal plane.In order to ensure a full coverage of the 6.5' FOV, a total of 1,140 pixels is needed in each of the two 260 GHz arrays A1 and A3.
The key advantage of the KID technology is the simplicity of the cold electronics and the multiplexing scheme.In NIKA2, each block of around 150 detectors is connected to single coaxial line linked to the readout electronics (Bourrion et al. 2016).Hence Array 2 is connected to four different readout feed-lines, while Array 1&3 are both equipped with eight feed-lines.The warm electronics required to digitize and process the pixels signals is composed of twenty custom-built readout cards (one per feed-line).

KID photometry and tuning
KIDs are superconducting resonators whose resonance frequency shift linearly depends on the incoming optical power.This was theoretically demonstrated in Swenson et al. (2010) and confirmed for NIKA KIDs, which have a similar design to the current NIKA2 KIDs, using laboratory measurements, as discussed in Monfardini et al. (2014).The measurement of the KID frequency shift ∆ f is critical for the use of KIDs as mm-wave detectors.
For the KID readout, an excitation signal is sent into the cryostat on the feedline coupled to the KID.The excitation tones produced by the electronics are amplified by a cryogenic (4 K) low noise amplifier after passing through the KIDs and being analysed by the readout electronics.Each KID is thus associated with an excitation tone at a frequency f tone , which corresponds to an estimate of its resonance frequency for a reference background optical load.The transmitted signal can be described by its amplitude and phase, or, as is common practice for KID, by its I (in-phase) and Q (quadrature) components with respect to the excitation signal.The goal is now to relate the measured variations of the KID response to the excitation signal (∆I, ∆Q), which are induced by incident light, to ∆ f .For this, the electronics modulates the excitation tone frequency f tone at about 1 kHz with a known frequency variation δ f and the read out gives the induced transmitted signal variations (dI, dQ).Projecting linearly (∆I, ∆Q) on (dI, dQ) therefore provides ∆ f .This quantity, in Hz, constitutes the raw KID time-ordered data, which are sampled at a frequency of 23.84 Hz.For historical reasons, this way of deriving KID signals has been nicknamed RfdIdQ.More details on this process are given in Calvo et al. (2013).Once ingested into the calibration pipeline, the raw data will be further converted into astronomical units (Sect.8).
In addition to light of astronomical origin, any change in the background optical load (due, for example, to changes in the atmospheric emission with elevation) contributes as well to the shift of the KID resonance frequencies.In order to maximize the sensitivity of a KID, the excitation signal f tone must always be near the KID resonance frequency.We therefore have developed a tuning algorithm that performs this optimization.A tuning is performed at the beginning of each observation scan to adapt the KIDs f tone to the working background conditions.This process takes only a few seconds.These optimal conditions are further maintained by performing continuous tunings between two scans while NIKA2 is not observing, to match regularly with the observing conditions.

Bandpasses
The NIKA2 spectral bands have been measured in laboratory using a Martin-Puplett interferometer built in-house (Durand 2007).The measurement relies on using the difference of two black body radiations used as input signal for the interferometer.Both arrays and filter bands were considered in the measure-Fig.1. Relative system response of the three KID arrays as a function of frequency.For illustration we also show in black the atmospheric transmission obtained with the ATM model (Pety et al. 2009;Pardo et al. 2001) for two values of precipitable water vapor (1 and 5 mm).The spectra of the model of Uranus (pink) and the model of Neptune (green), as discussed in Appendix A, are also plotted for illustration with arbitrary normalization with respect to the NIKA2 transmission.ments, while the dichroic element was not included.Figure 1 shows the relative spectral response for the three arrays.Notice that Array 2 was upgraded in September 2016 (during the socalled N2R5 technical campaign) and that the spectral transmissions are slightly different (red and orange lines in Fig. 1).
The two arrays operating at 260 GHz, mapping different linear polarisations, exhibit a slightly different spectral behaviour as can be seen on Fig. 1.Besides the effect of the optical elements in front of the two arrays, this may be explained by a tiny difference in the silicon wafer, the difference of the Aluminium film thickness of the KID arrays and/or the different responses for two polarisations of the detector (Adam et al. 2018;Shu et al. 2018).
It is clear from Fig. 1 that the atmosphere will modify the overall transmission of the system, especially at the transmission tails for A2.To highlight this effect we compute an effective frequency ν eff computed as the weighted integral of the frequency considering the NIKA2 bandpass and the SED of Uranus, for various atmospheric conditions.In Table 1, we give ν eff and the bandwidths ∆ν computed both at zero atmospheric opacity and for the reference IRAM 30-m winter-semester observing conditions (defined by an atmospheric content of 2 mm of precipitable water vapor and an observing elevation of 60 o ).
In laboratory spectral characterization allows the bandpass of the three arrays to be measured with uncertainties better than one percent.The bandpass characterization will be further improved using in-situ measurements with a new Martin-Puplett interferometer designed to be placed in front of the cryostat window, which will allow to account for the whole optical system, including the dichroic plate.As it will be discussed in Sect.8, the baseline calibration method only resorts to the bandpass measurements for color correction estimations in order to mitigate the bandpass uncertainty propagation in the flux density measurement.In fact, for each array, we define reference frequencies Table 1.Effective frequencies ν eff and bandwidths ∆ν of the three arrays computed in two different observing conditions defined with the precipitable water vapor (pwv) contents in mm and the observing elevation in degrees).

Observations
This section presents the different observation modes that are used at the IRAM 30-m telescope for both commissioning and scientific-purpose observations with NIKA2.Each observation campaign is organized as observing pool allowing to optimize observations of several science targets in a flexible way.Most of these observing scans are on-the-fly (OTF) raster scans, which consist of a series of scans at constant azimuth or elevation (right ascension or declination) and varying elevation or azimuth (declination or right ascension).Their characteristics have been tailored for NIKA2 performance.

Focus
Observation pools start with setting the telescope focus since NIKA2 large FOV alleviates the need to adjust the pointing beforehand.We have designed a specific focus procedure that takes advantage of the dense sampling of the FOV allowing to map a source in a short integration time.We perform a series of five successive one-minute raster scans of a bright (above a few Jy) point source at five axial offsets of the secondary mirror (M2) along the optical axis.As the scan size is 1 × 5 , the main contribution to each map mainly comes from the KIDs located in the central part of the FOV.Elliptical Gaussian fits on the five maps provide estimates of the flux and FWHM along minor and major axes for each focus.The best axial focus in the central part of the array is then estimated as the maximum of the flux or the minimum of the FWHM using parabolic fits of the five measurements.
As presented in more details in Appendix B, the focus surface, that is defined as the locations of the best focus across the whole FOV is not flat but rather slightly bowl-shaped.To account for the curvature of the focus surfaces and optimize the average focus across the FOV, we add -0.2 mm to the best axial focus in the central part of the array.This focus offset is measured on data using a dedicated sequence of de-focused scans, as discussed in Appendix B. It is in agreement with expectations derived with optical simulation using ZEMAX 1 .
Axial focus offsets are measured every other hour during daytime and are systematically checked after sunrises and sunsets, while one or two checks suffice during night.Lateral focus offsets can also be checked in a similar way, but are found to stay constant over periods of time that cover several observing campaigns.

Pointing
Once the instrument is correctly focused, we can estimate pointing corrections before scientific observations.Based on general operating experience at the 30-m telescope, we use the so-called pointing or cross-type scans to monitor the pointing during observations.The telescope executes a back and forth scan in azimuth and a back and forth scan in elevation, centered on the observed source.We fit Gaussian profiles from the timelines of the reference detector, which is chosen as a valid detector located close to the center of Array 2. The choice of a reference detector operated at 150 GHz is suitable since most of our pointing sources are radio sources.We use the estimated position of the reference detector to derive the current pointing offsets of the system in azimuth and elevation.This correction is propagated to the following scans.The pointing is monitored in an hourly basis.
In addition, we perform pointing sessions in order to refine NIKA2 pointing model.A pointing session consists in observing about 30 sources on a wide range of elevations and azimuth angles while monitoring the pointing offsets that are measured for each observation.During N2R9 technical campaign, the rms of the residual scatter after pointing offset correction was 1.62 in azimuth and 1.37 in elevation.We conservatively report rms pointing errors < 3 .

Skydip
A skydip scan consists in a step-by-step sky scan along a large range of elevations.NIKA2 skydips are not used for the scanto-scan atmospheric calibration.For this purpose, the KIDs are used as total power detectors to estimate the emission of the atmosphere and hence, the atmospheric opacity, as discussed in Sect.7. NIKA2 skydips therefore serve to calibrate the KID responses with respect to the atmospheric background for atmospheric opacity derivation.
Unlike heterodyne receivers for which skydips can be conducted continuously slewing the telescope in elevation, the NIKA2 camera cannot resort to such method, as the KIDs need to be retuned for a given air mass.A NIKA2 skydip, which is quoted skydip, comprises eleven steps in the elevation range from 19 to 65 degrees, regularly spaced in air mass.For each step, we acquire about twenty seconds of data to ensure a precise measurements.KIDs are tuned at the beginning of each constant elevation sub-scan (hence once per air mass).
Skydips are typically performed every eight hours for a wide spanning of the atmospheric conditions through an observation campaign.

Beam map
A beammap scan is a raster scan in (az,el) coordinates tailored to map a bright compact source, often a planet, with steps of 4.8", that are small enough to meet Nyquist sampling of the 1 mm beam for each KID.A scan of 13 × 7.8 arcmin 2 is acquired either with the telescope performing a series of continuous slews at fixed elevation or at fixed azimuth.A continuous scanning slew defines a subscan.The fixed-elevation scanning has the advantage of nulling the air-mass variation across a subscan, while the fixed-azimuth scanning offers an orthogonal scan direction to the former: the combination of both gives a more accurate determination of the far side lobes.The scan size is optimized to enable maps to be made for all KIDs, even those located at the edges of the array.Larger size in the scanning direction allows for correlated noise subtraction.During subscans, the telescope moves at 65 arcsec/s.The need to have Nyquist sampling of 11" beams along the scan direction translates into a maximum speed of 97 arcsec/s for our nominal acquisition rate of 23.8 Hz and is thus met with margins.For the sake of scanning efficiency with the 30-m telescope, the minimal duration for subscans is 10 s.For beammap scans, subscans last 12 s and the entire scan lasts about 25 min.
Beammaps are key observations for the calibration.Whereas a single beammap acquired in stable observing conditions could suffice, beammap scans are performed on a daily basis.More details on these observations are given in Sect. 5 where we describe how to actually exploit them to derive individual KID properties.

Data Reduction: from raw data to flux density maps
The raw KID data (I, Q, ∆I, ∆Q) and the telescope sourcetracking data are synchronized by the NIKA2 acquisition system using a clock that gives the absolute astronomical time, that is the telescope pulses per second, to define the NIKA2 raw data.From the KID raw data, we compute a quantity that is proportional to the KID frequency shift using the RfdIdQ method, as described in Sect.2.4.This quantity, which is hence proportional to the input signal, constitutes the KID time-ordered information (TOI).
We have developed a dedicated data reduction pipeline to produce calibrated sky maps from NIKA2 raw data.This pipeline was first developed for the data analysis during the commissioning campaigns and is currently used for science-purpose data reduction.The calibration and performance assessment relies on this pipeline.A detailed description of this software will be presented in a companion paper (Ponthieu et al. 2019), as well as an application to blind source detection.Here we summarize the main steps of the data reduction.Moreover, we focus the discussion on the treatment of point-like or compact sources, which are used for the performance assessment.

Low level processing
We isolate the relevant fraction of the data for scientific utilisation and we mask KIDs that do not meet the selection criteria, as discussed in Sect.5.3, or timeline accidents (glitches).Specifically, we flag out cosmic rays, which impact only one data sample per hit due to the KID fast time constant (Catalano et al. 2014), and the KIDs for which the noise level exceed 3σ of the average noise level of all other KIDs of the same array.

Pointing reconstruction
We produce a timeline of the pointing positions of each KID with respect to the targeted source position (usually located at the center of the scan) using two sets of information.First, the control system of the telescope provides us with the absolute pointing of a reference point of the focal point, which is coincident with the reference KID after the pointing correction are applied, as described in Sect.3.2.Second, we estimate the offset positions of each KID with respect to the reference KID using a dedicated procedure that is referred to as the focal plane reconstruction, as presented in Sect. 5.After this step, we are able to distinguish KIDs that are on-source from those that are off-source, which is a key information for dealing with the correlated noise.

TOI calibration
The KID TOI in units of Hz (frequency shifts) are converted to Jy/beam in two steps.First, the KID data are inter-calibrated using the calibration coefficients, a. k. a. relative gains, as discussed in Sect.8.2 and the absolute scale of the flux density is set using the absolute calibration method discussed in Sect.8.1.Second, the instantaneous line-of-sight atmospheric attenuation exp[−τ ν x(t)] is corrected using the zenith opacity at the observing frequency ν τ ν , which is estimated for a given scan as discussed in Sect.7, and the instantaneous air mass x(t).The latter is estimated as (sin el t ) −1 using the observing elevation el t , as obtained using the pointing reconstruction.

Correlated noise subtraction
The TOI of each KID include a prominent low-frequency component of correlated noise of two different origins: the atmospheric component, which is dominant with some rare exceptions and common to all KIDs, and the electronic noise, which is common to the KIDs connected to a same electronic readout feed-line (see Sect. 2.3).The subtraction of this correlated noise is a key step of the data processing, as correlated noise residuals are an important limiting factor of the sensitivity.We have devised several dedicated methods for this purpose.This will be thoroughly discussed in Ponthieu et al. (2019), whereas here we illustrate the general principle and only discuss the method routinely used for the calibration.
As illustrated on the upper panel of Fig. 2, the low-frequency noise component is seen by all KIDs at the same time, while the astrophysical signal (Uranus in this case) is shifted from one KID to another.In this figure the KID TOIs have been rescaled to be null at t = 0.A simple average of the KID TOIs provides an estimate of the low-frequency noise component, that we referred to as a common mode, while the signal is averaged out.The common mode shown as a red line on the lower panel of Fig. 2, is then subtracted to each KID TOI.
For the calibration and performance assessment, we use an atmospheric and electronic noise decorrelation method named Most Correlated Pixels which comprises two additional technicalities with respect to the common mode method.First, the signal contamination of the common mode estimate is mitigated by discarding on-source KID data samples before averaging the rescaled TOI.Second, instead of a single common mode subtraction to all KIDs, we estimate an accurate common mode for each KID.Calculating the KID-to-KID cross-correlation matrix, we identify the most correlated KIDs.Then, we build an inverse noise weighted co-addition of the timelines of the KIDs that are the most correlated with the KID under concern.Furthermore, we have tested on simulations that this method does preserves the flux of point-like or moderately extended sources.
In Fig. 3 we present the noise power spectra of a typical KID TOI both before any data reduction and using three noise decorrelation methods, which are the simple common mode (CM) method used in Fig. 2, a method based on a principal component analysis (PCA) and the Most Correlated Pixels method (MCP).We observe that after decorrelation the 1/ f -like noise in the power spectra (principally due to atmospheric emission drifts) is significantly reduced leading to nearly flat spectra down to 0.5 Hz, with lower 1/ f -like residual noise for the PCA and the Most Correlated Pixels methods than the common mode decorrelation at low frequencies.Moreover, we have checked using simulations that the Most Correlated Pixels method was more The low frequency correlated component (atmospheric and electronic noises) is clearly seen.Bottom: One of these TOIs (in blue) and the common mode that is subtracted from it (in red).The zero level is arbitrary.
efficient than the PCA in preserving the astrophysical signal.The former is thus preferred over the latter.

Map projection
We use the pointing information to project the cleaned (lowfrequency noise subtracted) calibrated TOI of all the valid KIDs of an array onto a flux density map (tangential projection).This map M p is produced using an inverse variance noise weighting of all of the data samples that fall into a map pixel as defined using a nearest grid point scheme.We also compute the associated count map H p defined as the number of data samples per map pixels.The map resolution is chosen small enough (typically 2 per map pixel) to alleviate the need for more refined interpolation scheme.The noise variance σ k for each KID k is evaluated by the standard deviation of the KID TOI far from the source position.The variance map σ 2 p is inhomogeneous and varies as the Fig. 3.The data noise power spectra are shown for the three NIKA2 arrays (A1, A3, and A2 from top to bottom).The power spectra are given for the raw data (blue), and for noise decorrelated data using the common mode (labeled CM, green), the PCA (red) and the Most Correlated Pixels (labeled MCP, cyan) methods.
inverse of H p .Its normalisation is evaluated using the homogeneous background map variance, that is the variance of M p H p calculated far from the source.
To account for the residual correlated noise while evaluating the variance map, we resort to an effective approach.First, we compute the map of the signal-to-noise ratio (SNR) as the ratio of M p and the noise map σ p , that is the square root of the variance map.We observe that the distribution of the SNR map over the pixels far from the source is well-approximated with a Gaussian but has a width larger than the expected unity.This is due to the remaining correlations between KID TOIs before projection.Then, we multiply the noise map σ p by the required factor so that the width of SNR distribution becomes normalized.This normalizing factor ranges from 1.2 to 1.5 depending on the observing conditions.This constitutes an effective approach to account for the pixel-to-pixel correlation matrix off-diagonal terms alleviating the need of accurately measure them.
When several scans of the same source are averaged, we apply an inverse variance weighting as well.The variance map of the sum of scans is also corrected to ensure unity-width SNR distribution.

Observation scan selection
For calibration and performance assessment, we select scans in average observing conditions by performing mild selection cuts.These scan cuts rely on zenith opacity estimates τ ν in NIKA2 bands, as described in Sect.7, on the elevation and on the observation time of the day.We select the scans satisfying the following criteria: i) τ A 3 < 0.5, where τ A 3 is the τ ν estimate for Array 3; ii) x τ A 3 < 0.7 and el > 20 o , where el is the observing elevation and x, the air mass, which depends on the elevation as x = (sin el) −1 .This threshold corresponds to a decrease of the astrophysical signal by a factor of two; iii) observation time from 22:00 to 9:00 UT and from 10:00 to 15:00 UT, that excludes the sunrise period and the late afternoon.
In the following sections, these selection cuts are referred to as the 'baseline scan selection'.As discussed in Sect.8.3, the late afternoon observations are often affected by time-variable broadening of the telescope beams caused by (partial) solar irradiation of the primary mirror and/or anomalous atmospheric refraction.Around sunrise, the focus shifts continuously due to the ambient temperature change until the temperature stabilizes, so that the scans taken from 9:00 to 10:00 UT are likely not to be optimally focused.After the focus stabilisation, the middle of the day period ranging from 10:00 to 15:00 UT offers stable observing conditions provided that the telescope is not pointed too close to the Sun.Otherwise, further scan selection based on the exact sequence of observations and on beam monitoring might be needed before using these observations for performance assessment.In summary, the baseline scan selection retains 16 hours of observations a day and discards observations affected by an atmospheric absorption exceeding 100%.

Focal Plane Reconstruction
From the operational point-of-view the KIDs are defined by the resonance frequency and not by their physical position in the focal plane.Therefore, to find the position on the sky we need to deploy a dedicated method that we refer to as the FOV reconstruction.The FOV reconstruction thus consists in matching the KID frequency tones to positions in the sky and in performing a KID selection.Although all the 2,900 KID are responsive, some of them are affected by cross-talk or are noisy due to an inaccurate tuning of their frequency, and must be discarded for further analysis.We use beammaps, which enable an individual map per KID to be constructed to measure the KID positions and relative gains, as discussed in Sect.5.1.The measured KID positions are further checked by matching with the design positions, as presented in Sect.5.2.In Sect.5.3, we present the final KID selection and FOV geometry, as obtained by repeating the procedure on a series of beammaps.

Reconstruction of the FOV positions and KID properties
In order to be able to produce a map, one needs to associate a pointing direction to any data sample of the system.The telescope provides pointing information for a reference position in the focal plane.These information consist of the absolute azimuth and elevation (az ref t , el ref t ) of the source, together with offsets (∆az ref t , ∆el ref t ) w. r. t. these, which depends on the scanning strategy.We then need to know the relative pointing offsets of each detector with respect to this reference position.We use beammaps for this purpose (see Sect. 3.4).
We apply a median filter per KID timeline whose width is set to 31 samples, that is equivalent to about 5 FWHM at 65 arcsec/s and for the sampling frequency of 23.84 Hz.Then, we project one map per KID in Nasmyth coordinates.The median filter removes efficiently most of the low frequency atmospheric and electronic noise, albeit with a slight ringing and flux loss on the source.However, at this stage, we are only interested in the location of the observed source.To derive the Nasmyth coordinates from the provided (az t , el t ) and (∆az t , ∆el t ) coordinates, we build the following quantities at time t: Note that ∆az ref t is already corrected by the cos el ref t factor to have orthonormal coordinates in the tangent plane of the sky and be immune to the geodesic convergence at the poles.The data timelines are then projected onto (x, y) maps.We fit a 2D elliptical Gaussian on each KID Nasmyth map.The centroid position of this Gaussian provides us with an estimate of the KID pointing offsets w. r. t. the telescope reference position (az t + ∆az t , el t + ∆el t ) in Nasmyth (x, y) coordinates (independent of time).
To convert from Nasmyth offsets to (az, el) offsets, we apply the following rotation: where k is a KID index.Adding these offsets to (∆az ref t , ∆el ref t ) gives the absolute pointing of each KID in these coordinates.
Furthermore, the fitted Gaussian per KID further provides us with a first estimate of the KID FWHM, ellipticity and sensitivity.We apply a first KID selection by removing outliers to the statistics on these parameters.We also discard manually KIDs that show a cross-talk counterpart on their map.We repeat this procedure using the baseline TOI decorrelation method instead of the median filter.Specifically, we apply the Most Correlated Pixels noise subtraction presented in Sect. 4 to the KID timelines, which are then used to produce maps per KID.Therefore this alleviates the flux loss induced by the median filter and ensures that each timeline is treated in the same way as the scientific observation will.A second iteration of the KID selection is performed.
This analysis is repeated on all beammaps to obtain statistics and precision on each KID parameter, together with estimates on KID performance stability, as discussed in the next sections.

FOV grid distortion
We compare the reconstructed KID positions in the FOV to their design positions in the array.We fit the 2D field translation and rotation that allow matching the measured KID positions with the design positions using a 2D polynomial mapping function.We find that a matching can be obtained using a 2D polynomial function of degree one, which corresponds to a linear transformation and a rotation only.We call distortion cross-terms between the two spatial coordinates in the polynomial fit.and A2.The three plots show the detectors that met the selection quality criteria for at least two beammaps during two technical campaigns.These consist of 952, 961, and 553 detectors for A1, A3 and A2, respectively.The color indicates how many times a KID was identified as valid on a beammap, ranging from blue for the KIDs valid in at least two beammaps, to red for the KIDs valid in all (ten) beammaps.The inner and outer dash-line circles correspond to circular regions of 5.5 arcmin and 6.5 arcmin, respectively.Units are arcseconds.
The aim is twofold.First we obtain a detailed characterization of the real geometry of NIKA2 focal plane.Secondly, this analysis is also used for KID selection.The most deviant KIDs, whose measured position deviates by more than 4 from the design position are discarded.
We present the global results of the grid distortion analysis using the KID positions given by the focal plane geometry procedure, as described in Sect.5.1, applied to a beammap scan acquired during the first scientific campaign (a.k. a. N2R12).The initial number of KID considered in this analysis results from a first KID selection, which consists in discarding the KIDs that are the most impacted by the cross-talk effect or the tuning failures, applied in the FOV geometry obtained from the beammap scan.More details on the KID selection are given in the next section.The results are gathered in Table 2.
Most of the selected KID are also well-placed, that is at less than 4" from their expected position.Moreover, on average the position of each detector is known to about one arcsecond.We find that Array 1 has some of the most deviant detectors (above 4" from their expected position).These detectors are excluded from further analysis.The two 1 mm arrays have almost the same center but this center differs by 7 and 2 in the two Nasmyth coordinates, respectively, from the 2 mm array center.This has no significant impact on the pointing and the focus settings at the precision of which they are measured.The center-to-center distance between contiguous detectors, referred to as grid step, has been estimated in mm and arcseconds.The ratio of the grid step in mm to the grid step in arcseconds gives compatible effective focal lengths of about 42.4 ± 0.3 m at both observing wavelengths.The sampling is above λ/D at 1 mm, assuming a 27 m effective diameter aperture.Note that the rotation angle between the array and the Nasmyth coordinates was designed as 76.2 o , ess than two degrees away from what is observed.
These results have been compared to expectations obtained using ZEMAX simulation.We generated a grid diagram for the NIKA2 optical system and found a maximum grid distortion of 2.7% in the 6.5 FOV.We notice that the strongest distortion appear in the upper right corner of the Nasmyth plane, which is also the area of the largest defocus w.r.t. the center (see Appendix B).An expected distortion of 2.7% is at most a 5" shift from the center to the outside of the array.The quoted distortions Table 2. Field-of-view deformation.Example of mapping of the observed KID positions in the sky to their mechanically designed positions.The initial table of selected KIDs is given by the focal plane geometry procedure, as described in Sect.5.1, applied to a beammap scan acquired during the N2R12 campaign.More than 90% of the detectors are within less than 5 arcseconds of their expected position.

Characteristic
Array  (a) Initial number of KIDs selected in a FOV geometry using a beammap scan of the N2R12 campaign; (b) Number of KIDs for which the best-fit sky position is less than 4" away from the expected position; (c) Median angular offset [arcsec] between the expected and measured sky position of the KIDs; (d) Average best-fit cross term of the polynomial model across the FOV [arcsec]; (e) Array center in Nasmyth coordinates; (f) Averaged scaling between measured KID position grid and the designed one; (g) Rotation from the design to Nasmyth coordinates; (h) Center-to-center distance between neighbour detectors; (i) Center-tocenter distance between neighbour detectors using the reference frequencies (260 GHz and 150 GHz) and a 27 m entrance pupil diameter (see Sect. 2.2); (j) Center-to-center distance between neighbour detectors modeled using ZEMAX simulation.
between the measured and design positions are well within the expected maximum distortions from the NIKA2 optics.

KID selection and average geometry
In order to identify the most stable KIDs, we compare the KID parameters obtained using the FOV reconstruction procedure, as described in Sect.5.1, with several beammaps.In the following, we show results as obtained using ten beammaps acquired during two technical observation campaigns in 2017.For each KID we compute the average position on the focal plane and the average FWHM.As discussed in Sect.5.1, we perform a KID selection while analysis a beammaps.A few KIDs have very close resonance frequencies and can be misidentified on some scans.A few others must also be discarded because they appear identical numerically due to the fact that a same (noisy) KID can sometimes be associated to two different frequency tones in the acquisition system.These KIDs are flagged out (less than 1% of the design KIDs).We count how many times a KID has been kept in the KID selection per beammap and has been found at a position agreeing with its median position within 4".Using this, we define the valid KIDs as the KIDs that met the selection criteria in about 20% of the FOV geometries (in two beammap analysis out of ten).
In Fig. 4 we show the average focal plane reconstruction.The colors, from blue to red, represent the number of times that the KID has been retained after KID selection.The eight feed lines for each of the two 1 mm arrays can also be traced out in several ways in this figure.First, slightly larger spaces are seen between KID rows connected to different feed lines than between KID rows of the same feed line.Second, KIDs at the end of a feed line are less often valid than the others (see e.g. the FOV of Array 3).As the tone frequencies increase with the position of the KID in the feedline, some KIDs are sometimes missing because their frequency lays above the maximum tone frequency authorized by the readout electronics.This explains the linear holes in the middle of the 1 mm arrays.For A1, this end-of-feedline effect is mixed with the effect of the KID gain variation across the FOV, which mainly affects the lower left third of the array, as discussed in Sect.8.2.
For A1, A3 and A2, respectively, we found 952, 961, and 553 valid KIDs (selected at least twice).From this, we deduce the fraction of valid detectors over the design ones, as given in Table 3.The valid KIDs represent the KIDs usable to produce a scientifically exploitable map of the flux density using observations for which no sizable tuning issues are experienced.To give an idea of the dependence of the KID selection on the observing conditions, we also evaluate a number of very stable KIDs as the KIDs that met the selection criteria at least in 50% of the FOV geometries (at least five times out of ten).We found 840, 868 and 508 KIDs using this definition for A1, A3 and A2, respectively.The fraction of very stable KIDs over the designed ones is reported in Table .4.
Practically, for the production of flux density maps, we perform further selection of the valid KIDs using a conservative noise level thresholding of the KIDs at the low level processing, as discussed in Sect. 4. The number of used KIDs for producing science-purpose maps using the data reduction pipeline, as described in Sect.4, is thus significantly lower than the number of valid KIDs.We evaluate the median number of used KIDs using all scans for each of the observation campaigns.The median fractions of used KIDs with respect to the design ones for each campaigns and for the combinations of all scans are given in Table .4. We find median fractions of used KIDs of about 70% for the 1 mm arrays and of about 80% for the 2 mm array, with a notable exception at the N2R9 technical campaign, for which these fraction are lower due to severe atmospheric temperatureinduced instable observing conditions (see Sect. 8.3).Moreover the median fractions of used KIDs are close to the fractions of very stable KIDs from the FOV geometries.We stress that these numbers depend on the choices made in the data reduction pipeline for data sample cuts, and are thus subject to improvement.By contrast, the fractions of valid KIDs η constitute a conservative estimates of the stable KIDs usable for science exploitation over all the functioning KIDs.These are thus the relevant estimates for the instrument performance assessment.

Beam Pattern
The NIKA2 full beam pattern originates from the KIDs illuminating the internal and external optics, out to the IRAM 30m telescope primary mirror.To characterize the full beam pattern, we use beammap observations.First, deep integration maps of bright sources are produced to provide a qualitative description of the complex beam structure in Sect.6.1.Then we model the beam using three complementary methods to estimate the main beam angular resolution (Sect.6.2) and the beam efficiency (Sect.6.3).

Full beam pattern
To study the two-dimensional pattern of the beam, we primarily use a map obtained from a combination of beammap observations of strong point sources acquired during the N2R9 commissioning campaign.Namely, we use beammap scans of Uranus, Neptune and the bright quasar 3C84.Furthermore, we checked the stability of our results on single scan maps, combinations of scans for a single source, and combinations of shallower scans but spanning a large range of scanning direction.The beammap scans are reduced using the method discussed in Sect. 4 to produce maps.Figure 5 shows the two-dimensional beam pattern as measured with NIKA2 using the former beammap combination, for each of the arrays and for the 1 mm array combination.The beam pattern is shown over a large dynamic range down to about −40 dB and out to radii of about 5'.The telescope beam pattern further extends well beyond this radius, as for example shown by lunar edge observations at the IRAM 30-m telescope (Greve et al. 1998;Kramer et al. 2013).However, this extended pattern is at present difficult to detect using the data reduction pipeline discussed in Sect.4, as the extended error beams are both filtered and mixed with atmospheric and electronics large-scale correlated noise residuals.The large angular scale contributions to the full beam are further discussed in Sect.6.3.The NIKA2 beam maps reveal some noticeable features, which are shown in Fig. 5. Ranging from strong and/or extended to weak and/or spiky, they include: 1. the main beam and the underlying first error beam, which is due to large-scale deformations of the primary mirror, and the first side lobes, which correspond to various diffraction patterns.In particular, the 60" and 85" diameter (square-like shaped) side lobes at 1 and 2 mm, respectively, at a level lower than −20 dB, are due to the convolution of the primary mirror and the quadrupod diffraction pattern with the pixel (KID) transfer function; 2. at a much lower level of about −30 dB, a diffraction ring shows up, which is presumably caused by panel buckling of the primary mirror (Greve et al. 2010), as shown with a red circle in the A1 panel; 3. also at a level of about −30 dB, the side lobes shown with green diagonal lines in the A2 panel are due to diffraction on the quadrupod holding the secondary mirror of the telescope, as expected from ZEMAX simulations; 4. spikes of not fully understood origin marked by yellow arrows.The ones that are along the vertical and horizontal axes are reproduced by ZEMAX simulations but at a shallower level, whereas the ones shown in the A3 panel in the diagonal directions may be due to the small cylindrical instrumentation box on the side of the M2 cabin.The origin of the asymmetry on the 1 mm arrays is unknown but most probably due to internal optics aberrations; 5. shallow spikes of unknown origin at a level less than −30 dB, which are circled by pink ellipses.The multiple images on the combined deep beam map indicate a rotation of these spikes with the observing elevation, which in turn point to diffraction related issue or a ghost image that are formed inside the cryostat.These shallow features are expected to have no significant impact on NIKA2 science results.
We further quantify the respective level of the axisymmetrical features of the beam pattern by evaluating the beam radial profile B ν (r), which is the normalized radial brightness profile for the array ν, where r is the radius from the beam center.Although the profile cannot represent the sub-dominant nonaxisymmetrical features, which are seen in Fig. 5 (quadrupod diffraction pattern, spikes), it provides a useful representation of the internal and central parts of the beam up to about 180 .We determine a beam profile from a beam map in centering to the fitted value of the main beam center and in computing the weighted average of the map pixels in annular rings.
We checked the stability of the beam against various observing conditions (source intensity, weather condition, focus optimisation) by comparing the beam profiles of a series of 18 beammap observations.This set of beammaps, which is referred to as BM18, has been selected from all the available beammap scans at optimal focus using the baseline scan selection criteria (Sect.4.6).The measured beam profiles using the BM18 data set are shown in Fig. 6.The profiles consist of the main beam and the first error beams and side lobes, which significantly contribute at levels of less than −10 dB at both observing wavelengths.Moreover, at 1 mm the contribution of the diffraction ring, which was marked with a red circle in Fig. 5, is seen as a peak at a level up to −33 dB located at a radius of about 115 .Calculating the rms of the relative difference of the beam profiles to the median beam profile, we find a dispersion below 5% at 1 mm and below 2% at 2 mm.
To measure the relative level of the axi-symmetrical beam pattern features, we further model the beam profiles using an empirical function, which accounts for the main beam and for a significant fraction of the error beams and the side lobes.We define this function B 3G (r) as: where A i is the amplitude of the Gaussian G i for i ∈ 1, 2, 3 and B 0 a pedestal level accounting for the residual background level in the map.The measured beam profiles are fitted using Eq. 1 and the median best-fit parameters are given in Table 5.The errors are evaluated as the standard deviation of the best-fitting parameter values of the 18 beammap scans of BM18.These values are given to gain insight of beam profile, but are not used for the calibration procedure.We find the level of the first error beam with respect to the total beam at about −11 dB and −13 dB at 1 and 2 mm, respectively.For illustration, we show in Fig. 6 the median three-Gaussian profiles at 1 and 2 mm (pink lines), and the main beam profiles (black lines).
Fig. 6.Beam radial profiles given in decibel.The data points are the beam profiles for a series of 18 beammap scans acquired during the reference observational campaigns, labeled from the scan ID.The black line shows the main beam profile using the 'combined' FWHM, as given in Table 6, while the pink line shows the median best-fit three-Gaussian profile, as defined in Eq. 1.

Main beam
NIKA2 angular resolution is characterized using the FWHM of a Gaussian fitted to the main beam.This principal Gaussian encloses most of the measured flux density of a point-like source.

Main beam characterization methods
To characterize the main beam and to derive an estimate of its FWHM, we have developed three methods.The two first methods, quoted Prof-3G and Prof-1G, rely on a fit of the beam profile to benefit from the signal-to-noise ratio increase after azimuthally averaging the signal.The last one by contrast, consists in an elliptical Gaussian fit of the beam map for a better 2D modeling, and is labeled Map-1G.They are presented in more detail below.
Prof-3G consists in fitting the beam profile using the three-Gaussian function defined in Eq. 1.The main beam FWHM estimate is given by the best-fitting value of the FWHM for the first Gaussian function.This main beam FWHM estimate is expected to be immune to the first error beams and side lobes, which are well accounted for.For consistency checks, we also relies on two other methods that rely on simpler beam models.
Prof-1G relies on fitting a single Gaussian to the beam profile after masking the portion of the profile where the contributions of the side lobes and error beams are the largest.Specifically, the side lobe mask is designed to cut out the radius ranging from an inner radius r in = 0.65 FWHM 0 ,, where FWHM 0 is the reference Gaussian beam FWHM (see Sect. 8.1.1)to an outer radius r out = 80 , centered on the beam maximum.The profile is estimated up to a radius of 180 , that is in the inner part of the beam map where the noise variance is uniform.
Map-1G consists in modeling the two-dimensional distribution of the main beam using an 2D elliptical Gaussian of size σ x and σ y .We characterize the NIKA2 main beam using FWHM = 2 2 ln 2 σ x σ y . (2) As in Prof-1G, we use masked versions of the beam map to avoid side lobe and error beam contaminations.Whereas r out is conservatively set to be 100 , r in is let free to vary around a central value of about 8 for A1 and A3 and of about 12 for A2 to provide the best 2D Gaussian fit.

Data sets for the main beam determination
We select a sub-set of the selected beammap scans described in Sect.6.1 by discarding scans of Mars.Indeed, beammaps toward Mars unveil the complex full beam pattern, which extends beyond radii of 100 , so that the annulus sidelobe mask used in Prof-1G and Map-1G is not sufficient to mitigate the error beams and sidelobes effects.The 12 remaining beammap scans are analysed using the data reduction pipeline of Sect. 4 and projected onto maps with a resolution of 1 and an angular size of 10 .This data set is referred to as BM12.
We also consider series of shorter integration scans.We select 5 × 8 raster scans of moderately bright to very bright point sources by thresholding the flux density estimates at 1 Jy at both wavelengths.After the baseline scan selection, as described in Sect.4.6, the data set comprises 154 scans towards the giant planets Uranus and Neptune, the secondary calibrator MWC349 and the quasars 3C84, 3C273, 3C345 and 3C454 (aka 2251+158).For these short scans, which are referred to as R154, the data are reduced and projected onto 2 resolution maps.
Finally, we use a series of 75 observation scans of Uranus and Neptune, which includes both beammap and 5 × 8 raster scans.This data set, which is referred to as UN75, consists of all the scans of Uranus and Neptune acquired during the reference observation campaigns (N2R9, N2R12 and N2R14).

Results
We have derived the main beam FWHM for the three arrays and the 1 mm arrays combination using the three methods presented in Sect.6.2.1 and the data sets of Sect.6.2.2.Namely, our main beam FWHM estimates consist of i) the median FWHM estimate using Prof-3G on the BM12 dataset, ii) the average FWHM estimate using Prof-1G on the UN75 data set and the Map-1G average FWHM using either BM12 or R154.By comparing these results, we test the stability of the FWHM estimates against the choices of the data set and of the estimation method.
In the case of Uranus, the FWHM estimates are further corrected for the average beam broadening induced by the extension of the apparent disc of the planet, which is 0.19 and 0.12 at 1 and 2 mm, respectively.During the observation period, Uranus disc diameter has varied from 3.3 to 3.7 .This diameter variation translates into beam broadening variations of an amplitude of a few tenth of arcseconds, which are neglected.
The results of this analysis are gathered in Table 6, including uncertainties evaluated as the rms dispersion of single-scan based FWHM estimates.Prof-1G and Map-1G results are in agreement within uncertainties, whereas Prof-3G yields slightly smaller FWHM.The latter is more robust against the error beams and large radii beam features than the formers, as expected.Combined results are obtained from an error-weighted average of the four FWHM estimates for each array.Because the rms errors estimated using the 12 beammap scans may be optimistic considering the small statistic, they are conservatively increased to match the uncertainty estimates based on the R154 data set before performing the weighted average.The combined results, as given in Table 6, provide a robust evaluation of the FWHM.Hence, we report main beam FWHMs of 11.1 ± 0.2 at 1 mm and 17.6 ± 0.1 at 2 mm.

Stability checks
Figure 7 shows the main beam FWHM estimates using Map-1G as a function of atmospheric transmission, which is modeled as exp (−τ ν x).The main beam FWHM estimates using data of the three campaigns are in agreement within rms errors.Moreover, the main beam FWHM is stable against atmospheric conditions at both wavelengths.Slightly lower values than average (about 11 ) are observed in the best atmospheric conditions at 1 mm providing us with a lower limit in the absence of correlated atmospheric noise residuals.We note three scans acquired during the N2R12 campaign with larger FWHM than average at 2 mm although the atmospheric transmission was excellent: this is likely an effect of atmospheric instabilities, which affected a large number of observation scans during N2R12.

Main beam efficiency
Building upon the description of the full-beam pattern in Sect.6.1 and the main beam in Sect.6.2, we derive the main beam efficiency for each array, which is defined as the ratio of the solid angle sustained by the main beam to the total beam solid angle.
We derive an estimate of the total beam solid angle from the normalized beam profile B ν (r) of the array ν integrated up to r max = 180 .Hereafter, we refer to this total beam solid angle estimate as Ω 180 .The main beam solid angle is evaluated from the main beam (mb) FWHM, as Ω mb = 2π σ 2 mb , where FWHM= 2 √ 2 ln 2 σ mb .The choice of the maximum radius is set both by the integration depth of the beammap scans, which in turn fixes the noise variance, and the filtering due to the data processing.As discussed in Sect.4, the latter is actually targeted to the analysis of point-like or compact sources that are largely encompassed within a 180 -radius.
We evaluate Ω 180 from the measured beam profiles obtained using the UN75 data set (see Sect. 6.2.2).The result is given in Table 7.
The main beam efficiency up to a radius of 180 is computed as using both the UN75 and the BM12 data sets, as presented in Sect.6.2.2.We compare the results based on three estimates of the total beam and main beam solid angles: -BE1 relies on the best-fitting parameters of the three-Gaussian model of the full beam to derive the two solid angles.The main beam solid angle thus corresponds to the volume enclosed by the first Gaussian, as obtained using Prof-3G; -BE2 consists in using Ω 180 measurements as total beam solid angle estimates, while Ω mb is derived with the FWHM obtained using Prof-1G (see Sect. 6.2.1); -BE3 is similar to BE2 but the main beam FWHM is derived using Map-1G.
The beam efficiency estimates using the three methods are gathered in Table 8: central values and error bars are evaluated as the median and the rms error of the estimates on individual beammaps respectively.A robust evaluation of the beam efficiency uncertainties is obtained using the rms error estimates for BE2, which are based on the UN75 date set.We combined the results of the three methods using an error-weighted average.Rms errors of BE1&BE3, which rely on 12 scans, are conservatively increased to be at least equal to BE2 rms errors beforehand.
Using the combined results, as given in Table 8, we report main beam efficiencies of 55 ± 3% at 1 mm and 77 ± 2% at 2 mm.Using IRAM 30-m telescope EMIR frontend observations (Carter et al. 2012), Kramer et al. (2013) found main beam efficiency of 59% at 230 GHz and of 74% at 145 GHz accounting for the beam solid angle stemming from radii greater than 180 .Having slightly lower main beam efficiencies with respect to EMIR is expected since EMIR detectors were coupled to the 30-m telescope entrance pupil via corrugated horns, unlike the NIKA2 ones.
As a stability check of the beam efficiency, we detail the BE2 estimates for each observation campaigns, as given in Table 9.We find stable beam efficiencies using observations acquired one year apart.
Heterodyne observations at the IRAM 30-m telescope toward the lunar edge and the measure of forward beam efficiency using skydips show that a non-neglectible fraction of the full 55 ± 3 56 ± 3 56 ± 3 76 ± 2 beam stems from radius greater than 180 (Greve et al. 1998;Kramer et al. 2013).An accurate evaluation of the total beam solid angle using NIKA2 would require the development of a dedicated data reduction method tailored to handled very extended diffuse emission, as well as a dedicated observation program (Greve et al. 1998;Sugimoto et al. 2004;Güsten et al. 2006).Here we use the IRAM 30-m telescope beam model described in Kramer et al. (2013) to evaluate the beam solid angle stemming from a radius greater than 180 .This quantity receives two contributions, the one of the error beams, which are modeled as three Gaussians, and the one of the forward and rearward spillover and scattering efficiencies.We compute the solid angle of the main beam and error beams integrated over a radius ranging from 180 to 390 , which is the maximum radius covered by beammap scan, Ω mb+eb 180<r<390 , and over radii greater than 180 , Ω mb+eb >180 .We use NIKA2 main beam as measured in Sect.6.2 and the three-Gaussian error beams as reported in Kramer et al. (2013).The results are given in Table 10, as well as the contribution to the beam solid angle of the forward and rearward spillover and scattering efficiencies, Ω frss .The total beam solid angle Ω tot can be estimated as These results are useful to study very extended diffuse sources or to perform aperture photometry up to radii larger than 180", as will be discussed below in Sect.8.1.3.

Atmospheric opacity
The atmospheric opacity constitutes the ultimate limitation of ground-based experiments.Only a fraction of the source signal Table 10.Estimates of the contribution to the beam solid angle integrated in varying range of radii.Ω 180 include all beam contributions up to a radius of 180 as measured in NIKA2 maps.Ω mb+eb 180<r<390 and Ω mb+eb >180 account for the main beam and error beam contributions integrated 180 to 390 and from 180 to 4π, respectively.Ω frss account for the forward and rearward spillover and scattering efficiencies integrated over 4π.All solid angles are given in arcsec 2 .

A1&A3 A2
Ω An accurate derivation of the atmospheric opacity for each scan is of outmost importance to retrieve the source signal and thus, to ensure small calibration uncertainties.We developed three atmospheric opacity derivation methods, which are described in Sect.7.1.In Sect.7.2, we present robustness tests.

Atmospheric opacity estimation
We have developed three procedures for the atmospheric opacity derivation: i) taumeter relies on measurements provided by the resident IRAM taumeter operated at 225 GHz; ii) skydip consists in using NIKA2 as a total-power taumeter by resorting to a series of skydip scans; iii) corrected skydip is a modified version of the skydip method that minimizes the dependence of the estimated flux density on the opacity.
All methods i) do not rely on an ATM model nor on any hypothesis on the atmospheric contents for the sake of robustness, and ii) do not use the laboratory measurements of the bandpass (see Sect. 2.5) for more accuracy.
Sect 7.1.1presents the taumeter method.The skydip method is described in Sect 7.1.2and the selection of the used skydip scans is discussed in Sect.7.1.3.corrected skydip is presented in Sect.7.1.4.

The taumeter method
The IRAM 30-m telescope facility is equipped with a resident taumeter operated at 225 GHz.Every four minutes, it performs elevation scans at fixed azimuth to monitor the atmospheric opacity.The IRAM science support team provides us with timestamped zenith opacities at 225 GHz τ 225 , as derived from the taumeter measurements.The τ 225 estimates come in two different flavours: one relying on a linear model and the other on an exponential fitting model.They are then filtered by removing outliers and by thresholding on goodness-of-fit criteria.Based on IRAM experience, we use the linear fit and filtered τ 225 data for the NIKA2 analysis.The time-stamped τ 225 estimates, which are sampled about every 4 minutes, are interpolated to the time of the NIKA2 scans (we consider the time of the middle of the scan).For cross-check we also produce a smooth version of time-stamped τ 225 data by filtering with a running median of seven samples, which is then interpolated to the NIKA2 scan times.
We fit the relations between the IRAM 225 GHz taumeter opacities and NIKA2 band pass opacities using observation of calibration sources which spans a large range of air masses.This method has the advantages of not relying on atmospheric model nor on the bandpass measurements in laboratory.We use a series of 64 scans of MWC349, which consists of the baselineselected subset of scans from the 68 available scans for this source during N2R9.It constitutes an homogeneous data set in flux density but heterogeneous in atmospheric conditions: zenith opacities at 225 GHz range from 0.08 to 0.32 and elevations from 23 to 73 degrees, spanning a large range of air mass as required.NIKA2 opacities τ ν , for ν corresponding to the observing frequency of Array 1, 2, 3 and the combination of Arrays 1 and 3, are estimated from the 225 GHz taumeter median-filtered linear-based opacity estimates τ 225 as The parameters a 225 ν and b 225 ν are fitted to the data set so that the source flux density is constant within scans.We tested two estimators of the flux stability.The first one relies on minimizing the standard deviation of the measured-tomedian flux densities ratio after atmospheric opacity correction using Eq. 8.The second one is obtained by minimizing where σ is the rms uncertainty of the flux density estimates.Note that these estimators do not depend on the absolute scale of the flux density of the source.Both estimators yield consistent results that are combined and gathered in Table .11.The quoted errors ∆a 225 ν and ∆b 225 ν are 1-σ errors of the fit.Because the IRAM taumeter observes at a fixed azimuth, the taumeter opacities are not the line-of-sight opacities for the observation scans.As this will be verified in Sect.9, this induces larger rms errors of the top-of-the-atmosphere flux density estimates compared to opacity correction methods that relies on NIKA2 skydip-based measurements.The taumeter method will thus be used as an alternative method in case of failure of the NIKA2 skydip-based methods and to perform consistency checks.

NIKA2 skydip-based method
The NIKA2 skydip method for the opacity derivation consists in using the NIKA2 instrument as an in-band total-power taumeter.The opacity integrated in the NIKA2 bandpasses and in the line-of-sight of the observing scan is thereby directly obtained.This idea, which was successfully tested with NIKA (Catalano et al. 2014), relies on the fact that the resonance frequency of each KID varies linearly with the total power, as discussed in Sect.2.4.First, we have to calibrate the relationship between total power and opacity.Then, we can use this calibration to measure the opacity during a given scan.
First, we detail the opacity calibration.For each KID k, the absolute value of its resonance frequency f k reso varies with the atmospheric load according to where c k 0 is a constant equal to the KID resonance frequency at zero opacity, c k 1 is a calibration conversion factor in Hz/K, T atm is the temperature of the atmosphere.By assuming a homogeneous plane-parallel atmosphere, the air mass x is defined from the elevation as x = (sin el) −1 .The Earth sphericity can be safely neglected at the elevations under discussion here.We take T atm as a constant equal to 270 K.However, the opacity is expected to slightly depend on the atmospheric temperature.For example, in poor weather conditions (6 mm of water vapor contents), the zenith opacity in both observing bands can vary by about 10% for temperature variations of 10 K.The effect of the temperature variations on the final calibration of NIKA2 response to the sky load is mitigated by using several dedicated observation scans regularly distributed all along an observation campaign.
The c k 0 and c k 1 are determined using skydip scans, which consist in moving the telescope in elevation step by step, as defined in Sect.3.3.For each KID k, the evolution of f k reso is monitored as a function of the air mass in each elevation step to perform a joint fit of the zenith opacity τ ν and the c k 0 and c k 1 coefficients.All skydips, obtained under various opacity conditions, are analysed together to break the degeneracy between the opacity and the Herz-to-Kelvin conversion factor c k 1 .The degeneracy occurs mostly for low opacity conditions for which we can only determine the combination c k 1 τ ν x.The procedure has two steps.First, all the skydip scans are analysed individually to extract f k reso for each stable elevation and for each KID.Secondly, a simultaneous fit is done for all parameters (one τ ν per skydip, and a set of c k 0 and c k 1 for all KIDs).Figure 8 illustrates the fitting procedure.This fit is performed on block of 40 KIDs.We check that the resulting τ ν from the different blocks are consistent within rms errors, which are equal to about 4 × 10 −3 at 1 mm and 1 × 10 −3 at 2 mm.Once the τ ν values are estimated for each skydip (as the average over the blocks), we compute, while fixing the τ ν , the c k 0 and c k 1 final values for each KID k with a linear fit.We thus retrieve the coefficients of all the KIDs even though some of them could not contribute to the τ ν determination.
We have observed that the c k 0 and c k 1 coefficients vary between observational campaigns due to a change in the KID properties from one cooldown to another.However, by comparing the results of different skydips, we have verified that the coefficients c k 0 and c k 1 are stable, within the fitting errors, on very long time scales within a cooldown cycle.The coefficients can thus be applied to the whole observing campaign for the opacity derivation.Specifically, the opacity is retrieved for each observation scan by Fig. 8. Example of the global skydip fit for a KID.Each square point represents one step in a skydip (made of eleven elevation steps).A series of 12 skydip scans are jointly used spanning zenith opacities from 0.15 to 0.50 in the 1 mm band.The horizontal axis gives the sky effective temperature T sky = T atm [1 − e −τν x ] in Kelvin, where τ ν is the skydip zenith opacity found in the fit.The vertical axis shows the relative resonance frequency of the KID with respect to 1.9 GHz, given in MHz.The blue line is the linear model using the best-fit c k 0 and c k 1 coefficients (see Eq. 10).
inverting Eq. 10 as: where the median is evaluated using all the valid KIDs of the arrays under concern.Hence, we are able to derive an opacity integrated in the NIKA2 bandpasses and in the line-of-sight of the source in the considered observation scan.

Skydip scan selection
The skydip opacity derivation requires to have on hands a sizeable amount of skydip scans -typically ten to twenty -that i) span the whole opacity range and ii) avoid highly perturbed atmosphere to meet the plane-parallel atmosphere assumption.To that aim, we perform a skydip scan twice a day during a scientific campaign.Then, the (c k 0 , c k 1 ) determination process relies on a selection of the skydip scans.
For each skydip scan and for each KID, we compute the difference between the measured KID resonance frequency and the model given in Eq. 10 taken at the best-fit values of the (c k 0 , c k 1 ) parameters, which is named d f k reso .Then, we determine two indicators of the fit quality per skydip.First, for each block of 40 KIDs, the standard deviation of d f k reso is calculated over all the KIDs of the block.This standard deviation per KID block is called σ 40 .For each skydip, we evaluate the median rms, which is the median σ 40 over all the KID blocks.This fit quality indicator is also sensitive to the noise level during the skydip.We therefore devise a second fit quality indicator to further measure the bias between the data and the best-fit model.Namely, for each skydip, we compute the average d f k reso of each KID k and convert this quantity from Hertz to Kelvin using the corresponding c k 1 parameter.This cross-calibration allows us to compare the d f k reso estimates from different KIDs.Median dT is the median of the average d f k reso in Kelvin over all the KIDs of an array.With these two indicators in hands, we discard the skydip scans that are noisy or that yield a poor fit by applying the following selection criteria: Fig. 9. Median dT quality-fit criterion is plotted as a function of the median rms criterion for each skydip scan of the N2R9 campaign and for the three arrays.The skydips that yield a poor fit of the KID resonance frequencies and hence do not met Median dT criterion, are also discarded using Median rms.The latter criterion further discards noisy skydips.Empty diamonds show the results of the first iteration of the skydip coefficient estimation, labeled 'v1', whereas filled circles show the second iteration, labeled 'v2', for which only the skydips that met both fit-quality criteria are included.After the second iteration, all the remaining skydips met the criteria.
-Median rms < 1.5 × 10 4 Hz -Median dT < 1.6 K The threshold values have been determined using the set of 44 skydip scans of N2R9.The Median rms cut corresponds to twice the median of this quantity per skydip scan, whereas the Median dT cut is twice the standard deviation of Median dT over the skydip scans.N2R9 skydip scan selection is illustrated in Fig. 9, which shows the complementarity between the two fitquality criteria.After selection, 15 skydips are kept for the final step of the (c k 0 , c k 1 ) fit in the case of the N2R9 campaign.The (c k 0 , c k 1 ) estimation proceeds in two steps: first the parameters are estimated using all the available skydip scans for a given campaign, then the estimation is repeated using only skydip scans that met the fit-quality criteria.After the second iteration, we check that no extra skydip scan outliers are left, as shown by the 'v2' label data points in Fig. 9.The stability of the (c k 0 , c k 1 ) parameters and hence the skydip opacity estimates, have been tested against the choice of the selection criteria.We found that the τ ν estimates are robust against the skydip-scan selection as long as the selection includes good skydip scans in high opacity condition (τ 1mm > 0.44) and as the poor fitting skydip scans, which mostly correspond to deviation from the atmosphere plane-parallel model in high opacity skydip scans (as seen in Fig. 9), are excluded.

Corrected skydip
The opacity estimates are ultimately tested by assessing the stability of the top-of-the-atmosphere flux densities of bright sources for a large range of atmospheric conditions, as will be addressed in Sect.9.After opacity correction using the skydip τ ν estimate, the calibration flux density measurements show a residual dependency on the atmospheric transmission, as discussed in Sect.9.This has motivated the development of a cor-rected version of the skydip method that ensures the robustness of the flux densities against atmospheric conditions.
As already noticed in Sect.7.1.2,for small values of the lineof-sight opacity τ ν x, only the parameter combination c k 1 T atm τ ν is constrained in Eq. 10.Degeneracies between c k 1 parameters, the atmospheric temperature and τ ν can translate into a scaling factor in the fitted skydip opacities τ skydip ν .To take this effect into account, we use the flux stability estimators described in Sect.7.1.1 to fit a correction to τ skydip ν as We find a skydip ν of 1.36 ± 0.04, 1.23 ± 0.02, 1.27 ± 0.03 and 1.03 ± 0.03 for A1, A3, A1&A3 and A2 respectively.
Moreover, we test for an additional offset in the correcting relation of the skydip opacities given in Eq. 12.We find bestfitting correcting factors in agreement with the best-fit values estimated using the single-parameter correcting relation, whereas the best-fit offsets are compatible with zero at both wavelengths.We conclude that correcting the skydip opacity estimates for a normalisation as given in Eq. 12 suffices for ensuring flux density robustness against atmospheric opacity conditions.
The exact physical origin for the discrepancy of the empirical factor a skydip ν from the expected unity value for the 1 mm wavebands is currently under investigation.Beside the c k 1 -to-τ ν degeneracy and the effect of the variation of the atmospheric temperature, possible explanations also include a ground pick-up effect.Indeed a fraction of the beam detects radiation from the ground, as evidenced by forward efficiency measurements using heterodyne front-ends (see Sect. 6.3).However, the stability of the flux densities corrected using the corrected skydip opacities and the results consistency using several observation campaigns, as discussed in Sect.9, constitutes a validation of this approach.

Opacity estimate consistency checks
First, we test the stability of the skydip opacities from one observation campaign to another.Panels a) in Fig. 10 show the correlation between the skydip τ ν estimates τ skydip ν and the medianfiltered time-stamped IRAM 225 GHz taumeter zenith opacities τ 225 , as described in Sect.7.1.1,for a series of scans of Uranus and MWC349 acquired during the reference observation campaigns.As guidelines, we also show the predicted correlations using an ATM model integrated in the NIKA2 bandpasses and in the 225 GHz band.The τ skydip ν to τ 225 correlation relations are consistent within statistical errors for the three campaigns.At 1 mm they are also in agreement with the ATM model expectations, while at 2 mm the ATM model underestimates the measured skydip τ ν .Using the correction described in Sect.7.1.4,the ATM model consistently underestimates the corrected skydip opacity estimates at both wavelengths.This mild discrepancy with the ATM model predictions is yet to be understood, but has no impact on our opacity measurements, which do not rely on this model nor on the precision with which the observing bandpasses are known.Further consistency test with ATM expectations will be performed in the future using in-situ bandpass measurements with a dedicated Martin-Pupplett spectrometer.
We further check the robustness of τ skydip ν against the observing elevation.Panel b) in Fig. 10 shows the ratio of NIKA2 skydip opacities to the 225 GHz taumeter opacities as a function of the average scan elevation.The skydip opacity measurements have no significant dependency on the elevation.Moreover, we observe that the results are consistent for Uranus beammap scans and for the shorter raster scans of the secondary calibrator MWC39, indicating that the skydip opacity estimates do not depend on the type of observation scans.
As a summary, NIKA2 skydip opacity estimates i) have reproducible correlation coefficients with the 225 GHz taumeter opacities from a campaign to another, ii) are robust against the observing conditions, and iii) are stable for various sources and scanning strategies.In addition to these properties, the corrected skydip opacities further ensure flux density measurements that are immune to the atmospheric and observing elevation conditions.We use them in the baseline calibration.

Calibration
In this section, we present the absolute calibration of the flux densities.We use Uranus as the main primary calibrator.Sect.8.1 describes the absolute calibration method, Sect.8.2 presents the inter-calibration of all the KIDs and the flat fields.While gathering several observations of calibrators, we have evidenced a daily variation of the absolute calibration coefficients related to daily variation of the beam size induced by weather temperature.If left uncorrected, it leads to a sizeable increase of the calibration uncertainties.To overcome this issue, we primarily flag the most impacted observation times of the day and exclude them from further analysis.We discuss this effect in Sect.8.3.In Sect.8.4, the baseline calibration procedure is summarized and stability tests are performed.

Absolute calibration procedure and photometric system
We detail here the procedure for calibrating the absolute scale of the flux density and the chosen photometric system.

Photometric system
The main primary calibrators of NIKA2 are the giant planets Uranus and Neptune.The latter is used when the former is not visible in the most stable observing conditions.The flux density expectations of the primary calibrators are derived in Appendix A. Reference frequency, ν 0 260 GHz 150 GHz Reference FWHM, FWHM 0 12.5" 18.5" We parametrize the primary calibrator flux density as S c (ν) = S c (ν 0 ) f (ν/ν 0 ), where f (ν/ν 0 ) encloses the spectral dependence, as a function of a reference frequency ν 0 that we choose arbitrarily to be: ν 0 = 150 GHz for the 2 mm array and ν 0 = 260 GHz for both 1 mm arrays.After projecting the raw data (in units of the KID resonance frequency shift, Hz) of a calibrator c on the sky, we model the calibrator raw map as a fixed-width 2D Gaussian where A c is the amplitude of the Gaussian in Hz, and σ 0 is derived from the reference FWHM, labelled FWHM 0 , which is 12.5 for the 1 mm arrays and 18.5 for the 2 mm array.These values have been chosen larger than the main beam values, as reported in Sect.6, to account for a fraction of the signal stemming from the first error beam and first side lobes.Both the reference frequency, ν 0 , and the reference FWHM, FWHM 0 , define our reference photometric system, as summarized in Table 12.
The absolute calibration coefficients are estimated from observations of primary calibrators c, as the ratio of the calibrator flux density expectations at the reference frequency S c (ν 0 ) and A c .Then, for any observed point-like source s of projected map R s (θ, φ) in Hz, the map is calibrated in Jy/beam.The best-fit amplitude estimate of the fixed-width FWHM 0 Gaussian on this map directly gives an estimate of the flux density of the source at the reference frequency S (ν 0 ), excluding color corrections.

Color correction
The flux density estimate S (ν 0 ) gives the flux of the source at the reference frequency only if the source has the same spectral behaviour as the calibrator.In general, to retrieve the flux of the source at the reference frequency, a color correction C s has to be applied which depends on the reference frequency ν 0 , the source SED I s ν and the NIKA2 bandpasses.Neglecting the effect of the atmosphere on the NIKA2 transmission, we compute the color correction factor for target sources of SED that are different from Uranus using where T (ν) is the NIKA2 transmission (Sect.2.5).Assuming Rayleigh-Jeans SED for the calibrator (I c ν ∝ (ν/ν 0 ) α c ) and the source (I s ν ∝ (ν/ν 0 ) α s ), and a spectral index α c = 1.6 for Uranus, we provide color correction factors for eight values of the spectral index of the source α s in Table 13.

Extended sources
The Jy/beam units map M s (θ, φ) can not be used directly to perform aperture photometry or to obtain the flux density of extended sources.We first need to convert M s (θ, φ) in a map in Jy/sr, M ap (θ, φ), taking into account the full beam pattern, as discussed in Sect.6.3.First we correct with the solid angle enclosed in the reference fixed-width Gaussian beam Ω 0 = 2πσ 2 0 to obtain Jy/sr.Then, to account for the signal in the total beam pattern, we further correct with the reference beam efficiency BE 0 , which is defined as the ratio of Ω 0 and the measured solid angle of the total beam Ω tot .To summarize, the map in Jy/sr relates to the map in Jy/beam using In Table 14, we provide the reference beam efficiencies estimated up to a radius of 180 , which are computed as the ratio between Ω 0 and Ω 180 (see Sect. 6.3).To account for the power stemming from radii ranging from 180 to a cutting radius r c , we estimate correcting factors η r c to these reference beam efficiencies, defined as We give the correcting factor η 390 accounting for the main beam and error beams contributions up to 6.5 using Ω mb+eb 180<r<390 , as given in Table 10, as well as the correcting factor η tot accounting for the total beam solid angle using Eq. 5.These correcting factors are given in Table 14.

Calibration procedure
In practice, the calibration procedure is performed in two steps.First S c (ν 0 )-to-A c ratios per detector G k are estimated for each KID k using the map per KID projected from a beammap scan of a calibrator.The calibration coefficient G k for the KID k is computed at zero atmospheric opacity as: where S c (ν 0 ) is the expected flux density of the source at the reference frequency ν 0 (see Appendix A.1), τ ν x is the line-ofsight opacity measured using the corrected skydip method (see Sect. 7) and A k is the best-fit amplitude of the reference FWHM 0 Gaussian, which is fitted in the KID map.This first step accounts for both the relative calibration between KIDs and the absolute calibration using a single calibrator scan.Secondly, the absolute calibration is further refined by evaluating a flux density rescaling factor using a series of observations of Uranus or Neptune.After the first step of the calibration is performed, the KID TOI are projected into a calibrated map M ν , as described in Sect.4, where ν stands for the three arrays and the 1 mm-array combination, and the atmospheric attenuation is corrected.For each of the calibrator observation scans, we compute the ratio between the expected calibrator flux density S c (ν 0 ) and the measured calibrator flux density in M ν , which is estimated as described in Sect.8.1.1.The flux density rescaling factor per array is the average expected-to-measured flux density ratio over all the selected calibrator scans.
The primary calibrator scans are first selected as discussed in Sect.4.6.Then, in addition to the baseline scan selection cuts, we use a Gaussian beam size criterion.The FWHM estimated from the planet observation map is required to be lower than 12.5 at 1 mm and lower than 18 at 2 mm.In further mitigating the flux scatter due to beam broadening, we ensure better accuracy of the absolute calibration.

Relative calibration & flat fields
While absolute calibration of each KID also de facto provides relative calibration, the latter is interesting in itself to characterize the instrument.We focus on this aspect in this section.
The dispersion of the detector responsivity across the field of view (a.k. a. flat fields) has been characterized in the following ways.Main beam flat fields.These are the focal plane distribution of the calibration coefficients per KID.They describe the focal plane distribution of the point spread function (PSF) in the far field of the telescope.The calibration coefficients G k are estimated using Eq. 19, as discussed in Sect.8.1.4.Forward beam flat fields.These are the focal plane distributions of the relative response of each KID to the near field atmospheric background.They are estimated using the correlation factor of each detector TOI to a median common mode estimated offsource (see Sect. 4.4 for more details on common modes).
Figure 11 shows the average main beam flat field for the three arrays.These have been constructed by combining the normalized flat fields of five beammaps acquired during two technical observation campaigns.These data were selected by thresholding the line-of-sight opacity measured in the 1 mm band to τ ν x ≤ 0.85.The distributions for the average flat fields are shown in the bottom panel of Fig. 11.
We observe a significant variation of the flat fields for A1 from the left-most side to the right-most side of the FOV.This    (a) Reference Beam Efficiency, estimated as the ratio between the reference FWHM beam power and the total beam power up to a radius of 180 arcsec reveals a significant change of A1 detector responsivities depending on their position in the focal plane.Namely, this effect mainly impacts the left-most third of the array, which is referred to as the "shadow-zone".This variation of the flat field translates into a broadening of the distribution shown in the lower panel of Fig. 11.However, we verified that A1's flat field dispersions are in line with the ones of A3 after the detectors within the shadow-zone were flagged out using a crescent-shaped mask.
The masked flat field distributions are shown in green in Fig. 11, whereas shadow-zone distributions are in red.The same FOV patterning is also observed in the forward beam flat fields, which excludes a main beam related issue.
The shadow zone effect is caused by a misbehaving of the dichroic in the polarized transmission which is out of specifications.As a result, the 1 mm polarisation that illuminates A1 is attenuated.This effect, which implies a dependence of the frequency cut-off on the radiation incidence angle and linear polarisation, was reproduced using optical simulations.Furthermore, this hypothesis was verified using observations at the technical campaign of September 2018.During this test campaign, a new hot-pressed dichroic had been installed in place of the current air-gap dichroic.The shadow zone variations of the flat field for A1 were not observed during the September 2018 campaign, while huge distortions across the field of view of A2 were reported.These distortions are due to the bending of the hotpressed dichroic at low temperature.This test has confirmed that the shadow zone effect was due to incoming radiation absorption by the current air-gap dichroic.Further efforts are currently conducted to the design of a dichroic that combines robustness against bending induced by low temperature and optimal transmission.The air-gap dichroic, which is immune to low temperature-induced deformation, has been re-installed at the end of the September 2018 run coming back to the instrumental set up discussed in this paper.We have checked that, as expected, the performance of the instrument after this intervention was consistent with the one reported in this paper.

The temperature-induced variation effect
We evidenced a daily variation of the flux density estimates that correlates to the measured beam size.This beam broadening happens mostly in afternoons and around sunrise and sunset.It is also reproducible from one campaign to another.It most certainly comes from the combination of two different effects.
First inhomogeneous solar illumination leads to large scale deformations of the 30-m primary mirror, which in turn lead to variable de-focussing of the telescope.To mitigate this effect, the telescope is equipped with an active thermal ventilation system of the primary mirror and an active temperature control of the secondary support legs.This system is, however, challenged when the Sun partially illuminates the telescope This is a known effect that also impacts observations with the heterodyne instruments operated at the IRAM 30-m telescope.It had also been already observed with the previous generation total-power instruments MAMBO-2 (Kreysa et al. 1999).However, the magnitude of this effect is likely to have increased with the slow disappearance of the surface painting of the primary mirror in the past few years.
Second, on short time scales, atmospheric anomalous refraction also often plays a role..As far as the 30-m telescope is concerned, it has first been described in Altenhoff et al. (1987).Based on experience with the heterodyne receivers, afternoon hours are often affected with an unstable atmosphere when rising moist air moves through the beam of the telescope and causes random refraction.The pointing is then observed to change within few seconds by few arcseconds resulting in an average enlargement of the beam pattern.This effect has been confirmed by measuring several arcsec displacements of the source position when projected using different subsets of subscans of a single observation, as described in Appendix C.2.We find that the apparent beam broadening during the afternoon is due to anomalous refraction for between one third and one half of the scans over the period studied here.
As both effects (the primary mirror deformations and the anomalous refraction) are due to ambiant temperature variations, we refer to them as temperature-induced variation effects in the following.
The temperature-induced variation of the beam size as a function of the UT hours is shown in Fig. 12 for the three reference campaigns using bright sources.These are selected by thresholding the flux density estimates above 1 Jy at both wavelengths.The beam size is estimated by fitting a 2D elliptical Gaussian to the map and computing the geometrical FWHM using Eq. 2. For the resolved planet as Uranus, the FWHM esti-Fig.12. Beam size monitoring using OTF raster scans.Geometrical FWHM at 1 mm (top panel) and 2 mm (bottom panel), as a function of the observation time in UT hours, are shown using scans of giant planets (filled circles) and bright point-like sources with a flux density higher than 1 Jy (filled stars) for the three reference observation campagns (N2R9,N2R12 and N2R14).The cross-hatched areas correspond to the observing time periods that are discarded using the baseline scan selection, as described in Sect.4.6.mates are corrected for the beam broadening due to the finite extension of the apparent disc of the planet, as in Sect.6.2.We observe the same evolution of the FWHM for all campaigns.This goes from a plateau at a median value of 11.3 at 1 mm and 17.5 at 2 mm during the night, to a smooth rise that reaches a maximum of about 14 at 1 mm and 18.5 at 2 mm around 16:00 UT hours.The beam broadening becomes large around 15:00 UT and returns to the plateau only around 22:00 UT.To mitigate the impact of the temperature-induced variation, observation scans acquired during this time interval must be discarded.The UT ranges that are discarded using the baseline scan selection (see Sect. 4.6) are shown as cross-hatched areas in Fig. 12.They consist of the afternoon period between 15:00 and 22:00 UT as well as the period from 9:00 UT to 10:00 UT while the Sun rises.Whereas the global trend of the beam variations is the same for all campaigns, we observe some variability in the amplitude of the effect over the campaigns.This supports the assumption of the important role of the partial illumination of the primary mirror by direct sun light in the temperature-induced effect.This in turn induces a variability of the amplitude of the effect depending on the angle between the telescope boresight and the Sun all along the observations.
The same beam size variations in time are observed using scans of giant planets (Uranus and Neptune) or other bright sources (mainly quasars).However, planets lead slightly larger FWHM estimates than quasars, because of the larger contribution of the error beams to the fitted 2D Gaussian, as their flux are measured with a higher signal-to-noise.

Baseline calibration
To assess NIKA2 performance, we rely on a baseline calibration that resorts to the following steps: i) the calibration in FWHM 0 Gaussian as detailed in Sect.8.1 is implemented, ii) the effect of the temperature-induced variation of the beam size is mitigated using the baseline scan selection described in Sect.4.6 and iii) the atmospheric attenuation is corrected using the corrected skydip opacity estimation described in Sect.7.1.4.
The baseline calibration is validated below by checking the stability of Uranus flux density estimates against the beam size (Sect.8.4.1) and against the atmospheric transmission (Sect.8.4.2).The baseline calibration results are compared on alternative calibration methods using other opacity correction (Sect.8.4.3).

Flux stability against the beam size
We present the Uranus measured-to-predicted flux density ratio as a function of the 2D Gaussian FWHM estimates and colorcoded from the observation times given in UT hours in Fig. 13.As expected from the discussion in Sect.8.3, the largest FWHMs are measured on scans acquired in late afternoon, especially from 16:00 UT to 21:00 UT.Fig. 13.Uranus flux density ratio vs beam size after baseline calibration.The ratio of Uranus measured flux densities to expectations as a function of the measured 2D Gaussian beam FWHM is shown for the 1-mm array combination (top panel) and for array 2 (bottom panel) after absolute calibration using the baseline method.These plots include all Uranus scans acquired during the N2R9, N2R12 and N2R14 campaigns and whose beam FWHMs are below the threshold indicated by the vertical red lines (open circles), as well as the scans that met the baseline selection criteria (filled circles).
The flux density estimates have been calibrated beforehand, so that the flux density ratios are equal to unity in average by construction.We observe no significant dependence of the selected scan flux ratios (shown as filled circles in Fig. 13) on the beam FWHM.This gives a first indication of the efficiency of the baseline scan selection to mitigate the temperature-induced beam variation effect.The flux stability against the beam FWHM is further assessed in Sect.9.

Flux stability against the atmospheric transmission
We test the stability of Uranus flux densities calibrated using the baseline method against the atmospheric transmission.The latter depends on the measured zenith opacity τ ν and the scan average air mass x as exp (−τ ν x).In the first row of Fig. 14, Uranus flux ratio is shown as a function of the atmospheric transmission for the 1-mm array combination and Array 2 and for the three reference observation campaigns (N2R9, N2R12 & N2R14).We observe no significant correlation of the flux ratio with the atmospheric transmission, which gives a first indication of the robustness of the flux density estimates against the atmospheric conditions using the baseline calibration.This will be further tested using a larger number of scans toward other sources in Sect.9. Baseline Taumeter Skydip Fig. 14.Uranus flux density ratio vs atmospheric transmission shown for the 1-mm array combination (left column) and for array 2 (right column) after absolute calibration using (first row) the baseline method, (second row) the 'taumeter'-based and (third row) the 'skydip'-based methods.These plots include all Uranus scans acquired during N2R9, N2R12 and N2R14 campaigns.

Comparison with other opacity correction methods
As a cross-check we have derived the absolute calibration factors using the taumeter (Sect.7.1.1)and the skydip (Sect.7.1.2)atmospheric opacity correction methods.We then compare Uranus flux density estimates after absolute calibration using the baseline calibration and these two alternative corrections.Figure 14 shows the Uranus measured-to-modeled flux ratio as a function of the atmospheric transmission for A1&A3 and for A2 after the taumeter correction (second row) and after the skydip correction (third row).We observe more dispersion for the taumeter-based flux ratio, whereas the skydip-based ratio is very similar as the baseline ratio except for a slight decrease of the flux at low atmospheric transmission.Thus the taumeter and skydip atmospheric opacity correction methods can be used for the absolute calibration in complement to corrected skydip, e.g. to perform robustness tests as in Sect.9.

Photometry & Stability Assessment
NIKA2 photometric capabilities after the calibration presented in Sect.8, are assessed in this section.Firstly, we use observation of secondary calibrators (planetary nebulae NGC7027, CRL2688, and MWC349A) to test the consistency of the flux density estimates with expectations.The flux density expectations in NIKA2 bands for these calibrators are given in Appendix A.2.Then, we verify the stability of the photometry with respect to the atmospheric conditions using a large amount of observations toward a large variety of sources.
The quality criteria used to assess the photometric capabilities and calibration results are defined in Sect.9.1.In Sect.9.2, these criteria are evaluated for the baseline calibration, and in Sect.9.3, we compare these results with other calibration method results before summarizing the main results in Sect.9.4.

Calibration accuracy and uncertainty assessment
We assess the photometric performance by evaluating two quality criteria: first, the calibration bias checks the accuracy of the absolute calibration and then the calibration relative uncertainties test the stability of the flux densities.Finally, we review here the systematic uncertainties on the flux density.

Calibration bias
We define the calibration bias b ν , where ν stands for Array 1, 2, 3 and the 1 mm array combination, as the ratio between the measured flux density Ŝ ν using the reference photometry (Sect.8.1.1)and the flux density expectations S s (ν 0 ) as given in Appendix A.2. From a series of secondary calibrator scans, we evaluate the average calibration bias b ν , which by construction, should be equal to unity within uncertainties.Moreover, we check the stability of the calibration bias against the observed opacities as a robustness test of the opacity derivation method.Likewise, we verify that the photometry is insensitive to optical variations by checking the stability of the calibration bias against the measured beam size.

Calibration rms uncertainties
We evaluate the standard deviation of bright source measuredto-median flux density ratio σ ν per array or array combination ν.As the flux density of most of the considered sources is unknown a priori, we compare the flux density estimate in a single observation scan to the average flux density throughout an observation campaign.This method requires the selection of sources that are bright enough to be detected with a high signal-to-noise ratio with a single repetition of an usual 8 × 5 OTF raster scan.Namely, we perform a source selection by thresholding the flux estimate to 800 mJy at 1 mm and 400 mJy at 2 mm.Moreover, we consider only the sources for which a minimum of 10 scans are available after selection to ensure a precise average flux den-sity estimation.Finally, the selected source scans must meet the baseline scan selection criteria given in Sect.4.6.
Since the flux density ratios are not Gaussian distributed, we evaluate the 68 and 95% confidence level (C.L.) contours using the measured distributions in order to further characterise the uncertainties.We check that the rms errors are larger than the 68% C.L. contours and thus provide conservative 1σ-like errors.
As the rms of the flux density ratio is estimated on a scan set that is representative of the observing conditions encountered at the 30-m telescope, this is an estimate of the calibration uncertainties that encloses errors of optical, atmospheric, instrumental noise and data processing origins.This includes the errors sourced by the temperature-induced beam variations, the effect of the elevation, the uncertainties of the atmospheric opacity correction using either the skydip or the taumeter method and the atmospheric and instrumental noise residuals after the data reduction (Sect.4).

Absolute and systematic uncertainties
To account for all uncertainties, we must add to the rms calibration uncertainties the absolute calibration uncertainties and the systematic errors.The absolute uncertainty is the uncertainty on the primary calibrator flux density expectations.In the case of Uranus, Moreno (1998) and Bendo et al. (2013) report uncertainties of about 5% at both wavelengths.
For the baseline calibration method, which resorts to the corrected skydip method for the atmospheric opacity correction (see Sect. 7.1.4),the uncertainties on the correcting factor a skydip ν , as defined in Eq. 12, must be propagated to the flux uncertainties.These uncertainties, which are referred to as the corrected skydip uncertainties, depend on the line-of-sight atmospheric opacity τ ν x.Precisely, because the corrected skydip opacity correction is consistently used for both the primary calibrator and the target source flux measurement, the corrected skydip uncertainties depends on the difference between the average line-of-sight opacity of the primary calibrator scans and the line-of-sight opacity of the source scan.We evaluate the corrected skydip uncertainties for two different τ ν x values.1) For the reference IRAM 30-m winter observing conditions, defined as 2 mm of pwv and an elevation of 60 o , the corrected skydip uncertainties are of 0.6% at 1 mm and 0.3% at 2 mm.2) In the worst observing conditions allowed by the baseline scan selection (Sect.4.6), which are τ ν x of 0.7 at 1 mm and of 0.5 at 2 mm, we find corrected skydip uncertainties of 2% and 1.5% at 1 and 2 mm, respectively.These constitute conservative upper limits on the corrected skydip uncertainties.
The uncertainties on NIKA2 bandpass measurements (see Sect. 2.5) propagate into uncertainties on the flux densities after the color correction using Eq.16.These uncertainties depend on the source SED but are neglectible in most of the cases.In particular, for MWC349, we find uncertainties below 0.1% at both wavelengths.

Baseline calibration photometry results
We measure the calibration bias and rms uncertainties, as defined in the previous section (Sect.9.1) using the baseline calibration method (Sect.8.4).
The calibration bias is evaluated using a series of scans of MWC349 acquired during the reference observation campaigns.Namely, we use the 72 scans that met the baseline selection cri-teria (see Sect. 4.6) over the 109 available scans for MWC349.The first row of Fig. 15, labelled 'baseline', shows the calibration bias b ν for the combination of the 1 mm arrays and Array 2 as a function of the atmospheric transmission exp (−τ ν x).No significant dependency of the calibration bias on the atmospheric transmission is observed.Table 15 gathers the calibration bias estimates for the three observation campaigns and for all the scans.In the 1 mm band, we find b ν in agreement with unity within the statistical dispersion for the three campaigns, whereas a 5% lack of flux with respect to expectations is observed at 2 mm, consistently for the three campaigns.This bias has a low significance with respect to the absolute calibration precision of NIKA2 (see Sect. 9.1).This will be further investigated by using other calibration methods in Sect.9.3.
Table 15.Baseline calibration results: photometry accuracy and uncertainties.The first subpanel labelled 'Bias' gives the calibration bias b ν and the second subpanel labelled 'Rms' the calibration rms error σ ν , as defined in Sect.9.1, using observations during N2R9, N2R12, N2R14 and the combination of the three campaigns.In each subpanel, the first row indicates the number of acquired scans, while the second row gives the number of selected scans using the baseline scan selection.Figure 16 shows the measured-to-median flux densities evaluated from bright source scans for the combination of Array 1&3 and Array 2 as a function of the atmospheric transmission and color-coded as a function of the observation time.From a total of 487 scans towards flux-selected sources, acquired during N2R9, N2R12 and N2R14, 264 met the baseline selection criteria and are included in Fig. 16 for testing the calibration stability.The calibration uncertainties are estimated using the standard deviation of the flux density ratios for the three campaigns.Results are gathered in Table 15.Combining all the scans, we find rms uncertainties of 5.5% for A1, 6.0% for A3, 5.7% for the 1 mm band and 3.0% for A2.Using the flux ratio distributions, we construct the 68 and 95% C.L. intervals.The 68% C.L. intervals are −6.4% and +3.4% at 1 mm and −3.8% and +1.5% at 2 mm.Hence in average the 68% C. L. errors are of 4.9% and 2.7% at 1 and 2 mm, respectively.We conclude that the rms errors are conservative estimates of the 68% C. L. errors at both wavelengths.The 95% C.L. contours are −15.8% and +5.9% at 1 mm and −8.6% and +3.8% at 2 mm.The rms errors and the 95% C.L. interval are shown in Fig. 16 with the inner and outer dashed lines, respectively.

Characteristics
The flux density ratio is constant within the rms errors along the wide range of tested atmospheric transmission, ranging from 0.5 to 0.9 at 1 mm.However, some scans at atmospheric transmissions of about 0.7 at 1 mm show a mild lack of flux density with respect to the median within the 95% C. L. contours.The scans affected by the lack of flux have all been observed either between 12:00 and 14:00 UT or between 8:00 and 9:00 UT, that are close to the threshold of the observation time cuts of the baseline scan selection (see Sect. 4.6).These scans are likely to be affected by the temperature-induced beam broadening or by the sunrise focus drift, respectively.Furthermore, we find that restricting the used observation time to the 10 more stable hours (from 22:00 to 08:00 UT) would result in rms calibration uncer-Fig.16.Baseline rms calibration uncertainties.The measured-tomedian flux density ratio of bright sources is plotted as a function of the atmospheric transmission color-coded according to the UT observation time of the scans for the combination of A1&A3 (top pannel) and for A2 (bottom pannel).The inner dashed lines from either sides of the unity-ratio line show the rms errors, which are less than 6% at 1 mm and 3% at 2 mm, while the outer dashed lines show the 95% confidence level contours.The lowest flux ratio data points correspond to some of the scans acquired during daytime between 8:00 UT and 15:00 UT hours (yellow and red), while the scans acquired during nighttime between 22:00 UT and 7:00 UT yield data points (dark blue) well distributed within the rms error with a few outliers.tainties of 3.6% at 1 mm and 1.2% at 2 mm, which constitute an improvement of about 60% at 1 mm and 40% at 2 mm of the rms errors.The baseline scan selection, which i) retains 16 hours of observation time a day and ii) results in state-of-the-art rms calibration uncertainties, constitutes an useful tradeoff representative of most of the observations with NIKA2.

Comparison with other calibration methods
In this section, the baseline calibration results are compared to results drawn either using other calibration methods obtained from different opacity corrections (taumeter and skydip as discussed in Sect.8.4.3), or including a photometric correction (PC-demo and PC-point, as described in Appendix C) to mitigate the temperature-induced variation effect (Sect.8.3).For robustness test, we evaluate and compare the photometry quality criteria of Sect.9.1 for the five calibration methods.

Calibration bias
We present the calibration bias as a function of the atmospheric transmission for the five calibration methods in Fig. 15 and report the results in the row labelled 'Bias' of Table 16.
At 1 mm, all methods lead to flux density estimates in agreement with expectations within the rms dispersion.However, taumeter flux ratios have more dispersion than the baseline flux ratios whereas skydip shows some dependency on the atmospheric transmission, with a 10 to 15% excess of the flux density with respect to expectations at high transmission.This residual systematic effect has motivated the development of the corrected skydip method, as discussed in Sect.7.1.4.These features, which are already noticeable from Fig. 15, will be confirmed and further discussed later using more observation scans.On the other hand, the calibration methods based on photometric correction (Appendix C) yield an unbiased photometry (calibration bias in agreement with unity within the rms error) while allowing the use of 30% more scans.These results are encouraging for the exploitation of scans acquired during the observing periods impacted by the temperature-induced beam variation effect (Sect.8.3).
At 2 mm, all methods result in a similar calibration bias of 0.95 with a rms error of 0.05 estimated on the MWC349 scans.This corresponds to a low-significance 5% lack of flux density toward MWC349.To summarize, the calibration bias at 2 mm is stable against i) a large range of atmospheric conditions, ii) the observation campaign, iii) the opacity correction method, iv) the method to treat the temperature-induced beam variation effect.This 5% lack of flux density is thus probably due to uncertainties on the flux density expectations for this source.They come in two flavours.Firstly the uncertainties on the flux expectation, as reported in Appendix A.2, consist in the propagation of the errors on the fitted SED from the Plateau de Bure Interferometre (PdBI) and the Very Large Array (VLA) observations.Systematic uncertainties that may also impact the SED are not included.Secondly, the NIKA2 flux density extrapolation from interferometer data may be not straightforward for MWC349.In particular, NIKA2 flux extrapolation ignores the contamination by strong masers in the radio recombination lines (Gordon 1994), while strong maser emission lines are masked in PdBI observations to measure the continuum.In addition, the resulting continuum shows indications of variability.

Calibration rms uncertainties
The calibration rms uncertainties for the five methods evaluated using flux-selected source scans are gathered in the subpanel labelled 'Rms' of Table 16.
Compared to the baseline method, the taumeter method leads to rms errors increased of about 40 and 30% at 1 and 2 mm, respectively.The skydip method shows lower dispersion but a mild correlation with the atmospheric transmission, as discussed in Sect.9.3.1.
In addition, we have checked the flux density ratios for the bunch of scans with an atmospheric transmission of about 0.7, which were discussed in Sect.9.2, by comparing calibration methods with or without photometric correction.The flux density ratios are low (within the 95% C.L. interval) for the scans observed within 12:00 and 15:00 UT in the three first methods, as shown in Fig. 16 for the baseline method.By contrast, they are within the 68% C.L. interval when using a photometric correction.This further validates the hypothesis that the low flux density of these scans is due to temperature-induced beam effect, as assumed in Sect.9.2.This also constitutes an example of the calibration improvement obtained by using a photometric correction.
Moreover, results based on the PC-demo method show that rms calibration uncertainties as low as 3.8 and 2.2% at 1 and 2 mm are within the reach of NIKA2 without any selection based on the observation time.However, we recall this method relies Table 16.Comparison of results using five calibration methods: (first column) the baseline calibration, (second and third) the calibration methods using other opacity correction, taumeter and skydip, respectively, and (fourth and fifth) the calibration methods using a photometric correction, PC-demo and PC-point, respectively (see Appendix C).The calibration biases, as defined in Sect.9.1.1,are reported in the subpanel labelled 'Bias'.The subpanel labelled 'Rms' gathers (first row) the total number of observation scans of bright sources (see Sect. 9.1.2)acquired during the reference campaigns, (second row) the number of selected scans, and (third to sixth rows) the rms calibration uncertainties (Sect.9.1.2) for Array 1, Array 3, the combination of A1 and A3, and Array 2, respectively.on accurate beam estimates.Using PC-point, which is the practical case, still improves the calibration uncertainties w.r.t. the baseline results but by a factor of about 20% in both bands.Furthermore, the differences between the flux density ratios from PC-demo and PC-point, which are seen e.g. from the corresponding panels of Fig. 15, are likely to be due to the photometric correction noise when monitoring the beam from pointing scans (see Appendix C.2).We conclude that more control on the beam monitoring is needed before routinely using a calibration based on photometry correction.By contrast, the baseline method combines good performance with robustness.

Summary
Among the methods that rely on the UT hour-based scan selection to mitigate the effect of beam size variations, the baseline method shows the best performance in terms of calibration bias and uncertainties.The methods that rely on a photometric correction show good calibration results, and thus represent a promising lead to further improve the calibration uncertainties.However, their robustness depends on the accuracy of the beam monitoring.The proposed beam monitoring based on pointing scans induces some extra dispersion of the flux densities.A more accurate beam monitoring is feasible but requires using dedicated observation scans.From the baseline method results discussed in Sect.9.2, we have found that the measured flux density of MWC349 is in agreement with expectations within 5% for both wavelengths.Moreover, the rms calibration uncertainties are of 5.7% at 1 mm and of 3% at 2 mm using a series of 264 scans of sources of flux density above 1 Jy.These results demonstrate the excellent accuracy and stability of the NIKA2 photometric capabilities.

Sensitivity
In this section, we derive the on-sky sensitivity of the instrument using a large amount of observation scans, including deep integration on faint sources, and assess the stability of our results against the observing conditions.We evaluate the noise equivalent flux density, which is referred to as NEFD hereafter.To further represent the mapping capabilities, we also derive the mapping speed.We first discuss the definitions and the technical derivation of these quantities in Sect.10.1 from measurements.Then, we briefly present several estimation methods that have been considered and the data sets that have been selected in Sect.10.2.Results, together with robustness tests, are reported in Sect.10.3.

NEFD and mapping speed definitions
The NEFD is the 1 σ error on the flux density of a point source in one second of integration time, considered at zero atmospheric opacity.At this stage, we consider a map of a point-like source located at its center and observed with zero atmospheric opacity.
The estimation of the flux density uncertainty σ is described in Sect. 4. We derive here the integration time from actual observations.Indeed, it is not simply the duration of a scan, it depends on the KID distribution in the FOV and the scanning strategy.The flux density map, which has a resolution of ∆r, also comes along with a hit map H p , which counts the number of data samples per pixel (see Sect. 4).From this, we derive the integration time at the center of the map as where f sam is the sampling frequency and H p center is the average of the hit map taken in a disk of radius of one FWHM to be immune to shot noise statistics in individual map pixels.Correcting this quantity from the density of detectors per map pixel at the same time, gives the on-source detector integration time, in other words, the time when the source is actually being observed, by at least one detector: where g is the center-to-center distance between adjacent KIDs in the FOV (Sect.5.2).t det matches the total duration of the scan if the scanning strategy is designed so that the source is always in the FOV and if the FOV is full of valid KIDs.The NEFD is defined using the previous quantities as and is given in mJy • s 1/2 .The mapping speed, M s , is the sky area A scan that can be mapped at a noise level ∆ σ of 1 mJy in an integration time ∆ t of one hour.Noting d FOV = 6.5 arcmin the FOV diameter and η the fraction of valid KIDs for an observation (Sect.5.1), the mapping speed is and has units of arcmin 2 •mJy −2 •h −1 .In real observation conditions, characterized with a given atmospheric opacity τ ν and a given air mass x, correcting the flux density for atmospheric attenuation using Eq.6 increases the NEFD as NEFD τ ν , x = NEFD e τ ν x (24) Using these definitions, the integration time required to reach a target flux density uncertainty σ obs over an observing area A scan for an airmass x and for an atmospheric opacity τ ν is

NEFD estimation methods and scan selection
We have developed several methods for the NEFD estimation.First, we use deep integrations on faint sources.Second, we resort to joint analysis of multiple scans without combining them.

Deep integration method
According to Eq. 22 and Eq. 24, if a source was observed under stable atmospheric conditions, the flux uncertainty would scale directly like t −1/2 det .Using long-time integration observation on a source, this relation provides both a way to estimate the NEFD and to check that the noise does integrate down as expected with the integration time.To that aim, we produce a series of maps, as described in Sect.4.5, using an inverse-variance co-addition of an increasing number of observation scans, and perform a photometric analysis on each map according to Sect. 9.However, in practice, in particular for integrations of several hours, observing conditions do change.Since all the scans are not acquired in the same conditions of atmospheric opacity and observing elevation, they do not contribute with the same weight to the co-addition.In such case, an effective integration time for the co-addition of n scans is defined as where t i , τ i ν and x i are the integration time, the zenith opacity and the air mass of the i-th scan of the n-scans co-addition.Generalizing Eq. 22, the flux density uncertainties on the co-addition of n scans is σ(n) = NEFD/ √ t eff (n).An estimate of the NEFD can therefore be obtained in fitting σ(n) as a function of the corresponding t eff (n).

Scatter method
For any scan, we derive NEFD τ ν , x .Using Eq. 24, the joint analysis of a series of scans acquired with various observing conditions provides an estimate of the NEFD.The scan sample can gather different sources.The selection of the source target for the NEFD derivation is primarily based on the flux density.Indeed, noise characterization may be biased by the signal of a bright source stemming from the most extended error beams and far side lobes.We therefore restrict the analysis to sources with estimated flux below 1 Jy.
Table 17.Stability of the NEFD estimates.Top-of-atmosphere NEFD in mJy.s 1/2 for the two methods described in the text, which are the deep integration (labeled Deep int) and the scatter method (Scatter), and using two different data sets, HLS J0918+5142 and all sub-Jy sources acquired during the reference observation campaigns.The results given in the last row are based on more than a thousand scans (202, 481 and 430 scans during N2R9, N2R12 and N2R14, respectively).First we test the stability of the NEFD estimates using the two methods on the same data set.With this goal, during the N2R9 run, we selected HLS J0918+5142, a moderately faint source (Combes et al. 2012), expected to have flux densities of 74.5 mJy at 1 mm and 15.7 mJy at 2 mm.It was observed for about 9 h in total over three nights using 8 × 5 arcmin 2 OTF raster scans of various orientations.
The left panel of Fig. 17 shows the flux density uncertainties as a function of the integration time.The effective integration time is different between the 1 mm and 2 mm arrays because they have different KID spacings g in the FOV and different atmospheric opacities (see Eq. 26).Black lines show the fit with the inverse of the square root of the integration time, confirming that the noise integration is well consistent with the expected scaling law.The observed small variations around the theoretical Fig. 17.Comparison of NEFD estimates using two methods on observations of HLS J0918+5142.Left panel: Evolution the 1 σ flux density uncertainties as a function of the effective integration time t eff , as defined in Eq. 26, for A1 (cyan), A3 (dark blue), the combination of A1&A3 (medium blue), and A2 (red).The solid black lines are the best-fit models using σ(t eff ) = NEFD/sqrt(t eff ).Right panel: NEFD as a function of the measured line-of-sight opacity using the same color code as in the left panel.The solid black lines are the theoretical fits of NEFD τν, x = NEFD e τν x and give the NEFD when extrapolated to τ ν / sin(el) = 0. Fig. 18.Comparison of the NEFD estimates for three observation campaigns.The measured NEFD using the scatter method is plotted as a function of line-of-sight opacity (τ ν x) for the 1 mm (left) and 2 mm (right) channels.Data points are NEFD estimates in mJy • s 1/2 for N2R9 (blue), N2R12 (orange) and N2R14 (chartreuse).We also show in the plots the expected NEFD evolution with the line-of-sight opacity as solid curves using the median zenith opacity derived from all the scans acquired during a campaign.fit correspond to variations of line-of-sight opacity during the integration.These variations are taken into account for evaluating the NEFD as discussed in the previous section.The right panel of Fig. 17 shows the NEFD τ ν , x per scan, along with best-fit models using Eq. 24, from which are derived the NEFD estimates.Results from these analyses, per array and for the combined A1 and A3, are presented in Table 17.
We observe systematically higher NEFD for A1 compared to A3, which is a mainly due to the dichroic-induced 'shadow effect' that also impacts the flat fields, as discussed in Sect.8.2.The dichroic-induced effect, which also impacts A3, explains part of the NEFD difference of performance between the 1 mm and 2 mm channels, together with the higher atmospheric noise and lower telescope beam efficiency at 1 mm with respect to 2 mm wavelength.Moreover, the NEFD evolution with sky noise is well consistent with expectations for each array and each observing wavelength.
As a second robustness test, we check the stability of the NEFD for three observation campaigns.Figure 18 shows the measured NEFD using the scatter method (see Sect. 10.2) for the sub-Jansky sources acquired at the N2R9, N2R12 and N2R14 campaigns.The NEFD estimates for the three campaigns are in agreement within uncertainties for the whole range of line-ofsight opacities that have been tested.The solid lines show the expected dependance with exp[τ ν x] as given in Eq. 24.Since the measured NEFD τ, x are not Gaussian distributed, we derive the NEFD as the median of the NEFD τ, x per scans after correction of the atmospheric attenuation, which provides us with a more robust estimate compared to a fit.The NEFD estimates are given in Table 17.
Combining the data set of N2R9, N2R12 and N2R14 campaigns, more than one thousand observations scans of sub-Jy sources meet the baseline selection criteria (see Sect. 4.6), providing robust NEFD estimates that are representative of the average NIKA2 performance.The rms uncertainties are evaluated as the rms scatter of the individual NEFD τ ν , x estimates after correction with e τ ν x .These values are then extrapolated using the IRAM 30-m telescope reference Winter observing conditions: 2 mm of precipitable water vapor (pwv) and 60 degrees elevation.The NEFD estimates, as well as the rms uncertainties, are gathered in Table 18.From these estimates, we also derive the corresponding mapping speeds, which are also given in Table 18.We report mapping speeds of 1388 ± 174 and 111 ± 11 arcmin 2 • mJy −2 • h −1 at 1 and 2 mm, respectively.

Summary and conclusions
We have presented NIKA2 performance at the IRAM 30-m telescope.It has been evaluated with a baseline calibration method that goes from observations and raw data to measured flux densities.In the baseline calibration photometric system, the flux density of point-like sources are measured as the amplitude estimates of a fixed-width Gaussian of FWHM of 12.5 and 18.5 at the 150 and 260 GHz reference frequency, respectively.This method relies on three main analysis choices.First the atmospheric and instrumental correlated noises are corrected using a simple and robust method based on the subtraction of a series of the average temporal signals of the most correlated detectors.The atmospheric opacity is estimated using an improved version of the method proposed in Catalano et al. (2014) and used in Adam et al. (2018), which is based on the use of NIKA2 as an in-band total-power taumeter.Finally a scan selection based on the observation time is performed and retains 16 hours of observation a day in order to mitigate the effect beam size variations due to anomalous refraction of the atmosphere and partial illumination of the 30-m telescope that mainly impact the afternoon observations.We have also considered the pros and cons of alternative methods, which may in the future lead to even better calibration accuracy and stability.
The performance of the NIKA2 camera has been assessed using a large number of observations of primary and secondary calibrators and faint sources that have been acquired during three observational campaigns over one year.The data set spans the whole range of observing elevation and atmospheric conditions encountered on-site.The main characteristics that define the NIKA2 performance are summarized in Table 19.We highlight the main points in the following.
1.All designed KIDs detect the signal at least in some observation scans.We conservatively retain only the most stable KIDs, which are immune to the cross-talking effect and yield good signal-to-noise measurement.We report valid KID fractions of 84% for the 1 mm channel arrays and 90% for Array 2. The other KIDs, which do not meet the validity criterion, are randomly distributed across the FOV, so that the whole 6.5 arcmin FOV is covered.2. The main beam is well described with a 2D Gaussian of FWHM of 11.1 for the 1 mm channel arrays and 17.6 for Array 2, with uncertainties of 0.2 for the combination of Array 1&3 and of 0.1 for Array 2. Comparing the main beam fit to the measured full beam up to a radius of 180 , we have derived the main beam efficiency.We found main beam efficiencies of 55 ± 3% at 1 mm and 77 ± 2% at 2 mm.These results show that a significant fraction of the power is received outside the main beam.This underlying, extended, low-level beam pattern shows a complex structure of error beams, rings, and spokes.Using individual maps per KID, Adam et al. (2018) reported an rms dispersion of the main beam FWHM across the FOV of about 0.6 at both wavelengths.This is consistent with the measured curvature of the best focus surface across the FOV.We also provide the reference beam efficiencies, which are the fixed-width Gaussian beam efficiencies, up to a radius of 180", that allow taking into account the power stemming from outside the reference beam.3. We have evaluated the rms calibration uncertainties using 264 scans of sources whose flux density is above about one Jy.We find rms calibration uncertainties of about 6% at 1 mm and about 3% at 2 mm, which are state-of-the-art performance for a ground-based millimeter-wave instrument.The absolute calibration uncertainties are of 5% and the systematic calibration uncertainties evaluated at the IRAM 30-m reference Winter observing conditions are below 1% in both channels.4. The noise does well integrate as the square root of the integration time.We have derived robust estimate of the NEFD using more than a thousand scans encompassing a large range of observing conditions.We found NEFD at zero atmospheric opacity of 30 ± 3 mJy • s 1/2 at 1 mm and 9±1 mJy•s 1/2 at 2 mm.The NEFD estimates demonstrates the high-sensitivity of the KID arrays of NIKA2.The instrumental sensitivity at 1 mm is however currently mainly limited by the non-optimal transmission of the air-gap dichroic plate, mostly prominent in one polarisation component (A1) but affecting the other (A3) as well.In addition to the dichroic upgrade, further possible areas of improvements for the 1 mm observation channel are: 1) improve the data processing and in particular the noise decorrelation methods, 2) increase the bandwidth of the 1 mm arrays (subjected to the improvement of the dichroic) and 3) upgrade the surface of the telescope.5. NIKA2 mapping capabilities are better represented by evaluating the mapping speed, which is defined as the sky area that is covered in one hour of observation with a noise level of 1 mJy.We found mapping speeds at zero atmospheric ity of 111 and 1388 arcmin 2 • mJy −2 • h −1 at 1 mm and 2 mm, respectively.NIKA2 mapping speed is thus at least an order of magnitude better than the previous generation of IRAM 30-m telescope resident continuum cameras (Catalano et al. 2014;Staguhn et al. 2011;Kreysa et al. 1999).
We conclude that NIKA2 has unique capabilities in fast dualband mapping at tens arcsecond resolution.It is currently available to the whole community and will operate at the IRAM 30-m telescope at least for the next decade.NIKA2 performance meet the requirements to address a large range of science topics in astrophysics and cosmology.Notes.
(a) Number of usable detectors, which have been selected in at least two FOV reconstructions (b) Calculated from real array pixel size [2.75 mm / 2.0 mm] and unvignetted entrance pupil diameter [27 m] (c) Full-width at half-maximum of the main beam using the combined results of three methods (d) Ratio between the main beam power and the total beam power up to a radius of 180" (e) Full-width at half-maximum of the beam used in our reference photometric system (f) Ratio between the reference FWHM beam power and the total beam power up to a radius of 180" (f) Systematic calibration uncertainties due to the opacity correction using the corrected skydip method estimated at the reference IRAM 30-m winter observing conditions: 2 mm pwv, 60 o elevation (h) Effective power law of noise reduction with integration time (i) NEFD at zero opacity (j) Mapping speed at zero opacity M.D.P. acknowledges support from Sapienza Universita' di Roma thanks to Progetti di Ricerca Medi 2017, prot.RM11715C81C4AD67.
most by 5 K, that is 4.1%, with the same trend with frequency as observed for Uranus.In summary, this study confirms that Uranus ESA V4 and Neptune ESA V5 models are accurate to 5% for predicting planet flux densities.This result agrees also with the accuracy estimated from Herschel SPIRE and PACS observations (Müller et al. 2016;Swinyard et al. 2014).Furthermore, the variations of Uranus and Neptune fluxes over the duration of a typical NIKA2 run are taken into account, although they are negligible compared to the model accuracy.On the other hand, we found5 that not taking into account the planet shape and orientation with respect to the observer in the computations of its solid angle can lead to errors between 1 and 2%.

Appendix A.2: Secondary calibrators
The secondary calibrator MWC349 is a stellar binary system, including the young Be star MWC349A, surrounded by a disk.Its radio continuum emission originates in an ionized bipolar outflow (Tafoya et al. 2004).MWC349A has been monitored with the PdBI and VLA, and shown to be only slightly angularly resolved, making it a point source for the 30-m telescope.We have computed the flux densities at the NIKA2 reference frequencies 150 and 260 GHz with S ν = 1.16 ± 0.01 × (ν/100GHz) 0.60±0.01provided by this monitoring6 .
The secondary calibrator CRL2688 is an Asymptotic Giant Branch star.Its radio continuum emission is mostly from circumstellar dust and is somewhat extended (Knapp et al. 1994).Its flux densities at 850 µm and 450 µm have been stable at the 5% level as monitored by SCUBA2 in 2011-2012 (Dempsey et al. 2013).We have extrapolated these flux densities to 150 and 260 GHz using the power law S s (ν 0 ) ∝ ν α 0 with an index α = 2.44 ± 0.18 derived from the SCUBA2 measurements.
The secondary calibrator NGC7027 is a young, dusty, carbon rich Planetary Nebula with an ionized core.It is extended in the continuum and molecular lines (Bieging et al. 1991), and is not a point source for the 30-m telescope.Its most recent flux densities are reported at 1100 µm and 2000 µm in Hoare et al. (1992).It has been reported to decrease by ∼ 0.145 percent/yr in the optically thin part of its spectrum above 6 GHz from VLA observations (Zijlstra et al. 2008;Hafez et al. 2008).This makes these flux densities uncertain by 3.6% currently.Its SED from cm wavelengths to optical is also presented in Hafez et al. (2008).Perley & Butler (2013) measured its flux density between 1 and 50 GHz.Moreover, flux density measurement at 90 GHz over 20 years with the 30-m telescope are presented in Kramer et al. (2008).The flux densities have been extrapolated to 150 and 260 GHz and the modeled decrease since 1992 has been included.
All these expected flux densities extrapolated from the literature are summarized in Table A.2.
Measured flux densities however are determined over the broad bandwidth of each array and so must be color-corrected to be compared to the expected flux densities of Table A.2.For this purpose, we have derived color corrections for sources with spectral indices α comprised between -2 and +4 in Table 13.As The temperature-induced beam size variation is primarily monitored using 2D Gaussian fit on all the available bright source observations, as presented in Sect.8.3.
For a finer sampling of the observation time, we also consider using the pointing scans for beam monitoring.As discussed in Sect.3.2, the telescope pointing is monitored on a hourly basis during observation using pointing scans.As these scans consist of two sub-scans in azimuth and elevation of about 10 seconds of integration time each, they can also be used to make a map of the pointing source.For each campaign, we thus have on hands hundreds of maps of mostly point-like bright sources, which can also be used to monitor the beam size.For this purpose, pointing scans are reduced and projected onto maps of 2 resolution using the data analysis pipeline described in Sect. 4. Fitting an elliptical 2D Gaussian to this map, we compute the geometrical FWHM.Pointing scans on slightly extended sources, such as NGC7027, are discarded from the analysis.
For each pointing, we also seek for signs of atmospheric anomalous refraction.There are enough KIDs per observation band to make an independent map using only one subscan, i.e. 10 seconds of integration time.For each of the four cross subscans, we thus estimate the position of the best 2D Gaussian that fits the map.We compute the deviation between each subscanderived position and the best-fit position using the entire scan.An anomalous refraction event is detected when the difference is above 2.5 for at least one subscan.We find that the temperatureinduced beam size variation effect is due to anomalous refraction for between one third and one half of the scans, as reported in Sect.8.3.
The pointing-based FWHM estimates constitute a timesampling of the beam size during the whole observation campaign.They can serve to estimate the beam size of any observation scans, in particular toward sources too faint for a direct FWHM fit to be made on the map.However, we expect less accuracy than using standard OTF raster scans of bright sources due to the shorter integration time and the fact that only the centermost KIDs in the array see the source.To mitigate the dispersion, the time-stamped pointing-based FWHM is filtered with a running median on a 70-minute width time window.Then, the FWHM at the time of the considered scans is interpolated from the median-filtered pointing-based FWHM.

Appendix C.3: The two case studies
We perform two case studies that correspond to the two beam monitoring methods discussed above.
Demonstration case This method named PC-demo hereafter, uses a photometric correction based on the beam monitoring with bright source scans.Both the 2D Gaussian FWHM fit and the FWHM 0 photometry are performed on the map of the source.This method thus applies only to point-like sources that are bright enough for an accurate fit of the beam on a single scan.
To capture only the beam size variations driven by the observing conditions (primary mirror deformations, anomalous refraction, elevation), a small correction δ FWHM has to be made to the 2D Gaussian beam FWHM estimate for bright sources.The estimate of the actual Gaussian size σ is where the offset δ FWHM is null for faint or moderately bright point sources, and non-zero for bright sources.As for σ , the 2D Gaussian fit yields slightly broader FWHM for bright sources (e.g.planets) to accomodate for the side lobes and

Fig. 2 .
Fig. 2. Example of variations of KID time-ordered information.Top: Example of 40 KID calibrated TOIs during an observation of Uranus.The low frequency correlated component (atmospheric and electronic noises) is clearly seen.Bottom: One of these TOIs (in blue) and the common mode that is subtracted from it (in red).The zero level is arbitrary.

Fig. 4 .
Fig.4.Average detector positions for arrays A1, A3, and A2.The three plots show the detectors that met the selection quality criteria for at least two beammaps during two technical campaigns.These consist of 952, 961, and 553 detectors for A1, A3 and A2, respectively.The color indicates how many times a KID was identified as valid on a beammap, ranging from blue for the KIDs valid in at least two beammaps, to red for the KIDs valid in all (ten) beammaps.The inner and outer dash-line circles correspond to circular regions of 5.5 arcmin and 6.5 arcmin, respectively.Units are arcseconds.

Fig. 5 .
Fig. 5. From upper left to lower right, beam maps of A1, A3, the combination of the 1 mm arrays (A1&3) and the 2 mm array (A2) are shown in decibel.These maps, which consist of the normalized combination of four beammap scans of bright point-like sources, are in horizontal coordinates and cover a sky area which extends over 10 .The solid lines and arrows highlight some noticeable features.Red circle in the A1 map (upper left panel): diffraction ring seen in 1-mm maps (the spokes are presumably caused by radial and azimuthal panel buckling (Greve et al. 2010); Orthogonal green lines in the A2 map (lower right panel): diffraction pattern caused by the quadrupod secondary support structure (prominently seen in A2 map); Yellow arrows in the A3 map (upper right panel): pattern of 3 spikes seen in 1 mm maps of unknown origin; Yellow arrows in A2 map (lower right panel): four symmetrical spikes of the first sidelobes; Pink ellipses: four spikes seen in A2 maps.

Fig. 7 .
Fig. 7. Main beam FWHM estimates for the 1 mm (top) and 2 mm (bottom) channels are shown as a function of the atmospheric transmission estimated at the corresponding wavelengths using bright point source observations acquired during the reference observation campaigns (N2R9, N2R12, N2R14).
the atmosphere and reaches NIKA2 detectors.The relation between the observed flux density S ν and the topof-the-atmosphere flux density S ν is parametrized by the zenith opacity τ ν and the air mass x as S ν = S ν e −τ ν x .

Fig. 11 .
Fig. 11.Average main beam flat fields obtained by combining the flat fields of five beammap scans.The top row plots show the normalized average flat fields of Array 1, 3 and 2, respectively.The offset positions with respect to the center of the array are given in arcsecond in the Nasmyth coordinate system.The color code gives the value of the KID calibration coefficients, as defined in Eq. 19, normalized by the average calibration coefficient over all the KIDs of the array.The bottom plots show the average flat field distributions using all KIDs (blue), using Array 1 KIDs that are positioned out of the shadow zone (green) and using Array 1 KIDs inside the shadow zone, which is defined in the text.

Fig. 15 .
Fig. 15.Comparison of the calibration bias for five calibration methods using observations of MWC349.The measured-to-expected flux density ratio is shown as a function of the atmospheric transmission for the baseline method (first row) as well as for methods using the taumeter (second row) and skydip (third) opacity correction, and for methods resorting to the PC-demo (fourth) and PC-point (fifth) photometric corrections.Dashed lines show the flux density ratio rms errors.

L
. Perotto et al.: Calibration and Performance of the NIKA2 camera at the IRAM 30-meter Telescope

Fig. C. 1 .
Fig. C.1.Magnitude of the beam photometric correction f (σ) as a function of the actual FWHM, at 1 mm (blue) and 2 mm (red).Thick lines correspond to a choice of σ derived on a very bright source like Uranus.Thin lines are for sources of 1 Jy at 1 and 2 mm.
Figure C.2 shows two different FWHM estimates for the same data set: on one hand the best-fit FWHM estimates on the OTF-scan map and on

Fig
Fig. C.2. Beam size monitoring.'OTF'-labeled pink data points show the FWHM estimates from a 2D Gaussian fit on the maps of OTF raster scans towards bright sources, whereas the 'Pointing'-labeled light green data points are FWHM estimates obtained by interpolating the medianfiltered pointing-based FWHM at the time of the scans.The pointingbased FWHM estimates are obtained by fitting a 2D Gaussian on the maps of pointing scans.

Table 3 .
Summary of the number of valid detectors per array.

Table 4 .
Fraction of very stable KIDs in percent of the design KIDs.The row 'very stable KIDs' gives the fraction of KIDs that have been selected in 50% of the analysed beammap scans, while the rows 'Used KIDs' gather the median fraction of used KIDs in the data reduction processing after the conservative KID selection has been performed (seeSect.4)

Table 5 .
Median best-fitting values of the parameters of the 3-Gaussian beam profile, as defined in Eq. 1, using the BM18 data set.For i ∈ 1, 2, 3 Āi = A i / A i .The FWHM for each of the Gaussian are given in arcseconds.

Table 6 .
Estimates of the main beam FWHM in arcsec, using three estimation methods (see Sect. 6.2.1) and three data sets (see Sect. 6.2.2), and their combination.

Table 7 .
Estimates of the solid angle of the beam up to a radius of 180 , Ω 180 , and rms uncertainties given in arcsec 2 using Neptune and Uranus scans acquired during three observation campaigns, and the combined result.For each case, the number of scans is given in the column labeled 'nbs'.

Table 8 .
Main beam efficiencies up to a radius of 180 given in percent.

Table 9 .
Main beam efficiency estimates per observation campaigns, given in percent.

Table 11 .
Best-fit parameters and rms uncertainties to infer NIKA2 opacities from the IRAM taumeter measurements.

Table 12 .
NIKA2 reference frequencies and FWHM

Table 13 .
Color correction factors for a target source S ∝ ν αs , as defined using Eq.16.

Table 18 .
Median NEFD and rms uncertainties in mJy • s 1/2 , as well as the derived mapping speed and mapping speed rms uncertainties in arcmin 2 • mJy −2 • h −1 , evaluated in using the scatter method on all sub-Jy sources of runs N2R9, N2R12 and N2R14, given at pwv=0 and 90 degrees elevation (first three rows) and extrapolated at the reference Winter observing conditions at the IRAM 30-m telescope site (last three rows), which are defined as 2 mm pwv and 60 degrees elevation.