Calibration and performance of the NIKA2 camera at the IRAM 30-m Telescope

Context. NIKA2 is a dual-band millimetre continuum camera of 2900 kinetic inductance detectors, operating at 150 and 260 GHz,



Introduction
The sub-millimetre and millimetre domains offer a unique view of the Universe, from nearby astrophysical objects, including planets, planetary systems, Galactic sources, and nearby galaxies, to high-redshift cosmological probes, for example, distant dusty star-forming galaxies, clusters of galaxies, the cosmic infrared background (CIB), and the cosmic microwave background (CMB). Ground-based continuum millimetre experiments have made spectacular progress in the past two decades thanks to the advent of large arrays of high-sensitivity detectors (Wilson et al. 2008; Siringo et al. 2009; Staguhn et al. 2011; Swetz et al. 2011; Monfardini et al. 2011; Arnold et al. 2012; Bleem et al. 2012; Holland et al. 2013; Dicker et al. 2014; BICEP2 Collaboration et al. 2015; Adam et al. 2017a; Kang et al. 2018). This fast growth will continue, as experimental efforts are driven by two challenges: improving the sensitivity to polarisation to detect the signature of end-of-inflation gravitational waves in the CMB, and improving the angular resolution. The latter has several important scientific implications: reaching arcminute angular resolution to exploit the CMB secondary anisotropies as cosmological probes, and sub-arcminute resolution to unveil the inner structure of faint or complex astrophysical objects and to map the early Universe down to the confusion limit. Furthermore, a new generation of sub-arcminute millimetre experiments, like NIKA2, which combines high angular resolution with a high mapping speed and a large coverage in the frequency domain, is set to achieve a breakthrough in our detailed understanding of the formation and evolution of structures throughout the Universe.
The NIKA2 experiment is a sub-arcminute resolution, high mapping speed camera that simultaneously observes a 6.5′ diameter field of view (FOV) in intensity in two frequency channels centred at 150 and 260 GHz, and in polarisation at 260 GHz (Adam et al. 2018). The NIKA2 instrument was installed at the IRAM 30-m telescope in October 2015 with partial readout electronics, and it has operated in its final instrumental configuration since January 2017. After a successful commissioning phase that ended in October 2017, NIKA2 is open to the community for scientific observations for the next decade. The NIKA2 camera should provide key observations both at the Galactic scale and at high redshifts to address a plethora of open questions, including the environmental impact on dust properties, the star formation processes at low and high redshifts, the evolution of the large-scale structures, and the exploitation of galaxy clusters for accurate cosmology.
At the Galactic scale, progress in understanding the star formation process relies on an accurate characterisation of dust properties in the interstellar medium (ISM). NIKA2 will provide the high-resolution, high mapping speed, dual-wavelength millimetre observations that are required for the determination of the mass and emissivity of statistically significant samples of dense, cold, star-forming molecular clouds (Rigby et al. 2018). Deep multi-wavelength surveys of nearby galaxies and of large areas of the Galactic plane also make it possible to set constraints on environmental variations of the dust properties. Furthermore, NIKA2 observations are needed for a detailed study of the inner molecular cloud filamentary structure that hosts Solar-mass star progenitors (Bracco et al. 2017), in order to understand the evolution process that culminates in star formation (see e.g. André et al. 2014 for a review). Ultimately, these observations are also helpful for understanding planet formation within proto-planetary discs.
For cosmology, NIKA2 observations have two major implications. On one hand, they represent a unique opportunity to study the evolution of the galaxy cluster mass calibration with redshift and dynamical state for their accurate exploitation as cosmological probes. Galaxy clusters are efficiently detected via the thermal Sunyaev-Zel'dovich (tSZ) effect (Sunyaev & Zeldovich 1970) up to high redshifts (z < 1.5), as was recently proven by CMB experiments (Hasselfield et al. 2013; Reichardt et al. 2013; Ade et al. 2016a). The exploitation of the vast SZ-selected galaxy cluster catalogues is currently the most powerful approach for cosmology with galaxy clusters (Ade et al. 2016b). However, the accuracy of tSZ cluster cosmology relies on the calibration of the relation between the tSZ observable and the cluster mass, and on the assessment of both its redshift evolution and the impact of the complex cluster physics on its calibration. Previous arcminute resolution experiments only allowed detailed studies of the intracluster medium spatial distribution for low-redshift clusters (z < 0.2). Sub-arcminute resolution, high mapping speed experiments are required to extend our understanding of galaxy clusters towards high redshift (Mroczkowski et al. 2019). The first high-resolution tSZ mapping of a galaxy cluster with NIKA2 is reported in Ruppin et al. (2018). Furthermore, the NIKA2 capabilities for the characterisation of high-redshift (0.5 < z < 0.9) galaxy clusters have been verified using numerical simulations (Ruppin et al. 2019a), and their implications for cosmology are discussed in Ruppin et al. (2019b).
On the other hand, in-depth mapping of large extragalactic fields with sub-arcminute resolution with NIKA2 should provide unprecedented insight into the distant universe. Indeed, the high angular resolution of NIKA2 is key to reducing the confusion noise, which is the ultimate limit of single-dish cosmological surveys (Bethermin et al. 2017), and the high mapping speed makes it possible to cover large areas. This unique combination should result in the detection of hundreds of dust-obscured, optically-faint galaxies up to very high redshift (z ∼ 6) during their major episodes of star formation. This should help to quantify the star formation up to z ∼ 6. We note that galaxy redshifts must be obtained with spectroscopic follow-up observations (e.g. with NOEMA) or multi-wavelength spectral energy distribution fittings (e.g. in the GOODS-N field thanks to the tremendous amount of ancillary data). Galaxy formation studies can also benefit from the large instantaneous field-of-view, high-resolution observations of NIKA2.
The current generation of sub-arcminute resolution experiments also includes the Large APEX Bolometer Camera (LABOCA, Siringo et al. 2009) at the Atacama Pathfinder Experiment (APEX) 12-m telescope, which covers a 12′ diameter FOV at 345 GHz at an angular resolution of about 19′′; AzTEC at the 50-m Large Millimeter Telescope, which operates with a single bandpass centred at either 143, 217, or 270 GHz (Wilson et al. 2008), and which has a beam FWHM of 5, 10, or 18′′, respectively; the Submillimetre Common User Bolometer Array Two (SCUBA-2, Holland et al. 2013; Dempsey et al. 2013) on the 15-m James Clerk Maxwell Telescope, which simultaneously images a FOV of about 7′ at 353 GHz and 666 GHz with a main-beam FWHM of 13′′ and 8′′ in the two frequency channels, respectively; and MUSTANG-2 at the 100-m Green Bank telescope, which maps a 4.35′ FOV at 90 GHz with 9′′ resolution (Dicker et al. 2014; Stanchfield et al. 2016). Therefore, NIKA2 is unique in combining an angular resolution better than 20′′, an instantaneous FOV of a diameter of 6.5′, and multi-band observation capabilities at 150 and 260 GHz.
Most of the other millimetre instruments consist of bolometric cameras. By contrast, NIKA2 is based on kinetic inductance detector (KID) technology (Day et al. 2003; Doyle et al. 2008; Shu et al. 2018a). This concept was first demonstrated with a pathfinder instrument, NIKA (Monfardini et al. 2010, 2011). Installed at the IRAM 30-m telescope until 2015, NIKA demonstrated state-of-the-art performance (Catalano et al. 2014) and obtained breakthrough results (see e.g. Adam et al. 2014, 2017b). NIKA has been crucial in optimising the NIKA2 instrument and data analysis.
A thorough description of the NIKA2 instrument is presented in Adam et al. (2018), along with the results of the commissioning in intensity based on the data acquired during the two technical campaigns of 2017. In the present paper, we propose a standard calibration method, referred to as the Baseline calibration, to convert raw data into stable and accurate flux density measurements. The assessment of the performance of the polarisation capabilities will be addressed in a forthcoming paper. To achieve a reliable and high-accuracy estimation of the performance, we performed extensive testing of the stability with respect to both the methodological analysis choices and the observing conditions. Firstly, the methodological choices and hypotheses may have an impact on the performance results and the systematic errors. At each step of the calibration procedure and for each of the performance metrics, we compared the results obtained using the Baseline method to alternative approaches to ensure the robustness against systematic effects. Secondly, we checked the stability of the results using a large number of independent data sets corresponding to various observing conditions. Specifically, most of the performance assessment relies on data acquired during the February 2017 technical campaign (N2R9) and the October 2017 (N2R12) and January 2018 (N2R14) first and second scientific-purpose observation campaigns. These observation campaigns are referred to as the "reference observation campaigns". Each campaign consists of about 1300 observation scans lasting between two and twenty minutes, for a total observation time of about 150 h.
This paper constitutes a review of the NIKA2 calibration and performance assessment in intensity. It is intended to be a reference for NIKA2 observations, which are set to last for at least ten years. The outline of the paper consists of two series of sections: Sects. 2-4 give short summaries of the instrumental set-up, the observational modes, and the data analysis methods that were used for the calibration and the performance assessment; Sects. 5-10 detail the dedicated calibration methods, extract the key characteristic results, and discuss their accuracy and robustness, as follows. The field-of-view reconstruction and the KID selection are discussed in Sect. 5. The beam pattern is characterised in Sect. 6, along with the main-beam FWHM and efficiency. Section 7 is dedicated to the derivation of the atmospheric opacity. The methods that we propose for the calibration are gathered in Sect. 8, while Sect. 9 presents the validation of these methods and the calibration accuracy and stability assessment. The noise characteristics and the sensitivity are discussed in Sect. 10. Finally, Sect. 11 summarises the main measured performance characteristics and briefly describes the next steps for future improvements of NIKA2.

General view of the instrument
The NIKA2 camera simultaneously images a FOV of 6.5′ in diameter at 150 and 260 GHz. It also has polarimetry capabilities at 260 GHz, which are not discussed here. To cover the 6.5′-diameter FOV without degrading the telescope angular resolution, NIKA2 employs a total of around 2900 KIDs split over three distinct arrays: one for the 150 GHz band and two for the 260 GHz band.
A detailed description of the instrument can be found in Adam et al. (2018) and Calvo et al. (2016). Here, we briefly present the main sub-systems, focusing on the elements that are specific to NIKA2 or that drive the development of dedicated procedures for data reduction or calibration.

Cryogenics
KIDs are superconducting detectors, which in the case of NIKA2 are made of thin aluminium films deposited on a silicon substrate (Roesch et al. 2012). For optimal sensitivity, they must operate at a temperature of around 150 mK, which is roughly one order of magnitude lower than the aluminium superconducting transition temperature. For this reason, NIKA2 employs a custom-built dilution fridge to cool down the focal plane and the refractive elements of the optics. Overall, a total mass of around 100 kg is kept deep in the sub-Kelvin regime. Despite the complexity and large size of the system, NIKA2 does not require external cryogenic liquids and can be fully operated remotely.

Optics
The NIKA2 camera optics include two cold mirrors and six lenses. The filtering of unwanted (off-band) radiation is provided by a suitable stack of multi-mesh filters, such as thermal blockers, placed at all temperature stages between 150 mK and room temperature. Two aperture stops, at a temperature of 150 mK, are conservatively designed to limit the entrance pupil of the optical system to the inner 27.5 m diameter of the primary mirror. An air-gap dichroic plate splits the 150 GHz (reflection) from the 260 GHz (transmission) beams. This element, made of a series of thin, micron-thick membranes separated by calibrated rings and mounted on a stainless steel ring, was designed to resist deformation at low temperature. As discussed in Sect. 8.2, the air-gap technology has proven efficient in preserving the planarity of the dichroic, but shows sub-optimal performance in transmission. Moreover, a grid polariser ensures the separation of the vertical and horizontal components of the linear polarisation in the 260 GHz channel. Band-defining filters, custom-designed to optimally match the atmospheric windows while ensuring robustness against average atmospheric conditions at 260 GHz, are installed in front of each array. A half-wave polarisation modulator is added at room temperature when operating the instrument in polarimetry mode.
Hereinafter, the detector array illuminated by the 150 GHz (2 mm) beam is named Array 2 (A2), while in the 260 GHz (1.15 mm) channel, the array mapping the horizontal component of the polarisation is referred to as Array 1 (A1), and the one mapping the vertical component is called Array 3 (A3). The 150 GHz observing channel is referred to as the 2 mm band, and the 260 GHz channel as the 1 mm band.

KIDs and electronics
Array 2 consists of 616 KIDs, arranged to cover a 78 mm-diameter circle. Each pixel has a size of 2.8 × 2.8 mm², which is the maximum pixel size allowed in order not to degrade the theoretical 30-m telescope angular resolution. In the case of the 260 GHz band detectors, the pixel size is 2 × 2 mm², to ensure a comparable sampling of the focal plane. This results in a sampling slightly above λ/D in this channel, where D is the aperture diameter, as discussed in Sect. 5.2. In order to ensure full coverage of the 6.5′ FOV, a total of 1140 pixels are needed in each of the two 260 GHz arrays, A1 and A3.
The key advantage of KID technology is the simplicity of the cold electronics and the multiplexing scheme. In NIKA2, each block of around 150 detectors is connected to a single coaxial line linked to the readout electronics (Bourrion et al. 2016). Therefore, Array 2 is connected to four readout feed-lines, while Arrays 1 and 3 are each equipped with eight feed-lines. The warm electronics required to digitise and process the pixel signals are composed of twenty custom-built readout cards (one per feed-line).

KID photometry and tuning
Kinetic inductance detectors are superconducting resonators whose resonance frequency shifts linearly with the incoming optical power. This was theoretically demonstrated in Swenson et al. (2010) and confirmed for NIKA KIDs, which have a design similar to the current NIKA2 KIDs, using laboratory measurements, as discussed in Monfardini et al. (2014). The measurement of the KID frequency shift ∆f is critical for the use of KIDs as millimetre-wave detectors.
For the KID readout, an excitation signal is sent into the cryostat on the feed-line coupled to the KID. The excitation tones produced by the electronics are amplified by a cryogenic (4 K) low-noise amplifier after passing through the KIDs, and then analysed by the readout electronics. Each KID is thus associated with an excitation tone at a frequency f_tone, which corresponds to an estimate of its resonance frequency for a reference background optical load. The transmitted signal can be described by its amplitude and phase, or, as is common practice for KIDs, by its I (in-phase) and Q (quadrature) components with respect to the excitation signal. The goal is now to relate the measured variations of the KID response to the excitation signal (∆I, ∆Q), which are induced by incident light, to ∆f. For this, the electronics modulate the excitation tone frequency f_tone at about 1 kHz with a known frequency variation δf, and the readout gives the induced transmitted signal variations (dI, dQ). Projecting (∆I, ∆Q) linearly onto (dI, dQ) therefore provides ∆f. This quantity, in Hz, constitutes the raw KID time-ordered data, which are sampled at a frequency of 23.84 Hz. For historical reasons, this way of deriving KID signals has been nicknamed RfdIdQ. More details on this process are given in Calvo et al. (2013). Once ingested into the calibration pipeline, the raw data are further converted into astronomical units (Sect. 8).
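The linear projection at the heart of RfdIdQ can be sketched as follows. This is a minimal illustration of the principle, not the readout firmware implementation; the function name and arguments are hypothetical.

```python
import numpy as np

def rf_didq(delta_I, delta_Q, dI, dQ, delta_f_mod):
    """Estimate the KID resonance-frequency shift (RfdIdQ principle).

    Projects the signal-induced transmission variations (delta_I, delta_Q)
    onto the modulation-induced variations (dI, dQ); the known modulation
    amplitude delta_f_mod (Hz) sets the frequency scale.
    """
    num = delta_I * dI + delta_Q * dQ   # component along the (dI, dQ) direction
    den = dI ** 2 + dQ ** 2             # squared norm of (dI, dQ)
    return delta_f_mod * num / den      # frequency shift in Hz
```

Because the projection uses the locally measured (dI, dQ), the estimate remains valid as the resonance drifts with background load, which is what makes the method robust between tunings.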
In addition to light of astronomical origin, any change in the background optical load (due, for example, to changes in the atmospheric emission with elevation) also contributes to the shift of the KID resonance frequencies. In order to maximise the sensitivity of a KID, the excitation tone f_tone must always be near the KID resonance frequency. We therefore developed a tuning algorithm that performs this optimisation. A tuning is performed at the beginning of each observation scan to adapt the KID f_tone to the working background conditions. This process takes only a few seconds. These optimal conditions are further maintained by performing continuous tunings between two scans, while NIKA2 is not observing, so as to regularly track the observing conditions.

Bandpasses
The NIKA2 spectral bands were measured in the laboratory using a Martin-Puplett interferometer built in-house (Durand 2007). The measurement relies on the difference of two black-body radiators used as input signal for the interferometer. Both arrays and filter bands were considered in the measurements, while the dichroic element was not included. Figure 1 shows the relative spectral response for the three arrays. We highlight that Array 2 was upgraded in September 2016 (during the so-called N2R5 technical campaign) and that the spectral transmissions are slightly different (red and orange lines in Fig. 1).

Fig. 1. Relative system response of the three KID arrays as a function of frequency. For illustration, we also show (black) the atmospheric transmission obtained with the ATM model (Pety et al. 2009; Pardo et al. 2001) for two values of precipitable water vapour (1 and 5 mm). The spectra of the models of Uranus (pink) and Neptune (green), as discussed in Appendix A, are also plotted with arbitrary normalisation with respect to the NIKA2 transmission.
The two arrays operating at 260 GHz, which map different linear polarisations, exhibit a slightly different spectral behaviour, as can be seen in Fig. 1. Besides the effect of the optical elements in front of the two arrays, this may be explained by a tiny difference in the silicon wafer, a difference in the aluminium film thickness of the KID arrays, and/or the different responses of the detector to the two polarisations (Adam et al. 2018; Shu et al. 2018b).
It is clear from Fig. 1 that the atmosphere may modify the overall transmission of the system, especially at the transmission tails for A2. To quantify this effect, we define an effective frequency ν_eff, computed as the integral of the frequency weighted by the NIKA2 bandpass and the SED of Uranus, for various atmospheric conditions. In Table 1, we give ν_eff and the bandwidths ∆ν computed both at zero atmospheric opacity and for the reference IRAM 30-m winter semester observing conditions (defined by an atmospheric content of 2 mm of precipitable water vapour (pwv) and an observing elevation of 60°).
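The effective frequency defined above can be computed as sketched below, assuming a uniform frequency grid; the toy bandpass and Rayleigh-Jeans-like SED are illustrative, not the measured NIKA2 transmission.

```python
import numpy as np

def effective_frequency(nu, bandpass, sed):
    """Bandpass- and SED-weighted effective frequency.

    nu: uniform frequency grid (GHz); bandpass: relative system
    transmission; sed: source spectral energy distribution on the same
    grid. Implements nu_eff = sum(nu * T * I) / sum(T * I).
    """
    w = bandpass * sed
    return np.sum(nu * w) / np.sum(w)

# Toy example: a flat band from 125 to 175 GHz with a rising SED
# (I ~ nu^2) pulls nu_eff slightly above the band centre of 150 GHz.
nu = np.linspace(100.0, 200.0, 1001)
band = ((nu > 125.0) & (nu < 175.0)).astype(float)
nu_eff = effective_frequency(nu, band, nu ** 2)
```

The same machinery applied to the measured bandpass and an atmospheric transmission curve shows how the band tails shift ν_eff with the observing conditions, which is the effect reported in Table 1.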
In the laboratory, spectral characterisation allowed the bandpasses of the three arrays to be measured with uncertainties better than one percent. The bandpass characterisation will be further improved using in situ measurements with a new Martin-Puplett interferometer designed to be placed in front of the cryostat window, making it possible to account for the whole optical system, including the dichroic plate. As discussed in Sect. 8, the Baseline calibration method only resorts to the bandpass measurements for colour correction estimations, in order to mitigate the propagation of bandpass uncertainties into the flux density measurements.
For each array, we define a reference frequency that sets the NIKA2 photometric system: 260 GHz for A1 and A3, and 150 GHz for A2.

Observations
This section presents the different observation modes that are used at the IRAM 30-m telescope for both commissioning and scientific-purpose observations with NIKA2. Each observation campaign is organised as an observing pool, allowing the optimisation of observations of several science targets in a flexible way. Most of the observing scans are on-the-fly (OTF) raster scans, which consist of a series of scans at constant azimuth or elevation (right ascension or declination) and varying elevation or azimuth (declination or right ascension). Their characteristics are tailored to NIKA2 performance.

Focus
Observation pools start by setting the telescope focus, since NIKA2's large FOV alleviates the need to adjust the pointing beforehand. We designed a specific focus procedure that takes advantage of the dense sampling of the FOV, making it possible to map a source in a short integration time. This procedure consists of performing a series of five successive one-minute raster scans of a bright (above a few Jy) point source at five axial offsets of the secondary mirror (M2) along the optical axis. As the scan size is 1′ × 5′, the main contribution to each map comes from the KIDs located in the central part of the FOV. Elliptical Gaussian fits on the five maps provide estimates of the flux and of the FWHM along the minor and major axes for each focus position. The best axial focus in the central part of the array is then estimated as the maximum of the flux or the minimum of the FWHM using parabolic fits of the five measurements.
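The parabolic estimate of the best axial focus can be sketched as follows; this is an illustration of the fit for the FWHM-minimisation case, with hypothetical names and mock measurements, not the observatory procedure itself.

```python
import numpy as np

def best_focus(offsets, fwhm):
    """Parabolic fit of beam FWHM versus M2 axial offset.

    Fits a quadratic to the five (offset, FWHM) measurements and returns
    the offset minimising the FWHM, i.e. the best axial focus.
    """
    a, b, c = np.polyfit(offsets, fwhm, 2)   # FWHM ~ a z^2 + b z + c
    return -b / (2.0 * a)                    # vertex of the parabola

# Five M2 offsets (mm) and mock FWHM values with a minimum near -0.3 mm
z = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
fwhm = 11.0 + 0.8 * (z + 0.3) ** 2
z_best = best_focus(z, fwhm)
```

The flux-maximisation variant is identical up to a sign: one fits the flux instead of the FWHM and takes the parabola maximum.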
As presented in more detail in Appendix B, the focus surface, which is defined as the locus of the best focus across the whole FOV, is not flat, but rather slightly bowl-shaped. To account for the curvature of the focus surface and optimise the average focus across the FOV, we add −0.2 mm to the best axial focus in the central part of the array. This focus offset was measured on data using a dedicated sequence of de-focused scans, as discussed in Appendix B. It is in agreement with expectations derived from optical simulations using ZEMAX (www.zemax.com).
Axial focus offsets are measured every other hour during daytime and are systematically checked after sunrise and sunset, while one or two checks suffice during the night. Lateral focus offsets can also be checked in a similar way, but are found to stay constant over periods of time that cover several observing campaigns.

Pointing
Once the instrument is correctly focused, we are able to estimate pointing corrections before scientific observations. Based on general operating experience at the 30-m telescope, we use the so-called pointing or cross-type scans to monitor the pointing during observations. The telescope executes a back-and-forth scan in azimuth and a back-and-forth scan in elevation, centred on the observed source. We fit Gaussian profiles to the timelines of the reference detector, which is chosen as a valid detector located close to the centre of Array 2. The choice of a reference detector operated at 150 GHz is suitable since most of our pointing sources are radio sources. We use the estimated position of the reference detector to derive the current pointing offsets of the system in azimuth and elevation. This correction is propagated to the following scans. The pointing is monitored on an hourly basis.
In addition, we perform pointing sessions in order to refine the NIKA2 pointing model. A pointing session consists of observing about 30 sources over a wide range of elevations and azimuth angles while monitoring the pointing offsets measured for each observation. During the N2R9 technical campaign, the rms of the residual scatter after pointing offset correction was 1.62′′ in azimuth and 1.37′′ in elevation. We conservatively report rms pointing errors of <3′′.

Skydip
A skydip scan consists of a step-by-step sky scan over a large range of elevations. Unlike heterodyne receivers, for which skydips can be conducted by continuously slewing the telescope in elevation, the NIKA2 camera cannot resort to such a method, as the KIDs need to be retuned for a given air mass. A NIKA2 skydip, hereafter simply called a skydip, comprises 11 steps in the elevation range from 19 to 65°, regularly spaced in air mass. For each step, we acquire about twenty seconds of data to ensure a precise measurement. The KIDs are tuned at the beginning of each constant-elevation sub-scan (hence once per air mass).
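The elevation steps, regularly spaced in air mass rather than in elevation, can be computed as sketched below under the plane-parallel atmosphere approximation x = 1/sin(el); the function name and defaults are illustrative.

```python
import numpy as np

def skydip_elevations(n_steps=11, el_min=19.0, el_max=65.0):
    """Elevation steps regularly spaced in air mass x = 1/sin(el).

    Returns n_steps elevations (degrees) between el_min and el_max whose
    corresponding air masses are uniformly spaced, as for a NIKA2 skydip.
    """
    x_max = 1.0 / np.sin(np.radians(el_min))   # largest air mass (lowest elevation)
    x_min = 1.0 / np.sin(np.radians(el_max))   # smallest air mass (highest elevation)
    x = np.linspace(x_max, x_min, n_steps)     # uniform air-mass grid
    return np.degrees(np.arcsin(1.0 / x))      # back to elevations

els = skydip_elevations()
```

Uniform spacing in air mass, rather than in elevation, samples the atmospheric loading evenly, which is what the opacity fit of Sect. 7 needs.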
We highlight that skydips are not used for the scan-to-scan atmospheric calibration. For this purpose, the KIDs are used as total power detectors to estimate the emission of the atmosphere and hence the atmospheric opacity, as discussed in Sect. 7. NIKA2 skydips therefore serve to calibrate the KID responses with respect to the atmospheric background for the atmospheric opacity derivation. Skydips are typically performed every eight hours to widely span the atmospheric conditions encountered throughout an observation campaign.

Beam map
A beammap scan is a raster scan in (az, el) coordinates tailored to map a bright compact source, often a planet, with steps of 4.8′′, which are small enough to ensure half-beam sampling, giving around 90% fidelity, for each KID. A scan of 13 × 7.8 arcmin² is acquired with the telescope performing a series of continuous slews either at fixed elevation or at fixed azimuth. A continuous scanning slew defines a sub-scan. The fixed-elevation scanning has the advantage of suppressing the air-mass variation across a sub-scan, while the fixed-azimuth scanning offers an orthogonal scan direction to the former: the combination of both gives a more accurate determination of the far side lobes. The scan size is optimised to enable maps to be made for all KIDs, even those located at the edges of the arrays. A larger size in the scanning direction allows for correlated noise subtraction. During sub-scans, the telescope moves at 65′′ s⁻¹. The need for a high-fidelity sampling of the 11′′ beams along the scan direction translates into a maximum speed of 97′′ s⁻¹, which ensures 2.7 samples per beam for our nominal acquisition rate of 23.84 Hz, and is thus met with margin. For the sake of scanning efficiency with the 30-m telescope, the minimal duration for sub-scans is 10 s. For beammap scans, sub-scans last 12 s, and the entire scan lasts about 25 min.
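The scanning-speed requirement quoted above follows from simple arithmetic, reproduced here as a sanity check (illustrative only).

```python
# Maximum scanning speed that still yields the required number of
# samples per 11'' beam along the scan direction, at the nominal
# acquisition rate.
beam_fwhm = 11.0     # beam FWHM in arcsec
f_acq = 23.84        # nominal acquisition rate in Hz
n_per_beam = 2.7     # required samples per beam

v_max = beam_fwhm * f_acq / n_per_beam   # arcsec/s
# v_max is about 97 arcsec/s, comfortably above the 65 arcsec/s slew speed,
# so the sampling requirement is met with margin.
```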
Beammaps are key observations for the calibration. While a single beammap acquired in stable observing conditions could suffice, beammap scans are performed on a daily basis. More details on these observations are given in Sect. 5, where we describe how to exploit them to derive individual KID properties.

Data reduction: from raw data to flux density maps
The raw KID data (I, Q, ∆I, ∆Q) and the telescope source-tracking data are synchronised by the NIKA2 acquisition system using a clock that gives the absolute astronomical time, provided by the telescope pulse-per-second signal; together, these constitute the NIKA2 raw data.
From the KID raw data, we compute a quantity that is proportional to the KID frequency shift using the RfdIdQ method, as described in Sect. 2.4. This quantity, which is hence proportional to the input signal, constitutes the KID time-ordered information (TOI).
We developed a dedicated data reduction pipeline to produce calibrated sky maps from NIKA2 raw data. This pipeline was first developed for the data analysis during the commissioning campaigns and is currently used for scientific data reduction. The calibration and performance assessment rely on this pipeline. A detailed description of this software will be presented in a companion paper (Ponthieu et al., in prep.), along with an application to blind source detection. Here, we summarise the main steps of the data reduction. Moreover, we focus the discussion on the treatment of point-like or compact sources, which are used for the performance assessment.

Low level processing
We isolate the fraction of the data relevant for scientific utilisation, and we mask both the KIDs that do not meet the selection criteria, as discussed in Sect. 5.3, and timeline accidents (glitches). Specifically, we flag out cosmic rays, which impact only one data sample per hit owing to the fast KID time constant (Catalano et al. 2014), and the KIDs for which the noise level exceeds 3σ of the average noise level of all other KIDs of the same array.

Pointing reconstruction
We produce a timeline of the pointing positions of each KID with respect to the targeted source position (usually located at the centre of the scan) using two sets of information. Firstly, the control system of the telescope provides us with the absolute pointing of a reference point of the focal plane, which coincides with the reference KID after the pointing corrections are applied, as described in Sect. 3.2. Secondly, we estimate the offset positions of each KID with respect to the reference KID using a dedicated procedure, referred to as the focal plane reconstruction, presented in Sect. 5. After this step, we are able to distinguish KIDs that are on-source from those that are off-source, which is a key piece of information for dealing with the correlated noise.

TOI calibration
The KID TOI in units of Hz (frequency shifts) is converted to Jy beam⁻¹ in two steps. Firstly, the KID data are inter-calibrated using the calibration coefficients, also known as relative gains, as discussed in Sect. 8.2, and the absolute scale of the flux density is set using the absolute calibration method discussed in Sect. 8.1. Secondly, the instantaneous line-of-sight atmospheric attenuation exp[−τ_ν x(t)] is corrected using the zenith opacity at the observing frequency ν, τ_ν, which is estimated for a given scan as discussed in Sect. 7, and the instantaneous air mass x(t). The latter is estimated as (sin el_t)⁻¹ using the observing elevation el_t, as obtained from the pointing reconstruction.
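The two-step conversion can be sketched as follows; the function and argument names are hypothetical, not the pipeline's API, and the gain and calibration values are placeholders.

```python
import numpy as np

def calibrate_toi(toi_hz, gain, abs_cal, tau_zenith, elevation_deg):
    """Convert a KID TOI from Hz to Jy/beam (illustrative sketch).

    gain: per-KID relative gain (inter-calibration); abs_cal: absolute
    calibration factor in Jy/beam per Hz; tau_zenith: zenith opacity
    tau_nu at the observing frequency; elevation_deg: elevation timeline.
    """
    x = 1.0 / np.sin(np.radians(elevation_deg))   # air mass x(t) = 1/sin(el_t)
    atm = np.exp(-tau_zenith * x)                 # line-of-sight attenuation
    return toi_hz * gain * abs_cal / atm          # attenuation-corrected flux

toi = calibrate_toi(np.ones(4), 1.0, 2.0, 0.2, np.array([30.0, 45.0, 60.0, 90.0]))
```

Dividing by the attenuation boosts the signal more strongly at low elevation, where the line of sight crosses more atmosphere.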

Correlated noise subtraction
The TOI of each KID includes a prominent low-frequency component of correlated noise of two different origins: the atmospheric component, which is dominant with some rare exceptions and common to all KIDs, and the electronic noise, which is common to the KIDs connected to the same readout feed-line (see Sect. 2.3). The subtraction of this correlated noise is a key step of the data processing, as correlated noise residuals are an important limiting factor of the sensitivity. We devised several dedicated methods for this purpose. They will be thoroughly discussed in Ponthieu et al. (in prep.); here, we illustrate the general principle and only discuss the method routinely used for the calibration.
As illustrated in the upper panel of Fig. 2, the low-frequency noise component is seen by all KIDs at the same time, while the astrophysical signal (Uranus in this case) is shifted from one KID to another. In this figure, the KID TOIs have been rescaled to be null at t = 0. A simple average of the KID TOIs provides an estimate of the low-frequency noise component, which we refer to as a common mode, while the signal is averaged out. The common mode, shown as a red line in the lower panel of Fig. 2, is then subtracted from each KID TOI.
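The common mode principle can be sketched on synthetic timelines; the drift amplitude and KID count below are arbitrary illustration values, not NIKA2 numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_kids, n_samp = 100, 1000

# Synthetic TOIs: a low-frequency drift common to all KIDs (the "common
# mode") plus independent white noise per KID
drift = np.cumsum(rng.normal(0.0, 0.1, n_samp))
tois = drift + rng.normal(0.0, 1.0, (n_kids, n_samp))

# Rescale each TOI to be null at t = 0, as in Fig. 2
tois = tois - tois[:, :1]

# The average over KIDs estimates the common mode; a source signal, being
# shifted from one KID to another, would largely average out
common_mode = tois.mean(axis=0)
cleaned = tois - common_mode
```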
For the calibration and performance assessment, we used an atmospheric and electronic noise de-correlation method named Most Correlated Pixels, which adds two technicalities to the common mode method. Firstly, the signal contamination of the common mode estimate is mitigated by discarding on-source KID data samples before averaging the rescaled TOIs. This is achieved by deriving a mask per TOI from the pointing information (Sect. 4.2), which is zero if the KID is close to the source and equal to unity otherwise. In the case of a point source, the mask consists of a disk of a minimum radius of 60′′ centred on the source, whereas for diffuse emission, tailored masks driven by the source morphology are built using iterative methods, as for example in Ruppin et al. (2017). Secondly, instead of a single common mode subtracted from all KIDs, we estimate a dedicated common mode for each KID. Calculating the KID-to-KID cross-correlation matrix, we identify the most correlated KIDs. Then, we build an inverse-variance-weighted co-addition of the timelines of the KIDs that are the most correlated with the KID in question. Furthermore, we verified on simulations that this method preserves the flux of point-like or moderately extended sources.

Fig. 2, bottom panel: one of these TOIs (in blue) and the common mode that is subtracted from it (in red). The zero level is arbitrary.
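The two technicalities above, namely on-source masking and a per-KID common mode built from the most correlated KIDs, can be sketched as follows. This is a simplified illustration of the idea, with hypothetical function and variable names, not the actual pipeline implementation.

```python
import numpy as np

def mcp_common_mode(tois, masks, n_best=20):
    """Subtract a per-KID common mode built from its most correlated KIDs.

    tois  : (n_kids, n_samp) rescaled timelines
    masks : (n_kids, n_samp) unity off-source, zero on-source; on-source
            samples are discarded when estimating correlations
    """
    # KID-to-KID cross-correlation matrix computed on off-source samples only
    masked = np.where(masks > 0, tois, np.nan)
    centred = masked - np.nanmean(masked, axis=1, keepdims=True)
    centred = np.nan_to_num(centred)
    corr = np.corrcoef(centred)
    cleaned = np.empty_like(tois)
    for k in range(len(tois)):
        # Indices of the KIDs most correlated with KID k (excluding itself)
        best = np.argsort(corr[k])[::-1][1:n_best + 1]
        # Inverse-variance weighted co-addition of their timelines
        w = 1.0 / np.var(tois[best], axis=1)
        cm = (w[:, None] * tois[best]).sum(axis=0) / w.sum()
        cleaned[k] = tois[k] - cm
    return cleaned

# Demo on synthetic correlated timelines with no source (all samples off-source)
rng = np.random.default_rng(1)
drift = np.cumsum(rng.normal(0.0, 0.1, 500))
tois = drift + rng.normal(0.0, 0.5, (30, 500))
cleaned = mcp_common_mode(tois, np.ones_like(tois), n_best=10)
```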
Regarding diffuse emission, the noise de-correlation induces a filtering effect at large angular scales that must be corrected for to fully recover the large scale signal.A method to correct for the spatial filtering, which relies on the evaluation of the data processing transfer function using simulations, is described in Adam et al. (2015).The data processing transfer function depends on the morphological properties of the concerned extended source.An example of the data processing transfer function for NIKA2 observation towards a galaxy cluster is given in Ruppin et al. (2018), evidencing a prominent filtering at angular scales larger than the 6.5 ′ diameter FOV.
In Fig. 3, we present the noise power spectra of a typical KID TOI, both before any data reduction and using three noise de-correlation methods: the simple common mode (CM) method used in Fig. 2, a method based on a principal component analysis (PCA), and the Most Correlated Pixels (MCP) method. We observe that after de-correlation, the 1/f-like noise in the power spectra (principally due to atmospheric emission drifts) is significantly reduced, leading to nearly flat spectra down to 0.5 Hz, with lower 1/f-like residual noise at low frequencies for the PCA and Most Correlated Pixels methods than for the common mode de-correlation. These two methods further subtract a substantial fraction of the correlated noise that originates from the electronic readouts. Moreover, we used simulations to check that the Most Correlated Pixels method was more efficient than the PCA in preserving the astrophysical signal. The former is thus preferred over the latter.

Fig. 3. Data noise power spectra for the three NIKA2 arrays (A1, A3, and A2 from top to bottom). The power spectra are given for the raw data (blue), and for noise de-correlated data using the common mode (labelled CM, green), the PCA (red), and the Most Correlated Pixels (labelled MCP, cyan) methods.

Map projection
We use the pointing information to project the cleaned (low-frequency noise subtracted), calibrated TOIs of all the valid KIDs of an array onto a flux density map (tangential projection). This map M_p is produced using an inverse variance noise weighting of all of the data samples that fall into a map pixel, as defined using a nearest grid point scheme. We also compute the associated count map H_p, defined as the number of data samples per map pixel. The map resolution is chosen small enough (typically 2′′ per map pixel) to alleviate the need for a more refined interpolation scheme. The noise rms σ_k for each KID k is evaluated as the standard deviation of the KID TOI far from the source position. The variance map σ²_p is inhomogeneous and varies as the inverse of H_p. Its normalisation is evaluated using the homogeneous background map variance, that is, the variance of M_p √H_p calculated far from the source.
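The nearest-grid-point projection with inverse variance weighting can be sketched as below; the function signature and the single-KID treatment are simplifications for illustration (the pipeline accumulates all KIDs of an array).

```python
import numpy as np

def project_maps(x, y, toi, sigma_k, nx, ny, pix=2.0):
    """Project TOI samples onto a flux-density map M_p and a count map H_p.

    x, y    : pointing offsets of the samples in arcsec
    toi     : calibrated, noise-subtracted samples of one KID
    sigma_k : noise rms of this KID, estimated off-source
    """
    # Nearest grid point: round offsets to the pixel grid, origin at map centre
    ix = np.clip(np.round(x / pix).astype(int) + nx // 2, 0, nx - 1)
    iy = np.clip(np.round(y / pix).astype(int) + ny // 2, 0, ny - 1)
    w = 1.0 / sigma_k**2
    num = np.zeros((ny, nx))
    den = np.zeros((ny, nx))
    hits = np.zeros((ny, nx))
    np.add.at(num, (iy, ix), w * toi)   # unbuffered accumulation per pixel
    np.add.at(den, (iy, ix), w)
    np.add.at(hits, (iy, ix), 1.0)
    m = np.where(den > 0, num / np.where(den > 0, den, 1.0), np.nan)
    return m, hits

# Demo: ten samples of unit flux falling into the central pixel of a 5x5 map
M_p, H_p = project_maps(np.zeros(10), np.zeros(10), np.ones(10), 1.0, 5, 5)
```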
To account for the residual correlated noise while evaluating the variance map, we resort to an effective approach. Firstly, we compute the signal-to-noise ratio (S/N) map as the ratio of M_p to the noise map σ_p, that is, the square root of the variance map. We observe that the distribution of the S/N map over the pixels far from the source is well approximated by a Gaussian, but has a width larger than the expected unity. This is due to the remaining correlations between KID TOIs before projection. Then, we multiply the noise map σ_p by the factor required for the width of the S/N distribution to become unity. This normalising factor ranges from 1.2 to 1.5 depending on the observing conditions. This constitutes an effective approach to account for the off-diagonal terms of the pixel-to-pixel correlation matrix, alleviating the need to measure them accurately.
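The noise map renormalisation step can be sketched as follows, assuming an off-source pixel mask is available; names are illustrative.

```python
import numpy as np

def normalise_noise_map(m_p, sigma_p, off_source):
    """Rescale the noise map so the off-source S/N distribution has unit width.

    m_p        : flux-density map
    sigma_p    : noise map (square root of the variance map)
    off_source : boolean mask selecting map pixels far from the source
    """
    snr = m_p[off_source] / sigma_p[off_source]
    factor = np.std(snr)  # > 1 when residual KID-to-KID correlations remain
    return sigma_p * factor, factor

# Demo: if the true pixel noise is twice the nominal sigma_p, the off-source
# S/N distribution has a width of ~2 and the noise map is scaled accordingly
rng = np.random.default_rng(3)
m = rng.normal(0.0, 2.0, (100, 100))
sigma = np.ones((100, 100))
sigma_corr, factor = normalise_noise_map(m, sigma, np.ones(m.shape, bool))
```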
When several scans of the same source are averaged, we apply inverse variance weighting as well. The variance map of the co-added scans is also corrected to ensure a unity-width S/N distribution.

Observation scan selection
For calibration and performance assessment, we selected scans in average observing conditions by performing mild selection cuts. These cuts rely on the zenith opacity estimates τν in the NIKA2 bands, as described in Sect. 7, on the elevation, and on the time of observation. We selected the scans satisfying the following criteria: (i) τ_A3 < 0.5, where τ_A3 is the τν estimate for Array 3; (ii) x τ_A3 < 0.7 and el > 20°, where el is the observing elevation and x the air mass, which depends on the elevation as x = (sin el)−1; this threshold corresponds to a decrease of the astrophysical signal by a factor of two; (iii) observation time from 22:00 to 9:00 UT and from 10:00 to 15:00 UT, which excludes the sunrise period and the late afternoon. In the following sections, these selection cuts are referred to as the "Baseline scan selection". As discussed in Sect. 8.3, the late afternoon observations are often affected by a time-variable broadening of the telescope beams caused by (partial) solar irradiation of the primary mirror and/or anomalous atmospheric refraction. Around sunrise, the focus shifts continuously due to the ambient temperature change until the temperature stabilises, so the scans taken from 9:00 to 10:00 UT are likely not optimally focused. After the focus stabilisation, the middle of the day, from 10:00 to 15:00 UT, offers stable observing conditions provided that the telescope is not pointed too close to the Sun. Otherwise, further scan selection based on the exact sequence of observations and on beam monitoring might be needed before using these observations for performance assessment. In summary, the Baseline scan selection retains 16 h of observations a day and discards observations affected by an atmospheric absorption exceeding 50%.
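The three Baseline selection criteria translate directly into a simple predicate; the function below is an illustrative sketch, not pipeline code.

```python
import numpy as np

def baseline_selection(tau_a3, el_deg, hour_ut):
    """Baseline scan selection cuts for calibration and performance assessment.

    tau_a3  : zenith opacity estimated in the Array 3 band
    el_deg  : observing elevation in degrees
    hour_ut : observation start time in decimal hours (UT)
    """
    x = 1.0 / np.sin(np.radians(el_deg))                  # air mass
    good_opacity = (tau_a3 < 0.5) and (x * tau_a3 < 0.7)  # criteria (i), (ii)
    good_elevation = el_deg > 20.0
    # Criterion (iii): keep 22:00-9:00 and 10:00-15:00 UT, excluding the
    # sunrise focus period and the late afternoon
    good_time = (hour_ut >= 22.0 or hour_ut < 9.0) or (10.0 <= hour_ut < 15.0)
    return bool(good_opacity and good_elevation and good_time)
```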

Focal plane reconstruction
From an operational point of view, the KIDs are identified by their resonance frequency and not by their physical position in the focal plane. Therefore, to find their positions on the sky, we need to deploy a dedicated method that we refer to as the FOV reconstruction. The FOV reconstruction thus consists of matching the KID frequency tones to positions on the sky and performing a KID selection. Although all of the 2900 KIDs are responsive, some of them are affected by cross-talk or are noisy due to an inaccurate tuning of their frequencies, and must be discarded from further analysis. We use beammaps, which enable an individual map per KID to be constructed, to measure the KID positions and relative gains, as discussed in Sect. 5.1. The measured KID positions are further checked by matching them with the design positions, as presented in Sect. 5.2. In Sect. 5.3, we present the final KID selection and FOV geometry, as obtained by repeating the procedure on a series of beammaps.

Reconstruction of the FOV positions and KID properties
In order to produce a map, one needs to associate a pointing direction with each data sample of the system. The telescope provides pointing information for a reference position in the focal plane. This information consists of the absolute azimuth and elevation (az_t^ref, el_t^ref) of the source, together with offsets (Δaz_t^ref, Δel_t^ref) with respect to these, which depend on the scanning strategy. We then need to know the relative pointing offsets of each detector with respect to this reference position. We use beammaps for this purpose (see Sect. 3.4).
We apply a median filter per KID timeline, the width of which is set to 31 samples, equivalent to about five FWHM at a scanning speed of 65′′ s−1 and for the sampling frequency of 23.84 Hz. Then, we project one map per KID in Nasmyth coordinates. The median filter efficiently removes most of the low-frequency atmospheric and electronic noise, albeit with a slight ringing and flux loss on the source. However, at this stage, we are only interested in the location of the observed source. To derive the Nasmyth coordinates from the provided (az_t, el_t) and (Δaz_t, Δel_t) coordinates, we build, at time t, the Nasmyth coordinates (x_t, y_t) by rotating the horizontal offsets (Δaz_t, Δel_t) by the elevation angle, since the Nasmyth plane is fixed with respect to the instrument and rotates with elevation relative to the sky. We note that Δaz_t^ref is already corrected by the cos el_t^ref factor to have orthonormal coordinates in the tangent plane of the sky and to be immune to the geodesic convergence at the poles. The data timelines are then projected onto (x, y) maps. We fit a 2D elliptical Gaussian on each KID Nasmyth map. The centroid position of this Gaussian provides us with an estimate of the KID pointing offsets with respect to the telescope reference position (az_t + Δaz_t, el_t + Δel_t) in Nasmyth (x, y) coordinates (independent of time).
To convert the Nasmyth offsets of each KID k back into (az, el) offsets, we apply the inverse rotation, by the elevation angle, to the fitted centroid position. Adding these offsets to (Δaz_t^ref, Δel_t^ref) gives the absolute pointing of each KID in these coordinates.
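The Nasmyth-to-horizontal rotation can be sketched as follows. The sign convention chosen here is an assumption made for illustration; the actual convention is telescope-specific and is defined in the pipeline.

```python
import numpy as np

def nasmyth_to_azel(dx, dy, el_deg):
    """Rotate per-KID Nasmyth offsets (dx, dy) into (az, el) offsets.

    The Nasmyth plane is fixed with respect to the instrument, so the
    rotation angle is the elevation itself. The sign convention below is
    assumed for illustration only.
    """
    el = np.radians(el_deg)
    d_az = dx * np.cos(el) - dy * np.sin(el)
    d_el = dx * np.sin(el) + dy * np.cos(el)
    return d_az, d_el
```

At zero elevation the two frames coincide, and at 90° elevation the Nasmyth x axis maps onto the elevation axis, as expected for a pure rotation.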
Furthermore, the fitted Gaussian per KID provides a first estimate of the KID FWHM, ellipticity, and sensitivity. We apply an initial KID selection by removing outliers in the statistics of these parameters. We also manually discard KIDs that show a cross-talk counterpart on their map. We then repeat this procedure using the Baseline TOI de-correlation method instead of the median filter. Specifically, we apply the Most Correlated Pixels noise subtraction presented in Sect. 4 to the KID timelines, which are then used to produce maps per KID. This alleviates the flux loss induced by the median filter and ensures that the beammaps are treated in the same way as the scientific observation scans. Finally, a second iteration of the KID selection is performed. This analysis is repeated on all beammaps to obtain statistics and precision on each KID parameter, together with estimates of the KID performance stability, as discussed in the next sections.

FOV grid distortion
We compare the reconstructed KID positions in the FOV to their design positions in the array. We fit the 2D field translation and rotation that match the measured KID positions with the design positions using a 2D polynomial mapping function. We find that a match can be obtained using a 2D polynomial function of degree one, which corresponds to a linear transformation and a rotation only. We call distortion the cross-terms between the two spatial coordinates in the polynomial fit.
The aim of this study is twofold. Firstly, we obtain a detailed characterisation of the real geometry of the NIKA2 focal plane. Secondly, this analysis is also used for KID selection. The most deviant KIDs, whose measured position deviates by more than 4′′ from the design position, are discarded.
We present the global results of the grid distortion analysis using the KID positions given by the focal plane geometry procedure, as described in Sect. 5.1, applied to a beammap scan acquired during the first scientific campaign (also known as N2R12). The initial number of KIDs considered in this analysis results from an initial KID selection, which consists of discarding the KIDs that are most impacted by the cross-talk effect or by tuning failures, applied to the FOV geometry obtained from the beammap scan, as discussed in Sect. 5.1. The results are gathered in Table 2.
Most of the selected KIDs are also well placed, meaning less than 4′′ away from their expected position. Moreover, on average, the position of each detector is known to about one arcsecond. We find that Array 1 hosts the most deviant detectors (more than 4′′ from their expected position). These detectors are excluded from further analysis. The two 1 mm arrays have almost the same centre, but this centre differs by 7′′ and 2′′ in the two Nasmyth coordinates, respectively, from the 2 mm array centre. This has no significant impact on the pointing and focus settings at the precision at which they are measured. The centre-to-centre distance between contiguous detectors, referred to as the grid step, has been estimated in mm and in arcseconds. The ratio of the grid step in mm to the grid step in arcseconds gives compatible effective focal lengths of about 42.4 ± 0.3 m at both observing wavelengths. The sampling is above λ/D at 1 mm, assuming a 27-m effective diameter aperture. We note that the rotation angle between the array and the Nasmyth coordinates was designed to be 76.2°, less than two degrees away from what is observed.
These results were compared to expectations obtained using a ZEMAX simulation. We generated a grid diagram for the NIKA2 optical system and found a maximum grid distortion of 2.7% in the 6.5′ FOV. We notice that the strongest distortion appears in the upper-right corner of the Nasmyth plane, which is also the area of the largest defocus with respect to the centre (see Appendix B). An expected distortion of 2.7% corresponds to at most a 5′′ shift from the centre to the outside of the array. The quoted distortions between the measured and design positions are well within the expected maximum distortions from the NIKA2 optics.

Table 2. Grid distortion analysis results for a beammap scan of the N2R12 campaign (columns: A1, A3, A2).
Initial KIDs (a): 866, 808, 488
Well-placed KIDs (b): 864, 808, 488
Deviation (c) [′′]: 1.01, 0.95, 0.75
Distortion (d) [′′]: 1.09, 1.01, 0.84
Array centre (e) [′′]: (1.9, −5.1), (2.3, −6.2), (9.6, −7.8)
Scaling (f): 4.9, 4.9, 4.9

Notes. Example of mapping of the observed KID positions on the sky to their mechanically designed positions. The initial table of selected KIDs is given by the focal plane geometry procedure, as described in Sect. 5.1, applied to a beammap scan acquired during the N2R12 campaign. More than 90% of the detectors are within less than 5′′ of their expected position. (a) Initial number of KIDs selected in a FOV geometry using a beammap scan of the N2R12 campaign. (b) Number of KIDs for which the best-fit sky position is less than 4′′ away from the expected position. (c) Median angular offset in arcseconds between the expected and measured sky positions of the KIDs. (d) Average best-fit cross term of the polynomial model across the FOV, given in arcseconds. (e) Array centre in Nasmyth coordinates. (f) Average scaling between the measured KID position grid and the designed one. (g) Rotation from the design to Nasmyth coordinates, given in degrees. (h) Centre-to-centre distance between neighbouring detectors. (i) Centre-to-centre distance between neighbouring detectors using the reference frequencies (260 GHz and 150 GHz) and a 27-m entrance pupil diameter (see Sect. 2.2). (j) Centre-to-centre distance between neighbouring detectors modelled using a ZEMAX simulation.

KID selection and average geometry
In order to identify the most stable KIDs, we compare the KID parameters obtained using the FOV reconstruction procedure, as described in Sect. 5.1, with several beammaps. In the following, we show the results obtained using ten beammaps acquired during two technical observation campaigns in 2017. For each KID, we compute the average position in the focal plane and the average FWHM. As discussed in Sect. 5.1, we perform a KID selection while analysing beammaps. A few KIDs have very close resonance frequencies and can be misidentified on some scans. A few others must also be discarded because they appear numerically identical, due to the fact that the same (noisy) KID can sometimes be associated with two different frequency tones in the acquisition system. These KIDs are flagged out (less than 1% of the design KIDs). We count how many times a KID has been kept in the KID selection per beammap and has been found at a position in agreement with its median position within 4′′. Using this, we define the valid KIDs as the KIDs that met the selection criteria in at least 20% of the FOV geometries (in two beammap analyses out of ten). In Fig. 4, we show the average focal plane reconstruction. The colours, from blue to red, represent the number of times that the KID has been retained after KID selection. The eight feed lines for each of the two 1 mm arrays can also be traced out in several ways in this figure. Firstly, slightly larger spaces are seen between KID rows connected to different feed lines than between KID rows of the same feed line. Secondly, KIDs at the end of a feed line are valid less often than the others (see e.g. the FOV of Array 3). As the tone frequencies increase with the position of the KID in the feed line, some KIDs are sometimes missing because their frequency lies above the maximum tone frequency authorised by the readout electronics. This explains the linear holes in the middle of the 1 mm arrays. For A1, this end-of-feed-line effect is mixed with the effect of the KID gain variation across the FOV, which mainly affects the lower-left third of the array, as discussed in Sect. 8.2. For A1, A3, and A2, respectively, we found 952, 961, and 553 valid KIDs (selected at least twice). From this, we deduce the fraction of valid detectors over the designed ones, as given in Table 3.

Fig. 4. Average detector positions for arrays A1, A3, and A2. The three plots show the detectors that met the selection quality criteria for at least two beammaps during two technical campaigns. These consist of 952, 961, and 553 detectors for A1, A3, and A2, respectively. The colour indicates how many times a KID was identified as valid on a beammap, ranging from blue for the KIDs valid in at least two beammaps, to red for the KIDs valid in all (ten) beammaps. The inner and outer dashed-line circles correspond to circular regions of 5.5 arcmin and 6.5 arcmin, respectively. Units used are arcseconds.
The valid KIDs are those usable to produce a scientifically exploitable flux density map from observations during which no sizeable tuning issues are experienced.
To give an idea of the dependence of the KID selection on the observing conditions, we also evaluate the number of very stable KIDs, defined as the KIDs that met the selection criteria in at least 50% of the FOV geometries (at least five times out of ten). We found 840, 868, and 508 such KIDs for A1, A3, and A2, respectively. The fraction of very stable KIDs over the designed ones is reported in Table 4.
For the practical production of flux density maps, we perform a further selection of the valid KIDs using a conservative noise level threshold on the KIDs at the low-level processing, as discussed in Sect. 4. The number of KIDs used for producing scientific maps with the data reduction pipeline, as described in Sect. 4, is thus significantly lower than the number of valid KIDs. We evaluate the median number of used KIDs using all scans for each of the observation campaigns. The median fractions of used KIDs with respect to the designed ones for each campaign and for the combination of all scans are given in Table 4. We find median fractions of used KIDs of about 70% for the 1 mm arrays, and of about 80% for the 2 mm array, with a notable exception at the N2R9 technical campaign, for which these fractions are lower due to severe atmospheric-temperature-induced unstable observing conditions (see Sect. 8.3). Moreover, the median fractions of used KIDs are close to the fractions of very stable KIDs from the FOV geometries. We stress that these numbers depend on the choices made in the data reduction pipeline for data sample cuts, and are thus subject to improvement. By contrast, the fractions of valid KIDs η constitute conservative estimates of the fraction of stable KIDs usable for scientific exploitation over all the functioning KIDs. These are thus the relevant estimates for the instrument performance assessment.

Notes (Table 4). The row "very stable KIDs" gives the fraction of KIDs that have been selected in 50% of the analysed beammap scans, while the rows "Used KIDs" gather the median fraction of used KIDs in the data reduction processing after the conservative KID selection was performed (see Sect. 4).

Beam pattern
The NIKA2 full-beam pattern originates from the KIDs illuminating the internal and external optics, out to the IRAM 30-m telescope primary mirror. To characterise the full-beam pattern, we use beammap observations. Firstly, deep integration maps of bright sources are produced to provide a qualitative description of the complex beam structure in Sect. 6.1. Then, we model the beam using three complementary methods to estimate the main-beam angular resolution (Sect. 6.2) and the main-beam efficiency (Sect. 6.3).

Full-beam pattern
To study the 2D pattern of the beam, we primarily used a map obtained from a combination of beammap observations of strong point sources acquired during the N2R9 commissioning campaign. Namely, we used beammap scans of Uranus, Neptune, and the bright quasar 3C84. Furthermore, we checked the stability of our results on single-scan maps, on combinations of scans of a single source, and on combinations of shallower scans that span a large range of scanning directions. The beammap scans were reduced using the method discussed in Sect. 4 to produce maps. Figure 5 shows the 2D beam pattern as measured with NIKA2 using the aforementioned beammap combination, for each of the arrays and for the 1 mm array combination. The beam pattern is shown over a large dynamic range down to about −40 dB and out to radii of about 5′. The telescope beam pattern extends well beyond this radius, as shown for example by lunar edge observations at the IRAM 30-m telescope (Greve et al. 1998; Kramer et al. 2013). However, this extended pattern is at present difficult to detect using the data reduction pipeline discussed in Sect. 4, as the extended error beams are both filtered and mixed with atmospheric and electronic large-scale correlated noise residuals. We expect the filtering effect caused by the data processing to become non-negligible for angular scales larger than 90′′, which corresponds to the radial size of the mask used in the correlated noise subtraction process (see Sect. 4.4). The contributions to the beam pattern that stem from larger angular scales are further discussed in Sect. 6.3.
The NIKA2 beam maps reveal some noticeable features, which are shown in Fig. 5. Ranging from strong and/or extended to weak and/or spiky, they include the following:
1. The main beam and the underlying first error beam, which is caused by large-scale deformations of the primary mirror, and the first side lobes, which correspond to various diffraction patterns. In particular, the 60′′ and 85′′ diameter (square-shaped) side lobes at 1 and 2 mm, respectively, at a level lower than −20 dB, are due to the convolution of the primary mirror and quadrupod diffraction pattern with the pixel (KID) transfer function.
2. At a much lower level of about −30 dB, a diffraction ring shows up, which is presumably caused by panel buckling of the primary mirror (Greve et al. 2010), as shown with a red circle in the A1 panel.
3. Also at a level of about −30 dB, the side lobes shown with yellow diagonal lines in the A2 panel are caused by diffraction on the quadrupod holding the secondary mirror of the telescope, as expected from ZEMAX simulations.
4. Spikes for which the origin is not fully understood, marked by yellow arrows. The ones along the vertical and horizontal axes are reproduced by ZEMAX simulations but at a shallower level, whereas the ones shown in the A3 panel in the diagonal directions may be due to the small cylindrical instrumentation box on the side of the M2 cabin. The origin of the asymmetry on the 1 mm arrays is unknown, but is most probably due to internal optics aberrations.
5. Shallow spikes of unknown origin at a level under −30 dB, circled by pink ellipses. The multiple images on the combined deep beam map indicate a rotation of these spikes with the observing elevation, which in turn points to diffraction-related issues or a ghost image formed inside the cryostat. These shallow features are expected to have no significant impact on NIKA2's scientific results.
We further quantify the respective levels of the axi-symmetrical features of the beam pattern by evaluating the beam radial profile B_ν(r), which is the normalised radial brightness profile for the array ν, where r is the radius from the beam centre. Although the profile cannot represent the sub-dominant non-axisymmetrical features seen in Fig. 5 (quadrupod diffraction pattern, spikes), it provides a useful representation of the internal and central parts of the beam up to a radius of 180′′. We determine a beam profile from a beam map by centring on the fitted value of the main-beam centre and computing the weighted average of the map pixels in annular rings.
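The radial profile estimation, a weighted average in annular rings around the fitted main-beam centre, can be sketched as follows; the binning and function names are illustrative choices.

```python
import numpy as np

def beam_profile(beam_map, weights, centre, pix=1.0, r_max=180.0, dr=2.0):
    """Weighted, normalised beam radial profile B(r) from a beam map.

    beam_map : 2D beam map
    weights  : per-pixel weights (e.g. inverse variance)
    centre   : (x, y) fitted main-beam centre in pixels
    """
    ny, nx = beam_map.shape
    yy, xx = np.indices((ny, nx))
    r = pix * np.hypot(xx - centre[0], yy - centre[1])
    edges = np.arange(0.0, r_max + dr, dr)
    prof = np.empty(len(edges) - 1)
    for i in range(len(prof)):
        sel = (r >= edges[i]) & (r < edges[i + 1])
        w = weights[sel]
        prof[i] = np.sum(w * beam_map[sel]) / np.sum(w) if np.sum(w) > 0 else np.nan
    return 0.5 * (edges[:-1] + edges[1:]), prof / np.nanmax(prof)

# Demo on a synthetic circular Gaussian beam of sigma = 10 pixels
n = 201
yy, xx = np.indices((n, n))
rr = np.hypot(xx - 100, yy - 100)
gauss_map = np.exp(-rr**2 / (2.0 * 10.0**2))
radii, prof = beam_profile(gauss_map, np.ones_like(gauss_map), (100, 100), pix=1.0)
```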
We checked the stability of the beam against various observing conditions (source intensity, weather conditions, focus optimisation) by comparing the beam profiles of a series of 18 beammap observations. This set of beammaps, which is referred to as BM18, was selected from all the available beammap scans at optimal focus using the Baseline scan selection criteria (Sect. 4.6). The measured beam profiles using the BM18 data set are shown in Fig. 6. The profiles consist of the main beam and the first error beams and side lobes, which contribute significantly at levels below −10 dB at both observing wavelengths. Moreover, at 1 mm, the contribution of the diffraction ring, which was marked with a red circle in Fig. 5, is seen as a peak at a level of up to −33 dB located at a radius of about 115′′. Calculating the rms of the relative difference of the beam profiles to the median beam profile, we find a dispersion below 5% at 1 mm, and below 2% at 2 mm.

Fig. 6. Beam radial profiles given in decibels. The data points are the beam profiles for a series of 18 beammap scans acquired during the reference observational campaigns, labelled from the scan ID. The black line shows the main-beam profile using the "combined" FWHM, as given in Table 6, while the pink line shows the median best-fit three-Gaussian profile, as defined in Eq. (3).
To measure the relative levels of the axi-symmetrical beam pattern features, we further model the beam profiles using an empirical function, which accounts for the main beam and for a significant fraction of the error beams and side lobes. We define this function B_3G(r) as:

B_3G(r) = A_1 G_1(r) + A_2 G_2(r) + A_3 G_3(r) + B_0,   (3)

where A_i is the amplitude of the Gaussian G_i for i ∈ {1, 2, 3}, and B_0 is a pedestal accounting for the residual background level in the map. The measured beam profiles are fitted using Eq. (3), and the median best-fit parameters are given in Table 5. The errors are evaluated as the standard deviation of the best-fit parameter values of the 18 beammap scans of BM18. These values are given to gain insight into the beam profile, but they are not used for the calibration procedure. We find the level of the first error beam with respect to the total beam to be about −11 dB and −13 dB at 1 and 2 mm, respectively. For illustration, we show in Fig. 6 the median three-Gaussian profiles at 1 and 2 mm (pink lines), and the main-beam profiles (black lines).
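A three-Gaussian-plus-pedestal fit of this kind can be sketched with a standard least-squares fitter; the synthetic profile parameters below are arbitrary illustration values, not the NIKA2 best-fit values of Table 5.

```python
import numpy as np
from scipy.optimize import curve_fit

def b3g(r, a1, s1, a2, s2, a3, s3, b0):
    """Three-Gaussian beam profile plus a constant pedestal, cf. Eq. (3)."""
    return (a1 * np.exp(-r**2 / (2.0 * s1**2))
            + a2 * np.exp(-r**2 / (2.0 * s2**2))
            + a3 * np.exp(-r**2 / (2.0 * s3**2))
            + b0)

# Synthetic noiseless profile: a main beam plus two broad, faint components
r = np.linspace(0.0, 180.0, 200)
truth = (1.0, 5.0, 0.08, 20.0, 0.01, 60.0, 1e-4)
prof = b3g(r, *truth)

# Fit starting from a perturbed initial guess
popt, _ = curve_fit(b3g, r, prof, p0=(0.9, 6.0, 0.1, 25.0, 0.02, 50.0, 0.0))
```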
Table 5. Median best-fitting values of the parameters of the 3-Gaussian beam profile, as defined in Eq. ( 3), using the BM18 data set.

Main beam
The angular resolution of NIKA2 is characterised using the FWHM of a Gaussian fitted to the main beam. This principal Gaussian encloses most of the measured flux density of a point-like source.

Main-beam characterisation methods
To characterise the main beam and to derive an estimate of its FWHM, we developed three methods. The first two, named Prof-3G and Prof-1G, rely on a fit of the beam profile to benefit from the increase in signal-to-noise ratio after azimuthally averaging the signal. The last one, by contrast, consists of an elliptical Gaussian fit to the beam map for a better 2D modelling, and is labelled Map-1G. They are presented in more detail below.
The Prof-3G method consists of fitting the beam profile using the three-Gaussian function defined in Eq. (3). The main-beam FWHM estimate is given by the best-fitting value of the FWHM of the first Gaussian function. This estimate is expected to be immune to the first error beams and side lobes, which are well accounted for. For consistency checks, we also rely on two other methods that use simpler beam models.
The Prof-1G method relies on fitting a single Gaussian to the beam profile after masking the portion of the profile where the contributions of the side lobes and error beams are largest. Specifically, the side-lobe mask cuts out radii ranging from an inner radius r_in = 0.65 FWHM_0, where FWHM_0 is the reference Gaussian beam FWHM (see Sect. 8.1.1), to an outer radius r_out = 80′′, centred on the beam maximum. The profile is estimated up to a radius of 180′′, which is in the inner part of the beam map where the noise variance is uniform.
The Map-1G method consists of modelling the 2D distribution of the main beam using a 2D elliptical Gaussian of widths σ_x and σ_y. We characterise the NIKA2 main beam using

FWHM = 2 √(2 ln 2 σ_x σ_y).   (4)

As with Prof-1G, we use masked versions of the beam map to avoid side-lobe and error-beam contamination. While r_out is conservatively set to 100′′, r_in is left free to vary around a central value of about 8′′ for A1 and A3, and of about 12′′ for A2, to provide the best 2D Gaussian fit.
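Eq. (4) is the FWHM of the circular Gaussian whose width is the geometric mean of σ_x and σ_y; a minimal numerical check:

```python
import math

def effective_fwhm(sigma_x, sigma_y):
    """Effective main-beam FWHM from a 2D elliptical Gaussian fit, Eq. (4):
    FWHM = 2 sqrt(2 ln 2 sigma_x sigma_y), i.e. the usual Gaussian factor
    2 sqrt(2 ln 2) applied to the geometric mean of the two widths."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0) * sigma_x * sigma_y)

# The geometric mean makes an elongated beam and a circular beam of the
# same solid angle yield the same effective FWHM
fwhm_circ = effective_fwhm(4.0, 4.0)
fwhm_ell = effective_fwhm(2.0, 8.0)
```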

Data sets for the main-beam determination
We selected a sub-set of the beammap scans described in Sect. 6.1 by discarding scans of Mars. Indeed, beammaps towards Mars unveil the complex full-beam pattern, which extends beyond radii of 100′′, so that the annulus side-lobe mask used in Prof-1G and Map-1G is not sufficient to mitigate the error-beam and side-lobe effects. The 12 remaining beammap scans were analysed using the data reduction pipeline of Sect. 4 and projected onto maps with a resolution of 1′′ and an angular size of 10′. This data set is referred to as BM12.
We also considered series of shorter integration scans. We selected 5′ × 8′ raster scans of moderately bright to very bright point sources by thresholding the flux-density estimates at 1 Jy at both wavelengths. After the Baseline scan selection, as described in Sect. 4.6, the data set comprises 154 scans towards the giant planets Uranus and Neptune, the secondary calibrator MWC349, and the quasars 3C84, 3C273, 3C345, and 3C454 (also known as 2251+158). For this series of short scans, which is referred to as R154, the data were reduced using the Baseline data analysis pipeline and projected onto 2′′ resolution maps.
Finally, we used a series of 75 observation scans of Uranus and Neptune, which includes both beammap and 5 ′ × 8 ′ raster scans.This data set, which is referred to as UN75, consists of all the scans of Uranus and Neptune acquired during the Reference observation campaigns (N2R9, N2R12, and N2R14).

Results
We derived the main-beam FWHM for the three arrays individually, and for the 1 mm array combination, using the three methods presented in Sect. 6.2.1 and the data sets of Sect. 6.2.2. Namely, our main-beam FWHM estimates consist of (i) the median FWHM estimate using Prof-3G on the BM12 data set, (ii) the average FWHM estimate using Prof-1G on the UN75 data set, and (iii) the average FWHM estimates using Map-1G on both the BM12 and the R154 data sets. By comparing these results, we tested the stability of the FWHM estimates against the choice of the data set and of the estimation method.
In the case of Uranus, the FWHM estimates were further corrected for the average beam broadening induced by the extension of the apparent disc of the planet, which is 0.19″ and 0.12″ at 1 and 2 mm, respectively. During the observation period, Uranus' disc diameter varied from 3.3″ to 3.7″. This diameter variation translates into beam-broadening variations with an amplitude of a few hundredths of an arcsecond, which are neglected.
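The quoted broadening values can be reproduced with the common quadrature approximation for a Gaussian beam observing a uniform disc, FWHM_eff² ≈ FWHM² + (ln 2 / 2) D². This is a sketch under that assumed approximation (the formula is not stated in the text), using the reported beam sizes and a ~3.5″ disc:

```python
import numpy as np

def disc_broadening(fwhm_beam, disc_diameter):
    # Broadening of a Gaussian beam observing a uniform disc of diameter D,
    # using the quadrature approximation FWHM_eff^2 ~ FWHM^2 + (ln2/2) D^2
    # (an assumed standard formula, not quoted from the paper).
    fwhm_eff = np.sqrt(fwhm_beam**2 + 0.5 * np.log(2.0) * disc_diameter**2)
    return fwhm_eff - fwhm_beam  # broadening, same units as the inputs

# Uranus disc of ~3.5'' on the 11.1'' (1 mm) and 17.6'' (2 mm) beams
b1 = disc_broadening(11.1, 3.5)  # ~0.19''
b2 = disc_broadening(17.6, 3.5)  # ~0.12''
```

Evaluating the same formula at disc diameters of 3.3″ and 3.7″ shows the broadening varies by only a few hundredths of an arcsecond, consistent with the correction being taken as constant.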
The results of this analysis are gathered in Table 6, including uncertainties evaluated as the rms dispersion of single-scan-based FWHM estimates. Prof-1G and Map-1G results agree within uncertainties, whereas Prof-3G yields slightly smaller FWHMs. The latter method is more robust against the error beams and large-radii beam features than the former two, as expected. Combined results were obtained from an error-weighted average of the four FWHM estimates for each array. Because the rms errors estimated using the 12 beammap scans may be optimistic given the small statistics, they were conservatively increased to match the uncertainty estimates based on the R154 data set before performing the weighted average. The combined results, as given in Table 6, provide a robust evaluation of the FWHM. Thus, we report main-beam FWHMs of 11.1″ ± 0.2″ at 1 mm, and 17.6″ ± 0.1″ at 2 mm.
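The error-weighted combination of the four per-array estimates can be sketched as an inverse-variance average; the numbers below are illustrative placeholders, not the Table 6 values:

```python
import numpy as np

def combine(estimates, errors):
    # Inverse-variance (error-weighted) average of independent estimates,
    # with the usual combined standard error 1/sqrt(sum of weights).
    w = 1.0 / np.asarray(errors)**2
    mean = np.sum(w * np.asarray(estimates)) / np.sum(w)
    err = 1.0 / np.sqrt(np.sum(w))
    return mean, err

# Hypothetical per-method FWHM estimates and (conservatively matched) errors
fwhm, err = combine([11.2, 11.0, 11.1, 11.2], [0.2, 0.2, 0.3, 0.3])
```

The conservative step described above amounts to inflating the beammap-based entries of `errors` before calling `combine`.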

Stability checks
Figure 7 shows the main-beam FWHM estimates using Map-1G as a function of the atmospheric transmission, which is modelled as exp(−τ_ν x). The main-beam FWHM estimates using data of the three campaigns are in agreement within rms errors. Moreover, the main-beam FWHM is stable against atmospheric conditions at both wavelengths. Slightly lower than average values (about 11″) are observed in the best atmospheric conditions at 1 mm, providing us with a lower limit in the absence of correlated atmospheric noise residuals. We note three scans acquired during the N2R12 campaign with larger FWHMs than average at 2 mm, although the atmospheric transmission was excellent: this is likely an effect of atmospheric instabilities, which affected a large number of observation scans during N2R12.

Main-beam efficiency
We derive the main-beam efficiency for each array, which is defined as the ratio of the solid angle subtended by the main beam to the total beam solid angle. To estimate the total beam solid angle, we primarily used the maps of the beam pattern presented in Sect. 6.1, which provide us with an accurate representation of the main beam and of the first error beams. However, at angular scales larger than the radius of the noise de-correlation mask, 90″, the filtering effect induced by the data processing becomes non-negligible (see also Sect. 4). Furthermore, heterodyne observations at the IRAM 30-m telescope towards the lunar edge, and estimates of the forward beam efficiency using skydips, show that a significant fraction of the full beam stems from angular scales much larger than 90″ (Greve et al. 1998; Kramer et al. 2013). Thus, the accurate assessment of the total beam solid angle requires us to account for the filtering effect and the large angular-scale contributions. To that aim, we resort to a hybrid approach. We utilise both the full-beam pattern measurements with NIKA2, as presented in Sect. 6.1, and the results of the IRAM 30-m telescope beam pattern characterisation using EMIR front-end observations (Carter et al. 2012), as reported in Kramer et al. (2013). These studies gave two different outcomes.
Firstly, the main beam and error beams were measured using observations towards the limbs of the Moon. As EMIR detectors are coupled to the 30-m telescope entrance pupil via corrugated horns, the contribution of the first error beam is significantly attenuated compared to the NIKA2 case. In fact, the first error beam measured by NIKA2 was not detected with EMIR, while the second error beam was measured with both NIKA2 and EMIR at compatible levels. The third and fourth error beams, which originate from the IRAM 30-m telescope frame misalignment and panel deformations, respectively, are expected to be measured by EMIR without attenuation (Kramer et al. 2013). We assume that these are characteristics of the 30-m telescope without any significant dependence on the receiver instrument. Secondly, the IRAM 30-m telescope forward efficiencies F_eff were derived using skydip scans with EMIR. These measurements allow us to estimate the far side-lobe efficiencies, which receive two different contributions: the forward spillover and scattering efficiency is estimated as F_eff minus the main-beam and error-beam efficiencies, and the rearward spillover and scattering efficiency equals 1 − F_eff (Kramer et al. 2013).
To sum up, we estimate the solid angle of the total beam as

Ω_tot(ν) = 2π ∫₀^∞ [ B̃_ν(r) + A_ν^(3) G_ν^(3)(r) + A_ν^(4) G_ν^(4)(r) ] r dr + Ω_FSL(ν),    (5)

where B̃_ν(r) is the normalised beam profile B_ν(r) for the array ν, as discussed in Sect. 6.1, after rescaling with a factor of (1 − A_ν^(3) − A_ν^(4)). A_ν^(3) and A_ν^(4) are the amplitudes of the third and fourth Gaussian error beams G_ν^(3) and G_ν^(4), respectively, which were measured with EMIR and extrapolated for the array ν following the prescription given in Kramer et al. (2013). Ω_FSL(ν) is the contribution of the far side lobes to the total beam solid angle of the array ν, as derived from the F_eff measurements.
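Numerically, assembling the total beam solid angle from a radial profile plus error-beam and far-side-lobe terms can be sketched as below. All inputs here are hypothetical stand-ins (a Gaussian in place of the measured profile, invented error-beam amplitudes and widths, and an invented far-side-lobe solid angle), chosen only to make the integration runnable:

```python
import numpy as np
from scipy.integrate import trapezoid

ARCSEC = np.pi / (180.0 * 3600.0)  # arcsec -> rad

def gaussian_profile(r, fwhm):
    s = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-r**2 / (2.0 * s**2))

# Hypothetical inputs: radial grid in arcsec, stand-in rescaled profile,
# error-beam amplitudes/widths, far-side-lobe solid angle in sr
r = np.linspace(0.0, 1200.0, 12001)
profile = (1.0 - 5e-4 - 1e-4) * gaussian_profile(r, 11.1)
err3 = 5e-4 * gaussian_profile(r, 130.0)
err4 = 1e-4 * gaussian_profile(r, 300.0)
omega_fsl = 3.0e-10

# Azimuthally symmetric solid-angle integral: 2 pi * int f(r) r dr
omega_tot = (2.0 * np.pi * trapezoid((profile + err3 + err4) * r, r)
             * ARCSEC**2 + omega_fsl)
omega_mb = 2.0 * np.pi * trapezoid(profile * r, r) * ARCSEC**2
main_beam_efficiency = omega_mb / omega_tot
```

With the real measured profile and the EMIR-derived terms, `main_beam_efficiency` corresponds to the quantity tabulated later in this section.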
In addition, for cross-checks and stability tests, we also compute the total beam solid angle up to a maximum radius,

Ω_90(ν) = 2π ∫₀^r_max B̃_ν(r) r dr,

where r_max is taken as the largest radius for which the filtering effect of the data processing has no significant impact. We take r_max as the radius of the noise de-correlation mask, which is 90″, and evaluate Ω_90 from the measured beam profiles obtained using the UN75 data set (see Sect. 6.2.2). Results per observation campaign, with uncertainties evaluated as the rms scatter on the average, are given in Table 7; for each case, the number of scans is given in the column labelled "nbs". The Ω_90 estimates are statistically compatible from one campaign to another, with some variations of the average value due to the dispersion in the focus settings and atmospheric conditions that prevailed during each campaign. The combined results, based on the whole UN75 data set, are the inverse-variance weighted average of the Ω_90 for the three campaigns, while the error bars are conservatively estimated as the maximum half difference between the Ω_90 estimates.
The main-beam solid angle Ω_mb is evaluated from the main-beam FWHM as Ω_mb = 2π σ_mb², where FWHM = 2 √(2 ln 2) σ_mb. The main-beam efficiency is evaluated using both the UN75 and the BM12 data sets, as presented in Sect. 6.2.2. For cross-checks, we compare the results based on three estimates of the total-beam and main-beam solid angles: (1) BE1 relies on the best-fitting parameters of the three-Gaussian model of the full beam, as given in Eq. (3), to derive both the main-beam solid angle and the first two error-beam contributions to the total-beam solid angle. The main-beam solid angle thus corresponds to the volume enclosed by the first Gaussian, as obtained using Prof-3G, while the normalised beam profile in Eq. (5) is the normalised best-fitting B_3G(r); (2) BE2 consists of using the normalised beam profile measurements from the UN75 data set to estimate Ω_tot, while Ω_mb is derived from the FWHM obtained using Prof-1G (see Sect. 6.2.1); (3) BE3 is similar to BE2, but relies on the BM12 data set, while the main-beam FWHM is derived using Map-1G. For all methods, the contributions of the third and fourth error beams and of the far side lobes, which feature in Eq. (5), are the same.
The main-beam efficiency estimates using the three methods are gathered in Table 8: central values and error bars are evaluated as the median and the rms error of the estimates on individual observation scans, respectively. We combined the results of the three methods, which agree with each other, using an inverse-variance weighted average and a quadratic mean of the rms uncertainties. The main-beam efficiency uncertainties, as given in Table 8, also include uncertainties on the third and fourth error-beam contributions and on the 30-m telescope forward efficiencies, as reported in Greve et al. (1998) and Kramer et al. (2013). Using the combined results, we report main-beam efficiencies of 47 ± 3% at 1 mm, and 64 ± 3% at 2 mm.
Finally, we compute the total beam solid angle and main-beam efficiency using the combination of the three previously described methods, in three cases that successively include larger angular-scale contributions to the full beam: Ω_90 and BE_90 are the total beam solid angle and main-beam efficiency integrated up to 90″, which include the main beam and the two lower angular-scale error beams, as measured in NIKA2 beam maps; Ω_hyb and BE_hyb further account for the third and fourth error beams, as derived from EMIR front-end measurements; Ω_tot and BE are the final estimates that comprise all beam contributions. The results are given in Table 9 for the 1 mm and 2 mm channels. They give insight into the relative importance of the contributions at various angular scales. For example, 18% and 13% of the total beam solid angle stem mainly from the two largest error beams integrated at angular scales larger than 90″, and 9% and 7% originate from the far side lobes, at 1 and 2 mm, respectively.

Atmospheric opacity
The atmospheric opacity constitutes the ultimate limitation of ground-based experiments. Only a fraction of the source signal is transmitted by the atmosphere and reaches the NIKA2 detectors. The relation between the observed flux density S_ν^obs and the top-of-the-atmosphere flux density S_ν is parametrised by the zenith opacity τ_ν and the air mass x as

S_ν^obs = S_ν exp(−τ_ν x).

An accurate derivation of the atmospheric opacity for each scan is of utmost importance to retrieving the source signal and thus to ensuring small calibration uncertainties. We developed three atmospheric opacity derivation methods, which are described in Sect. 7.1. In Sect. 7.2, we present robustness tests.
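The attenuation relation can be inverted directly to recover the top-of-the-atmosphere flux density; a minimal sketch, assuming the plane-parallel air mass x = 1/sin(el):

```python
import numpy as np

def topofatm_flux(s_obs, tau_zenith, elevation_deg):
    # Correct an observed flux density for atmospheric attenuation:
    # S_nu = S_obs * exp(tau_nu * x), with air mass x = 1 / sin(el).
    x = 1.0 / np.sin(np.radians(elevation_deg))
    return s_obs * np.exp(tau_zenith * x)

# e.g. tau = 0.2 at zenith, source at 30 deg elevation -> x = 2
s = topofatm_flux(1.0, 0.2, 30.0)  # exp(0.4) ~ 1.49 Jy for a 1 Jy observation
```

The whole of Sect. 7 is then about obtaining a reliable per-scan `tau_zenith`.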

Atmospheric opacity estimation
We developed three procedures for the atmospheric opacity derivation: (i) taumeter relies on measurements provided by the resident IRAM taumeter operated at 225 GHz; (ii) skydip consists of using NIKA2 as a total-power taumeter by resorting to a series of skydip scans; (iii) corrected skydip is a modified version of the skydip method that minimises the dependence of the estimated flux density on the opacity. For the sake of robustness, none of the methods relies on an ATM model or on any hypothesis on the atmospheric contents; for more accuracy, none uses the laboratory measurements of the bandpass (see Sect. 2.5).

The taumeter method
The IRAM 30-m telescope facility is equipped with a resident taumeter operated at 225 GHz. Every four minutes, it performs elevation scans at fixed azimuth to monitor the atmospheric opacity. The IRAM science support team provides us with time-stamped zenith opacities at 225 GHz, τ_225, as derived from the taumeter measurements. Two different τ_225 estimates are available: one relying on a linear fitting model, and the other on an exponential fitting model. They are then filtered by removing outliers and by using a threshold on goodness-of-fit criteria. Based on IRAM experience, we use the linear-fit, filtered τ_225 data for the NIKA2 analysis. The time-stamped τ_225 estimates, which are thus sampled about every four minutes, are interpolated to the time of the NIKA2 scans (we consider the time of the middle of the scan). As a cross-check, we also produce a smoothed version of the time-stamped τ_225 data by filtering with a running median of seven samples, which is then interpolated to the NIKA2 scan times.
We fit the relations between the IRAM 225 GHz taumeter opacities and the NIKA2 in-band opacities using observations of calibration sources that span a large range of air masses. This method has the advantage of relying neither on an atmospheric model nor on the bandpass measurements in the laboratory. We use a series of 64 scans of MWC349, which consists of the Baseline-selected sub-set of the 68 scans available for this source during N2R9. It constitutes a data set that is homogeneous in flux density, but heterogeneous in atmospheric conditions: zenith opacities at 225 GHz range from 0.08 to 0.32, and elevations from 23° to 73°, spanning a large range of air mass as required. NIKA2 opacities τ_ν, for ν corresponding to the observing frequency of Arrays 1, 2, 3, and the combination of Arrays 1 and 3, are estimated from the 225 GHz taumeter median-filtered linear-based opacity estimates τ_225 as

τ_ν = a_ν^225 + b_ν^225 τ_225.    (9)

The parameters a_ν^225 and b_ν^225 are fitted to the data set so that the source flux density is constant within scans. We tested two estimators of the flux stability. The first one relies on minimising the standard deviation of the measured-to-median flux-density ratios after atmospheric opacity correction using Eq. (10). The second one is obtained by minimising the χ² of the flux-density estimates with respect to their median, where σ is the rms uncertainty of the flux-density estimates. Note that these estimators do not depend on the absolute scale of the flux density of the source. Both estimators yield consistent results, which are combined. We highlight that the atmospheric opacity correction using taumeter opacities yields, by construction, flux density measurements that are stable with respect to the atmospheric transmission, as we verify in Sect. 9. This means that the best-fitting parameters of the model in Eq. (9) also correct for any line-of-sight opacity-dependent systematic effects on the flux densities, as is further discussed in Sects. 7.1.4 and 7.2.
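The fit of the linear coefficients by flux-stability minimisation can be sketched on synthetic data. This is an illustration only: the data set, the true coefficients, and the optimiser settings are invented, and only the first (scatter-based) estimator is shown:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic calibrator scans (illustrative, not the MWC349 data set):
# the in-band opacity follows tau_nu = a + b * tau_225, and the source
# has a constant intrinsic flux density of 1 Jy.
a_true, b_true = 0.02, 1.5
tau225 = rng.uniform(0.08, 0.32, 64)
airmass = 1.0 / np.sin(np.radians(rng.uniform(23.0, 73.0, 64)))
s_obs = np.exp(-(a_true + b_true * tau225) * airmass)

def scatter(params):
    # Flux-stability estimator: scatter of the measured-to-median
    # flux-density ratios after opacity correction.
    a, b = params
    s_corr = s_obs * np.exp((a + b * tau225) * airmass)
    return np.std(s_corr / np.median(s_corr))

res = minimize(scatter, x0=(0.0, 1.0), method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-9})
a_fit, b_fit = res.x
```

As in the text, the estimator depends only on flux ratios, so the absolute flux scale of the calibrator drops out.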
Because the IRAM taumeter observes at a fixed azimuth, the taumeter opacities are not the line-of-sight opacities for the observation scans. As a result, as is checked in Sect. 9, this induces larger rms errors of the top-of-the-atmosphere flux-density estimates compared to opacity-correction methods that rely on NIKA2 skydip-based measurements. The taumeter method can thus be used as an alternative method in case of failure of the NIKA2 skydip-based methods, and to perform consistency checks.

NIKA2 skydip-based method
The NIKA2 skydip method for the opacity derivation consists of using the NIKA2 instrument as an in-band total-power taumeter. The opacity integrated in the NIKA2 bandpasses and in the line of sight of the observing scan is thereby directly obtained. This idea, which was successfully tested with NIKA (Catalano et al. 2014), relies on the fact that the resonance frequency of each KID varies linearly with the total power, as discussed in Sect. 2.4. Firstly, we have to calibrate the relationship between total power and opacity. Then, we can use this calibration to measure the opacity during a given scan.
Firstly, we detail the opacity calibration. For each KID k, the absolute value of its resonance frequency f_reso^k varies with the atmospheric load according to

f_reso^k = c_0^k − c_1^k T_atm (1 − exp(−τ_ν x)),    (12)

where c_0^k is a constant equal to the KID resonance frequency at zero opacity, c_1^k is a calibration conversion factor in Hz K⁻¹, and T_atm is the temperature of the atmosphere. By assuming a homogeneous plane-parallel atmosphere, the air mass x is defined from the elevation as x = (sin el)⁻¹. The Earth's sphericity can be safely neglected at the elevations in question here. We take T_atm as a constant equal to 270 K. However, the opacity is expected to depend slightly on the atmospheric temperature. For example, in poor weather conditions (6 mm of water vapour content), the zenith opacity in both observing bands can vary by about 10% for temperature variations of 10 K. The effect of the temperature variations on the final calibration of the NIKA2 response to the sky load is mitigated by using several dedicated observation scans regularly distributed all along an observation campaign.
The c_0^k and c_1^k are determined using skydip scans, which consist of moving the telescope in elevation step by step, as defined in Sect. 3.3. For each KID k, the evolution of f_reso^k is monitored as a function of the air mass in each elevation step to perform a joint fit of the zenith opacity τ_ν and the c_0^k and c_1^k coefficients. All skydips, obtained under various opacity conditions, are analysed together to break the degeneracy between the opacity and the Hertz-to-Kelvin conversion factor c_1^k. The degeneracy occurs mostly in low-opacity conditions, for which we can only determine the combination c_1^k τ_ν x. The procedure has two steps. Firstly, all the skydip scans are analysed individually to extract f_reso^k for each stable elevation and for each KID. Secondly, a simultaneous fit is done for all parameters (one τ_ν per skydip, and a set of c_0^k and c_1^k for all KIDs). Figure 8 illustrates the fitting procedure.

[Fig. 8 caption: The KID resonance frequency is shown as a function of T_atm(1 − exp(−τ_ν x)) in Kelvin, where τ_ν is the skydip zenith opacity found in the fit. The vertical axis shows the relative resonance frequency of the KID with respect to 1.9 GHz, given in MHz. The blue line is the linear model using the best-fit c_0^k and c_1^k coefficients (see Eq. (12)).]
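A strongly simplified version of the joint fit can be sketched for a single KID and two skydips (the real procedure fits all KIDs in blocks and one opacity per skydip). All numerical values below are hypothetical, and T_atm is fixed at 270 K as in the text:

```python
import numpy as np
from scipy.optimize import curve_fit

T_ATM = 270.0  # K, fixed atmospheric temperature as in the text

def joint_model(X, c0, c1, tau1, tau2):
    # Resonance frequency (relative to a fixed readout frequency) of one
    # KID versus air mass, with one zenith opacity per skydip scan.
    x, scan = X
    tau = np.where(scan == 0, tau1, tau2)
    return c0 - c1 * T_ATM * (1.0 - np.exp(-tau * x))

# Two synthetic skydips with 11 elevation steps each (hypothetical values)
x = np.concatenate([np.linspace(1.05, 2.5, 11)] * 2)
scan = np.repeat([0, 1], 11)
truth = (5.0e5, 2.0e3, 0.15, 0.45)  # c0 [Hz], c1 [Hz/K], tau_1, tau_2
f_reso = joint_model((x, scan), *truth)

popt, _ = curve_fit(joint_model, (x, scan), f_reso,
                    p0=(4.0e5, 1.0e3, 0.2, 0.2))
```

Including a high-opacity skydip (here tau_2 = 0.45) is what breaks the c_1-opacity degeneracy, mirroring the selection requirement discussed below.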
This fit is performed on blocks of 40 KIDs. We check that the resulting τ_ν from the different blocks are consistent within rms errors, which are equal to about 4 × 10⁻³ at 1 mm, and 1 × 10⁻³ at 2 mm. Once the τ_ν values are estimated for each skydip (as the average over the blocks), we compute, while keeping the τ_ν fixed, the final c_0^k and c_1^k values for each KID k with a linear fit. We thus retrieve the coefficients of all the KIDs, even those that could not contribute to the τ_ν determination.
We have observed that the c_0^k and c_1^k coefficients vary between observation campaigns due to a change in the KID properties from one cool-down to another. However, by comparing the results of different skydips, we verified that the coefficients c_0^k and c_1^k are stable, within the fitting errors, over very long time scales within a cool-down cycle. The coefficients can thus be applied to the whole observing campaign for the opacity derivation. Specifically, the opacity is retrieved for each observation scan by inverting Eq. (12) as

τ_ν x = −ln( 1 − median_k [ (c_0^k − f_reso^k) / (c_1^k T_atm) ] ),    (13)

where the median is evaluated using all the valid KIDs of the concerned arrays. Therefore, we are able to derive an opacity integrated in the NIKA2 bandpasses and in the line of sight of the source in the considered observation scan.
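The per-scan inversion can be sketched as follows, assuming the response model f_reso = c_0 − c_1 T_atm (1 − exp(−τ x)) of Eq. (12); the coefficient values are hypothetical:

```python
import numpy as np

T_ATM = 270.0  # K

def line_of_sight_opacity(f_reso, c0, c1):
    # Retrieve tau_nu * x from the skydip-calibrated KID responses,
    # assuming f_reso = c0 - c1 * T_ATM * (1 - exp(-tau * x)); the median
    # over valid KIDs makes the estimate robust to outliers.
    depth = (np.asarray(c0) - np.asarray(f_reso)) / (np.asarray(c1) * T_ATM)
    return -np.log(1.0 - np.median(depth))

# Round trip with hypothetical coefficients for three KIDs
c0 = np.array([5.0e5, 4.8e5, 5.2e5])
c1 = np.array([2.0e3, 2.1e3, 1.9e3])
tau_x = 0.3
f = c0 - c1 * T_ATM * (1.0 - np.exp(-tau_x))
tau_est = line_of_sight_opacity(f, c0, c1)  # recovers 0.3
```

Dividing the result by the scan's air mass x yields the zenith opacity.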

Skydip scan selection
The skydip opacity derivation requires a sizeable number of skydip scans (typically ten to twenty) that (i) span the whole opacity range, and (ii) avoid a highly perturbed atmosphere, to meet the plane-parallel atmosphere assumption. To that aim, we perform a skydip scan twice a day during a scientific campaign. Then, the (c_0^k, c_1^k) determination process relies on a selection of the skydip scans.
For each skydip scan and for each KID, we compute the difference between the measured KID resonance frequency and the model given in Eq. (12), taken at the best-fit values of the (c_0^k, c_1^k) parameters, which is named df_reso^k. Then, we determine two indicators of the fit quality per skydip. Firstly, for each block of 40 KIDs, the standard deviation of df_reso^k is calculated over all the KIDs of the block. This standard deviation per KID block is called σ_40. For each skydip, we evaluate Median rms, which is the median of σ_40 over all the KID blocks.
This fit-quality indicator is also sensitive to the noise level during the skydip. We therefore devise a second fit-quality indicator to further measure the bias between the data and the best-fit model. Namely, for each skydip, we compute the average df_reso^k of each KID k and convert this quantity from Hertz to Kelvin using the corresponding c_1^k parameter. This cross-calibration allows us to compare the df_reso^k estimates from different KIDs. Median dT is the median of the average df_reso^k in Kelvin over all the KIDs of an array. With these two indicators in hand, we discard the skydip scans that are noisy or that yield a poor fit by applying the following selection criteria:
- Median rms < 1.5 × 10⁴ Hz,
- Median dT < 1.6 K.
The threshold values were determined using the set of 44 skydip scans of N2R9. The Median rms cut corresponds to twice the median of this quantity per skydip scan, whereas the Median dT cut is twice the standard deviation of Median dT over the skydip scans. The N2R9 skydip scan selection is illustrated in Fig. 9, which shows the complementarity between the two fit-quality criteria. After selection, 15 skydips are kept for the final step of the (c_0^k, c_1^k) fit in the case of the N2R9 campaign.
The (c_0^k, c_1^k) estimation proceeds in two steps: firstly, the parameters are estimated using all the available skydip scans for a given campaign; then the estimation is repeated using only the skydip scans that meet the fit-quality criteria. After the second iteration, we check that no extra skydip scan outliers are left, as shown by the data points labelled "v2" in Fig. 9. The stability of the (c_0^k, c_1^k) parameters, and hence of the skydip opacity estimates, was tested against the choice of the selection criteria. We found that the τ_ν estimates are robust against the skydip scan selection as long as the selection includes good skydip scans in high-opacity conditions (τ_1mm > 0.44), and excludes the poorly fitting skydip scans, which mostly correspond to deviations from the plane-parallel atmosphere model in high-opacity skydip scans (as seen in Fig. 9).

Corrected skydip
The opacity estimates are ultimately tested by assessing the stability of the top-of-the-atmosphere flux densities of bright sources over a large range of atmospheric conditions, as is addressed in Sect. 9. After opacity correction using the skydip τ_ν estimate, the calibration flux-density measurements show a residual dependency on the atmospheric transmission, as discussed in Sect. 9. This has motivated the development of a corrected version of the skydip method that ensures the robustness of the flux densities against atmospheric conditions.

[Fig. 9 caption: The Median dT fit-quality criterion is plotted as a function of the Median rms criterion for each skydip scan of the N2R9 campaign and for the three arrays. The skydips that yield a poor fit of the KID resonance frequencies, and hence do not meet the Median dT criterion, are also discarded using Median rms; the latter criterion further discards noisy skydips. Empty diamonds show the results of the first iteration of the skydip coefficient estimation, labelled "v1", whereas filled circles show the second iteration, labelled "v2", for which only the skydips that met both fit-quality criteria are included. After the second iteration, all the remaining skydips meet the criteria.]
As already noted in Sect. 7.1.2, for small values of the line-of-sight opacity τ_ν x, only the parameter combination c_1^k T_atm τ_ν is constrained in Eq. (12). Degeneracies between the c_1^k parameters, the atmospheric temperature, and τ_ν can translate into a scaling factor in the fitted skydip opacities τ_ν^skydip. To take this effect into account, we used the flux-stability estimators described in Sect. 7.1.1 to fit a correction to τ_ν^skydip such as

τ_ν = a_ν^skydip τ_ν^skydip.    (14)

We find a_ν^skydip of 1.36 ± 0.04, 1.23 ± 0.02, 1.27 ± 0.03, and 1.03 ± 0.03 for A1, A3, A1 and A3, and A2, respectively. Moreover, we tested for an additional offset in the correcting relation of the skydip opacities given in Eq. (14). We found best-fitting correcting factors in agreement with the best-fit values estimated using the single-parameter correcting relation, whereas the best-fit offsets were compatible with zero at both wavelengths. We conclude that correcting the skydip opacity estimates for a normalisation, as given in Eq. (14), is sufficient to ensure flux-density robustness against atmospheric opacity conditions.
The exact physical origin of the discrepancy of the empirical factor a_ν^skydip from the expected unity value in the 1 mm wavebands is currently under investigation. Besides the c_1^k-to-τ_ν degeneracy and the effect of the variation of the atmospheric temperature, other explanations include effects that are not directly related to the atmospheric opacity derivation, as the empirical factor is measured on the flux densities. However, the stability of the flux densities corrected using the corrected skydip opacities, and the consistency of the results across several observation campaigns, as discussed in Sect. 9, constitute a validation of this approach.

Opacity estimate consistency checks
Firstly, we tested the stability of the skydip opacities from one observation campaign to another. Panels a in Fig. 10 show that the τ_ν-to-τ_225 correlation relations are consistent within statistical errors for the three campaigns. At 1 mm, they are also in agreement with the ATM model expectations, while at 2 mm, the ATM model underestimates the measured skydip τ_ν. Possible explanations include a ground pick-up effect, which would have comparatively more impact on the opacity derivation at 2 mm, where the atmosphere is more transparent, than at 1 mm. A fraction of the beam indeed detects radiation from the ground, as evidenced by forward-efficiency measurements using heterodyne front ends (see Sect. 6.3). This mild discrepancy with the ATM model predictions is yet to be understood, but it has no impact on our opacity measurements, which rely neither on this model nor on the precision with which the observing bandpasses are known. Further consistency tests with ATM expectations will be performed in the future using in-situ bandpass measurements with a dedicated Martin-Puplett spectrometer.
We further checked the robustness of τ_ν^skydip against the observing elevation. Panel b in Fig. 10 shows the ratio of the NIKA2 skydip opacities to the 225 GHz taumeter opacities as a function of the average scan elevation. The skydip opacity measurements have no significant dependency on the elevation. Moreover, we observe that the results are consistent for Uranus beammap scans and for the shorter raster scans of the secondary calibrator MWC349, indicating that the skydip opacity estimates do not depend on the type of observation scan.
In summary, the NIKA2 skydip opacity estimates (i) have reproducible correlation coefficients with the 225 GHz taumeter opacities from one campaign to another, (ii) are robust against the observing conditions, and (iii) are stable for various sources and scanning strategies. In addition to these properties, the corrected skydip opacities further ensure flux-density measurements that are immune to the atmospheric and observing-elevation conditions. We use them in the Baseline calibration.

Calibration
In this section, we present the absolute calibration of the flux densities. We use Uranus as the main primary calibrator. Section 8.1 describes the absolute calibration method, and Sect. 8.2 presents the inter-calibration of all the KIDs and the flat fields. While gathering several observations of calibrators, we evidenced a daily variation of the absolute calibration coefficients, related to a daily variation of the beam size induced by the ambient temperature. If left uncorrected, it leads to a sizeable increase of the calibration uncertainties. To overcome this issue, we flag the most impacted observation times of the day and exclude them from further analysis. We discuss this effect in Sect. 8.3. In Sect. 8.4, the Baseline calibration procedure is summarised and stability tests are performed.

Absolute calibration procedure and photometric system
Here, we detail the procedure for calibrating the absolute scale of the flux density and the chosen photometric system.

Photometric system
The main primary calibrators of NIKA2 are the giant planets Uranus and Neptune; the latter is used when the former is not visible in the most stable observing conditions. The flux-density expectations of the primary calibrators are derived in Appendix A.

Table 11. Reference photometric system.

                          1 mm      2 mm
Reference frequency, ν_0  260 GHz   150 GHz
Reference FWHM, FWHM_0    12.5″     18.5″

The reference frequencies are chosen arbitrarily to be ν_0 = 150 GHz for the 2-mm array, and ν_0 = 260 GHz for both 1-mm arrays. After projecting the raw data (in units of the KID resonance-frequency shift, Hz) of a calibrator c on the sky, we model the calibrator raw map as a fixed-width 2D Gaussian,

R_c(θ, φ) = A_c exp( −(θ² + φ²) / (2 σ_0²) ),

where A_c is the amplitude of the Gaussian in Hz, and σ_0 is derived from the reference FWHM, labelled FWHM_0, which is 12.5″ for the 1-mm arrays and 18.5″ for the 2-mm array. These values were chosen to be larger than the main-beam values reported in Sect. 6, to account for a fraction of the signal stemming from the first error beam and first side lobes. Both the reference frequency, ν_0, and the reference FWHM, FWHM_0, define our reference photometric system, as summarised in Table 11.
The absolute calibration coefficients are estimated from observations of primary calibrators c, as the ratio of the calibrator flux-density expectation at the reference frequency, S_c(ν_0), to A_c. Then, for any observed point-like source s of projected map R_s(θ, φ) in Hz, the map is calibrated in Jy/beam. The best-fit amplitude estimate of the fixed-width FWHM_0 Gaussian on this map directly gives an estimate of the flux density of the source at the reference frequency, S(ν_0), excluding colour corrections.

Colour correction
The flux-density estimate S(ν_0) gives the flux of the source at the reference frequency only if the source has the same spectral behaviour as the calibrator. In general, to retrieve the flux of the source at the reference frequency, a colour correction C_s has to be applied,

S_s(ν_0) = C_s S(ν_0),

which depends on the reference frequency ν_0, the source SED I_ν^s, and the NIKA2 bandpasses. Neglecting the effect of the atmosphere on the NIKA2 transmission, we compute the colour-correction factor for target sources with SEDs that differ from that of Uranus using

C_s = [ ∫ T(ν) (I_ν^c / I_ν0^c) dν ] / [ ∫ T(ν) (I_ν^s / I_ν0^s) dν ],

where T(ν) is the NIKA2 transmission (Sect. 2.5). Assuming Rayleigh-Jeans power-law SEDs for the calibrator (I_ν^c ∝ (ν/ν_0)^α_c) and the source (I_ν^s ∝ (ν/ν_0)^α_s), and a spectral index α_c = 1.6 for Uranus, we provide colour-correction factors for eight values of the spectral index of the source, α_s, in Table 12.
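The colour-correction integral can be evaluated numerically for power-law SEDs. The bandpass below is an illustrative flat top-hat around 260 GHz, not the measured NIKA2 transmission:

```python
import numpy as np
from scipy.integrate import trapezoid

def colour_correction(nu, transmission, nu0, alpha_c, alpha_s):
    # Ratio of band-integrated calibrator and source SEDs, each
    # normalised at nu0, for Rayleigh-Jeans power laws (nu/nu0)^alpha.
    num = trapezoid(transmission * (nu / nu0) ** alpha_c, nu)
    den = trapezoid(transmission * (nu / nu0) ** alpha_s, nu)
    return num / den

# Hypothetical flat bandpass, 230-290 GHz
nu = np.linspace(230e9, 290e9, 601)
T = np.ones_like(nu)
c = colour_correction(nu, T, 260e9, alpha_c=1.6, alpha_s=3.0)  # slightly < 1
```

As expected, a source with the same spectral index as the calibrator needs no correction (the function returns exactly 1 when alpha_s equals alpha_c).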

Extended sources
While point-source studies are usually performed with maps given in Jy beam⁻¹, diffuse-emission or aperture-photometry studies require maps expressed in Jy sr⁻¹. To that aim, we need to take into account the full-beam pattern, as discussed in Sect. 6.3. Firstly, we correct with the solid angle enclosed in the reference fixed-width Gaussian beam, Ω_0 = 2π σ_0², to obtain a map homogeneous with Jy sr⁻¹. Then, to account for the signal in the total beam pattern, we further correct with the reference beam efficiency BE_0. As it is defined as the ratio of Ω_0 to the total beam solid angle Ω_tot, BE_0 represents the fraction of the full-beam solid angle that is enclosed in the solid angle subtended by the reference beam. To summarise, the map in Jy sr⁻¹, M(θ, φ), relates to the map in Jy beam⁻¹, M_s(θ, φ), via

M(θ, φ) = (BE_0 / Ω_0) M_s(θ, φ) = M_s(θ, φ) / Ω_tot.

In Table 13, we provide the estimates of the reference beam efficiency, which are derived from the measured Ω_tot as given in Sect. 6.3.
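The two-step unit conversion can be sketched as follows; this is an assumed reading of how the two corrections combine (divide by Ω_0, then apply BE_0 = Ω_0/Ω_tot, equivalent to dividing by Ω_tot), and the 60% efficiency is a hypothetical value:

```python
import numpy as np

ARCSEC = np.pi / (180.0 * 3600.0)  # arcsec -> rad

def jybeam_to_jysr(map_jybeam, fwhm0_arcsec, be0):
    # Divide by the reference-beam solid angle Omega_0 = 2 pi sigma_0^2,
    # then apply BE_0 = Omega_0 / Omega_tot; the combined factor equals
    # 1 / Omega_tot (an assumption on how the corrections combine).
    sigma0 = fwhm0_arcsec * ARCSEC / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    omega0 = 2.0 * np.pi * sigma0**2
    return map_jybeam * be0 / omega0

m_sr = jybeam_to_jysr(1.0, 12.5, 0.60)  # hypothetical BE_0 of 60 %
```

In practice `be0` would be taken per array from Table 13.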

Calibration procedure
In practice, the calibration procedure is performed in two steps. Firstly, the S_c(ν_0)-to-A_c ratios per detector, G_k, are estimated for each KID k using the per-KID map projected from a beammap scan of a calibrator. The calibration coefficient G_k for the KID k is computed at zero atmospheric opacity as

G_k = S_c(ν_0) exp(−τ_ν x) / A_k,    (20)

where S_c(ν_0) is the expected flux density of the calibrator at the reference frequency ν_0 (see Appendix A.1), τ_ν x is the line-of-sight opacity measured using the corrected skydip method (see Sect. 7), and A_k is the best-fit amplitude of the reference FWHM_0 Gaussian, which is fitted in the KID map. This first step accounts for both the relative calibration between KIDs and the absolute calibration using a single calibrator scan. Secondly, the absolute calibration is further refined by evaluating a flux-density rescaling factor using a series of observations of Uranus or Neptune. After the first step of the calibration is performed, the KID TOIs are projected onto a calibrated map M_ν, as described in Sect. 4, where ν stands for the three arrays and the 1-mm array combination, and the atmospheric attenuation is corrected. For each of the calibrator observation scans, we compute the ratio between the expected calibrator flux density S_c(ν_0) and the measured calibrator flux density in M_ν, which is estimated as described in Sect. 8.1.1. The flux-density rescaling factor per array is the average expected-to-measured flux-density ratio over all the selected calibrator scans.
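The first step can be sketched as a round trip; this is a minimal illustration assuming the coefficient is defined so that applying it together with the opacity correction recovers the calibrator flux (all numbers are hypothetical):

```python
import numpy as np

def kid_calibration_coefficient(s_cal_jy, amp_hz, tau_los):
    # Per-KID coefficient at zero atmospheric opacity: the expected
    # calibrator flux density, attenuated to the measured line-of-sight
    # opacity, divided by the fitted Gaussian amplitude in Hz.
    return s_cal_jy * np.exp(-tau_los) / amp_hz

def calibrated_flux(amp_hz, g_k, tau_los):
    # Apply the coefficient and correct for the atmospheric attenuation.
    return g_k * amp_hz * np.exp(tau_los)

# Round trip: 10 Jy calibrator, 2e4 Hz fitted amplitude, tau*x = 0.3
g = kid_calibration_coefficient(10.0, 2.0e4, 0.3)  # Jy per Hz
s = calibrated_flux(2.0e4, g, 0.3)                 # recovers 10 Jy
```

The second step (the per-array rescaling factor) then averages expected-to-measured flux ratios over many such calibrator scans.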
The primary calibrator scans are first selected as discussed in Sect. 4.6. Then, in addition to the Baseline scan-selection cuts, we use a Gaussian beam-size criterion: the FWHM estimated from the planet observation map is required to be lower than 12.5′′ at 1 mm and lower than 18′′ at 2 mm. By further mitigating the flux scatter due to beam broadening, this cut ensures a better accuracy of the absolute calibration.

Relative calibration and flat fields
While the absolute calibration of each KID de facto also provides the relative calibration, the latter is interesting in itself to characterise the instrument. We focus on this aspect in this section.

Table 13. Reference beam parameters (FWHM_0 and BE_0) for the NIKA2 arrays.

                   A1     A3     A1 and A3   A2
FWHM_0 [arcsec]    12.5   12.5   12.5        18.5
BE_0 [%]           60     …      …           …

The dispersion of the detector responsivity across the field of view (also known as flat fields) has been characterised in the following ways:
– Main-beam flat fields. These represent the focal-plane distribution of the calibration coefficients per KID. They describe the focal-plane distribution of the point spread function (PSF) in the far field of the telescope. The calibration coefficients G_k are estimated using Eq. (20), as discussed in Sect. 8.1.4.
– Forward-beam flat fields. These are the focal-plane distributions of the relative response of each KID to the near-field atmospheric background. They are estimated using the correlation factor of each detector TOI to a median common mode estimated off-source (see Sect. 4.4 for more details on common modes).
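The forward-beam flat field (correlation of each KID TOI to a median off-source common mode) can be sketched as follows; the least-squares slope is our choice of correlation-factor estimator:

```python
import numpy as np

def forward_beam_flat_field(tois, off_source):
    """Correlation factor of each KID TOI to the median common mode,
    estimated on off-source samples (a sketch of the forward-beam flat field).

    tois:       (n_kids, n_samples) array of detector timelines
    off_source: boolean mask selecting off-source samples
    """
    data = tois[:, off_source]
    cm = np.median(data, axis=0)           # median common mode over KIDs
    cm = cm - cm.mean()
    coeffs = []
    for toi in data:
        toi = toi - toi.mean()
        # least-squares slope of the TOI against the common mode
        coeffs.append(np.dot(toi, cm) / np.dot(cm, cm))
    return np.array(coeffs)
```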
Figure 11 shows the average main-beam flat field for the three arrays. These were constructed by combining the normalised flat fields of five beammaps acquired during two technical observation campaigns. These data were selected by requiring the line-of-sight opacity measured in the 1 mm band to satisfy τ_ν x ≤ 0.85. The distributions of the average flat fields are shown in the bottom panel of Fig. 11.
We observe a significant variation of the flat fields for A1 from the left-most to the right-most side of the FOV. This reveals a significant change of the A1 detector responsivity depending on position in the focal plane. Namely, this effect mainly impacts the left-most third of the array, which is referred to as the "shadow zone". This variation of the flat field translates into a broadening of the distribution shown in the lower panel of Fig. 11. However, we verified that the A1 flat-field dispersions are in line with those of A3 once the detectors within the shadow zone are flagged out using a crescent-shaped mask. The masked flat-field distributions are shown in green in Fig. 11, whereas the shadow-zone distributions are in red. The same FOV patterning is also observed in the forward-beam flat fields, which excludes a main-beam-related issue.
The shadow-zone effect is caused by a misbehaviour of the dichroic, whose polarised transmission is out of specifications. As a result, the 1 mm polarisation that illuminates A1 is attenuated. This effect, which implies a dependence of the frequency cut-off on the radiation incidence angle and linear polarisation, was reproduced using optical simulations. Furthermore, this hypothesis was verified using observations during the technical campaign of September 2018. During this test campaign, a new hot-pressed dichroic was installed in place of the current air-gap dichroic. The shadow-zone variations of the flat field for A1 were not observed during the September 2018 campaign, while large distortions across the field of view of A2 were reported. These distortions result from the bending of the hot-pressed dichroic at low temperatures. This test confirmed that the shadow-zone effect is due to the absorption of the incoming radiation by the current air-gap dichroic. Further efforts are currently devoted to the design of a dichroic that combines robustness against low-temperature-induced bending with optimal transmission. The air-gap dichroic, which is immune to low-temperature-induced deformation, was re-installed at the end of the September 2018 run, restoring the instrumental set-up discussed in this paper. We found that, as expected, the performance of the instrument after this intervention was consistent with the one reported in this paper.

The temperature-induced variation effect
We found evidence of a daily variation of the flux-density estimates that correlates with the measured beam size. This beam broadening happens mostly in the afternoon and around sunrise and sunset. It is also reproducible from one campaign to another. It most certainly comes from the combination of two different effects.
Firstly, inhomogeneous solar illumination leads to large-scale deformations of the 30-m primary mirror, which in turn lead to a variable de-focussing of the telescope. To mitigate this effect, the telescope is equipped with an active thermal ventilation system for the primary mirror and an active temperature control of the secondary support legs. This system is, however, challenged when the Sun partially illuminates the telescope. This is a known effect that also impacts observations with the heterodyne instruments operated at the IRAM 30-m telescope. It had also already been observed with the previous-generation total-power instrument MAMBO-2 (Kreysa et al. 1999). However, the magnitude of this effect is likely to have increased with the slow disappearance of the surface painting of the primary mirror over the past few years.
Secondly, on short time scales, atmospheric anomalous refraction also often plays a role. As far as the 30-m telescope is concerned, it was first described in Altenhoff et al. (1987). Based on experience with the heterodyne receivers, afternoon hours are often affected by an unstable atmosphere, when rising moist air moves through the beam of the telescope and causes random refraction. The pointing offsets are then observed to change by a few arcseconds within typically a few seconds, resulting in an average enlargement of the beam pattern. This effect has been confirmed by measuring several-arcsecond displacements of the source position when projecting different subsets of sub-scans of a single observation, as described in Appendix C.2. We found that the apparent beam broadening during the afternoon results from anomalous refraction for between one third and one half of the scans over the period studied here. As both effects (the primary-mirror deformations and the anomalous refraction) are due to ambient temperature variations, we refer to them as temperature-induced variation effects in the following text.
The temperature-induced variation of the beam size as a function of the UT hours is shown in Fig. 12 for the three reference campaigns using bright sources. These were selected by requiring flux-density estimates above 1 Jy at both wavelengths. The beam size was estimated by fitting a 2D elliptical Gaussian to the map and computing the geometrical FWHM using Eq. (4). For resolved planets such as Uranus, the FWHM estimates were corrected for the beam broadening caused by the finite extension of the apparent disc of the planet, as in Sect. 6.2. We observe the same evolution of the FWHM for all campaigns: it goes from a plateau at a median value of 11.3′′ at 1 mm and 17.5′′ at 2 mm during the night, to a smooth rise that reaches a maximum of about 14′′ at 1 mm and 18.5′′ at 2 mm around 16:00 UT. The beam broadening becomes large around 15:00 UT and returns to the plateau only around 22:00 UT. To mitigate the impact of the temperature-induced variation, observation scans acquired during this time interval must be discarded. The UT ranges that are discarded using the Baseline scan selection (see Sect. 4.6) are shown as cross-hatched areas in Fig. 12. They consist of the afternoon period between 15:00 and 22:00 UT, as well as the period from 9:00 UT to 10:00 UT while the Sun rises. While the global trend of the beam variations is the same for all campaigns, we observe some variability in the amplitude of the effect over the campaigns. This supports the assumption that the partial illumination of the primary mirror by direct sunlight plays an important role in the temperature-induced effect: the amplitude of the effect then depends on the angle between the telescope bore-sight and the Sun over the course of the observations.
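A sketch of the beam-size estimate discussed above. The geometrical FWHM follows the definition of Eq. (4) (the geometric mean of the fitted major and minor axes), while the planet-disc correction uses the standard Gaussian-plus-disc approximation, which may differ in detail from the exact correction of Sect. 6.2:

```python
import numpy as np

def geometrical_fwhm(fwhm_major, fwhm_minor):
    """Geometrical FWHM of a fitted 2D elliptical Gaussian (as in Eq. 4):
    the geometric mean of the major and minor axis FWHMs."""
    return np.sqrt(fwhm_major * fwhm_minor)

def deconvolve_planet_disc(fwhm_obs, disc_diameter):
    """Remove the apparent-disc broadening for resolved planets.
    Uses the common approximation FWHM_beam^2 ~ FWHM_obs^2 - (ln2/2) d^2,
    valid for discs small compared to the beam (an assumption, not the
    paper's exact prescription)."""
    return np.sqrt(fwhm_obs**2 - 0.5 * np.log(2.0) * disc_diameter**2)
```

For instance, with an observed FWHM of 11.6′′ and an apparent Uranus disc of 3.7′′, the corrected beam FWHM is slightly below the observed one.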
The same beam-size variations in time are observed using scans of giant planets (Uranus and Neptune) or other bright sources (mainly quasars). However, planets have slightly larger FWHM estimates than quasars because of the larger contribution of the error beams to the fitted 2D Gaussian, as their fluxes are measured with a higher signal-to-noise ratio.

Baseline calibration
To assess the NIKA2 performance, we rely on a Baseline calibration that consists of the following steps: (i) the calibration in the FWHM_0 Gaussian beam, as detailed in Sect. 8.1, is implemented; (ii) the effect of the temperature-induced variation of the beam size is mitigated using the Baseline scan selection described in Sect. 4.6; and (iii) the atmospheric attenuation is corrected using the corrected skydip opacity estimation described in Sect. 7.1.4.
The Baseline calibration is validated below by checking the stability of the Uranus flux-density estimates against the beam size (Sect. 8.4.1) and against the atmospheric transmission (Sect. 8.4.2). The Baseline calibration results are then compared to alternative calibration methods using other opacity corrections (Sect. 8.4.3).

Flux stability against the beam size
We present the Uranus measured-to-predicted flux-density ratio as a function of the 2D Gaussian FWHM estimates, colour-coded by the observation time in UT hours, in Fig. 13. As expected from the discussion in Sect. 8.3, the largest FWHMs are measured on scans acquired in the late afternoon, especially from 16:00 UT to 21:00 UT.
A&A 637, A71 (2020)

Fig. 12. Beam size monitoring using OTF raster scans. Geometrical FWHM at 1 mm (top panel) and 2 mm (bottom panel) as a function of the observation time in UT hours, shown using scans of giant planets (filled circles) and bright point-like sources with a flux density higher than 1 Jy (filled stars) for the three Reference observation campaigns (N2R9, N2R12, and N2R14). The cross-hatched areas correspond to the observing time periods that are discarded using the Baseline scan selection, as described in Sect. 4.6.

The flux-density estimates were calibrated beforehand, so that the flux-density ratios are equal to unity on average by construction. We observe no significant dependence of the selected scan flux ratios (shown as filled circles in Fig. 13) on the beam FWHM. This gives a first indication of the efficiency of the Baseline scan selection in mitigating the temperature-induced beam-variation effect. The flux stability against the beam FWHM is further assessed in Sect. 9.

Flux stability against the atmospheric transmission
We test the stability of the Uranus flux densities calibrated using the Baseline method against the atmospheric transmission. The latter depends on the measured zenith opacity τ_ν and the average air mass x over a scan as exp(−τ_ν x). In the first row of Fig. 14, the Uranus flux ratio is shown as a function of the atmospheric transmission for the 1 mm array combination and Array 2, and for the three Reference observation campaigns (N2R9, N2R12, and N2R14). We observe no significant correlation of the flux ratio with the atmospheric transmission, which gives a first indication of the robustness of the flux-density estimates against the atmospheric conditions using the Baseline calibration. This is further tested using a larger number of scans towards other sources in Sect. 9.

Comparison with other opacity correction methods
As a cross-check, we derived the absolute calibration factors using the taumeter (Sect. 7.1.1) and the skydip (Sect. 7.1.2) atmospheric-opacity correction methods. We then compare the Uranus flux-density estimates after absolute calibration using the Baseline calibration and these two alternative corrections. Figure 14 shows the Uranus measured-to-modelled flux ratio as a function of the atmospheric transmission for A1 and A3 and for A2 after the taumeter correction (second row) and after the skydip correction (third row). We observe more dispersion for the taumeter-based flux ratio, whereas the skydip-based ratio is very similar to the Baseline ratio, except for a slight decrease in flux at low atmospheric transmission. Thus, the taumeter and skydip atmospheric-opacity correction methods can be used for the absolute calibration in complement to the corrected skydip, for example, to perform robustness tests as described in Sect. 9.

Fig. 13. Uranus flux-density ratio vs beam size using the Baseline calibration. The ratio of Uranus measured-to-expected flux densities as a function of the measured 2D Gaussian beam FWHM is shown for the 1 mm array combination (top panel) and for Array 2 (bottom panel) after absolute calibration using the Baseline method. These plots include all Uranus scans acquired during the N2R9, N2R12, and N2R14 campaigns whose beam FWHMs are below the threshold indicated by the vertical red lines (open circles), as well as the scans that met the Baseline selection criteria (filled circles).

Photometry and stability assessment
The photometric capabilities of NIKA2 after the calibration presented in Sect. 8 are assessed in this section. Firstly, we use observations of secondary calibrators (planetary nebulae NGC7027, CRL2688, and MWC349A) to test the consistency of the flux-density estimates with expectations. The flux-density expectations in the NIKA2 bands for these calibrators are given in Appendix A.2. Then, we verify the stability of the photometry with respect to the atmospheric conditions using a large number of observations towards a large variety of point-like sources.
The quality criteria used to assess the photometric capabilities and the calibration results are defined in Sect. 9.1. In Sect. 9.2, these criteria are evaluated for the Baseline calibration, and in Sect. 9.3, we compare these results with those of other calibration methods before summarising the main results in Sect. 9.4.

Calibration accuracy and uncertainty assessment
We assess the photometric performance by evaluating two quality criteria: firstly, the calibration bias checks the accuracy of the absolute calibration; secondly, the point-source rms calibration uncertainties test the stability of the flux densities. The stability of the full-beam pattern at large angular scales, which impacts the stability of the diffuse-emission flux density, is not tested here. Finally, we review the systematic uncertainties on the flux density.

Calibration bias
We define the calibration bias b_ν, where ν stands for Array 1, 2, 3 and the 1 mm array combination, as the ratio between the measured flux density Ŝ_ν using the Reference photometry (Sect. 8.1.1) and the flux-density expectations S_s(ν_0) as given in Appendix A.2. From a series of secondary-calibrator scans, we evaluate the average calibration bias b_ν, which, by construction, should be equal to unity within uncertainties. Moreover, we check the stability of the calibration bias against the observed opacities as a robustness test of the opacity-derivation method. Likewise, we verify that the photometry is insensitive to optical variations by checking the stability of the calibration bias against the measured beam size.

Point-source rms calibration uncertainties
We evaluate the standard deviation σ_ν of the bright point-like source measured-to-median flux-density ratios per array or array combination ν. As the flux density of most of the considered sources is a priori unknown, we compare the flux-density estimate in a single observation scan to the average flux density throughout an observation campaign. This method requires the selection of sources that are bright enough to be detected with a high signal-to-noise ratio in a single repetition of a usual 8′ × 5′ OTF raster scan. Namely, we perform a source selection by setting a threshold on the flux estimate of 800 mJy at 1 mm and 400 mJy at 2 mm. Moreover, we only consider the sources for which a minimum of 10 scans are available after selection, to ensure a precise average flux-density estimation. Finally, the selected source scans must meet the Baseline scan-selection criteria given in Sect. 4.6.
Since the flux-density ratios are not Gaussian-distributed, we evaluate the 68 and 95% confidence-level (C.L.) contours using the measured distributions in order to further characterise the uncertainties. We check that the rms errors are larger than the 68% C.L. contours and thus provide conservative 1σ-like errors.
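The point-source rms and the percentile-based C.L. contours described above can be sketched as follows; the data layout and function name are ours:

```python
import numpy as np

def point_source_rms(flux_by_scan, min_scans=10):
    """rms of measured-to-median flux-density ratios over bright-source
    scans, plus 68% and 95% C.L. intervals from the measured distribution.

    flux_by_scan: dict mapping source name -> per-scan flux densities.
    Sources with fewer than min_scans scans are discarded, as in the text.
    """
    ratios = []
    for fluxes in flux_by_scan.values():
        fluxes = np.asarray(fluxes, dtype=float)
        if fluxes.size < min_scans:
            continue
        ratios.append(fluxes / np.median(fluxes))  # ratio to campaign median
    r = np.concatenate(ratios)
    rms = float(np.std(r))
    cl68 = np.percentile(r, [16.0, 84.0]) - 1.0   # contours about unity
    cl95 = np.percentile(r, [2.5, 97.5]) - 1.0
    return rms, cl68, cl95
```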
As the rms of the flux-density ratio is estimated on a scan set that is representative of the observing conditions encountered at the 30-m telescope, it is an estimate of the calibration uncertainties that encloses errors of optical, atmospheric, instrumental-noise, and data-processing origins. This includes the errors caused by the temperature-induced beam variations, the effect of the elevation, the uncertainties of the atmospheric-opacity correction using either the skydip or the taumeter method, and the atmospheric- and instrumental-noise residuals after the data reduction (Sect. 4). However, as the data set comprises point-like sources, and because flux-density measurements in the reference photometric system are immune to beam variations at large angular scales, we refer to the rms of the flux-density ratios as point-source rms calibration uncertainties.
For extended sources and diffuse-emission studies, the maps in Jy beam−1 units, where the beam refers to the reference beam, are converted into Jy sr−1 units using Eq. (19). Propagating the uncertainty on the total-beam solid angles, the reference beam efficiencies are estimated with a precision of 5% and 3% at 1 and 2 mm, respectively, as reported in Table 13 in Sect. 8.1.3. These uncertainties must be further accounted for in the error budget of diffuse-emission studies.

Absolute and systematic uncertainties
To account for all uncertainties, we must add the absolute calibration uncertainties and the systematic errors to the point-source rms calibration uncertainties. The absolute uncertainty is the uncertainty on the primary-calibrator flux-density expectations. In the case of Uranus, Moreno (1998) and Bendo et al. (2013) report uncertainties of about 5% at both wavelengths.
For the Baseline calibration method, which resorts to the corrected skydip method for the atmospheric-opacity correction (see Sect. 7.1.4), the uncertainties on the correcting factor a_ν^skydip, as defined in Eq. (14), must be propagated to the flux uncertainties. These uncertainties, which are referred to as the corrected skydip uncertainties, depend on the line-of-sight atmospheric opacity τ_ν x. Precisely, because the corrected skydip opacity correction is consistently used for both the primary calibrator and the target-source flux measurements, the corrected skydip uncertainties depend on the difference between the average line-of-sight opacity of the primary calibrator scans and the line-of-sight opacity of the source scan. We evaluate the corrected skydip uncertainties for two different τ_ν x values. (1) For the reference IRAM 30-m winter observing conditions, defined as 2 mm of pwv and an elevation of 60°, the corrected skydip uncertainties are of 0.6% at 1 mm and 0.3% at 2 mm. (2) For a line-of-sight opacity τ_ν x of 0.7 at 1 mm and of 0.5 at 2 mm, we find corrected skydip uncertainties of 2% and 1.5% at 1 and 2 mm, respectively. These constitute conservative upper limits on the corrected skydip uncertainties. The uncertainties on the NIKA2 bandpass measurements (see Sect. 2.5) propagate into uncertainties on the flux densities after the colour correction using Eq. (18). These uncertainties depend on the source SED but are negligible in most cases. In particular, for MWC349, we find uncertainties below 0.1% at both wavelengths.

Notes (Table 14). The first sub-panel labelled "Bias" gives the calibration bias b_ν and the second sub-panel labelled "rms" the calibration rms error σ_ν, as defined in Sect. 9.1, using observations during N2R9, N2R12, N2R14, and the combination of the three campaigns. In each sub-panel, the first row indicates the number of acquired scans, while the second row gives the number of selected scans using the Baseline scan selection.
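As an illustration of the error budget, the independent fractional uncertainties listed in this section can be combined; note that the quadrature sum is our assumption on how to combine them (the paper lists the terms but not the combination rule), and the default values are the 1 mm figures quoted above:

```python
import numpy as np

def total_flux_uncertainty(rms, absolute=0.05, skydip=0.006, bandpass=0.001):
    """Combine independent fractional flux uncertainties in quadrature
    (our assumption). Defaults: ~5% Uranus model uncertainty, 0.6%
    corrected-skydip uncertainty at 1 mm in reference winter conditions,
    <0.1% bandpass uncertainty."""
    return float(np.sqrt(rms**2 + absolute**2 + skydip**2 + bandpass**2))
```

For instance, with the 5.7% point-source rms at 1 mm, the combined uncertainty would be about 7.6%.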

Baseline calibration photometry results
We measure the calibration bias and the rms uncertainties, as defined in the previous section (Sect. 9.1), using the Baseline calibration method (Sect. 8.4). The calibration bias was evaluated using a series of scans of MWC349 acquired during the reference observation campaigns. Namely, we used the 72 scans that met the Baseline selection criteria (see Sect. 4.6) out of the 109 available scans for MWC349. The first row of Fig. 15, labelled "baseline", shows the calibration bias b_ν for the combination of the 1 mm arrays and Array 2 as a function of the atmospheric transmission exp(−τ_ν x). No significant dependency of the calibration bias on the atmospheric transmission is observed. Table 14 gathers the calibration-bias estimates for the three observation campaigns and for all the scans. In the 1 mm band, we find b_ν in agreement with unity within the statistical dispersion for the three campaigns, whereas a 5% lack of flux with respect to expectations is observed at 2 mm, consistently for the three campaigns. This bias has low significance with respect to the absolute calibration precision of NIKA2 (see Sect. 9.1). It is further investigated using other calibration methods in Sect. 9.3.
Figure 16 shows the measured-to-median flux densities evaluated from bright-source scans for the combination of Arrays 1 and 3 and for Array 2, as a function of the atmospheric transmission and colour-coded as a function of the observation time. From a total of 487 scans towards flux-selected sources, acquired during N2R9, N2R12, and N2R14, 264 met the Baseline selection criteria and are included in Fig. 16 to test the calibration stability. The calibration uncertainties are estimated using the standard deviation of the flux-density ratios for the three campaigns. Results are gathered in Table 14. Combining all the scans, we find rms uncertainties of 5.5% for A1, 6.0% for A3, 5.7% for the 1 mm band, and 3.0% for A2. Using the flux-ratio distributions, we construct the 68 and 95% C.L. intervals. The 68% C.L. intervals are −6.4% and +3.4% at 1 mm, and −3.8% and +1.5% at 2 mm. Therefore, on average, the 68% C.L. errors are of 4.9% and 2.7% at 1 and 2 mm, respectively. We conclude that the rms errors are conservative estimates of the 68% C.L. errors at both wavelengths. The 95% C.L. contours are −15.8% and +5.9% at 1 mm, and −8.6% and +3.8% at 2 mm. The rms errors and the 95% C.L. intervals are shown in Fig. 16 with the inner and outer dashed lines, respectively.
The flux-density ratio is constant within the rms errors over the wide range of tested atmospheric transmissions, ranging from 0.5 to 0.9 at 1 mm. However, some scans at atmospheric transmissions of about 0.7 at 1 mm show a mild lack of flux density with respect to the median, within the 95% C.L. contours. The scans affected by the lack of flux were all observed either between 12:00 and 14:00 UT or between 8:00 and 9:00 UT, which are close to the thresholds of the observation-time cuts of the Baseline scan selection (see Sect. 4.6). These scans are likely to be affected by the temperature-induced beam broadening or by the sunrise focus drift, respectively. Furthermore, we find that restricting the observation time to the ten most stable hours (from 22:00 to 08:00 UT) would result in rms calibration uncertainties of 3.6% at 1 mm and 1.2% at 2 mm, which constitutes an improvement of the rms errors of about 40% at 1 mm and 60% at 2 mm. The Baseline scan selection, which (i) retains 16 h of observation time per day and (ii) results in state-of-the-art rms calibration uncertainties, constitutes a useful trade-off that is representative of most of the observations with NIKA2.

Comparison with other calibration methods
In this section, the Baseline calibration results are compared to results obtained either with other calibration methods based on different opacity corrections (taumeter and skydip, as discussed in Sect. 8.4.3), or with methods including a photometric correction (PC-demo and PC-point, as described in Appendix C) to mitigate the temperature-induced variation effect (Sect. 8.3). To test the robustness, we evaluate and compare the photometry quality criteria of Sect. 9.1 for the five calibration methods.

Calibration bias
We present the calibration bias as a function of the atmospheric transmission for the five calibration methods in Fig. 15 and report the results in the row labelled "Bias" of Table 15. At 1 mm, all methods lead to flux-density estimates in agreement with expectations within the rms dispersion. However, the taumeter flux ratios show more dispersion than the Baseline flux ratios, whereas the skydip ratios show some dependency on the atmospheric transmission, with a 10 to 15% excess of flux density with respect to expectations at high transmission. This residual systematic effect motivated the development of the corrected skydip method, as discussed in Sect. 7.1.4. These features, which are already noticeable in Fig. 15, are confirmed and further discussed later using more observation scans. On the other hand, the calibration methods based on a photometric correction (Appendix C) yield an unbiased photometry (a calibration bias in agreement with unity within the rms error) while allowing the use of 30% more scans. These results are encouraging for the exploitation of scans acquired during the observing periods impacted by the temperature-induced beam-variation effect (Sect. 8.3).
At 2 mm, all methods result in a similar calibration bias of 0.95, with an rms error of 0.05 estimated on the MWC349 scans. This corresponds to a low-significance 5% lack of flux density towards MWC349. To summarise, the calibration bias at 2 mm is stable against (i) a large range of atmospheric conditions, (ii) the observation campaign, (iii) the atmospheric-opacity-correction method, and (iv) the method used to treat the temperature-induced beam-variation effect. This 5% lack of flux density is thus probably due to uncertainties in the flux-density expectations for this source. They may have two different origins. Firstly, the uncertainties on the flux expectation, as reported in Appendix A.2, consist of the propagation of the errors on the fitted SED from the Plateau de Bure Interferometer (PdBI) and the Very Large Array (VLA) observations. Systematic uncertainties that may also impact the SED are not included. Secondly, the NIKA2 flux-density extrapolation from interferometer data may not be straightforward for MWC349. In particular, the NIKA2 flux extrapolation ignores the contamination by strong masers in the radio recombination lines (Gordon 1994), while strong maser emission lines are masked in PdBI observations to measure the continuum. In addition, the resulting continuum shows indications of variability.

Fig. 16. Baseline rms calibration uncertainties. The measured-to-median flux-density ratio of bright sources is plotted as a function of the atmospheric transmission, colour-coded according to the UT observation time of the scans, for the combination of A1 and A3 (top panel) and for A2 (bottom panel). The inner dashed lines on either side of the unity-ratio line show the rms errors, which are less than 6% at 1 mm and 3% at 2 mm, while the outer dashed lines show the 95% confidence-level contours. The lowest flux-ratio data points correspond to some of the scans acquired during the daytime between 8:00 UT and 15:00 UT (yellow and red), while the scans acquired during the night between 22:00 UT and 7:00 UT yield data points (dark blue) well-distributed within the rms error, with a few outliers.

Calibration rms uncertainties
The rms calibration uncertainties for the five methods, evaluated using flux-selected source scans, are gathered in the sub-panel labelled "rms" of Table 15. Compared to the Baseline method, the taumeter method leads to rms errors increased by about 40 and 30% at 1 and 2 mm, respectively. The skydip method shows a lower dispersion but a mild correlation with the atmospheric transmission, as discussed in Sect. 9.3.1.
In addition, we checked the flux-density ratios for all of the scans with an atmospheric transmission of about 0.7, which were discussed in Sect. 9.2, by comparing the calibration methods with and without photometric correction. The flux-density ratios are low (within the 95% C.L. interval) for the scans observed between 12:00 and 15:00 UT for the first three methods, as shown in Fig. 16 in the case of the Baseline method, for example. By contrast, they are within the 68% C.L. interval when using a photometric correction. This further validates the hypothesis that the low flux density of these scans is due to a temperature-induced beam effect, as assumed in Sect. 9.2. This also constitutes an example of the calibration improvement obtained by using a photometric correction. Moreover, results based on the PC-demo method show that rms calibration uncertainties as low as 3.8 and 2.2% at 1 and 2 mm are within the reach of NIKA2 without any scan selection based on the observation time. However, we recall that this method relies on accurate beam estimates. Using PC-point, which is the practical case, still improves the calibration uncertainties with respect to the Baseline results, but only by about 20% in both bands. Furthermore, the differences between the flux-density ratios from PC-demo and PC-point, which are seen, for example, in the corresponding panels of Fig. 15, are likely to be due to the photometric-correction noise when monitoring the beam from pointing scans (see Appendix C.2). We conclude that more control on the beam monitoring is needed before routinely using a calibration based on photometric correction. By contrast, the Baseline method combines good performance with robustness.

Summary
Among the methods that rely on the UT-hour-based scan selection to mitigate the effect of beam-size variations, the Baseline method shows the best performance in terms of calibration bias and uncertainties. The methods that rely on a photometric correction show good calibration results and thus represent a promising lead to further improve the calibration uncertainties. However, their robustness depends on the accuracy of the beam monitoring. The proposed beam monitoring based on pointing scans induces some extra dispersion of the flux densities. A more accurate beam monitoring is feasible but requires the use of dedicated observation scans. From the Baseline method results discussed in Sect. 9.2, we found that the measured flux density of MWC349 is in agreement with expectations within 5% at both wavelengths. Moreover, the point-source rms calibration uncertainties are of 5.7% at 1 mm and of 3% at 2 mm using a series of 264 scans of sources with flux densities above 1 Jy. These results demonstrate the excellent accuracy and stability of the NIKA2 point-source photometric capabilities.

Sensitivity
In this section, we derive the on-sky sensitivity of the instrument using a large number of observation scans, including deep integrations on faint sources, and we assess the stability of our results against the observing conditions. We evaluate the noise equivalent flux density (NEFD). To further characterise the mapping capabilities, we also derive the mapping speed. We first discuss the definitions and the derivation of these quantities from measurements in Sect. 10.1. Then, we briefly present the several estimation methods that have been considered and the data sets selected in Sect. 10.2. The results, together with robustness tests, are reported in Sect. 10.3.

NEFD and mapping speed definitions
The NEFD is the 1σ error on the flux density of a point source in one second of integration time, considered at zero atmospheric opacity. At this stage, we consider a map of a point-like source located at its centre and observed at zero atmospheric opacity. The estimation of the flux-density uncertainty σ is described in Sect. 4. Here, we derive the integration time from actual observations. Indeed, it is not simply the duration of a scan: it depends on the KID distribution in the FOV and on the scanning strategy. The flux-density map, which has a resolution of $\Delta r$, comes along with a hit map $H_p$, which counts the number of data samples per pixel (see Sect. 4). From this, we derive the integration time at the centre of the map as
$$t_{\mathrm{centre}} = \frac{\langle H_p \rangle_{\mathrm{centre}}}{f_{\mathrm{sam}}}, \qquad (21)$$
where $f_{\mathrm{sam}}$ is the sampling frequency and $\langle H_p \rangle_{\mathrm{centre}}$ is the average of the hit map taken in a disc of radius one FWHM, in order to be immune to shot-noise statistics in individual map pixels.
Correcting this quantity for the density of detectors per map pixel gives the on-source integration time per beam, in other words, the time during which the source is actually being observed by at least one detector:
$$t_{\mathrm{beam}} = t_{\mathrm{centre}}\left(\frac{g}{\Delta r}\right)^2, \qquad (22)$$
where $g$ is the centre-to-centre distance between adjacent KIDs in the FOV (Sect. 5.2); $t_{\mathrm{beam}}$ matches the total duration of the scan if the scanning strategy is designed so that the source is always in the FOV and if the FOV is full of valid KIDs. The NEFD is defined from the previous quantities as
$$\mathrm{NEFD} = \sigma\,\sqrt{t_{\mathrm{beam}}}, \qquad (23)$$
and is given in mJy s$^{1/2}$. The mapping speed, $M_s$, is the sky area $A_{\mathrm{scan}}$ that can be mapped at a variance level $\Delta\sigma^2$ of 1 mJy$^2$ in an integration time $\Delta t$ of one hour. Noting $d_{\mathrm{FOV}} = 6.5$ arcmin the FOV diameter, and $\eta$ the fraction of valid KIDs for an observation (Sect. 5.1), the mapping speed is
$$M_s = \frac{A_{\mathrm{scan}}}{\Delta\sigma^2\,\Delta t} = \eta\,\frac{\pi\, d_{\mathrm{FOV}}^2}{4\,\mathrm{NEFD}^2}\times 3600\ \mathrm{s\,h^{-1}}, \qquad (24)$$
and has units of arcmin$^2$ mJy$^{-2}$ h$^{-1}$. In real observing conditions, characterised by a given atmospheric opacity $\tau_\nu$ and a given air mass $x$, correcting the flux density for the atmospheric attenuation using Eq. (8) increases the NEFD to
$$\mathrm{NEFD}_{\tau_\nu, x} = \mathrm{NEFD}\; e^{\tau_\nu x}. \qquad (25)$$
Using these definitions, the integration time required to reach a target flux-density uncertainty $\sigma_{\mathrm{obs}}$ over an observing area $A_{\mathrm{scan}}$, for an air mass $x$ and an atmospheric opacity $\tau_\nu$, is
$$\Delta t = \frac{A_{\mathrm{scan}}}{M_s\,\sigma_{\mathrm{obs}}^2}\; e^{2\tau_\nu x}. \qquad (26)$$
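As an illustration, the chain from hit map to NEFD and mapping speed can be sketched as follows. This is a minimal sketch with hypothetical function names and a uniform toy hit map; it assumes the quadratic detector-density correction and the FOV-area scaling described above.

```python
import numpy as np

def integration_times(hit_map, pixel_size, fwhm, f_sam, g):
    """Centre and per-beam integration times from a hit map.

    hit_map: data samples per pixel; pixel_size, fwhm, g in arcsec; f_sam in Hz.
    """
    ny, nx = hit_map.shape
    y, x = np.indices(hit_map.shape)
    r = np.hypot(x - nx // 2, y - ny // 2) * pixel_size
    h_centre = hit_map[r <= fwhm].mean()       # average hits within one FWHM of the centre
    t_centre = h_centre / f_sam                # integration time at the map centre
    t_beam = t_centre * (g / pixel_size) ** 2  # correct for the detector density per pixel
    return t_centre, t_beam

def nefd_and_mapping_speed(sigma, t_beam, eta, d_fov=6.5):
    """NEFD (mJy s^1/2) and mapping speed (arcmin^2 mJy^-2 h^-1)."""
    nefd = sigma * np.sqrt(t_beam)
    m_s = 3600.0 * eta * np.pi * d_fov ** 2 / (4.0 * nefd ** 2)
    return nefd, m_s
```

With a uniform hit map, `t_centre` reduces to the per-pixel hit count divided by the sampling frequency, and `t_beam` grows with the square of the KID spacing to map resolution ratio.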

NEFD estimation methods and scan selection
We developed several methods for the NEFD estimation. Firstly, we used deep integrations on faint sources. Secondly, we resorted to a joint analysis of multiple scans without combining them.

Deep integration method
According to Eqs. (23) and (25), if a source were observed under stable atmospheric conditions, the flux uncertainty would scale as $t_{\mathrm{beam}}^{-1/2}$. For long integrations on a source, this relation provides both a way to estimate the NEFD and a check that the noise integrates down as expected with the integration time. To that aim, we produced a series of maps, as described in Sect. 4.5, using an inverse-variance co-addition of an increasing number of observation scans, and performed a photometric analysis on each map following Sect. 9. In practice, however, and in particular for integrations of several hours, the observing conditions do change. Since the scans were not all acquired under the same atmospheric opacity and observing elevation, they do not need the same atmospheric opacity correction, nor do they have the same level of atmospheric-noise residuals after the noise de-correlation described in Sect. 4.4; hence, they do not contribute equally to the co-addition. In such a case, an effective integration time for the co-addition of $n$ scans is defined as
$$t_{\mathrm{eff}}(n) = \sum_{i=1}^{n} t_i\, e^{-2\tau_\nu^i x_i}, \qquad (27)$$
where $t_i$, $\tau_\nu^i$, and $x_i$ are the integration time, the zenith opacity, and the air mass of the $i$th scan of the $n$-scan co-addition. Generalising Eq. (23), the flux-density uncertainty on the co-addition of $n$ scans is $\sigma(n) = \mathrm{NEFD}/\sqrt{t_{\mathrm{eff}}(n)}$. An estimate of the NEFD can therefore be obtained by fitting $\sigma(n)$ as a function of the corresponding $t_{\mathrm{eff}}(n)$.
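The effective-time weighting and the NEFD fit can be sketched as follows. Function names are ours; the weighting down-weights scans with a large line-of-sight opacity, as described above, and the fit is a slope through the origin against the inverse square root of the effective time.

```python
import numpy as np

def t_eff(times, taus, airmass):
    """Cumulative effective integration time of a co-addition: scans with a
    larger line-of-sight opacity tau*x are down-weighted by exp(-2*tau*x)."""
    times, taus, airmass = map(np.asarray, (times, taus, airmass))
    return np.cumsum(times * np.exp(-2.0 * taus * airmass))

def fit_nefd(t_effective, sigma):
    """Least-squares fit of sigma(n) = NEFD / sqrt(t_eff(n)), i.e. a slope
    through the origin against x = 1 / sqrt(t_eff)."""
    x = 1.0 / np.sqrt(np.asarray(t_effective, float))
    sigma = np.asarray(sigma, float)
    return float(np.dot(x, sigma) / np.dot(x, x))
```

Under ideal conditions (zero opacity), `t_eff` reduces to the plain cumulative scan duration and the fit recovers the NEFD exactly.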

Scatter method
For each scan, we derived $\mathrm{NEFD}_{\tau_\nu, x}$. Using Eq. (25), the joint analysis of a series of scans acquired under various observing conditions provides an estimate of the NEFD. The scan sample can gather different sources. The selection of the target sources for the NEFD derivation was primarily based on the flux density: the noise characterisation may indeed be biased by the signal of a bright source stemming from the most extended error beams and far side lobes. We therefore restricted the analysis to sources with estimated flux densities below 1 Jy.
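A minimal sketch of the scatter-method estimator follows (the function name is ours): the per-scan values are corrected for the atmospheric attenuation and then averaged, with the median available as a robust option for non-Gaussian scatter.

```python
import numpy as np

def nefd_zero_opacity(nefd_tau_x, taus, airmass, robust=True):
    """Extrapolate per-scan NEFD_{tau,x} values to zero atmospheric opacity
    by inverting NEFD_{tau,x} = NEFD * exp(tau*x) scan by scan, then averaging.
    The median is more robust when the scatter is non-Gaussian."""
    corrected = (np.asarray(nefd_tau_x, float)
                 * np.exp(-np.asarray(taus, float) * np.asarray(airmass, float)))
    return float(np.median(corrected) if robust else np.mean(corrected))
```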

Results and robustness tests
Firstly, we tested the stability of the NEFD estimates by applying the two methods to the same data set. With this goal, during the N2R9 run, we selected HLS J0918+5142, a moderately faint source (Combes et al. 2012) with expected flux densities of 74.5 mJy at 1 mm and 15.7 mJy at 2 mm. It was observed for about nine hours in total over three nights using 8′ × 5′ OTF raster scans of various orientations.
The left panel of Fig. 17 shows the flux-density uncertainties as a function of the integration time. The effective integration time differs between the 1 mm and 2 mm arrays because they have different KID spacings $g$ in the FOV and different atmospheric opacities (see Eq. (27)). Black lines show the fit with the inverse of the square root of the integration time, confirming that the noise integration is consistent with the expected scaling law. The observed small variations around the theoretical fit correspond to variations of the line-of-sight atmospheric opacity during the integration. These variations were taken into account when evaluating the NEFD, as discussed in the previous section. The right panel of Fig. 17 shows the $\mathrm{NEFD}_{\tau_\nu, x}$ per scan, along with the best-fit models using Eq. (25), from which the NEFD estimates are derived. Results from these analyses, per array and for the combined A1 and A3, are presented in Table 16.
We systematically observe a higher NEFD for A1 than for A3, which is mainly due to the dichroic-induced "shadow-zone effect" that also impacts the flat fields, as discussed in Sect. 8.2. The dichroic-induced effect, which also impacts A3, explains part of the difference in NEFD between the 1 mm and 2 mm channels, together with the higher atmospheric noise and the lower telescope main-beam efficiency at 1 mm with respect to 2 mm. Moreover, the NEFD evolution with sky noise is consistent with expectations for each array and each observing wavelength.
As a second robustness test, we checked the stability of the NEFD across three observation campaigns. Figure 18 shows the NEFD measured using the scatter method (see Sect. 10.2) for the sub-Jansky sources acquired during the N2R9, N2R12, and N2R14 campaigns. The NEFD estimates for the three campaigns agree within uncertainties over the whole range of line-of-sight atmospheric opacities that was tested. The solid lines show the expected dependence with $e^{\tau_\nu x}$, as given in Eq. (25). Since the measured $\mathrm{NEFD}_{\tau, x}$ are not Gaussian-distributed, we derive the NEFD as the median of the per-scan $\mathrm{NEFD}_{\tau, x}$ after correction for the atmospheric attenuation, which provides a more robust estimate than a fit. The NEFD estimates are given in Table 16.

Fig. 17. Comparison of NEFD estimates using two methods on observations of HLS J0918+5142. Left panel: evolution of the 1σ flux-density uncertainties as a function of the effective integration time $t_{\mathrm{eff}}$, as defined in Eq. (27), for A1 (cyan), A3 (dark blue), the combination of A1 and A3 (medium blue), and A2 (red). The solid black lines are the best-fit models using $\sigma(t_{\mathrm{eff}}) = \mathrm{NEFD}/\sqrt{t_{\mathrm{eff}}}$. Right panel: NEFD as a function of the measured line-of-sight atmospheric opacity, using the same colour code as in the left panel. The solid black lines are the theoretical fits of $\mathrm{NEFD}_{\tau_\nu, x} = \mathrm{NEFD}\, e^{\tau_\nu x}$ and give the NEFD when extrapolated to $\tau_\nu/\sin(\mathrm{el}) = 0$.

Notes to Table 16. Top-of-atmosphere NEFD in mJy s$^{1/2}$ for the two methods described in the text, namely the deep integration method (labelled "Deep int") and the scatter method ("Scatter"), using two different data sets: HLS J0918+5142 and all sub-Jy sources acquired during the reference observation campaigns. The results given in the last row are based on more than a thousand scans (202, 481, and 430 scans during N2R9, N2R12, and N2R14, respectively).
Combining the data sets of the N2R9, N2R12, and N2R14 campaigns, more than one thousand observation scans of sub-Jy sources meet the Baseline selection criteria (see Sect. 4.6), providing robust NEFD estimates that are representative of the average NIKA2 performance. The rms uncertainties are evaluated as the rms scatter of the individual $\mathrm{NEFD}_{\tau_\nu, x}$ estimates after correction with $e^{\tau_\nu x}$. These values are then extrapolated to the IRAM 30-m telescope reference Winter observing conditions: 2 mm of pwv and a 60° elevation. The NEFD estimates, as well as the rms uncertainties, are gathered in Table 17. From these estimates, we also derive the corresponding mapping speeds, which are given in Table 17 as well.

Table 17. Median NEFD and rms uncertainties in mJy s$^{1/2}$, as well as the derived mapping speed and its rms uncertainties in arcmin$^2$ mJy$^{-2}$ h$^{-1}$, evaluated using the scatter method on all sub-Jy sources of runs N2R9, N2R12, and N2R14, given at pwv = 0 and 90° elevation (first three rows) and extrapolated to the reference Winter observing conditions at the IRAM 30-m telescope site (last three rows), defined as 2 mm pwv and 60° elevation.
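As an illustration, extrapolating a zero-opacity NEFD to given observing conditions can be written as below. The function name is ours; the plane-parallel airmass $x = 1/\sin(\mathrm{el})$ is assumed, and since the zenith opacity corresponding to 2 mm of pwv is band- and instrument-specific, it is left as an input.

```python
import math

def nefd_at_conditions(nefd_zero, tau_zenith, elevation_deg):
    """Scale a zero-opacity NEFD to given conditions, NEFD * exp(tau * x),
    with the plane-parallel airmass x = 1 / sin(elevation)."""
    x = 1.0 / math.sin(math.radians(elevation_deg))
    return nefd_zero * math.exp(tau_zenith * x)
```

For example, a 2 mm band NEFD of 9 mJy s$^{1/2}$ at zero opacity, observed at 60° elevation under a hypothetical zenith opacity of 0.1, degrades to about 10.1 mJy s$^{1/2}$.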

Summary and conclusions
We present the NIKA2 performance at the IRAM 30-m telescope and evaluate it with a Baseline calibration method that translates observations and raw data into measured flux densities. In the Baseline calibration photometric system, the flux densities of point-like sources are measured as the amplitude estimates of a fixed-width Gaussian of FWHM 12.5′′ and 18.5′′ at the 260 and 150 GHz reference frequencies, respectively. This calibration method relies on three main analysis choices. Firstly, the atmospheric and instrumental correlated noises are corrected using a simple and robust method based on the subtraction of a series of the average temporal signals of the most correlated detectors. Secondly, the atmospheric opacity is estimated using an improved version of the method proposed in Catalano et al. (2014) and used in Adam et al. (2018), which is based on the use of NIKA2 as an in-band total-power taumeter. Finally, a scan selection based on the observation time is performed, which retains 16 h of observation a day in order to mitigate the effect of beam-size variations caused by the anomalous refraction of the atmosphere and by the partial illumination of the 30-m telescope, which mainly impact afternoon observations. We also consider the pros and cons of alternative methods, which may in the future lead to even better calibration accuracy and stability.

Fig. 18. Comparison of NEFD estimates for three observation campaigns. The measured NEFD using the scatter method is plotted as a function of the line-of-sight atmospheric opacity ($\tau_\nu x$) for the 1 mm (left) and 2 mm (right) channels. Data points are NEFD estimates in mJy s$^{1/2}$ for N2R9 (blue), N2R12 (orange), and N2R14 (chartreuse). We also show, as solid curves, the expected NEFD evolution with the line-of-sight atmospheric opacity, using the median zenith atmospheric opacity derived from all the scans acquired during a campaign.
The performance of the NIKA2 camera was assessed using a large number of observations of primary and secondary calibrators and of faint sources, acquired during three observation campaigns spanning one year. The data set covers the whole range of observing elevations and atmospheric conditions encountered on-site. The main characteristics that define the NIKA2 performance are summarised in Table 18. We highlight the main points here:

1. All designed KIDs detect the signal in at least some observation scans. We conservatively retain only the most stable KIDs, which are immune to cross-talk effects and yield good signal-to-noise measurements. We report valid-KID fractions of 84% for the 1 mm channel arrays and 90% for Array 2. The other KIDs, which do not meet the validity criterion, are randomly distributed across the FOV, so that the whole 6.5′ FOV is covered.

2. The main beam is well described by a 2D Gaussian of FWHM 11.1′′ for the 1 mm channel arrays and 17.6′′ for Array 2, with uncertainties of 0.2′′ for the combination of Arrays 1 and 3, and of 0.1′′ for Array 2. Comparing the main-beam fit to the measured full beam, and including large angular-scale contributions from IRAM 30-m telescope heterodyne measurements, we derived the main-beam efficiency. We find main-beam efficiencies of 48 ± 4% at 1 mm and 63 ± 3% at 2 mm. These results show that a significant fraction of the power is received outside the main beam. This underlying, extended, low-level beam pattern shows a complex structure of error beams, rings, spokes, and far side lobes. Using individual maps per KID, Adam et al. (2018) reported an rms dispersion of the main-beam FWHM across the FOV of about 0.6′′ at both wavelengths. This is consistent with the measured curvature of the best-focus surface across the FOV. We also provide the reference beam efficiencies, which are the fixed-width Gaussian beam efficiencies that allow us to take into account the power stemming from outside the reference beam, as is needed to study diffuse emission.

3. We evaluated the rms calibration uncertainties using 264 scans of point-like sources whose flux density is above about one Jy. We find point-source rms calibration uncertainties of about 6% at 1 mm and about 3% at 2 mm, which represent state-of-the-art performance for a ground-based millimetre-wave instrument. The rms calibration uncertainties for diffuse emission must further include the uncertainties on the reference beam efficiencies, as given in Table 18. The absolute calibration uncertainties are of 5%, and the systematic calibration uncertainties, evaluated at the IRAM 30-m reference Winter observing conditions, are below 1% in both channels.

4. The noise integrates down as the square root of the integration time. We derived a robust estimate of the NEFD using more than a thousand scans encompassing a large range of observing conditions. We find NEFDs at zero atmospheric opacity of 30 ± 3 mJy s$^{1/2}$ at 1 mm and 9 ± 1 mJy s$^{1/2}$ at 2 mm. These NEFD estimates demonstrate the high sensitivity of the NIKA2 KID arrays. The instrumental sensitivity at 1 mm is, however, currently mainly limited by the non-optimal transmission of the air-gap dichroic plate, most prominent in one polarisation component (A1) but affecting the other (A3) as well. In addition to the dichroic upgrade, further possible areas of improvement for the 1 mm observation channel are: (1) improving the data processing, and in particular the noise de-correlation methods; (2) increasing the bandwidth of the 1 mm arrays (subject to the improvement of the dichroic); and (3) upgrading the surface of the telescope.

5. The NIKA2 mapping capabilities are best characterised by the mapping speed, which is defined as the sky area that can be covered in one hour of observation at a noise level of 1 mJy. We find mapping speeds at zero atmospheric opacity of 111 and 1388 arcmin$^2$ mJy$^{-2}$ h$^{-1}$ at 1 mm and 2 mm, respectively. The mapping speed of NIKA2 is thus at least an order of magnitude better than that of the previous generation of IRAM 30-m telescope resident continuum cameras (Catalano et al. 2014; Staguhn et al. 2011; Kreysa et al. 1999).

We conclude that NIKA2 has unique capabilities for the fast dual-band mapping of a large FOV at an angular resolution of the order of ten arcseconds. It is currently available to the whole community and will operate at the IRAM 30-m telescope for at least the next decade. With this unique performance, NIKA2 will provide key observations in the mm-wave domain for addressing a large range of science topics in astrophysics and cosmology.
Notes to Table 18. (c) Full width at half maximum of the main beam using the combined results of three methods. (d) Ratio between the main-beam and the total-beam solid angles, including large angular-scale error beams and far side lobes. (e) Full width at half maximum of the beam used in our reference photometric system. (f) Ratio between the reference-FWHM beam and the total-beam solid angles, including large angular-scale error beams and far side lobes. (g) Systematic calibration uncertainties due to the opacity correction using the corrected skydip method, estimated at the reference IRAM 30-m Winter observing conditions: 2 mm pwv, 60° elevation. (h) Effective power law of the noise reduction with integration time. (i) NEFD at zero atmospheric opacity. (j) Mapping speed at zero atmospheric opacity.

Not taking into account the planet shape and orientation with respect to the observer in the computation of its solid angle can lead to errors between 1 and 2%.

A.2. Secondary calibrators
The secondary calibrator MWC349 is a stellar binary system including the young Be star MWC349A, which is surrounded by a disc. Its radio continuum emission originates in an ionised bipolar outflow (Tafoya et al. 2004). MWC349A has been monitored with the PdBI and the VLA and shown to be only slightly angularly resolved, making it a point source for the 30-m telescope. We computed the flux densities at the NIKA2 reference frequencies of 150 and 260 GHz with $S_\nu = (1.16 \pm 0.01) \times (\nu/100\,\mathrm{GHz})^{0.60 \pm 0.01}$ Jy, as provided by this monitoring. The secondary calibrator CRL2688 is an asymptotic giant branch star. Its radio continuum emission comes mostly from circumstellar dust and is somewhat extended (Knapp et al. 1994). Its flux densities at 850 µm and 450 µm were stable at the 5% level, as monitored by SCUBA2 in 2011 and 2012 (Dempsey et al. 2013). We extrapolated these flux densities to 150 and 260 GHz using the power law $S_\nu \propto \nu^{\alpha}$ with an index $\alpha = 2.44 \pm 0.18$ derived from the SCUBA2 measurements.
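Such power-law extrapolations, with first-order propagation of the amplitude and spectral-index uncertainties, can be sketched as follows (the function name is ours):

```python
import math

def extrapolate_flux(s0, alpha, nu, nu0, s0_err=0.0, alpha_err=0.0):
    """S(nu) = s0 * (nu/nu0)^alpha, with the relative errors on s0 and alpha
    added in quadrature; the index term scales with ln(nu/nu0)."""
    ratio = nu / nu0
    s = s0 * ratio ** alpha
    rel_err = math.hypot(s0_err / s0, math.log(ratio) * alpha_err)
    return s, s * rel_err
```

For MWC349 at 150 GHz, the monitoring relation above yields about 1.48 Jy.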
The secondary calibrator NGC 7027 is a young, dusty, carbon-rich planetary nebula with an ionised core. It is extended in the continuum and in molecular lines (Bieging et al. 1991), and it is not a point source for the 30-m telescope. Its most recent flux densities are reported at 1100 µm and 2000 µm in Hoare et al. (1992). From VLA observations, its flux density was reported to decrease by ∼0.145 percent yr⁻¹ in the optically thin part of its spectrum above 6 GHz (Zijlstra et al. 2008; Hafez et al. 2008), which makes these flux densities currently uncertain at the 3.6% level. Its SED from centimetre wavelengths to the optical was also presented in Hafez et al. (2008). Perley & Butler (2013) measured its flux density between 1 and 50 GHz. Moreover, flux-density measurements at 90 GHz over 20 years with the 30-m telescope are presented in Kramer et al. (2008). The flux densities were extrapolated to 150 and 260 GHz, including the modelled decrease since 1992.
All these expected flux densities extrapolated from the literature are summarised in Table A.2.
Measured flux densities, however, are determined over the broad bandwidth of each array and must therefore be colour-corrected before being compared to the expected flux densities of Table A.2. For this purpose, we derived colour corrections for sources with spectral indices $\alpha$ between −2 and +4 in Table 12. This effect can amount to a few percent for MWC349, NGC 7027, and CRL2688.

Notes to Table A.2. The uncertainties of the flux densities extrapolated to 150 and 260 GHz include the contribution of the uncertainty on the spectral index $\alpha$, defined by $S_\nu \propto \nu^\alpha$.
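A colour-correction computation can be sketched as below. The exact convention behind Table 12 is not given in this excerpt, so the code assumes one common convention: the ratio of the band integrals of the calibration spectrum and of the source spectrum, both normalised at the reference frequency.

```python
import numpy as np

def colour_correction(nu, transmission, nu_ref, alpha, alpha_cal=0.0):
    """Colour-correction factor for a source with spectrum (nu/nu_ref)^alpha
    observed through a bandpass `transmission`, relative to a calibration
    spectrum of index alpha_cal, so that S_corrected(nu_ref) = C * S_measured.
    Assumed convention: ratio of band-integrated calibration and source spectra."""
    def band_integral(index):
        y = transmission * (nu / nu_ref) ** index
        return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(nu)))  # trapezoid rule
    return band_integral(alpha_cal) / band_integral(alpha)
```

For a bandpass symmetric about the reference frequency, the correction is unity when the source and calibration indices match, and departs from unity by a few percent otherwise, consistent with the magnitude quoted above.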
Assume that we observe the source under stable conditions and denote with a $\star$ the associated amplitude $A_\star$, beam width $\sigma_\star$, and flux estimate $\hat{S}_\star$. Using aperture photometry, energy conservation ensures that $2\pi\sigma^2 A = 2\pi\sigma_\star^2 A_\star$. To retrieve the flux estimate $\hat{S}_\star$ from the flux-density estimate $\hat{S}$ obtained for any observation, a corrected flux density $\hat{S}_{\mathrm{pc}}$ can be obtained using
$$\hat{S}_{\mathrm{pc}} = f(\sigma)\,\hat{S},$$
where the photometric correction is
$$f(\sigma) = \left(\frac{\sigma}{\sigma_\star}\right)^2.$$
Figure C.1 shows how $f(\sigma)$ varies with the actual beam width, for two choices of $\sigma_\star$ depending on the flux of the observed source.
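Under the quadratic scaling implied by the energy-conservation relation above, the correction factor can be sketched in one line. This is our reconstruction of the garbled expression in the source; the exact form used by the pipeline may differ.

```python
def photometric_correction(sigma, sigma_star):
    """Photometric-correction factor f(sigma) = (sigma / sigma_star)^2:
    a beam broader than the stable-conditions width sigma_star spreads the
    peak amplitude, so the fixed-width flux estimate is scaled back up."""
    return (sigma / sigma_star) ** 2
```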
The beam size in stable atmospheric conditions, $\sigma_\star$, is determined by fitting a 2D Gaussian beam to the series of scans of sources with varying flux densities that were used for the beam characterisation in Sect. 6.2. However, $\sigma_\star$ is not equivalent to the main-beam Gaussian size, since the side lobes and first error beams are not taken into account in this beam-size monitoring. The $\sigma_\star$ estimates are slightly larger for bright sources, due to the contribution to the Gaussian fit of the first side lobes and error beams, which are above the noise level.

C.2. Monitoring of the temperature-induced beam size variation
The photometric correction thus relies on measuring the current beam size σ. The induced uncertainties on the flux-density measurements depend on the precision of the beam-size determination. Here we consider two methods to monitor the beam size.
The temperature-induced beam-size variation is primarily monitored using a 2D Gaussian fit on all the available bright-source observations, as presented in Sect. 8.3.
For a finer sampling of the observation time, we also consider using the pointing scans for the beam monitoring. As discussed in Sect. 3.2, the telescope pointing is monitored on an hourly basis during science-purpose observations using pointing scans. As these scans consist of two sub-scans in azimuth and two in elevation, of about 10 s of integration time each, they can also be used to make a map of the pointing source. For each campaign, we thus have hundreds of maps of mostly point-like bright sources, which can also be used to monitor the beam size. For this purpose, pointing scans are reduced and projected onto maps of 2′′ resolution using the data analysis pipeline described in Sect. 4. Fitting an elliptical 2D Gaussian to each map, we compute the geometrical FWHM. Pointing scans on slightly extended sources, such as NGC 7027, are discarded from the analysis.
For each pointing scan, we also look for signs of atmospheric anomalous refraction. There are enough KIDs per array to make an independent map using only one sub-scan, that is, about ten seconds of integration time. For each of the four cross sub-scans, we thus estimate the position of the best-fit 2D Gaussian on the corresponding map, and we compute the deviation between each sub-scan-derived position and the best-fit position obtained using the entire scan. An anomalous refraction event is detected when the difference is above 2.5′′ for at least one sub-scan. We find that the temperature-induced beam-size variation effect is due to anomalous refraction for between one third and one half of the scans, as reported in Sect. 8.3.
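The sub-scan consistency test can be sketched as follows. The function name is ours, and positions are assumed to be given as (azimuth, elevation) offsets in arcseconds.

```python
import numpy as np

def flag_anomalous_refraction(subscan_positions, scan_position, threshold=2.5):
    """Flag an anomalous-refraction event: True when the best-fit source
    position of at least one sub-scan deviates by more than `threshold`
    (arcsec) from the position fitted on the entire scan."""
    pos = np.atleast_2d(np.asarray(subscan_positions, float))  # (n_subscans, 2)
    dx, dy = (pos - np.asarray(scan_position, float)).T
    return bool(np.any(np.hypot(dx, dy) > threshold))
```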
The pointing-based FWHM estimates constitute a time sampling of the beam size during the whole observation campaign. They can serve as estimates of the beam size of any observation scan, in particular towards sources that are too faint for a direct FWHM fit on the map. However, we expect less accuracy than when using standard OTF raster scans of bright sources, due to the shorter integration time and to the fact that only the innermost KIDs of the array see the source. To mitigate the dispersion, the time-stamped pointing-based FWHM is filtered with a running median over a 70-min-wide time window. The FWHM at the time of a considered scan is then interpolated from the median-filtered pointing-based FWHM.
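The median filtering and interpolation can be sketched as follows. The function name is ours; times are taken in seconds, and linear interpolation is assumed since the interpolation scheme is not specified in the text.

```python
import numpy as np

def interpolate_fwhm(t_pointing, fwhm_pointing, t_scans, window=70.0 * 60.0):
    """Running-median filter of the pointing-based FWHM series over a sliding
    time window (seconds, 70 min by default), then linear interpolation of the
    filtered series at the requested scan times."""
    t_pointing = np.asarray(t_pointing, float)
    fwhm_pointing = np.asarray(fwhm_pointing, float)
    filtered = np.array([
        np.median(fwhm_pointing[np.abs(t_pointing - t) <= window / 2.0])
        for t in t_pointing
    ])
    return np.interp(t_scans, t_pointing, filtered)
```

The running median suppresses the isolated outliers expected from the short pointing sub-scans while following the slow, temperature-driven beam-size drift.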

C.3. The two case studies
We perform two case studies that correspond to the two beam monitoring methods discussed above.

C.3.1. Demonstration case
This method, referred to as PC-demo hereinafter, uses a photometric correction based on the beam monitoring with bright source scans.Both the 2D Gaussian FWHM fit and the FWHM 0 photometry are performed on the map of the source.This method thus applies only to point-like sources that are bright enough for an accurate fit of the beam on a single scan.
To capture only the beam-size variations driven by the observing conditions (primary mirror deformations, anomalous refraction, elevation), a small correction $\delta_{\mathrm{FWHM}}$ has to be applied to the 2D Gaussian FWHM estimate for bright sources. The actual Gaussian size is estimated as
$$\mathrm{FWHM} = \mathrm{FWHM}_{\mathrm{fit}} - \delta_{\mathrm{FWHM}},$$
where the offset $\delta_{\mathrm{FWHM}}$ is null for faint or moderately bright point sources and non-zero for bright sources. As for $\sigma_\star$, the 2D Gaussian fit yields a slightly broader FWHM for bright sources (e.g. planets), to accommodate the side lobes and error beams that are measured at high signal-to-noise. For Uranus, $\delta_{\mathrm{FWHM}}$ also includes the beam widening due to Uranus' disc, as discussed in Sect. 6.2.3. We measured the Uranus $\delta_{\mathrm{FWHM}}$ by comparing the average FWHM estimates from Uranus and MWC349 scans: we find $\delta_{\mathrm{FWHM}} = 0.4''$ at 1 mm and $\delta_{\mathrm{FWHM}} = 0.25''$ at 2 mm, roughly half of which is due to Uranus' finite extension and the other half to the side lobes.

C.3.2. Practical case using pointing scans
This method, named PC-point hereinafter, performs a photometric correction based on the beam monitoring with pointing scans. Unlike PC-demo, this method is usable even for sources fainter than about one Jy, but it relies on interpolations between beam-width estimates obtained on other scans. For Uranus, this value

Fig. 2 .
Fig. 2. Example of variations of KID TOIs. Top: example of 40 calibrated KID TOIs during an observation of Uranus. The low-frequency correlated component (atmospheric and electronic noise) is clearly seen. Bottom: one of these TOIs (in blue) and the common mode that is subtracted from it (in red). The zero level is arbitrary.

Fig. 5 .
Fig. 5. From left to right: beam maps of A1, A3, the combination of the 1 mm arrays (A1 and A3), and the 2 mm array (A2), shown in decibels. These maps, which consist of the normalised combination of four beammap scans of bright point-like sources, are in horizontal coordinates. They represent a zoom on the inner part of larger maps that cover a sky area extending over 10′. The solid lines and arrows highlight some noticeable features. Red circle in the A1 map (first panel): diffraction ring seen in the 1 mm maps (the spokes are presumably caused by radial and azimuthal panel buckling; Greve et al. 2010). Orthogonal yellow lines in the A2 map (rightmost panel): diffraction pattern caused by the quadrupod secondary support structure (prominently seen in the A2 map). Yellow arrows in the A3 map (second panel): pattern of three spikes of unknown origin seen in the 1 mm maps. Yellow arrows in the A2 map (fourth panel): four symmetrical spikes of the first side lobes. Pink ellipses: four spikes seen in the A2 maps.

Fig. 7 .
Fig. 7. Main-beam FWHM estimates for the 1 mm (top) and 2 mm (bottom) channels, shown as a function of the atmospheric transmission estimated at the corresponding wavelength, using bright point-source observations acquired during the reference observation campaigns (N2R9, N2R12, N2R14).

Fig. 8 .
Fig. 8. Example of a global skydip fit for one KID. Each square point represents one step in a skydip (made of eleven elevation steps). A series of 12 skydip scans are jointly used, spanning zenith opacities from 0.15 to 0.50 in the 1 mm band. The horizontal axis gives the sky effective temperature $T_{\mathrm{sky}} = T_{\mathrm{atm}}[1 - e^{-\tau_\nu x}]$ in Kelvin, where $\tau_\nu$ is the skydip zenith opacity found in the fit. The vertical axis shows the relative resonance frequency of the KID with respect to 1.9 GHz, given in MHz. The blue line is the linear model using the best-fit $c_0^k$ and $c_1^k$ coefficients (see Eq. (12)).

Fig. 11 .
Fig. 11. Average main-beam flat fields obtained by combining the flat fields of five beammap scans. Top row: normalised average flat fields of Arrays 1, 3, and 2, respectively. The offset positions with respect to the centre of the array are given in arcseconds in the Nasmyth coordinate system. The colour code gives the value of the KID calibration coefficients, as defined in Eq. (20), normalised by the average calibration coefficient over all the KIDs of the array. Bottom row: average flat-field distributions using all KIDs (blue), using Array-1 KIDs positioned outside the shadow zone (green), and using Array-1 KIDs inside the shadow zone, as defined in the text.

Fig. 14 .
Fig. 14. Uranus flux-density ratio versus atmospheric transmission, shown for the 1 mm array combination (left column) and for Array 2 (right column), after absolute calibration using the Baseline method (first row), the "taumeter"-based method (second row), and the "skydip"-based method (third row). These plots include all Uranus scans acquired during the N2R9, N2R12, and N2R14 campaigns.

Fig. 15 .
Fig. 15. Comparison of the calibration bias for five calibration methods using observations of MWC349. The measured-to-expected flux-density ratio is shown as a function of the atmospheric transmission for the Baseline method (first row), for the methods using the taumeter (second row) and skydip (third row) opacity corrections, and for the methods resorting to the PC-demo (fourth row) and PC-point (fifth row) photometric corrections. Dashed lines show the rms errors on the flux-density ratio.
Figure C.2 shows two different FWHM estimates for the same data set: on the one hand, the best-fit FWHM estimates on the OTF-scan maps; on the other hand, the interpolation from the pointing-based FWHM monitoring. The two estimates show the same global variations as a function of UT hours. They agree with each other, although the pointing-based estimates show more dispersion and a few outliers, as expected.

Fig. C.2 .
Fig. C.2. Beam size monitoring."OTF"-labelled pink data points show the FWHM estimates from a 2D Gaussian fit on the maps of OTF raster scans towards bright sources, whereas the "Pointing"-labelled light green data points are FWHM estimates obtained by interpolating the median-filtered pointing-based FWHM at the time of the scans.The pointing-based FWHM estimates are obtained by fitting a 2D Gaussian on the maps of pointing scans.

Table 1 .
Effective frequencies ν eff and bandwidths ∆ν of the three arrays computed in two different observing conditions defined with the precipitable water vapour.
Notes. pwv content in mm and observing elevation in degrees.

Table 3 .
Summary of the number of valid detectors per array.

Table 4 .
Fraction of very stable KIDs among the designed KIDs, in percent.

Table 6 .
Estimates of the main-beam FWHM in arcseconds using three estimation methods (see Sect. 6.2.1) and three data sets (see Sect. 6.2.2), and their combinations.

Table 7 .
Estimates of the solid angle of the beam up to a radius of 90 ′′ , Ω 90 , and rms uncertainties given in square arcseconds, using Neptune and Uranus scans acquired during three observation campaigns, and the combined result.

Table 8 .
Main-beam efficiency estimated for each array or array combination, using three different methods, and the combined results, given in percentages.

Table 9 .
Estimates of the total beam solid angle and main-beam efficiency including in turn further contributions to the full beam ranked by their angular scales (see text).
Notes.All solid angles are given in square arcseconds, while the mainbeam efficiencies are in percentages.

Table 10 .
Best-fit parameters and rms uncertainties to infer NIKA2 opacities from the IRAM taumeter measurements.
Notes to Table 10. The quoted errors $\Delta a_\nu^{225}$ and $\Delta b_\nu^{225}$ are 1σ errors of the fit.

Table 11 .
L. Perotto et al.: Calibration and performance of the NIKA2 camera at the IRAM 30-m Telescope

NIKA2 reference frequencies and FWHM.

Table 14 .
Baseline calibration results: photometry accuracy and uncertainties.

Table 15 .
Comparison of results using five calibration methods. Notes. The table lists: (first column) the Baseline calibration; (second and third columns) the calibration methods using the other opacity corrections, taumeter and skydip, respectively; and (fourth and fifth columns) the calibration methods using a photometric correction, PC-demo and PC-point, respectively (see Appendix C). The calibration biases, as defined in Sect. 9.1.1, are reported in the sub-panel labelled "Bias". The sub-panel labelled "rms" gathers: (first row) the total number of observation scans of bright sources (see Sect. 9.1.2) acquired during the reference campaigns; (second row) the number of selected scans; and (third to sixth rows) the rms calibration uncertainties (Sect. 9.1.2) for Array 1, Array 3, the combination of A1 and A3, and Array 2, respectively.

Table 16 .
Stability of the NEFD estimates.
Table A.1. Physical quantities used for the Uranus and Neptune flux computations (Eq. (A.2)). Ob-lat and Delta are quantities tabulated by the NASA Horizons system as a function of the date.
Table A.2. Reference flux densities of secondary calibrators at the NIKA2 reference frequencies 150 and 260 GHz.