The MeerKAT Absorption Line Survey: Homogeneous continuum catalogues towards a measurement of the cosmic radio dipole

The number counts of homogeneous samples of radio sources are a tried and true method of probing the large scale structure of the Universe, as most radio sources outside the galactic plane are at cosmological distances. As such they are expected to trace the cosmic radio dipole, an anisotropy analogous to the dipole seen in the cosmic microwave background (CMB). Results have shown that although the cosmic radio dipole matches the direction of the CMB dipole, it has a signiﬁcantly larger amplitude. This unexplained result challenges our assumption of the Universe being isotropic, which can have large repercussions for the current cosmological paradigm. Though signiﬁcant measurements have been made, sensitivity to the radio dipole is generally hampered by systematic e ﬀ ects that can cause large biases in the measurement. Here we assess these systematics with data from the MeerKAT Absorption Line Survey (MALS), a blind search for absorption lines with pointings centred on bright radio sources. With the sensitivity and ﬁeld of view of MeerKAT, thousands of sources are observed in each pointing¸ allowing for the possibility of measuring the cosmic radio dipole given enough pointings. We present the analysis of ten MALS pointings, focusing on systematic e ﬀ ects that could lead to an inhomogeneous catalogue. We describe the calibration and creation of full band continuum images and catalogues, producing a combined catalogue containing 16,313 sources and covering 37.5 square degrees of sky down to a sensitivity of 10 µ Jy / beam. We measure the completeness, purity, and ﬂux recovery statistics for these catalogues using simulated data. We investigate di ﬀ erent source populations in the catalogues by looking at ﬂux densities and spectral indices, and how they might inﬂuence source counts. Using the noise characteristics of the pointings, we ﬁnd global measures that can be used to correct for the incompleteness of the catalogue, producing corrected number counts down to 100 - 200 µ Jy. We show that we can homogenise the catalogues and properly account for systematic e ﬀ ects. We determine that we can measure the dipole to 3 σ signiﬁcance with 100 MALS pointings.


Introduction
The vast majority of sources seen at radio wavelengths outside of the Galactic plane are known to be at cosmologically ⋆ The catalogue is only available at the CDS via anonymous ftp to cdsarc.cds.unistra.fr(130.79.128.5) or via https:// cdsarc.cds.unistra.fr/viz-bin/cat/J/A+A/673/A113A&A 673, A113 (2023) in favour of a strongly evolving universe even before the discovery of the cosmic microwave background (CMB; Penzias & Wilson 1965).As radio sources trace the large-scale structure of the Universe, they are expected to abide by the cosmological principle, which asserts that the Universe is homogeneous and isotropic.However, there is an anisotropy expected in the number counts of radio sources, caused by the velocity of the Solar System with respect to the cosmological background.This expresses itself as a dipole and is the dominant anisotropy observed in the CMB (Lineweaver 1997).The movement of the observer induces Doppler boosting and relativistic aberration that cause the apparent luminosity and position of radio sources to shift, resulting in a dipole in the number counts of radio sources.A measurement for the cosmic radio dipole was first proposed by Ellis & Baldwin (1984), who showed that 2 × 10 5 sources, adequately distributed along the axis of the dipole, are required for a 3σ measurement of the radio dipole, assuming the Solar System velocity derived from CMB measurements.
Using data from the National Radio Astronomy Observatory (NRAO) VLA Sky Survey (NVSS; Condon et al. 1998), Blake & Wall (2002) made the first significant measurement of the dipole, with a direction and amplitude that correspond to those of the CMB.Subsequent studies were performed with the NVSS and other radio surveys, such as the Westerbork Northern Sky Survey (WENSS; Rengelink et al. 1997), the Sydney University Molonglo Sky Survey (SUMSS; Mauch et al. 2003), and the Tata Institute for Fundamental Research (TIFR) Giant Metrewave Radio Telescope (GMRT) Sky Survey's first alternative data release (TGSS ADR1, Intema et al. 2017).It was found that the cosmic radio dipole, while being consistent in terms of direction with that of the CMB, significantly differs from the CMB dipole in terms of amplitude (e.g.Singal 2011;Rubart & Schwarz 2013;Tiwari & Jain 2013;Tiwari et al. 2015;Colin et al. 2017).These early dipole measurements found that survey-wide systematic effects, which cause varying source densities, can greatly bias dipole estimates.This is usually remedied by strict cuts in flux density, which dramatically decrease the number of usable sources.Even with these flux density cuts, some surveys, such as TGSS, yield anomalous dipole results that have been attributed to systematics due to problems with flux calibration (e.g.Singal 2019;Bengaly et al. 2018;Siewert et al. 2021).While results differ depending on the survey and estimator used, the amplitude of the radio dipole is consistently larger (by a factor of 2-6) than the amplitude of the CMB dipole (see Siewert et al. 2021, for an overview), while the direction of the dipole remains consistent.With similar results found using the number counts of active galactic nuclei (AGN) at infrared wavelengths (Secrest et al. 2021(Secrest et al. , 2022;;Singal 2021), it becomes increasingly difficult to explain them with systematic effects or faulty analysis.Only in a recent analysis by Darling (2022) was a dipole found to be consistent with the CMB in both direction and amplitude; this was done by combining the VLA Sky Survey (VLASS; Lacy et al. 2020) and the Rapid Australian Square Kilometre Array Pathfinder (ASKAP) Continuum Survey (RACS;McConnell et al. 2020), though it presents only one counterpoint to the many works that find an increased dipole amplitude.Considering a purely kinematic origin of the dipole, the cosmic radio dipole and the CMB dipole are in obvious tension with each other.The excess dipole found in the radio therefore must be a result of a different process, which could have major implications for cosmology.As radio galaxies trace the underlying matter distribution, a dipole in their distribution would break with isotropy, one of the fundamental assumptions of cosmology.
The assumption of isotropy and homogeneity is founded on the notion that we as observers do not occupy a special place in the Universe, and these results suggest that there is some flaw in this assessment.
Working towards an independent measurement of the radio dipole, we utilise the MeerKAT Absorption Line Survey (MALS; Gupta et al. 2016), a deep radio survey with pointings centred on bright radio sources.MALS is carrying out a dust-unbiased search for neutral hydrogen (HI, 21 cm) and hydroxyl (OH, 18 cm) absorption lines at redshifts 0 < z < 2 in order to unravel the processes driving the steep evolution of the star formation rate density.As a blind search for absorption lines, every MALS pointing is centred on a bright AGN (>200 mJy at 1.4 GHz).The targets have been chosen from the NVSS and SUMSS catalogues and are cross-checked with Wide-field Infrared Survey Explorer (WISE) data in order to build a dust-unbiased sample of AGN (Gupta et al. 2022).Early results show that MALS is able to attain unprecedented sensitivity to absorption lines in these bright AGN (Gupta et al. 2021;Combes et al. 2021).In addition to the search for absorption lines, the data taken will be sensitive enough to produce deep continuum images, down to 10 µJy beam −1 .With a full width at half maximum (FWHM) field of view of 1 degree in the L band (1.27 GHz), each MeerKAT pointing presents a few square degrees and potentially thousands of sources.With 391 pointings currently observed in the L band, the full survey will provide thousands of square degrees of deep continuum sky and hundreds of thousands of sources.
Though the expected MALS number counts are sufficient for a dipole measurement, a dipole estimate requires a homogeneous catalogue.Systematic effects influencing the sensitivity of surveys are common and are usually dealt with by making conservative cuts in the data to avoid biasing the dipole estimate.Instead, in this work we present a thorough analysis of ten MALS pointings, aiming to fully understand the systematics present in the survey data.This will allow us to account for these systematics when measuring the radio dipole using hundreds of MALS pointings.The nature of the survey provides additional challenges for this type of measurement.Previously, measurements of the dipole have been performed with contiguous surveys such as the NVSS, whereas MALS will be sparser, sampling the sky in many different directions.However, compared to these surveys, MeerKAT has a much higher sensitivity (10 µJy beam −1 ), which allows us to probe deeper into the population of faint radio sources.Furthermore, past dipole measurements from contiguous sky surveys have been performed post-factum, with little knowledge of the internal processing, and therefore present systematics of these surveys beyond what is described in the literature.In this paper we study the first ten continuum images of MALS in depth in order to assess their quality.We investigate the systematics in calibration, imaging, and source finding on image quality and source counts, and extrapolate our findings to the rest of the survey.
This paper is organised as follows.In Sect. 2 we describe the MALS data.The initial creation of the source catalogues and the completeness measures are described in Sect.3. In Sect. 4 we describe results from the full catalogue of sources.We investigate how different source populations affect the catalogues in Sect. 5.In Sect.6 we assess the prospects for a dipole measurement with MALS using the results in this paper.Finally, in Sect.7, we summarise the findings of this paper.
A113, page 2 of 30 Table 1.Calibration details of the pointings presented in this paper, grouped by observation runs.

Flux cal
Flux density Target Gain cal Flux density Reference flux (a) Spectral index (a) Distance from target 870 MHz 1365 MHz 1400 MHz (Jy) (Jy) (Jy) (degrees) Notes.The flux densities of the calibrators are reported by the CASA fluxscale task, which changes reference frequency based on whether the source is a flux calibrator or gain calibrator. (a) https://skaafrica.atlassian.net/wiki/spaces/ESDKB/pages/1452146701/L-band+gain+calibrators for properties of MeerKAT L-band calibrators. (b) Value from the old list of calibrators, no longer publicly available.The Galactic plane is largely avoided, and since 89% of the pointings are selected directly from NVSS, the vast majority of pointings are above a declination of −40 degrees.The pointings used in this analysis are highlighted in red.

MALS data
The distribution of the first 391 observed pointings of MALS is shown in Fig. 1.In order to assess the data quality of the individual MALS pointings and the impact for the dipole estimates an initial set of ten pointings, shown in Fig. 1 in red, has been selected out of five observing runs to probe different ranges of right ascension, declination, and central source flux density.

Observations and calibration
The general setup of a single MALS observation includes observations of three science targets and corresponding calibrators.
The observation is scheduled with a flux calibrator observed for 10 min at the start and end of each observing run.Each target is observed for 20 min at a time, cycling through all targets three times for a total observing time of an hour per source.Before and after each target observation, a nearby gain calibrator is observed for one minute.Cycling between targets like this maximises the UV-coverage with minimal increase in overhead.Observing multiple targets in a single run is not only convenient in terms of processing, but is also critical in taking stock of systematic effects, such as flux density scale or phase errors, potentially introduced during observation or calibration.All observations have a correlator integration time of 8 s, with observations carried out in 32 K mode, providing 32 768 channels with a channel width of 26.123 kHz.With a frequency range of 856-1712 MHz, the total bandwidth is 856 MHz, with a central frequency of 1.285 GHz.
The MeerKAT data are shipped to the Inter-University Centre for Astronomy and Astrophysics (IUCAA) in India and processed by the Automated Radio Telescope Imaging Pipeline (ARTIP).The complete deployment of ARTIP in MALS is described in Gupta et al. (2021).ARTIP presents an environment where data can be processed according to user specifications and is based on the Common Astronomy Software Applications (CASA) tasks (The CASA Team et al. 2022).Each dataset undergoes a round of basic flagging, removing known radio frequency interference (RFI) frequencies.This is followed by flux calibration, bandpass calibration, and gain calibration, each step having the possibility of additional automated flagging.The final target visibilities used for the imaging process are produced by applying the flags and calibration solutions.
As part of the overall evaluation of the individual pointings, all the available information was assessed automatically with an evaluation scheme that has been developed to trace errors of the calibration process by searching through the logging information of ARTIP.This scheme also extracts relevant information from the logs, such as the flux densities of the calibrator sources.An overview of the targets and calibrators of the ten selected pointings, organised by observation block, is shown in Table 1.For the gain calibrators, both the flux density determined during calibration and from a reference catalogue is listed.

Self-calibration and continuum imaging
For the purposes of continuum imaging, the data are averaged over 32 channels and divided into 16 spectral windows (SPWs), resulting in 64 channels per SPW.Once again, frequencies with  As each field contains a strong point source at its centre, both phase and amplitude self-calibration can be performed (Cornwell & Fomalont 1989).In total, three phase and one amplitude calibration steps are performed, with imaging each step to improve the local sky model.As is common with self-calibration in CASA, we use the clean components created in tclean as the local sky model for calibration.We iterate on the model by creating masks for tclean using the Python Blob Detection and Source Finder (PYBDSF;, Mohan & Rafferty 2015).Starting with a mask containing only the central source, after a set number of iterations PYBDSF is used on the image to create the mask for the full field, initially with a high S/N threshold and lowering the threshold for subsequent runs to gradually expand the model.Creating the clean masks in such a way ensures that cleaning is mostly limited to real emission, while also speeding up the imaging by limiting the cleaning area.
Though the self-calibration can be a significant improvement on the image it can also be potentially unstable.To monitor the stability of solutions, a diagnostic tool for self-calibration produces a report on the variation of relevant statistics such as noise and central source flux density in different steps of calibration.As with calibration, the logs were evaluated for errors and warnings during the self-calibration process and information relevant to assessing the quality of the products, such as percentage of flagged data and theoretical noise limit, were extracted.
Imaging is performed using Multi-term Multi-Frequency Synthesis (MTMFS; Rau & Cornwell 2011) deconvolution with four pixel scales (0, 2, 3, and 5 pixels) to model extended emission in the images and two Taylor terms to account for the spectral shape of the sources.This produces two Taylor term images, which describe the spectral shape of the emission to zeroth and first order, respectively.As such, the zeroth order Taylor term I 0 represents the continuum flux density of the field at the reference frequency of 1.27 GHz, while the first order Taylor term I 1 describes the spectral index, (1) To maintain a balance between sensitivity and resolution in the images, visibilities are weighted using Briggs weighting (Briggs 1995) with robust value of 0. Because we are imaging with a large field of view, we use W-projection (Cornwell et al. 2005) with 128 projection planes to correct for the fact that our baselines are non-coplanar.2, they are on average aligned in the north-south direction, with a mean major axis of 9.3 ′′ and mean minor axis of 6.5 ′′ .

Spectral index images
The L band of MeerKAT has a bandwidth of 802.5 MHz, which is large enough to be sensitive to the spectral shape of the radio emission within the band.If this is not taken into account when imaging the full band, this incurs a large uncertainty in flux density.The general solution for this is MTMFS deconvolution, which models the frequency dependence of the emission with a Taylor expansion.In our case, as mentioned in Sect.2.2, we model the frequency dependence of the emission in the pointings to first order in ν.With this we can create maps describing the spectral index α, defined by the relation between flux density S and frequency S ∝ ν α , of the emission in the image.
Although MTMFS imaging also produces a spectral index image, pixels below 5 times the peak residual are masked in this image.To retain flexibility, we therefore chose to produce the spectral index images from the Taylor term images ourselves.From the definition of the Taylor term images in Eq. ( 1), a spectral index image can be obtained using α = I 1 /I 0 , from which we will be able to measure the spectral indices of sources.To keep values in the spectral index image from diverging, pixels are masked where values in the Stokes I image are below 10 µJy beam −1 .When measuring the spectral index in some region of the image, usually defined by the extent of a source, we assign a spectral index as the intensity weighted mean of the measured pixels in the spectral index image, with intensity weighted standard deviation as the error, If more than half of the measured pixels in a region are invalid in the spectral index image, this carries over to the measured spectral index and uncertainty by assigning a masked value.

Primary beam correction
Due to the primary beam response of the MeerKAT antennas, sources away from the pointing centre appear fainter than they are in reality.As this effect is not corrected for in the imaging stage, resulting continuum images will have accurate flux densities at the pointing centre but attenuated flux densities that become fainter the farther from the pointing centre they are located.A simplified model of the primary beam is described in Mauch et al. (2020), which assumes the primary beam of MeerKAT as directionally symmetric, describing it with a cosine-tapered illumination function, Here ρ is the distance from the pointing centre and θ pb represents the angular size of the FWHM of the primary beam, a quantity that is dependent on the observing frequency, ν, (5) At the central frequency of our continuum images of 1.27 GHz, the FWHM of the primary beam is θ pb = 67 ′ .This simplified model is implemented in the katbeam1 PYTHON package.As the primary beam is frequency dependent, it affects the spectral index images, increasing the measured spectral index away from the pointing centre.The spectral index change induced by the primary beam can be approximated by Again, we assume the frequency ν to be equal to the central frequency ν 0 = 1.27GHz.
In reality, the MeerKAT primary beam in the L band is more complicated and cannot be completely described by a directionally symmetric model.de Villiers & Cotton (2022) present and analyse holographic measurements of the MeerKAT primary beam, showing the directional asymmetries present due to variations between individual antennas.For an accurate model of the primary beam, we use these holographic measurements to correct our images.As we utilise the full 802.5 MHz bandwidth of the L band for these images, a primary beam correction must take this into account.Though a wideband primary beam correction is implemented in the CASA task widebandpbcor, there are no models of the MeerKAT beams available.As such, we implement the wideband primary beam correction ourselves using the same basic recipe, which consists of creating a primary beam with a frequency structure matching that of the image, in this case creating a primary beam model for each of the 16 SPWs of the continuum data.As in the imaging step, we model the multifrequency primary beam with two Taylor terms.The primary beam corrected Taylor term images are then defined as follows: Here, P 0 and P 1 represent the zeroth and first order Taylor term primary beams, respectively, where P 0 /P 1 should be equal to α pb as specified in Eq. ( 6).While we use the holographic wideband primary beam corrections described in Eq. ( 7) for the main results of this work, we also briefly explore the simplified corrections of Eqs. ( 4) and ( 6) and see how they compare to the wideband corrections.At applying the primary beam corrections, the image is cut off at the 5% level of the primary beam (at the central frequency of 1.27 GHz), which leaves us with a circular image with a diameter of approximately 4000 pixels, or 2.2 degrees.As a result of reduced sensitivity towards the edges of the image, the noise is increased there.

Assessment of calibration
Processing the raw data to the final scientific data products can introduce errors, affecting the flux density scale.A first order estimation of the flux density scale can be obtained by comparing the flux densities of the gain calibrators with their literature values.As discussed in Sect.2.1, we evaluate the automated process of the data calibration by generating diagnostic reports and automatically evaluating logged information in order to determine problems in the data processing.This evaluation singles out errors and warnings present in the logs, allowing direct insight into any problems that might have occurred during the calibration process.Furthermore, it extracts information we can use to assess the quality of calibration from the logs, such as the flux densities of calibrator sources.
Table 1 summarises the observation and calibration details, showing the targets and their associated calibrator sources.The flux densities of the flux-and gain calibrators are extracted from the logs and the flux densities of gain calibrators are compared to the MeerKAT reference catalogue (Taylor & Legodi 2021).We extend this to a broader assessment of the flux density scale in Fig. 2, where we show the flux density offsets of the gain calibrators and central sources of the individual pointings.Along with the comparison in Table 1, both gain calibrators and central sources (see Table 2) are compared to their NVSS counterparts.Flux densities are corrected for frequency using the spectral index from the reference catalogue if A113, page 5 of 30 A&A 673, A113 (2023) available, assuming α = −0.75otherwise.Combining the measurements from the ten pointings, the mean flux density ratios are 1.03 ± 0.26 between the gain calibrators and their NVSS counterparts, 1.07 ± 0.07 between gain calibrators and their reference values, and 1.08 ± 0.19 between central sources and their NVSS counterparts.We note that the absolute amplitude calibration of NVSS is based on Baars et al. (1977) and has an uncertainty of up to 12% with respect to the here used Perley & Butler (2017) scale, depending on the calibrator used.
We note that the SUMSS and NVSS measurements were taken with different instruments at different times, so some variation is to be expected.The current assessment does not include astrometric precision, as calibrators are not imaged.We assess this aspect along with the another flux density scale assessment by cross-matching the full catalogue of sources with other surveys in Sect.4.2.

Assessment of image quality
With any radio image, there is a great number of variables that can influence the quality of the image, both related to intrinsic properties of the pointing and to the process of calibration and self-calibration.As discussed in Sect.2.2, a report is generated that monitors image statistics such as noise and central source flux during different self-calibration steps.Furthermore, the logs are automatically evaluated for possible errors and warnings and information relevant to the quality of the self-calibration and imaging is extracted.To evaluate the final image product, the image quality of the individual ten pointings is assessed by using the root mean square (RMS) noise maps that are automatically produced during the source finding procedure by PyBDSF (see Sect. 3).In particular, we investigate the overall noise characteristics by evaluating the sky coverage with respect to the RMS noise.A direct measurement of the noise allows us then to easily correlate image quality with other characteristics of the pointings.
We create a smoothed representation of the ten pointings by median stacking their normalised RMS noise images, which is shown in Fig. 3.As all pointings have a strong source at their centre, the noise is increased at the pointing centre and increases towards the edges of the pointing as a consequence of the primary beam response.Figure 3 shows that some directional effects are left in the image.Notably, there is an elongated noise structure in the centre, associated with the bright central source, aligned in the north-south direction.The stacked beam included in the figure aligns well with the elongated structure, indicating that the most prominent structures are a result of the shape of the stacked PSF of the images.The imprint of the stacked PSF is also the most likely cause of the cross-like structure seen in the stacked image.Though we have the wideband primary beam correction based on holographic images that take into account the asymmetries present in the primary beam, pointings are observed for three separate blocks of 20 min in an observing night, which smears out the asymmetries in the primary beam2 .This effect cannot be easily corrected for in the image plane, but could be taken into account during imaging using A-projection (Bhatnagar et al. 2008).Though present, the asymmetries here are small and dominated by the other noise structures in the image.
The usual method of determining RMS noise in an image relies on measuring RMS noise in an area close enough to the Fig. 3. Median stacked pixel values of the RMS noise images of all ten pointings.As primary beam correction is applied, the noise goes up towards the edges of the image.Since a strong central source is always present, the noise is always higher in the centre as well.The given contours from white to black are 20, 40, and 60% RMS noise coverage.The stacked beam (50 time increase in size) of all the pointings is shown in the lower-left corner, matching the elongated structure in the centre.Fig. 4. RMS noise coverage for all ten pointings.The dotted line indicates the 20% coverage level, which is used to define σ 20 .Noise varies appreciably between the pointings; however, the overall structure of the RMS noise coverage curves remains consistent, indicating that σ 20 is a good zeroth order measure of the noise scale.
pointing centre to not be affected by the primary beam and far enough from strong sources to be unaffected by artefacts.Due to the number and structure of MALS pointings, this cannot be reliably done in an automated fashion.Instead, we investigate the differences in noise level between individual pointings by using the RMS noise images to assess the RMS noise coverage, measuring the cumulative distribution of noise levels across an image.Figure 4 shows the RMS noise coverage curves for the individual fields, and it can be seen that RMS noise coverage curves are similar in structure but offset from each other.To quantify this offset and thus characterise the noise in the individual pointings, we define σ 20 at 20% RMS noise coverage, representing the noise in the central portion of the image (see Fig. 3).We will see that σ 20 excellently serves as a normalisation factor to account for the differences in noise levels between the A113, page 6 of 30 pointings, and can be used to unify the assessment of individual pointings and extend them to the full survey.
There are several factors which can contribute to the overall noise level in an image, not all of which are easily quantifiable.However, an important aspect to consider is the shape of the synthesised beam or PSF, determined by the UV-coverage of the observation, which in turn is determined by the array configuration, observing time, and elevation of the target at the time of observation.There are two aspects to the PSF that influence image noise.A measurement of RMS noise in Jy beam −1 will be influenced by the shape of the beam3 , and very bright sources can have persistent and bright sidelobes from the shape of the PSF that are difficult to clean completely and as a result push up the noise in an image.To quantify this last effect we calculate the demerit score detailed in Mauch et al. (2020) to estimate the contributions of bright sources to RMS noise in the image.We calculated the independent source contributions to the errors in the image using all sources that have an unattenuated flux density of more than 100 mJy.The demerit score, d, is then defined as where the first term represents the contribution of pointing error σ p scaling with distance from the pointing centre ρ, and the second term is the receiver gain error σ g .The contribution of each source comes in the form of their attenuated flux density S a .Appropriate values for the MeerKAT L band are detailed in Mauch et al. (2020), which we also use (θ pb = 67 ′ , σ p = 30 ′′ , σ g = 0.01).The demerit scores of all pointings are included in Table 2.A correlation is present between demerit score and σ 20 , and especially pointings with high σ 20 show increased demerit scores.Though pointings with lower σ 20 show more scatter in their demerit scores, this nonetheless shows demerit score as a first order estimate of pointing quality, which we can utilise as a predictive measure.

Source finding
With thousands of sources expected to be detected in every MALS pointing, we require an automated source finding algorithm to find and characterise these sources.A small number of these are suitable for radio images, and perform comparatively similar (Hale et al. 2019).Of these, PYBDSF has been used in several recent data releases of large-scale surveys, such as the LOFAR Two-Metre Sky Survey (LoTSS; Shimwell et al. 2019) and the RACS (Hale et al. 2021).PYBDSF stands out in its ability to model extended emission with its wavelet decomposition module, and provides easy ways to compile source catalogues and assess the quality of the fields.Besides generating catalogues, PYBDSF provides output maps related to the input image, such as the RMS noise images we used in Sect.2.6, and mean and residual images.Once RMS noise and mean maps are obtained PYBDSF allows these maps to be used as input to ensure source finding is performed with the exact same parameters.For MALS, we thus make use of PYBDSF, both for creating clean masks during self-calibration as detailed in Sect.2.2, and integrating PYBDSF into the workflow to automatically carry out source finding, cataloguing, cross-matching and combining catalogues, using PYTHON-based scripts developed by the authors 4 .In order to understand the impact of the individual pointings to a general catalogue, we evaluate the source finding procedure for each pointing.We investigate completeness (what fraction of sources do we detect) and purity (what fraction of sources is real) in source counts with respect to signal-to-noise ratio, flux density, and source size, as well as PYBDSF's capability to accurately recover flux densities.

Stokes I catalogues
In order to compile a source catalogue from PYBDSF, various steps that depend on the initial setup of PYBDSF are needed.PYBDSF identifies islands of emission that are brighter than the island threshold.Within these islands PYBDSF finds emission peaks above a corresponding pixel threshold, and for each peak found fits a 2D Gaussian to the peak and surrounding emission.Performing source finding on our MALS images, we impose an island threshold of 3σ, and a pixel threshold of 5σ.Individual Gaussian components are combined into sources in a way that can be specified by the user, and we elect to combine Gaussian components that occupy the same island into a single source.The RMS noise in the images is determined by a sliding box, and we decrease the size of the sliding RMS box near bright sources to avoid spurious detections of artefacts around these sources as much as possible.Furthermore, to improve fitting of extended sources in the field, we enable à trous wavelet decomposition (Holschneider et al. 1989).The PYBDSF settings can be summarised as follows: For the purposes of analysing and building the final catalogue, we required a number of output products from PYBDSF.The output from source finding includes both a catalogue of sources and of individual Gaussian components.Furthermore, a background RMS noise and background mean image are produced, as well as a residual image.
For a single pointing, we ran source finding using PYBDSF and modified the output source catalogues by adding to the existing columns the ID of the MALS pointing, a source name following IAU convention, and the distance of the source to the pointing centre.As PYBDSF does not calculate spectral indices unless an image has multiple channels, we measured the spectral index of the sources in our fields from the spectral index images as described in Sect.2.3, using the extent of the Gaussian (major axis, minor axis, position angle) of the source to define the region in the image.
Though PYBDSF is configured to avoid spurious detections as much as possible, it is unavoidable that some artefacts are falsely identified as sources.We identified artefacts around the ten brightest sources in each image by flagging sources within five times the major axis of the beam that have less than 10% of the peak flux density of the bright source.This is largely motivated by the shape of the PSF, which can have sidelobes with a strength of up to 10% of the maximum.Though this does not get rid of all false detections in the image (see Sect. 3.2.4), it flags the most prominent imaging artefacts.
To assess the quality of the Gaussian fitting by PYBDSF, we performed a visual inspection on select sources.We created cutouts from the images and performed visual inspection, which was implemented in a separate module based on PYTHON and CASA5 .PYBDSF assigns each source a flag indicating whether the source is fit by a single Gaussian ('S'), multiple Gaussian components ('M'), or Gaussian component(s) on an island with other sources ('C').Since all Gaussian components that occupy the same island are always combined into one source, the 'C' flag is not present in our catalogues.For the visual inspection, we considered all sources made up of multiple Gaussian components.As such, all sources that carry the 'M' flag -which make up 8% of the all sources found in the fields -are flagged for visual inspection.Through the visual inspection, we then assigned an additional flag indicating the nature of the source and how well it is described by the PYBDSF model: G: sources that are well described by the Gaussian model.I: complex sources that are not adequately described by the Gaussian components fit to them.The flux density of these sources is better described by the integrated flux of the island, and their position by the flux weighted mean position of the island.
P: sources fit with multiple Gaussian components where only one is required to adequately describe the source.Other Gaussian components are likely fit to noise fluctuations coinciding with the source.
A: artefacts that will be flagged as such in the catalogue.Figure 5 shows an example for each of the cutout classes, and how we identify the different possible cases.To aid in visual inspection, in the source cutout we plot the individual Gaussian components, the combined source Gaussian, and the island threshold.Therefore, in this step we use both the source catalogues and the Gaussian component catalogues.Along with the cutout classes, additional columns are added to the table that describe the sources.In the cutouts we measure the integrated flux density of the island, the spectral index of the island, the intensity weighted mean position of the source, and a flag indicating if these measures are valid.Additionally, the number of Gaussian components is recorded for each source, as the initial PYBDSF catalogue only indicates whether a source has been fit with multiple Gaussian components or not.
Figure 6 shows the classification of all sources in the ten pointings that have been fit with multiple Gaussian components, 1259 in total.We see that almost all 120 sources assigned with the 'P' class have two Gaussian components, and a relatively large percentage of sources with the 'I' class have more than two Gaussian components assigned.Around 185 (15%) of these sources were considered to be adequately described by their Gaussian components, while 946 (75%) are more complex and better described by their island attributes.Only 8 sources are flagged as obvious artefacts.

Evaluation of individual pointings
In order to determine the reliability of the source finding routine and to assess how detection of sources is affected by their properties, we measured the completeness, purity, and flux recovery A113, page 8 of 30 From the SKADS simulations we created mock catalogues with 5000 sources uniformly distributed in flux density that have a flux density above 10 µJy, which is equal to the limit of thermal noise (10 µJy beam −1 ) for an unresolved source.With this we allowed for the possibility of noise fluctuations to push sources above the detection threshold.We injected sources from the mock catalogue uniformly distributed into the residual images produced by PYBDSF, which are devoid of sources but share the noise characteristics of the original images.We then performed the source finding routine again on these images, using the same mean and RMS noise maps determined by PYBDSF from the original image.This ensures that source finding is performed in the exact same way as the original image.We considered a source recovered if it is detected within the FWHM of the major axis of the clean beam from the original position.In order to reach a more robust measure, this process was repeated 50 times for each pointing, separately for point sources and resolved sources.

Completeness
With the procedure described above, we can make a statistically robust assessment of the completeness in the pointings.The (source) completeness in this case simply gives the fraction of sources that is detected, most commonly measured as a function of flux density of the source.The completeness curves for the individual pointings, for both resolved and unresolved sources, can be appreciated in Fig. 7.Not only is there a large difference between resolved sources and unresolved sources, pointings individually have large differences between them as well.To investigate other aspects of the completeness, we look at the fields J0249-0759 and J1232-0224, which have the lowest and highest noise levels among the pointings, respectively (see Table 2), which should yield the most extreme cases and allow us to probe variation between the fields.
Unresolved sources.As the SKADS catalogues describe the intrinsic shapes of sources, we can assess completeness for point sources by only injecting sources with a major axis of zero.The sources are defined in the image as delta functions, and convolved with the clean beam of the individual image.As the total flux density of point sources is concentrated in one peak, they are much easier to detect relative to resolved sources.Point sources allow us to assess completeness without being affected by source morphology, and so we use them to determine the completeness with respect to distance from the pointing centre.As sensitivity decreases outwards from the centre, we expect completeness to decrease as well.Figure 8 shows source completeness as a function of both flux density and distance from the pointing centre for the pointings J0249-0759 and J1232-0224.It is clear that indeed the completeness decreases with increased distance from the pointing centre, but is also lower near the pointing centre.This is a direct result of the strong source at the centre of each pointing pushing up the noise in its immediate vicinity.In the case of J1232-0224, which has a very strong source at the pointing centre, there is significant impact on completeness in the central portion of the image.To investigate the relation between the completeness and noise floor as a function of distance to the pointing centre we use the RMS maps created by PYBDSF.By radially averaging the RMS noise of the pointing, we obtain RMS noise as a function of distance to the pointing centre.As we have set the detection threshold at 5σ, we plot the radially averaged 5σ detection curve in Fig. 8, showing that this curve almost perfectly follows the completeness 'transition zone' for both pointings.In this transition zone the completeness goes up steeply from zero to one, and the flux density at which this occurs is directly related to the noise floor.Resolved sources.We performed the same experiment for resolved sources, where we define a resolved source as a source that has major axis and minor axis larger than 0 in the SKADS catalogues 6 .Sources are randomly selected out of the catalogue, so the distribution of source shapes injected in the image represents the distribution of the SKADS sample.These sources are injected as Gaussians into the image, and as with point sources, convolved with the clean beam.Owing to their lower surface brightness, resolved sources are often less easily detectable compared to point sources with the same flux density.To check how the size of sources affects completeness, we define the area ratio Q A of a source as the ratio between the area of the source and the beam as defined by their Gaussian characteristics.These are the major and minor axes θ maj and θ min for the source and B maj and B min for the clean beam of the pointing, We show the completeness as a function of area ratio and flux density in Fig. 9 for the pointings J0249-0759 and J1232-0224.
As shown in Fig. 8, the completeness of unresolved sources is related to the noise floor.This same relation should be present for resolved sources, in addition to the relation between completeness and source size.In order to disentangle the two different contributions to completeness for resolved sources, we divided flux densities by the ratio of local noise to the lowest noise in the image.The result in Fig. 9 shows completeness for uniform noise, so that only the source size and flux density affect completeness.We see a power law decrease (linear in the log-log scale) in completeness as a function of area ratio in both pointings, indicating that this is a universal feature for our source detection.This can be easily understood by considering that for larger sources the total flux density is divided over a larger area, which decreases the peak flux density that is used to detect these sources.
6 A small subset of sources with minor axis of 0 and major axis larger than zero are not included in the simulations.

Flux recovery
Using the mock catalogues, we investigate the ability of PYBDSF to accurately recover flux densities.This can be checked by looking at the flux densities measured by PYBDSF relative to the input flux densities from the mock catalogues.This is an important quality to verify as deviations from the expected 1:1 relationship are obviously undesirable.In Fig. 10 we show the measured flux densities against input flux densities of the pointings J0249-0759 and J1232-0224.We see that on average sources have a flux density that matches with their input value.
There is however a portion of sources with lower input flux density that have a significantly higher measured flux density than their input.These sources have their flux densities boosted by noise fluctuations, which are present in various orders of strength in the images, from thermal noise to calibration artefacts.We expect these sources to land on positive as well as negative noise peaks, but only sources on positive peaks will be detected.This results in an Eddington bias (Eddington 1913) pushing up the distribution of flux densities.To make quantitative statements about this bias, we need to combine data from all the 10 pointings, which we do in Sect.4.3.

Limitations of simulations
The method we have used here for measuring completeness and flux recovery relies on injecting sources directly into the residual images and measuring their properties with PYBDSF.
The advantage of this is a direct probe into the machinery of PYBDSF, as this is the only 'black box' between the input sources and the measurement.However, these simulations ignore some effects that affect the flux densities and shapes of sources in radio data, such as calibration effects, clean bias, and averaging effects like time and bandwidth smearing (Bridle & Schwab 1999).Probing these requires injecting sources directly into the visibilities and reprocessing the image, something that is not efficient for a large survey such as MALS.Finally, sources are injected into the image convolved with the clean beam as opposed to the PSF.This is well motivated for brighter sources, A113, page 10 of 30  as these have been mostly cleaned during the imaging process.
For faint sources this is not the case, especially since the masks for cleaning are generated by PYBDSF and are thus subject to the same selection that we used for the final images.To make the simulations more realistic, all undetected sources should therefore be convolved with the PSF.It is not clear how this should affect source finding, but the general consequence of this is that below the detection threshold sources immediately become fainter as a consequence of being convolved with the PSF rather than the clean beam.The PSF also spreads the emission of these sources over a large area, which could affect RMS noise if source crowding is high enough.This would however only be the case if images would be close to or at the confusion limit, which is not the case for MeerKAT in the L band down to at least 0.25 µJy (Mauch et al. 2020).

Purity
The the purity, we invert the pixel values of the images and run PYBDSF on the inverted images, using the RMS and inverted mean maps determined by PYBDSF from the original image.This again ensures that source finding is performed in the same way as on the original image.Since all real sources have positive flux density the only sources detected in the inverted images will be false detections.These false detections broadly fall into three categories, which we differentiate as noise peaks, (calibration) artefacts, and ghost sources.Noise peaks are statistical outliers of noise and can therefore appear at any point in the image, and are symmetric around the mean, such that these sources detected in the inverted image correspond roughly to the false detections in the normal image.Artefacts are sidelobes found around strong sources in the image, making them more easily traceable.
As described in Sect.3.1, we consider a source to be an artefact if they are found within 5 times the major axis of the clean beam of the ten brightest sources in the image, and have less than 10% of the peak flux density of the bright source.As the brightest negative sidelobe of the PSF is in general twice as bright as the brightest positive sidelobe, we would expect more artefacts to be found in the negative image.This seems consistent with the data, as using this criterion for artefacts flags 44 of the 241 sources found in the inverted images, while flagging 22 sources in the pointing catalogues.Finally, there are ghost sources, which appear as negative sources too bright to be noise fluctuations, in some cases even strong enough that they have sidelobes that are detected as sources.These sources can be caused by calibration with an incomplete sky model, and only have faint positive counterparts (Grobler et al. 2014).Strong ghosts can add to the number of false detections with their sidelobes, but only a handful of such cases are seen in the images.
We plot the amount of false detections per pointing in Fig. 11, both in terms of absolute counts (coloured bars) and fraction (red line).The amount of false detections strongly depends on pointing, and we find two pointings that are most strongly affected: J0001-1540 and J1232-0224.The latter is affected by a strong central source, which leads to reduced number counts and an increased fraction of false detections, while the former would be considered a good pointing, both in terms of number counts and noise properties.There does appear however a cluster of relatively bright (10-100 mJy) sources present far from the pointing centre, which can contribute to noise.The presence of a number of strong sources far out in the field has also potentially affected self-calibration, as a high number of ghost sources are seen in the image.This result suggests that purity of any individual pointing is not always easily predictable, and should each be assessed separately.

Combined catalogue
The combined source catalogue of ten pointings contains 16 307 sources, and covers 35.7 square degrees of sky.In the previous section we mostly assess the quality of individual pointings.
Here we combine the catalogues of the individual pointings to increase statistical power, which allows us to investigate subtler systematic effects that affect all pointings.

Correcting residual primary beam effects
In Sect.2.4 we described primary beam corrections to both the flux densities and spectral indices in the images.Besides the main wideband primary beam corrections using holographic images, we also described corrections with simplified analytic forms.Before investigating the difference between these methods, we must make additional corrections to residual primary beam effects.In general, the simplified analytical corrections work well up to the FWHM of the primary beam, but farther out results begin to diverge.This is an effect that is seen in both the spectral indices as well as the flux densities of sources, mainly caused by using the primary beam correction based on the central frequency ν 0 = 1.27GHz for the entire bandwidth of 802.5 MHz.In order to take into account the contribution of the entire bandwidth, we recalculate the corrections by integrating over the bandwidth rather than assuming the frequency to be equal to ν 0 .The necessary corrections are computed for each source in the catalogue separately, depending on distance from the pointing centre and spectral shape.
In the images corrected by the simplified analytical function of Eq. ( 4), flux densities of sources appear higher the farther they are from the pointing centre.In order to properly correct for the effect in flux density, the spectral index of the source must be known or assumed.For reasons explored in Sect.5.2, we cannot trust all spectral indices to be accurate and perform this correction assuming α = −0.75 for all sources.We then calculate a correction factor for the flux densities as a function of distance from the pointing centre.The assumed primary beam model is as before (Eq.( 4)), and the correction is computed by integrating the primary beam over the frequency range of the band.Considering a source with some spectral index α, the flux density of the source is described by S (ν, α) ∝ ν α .Due to the effect of the primary beam, the flux density of the source has some attenuation factor a(ρ, α) applied to it.This factor is described by the primary beam: Since the flux densities have already undergone primary beam correction, we need to correct for the ratio between this term and the correction from Eq. ( 10), The effect of this correction should become visible when comparing flux densities to external catalogues.If the flux densities are properly corrected, the flux density ratio between catalogues should be constant as a function of distance to the pointing centre.
In contrast to the flux densities, both the analytical function from Eq. ( 6) and the wideband primary beam correction from Eq. ( 7) leave a residual effect in the spectral indices of sources farther from the pointing centre.Taking Eq. ( 6) to describe the spectral index induced by the primary beam variation, we correct these values taking the full bandwidth into account, recalculating the effect of the primary beam on spectral indices by integrating over the bandwidth, ∆ν: To correct the already measured spectral indices present in the catalogues we subtract the difference between the integrated and original primary beam correction term,

Cross-matching catalogues
To further investigate systematics that affect the pointings on a more general level, we continue the assessment from Sect.2.5, now using the sources of the entire field.We cross check our sources with their counterparts from NVSS and RACS, as all our pointings here are within the sky coverage of these two surveys.
Cross-matching was performed by checking whether source ellipses, defined by the 3σ extent of the Gaussians describing these sources, overlap between the catalogues.We required a minimum overlap in area of 80% to consider sources to be a match.Sources in one catalogue could be matched with any number of sources in the other, to account for different resolutions between the catalogues.Due to uncertainties in position and flux density near the NVSS detection threshold of 2.5 mJy, sources below 5 mJy in NVSS were not considered.We find that 997 sources are matched to NVSS, of which 845 are matched to a single source, and 2064 sources are matched to RACS, with 1949 matched to a single source.
There are a number of factors that can influence astrometric precision of an observation, such as errors in the reference frequency or timestamps.Some of these errors were present in earlier MeerKAT observations (e.g.Mauch et al. 2020;de Villiers & Cotton 2022).While the errors should no longer be an issue, it is important to cross check positions in the field with an external catalogue for potential astrometric errors.The astrometric offsets of sources to their NVSS counterparts can be seen in Fig. 12, where the offsets are shown only for single matched sources.Overall, the offsets are very small with a median offset of ∼0.3 ′′ , which is less than one-sixth of the image pixel size (2 ′′ ) and well within the uncertainty.The scatter in both directions is smaller than 3 ′′ , which is less than the semi-minor axis of the average clean beam of 3.25 ′′ , also shown on the figure.
In Sect.4.1 we corrected spectral indices and flux densities accounting for residual effects introduced by the frequency range covered by the band.Cross-matching sources with external catalogues is an important check of the correctness of their measured flux densities.Figure 13 shows the flux density ratio of 845 MALS and NVSS sources, and Fig. 14 shows the flux density ratio of 1949 MALS and RACS sources.Only sources that are matched to a single source are used, and flux densities  have been converted to the MALS rest frequency (1.27 GHz) assuming α = −0.75.In both figures the corrections for the residual primary beam effect have properly re-scaled the flux densities, as the median flux density ratio (blue line) stays largely consistent with distance from the pointing centre, but a residual effect is left towards the edges of the image.In Fig. 13 it stands out immediately that there is a systematic offset between the MALS and NVSS flux densities, as MALS flux densities are 18% higher on average.The overall flux density scale offset is S MALS /S NVSS = 1.18 ± 0.26.In contrast to NVSS, Fig. 14 shows that the flux densities between MALS and RACS agree extremely well up to ρ ∼ 0.5 degrees.The overall flux density scale offset is S MALS /S RACS = 1.06 ± 0.39, with the 6% overall offset originating mostly from the outer parts of the primary beam.Though the result from NVSS might indicate any number of problems that could cause the offset, the additional data from RACS rules out most of these assumptions.A likely source of uncertainty would be the assumption of spectral index; however, this would impact the RACS results far more significantly, with its rest frequency of 887 MHz.From the RACS flux density offset we can assume that the 6% offset stems mostly from the residual primary beam effect, but this can only explain part of the offset seen in NVSS.If this offset is persistent, it points to a systematic effect affecting either NVSS or both MALS and RACS.Due to the relatively low sensitivity of the surveys, only about 10% of MALS sources are matched to a counterpart, which makes the error bars on the flux density offset measurement rather large.As such, the measured offset is within the uncertainty, preventing us from making any definitive statement on the flux density offset.Combined with the measurement from Sect.2.5, the flux density scale of MALS does not currently significantly deviate from the expected value, but the offset seen here indicates that more data are needed.

General assessment of the complete catalogue
In Sect. 3 the individual pointings have been evaluated with respect to completeness, purity, and flux density recovery.Here these properties are assessed on the entire catalogue in order to understand the impact of these characteristics on the final data product.

Completeness
To statistically determine completeness of the data, we refactored the completeness to make it consistent between different pointings.In order to achieve this, instead of expressing completeness as a function of flux density in units of Jansky, we show flux density in units of σ 20 as defined in Sect.2.6.As we showed in Sect.3.2.1,completeness for point sources scales linearly with local noise, and thus should be 0.2 at 5σ 20 for all pointings.Though σ 20 has units of Jy beam −1 , for point sources integrated flux density and peak flux density are in principle equal, which means in this definition σ 20 has the same value in Jy. Figure 15 shows that in terms of σ 20 , pointings have very similar completeness curves, which allows us to combine the individual pointings and evaluate completeness for the whole survey, as indicated by the black combined completeness curve.
Combined completeness is also assessed as a function of separation from the pointing centre using only point sources, and as a function of major axis of the source using resolved sources.Both are shown in Fig. 16.Combining the completeness from all the pointings gives enough statistical power to paint a clear picture of how the completeness is dependent on these variables.A clear relation is shown between completeness and distance from the pointing centre.The major difference between individual pointings seems to be the influence of the central source on the completeness.These differences are however extremely well modelled by the radially averaged RMS noise (see Fig. 8), indicating that completeness is related to the local noise.The right plot in Fig. 16 shows that there is a power law decrease in completeness for larger sources.This was already suggested in Fig. 9, but with the combined catalogues we have enough number counts to fully cover the space.

Flux recovery
In evaluating the individual pointings in Sect.3.2, faint sources on average had higher measured flux densities compared to input flux densities.To fully assess this effect, we combine the flux recovery statistics from all pointings in Fig. 17.In the combined statistics the effect is clearer, with bins farther away from the flux density ratio of unity being occupied with on average 10 sources per bin.There is no visible dependence on flux density or distance from the flux density ratio of unity.Assuming a Poisson distribution of these bins with mean and variance λ = 10, we take all bins with fewer than 25 sources (5σ) to be part of this distribution.These bins combined contain 1.7% of all sources, indicating that this effect is rather small in terms of induced bias.Up to this point we have assumed that the flux density measured from the Gaussian fitting (the Total_flux column in the catalogues) best represents the flux density of the sources.In Fig. 17 we compare the flux recovery between the Gaussian flux density and the integrated flux density from the island that the source occupies, where the contour indicates the threshold of 100 sources per bin.We see that across the board Gaussian flux densities are skewed towards higher values, where island flux densities remain symmetric around the input flux density.This is an effect that can significantly affect our catalogues, especially considering the increased number counts at lower flux densities.Consequently, in the analysis in the rest of the paper, we assume that the flux densities of sources are more accurately represented by the island flux density.

Purity
Combining false detections from all ten pointings, 241 sources are detected in the inverted images, making up 1.5% of the combined catalogue.As described in Sect.3.2.4,our artefact identification criterion flags 44 of these, leaving 197 sources, or 1.2% of the combined catalogue.With the combined catalogue of false detections, we can investigate how purity is affected by other variables such as flux density and distance from the pointing centre.This will allow us to properly account for the purity of the catalogue, in order to not overestimate number counts.Given the variety of 'source types' seen in negative images and what counterparts we expect to see in the positive, the overall A113, page 14 of 30  amount of false detections should be lower than the amount of sources seen in the negative image.In this sense, the purity is more appropriately an upper limit rather than a direct measure of false detections.
Figure 18 shows the combined purity as a function of flux density and distance from the pointing centre.The left plot shows the purity as a function of flux density and shows that the fraction of false detections increases with higher flux density.This is largely a result of the overall number of sources decreasing at higher flux density, but does show that the flux density distribution of false detections is not the same as that of real sources.The lack of false detections at low flux densities shows that our 5σ detection threshold does not lead to a lot of spurious detections.It is noteworthy that more sources are flagged as artefacts at higher flux densities, indicating that artefacts around bright sources have higher flux densities on average.The right plot of Fig. 18 shows a strong dependence of purity on the separation from the pointing centre, similar to the completeness.False detections increase near the central source because of strong artefacts, and there is a steady increase towards the edges of the primary beam.We see that our artefact selection criterion correctly picked out most of the artefacts originating from sidelobes of the central source, which dramatically increases the purity in the central portion of the image.Although the number of false detections restricts the statistical power of these results, the relations already show clear trends for the purity as a function of flux density and distance from the pointing centre that can be used when assessing number counts.

Resolved and unresolved sources
For the analysis of completeness in Sects.3.2.1 and 4.3.1,we assumed that our catalogues are populated with both unresolved or resolved sources, and that these should be assessed separately.Figure 7 shows this distinction is warranted, as these sources types have very different completeness relations.If we want to apply this knowledge to real sources in the catalogue, we must have a reliable way of determining whether a source is resolved.We expect sources are resolved when their size exceeds the size of the synthesised beam of the image; however, we must take the uncertainties introduced by noise in the image and fitting errors into account.
We determine source size by measuring the ratio between integrated flux density S and peak flux density S peak of the source, which should be equal to one for an unresolved source.Figure 19 shows S /S peak as a function of S/N, for both resolved and unresolved sources in our combined catalogue.Here and in subsequent usage of S/N we define it as the ratio between the peak flux density of the source and the local RMS.Due to a combination of uncertainties, unresolved sources follow a lognormal distribution in S /S peak (Franzen et al. 2015), and thus a normal distribution in R = ln(S /S peak ) with mean 0 and standard We take both the uncertainties σ S and σ S peak as the sum in quadrature of their errors as determined by PYBDSF and a calibration error of 3%.The magnitude of the error in calibration is motivated by assessing Gaussian fits of bright unresolved sources and the flux density offset determined in Sect.2.5.Using these quantities, the compactness criterion is then The factor of 1.25σ R encloses 95% of sources below S /S peak = 1, so with the symmetry of the Gaussian distribution, 95% of all unresolved sources should be correctly identified with this criterion.As can be appreciated in Fig. 19, according to this metric, 50% of all sources in the combined catalogue are resolved.

Catalogue columns
In the final catalogue, the majority of the columns are preserved from the PYBDSF source catalogues.Additional columns are however added in subsequent steps where required.Our aim is to only add information, and not remove any.This means that, for example, sources can be flagged as artefacts, but will still be present in the catalogue.-Cutout_class -Classification assigned at visual inspection as described in Sect.3.1, indicating whether a source is well described by the Gaussian model ('G'), is better described by the island characteristics ('I'), is better described by a single Gaussian component ('P'), or an artefact ('A').

Source characteristics
When considering source counts in the radio regime, extra care must be taken in understanding the population of sources that is being probed.Depending on observing frequency and flux density, different source populations may appear in the sample.
Our reference for source counts are the SKADS simulations, as the simulated sample is built up by different source populations.
Given the theoretical noise limit of 10 µJy beam −1 , we can expect to detect sources down to 50 µJy.As shown in Wilman et al. (2008), the radio population is dominated by AGN above flux densities of 1 mJy, while below that star-forming galaxies start to make up a significant fraction of the source counts.There are several important distinctions between these source types that can influence source counts and a dipole measurement.Among them is source morphology, as multi-component sources can easily be mistaken for multiple separate sources, biasing number counts.Star-forming galaxies are primarily found at lower flux densities and can be morphologically described by a single component, such that we expect them to appear as faint point sources in our fields.Consequently, these sources can be easily counted as they are unambiguously unique sources and are thus statistically independent.At higher flux densities however, some sub-classes of AGN, such as Fanaroff-Riley type I (FRI, coredominated) and type II (FRII, lobe-dominated) sources (Fanaroff & Riley 1974), can boast extended structures that can complicate automated source finding methods.

Extended sources in PYBDSF
PYBDSF operates by fitting Gaussians to sources, which is an effective method for most radio sources, but breaks down in sources with more complex structure.PYBDSF offers multiple ways of improving the fit to extended sources, as specified in Sect.3, but this chiefly improves detection of extended and more diffuse sources.To ensure that complex sources are accurately fit by a combination of Gaussians, we employ a special recipe for these types of sources, which make up 8% of the all sources found in the fields.These sources are flagged in our workflow for visual inspection upon which the Gaussian fit is assessed as described in Sect.3.1.However, automated source finding algorithms will only recognise objects as a single source if they are closely connected, and will therefore fail on a subset of sources.This effect is strongest for FRII sources, as increased luminosity in the lobes makes them appear as separate radio sources.Source association is one of the outstanding problems in radio astronomy, and beyond the scope of this paper.Instead, we try to characterise this effect and how it might influence source counts.As the components of these sources are not statistically independent, they will naturally bias source counts.To get an estimate of how complex sources are fit by PYBDSF, we look at the number of components fit to sources as a function of flux density in Fig. 20.We see a steady increase in the number of Gaussian components at higher flux densities; however, at ≳100 mJy the increase in components flattens out.This may indicate that at these flux densities we are seeing all the emission from these sources, and this is the 'true' distribution of components.However, an alternative explanation is that at these flux densities some extended sources no longer have connecting emission and are not recognised as single sources anymore.In this case the number of components keep increasing, but individual components will split off and be detected as separate sources, effectively keeping the number of components per source the same.Another indication of this happening might be seen in the differential source counts in Fig. 24, where there is an excess in source counts above 100 mJy, relative to the expected values.
To get an alternative measure of this, we look at FRII galaxies in SKADS and the separation of their components.For an upper limit estimate on how many sources we expect to split up, we count components as separated when the distance between them exceeds 6.5 ′′ , which is the minor axis of the average clean beam.With this, 11% of sources in the range 10 mJy-1 Jy have two separated components.Furthermore, 6.3 and 24% of sources in the ranges 10-100 mJy and 100 mJy-10 Jy, respectively, have three separated components.Outside of these ranges the fractions are negligible.Doubling the distance required for separation mostly exchanges the amount of triple component sources for double component sources in the range 100 mJy-1 Jy.The amount of triple component sources does not change in the range 1-10 Jy, indicating that the brightest sources are also the largest and most likely to separate.If we define the excess fraction of sources detected as f n = ñ−n n , where ñ is the amount of sources detected counting separate components and n the actual amount of sources, f n = 0.4 at 10-100 mJy, f n = 0.9 at 100 mJy-1 Jy, and f n = 0.7 at 1-10 Jy.The values of f n are given for each of the flux density bins used to determine number counts in Table 3.The maximum value f n can take is 2, when a bin is entirely occupied by sources with three separate components.From this we can conclude that this effect is more important at higher flux densities, and is most significant at flux densities ≳100 mJy, which contains only a very tiny subset (102 sources, 0.6%) of the full catalogue.

Spectral indices
Due to different emission mechanisms and sources of emission, there can be differences in spectral index distribution between star-forming galaxies and AGN.Additionally, the Doppler shift observed as a consequence of the motion of the observer induces a change in observed flux density that depends on the spectral index of the source.Thus, the spectral indices of sources influences the magnitude of the radio dipole.In general, dipole studies assume a single value for spectral index based on the physics of synchrotron emission, α ≈ −0.75 near 1 GHz (e.g.Rubart & Schwarz 2013;Tiwari & Jain 2013;Siewert et al. 2021).Measuring the spectral index of sources generally requires either large bandwidth or measurements at different frequencies, which in turn requires high S/N to ensure that sources are detected at both ends of the frequency range.With the large bandwidth (802.5 MHz) of MALS we are able to create spectral index images, as described in Sect.2.3, and measure spectral indices of sources in the catalogue.
To compare the source types, we look at the spectral indices of all sources with respect to flux density.While we cannot completely separate these source types based on flux density or spectral index, these source populations are labelled accordingly in the SKADS simulated sample (Wilman et al. 2008).Figure 21 shows the distribution of spectral indices measured compared to the populations in the SKADS simulated sample.Here, the AGN are separated into FRI (orange) and FRII (red) sub-populations, as are star-forming galaxies separated into 'starburst' (light blue) and 'normal' (dark blue) galaxies.It is noteworthy that the spectral index distribution of AGN boasts A113, page 18 of 30 two peaks, corresponding to lobes at α ≈ −0.75, and cores at α ≈ −0.25.These peaks are respectively dominated by FRII and FRI galaxies, but there is cross-contamination present as we have plotted source components rather than combined source characteristics.
The right plot of Fig. 21 highlights the MALS sources with high S/N, which are more likely to be detected across the full band and thus have more reliable spectral index measurements.Though less tight, the peak at α ≈ −0.75 is present in the MALS sample.The distribution is broad enough to have mixed with the peak α ≈ −0.25, so this second peak is possibly lost in the data.To see if we can retrieve these two populations, we take all sources with S /N > 20 and assert that the spectral index of the sources above and below α = −0.5 represent FRI and FRII sources, respectively.With this definition, there are 626 FRI sources, of which 54% are resolved and 16% are fit with multiple Gaussian components.The brightest of these can be appreciated in At lower flux densities there appears to be a discrepancy between measured and theoretical spectral indices.We see in Fig. 21 that not only is there a wider distribution in the MALS data, spectral indices are steeper at lower flux densities compared to the SKADS sources.The median spectral index for sources with S > 1 mJy is α = −0.76,while for sources with S < 1 mJy it is α = −1.17.Because of the high sensitivity that is needed, spectral indices are not commonly measured at lower flux densities.Looking at deep field surveys however, we see that this result is inconsistent with the spectral indices found in the XMM-LSS/VIDEO deep field (Heywood et al. 2020), where it is found that at lower flux densities spectra flatten out.The S /N > 20 sources shown in the right plot of Fig. 21, though increasing in spread at lower flux densities, are not affected by the same bias.
We further investigate the bias seen in low S/N sources, and verify the corrections made in Sect.4.1.We compare our spectral indices to those generated by comparing flux densities in SPW2 (1.0 GHz) and SPW9 (1.38 GHz) of the same MALS data by Deka et al. (2022).Smolčić et al. (2017) find a discrepancy between the spectral indices generated by MTMFS deconvolution and those generated by comparing flux densities at different frequencies, so we make the same comparison in Fig. 22, showing the offset between the MTMFS and SPW derived spectral indices.The median offsets for both corrected (blue) and uncorrected (red) are shown, indicating that spectral indices have been properly corrected for residual primary beam effects.Though the offset trends negatively at lower flux densities, it is well within the uncertainties.Overall, the offset between spectral indices is −0.06 ± 0.92 (0.16 ± 0.94 without corrections) for all sources, and −0.01 ± 0.78 for S /N > 20 sources, agreeing well between the catalogues.There is no systematic effect seen of the corrected spectral indices with respect to the distance to the pointing centre, indicating no residual primary beam contribution.Deka et al. (2022) observe an overall flattening at low S/N compared to our overall steepening, creating a discrepancy that is clearly showing at lower flux densities.Overall spectral indices appear to be reliable down to mJy flux densities, or S/N of 20.Considering the flux densities of these sources, it is unlikely that many star-forming galaxies are included in the high S/N sample, precluding an analysis of these sources.

Number counts
Now it is left for us to assert that we have the necessary number counts for a dipole measurement.Extrapolating from the combined catalogue of the ten pointings, a catalogue of the first 391 pointings is expected to carry ∼650 000 sources, enough to produce a dipole estimate if most sources can be used.However, because our pointings are inhomogeneous, both in terms of internal structure as well as with respect to other pointings, we have to assess to which extent this affects number counts and whether A113, page 19 of 30  corrections can be made to homogenise the catalogues.The most common method of comparing number counts to other surveys or simulations is to compute differential source counts, which describes the number of sources dN within a given flux density bin S + dS per steradian on the sky.This is usually multiplied by S 5/2 , which would yield a flat curve in a static Euclidean Universe (Condon & Ransom 2016).
To verify that we would be able to retrieve the correct number counts, we repeated the experiment carried out in Sect.3.2, injecting and retrieving sources in the residual images.To simulate a realistic physical distribution of sources, we cut out an area equal to the size of the pointing from the SKADS simulated catalogue.We repeated this experiment for every pointing, each time choosing a random position in the SKADS sample as pointing centre.Figure 23 shows the differential number counts for all the stages of the experiment.The reference sample from the SKADS simulations (black) represents the full 10 × 10 degree area simulated in Wilman et al. (2008).Out of the full sample, we cut out ten pointings with the same sky area as the MALS pointings that are injected into the residual images (blue).We performed our source finding routine on these images and saw what number counts we could retrieve.Figure 23 shows that below a few millijansky flux densities, detected source counts begin to fall off (pink), indicating the limit of 100% completeness for the full catalogue.These number counts are normalised by the area coverage of the pointings; however, we can make a simple correction based on the fact that the area coverage is not constant between different flux bins due to varying RMS noise.The actual area covered in a certain flux bin S + dS can be obtained by taking the RMS noise coverage (as shown in Fig. 4) assuming a detection limit of 5σ.This basic correction (green squares) produces correct number counts down to 100-200 µJy, showing that we can account for completeness of the catalogue down to this flux density.Below this, we reach the absolute sensitivity limit of the pointings, as the RMS noise coverage is so low that it produces diverging results.Above 100 mJy, results are more scattered, mainly because of low number counts at these flux densities and the smaller sky coverage of our pointings (35.7 sq.deg.) compared to SKADS (100 sq.deg.).
Having shown that we can reproduce number counts for a large range of flux densities, we measure differential number counts for the combined MALS catalogue.Both corrected and uncorrected differential source counts are tabulated in Table 3 and shown in Fig. 24, where they are compared to the SKADS simulated sample.We apply a completeness correction (purple hexagons) using either unresolved or resolved completeness based on whether the source is classified as such using the criterion from Sect.4.4.Once again we also correct number counts using the RMS noise coverage (green squares).For comparison, number counts from the full SKADS simulated sample are also shown, both for the central frequency of 1.27 GHz and for the full frequency band, accounting for the fact that for most sources flux density is not equal across the band.The number counts derived from the MeerKAT DEEP2 image and NVSS by Matthews et al. (2021) are also shown (white diamonds).Error bars are computed taking into account Poisson uncertainties as well as source clustering following Heywood et al. (2013).The same data from Fig. 24 is tabulated in Table 3, showing the numerical values of flux bins, raw number counts, and uncorrected and corrected differential number counts.Though it is not taken into consideration in Fig. 24, the number of false detection per bin is included in Table 3.In all cases the number of false detections is smaller than the uncertainty on the number counts, from which we infer that the purity is a small factor compared to completeness of the catalogue.
As for the simulated sample, the corrections appear to hold down to 100-200 µJy, after which we reach the sensitivity limit and solutions diverge.Until solutions diverge, the completeness and RMS noise coverage corrections produce similar results, indicating that a simple sky coverage correction performs well given the ease with which it can be generated.Given the agreement between the completeness and RMS noise coverage corrections, we can safely say that the major contributor to completeness is the inhomogeneous sky coverage of the pointings.This results in the dependence of completeness on the local noise seen for point sources in Fig. 8.With these corrections applied, Fig. 24 shows a wide range of flux densities where the differential number counts deviate from the expected values.At high flux densities (>100 mJy), we see an increase in sources that can A113, page 20 of 30  3.
for some part be attributed to the central sources in the images, as well as single sources being classified as multiple sources, as described in Sect.5.1.However, we see that number counts, except for the range 20-200 mJy, are higher across the board than what we might expect given theoretical predictions.There is some evidence that the SKADS simulations underestimated the number of star-forming galaxies, causing lower counts at low flux densities compared to what is seen in nature (e.g.Hale et al. 2023).Though this can explain an offset at the lowest flux densities, this effect would only be significant up to millijansky flux densities, whereas our number counts are higher up to an order of magnitude above that.An alternative explanation that, with its selection of high flux density sources as pointing targets, MALS is probing overdensities, which naturally boosts the number of sources in the pointings.Lastly, such an effect can also be produced by a systematic overestimation of flux densities, which is an option that cannot be ruled out at this stage.We expect that this offset might also be caused by low number statistics, and may disappear once more data are added.

Towards the cosmic radio dipole
With a thorough assessment of the quality of the pointings described in this work we have the opportunity to extrapolate our findings to the larger survey of 391 pointings, both in estimating how many pointings will be needed for a dipole estimate, as well as how to effectively homogenise the catalogues.Given these results, there are however some questions and limitations that remain, and these will have to be addressed in later works.We first estimate the statistical power we can reach with the observed MALS pointings.As we determined in Sect.2.6, the demerit score is a strong indicator for pointings with high noise and thus low number counts.Consequently, the demerit score allows us to make predictions about the quality of other pointings in the survey.Of the ten pointings we have investigated, we consider seven of them to be of good quality based on their noise and source count values.Since all MALS pointings are in the footprint of either NVSS or SUMSS, we use these surveys to match sources and calculate the demerit score to predict quality of the images.Based on this principle Fig. 25 shows the distribution of the demerit scores of the first 391 observed MALS pointings.Defining a quality threshold of d < 15 mJy based on the ten pointings we have investigated here, we see that 322 of 391 pointings are below this threshold.The seven good quality pointings average ∼2000 sources per pointing, meaning that 100 such pointings will result in approximately 2 × 10 5 sources.If we choose our pointings to properly cover the sky along the axis of the dipole, this is the minimum number of sources required for a 3σ measurement of the cosmic radio dipole, assuming an amplitude equal to that of the CMB dipole.
Based on the demerit scores of the ten pointings and those of the first 391 pointings, we also see that our ten pointings well represent the average pointing, and we can use the differential number counts to extrapolate them to the rest of the survey.This allows us to take into account the completeness of the survey so far.As shown in Figs.23 and 24, we get correct source counts down to 100-200 µJy.Naturally, this means that sources below that flux density cannot be included in a dipole estimate, which leaves 13 663 sources in the combined catalogue.However, the corrections are essentially compensating for the missing sources, yielding effectively ∼28 000 sources down to 200 µJy, meaning even fewer than 100 pointings would again suffice for a 3σ dipole measurement.
In both cases we may expect that the first 391 observed pointings will yield around a million sources, which should yield a dipole measurement at a significance level of 6.5σ, assuming an adequate coverage of the dipole axis and a dipole amplitude equal to that of the CMB (Ellis & Baldwin 1984).Though it is clear that MALS will deliver the number counts needed for a significant measurement of the cosmic radio dipole, the larger challenge is making a measurement while accounting for the systematics present in the survey.In order to thus successfully measure the dipole with MALS, the way forwards is to build on the corrections introduced in Sect.5.3.

Compiling a homogeneous catalogue
We have effectively shown that we can make a unified description of the properties of the average MALS pointing, which should now allow us to homogenise the catalogue.Though the structure of the survey, with deep coverage over distinct patches of sky, appears to not lend itself especially well to large-scale cosmology, the fact that these pointings are all equal area by design allows for straightforward discretisation.In the simplest use case, each pointing of MALS can thus be treated as a single unit simply containing N number of sources.When measuring number counts over a full sky, inhomogeneities between the pointings induce higher order multipoles in the data that will spill over into a dipole measurement.To homogenise the data and get an unbiased estimate of the dipole, we must account for the individual differences between the pointings and calculate the corrected effective number counts N eff .
A113, page 22 of 30 To get to effective number counts, the starting point is the corrections to number counts as shown in Fig. 24 that are seen to largely compensate for incomplete catalogues.We can extend this treatment by assigning a 'completeness factor' to each individual source.As discussed in Sect.5.3, the largest contributor is the incomplete sky coverage, which is well modelled with the RMS noise coverage of the pointings.Immediately we can disentangle the completeness into a detection probability, P(det), and sky coverage, Ω, Here the detection probability has a power law dependence on source size Q A (linear in log-log space, right plot of Fig. 16), and assuming Gaussian errors on the flux density, detection probability should follow a Gaussian cumulative distribution function or similar sigmoid function7 as a function of flux density.Sky coverage is exclusively determined by the local RMS σ of the source, encoding the noise structure of the pointing.This completeness factor can largely correct for the imhogeneities present in the catalogues, but we can make it even more robust by including information on other investigated quantities.Using information on purity, we can define a 'purity factor', Purity(S , ρ, pointing) = Purity(S , ρ)Purity pointing , which indicates how likely a source is to be a true positive.This depends on distance from the pointing centre ρ, flux density S (Fig. 11) and has a multiplicative factor that indicates a purity level that is different per pointing (Fig. 18).Finally, individual sources have associated uncertainties that can be used to weigh each source accordingly.An obvious choice is a weight based flux density, as the uncertainty in flux density σ S is dependent on flux density S (Fig. 17), which can be combined with a potential flux density scale error ∆S .We are not limited to one uncertainty factor, and a second choice that is relevant for a dipole measurement is the uncertainty in position σ ϕ , which in absence of any systematics (as we see in Fig. 12) is simply equal to the measured uncertainty ∆ϕ.Combining all these measures, we can assign a weight to sources based on the quantities laid out, = w ϕ w S Purity(S , ρ)Purity pointing The effective weight factor, w eff , fulfils a dual purpose in estimating the number counts.The completeness and purity factors correct the number counts, while the weights from the flux density and position errors then serve as a quality measure for each source, allowing us to measure the effective number density of the individual pointings,

Limitations and future prospects
With this prescription, the systematic effects that we have characterised can be accounted for when computing the number counts and estimating the dipole.There remain however some effects that have not been explicitly characterised that could influence a dipole measurement.By checking the corrections to number counts as we did in Sect.5.3 on simulated data, we essentially calibrated the corrections on the SKADS sample, which as a simulation might not perfectly represent the number counts found in nature (this can be plainly seen in Fig. 24, where SKADS number counts do not always agree with the counts from Matthews et al. 2021).While this can introduce an unknown error into the process, the error is expected to be in overall number counts, and therefore not directionally dependent.Similarly, an important aspect of MALS is the selection of the pointings, as every pointing has a bright radio source at the centre.Although pointings are distributed isotropically, this is not equivalent to a random selection as bright central sources are more likely to be embedded in overdensities.This effect seems to be very pronounced in our measured source counts already, which are larger than expected.Again, this effect is expected to be directionally independent, but whether this is truly the case remains to be determined.Finally, the depth of MALS might be to its detriment when measuring a dipole, as the reached depth of 200 µJy probes into the population of starburst and normal galaxies.It is worth noting that an increase number counts as seen in Fig. 24 can also be (partially) caused by a systematic flux density offset.The results seen in Fig. 13 hint to a systematic flux density offset with respect to NVSS, which is worth investigating with the larger MALS catalogue.If this offset turns out to be indeed significant, there are many possibly explanations given that such a systematic effect could potentially be introduced at many points in the data processing pipeline.We have already shown in Sect.4.3.2 that the island flux density from PYBDSF properly recovers the flux density from simulated sources, which verifies that the source finding step is not inducing a systematic flux density offset.To further narrow down the options, we crosscheck our results with those from Deka et al. (2022), which use the same data, calibration pipeline and source finding strategy but show an overall agreement with NVSS.The most notable difference between these catalogues is the fact that we utilise the full band while Deka et al. (2022) only use individual SPWs with 50 MHz bandwidth.Such a difference between results from the full bandwidth and individual SPWs could point to systematic effects introduced in the imaging stage, as we model the emission with two Taylor terms in frequency as opposed to a single Taylor term in case of individual SPW images.Though the ten pointings explored here have already provided a wealth of statistics and insight into systematic effects, these will be further investigated with the full suit of MALS pointings.
With regards to a dipole measurement, there are still some questions that remain to be answered.Throughout this work we have used the estimate by Ellis & Baldwin (1984) of 2 × 10 5 sources properly distributed along the axis of the dipole.Though the MALS pointings properly cover the axis of the CMB dipole, if the direction of the radio dipole deviates from this, for example towards the northern hemisphere, the coverage of MALS pointings might not be adequate.Furthermore, the exact amount of sources needed for a dipole estimate can vary depending on this coverage and has been differently estimated in different works.Crawford (2009) claims 2 × 10 6 sources are necessary for a 3σ dipole estimate, which is an order of magnitude more than the number from Ellis & Baldwin (1984).Dipole studies using NVSS have generally reached 3σ significance with 3 × 10 5 sources (e.g.Singal 2011;Rubart & Schwarz 2013;Secrest et al. 2022).This is in closer agreement to the Ellis & Baldwin (1984) numbers; however, this significance is only reached because of the anomalously high dipole amplitude.Therefore, the significance with which the dipole can be measured ultimately depends on many factors, including sky coverage, number counts, dipole amplitude, frequency, and the employed estimator.Consequently, another important step towards measuring the cosmic radio dipole with MALS is defining an appropriate dipole estimator (see e.g.Siewert et al. 2021), which beyond the scope of this paper but will be explored in a future work.

Summary and conclusion
In this work we have presented a thorough analysis of the first ten deep continuum pointings of MALS (Gupta et al. 2016) and have compiled a catalogue with 16 307 sources covering 35.7 square degrees of deep radio sky.We set out to extensively analyse the properties of the first ten pointings of MALS, with the ultimate goal of measuring the cosmic radio dipole.To achieve a measurement of number counts unbiased by the inhomogeneities present between the MALS pointings, we characterised systematic effects that can influence such a measurement.This assessment of systematic effects in the ten pointings as presented in this work shows that these effects are for the most part predictable and can be properly accounted for.This will eventually not only benefit a dipole measurement, but all continuum science carried out with MALS.In the current literature on the cosmic dipole, there are many examples of systematic effects that limit the sensitivity of these estimates and could not be pushed further due to a lack of information on the inner workings of the surveys that were used.For MALS, we have a complete assessment of the inner workings of the survey, with insight and access into the processing pipeline.Looking forwards, we determine that 100 MALS pointings suffice for a dipole measurement.Paired with the analysis on source characteristics and counts in the pointings, we are poised to perform the most complete dipole estimate of a radio survey thus far.
Calibration and imaging of all the data was carried out through ARTIP.After imaging, we separately created spectral index images and performed primary beam correction averaged over the frequency range on these and the continuum images with a primary beam model derived from holographic measurements.We made an initial assessment of the calibration by checking the flux density scale of the calibrators and central sources, and find that flux densities are consistent with those reported in the literature.We investigated the quality of the images by looking at the RMS noise maps created by PYBDSF.Measuring RMS noise coverage shows that all pointings have similar noise structure, but overall noise levels are offset between the pointings.We quantified this offset with σ 20 , which gives the noise level at 20% RMS noise coverage for a pointing.To try to explain the difference in pointing quality, we calculated demerit scores for each pointing to estimate the contributions of bright sources to the noise.Though there is a correlation between the noise in a pointing and the demerit score, other factors play a part in introducing a scatter in this relation, especially for pointings with lower demerit scores.As σ 20 directly describes the overall noise in the image, it is the quality measure of choice for the pointings.
We performed source extraction on all the images using PYBDSF and converted PYBDSF catalogues to full Stokes I catalogues, extending them to include spectral indices and flagging artefacts.We considered sources fit with multiple Gaussian components to be potentially complex and visually inspected them to investigate how well the Gaussian fit describes these sources.We further assessed the quality of the individual pointings and how it affects source finding by measuring completeness, flux recovery, and purity.For completeness and flux recovery, we need to know the intrinsic properties of the sources in the images, so we created mock catalogues of sources from the SKADS simulated sample and injected these into the residual images of the pointings.We assessed completeness for unresolved and resolved sources separately.Using unresolved sources, we assessed completeness as a function of distance from the pointing centre, and for resolved sources we investigated completeness as a function of source size.
Combining the catalogues of the individual pointings, we corrected for residual primary beam effects in the flux densities and spectral indices sources originating from the frequency dependence of the primary beam.To check these corrections and other potential systematic effects, we cross-matched the catalogues with NVSS to check if positions and flux densities were consistent.There is no appreciable astrometric offset; there is an 18% offset in flux density compared to NVSS, but this is still A113, page 24 of 30 within uncertainties.Using σ 20 as a normalisation factor, we combined completeness measures from the individual pointings and find unified completeness relations that hold for all pointings.Combining flux recovery statistics from all pointings, we find that a systematic bias is present in the integrated flux densities from the fitted Gaussians in the catalogue.This bias is not present in the integrated flux densities of the islands that the sources occupy, making this quantity the logical choice of flux density for the analysis presented here.Combining purity from all pointings, we find that we can account for 20% of false detections with a suitable artefact identification scheme.The remaining false detections make up an increasing fraction of sources farther out from the pointing centre.
As the full catalogues are expected to be populated by various sources types, we assessed how this can influence source counts.Looking at the number of Gaussian components fit to sources, we see that the number of components needed to describe sources increases as a function of flux density until it stagnates at around 100 mJy.This implies that, around this flux, density sources separate and can be counted multiple times; however, as only a tiny percentage of sources are present at these flux densities, it is unlikely to bias the source counts.To further differentiate between source populations, we looked at the spectral indices of sources and find that we can reasonably separate core-dominated FRI from lobe-dominated FRII sources by making a cut at α = −0.5.At low flux densities we find that spectral indices are much steeper than expected, which is likely caused by low S/N and inadequate modelling by the MTMFS deconvolution scheme.
Finally, to show that we can account for the systematic effects present in the catalogues, we calculated and corrected differential number counts for a set of simulated catalogues using the SKADS simulated sample.We find that a simple correction using only the RMS noise coverage produces correct number counts down to 100-200 µJy.We then computed differential number counts for the full catalogue and corrected number counts using the combined completeness measures found for unresolved and resolved sources, as well as the RMS noise coverage.Once again, corrections seem to hold down to 100-200 µJy.Comparing these number counts to the expected number counts from the SKADS sample, we see that our number counts are higher in the full range of probed flux densities.This is independent of corrections, so a likely explanation is that MALS is probing overdense regions, as it targets bright sources.The same effect can, however, be (partially) produced by a systematic flux density offset, which should be taken into consideration given the results on the flux density scale.
Using both the demerit score to predict the quality of other MALS pointings and the corrected number counts of this sample, we show that we will require 100 MALS pointings to reach the necessary number counts for a 3σ measurement of the dipole.Going further, we assert that we can assess the dipole on the level of individual sources using the information on flux density scale, completeness, purity, flux errors, and position errors.The precise implementation is left to later works, along with an exploration of viable dipole estimators.

Fig. 1 .
Fig. 1.Sky distribution of the first 391 observed pointings of MALS.The Galactic plane is largely avoided, and since 89% of the pointings are selected directly from NVSS, the vast majority of pointings are above a declination of −40 degrees.The pointings used in this analysis are highlighted in red.

Fig. 2 .
Fig. 2. Flux offsets of the central sources and gain calibrators per field.Gain calibrators are compared to their reference flux density (circles) as specified in Table 1 and with their NVSS counterparts (diamonds).Central sources are compared to their NVSS counterparts (squares), and NVSS flux densities are all converted to rest frequency assuming α = −0.75.Sources are ordered by observing run, separated by vertical black lines.

4
Fig. 5. Examples of possible source classes.The black outline shows the island threshold, the magenta ellipses show the individual Gaussian components fit to to the source, and the blue ellipses show the combined Gaussian describing the source.From left to right: (i) an elongated source fit by two Gaussian components.The combined Gaussian describes the source adequately, and it has been assigned the 'G' class.(ii) A likely FRI source with complex structure, better described by the island than the Gaussian components.It has been assigned the 'I' class.(iii) A point source with an additional noise peak that has been fitted with a Gaussian component.It has been assigned the 'P' class.(iv) An artefact caused by a nearby bright source.It has been assigned the 'A' class.

Fig. 6 .
Fig.6.Assigned flags to sources with multiple Gaussians, separated by the number of Gaussian components fit to them by PYBDSF.Keeping in mind that most of these sources are fit with two Gaussian components, the 'I' class is preferred for sources with three or more components, while the 'P' class consists almost exclusively of sources with two components.

Fig. 7 .
Fig. 7. Completeness for unresolved sources (left) and resolved sources (right) for the different fields and their associated uncertainties as a function of flux density.There are large differences between the pointings: pointings with higher noise levels have lower overall completeness.The completeness is lower for resolved sources as well.
Fig. 8. Completeness for unresolved sources as a function of flux density and distance from the pointing centre, ρ, for the fields J0249-0759 (left) and J1232-0224 (right).The radially averaged 5σ curves (black lines) for the corresponding pointings are seen to follow the zone where completeness transitions from zero to one.Due to the presence of a strong central source in J1232-0224, completeness is lower in the central region of this pointing.Pixels with no sources in them have been coloured grey.

Fig. 9 .
Fig. 9. Completeness for resolved sources as a function of flux density and ratio between the area of the source and the beam Q A , for the fields J0249-0759 (left) and J1232-0224 (right).Completeness can be seen to linearly decrease in the log-log scale as a function of Q A , showing that larger sources are harder to detect.Flux densities are compensated for the local noise in order to equalise completeness for different positions in the image.Pixels with no sources in them have been coloured grey.
Fig. 11.Purity of catalogues for different pointings.The fraction of false detections is indicated by the red line.The dashed red line indicates the fraction of false detections without flagging artefacts.The open histograms show the number of sources detected in the pointings, with the filled histograms indicating the number of false detections.

Fig. 12 .
Fig. 12. Astrometric offsets to NVSS for all ten pointings combined.The median offsets are given by the grey dashed lines, with the grey area indicating the uncertainty.The majority of sources lie within a FWHM of the average minor axis of the clean beam.The data are binned where five or more sources occupy the defined bin area; otherwise, individual sources are shown.

Fig. 13 .
Fig. 13.Ratio of flux densities of the sources in MALS compared to their NVSS counterparts as a function of distance from the pointing centre (ρ).The running median flux density ratio of the analytical primary beam correction both with (blue line) and without (red line) the corrections made in Sect.4.1 are shown, as well as the running median flux density ratio of the holographic wideband primary beam correction (green line).

Fig. 14 .
Fig. 14.Ratio of flux densities of the sources in MALS compared to their RACS counterparts as a function of distance from the pointing centre (ρ).The running median flux density ratio of the analytical primary beam correction both with (blue line) and without (red line) the corrections made in Sect.4.1 are shown, as well as the running median flux density ratio of the holographic wideband primary beam correction (green line).

Fig. 15 .
Fig. 15.Completeness as a function of flux density for unresolved (left) and resolved (right) sources for the different fields, re-factored with σ 20 and combined (black curves).Re-factoring the completeness curves to σ 20 shows clearly that they are simply shifted with respect to each other, and we can define a unified completeness measure for the survey as a function of σ 20 for both resolved and unresolved sources.

Fig. 16 .
Fig. 16.Combined source completeness as a function of distance from the pointing centre (unresolved sources, left) and major axis of the source (resolved sources, right).The left plot reflects the overall structure of the pointings, and shows that completeness is quite straightforwardly a radially averaged version of the noise structure as shown in Fig. 3.Note that the flux density is normalised by σ 20 .The right plot indicates a clear power law relation between the size of sources and the completeness, where larger sources are on average less complete.

Fig. 17 .
Fig. 17.Input flux density plotted against the measured flux density for both Gaussian flux densities (left) and Island flux densities (right).The threshold of 100 sources per bin (black contour) shows quite clearly the bias present in Gaussian flux densities compared to island flux densities.

Fig. 18 .
Fig. 18.Purity of catalogues as a function flux density (left) and separation from the pointing centre (right).The fraction of false detections is indicated by the red line, both the total fraction (dotted line) and with removal of sidelobes (solid line).The open histograms show the number of sources detected in the pointings, with the filled histograms indicating the number of false detections.Though there seems to be no strong relation between flux density and purity, the number of false detections is strongly dependent on distance from the pointing centre, increasing both towards the centre and towards the edges of the pointing.Our criterion for identifying artefacts flags most of the false detections around the central source.

Fig. 19 .
Fig. 19.Ratio of total to peak flux as a function of signal to noise of both unresolved (blue) and resolved (red) sources in the combined catalogue.
When performing additional corrections on source flux densities and spectral indices, the correction factors are inserted into the catalogues so that the original values can be easily reproduced.The catalogue has 49 columns in total.Several lines of the catalogue are shown in Table B.1 as an example.-Pointing_id -The ID of the pointing where the source has been found formatted as PT-JHHMM±HHMM.-Source_name -Name of the source, following IAU convention, formatted as JHHMMSS.S±HHMMSS.S with prefix MALS.-Source_id -Source ID as assigned by PYBDSF.-Isl_id -Island ID as assigned by PYBDSF.-RA and DEC (and errors) -The J2000 position of the source, defined as the centre of the composite Gaussian of the source, and associated errors.-Sep_PC -Distance of the source from the pointing centre.-Total_flux (and error)-Total flux density of the source and associated error.-Flux_correction -The correction factor for residual primary beam effects on the flux density of the source.-Peak_flux (and error) -Measured peak flux of the source and associated error.-Spectral_index (and error) -Spectral index of the source, measured from the spectral index image, and associated error.-Spectral_index_correction -The correction factor for residual primary beam effects on the spectral index of the source.-RA_max and DEC_max (and errors) -Position of maximum intensity of the source and associated errors.-Maj, Min, and PA (and errors) -FWHM of the major axis, minor axis and position angle of the source fit by PYBDSF and associated errors.-DC_Maj, DC_Min, DC_PA (and errors) -FWHM of deconvolved major axis, minor axis, and position angle, and associated errors.-Isl_Total_flux (and error) -Total integrated flux of the island in which the source is located, and associated error.-Isl_rms -Average background RMS noise of the island in which the source is located.-Isl_mean -Average background mean value of the island in which the source is located.-Resid_Isl_rms -Average residual background RMS noise of the island in which the source is located.-Resid_Isl_mean -Average residual background mean value of the island in which the source is located.-S_Code -Value generated by PYBDSF indicating whether a source is: fit by a single Gaussian ('S'), fit by multiple Gaussians ('M'), or one of multiple sources on the same island ('C').-N_Gaus -Number of Gaussian components fit to the source.-Resolved -Boolean indicating whether the source is resolved according to the metric defined in Sect.4.4.-Flag_Artifact -Boolean indicating whether the source is a likely artefact according to the criterion described in Sect.3.1.-RA_mean and DEC_mean -Mean intensity weighted position of all pixels of the island in which the source is located, measured if a source is fit with multiple Gaussians.-Cutout_Spectral_index -Intensity weighted average spectral index of all pixels of the island in which the source is located, measured if a source is fit with multiple Gaussian components.-Cutout_Total_flux -Total flux density of all pixels above the island threshold, measured if source fit with multiple Gaussian components.-Cutout_flag -Flag assigned to cutout in certain conditions: the mean position falls outside the island ('M'), the position of the brightest pixel does not correspond to the maximum position measured by PYBDSF ('C'), the difference between Cutout_total_flux and Isl_Total_flux is more than 20% ('F').

Fig. 20 .
Fig. 20.Fractions of sources fit with varying numbers of Gaussians as a function of flux density.The number of components fit to sources increases steadily towards high flux density, but flattens out around 100 mJy, which can be caused by sources splitting up at these flux densities.

Fig. 21 .
Fig. 21.Distribution of spectral indices of MALS sources.Left: MALS spectral indices (black) compared to AGN (red) and star-forming galaxies (blue) from SKADS as a function of flux density.Right: MALS spectral indices as a function of flux density, sources with S/N above 20 are coloured by S/N.The median value and error of spectral indices of different flux bins are indicated by red error bars, indicating that at lower flux densities spectral indices tend towards lower values.
Fig. A.1, showing mostly point sources or sources where core emission dominates.FRII sources are more numerous, with 2581 present in the catalogue, of which 75% resolved and 30% fit with multiple Gaussian components.A set of the brightest FRII are also shown in Fig. A.2, showing more sources with two or more components representing radio lobes.These results show that the expected dichotomy in morphology between these sources is indeed present, with FRII sources being more likely identified as extended and/or resolved.

Fig. 22 .
Fig. 22. Offsets of spectral indices measured from the wideband MTMFS images with respect to spectral indices derived from processing the full bandwidth using 15 individual SPW images from Deka et al. (2022) as a function of flux density.The spectral indices are calculated by using the SPW2 and SPW9 images.Binned median offsets are shown (blue), along with the offsets without the correction applied in Sect.4.1 (red), showing that the spectral indices are properly corrected.

Fig. 23 .
Fig. 23.Differential source counts from the SKADS simulations.The complete SKADS sample is shown (black), as well as the sample extracted from SKADS and injected into the images (blue).The uncorrected number counts (beige) indicate the sources detected by the source finding routine, which are then corrected with the RMS noise coverage (green).

Fig. 24 .
Fig.24.Differential source counts from MALS, uncorrected (beige circles) and corrected with completeness (purple hexagons) taking into account whether a source is resolved or unresolved.Lastly, source counts are also corrected with RMS noise coverage (green squares), which for the lowest flux bins goes to zero, causing solutions to diverge.This is all compared to the SKADS source counts, both for the central frequency of 1270 MHz (black), and for the full frequency range (grey area), and to the source counts derived from the MeerKAT DEEP2 image and NVSS fromMatthews  et al. (2021, white diamonds).The number counts are tabulated in Table3.

Fig. 25 .
Fig. 25.Demerit scores calculated for the 391 currently observed MALS pointings, using bright (>100 mJy) sources retrieved from their corresponding surveys (NVSS and SUMSS).The dotted line indicates a quality threshold of d < 15 mJy based on the quality of the pointings inspected in this paper.

Table 2 .
Details on all ten pointings after complete processing and source finding.
The final data products consist of the restored, model, residual, sum-of-weights, and point spread function (PSF) images for both Taylor terms.Furthermore, spectral index, spectral index error, and mask images are also produced.The continuum images have a pixel size of 2 ′′ and a size of 6000 × 6000 pixels.This results in a square image of 3.3 degrees on a side.Though individual pointings have different beams, as detailed in Table Table 1 and with their NVSS counterparts (diamonds).Central sources are compared to their NVSS counterparts (squares), and NVSS flux densities are all converted to rest frequency assuming α = −0.75.Sources are ordered by observing run, separated by vertical black lines.

Table 3 .
Differential counts of MALS, including corrected counts using RMS coverage, completeness of unresolved sources, and completeness of resolved sources.Notes.Counts are normalised for the sky area of 35.7 degrees.Raw counts (N) and number of false detections in (N false ) per bin are also given.The excess fraction of sources due to separated components f n is also given, based on a separation distance of 6.5 ′′ of FRII sources in SKADS.