Open Access
Issue
A&A
Volume 698, June 2025
Article Number A148
Number of page(s) 19
Section Cosmology (including clusters of galaxies)
DOI https://doi.org/10.1051/0004-6361/202452734
Published online 11 June 2025

© The Authors 2025

Licence Creative CommonsOpen Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1. Introduction

The observation of cosmic large-scale structures, for example the angular and spatial distribution of galaxies, allows us to relate their statistical properties to models of their formation. The counts-in-cells statistics – the most primitive statistics one can think of – holds valuable information about galaxy clustering (Neyman et al. 1953; Peebles 1980; Saslaw 2000; Bernardeau et al. 2002). Despite its significance, it received less attention compared to the more widely studied two-point correlation function (Peebles 1980; Wang et al. 2013), which for Gaussian random fields contains all non-trivial information (Peebles 1980; Saslaw 2000). Different distribution functions were proposed to model the counts in cells distribution of galaxies (Neyman et al. 1953; Sheth & Saslaw 1994; Yang & Saslaw 2011; Hurtado-Gil et al. 2017) and the distribution of matter (Klypin et al. 2018; Wen et al. 2020). For this work, we measured the counts-in-cells statistics of radio continuum sources to investigate their statistical distribution and angular clustering, thereby providing insights into the large-scale properties of the cosmic web.

Extra-galactic radio sources, which are unaffected by dust extinction and are detectable across large cosmological distances, can provide an unbiased sample for probing larger volumes compared to those probed by optical surveys (de Zotti et al. 2010). Above a frequency-dependent flux-density threshold of about 2 mJy at 150 MHz, the population of radio sources is dominated by active galactic nuclei (AGNs, Huynh et al. 2008; Rawlings et al. 2015: Smolčić et al. 2017; Algera et al. 2020; Best et al. 2023). With respect to the radio continuum, AGNs fall into two main groups: those with strong radio emissions (radio-loud) and those with faint or absent radio emissions (radio-quiet). According to Best et al. (2023), above about 2 mJy at 150 MHz, a significant fraction of sources are radio-loud AGNs. At sub-mJy flux densities, radio-quiet AGNs are an important class, but accounting for only less than 10% of all sources. Below 1 mJy, star-forming galaxies (SFGs) become the dominant population, comprising 90% of sources at flux densities of about 0.1 mJy, but make up less than 20% of all sources above a flux density of 2 mJy (Best et al. 2023). Through surveys covering wide areas of sky, radio galaxies and quasars can be detected over significant cosmological distances (up to z ∼ 7; see e.g. De Breuck et al. 2010; Singh et al. 2014; Saxena et al. 2019; Sotnikova et al. 2021; Duncan et al. 2021; Best et al. 2023). This offers a reliable way to study the distribution and clustering of galaxies. Galaxy clustering quantifies the excess probability of finding galaxies at a given spatial separation (spatial clustering) or angular separation (angular clustering) compared to a random distribution (Totsuji & Kihara 1969; Peebles & Wilkinson 1968; Peacock & Nicholson 1991; Cress et al. 1996; Blake & Wall 2002a; Wang et al. 2013). Because most radio galaxies have very faint optical counterparts, measuring their redshift is difficult. Therefore, many studies have been conducted on the angular clustering of radio continuum sources across the radio spectrum (see Seldner & Peebles 1981 for the fourth Cambridge survey (4C); Magliocchetti et al. 1999 for the Faint Images of the Radio Sky at Twenty-Centimetres (FIRST) survey; Rengelink 1999 for the Westerbork Northern Sky Survey (WENSS) and the Green Bank 6-cm (GB6) survey; Blake & Wall 2002b for the NRAO VLA Sky Survey (NVSS); Dolfi et al. 2019 for the TIFR GMRT Sky Survey (TGSS); Siewert et al. 2020; Hale et al. 2024 for the LOFAR Two-metre Sky Survey (LoTSS) first and second data release (DR1, DR2), respectively).

Counts-in-cells statistics have received less attention but have also been studied with different radio surveys (Jauncey 1975; Magliocchetti et al. 1999 for the FIRST survey; Siewert et al. 2020 for LoTSS-DR1). Siewert et al. (2020) have shown that the distribution of radio sources is non-Poissonian and that a compound Poisson distribution provides a better description. In this work, we probe two theoretical model distributions (compound Poisson and negative binomial distributions) to determine the best fit for the distribution of these sources. Using moments of the counts in cells – such as the mean and variance – we study the cosmic large-scale structure as traced by continuum radio sources from the second data release (DR2) of LoTSS (Shimwell et al. 2022) by increasing the area of the survey by approximately a factor of 10 compared to LoTSS-DR1 (Shimwell et al. 2019).

The Low-Frequency Array (LOFAR; van Haarlem et al. 2013) is a large radio interferometer designed to observe the entire northern sky at frequencies between 10 and 240 MHz. LOFAR consists of core, remote, and international stations, with the core and remote stations located in the Netherlands and the international stations spread across Europe. LOFAR observes the radio sky using two types of antennas: LBA (low-band antennas: 10−90 MHz) and HBA (high-band antennas: 110−240 MHz). The LOFAR Two-metre Sky Survey (LoTSS; Shimwell et al. 2017, 2019, 2022), which covers frequencies between 120 and 168 MHz, uses the HBAs. The central observing frequency of LoTSS is 144 MHz, which corresponds to a wavelength of about 2 m. Thanks to its sensitivity, resolution, and large sky coverage (detailed further in Sect. 3.1), LoTSS provides an excellent opportunity to investigate the statistical properties of the radio sky.

In Sect. 2, we provide an overview of the theoretical frameworks that encompass the various distributions employed in this analysis. We evaluate the data quality of LoTSS-DR2 in Sect. 3.1 and the mock random catalogue in Sect. 3.2. Sect. 4 discusses the masking strategies used, while Sect. 5 presents the differential source counts and completeness. In Sect. 6, we outline the results of the one-point correlation functions of LoTSS-DR2 and explore the scaling properties of the cell sizes to find the two-point correlation function. Finally, in Sect. 8, we offer our conclusions and engage in further discussions. Five appendices cover several aspects of masking (Appendices A and B), comparison to LoTSS-DR1 in Appendix C and present further technical details (Appendices D and E).

2. Counts in cells

The counts-in-cells statistics is a simple method to study the spatial distribution of sources within a survey region (see, e.g. Neyman et al. 1953; Peebles 1980; Magliocchetti et al. 1999; Saslaw 2000; Bernardeau et al. 2002). This approach involves dividing the survey area into smaller, equally sized cells and counting the number of sources within each cell (counts in cells). These cells are equivalent to bins in a histogram. We use counts-in-cells statistics to explore the underlying distribution of the sources in LoTSS-DR2. The distribution can provide insight into the fundamental cosmological physics and thermodynamic processes of the Universe (Yang & Saslaw 2011; Hurtado-Gil et al. 2017).

The cosmological principle states that the statistical distribution of matter is isotropic (the same in all directions) and homogeneous (uniform) when observed on scales larger than hundreds of megaparsecs. If radio sources would be statistically independent tracers of the cosmic matter distribution, we would expect them to be distributed isotropically. Disregarding any imperfections of the observations and systematic biases in the survey pipelines, and assuming that all sources are point-like (i.e., their angular size is smaller than the angular resolution of the telescope), as well as statistically independent, the counts-in-cells statistics should follow a Poisson distribution (Hayat & Higgins 2014).

However, that ignores the fact that many astrophysical objects show up as multi-component sources (e.g. the core and lobes of AGNs) or resolved large spirals with many individual components. Examples are provided in Appendix B, including the resolved LoTSS image of M101 (Pinwheel) galaxy, which the source-finding algorithm fails to identify as a single object. While large prominent objects similar to M101 can easily be manually corrected to contribute as a single source to the source counts, this correction becomes far more challenging for fainter radio sources. Additionally, there are non-physical associations, such as image artefacts, or associations at larger scales, such as several members of a galaxy cluster or groups of galaxies, as well as cosmological structure formation causing correlations on all scales. All of those lead to deviations from a Poisson distribution. Previously, Siewert et al. (2020) showed that deviations from a Poisson distribution are found with high statistical significance. This work aims to extend that analysis by increasing the survey area by approximately a factor of 10.

2.1. Statistical moments

We briefly recall the sample statistics that have been used in Siewert et al. (2020) to describe the distribution of the source counts ki per cell i (with Ncell disjoint cells of identical solid angle). The statistical moments, which are numerical measures describing the shape and characteristics of a probability distribution, are estimated as

μ j 1 N cell i = 1 N cell k i j , $$ \begin{aligned} \mu _j \equiv \frac{1}{N_{\rm cell}}\sum _{i=1}^{N_{\rm cell}}k_i^j, \end{aligned} $$(1)

with the sample mean μ ≡ μ1. Then the central moments, which describe the shape of the distribution by taking deviations from the mean, are

m j 1 N cell i = 1 N cell ( k i μ ) j , $$ \begin{aligned} m_j \equiv \frac{1}{N_{\rm cell}}\sum _{i=1}^{N_{\rm cell}}(k_i-\mu )^j, \end{aligned} $$(2)

with the variance σ2 ≡ m2, and the coefficients of skewness and excess kurtosis (Zwillinger & Kokoska 2000)

g 1 m 3 m 2 3 / 2 , g 2 3 m 4 m 2 2 3 , $$ \begin{aligned} g_1 \equiv \frac{m_3}{m_2^{3/2}}, \quad g_2 - 3 \equiv \frac{m_4}{m_2^2}-3, \end{aligned} $$(3)

respectively. Skewness measures the degree of asymmetry in a distribution, while kurtosis measures the tailedness of a distribution and is therefore very sensitive to outliers. We also define the clustering parameter (Peebles 1980),

n c σ 2 μ , $$ \begin{aligned} n_c \equiv \frac{\sigma ^2}{\mu }, \end{aligned} $$(4)

which is unity for a Poisson distributed quantity and where values of nc exceeding unity indicate clustering. For a Poisson distribution we expect

g 1 P = μ 1 / 2 , g 2 P 3 = μ 1 . $$ \begin{aligned} g^\mathrm{P}_1 =\mu ^{-1/2}, \quad g^\mathrm{P}_2 - 3 = \mu ^{-1}. \end{aligned} $$(5)

2.2. Modelling the counts-in-cells distribution

Deviations from a Poisson distribution are expected due to various factors. Deviations which are uncorrelated at large angular scales can stem from either local physics (for example in the case of AGNs or nearby spirals), but also imaging artefacts – such as sidelobes in interferometric data or noise spikes. These effects can be described by what is called a Cox process (Cox 1955), which models a random process within an underlying Poisson random process.

Let Ni be a random variable that denotes the counts of radio sources in a single cell i (measured by means of a radio source catalogue) and Oi being the Poisson distributed number of physical objects (i.e. an AGN or a SFG) in that cell, then

N i j = 1 O i C ji , O i Poisson ( λ ) , $$ \begin{aligned} N_i \sim \sum _{j=1}^{O_i} C_{ji}, \quad O_i \sim \mathrm{Poisson}(\lambda ), \end{aligned} $$(6)

where Cji denotes another random variable that counts the number of components that are associated with a physical object j in cell i. Finally, N = ∑Ni denotes the total number of radio sources in the source catalogue.

The resulting distribution for Ni (and N) depends on the detailed assumptions we make about the discrete distribution of Cji. The associations captured by the component counts Cji can have many different reasons; they can include imaging artefacts, typically associated with bright sources; they could be the lobes correlated with the core of an AGN, or it could happen that a large spiral is broken up into several components by the source-finding algorithm. From the LoTSS-DR1 value-added catalogue (Williams et al. 2019), visual classification in the citizen science project LOFAR Galaxy Zoo has shown that approximately 13 200 sources, or about 4% of the total, are associated together. This indicates that the distribution of component counts depends on a combination of astrophysical correlations, survey properties, and the specifics of the source-finding algorithm. Given the high level of complexity of these factors, we do not attempt to derive the distribution of Cji from the first principles; instead, we base reasonable probability distributions for them on an educated guess.

If Cji = 1 for all objects, Ni and N are Poisson-distributed numbers. Siewert et al. (2020) argued that the counts of components is a Poisson process itself with Cji∼ Poisson(κ), where κ denotes the intensity of the process (i.e. the mean number of component counts per physical object). This ansatz does allow for zero components, which would mean that we make the assumption that several physical objects remain undetected in the survey, either due to being below the detection threshold or due to observational limitations. The resulting distribution of Ni can be easily obtained starting from the generating function of a Poisson (P) distribution (Johnson et al. 2005),

G P ( z ) = exp [ λ ( z 1 ) ] , $$ \begin{aligned} G_{\rm P}(z) = \exp [\lambda (z - 1)], \end{aligned} $$(7)

where λ > 0 denotes the intensity of the Poisson process and z denotes the random variable. Then the generating function of the compound Poisson (CP) distribution becomes

G CP ( z ) = exp [ λ ( exp [ κ ( z 1 ) ] 1 ) ] , $$ \begin{aligned} G_{\rm CP}(z) = \exp [\lambda (\exp [\kappa (z -1)] - 1)], \end{aligned} $$(8)

and we can calculate its mean and variance (Johnson et al. 2005),

μ CP = G CP ( 1 ) = λ κ , $$ \begin{aligned} \mu _{\rm CP}&= G_{\rm CP}\prime (1) = \lambda \kappa , \end{aligned} $$(9)

σ CP 2 = G CP ( 1 ) + G CP ( 1 ) [ G CP ( 1 ) ] 2 = λ κ ( 1 + κ ) $$ \begin{aligned} \sigma ^2_{\rm CP}&= G_{\rm CP}{\prime \prime }(1) + G_{\rm CP}\prime (1) - [G_{\rm CP}\prime (1)]^2 = \lambda \kappa (1+ \kappa ) \end{aligned} $$(10)

Consequently, the clustering parameter becomes ncCP = 1 + κ > 1. Similarly, we can estimate higher central moments and find

g 1 CP = 1 + 3 κ + κ 2 ( λ κ ) 1 / 2 ( 1 + κ ) 3 / 2 = n c 2 + n c 1 μ 1 / 2 n c 3 / 2 , $$ \begin{aligned} g_1^\mathrm{CP}&= \frac{1 + 3 \kappa + \kappa ^2}{(\lambda \kappa )^{1/2}(1+ \kappa )^{3/2}} = \frac{n_c^2 + n_c -1}{\mu ^{1/2}\, n_c^{3/2}}, \end{aligned} $$(11)

g 2 CP 3 = 1 + 7 κ + 6 κ 2 + κ 3 λ κ ( 1 + κ ) 2 = n c 3 + 3 n c 2 2 n c 1 μ n c 2 · $$ \begin{aligned} g_2^\mathrm{CP} - 3&= \frac{1 + 7 \kappa + 6 \kappa ^2 + \kappa ^3}{\lambda \kappa (1 + \kappa )^2} = \frac{n_c^3 + 3 n_c^2 - 2 n_c - 1}{\mu \, n_c^2}\cdot \end{aligned} $$(12)

The assumption that the counts of components may turn out to be zero might be in contradiction with the assumption that a survey is complete above a certain flux density. Thus, we also make use of a logarithmic distribution of Cji, for which the number of components is at least 1 and is discrete1. We now show that for Cji ∼ Logarithmic(p), with 0 < p < 1 denoting the parameter of the distribution, the resulting Cox process produces a negative binomial distribution. This outcome is significant because the negative binomial distribution is well-suited for modelling count data with overdispersion, where variance exceeds the mean, and provides a more realistic representation of the observed radio source counts. The generating function for the logarithmic (L) distribution (Johnson et al. 2005) reads

G L ( z ) = ln ( 1 p z ) ln ( 1 p ) , $$ \begin{aligned} G_{\rm L}(z) = \frac{\ln (1-pz)}{\ln (1 - p)}, \end{aligned} $$(13)

and the generating function for a Cox process with logarithmic distribution becomes

exp [ λ ( ln ( 1 p z ) ln ( 1 p ) 1 ) ] . $$ \begin{aligned} \exp \left[\lambda \left(\frac{\ln (1-p z)}{\ln (1 - p)} - 1\right)\right]. \end{aligned} $$(14)

Introducing the new variable r ≡ −λ/ln(1 − p) > 0 we find that this becomes the generating function of a negative binomial (NB) distribution (Johnson et al. 2005)

G NB ( z ) = ( 1 p 1 p z ) r , $$ \begin{aligned} G_{\rm NB}(z) = \left(\frac{1- p}{1-p z}\right)^r, \end{aligned} $$(15)

from which we obtain the mean and variance,

μ NB = rp 1 p , $$ \begin{aligned} \mu _{\rm NB}&= \frac{r p}{1-p}, \end{aligned} $$(16)

σ NB 2 = rp ( 1 p ) 2 · $$ \begin{aligned} \sigma ^2_{\rm NB}&= \frac{r p}{(1-p)^2}\cdot \end{aligned} $$(17)

Therefore, the clustering parameter reads nc = 1/(1 − p) > 1 and we find

g 1 NB = 1 + p ( r p ) 1 / 2 = 2 n c 1 μ 1 / 2 n c 1 / 2 , $$ \begin{aligned} g_1^\mathrm{NB}&= \frac{1 +p}{(r p)^{1/2}} = \frac{2 n_c - 1}{\mu ^{1/2}\, n_c^{1/2}}, \end{aligned} $$(18)

g 2 NB 3 = 1 + 4 p + p 2 rp = 6 n c 2 6 n c + 1 μ n c · $$ \begin{aligned} g_2^\mathrm{NB} - 3&= \frac{1 + 4 p + p^2}{r p} = \frac{6 n_c^2 - 6 n_c + 1}{\mu \, n_c}\cdot \end{aligned} $$(19)

In the limit nc → 1, both the negative binomial and compound Poisson distributions converge to the Poisson distribution, which means that their higher moments (skewness, kurtosis, and beyond) also agree with each other. For nc > 1 and an identical mean μ, the negative binomial distribution has a greater skewness and excess kurtosis.

The mean number of components per physical object in a negative binomial distribution is given by the expectation value of the logarithmic distribution,

μ L = p ( 1 p ) ln ( 1 p ) · $$ \begin{aligned} \mu _L = \frac{-p}{(1 - p) \ln (1 - p)}\cdot \end{aligned} $$(20)

A simple method to estimate the parameters of the distributions in Eq. (7), (8), and (15) is to use the estimates of the first and second moments, μ ̂ $ \hat\mu $ (mean) and σ 2 ^ $ \widehat{\sigma^2} $ (variance), which provide the following parameter estimates; λ ̂ P = μ ̂ $ \hat{\lambda}_{\mathrm{P}} = \hat{\mu} $ for a Poisson distribution, and for the two Cox processes we find that

λ ̂ CP = r ̂ NB = μ ̂ 2 σ 2 ^ μ ̂ , $$ \begin{aligned} \hat{\lambda }_{\rm CP} = \hat{r}_{\rm NB} = \frac{{\hat{\mu }}^2}{\widehat{\sigma ^2} - \hat{\mu }}, \end{aligned} $$(21)

and

κ ̂ CP = σ 2 ^ μ ̂ 1 , $$ \begin{aligned} \hat{\kappa }_{\rm CP} = \frac{\widehat{\sigma ^2}}{\hat{\mu }} -1, \end{aligned} $$(22)

and

p ̂ NB = 1 μ ̂ σ 2 ^ , $$ \begin{aligned} \hat{p}_{\rm NB} = 1- \frac{\hat{\mu }}{\widehat{\sigma ^2}}, \end{aligned} $$(23)

respectively, where the hat notation denotes the estimator of the parameter. Thus, we see that, based on the method of moments, λCP and r have the same numerical value. However, if we instead use the method of least-squares deviations, we do not expect that to happen, in general.

2.3. Angular correlation function from the reduced normalised variance

The variance of counts in cells depends on the size of the cells in a specific way, which connects this statistics with the angular two-point correlation function, w(ϑ), as originally shown in Totsuji & Kihara (1969) and further elaborated in Peebles (1975). In essence, the two-point correlation function w(ϑ) describes how the probability of finding two sources at an angular separation ϑ deviates from a purely random (Poisson) distribution. Since the variance of the source counts is influenced by clustering, it naturally reflects information from w(ϑ). In general, the statistical nth moment (see Eq. 1), can be written in terms of n-point correlation functions, which describe the degree of clustering at different scales. For the second moment (n = 2), it is useful to consider the reduced and normalised variance

Ψ 2 m 2 μ μ 2 = 1 Ω c 2 w ( ϑ 12 ) d Ω 1 d Ω 2 , $$ \begin{aligned} \Psi _2 \equiv \frac{m_2 - \mu }{\mu ^2} = \frac{1}{\Omega _c^2} \int \!\! \int \; w (\vartheta _{12}) \,\mathrm{d}\Omega _1 \,\mathrm{d}\Omega _2, \end{aligned} $$(24)

where ϑ12 is the angular distance between two solid angle elements, dΩi, and Ωc denotes the area of the cell. When sources are Poisson distributed, w(ϑ) = 0 and the Ψ2 vanishes.

However, in the presence of clustering, correlations between sources lead to deviations from Poisson statistics, making Ψ2 a crucial measure of the distribution of structures in the Universe.

At small angular scales, the assumption of a power law for the two-point correlation function can be made,

w ( ϑ 12 ) = A 0 ( ϑ 12 ϑ 0 ) 1 γ , $$ \begin{aligned} w(\vartheta _{12}) = A_0 \; \left(\frac{\vartheta _{12}}{\vartheta _0}\right)^{1-\gamma }, \end{aligned} $$(25)

with amplitude A0 defined at the pivot scale ϑ0, and clustering exponent γ, which characterises how the correlation decays with increasing separation ϑ12. Observational studies suggest a typical value of γ ≈ 1.8 (see, e.g., Peebles 1975; Blake & Wall 2002a; Wilman et al. 2003; Lindsay et al. 2014). A higher γ means that sources are highly clustered on small scales but become uncorrelated on larger scales.

Evaluating the integral in Eq. (24) over a cell, occupying a solid angle Ωc with linear angular scale Θ Ω c $ \Theta \equiv \sqrt{\Omega_c} $, one finds

Ψ 2 ( Θ ) = A 0 C γ ( Θ Θ 0 ) 1 γ , $$ \begin{aligned} \Psi _2(\Theta ) = A_0 \, C_{\gamma } \left(\frac{\Theta }{\Theta _0}\right)^{1-\gamma }, \end{aligned} $$(26)

where Cγ is a coefficient of order unity depending on γ, with C1 = 1. Θ0 serves as a pivot scale. More details on the numerical evaluation of Cγ are given in Appendix E. A higher γ means that the clustering is stronger at small Θ, leading to higher variance at smaller cell sizes.

3. Data

3.1. LOFAR Two-metre Sky Survey: DR1 & DR2

The LOFAR Two-metre Sky Survey will eventually cover the most of the northern sky at 120−168 MHz. The first data release (Shimwell et al. 2019) included 58 pointings in the region of the HETDEX Spring Field and covered 424 square degrees. The second data release (Shimwell et al. 2022), which is used in this study, extends the first data release in terms of sky coverage and source density. It consists of 841 pointings with a coverage of 5635 square degrees of northern sky. Due to resource limitations and ongoing technical developments, only the core and remote stations are used. With maximum baselines extending up to ∼100 km across the Netherlands, the survey achieves an angular resolution of 6 arcsec.

thumbnail Fig. 1.

Counts in cells of the LoTSS-DR2 radio source catalogue in Mollweide view and equatorial coordinates without a flux-density cut. The counts in cells are based on HEALPIX with a resolution of 13.74 square arcmin.

Observations for LoTSS are carried out with the LOFAR-HBA (High Band Antenna), utilising the core and remote stations. The pointing size (given by the station beam) of these observations is 3.8 deg in diameter at 150 MHz, while the pointings are typically separated by 2.58 deg, with six nearest neighbours within 2.8 deg (Shimwell et al. 2019). This significant overlap between the pointings can be used for mosaicing – the process of averaging data from overlapping regions – and thus avoiding lower-quality data at the outer edges. In LoTSS-DR2 (Shimwell et al. 2022), 626 and 215 pointings cover the RA-13 h and RA-1 h regions, respectively. The 4 396 228 detected radio sources of LoTSS-DR2 are shown as a HEALPIX2 (Górski et al. 2005) counts-in-cells map with a resolution of NSide = 256 as a Mollweide projection in Fig. 1. This resulting catalogue comes from the combined mosaic of the individual pointings where overlapping areas are averaged and combined.

Shimwell et al. (2022) tested the completeness of the radio source catalogue by injecting point sources and real deconvolution components of the observed point sources into the 841 source-subtracted mosaics in 10 simulations. For point sources a completeness of 50%, 90% and 95% is found at 0.34 mJy, 0.8 mJy, 1.1 mJy. This suggests that the investigation of cosmological questions calls for a flux-density threshold well above 1.1 mJy to ensure a completeness level above 95% in all fields. We explore the completeness considerations in greater detail in Sect. 5.

3.2. Random mock catalogue

For comparison of the observations with a Poissonian source distribution, we used the random mock catalogue generated by Hale et al. (2024). As outlined in Sect. 3.2 of Hale et al. (2024), the generation process involves several sequential steps. Initially, random sources are generated across the survey field of view, excluding regions affected by data reduction failures. Next, the process accounts for position-dependent smearing effects, which affect the detection of sources, necessitating modelling and correction for its dependence on field elevation. The mock catalogue also incorporates incompleteness and measurement errors, accounting for sensitivity variations across the survey area and their impact on source-detection completeness and properties such as flux density. We use it as a comparison to the distribution of the sources and for the estimation of the two-point correlation function (discussed in Sect. 6).

4. Masking strategies

In addition to applying a flux-density threshold to ensure completeness (see Sect. 5), we have to exclude regions that are strongly affected by systematics. Alternative strategies based on the rms noise level percentiles which were employed in Siewert et al. (2020), and new strategies based on the pointing geometry, tested for DR2, are discussed in Appendix A.2. In the following, we discuss the strategy used to define a mask that excludes the outer regions of the survey that are affected by systematics, resulting in our default mask.

As discussed in Sect. 3.3.2 and Fig. 9 of Shimwell et al. (2022), the ratio between the integrated flux-density values of LoTSS to NVSS varies systematically as a function of the position in the sky. Variations in the flux-density scale across an individual pointing are significantly reduced by mosaicing. Through mosaicing the overestimation of flux density in the northern regions of a pointing is somewhat reduced by the underestimation in the southern regions of a neighbouring pointing. The overlap areas between the pointings thus reduce the flux-density variation by averaging out these systematic biases across the survey area.

As such, for a mask based on the pointing geometry, it is important that we exclude the outer edges of the survey area, where adjacent pointings have not been mosaiced. These un-mosaiced regions can introduce inconsistencies, such as flux-density variations. Therefore, we mask these outer edges of boundary pointings of the survey. Additionally, and for a similar reason, we want to remove those areas where there are a large number of gaps within the image due to facets that failed during the data reduction process (see Fig. 2). These incomplete pointings often, though not exclusively, lie towards the outer edges of the observations, affecting the amount of masking used. Consequently, we mask both the un-mosaiced boundary areas and the regions with incomplete pointings. This mask is referred as ‘mask d’, which has been adopted from Hale et al. (2024).

thumbnail Fig. 2.

Example of complete pointings in the HETDEX field and incomplete pointings at the boundary of the survey area, and pointings that are not mosaiced with other neighbouring pointings.

This default mask (mask d) is used for all LoTSS-DR2 cosmology analysis and is shown in Fig. 3, where we compare it to the sky coverage of the full dataset. The RA and Dec cuts used to determine these regions are given in Table 1. Whilst slightly larger areas could have been included, these cuts are employed to be conservative to ensure that we only analyse regions with reliable data. The RA and Dec cuts are applied to the central position of HEALPIX pixels, not to the individual sources as done in Hale et al. (2024). Additionally, we exclude three cells with outstandingly high source density as a result of large foreground sources split into several single components. For more details, see Appendix B. With these cuts applied, we have ∼78% of the total area of LoTSS-DR2 remaining.

thumbnail Fig. 3.

For ‘mask d’ the light grey regions remain after masking, while dark grey is the survey area.

Table 1.

Definition of mask d.

5. Differential source counts

We used differential source counts (Condon 1988) as a function of flux density to evaluate the completeness of our data and the effectiveness of the applied masks. In Fig. 4, we show the differential number counts using Euclidean normalisation, which, in a static, homogeneous, and spatially flat Universe, are expected to remain constant as a function of flux density. In the left panel, we present the differential number counts for ‘mask d’ alongside two more conservative masks, which were derived using different masking strategies (see Sect. 4 and Appendix A for further details on the masks). We also show the ‘mask d value’, which denotes the previously defined ‘mask d’ but applied specifically to the LoTSS-DR2 value-added catalogue (Hardcastle et al. 2023). The value-added catalogue combines radio sources, which are observed as several components, into one single source. This process is only complete above 4 mJy. For comparison, we also include the results from the LoTSS Deep Fields (Mandal et al. 2021) and results from the LoTSS-DR1 value-added catalogue (Siewert et al. 2020).

thumbnail Fig. 4.

Left: Comparison of the differential source counts from mask d, mask 1017 (explained in Appendix A.3), and mask 50 (explained in Appendix A.2) with the results from the LoTSS Deep Fields (Mandal et al. 2021), and LoTSS-DR1 (Siewert 2021). The ending ‘radio’ refers to the application of the mask to the LoTSS-DR2 radio source catalogue, while the ending ‘value’ refers to the LoTSS-DR2 value-added catalogue. Downward pointing arrows indicate error-bars that extend outside of the plotted y-range. Right: Comparison of the differential source counts from mask d with the results from the LoTSS Deep Fields, TGSS (150 MHz; Intema et al. 2017), RACS (888 MHz; Hale et al. 2021), LoLSS-DR1 (54 MHz; de Gasperin et al. 2021), NVSS (1.4 GHz; Matthews et al. 2021), and SKADS (1400 MHz Wilman et al. 2008).

At low flux densities (on the mJy level), all masks except ‘mask 50’ yield consistent results, which confirms a high degree of completeness for LoTSS-DR2 at S > 2 mJy. The noise-based mask ‘mask 50’, which keeps the first 50% of all cells ranked by increasing averaged cell noise (more in Appendix A.2) retains more sources at a low flux density but removes sources at higher flux densities, likely due to higher rms noise around bright sources, and therefore explains the difference at the lower and higher flux densities. The value-added catalogue shows a lower source count in the flux-density range between 4 and 20 mJy, which is due to the value-adding process (see the left panel of Fig. 4).

In addition to the completeness analysis discussed above, we can confidently rely on the source counts of the mask d for flux density levels above 2 mJy (see the right panel of Fig. 4). Based on the differential source counts, we find that the more sophisticated masking strategy (for more details, see Appendix A) does not offer any advantage over the simpler survey geometry-based mask (mask d). Furthermore, while the improvements of the value-added catalogue only appear above 4 mJy, it is not suitable for a cosmological analysis requiring a homogeneous sample starting from 2 mJy. Given these findings, we adopt the ‘mask d’ and a flux-density threshold of at least 2 mJy as the default selection for the cosmological analysis of LoTSS-DR2.

In the right panel of Fig. 4 we extend our comparison of ‘mask d’ source counts with results from various radio surveys at different frequencies. Specifically, we compare with the RACS-low source catalogue (Hale et al. 2021) at 888 MHz, the LoLSS-DR1 source catalogue (de Gasperin et al. 2021) at 54 MHz, and the NVSS source catalogue (Matthews et al. 2021) at 1.4 GHz. These three results are scaled to 144 MHz with a spectral index between −0.8 and −0.7, resulting in bands of differential source counts. In addition, we compare with the SKADS catalogue (Wilman et al. 2008) at 1.4 GHz, scaled to 144 MHz using α = −0.7, and with the TGSS best-fit polynomial (Intema et al. 2017), scaled from 150 MHz using α = −0.73 (the median spectral index of Intema et al. 2017). It is shown that up to a different flux-density scaling of around 10%, our result agrees very well with previous results.

A key contribution to the difference between the LoTSS-DR2 and the LoTSS Deep Field differential source counts is that in the deep fields the individual radio components have been identified and combined into single sources. Additionally, flux scale differences exist between the deep fields and LoTSS-DR2.

6. Results

In the following, we present results for three different flux-density thresholds, spaced equally on a logarithmic scale, namely 2, 4, and 8 mJy, resulting in 827 362, 460 572, and 279 304 radio sources, respectively. Without applying any other thresholds, Siewert et al. (2020) demonstrated that the LoTSS-DR1 data above 2 mJy produced robust and reliable estimates of the two-point correlation. An additional cut in the signal-to-noise ratio (defined as the peak flux density over rms noise) allowed Hale et al. (2021) to lower that threshold to 1.5 mJy. Looking at a variety of flux-density thresholds, allows us to test the robustness of our findings.

6.1. Counts-in-cells distribution

In this section, building on the discussions in Sect. 2, we compare the counts-in-cells model distributions with the LoTSS-DR2 data. The parameters of these model distributions are determined by means of the method of moments, based on empirical estimates of the mean μ ̂ $ \hat{\mu} $ and variance σ 2 ^ $ \widehat{\sigma^2} $. Table 2 presents the parameter values for the Poisson, compound Poisson, and negative binomial distributions, evaluated for mask d and at different flux-density thresholds.

Table 2.

Values of the parameters of three different models for the counts-in-cells distribution of LoTSS-DR2 sources.

The upper panels of Fig. 5 display the corresponding histograms of the counts in cells distribution at different flux-density thresholds, comparing the LoTSS-DR2 data and the random mock catalogue. Since the mock catalogue contains more sources than the actual data, we randomly selected an equivalent number of sources after applying the same mask and flux-density threshold. The observed distribution of sources (blue histograms) deviates from a Poisson distribution (the grey dots). For increasing flux-density thresholds, this deviation gradually decreases but remains clearly visible. In contrast, the random mock catalogue follows a Poisson distribution, with minor deviations arising from survey systematics (see also Sect. 3.2).

thumbnail Fig. 5.

Histograms of counts in cells of mask d for LoTSS-DR2 and the random mock catalogue at the flux-density thresholds 2, 4, and 8 mJy (left, top to bottom), and the CDFs (right, top to bottom) with the best-fit Poisson and compound Poisson and negative binomial distributions.

Based on the measured value of the clustering parameter nc (Eq. 4), the excess variance in the random mock catalogue is about 10% at 2 mJy, likely due to various systematic effects incorporated into the randoms. In contrast, the excess variance for the LoTSS-DR2 sources is about 46% which cannot be understood by the effects incorporated in the random mock catalogue (see Fig. 5). This supports the hypothesis that the multi-component nature of radio sources plays a key role in shaping the distribution of the counts. However, some of this excess variance may still be due to imaging artefacts. Notably, both the negative binomial (blue dots) and compound Poisson (red dots) fit quite well, though a clear preference is not immediately evident. We make use of both frequentist and Bayesian statistical tests to examine which model better describes the distribution of sources.

The Pearson χ2 test (Pearson 1900) compares the expected and observed distribution of events sorted into classes or bins,

χ 2 = i = 1 n ( O i E i ) 2 E i , $$ \begin{aligned} \chi ^2 = \sum _{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}, \end{aligned} $$(27)

where Oi and Ei are the observed and expected abundances of event i, and n is the number of event classes or bins. This statistic depends on the number of degrees-of-freedom (d.o.f.), which are the number of histogram bins minus the number of fitted parameters of the distribution. This allows us to account for the fact that models with more fitted parameters might fit the data better, simply because they have more flexibility. The so-called reduced chi-square statistic (Wong 1992), which is defined as χ2/d.o.f., is hence used to reduce these dependencies. A value of χ2/d.o.f. of around unity indicates a good fit, which means that the data are likely to have been drawn from the proposed model. If the value is much greater than 1, it suggests that the data are unlikely given the model, while a value significantly less than 1 could imply that the model is overfitting the data.

To minimise the impact of outliers, we limit the test to event classes with significant expected and observed abundance (in order to avoid division by small numbers). We limit the analysis to histogram bins containing at least 1000 cells to ensure the histograms are robust, free from distortions and not unduly affected by low-count bins. In Fig. 5, this corresponds to the bins of 3 to 18 sources, 1 to 12 sources, and 0 to 8 sources for 2, 4, and 8 mJy flux-density threshold, respectively. Bins with fewer than 1000 cells, at flux density above 2 mJy, constitute less than 10% of all sources. Table 3 shows the reduced chi-square test for the mask d and for the different flux cuts. The negative binomial distribution is preferred for all flux-density thresholds. This also holds for the other geometry-based masks detailed in the Appendix D.

Table 3.

χ2/d.o.f. for three models of the counts in cells for mask d at different flux-density thresholds for LoTSS-DR2 sources.

Pettitt & Stephens (1977) demonstrated that for discrete distributions, the Kolmogorov-Smirnov (KS) test might exhibit higher statistical power compared to the χ2 test. Therefore, we make use of the KS test as our secondary evaluation method. The KS test is non-parametric, meaning it makes no assumption about the distributions being compared. It quantifies the distance between the empirical cumulative distribution function (CDF) of the sample Fn(x) (where the index n represents the number of data bins) and the CDF of the reference distribution F(x) (see Lista 2017). The KS statistic, or d-value, measures the maximum vertical distance between the two CDFs:

d n = sup x | F n ( x ) F ( x ) | . $$ \begin{aligned} d_n = \underset{x}{\sup } \,|F_n(x) - F(x)|. \end{aligned} $$(28)

The null hypothesis, namely that Fn is a random realisation of the model distribution F, is rejected at confidence level (1 − α) if dn > dα, where α is the frequentist probability of false rejection of the null hypothesis. For details on the critical values dα, refer to Feller (1948) and Smirnov (1948).

However, when the form or parameters of the model distribution F(x) are estimated from the data, the tabulated critical values provided in the standard references no longer apply. In this case, Monte Carlo methods must be used to determine the critical values. We thus simulate realisations of the hypothesised distribution and calculate the 99% confidence level (C.L.) of the resulting value of dα, which serve as our critical values for rejecting the null hypothesis that the theoretical distribution matches the observed data.

The CDFs of the counts in cells for the data, random mocks, and three theoretical models are shown on the lower panels of Fig. 5. Table 4 presents the measured KS test statistic d value and its critical values dα at a significance level of α = 0.01 for various flux-density thresholds. All d values for the Poisson distribution at all flux-density thresholds exceed the critical values by a large margin, leading to the rejection of the Poisson distribution with very high confidence (> 99% C.L.). This is consistent with the results drawn from the reduced chi-square test discussed above. Similarly, for the compound Poisson distribution, the measured d values are two or three times higher than the critical values, allowing us to exclude this distribution across all flux-density thresholds. For both the Poisson and compound Poisson distributions, we used 1000 Monte Carlo realisations to calculate the critical d-values. Given that the measured values were already significantly larger, extreme precision in handling outliers was unnecessary. However, for the negative binomial case, where the measured and critical d-values are closer to each other, we increased the number of realisations to 10 000 to account for potential outliers and ensure greater accuracy. For the negative binomial distribution with mask d, the lower observed d-values compared to those of the compound Poisson distribution suggest that the negative binomial distribution is preferred. However, the KS test still rejects this distribution, indicating that other effects on top of the ones accounted for in the negative binomial distribution must exist.

Table 4.

Kolmogorov-Smirnov test statistic for mask d at different flux-density thresholds of the LoTSS-DR2 sources.

Both Pearson’s chi-square statistic and the KS test clearly exclude a Poissonian distribution of the data, while the random mocks are in good agreement with a Poissonian distribution. Furthermore, both tests favour the negative binomial distribution over a compound Poissonian distribution. To further quantify that preference, we also embark on a Bayesian model comparison by computing the Bayes factor (B). The Bayes factor, which is the ratio of evidence of two models, quantifies the support for one model over the other. Essentially, it is the ratio of the posterior odds to the prior odds, and if the priors for both models are the same, it reduces to the posterior odds.

We calculate the Bayes factor for the hypothesis of the negative binomial over the compound Poisson distribution. When prior odds are presumed to be non-informative, the posterior odds reduce to the ratio of their likelihoods, which defines the Bayes factor (Jeffreys 1961; Gelman et al. 2003)

B NB CP = p ( data | model NB ) p ( data | model CP ) · $$ \begin{aligned} B_{\mathrm{NB-CP} } = \frac{p(\mathrm{data} \,|\,\mathrm{model} _{\mathrm{NB} })}{p(\mathrm{data} \,|\,\mathrm{model} _{\mathrm{CP} })}\cdot \end{aligned} $$(29)

For flux-density thresholds of 2, 4, and 8 mJy and mask d, the Bayes factors are 25.4, 17.2, and 15.9, respectively. According to Jeffrey’s scale of evidence (Jeffreys 1961), a Bayes factor higher than 10 provides strong evidence for a model. Therefore, there is strong evidence in favour of the negative binomial model.

We conclude that the negative binomial distribution provides an excellent fit to the data and offers a proper physical interpretation, especially when using conservative masks (see also Appendix D). For mask d, our default in the cosmological analysis of LoTSS-DR2, there are some additional effects that are obviously not described perfectly by the negative binomial distribution, since dn is still larger than dα, while the more aggressive masks discussed in the appendix result in dn ∼ dα for the negative binomial distribution. Notably, the negative binomial distribution ensures that the number of components starts from at least one, aligning with the expectation of a complete catalogue. This consistency supports the survey’s completeness estimates above the 2 mJy flux-density threshold.

6.2. Angular correlation function from scaling of cell size

As detailed in Sect. 2.3, the reduced normalised variance Ψ2 (Eq. 24) of the counts in cells is directly related to the angular two-point correlation function. Through fitting this statistic over various cell sizes to a power-law ansatz for the angular two-point correlation function, we can determine both the amplitude and exponent of the power law. To account for uncertainties, we employ uncertainty propagation and, in addition, we use the bootstrap method. To estimate the uncertainty of Ψ2, we propagate the uncertainties,

σ Ψ 2 2 = ( Ψ 2 m 2 ) 2 σ m 2 2 + ( Ψ 2 μ ) 2 σ μ 2 + 2 Ψ 2 m 2 Ψ 2 μ σ m 2 , μ 2 , $$ \begin{aligned} {\sigma }^2_{\Psi _2} = \left(\frac{\partial \Psi _2}{\partial m_2}\right)^2 \! {\sigma }^2_{m_2} + \left(\frac{\partial \Psi _2}{\partial \mu }\right)^2 \! {\sigma }^2_\mu + 2 \frac{\partial \Psi _2}{\partial m_2} \frac{\partial \Psi _2}{\partial \mu } \sigma ^2_{m_2, \mu }, \end{aligned} $$(30)

where σΨ22, σm22 and σμ2 are the variances of Ψ2, m2 and μ, respectively and σm2, μ2 denotes the covariance of m2 and μ. Evaluating the partial derivatives and inserting

σ μ 2 = m 2 n , σ m 2 2 = m 4 ( m 2 ) 2 n + 2 m 2 2 n ( n 1 ) , σ m 2 , μ 2 = m 3 n , $$ \begin{aligned} \sigma ^2_\mu = \frac{m_2}{n}, \sigma ^2_{m_2} = \frac{m_4 - (m_2)^2}{n} + \frac{2m_2^2}{n(n-1)}, \sigma ^2_{m_2, \mu } = \frac{m_3}{n}, \end{aligned} $$(31)

from Ramalingam (2008) in Eq. (30), we get

σ Ψ 2 2 = 1 n μ 4 [ m 4 + 2 m 3 + m 2 + 4 m 3 2 μ 2 n 3 n 1 m 2 2 4 m 2 ( m 3 + m 2 ) μ ] . $$ \begin{aligned} {\sigma }^2_{\Psi _2}\!\! = \!\frac{1}{n\mu ^4}\! \left[ m_4\! + \!2m_3\! + \!m_2\! +\! \frac{4m_3^2}{\mu ^2}\! - \!\frac{n\!-\!3}{n\!-\!1} m_2^2 \! -\! \frac{4m_2(m_3\! +\!m_2)}{\mu }\right]. \end{aligned} $$(32)

This quantity can then be estimated by means of the observed moments μ ̂ , m ̂ 2 , m ̂ 3 $ \hat{\mu}, \hat{m}_2, \hat{m}_3 $ and m ̂ 4 $ \hat{m}_4 $.

We also employ the bootstrap method (Efron 1979) and compare its results with those obtained from uncertainty propagation. The bootstrap is a resampling technique that generates multiple datasets by randomly selecting samples from the original dataset, with the same size as the original dataset, allowing for repeated selection of the same data points. This method is widely used to estimate and visualise the sampling distribution of a statistic. The bootstrap-generated distribution can be used to compute standard errors and confidence intervals for statistical estimates. One key advantage of the bootstrap method is that it does not require model assumptions about the underlying data distribution or the normality of the sample statistic, making it particularly useful in cases where traditional parametric methods may not be applicable.

In Fig. 6, we present the results of error analysis using both analytical uncertainty propagation and the bootstrap method at a flux-density threshold of 2 mJy. Notably, the comparison reveals a high degree of consistency between the two approaches.

thumbnail Fig. 6.

Empirical variance of Ψ2 estimated analytically (blue) and from a bootstrap method (orange) with 500 realisations, each at 2 mJy flux-density threshold at different values of Nside of the HEALPIX scheme.

In the following, we determine Ψ2(Θ), and fit a power-law parametrisation of the angular two-point correlation. After numerically computing the coefficients Cγ (as detailed in Appendix E) and subtracting the Ψ2(Θ) values obtained from the random catalogue, we fit a power law to derive parameters A0, and γ. To achieve this, we manipulate the mask d by both up-scaling and down-scaling, then apply it to individual HEALPIX maps at varying Nside resolutions. Our calculations encompass the computation of the reduced normalised variance across Nside values ranging from 16 to 512, covering angular scales spanning from 3.66 deg down to 0.11 deg. Additionally, we perform this analysis over Nside values from 16 to 256, corresponding to angular scales ranging from 3.66 deg down to 0.22 deg. Below 0.1 deg, multi-component source clustering becomes important (see also Hale et al. 2024).

The fitting parameters and results of the reduced chi-square test at different flux-density thresholds are presented in Table 5. Fig. 7 illustrates the best fit values for γ across varying flux-density thresholds for the two fitted angular distance ranges (3.66 − 0.11 deg and 3.66 − 0.22 deg), with the results extrapolated down to 0.03 deg.

Table 5.

Best-fit results for the amplitude and exponent of the power law.

thumbnail Fig. 7.

Fitted single power law for variance variations of re-scaling HEALPix maps for Nside = 16 − 512 corresponding to 3.8 deg to 0.11 deg at different flux-density thresholds (left) and for Nside = 16 − 256 corresponding to 3.8 deg to 0.23 deg (right), plotted for Nside = 16 − 2048, which means the fit was not performed for the fainter data points.

As observed, higher flux-density thresholds result in a decrease in amplitude and an increase in the slope of the power law, indicating an improved fit at these levels. This trend aligns with the findings of LoTSS-DR1 (Siewert et al. 2020; Bhardwaj et al. 2024) but contrasts with the results from Wilman et al. (2003) for the Boötes Deep Field and Rana & Bagla (2019) for TGSS-ADR1. This discrepancy may stem from flux-scale calibration issues in the LoTSS pipeline, as previously discussed by Shimwell et al. (2022) and Hale et al. (2024). Alternatively, it could reflect a shift in the distribution of source populations with flux density, where a larger fraction of SFGs dominates at lower flux-density thresholds (Best et al. 2023). At higher thresholds, the clustering signal weakens due to fewer components being associated with each source. The steeper slope at these flux densities is likely driven by the increasing dominance of active galactic nuclei (AGNs), which tend to exhibit lower clustering amplitudes (Magliocchetti et al. 2016; Hale et al. 2017).

Drawing from the completeness discussions in Hale et al. (2024), Section 3.2.3, we examine the fitting results for different signal-to-noise ratios (S/N), defined as the peak flux density divided by the rms noise, with the integrated flux-density threshold set to 2 mJy (see the fitting results in Table 6). As shown in Fig. 8, there is a slight decreasing trend in slope as the S/N thresholds increase. Notably, these fits exhibit a closer alignment with lower Nside, which correspond to larger spatial scales. Higher S/Rs tend to result in better fits. Table 6 contains the fitting parameters and the reduced chi-square test results for different S/Rs at a flux density of 2 mJy. As the S/N increases, the amplitude slightly decreases and the slope of the power law decreases, an inverse trend of its power to what we saw for the higher flux-density thresholds. Higher S/N results in a decrease in both the slope and amplitude of the correlation function, as brighter, more isolated sources are less clustered than fainter, unresolved ones. Additionally, the potential for overfitting or too large weights, indicated by small reduced chi-square values, suggests that caution should be taken in interpreting the fits, especially when fitting models to high S/N data.

Table 6.

Best-fit results for amplitude and exponent of a power law.

thumbnail Fig. 8.

Fitted single power law for variance variations of re-scaling HEALPix maps for Nside = 16 − 512 (left) and for Nside = 16 − 256 (right) at flux-density threshold 2 mJy and different S/N cuts plotted for Nside = 16 − 2048 (solid points represent the data points for which fits were performed).

The results of the parametrised two-point correlation function by fitting power laws to different angular ranges for the 2 mJy flux-density threshold and S/N 7.5 are presented in Fig. 9. We fit power laws to various combinations of Nside values ranging from 16 to 1024, corresponding to angular sizes ranging from 3.66 − 0.06 deg. Additionally, we estimate the model-independent two-point correlation function using the Landy-Szalay estimator (Landy & Szalay 1993) at the same flux-density threshold and S/N, utilising the random mock catalogue. The Landy-Szalay estimator is a widely used method for directly measuring the two-point correlation function by comparing the spatial distribution of observed sources to that of a randomly distributed catalogue. This estimator has been shown to have minimal variance and be less biased than other estimators (Landy & Szalay 1993). To measure it, we use the package TREECORR (Jarvis 2015) following the parameter settings detailed in Sect. 3 of Hale et al. (2024). Given the extensive area covered by LoTSS-DR2, we set the separation metric in TREECORR to ‘Arc’ for accurate great circle distance calculations. In addition, we set bin_slop to 0 for precise pair counts within each angular separation bin.

thumbnail Fig. 9.

Fitting results for power laws to different angular ranges at 2 mJy flux-density threshold and S/N 7.5. (dashed lines). The dots represent the results of direct measurements of the two-point correlation function using the Landy-Szalay estimator. The inner plot shows the 1σ and 2σ contours for amplitude and exponent of the power-law fit to w(ϑ).

We also calculated the integral constraint, which accounts for the finite survey area and estimator bias (see, e.g. Roche & Eales 1999; Siewert et al. 2021). Since the area of LoTSS-DR2 is much larger than that of LoTSS-DR1, the integral constraint effect is of the order of 10−6 and is therefore negligible.

As clearly shown in Fig. 9, the results based on the reduced normalised variance of source counts are in very good agreement with the direct measurements from the Landy-Szalay estimator, particularly at angular scales ranging from 0.3 down to 0.02 deg, corresponding to Nside values from 128 to 2048. This agreement validates our approach, demonstrating that the counts-in-cells method provides a reliable alternative for estimating the clustering properties of radio sources. Moreover, our results are consistent with previous studies (e.g. Magliocchetti et al. 1999; Lindsay et al. 2014), which have reported slopes typically ranging from −1.2 to −0.8 at different flux-density limits, further reinforcing the robustness of our findings.

7. Discussions

7.1. Negative binomial distribution

In this section, we discuss the results of Sect. 6.1, where we demonstrated that the negative binomial distribution is the preferred model. At a 2 mJy flux-density threshold, the mean number of components per source derived from the negative binomial distribution is 1.27 (Eq. (20) and Table 2). This suggests that most sources are likely single entities, possibly unresolved AGNs or SFGs. From the probability distribution function of the logarithmic distribution describing the components of the negative binomial distribution, we find that approximately 80% of the sources have one component, 15% have two components, and 2% have three components. These results are consistent with the findings of Böhme et al. (2023), who demonstrated that in the cross-match of the low-resolution (45″) preliminary release of the LOFAR LBA Sky Survey (LoLSS-PR) with LoTSS-DR2 in the HETDEX spring field, the mean number of association components is around 1.33. Furthermore, Fig. 8 of Böhme et al. (2023) shows the fraction of sources with different numbers of components, which aligns well with our findings. However, the mean number of components from our results is slightly higher than the mean association component derived from the value-added catalogue (Hardcastle et al. 2023) at the same flux-density threshold, which is 1.13. The difference likely arises because the value-adding process starts only above 4 mJy. Since the number of sources above 2 mJy is almost double that of sources above 4 mJy (see Fig. 5), the mean number of components from our analysis remains consistent with the value-added catalogue. For the negative binomial distribution, all sources have at least one component, consistent with the completeness considerations presented in Sect. 5.

Interestingly, the preference for a negative binomial distribution is consistent with previous studies of counts in cells for optical sources (e.g. Neyman et al. 1953 in the Lick survey; Yang & Saslaw 2011; Hurtado-Gil et al. 2017 in the SDSS survey). Carruthers & Duong-van (1983) proposed a universal mechanism that might link galaxy clusters counting distributions to empirical data from high-energy collisions in particle physics. While some authors (Saslaw & Fang 1996) have rejected the negative binomial distribution as a physically complete description of galaxy clustering, arguing that it violates the second law of thermodynamics, others have found it to be justified (Elizalde & Gaztanaga 1992; Betancort-Rijo 2000; Hurtado-Gil et al. 2017).

In the context of halo occupation distribution models (HOD; Berlind & Weinberg 2002; Zheng et al. 2005), which study the relationship between the number of galaxies and the mass of their host halos, these models provide a statistical framework to describe how galaxies trace dark matter halos. The HOD framework is particularly important for linking observations of galaxy clustering to the underlying dark matter distribution. Simulation results indicate that the scatter in the number of satellite galaxies at a fixed halo mass is likely non-Poissonian (Boylan-Kolchin et al. 2010; Mao et al. 2015), suggesting that simple Poisson-based assumptions may not fully capture the complexity of galaxy-halo relationships. Jiménez et al. (2019) demonstrated that a negative binomial distribution provides a better model for the number of satellite galaxies in halos. The negative binomial distribution accounts for this increased variability by introducing a second parameter that allows more flexibility in variance. Unlike the Poisson distribution, which assumes fixed variance, the negative binomial can handle overdispersed data, where the variance exceeds the mean.

7.2. Comparison of the methods for two-point correlation function

In Sect. 6.2, we employed the scaling of cell sizes to determine the two-point correlation function using the reduced normalised variance. This approach simplifies the computation by using source counts within cells rather than direct pairwise comparisons. Compared to the direct measurement from the Landy-Szalay estimator used by Hale et al. (2024), our method is computationally more efficient, scaling linearly with the number of sources, N, instead of the quadratic scaling N(N − 1)/2 required for direct pair counting. As future surveys detect increasingly larger numbers of radio sources, calculating the two-point correlation function will become significantly more time-consuming, even with advanced methods developed to address this challenge.

For comparing different bias models within the HOD framework, the method outlined in Hale et al. (2024) (see Sects. 5 and 6 of Hale et al. 2024) may prove more advantageous than our approach, particularly because it employs the full estimator, which provides a more comprehensive analysis of clustering behavior. While our approach is well-suited for fitting a power law, its applicability is limited providing values for the two-point correlation function across all angular separations, as it depends on a predefined set of cell sizes, which constrain its flexibility and broader applicability. In contrast, the estimator approaches used in direct measurements are more complicated when calculating higher-order correlation functions, which are necessary for studying non-Gaussianity, especially at large scales. However, our method can be more easily extended to higher-order correlation functions, as it primarily relies on statistical moments.

8. Conclusion

In this work, we studied the counts-in-cells distribution of LoTSS-DR2 radio sources, to determine the best-fitting statistical model. By applying spatial masks and flux-density thresholds, we conducted a comparative analysis of the distribution of sources using three discrete stochastic processes. We found, with high statistical significance, that the distribution of the radio sources deviates from a Poisson distribution, indicating the presence of clustering. Our research indicates that Cox processes, specifically compound Poisson and negative binomial distributions, provide a good fit to the source distribution. To distinguish between these models, we performed two hypothesis tests (the reduced chi-square and Kolmogorov-Smirnov tests) and calculated the Bayes factor. All three tests culminate in the conclusion that there is strong statistical evidence in favour of the negative binomial distribution for the counts in cells of radio sources above flux-density thresholds of 2 mJy. We focused our analysis on the Poisson, compound Poisson, and negative binomial distributions, leaving open the possibility that other distributions might also fit the data well.

By analysing the scaling properties of cell sizes and the relationship between the second moment of counts in cells and the two-point correlation function, we performed a fitting procedure on the reduced normalised variance across different cell sizes. This fitting process allows us to determine the parameters of the two-point correlation function, specifically identifying the amplitude and exponent of its power law. At a flux-density threshold of 2 mJy and a S/N of 7.5, we measure a value of 1 − γ in the range of −1.05 to −0.8, which aligns well with previous studies using optical, infrared, and radio data (Yang & Saslaw 2011; Labini 2011; Oliver et al. 2004; Pollo et al. 2012; Magliocchetti et al. 1999; Lindsay et al. 2014).

Since the counts-in-cells statistics clearly indicate clustering, higher-order moments of the galaxy distribution can be used to test non-linear and non-Gaussian models of the large-scale structure formation. While this work focuses on the second moment, which relates to the two-point correlation function, a more accurate description of the radio galaxy distribution requires measuring higher-order correlation functions using higher moments. Future work should address this, as pioneered in Peebles (1980), Magliocchetti et al. (1999) and Saslaw (2000).

Applying the same method used in this work to higher-order correlation functions will scale the computational effort linearly with the number of sources, N. In contrast, methods that rely on optimal estimators, such as the Landy-Szalay estimator for the two-point correlation, scale with Nm for the m-point correlation function, making them computationally expensive for future surveys with a very large number of sources. Therefore, we conclude that simple counts-in-cells methods remain valuable, as they provide resource efficient and powerful insights into the large-scale structure of the Universe. The upcoming LoTSS Data Release 3 will present an opportunity to investigate non-Gaussianity on the large cosmological scales (see e.g. Desjacques & Seljak 2010).


1

Fisher et al. (1943) introduced the logarithmic distribution by studying the relation of the number of species and the number of individuals in a random animal population. From a statistical perspective, our problem at hand is similar, with individuals (radio components) that belong to different species (physical objects such as AGNs or SFGs).

Acknowledgments

MPA acknowledges support from the Bundesministerium für Bildung und Forschung (BMBF) ErUM-IFT 05D23PB1. LB acknowledges support by the Studienstiftung des deutschen Volkes. TMS and DJS acknowledge the Research Training Group 1620 ‘Models of Gravity’, supported by Deutsche Forschungsgemeinschaft (DFG) and by the German Federal Ministry for Science and Research BMBF-Verbundforschungsprojekt D-LOFAR IV (grant number 05A17PBA). CLH acknowledges support from the Leverhulme Trust through an Early Career Research Fellowship and also acknowledges support from the Oxford Hintze Centre for Astrophysical Surveys which is funded through generous support from the Hintze Family Charitable Foundation. CH’s work is funded by the Volkswagen Foundation. CH acknowledges additional support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2181/1 – 390900948 (the Heidelberg STRUCTURES Excellence Cluster). JZ acknowledges support by the project “NRW-Cluster for data intensive radio astronomy: Big Bang to Big Data (B3D) “funded through the programme “Profilbildung 2020”, an initiative of the Ministry of Culture and Science of the State of North Rhine-Westphalia”. LOFAR is the Low Frequency Array designed and constructed by ASTRON. It has observing, data processing, and data storage facilities in several countries, which are owned by various parties (each with their own funding sources), and which are collectively operated by the ILT foundation under a joint scientific policy. The ILT resources have benefited from the following recent major funding sources: CNRS-INSU, Observatoire de Paris and Université d’Orléans, France; BMBF, MIWF-NRW, MPG, Germany; Science Foundation Ireland (SFI), Department of Business, Enterprise and Innovation (DBEI), Ireland; NWO, The Netherlands; The Science and Technology Facilities Council, UK; Ministry of Science and Higher Education, Poland; The Istituto Nazionale di Astrofisica (INAF), Italy. This research was conducted using several tools and Python packages that were crucial for our analysis, including: healpy (Zonca et al. 2019), HEALPix (Górski et al. 2005), Astropy (Astropy Collaboration 2013, 2018, 2022), NumPy (van der Walt et al. 2011), SciPy (Virtanen et al. 2020), TreeCorr (Jarvis 2015), IPython (Pérez & Granger 2007), Matplotlib (Hunter 2007). We also used Aladin (Bonnarel et al. 2000) to produce some images.

References

  1. Algera, H. S. B., van der Vlugt, D., Hodge, J. A., et al. 2020, ApJ, 903, 139 [Google Scholar]
  2. Astropy Collaboration (Robitaille, T. P., et al.) 2013, A&A, 558, A33 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  3. Astropy Collaboration (Price-Whelan, A. M., et al.) 2018, AJ, 156, 123 [Google Scholar]
  4. Astropy Collaboration (Price-Whelan, A. M., et al.) 2022, ApJ, 935, 167 [NASA ADS] [CrossRef] [Google Scholar]
  5. Berlind, A. A., & Weinberg, D. H. 2002, ApJ, 575, 587 [Google Scholar]
  6. Bernardeau, F., Colombi, S., Gaztañaga, E., & Scoccimarro, R. 2002, Phys. Rep., 367, 1 [Google Scholar]
  7. Best, P. N., Kondapally, R., Williams, W. L., et al. 2023, MNRAS, 523, 1729 [NASA ADS] [CrossRef] [Google Scholar]
  8. Betancort-Rijo, J. 2000, J. Sqtat. Phys., 98, 917 [Google Scholar]
  9. Bhardwaj, N., Schwarz, D. J., Hale, C. L., et al. 2024, A&A, 692, A2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  10. Blake, C., & Wall, J. 2002a, MNRAS, 337, 993 [NASA ADS] [CrossRef] [Google Scholar]
  11. Blake, C., & Wall, J. 2002b, MNRAS, 329, L37 [Google Scholar]
  12. Böhme, L., Schwarz, D. J., de Gasperin, F., Röttgering, H. J. A., & Williams, W. L. 2023, A&A, 674, A189 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  13. Bonnarel, F., Fernique, P., Bienaymé, O., et al. 2000, A&AS, 143, 33 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  14. Boylan-Kolchin, M., Springel, V., White, S. D. M., & Jenkins, A. 2010, MNRAS, 406, 896 [Google Scholar]
  15. Carruthers, P., & Duong-van, M. 1983, Phys. Lett. B, 131, 116 [Google Scholar]
  16. Condon, J. J. 1988, Radio Sources and Cosmology (New York: Springer), 641 [Google Scholar]
  17. Cox, D. R. 1955, J. Roy. Stat. Soc.: Ser. B (Methodol.), 17, 129 [Google Scholar]
  18. Cress, C. M., Helfand, D. J., Becker, R. H., Gregg, M. D., & White, R. L. 1996, ApJ, 473, 7 [NASA ADS] [CrossRef] [Google Scholar]
  19. De Breuck, C., Seymour, N., Stern, D., et al. 2010, ApJ, 725, 36 [NASA ADS] [CrossRef] [Google Scholar]
  20. de Gasperin, F., Williams, W. L., Best, P., et al. 2021, A&A, 648, A104 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  21. de Zotti, G., Massardi, M., Negrello, M., & Wall, J. 2010, A&ARv, 18, 1 [CrossRef] [Google Scholar]
  22. Desjacques, V., & Seljak, U. 2010, Class. Quant. Grav., 27, 124011 [Google Scholar]
  23. Dolfi, A., Branchini, E., Bilicki, M., et al. 2019, A&A, 623, A148 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  24. Duncan, K. J., Kondapally, R., Brown, M. J. I., et al. 2021, A&A, 648, A4 [EDP Sciences] [Google Scholar]
  25. Efron, B. 1979, Ann. Stat., 7, 1 [Google Scholar]
  26. Elizalde, E., & Gaztanaga, E. 1992, MNRAS, 254, 247 [Google Scholar]
  27. Feller, W. 1948, Ann. Math. Stat., 19, 177 [CrossRef] [Google Scholar]
  28. Fernique, P., Boch, T., Donaldson, T., et al. 2014, MOC – HEALPix Multi-Order Coverage map Version 1.0, IVOA Recommendation 02 June 2014 [Google Scholar]
  29. Fisher, R. A., Steven Corbet, A., & Williams, C. B. 1943, J. Animal Ecol., 12, 42 [Google Scholar]
  30. Gelman, A., Carlin, J., Stern, H., & Rubin, D. 2003, Bayesian Data Analysis, Second Edition, Chapman& Hall/CRC Texts in Statistical Science (Taylor& Francis) [Google Scholar]
  31. Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759 [Google Scholar]
  32. Hale, C. L., Jarvis, M. J., Delvecchio, I., et al. 2017, MNRAS, 474, 4133 [Google Scholar]
  33. Hale, C. L., McConnell, D., Thomson, A. J. M., et al. 2021, PASA, 38, e058 [NASA ADS] [CrossRef] [Google Scholar]
  34. Hale, C. L., Schwarz, D. J., Best, P. N., et al. 2024, MNRAS, 527, 6540 [Google Scholar]
  35. Hardcastle, M. J., Horton, M. A., Williams, W. L., et al. 2023, A&A, 678, A151 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  36. Hayat, M. J., & Higgins, M. 2014, J. Nurs. Educ., 53, 207 [Google Scholar]
  37. Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90 [NASA ADS] [CrossRef] [Google Scholar]
  38. Hurtado-Gil, L., Martínez, V. J., Arnalte-Mur, P., et al. 2017, A&A, 601, A40 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  39. Huynh, M. T., Jackson, C. A., Norris, R. P., & Fernandez-Soto, A. 2008, AJ, 135, 2470 [Google Scholar]
  40. Intema, H. T., Jagannathan, P., Mooley, K. P., & Frail, D. A. 2017, A&A, 598, A78 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  41. Jarvis, M. 2015, Astrophysics Source Code Library [record ascl:1508.007] [Google Scholar]
  42. Jauncey, D. L. 1975, ARA&A, 13, 23 [Google Scholar]
  43. Jeffreys, H. 1961, The Theory of Probability, 3rd edn. (Oxford University Press) [Google Scholar]
  44. Jiménez, E., Contreras, S., Padilla, N., et al. 2019, MNRAS, 490, 3532 [CrossRef] [Google Scholar]
  45. Johnson, N., Kemp, A., & Kotz, S. 2005, Univariate Discrete Distributions, Wiley Series in Probability and Statistics (Wiley) [CrossRef] [Google Scholar]
  46. Klypin, A., Prada, F., Betancort-Rijo, J., & Albareti, F. D. 2018, MNRAS, 481, 4588 [Google Scholar]
  47. Labini, F. S. 2011, Europhys. Lett., 96, 59001 [Google Scholar]
  48. Landy, S. D., & Szalay, A. S. 1993, ApJ, 412, 64 [Google Scholar]
  49. Lindsay, S. N., Jarvis, M. J., Santos, M. G., et al. 2014, MNRAS, 440, 1527 [NASA ADS] [CrossRef] [Google Scholar]
  50. Lista, L. 2017, Statistical Methods for Data Analysis in Particle Physics (Cham: Springer International Publishing), 941 [CrossRef] [Google Scholar]
  51. Magliocchetti, M., Maddox, S. J., Lahav, O., & Wall, J. V. 1999, Looking Deep in the Southern Sky (Berlin: Springer-Verlag), 112 [CrossRef] [Google Scholar]
  52. Magliocchetti, M., Popesso, P., Brusa, M., et al. 2016, MNRAS, 464, 3271 [Google Scholar]
  53. Mandal, S., Prandoni, I., Hardcastle, M. J., et al. 2021, A&A, 648, A5 [EDP Sciences] [Google Scholar]
  54. Mao, Y.-Y., Williamson, M., & Wechsler, R. H. 2015, ApJ, 810, 21 [NASA ADS] [CrossRef] [Google Scholar]
  55. Matthews, A. M., Condon, J. J., Cotton, W. D., & Mauch, T. 2021, ApJ, 909, 193 [NASA ADS] [CrossRef] [Google Scholar]
  56. Mohan, N., & Rafferty, D. 2015, Astrophysics Source Code Library [record ascl:1502.007] [Google Scholar]
  57. Neyman, J., Scott, E. L., & Shane, C. D. 1953, ApJ, 117, 92 [Google Scholar]
  58. Oliver, S., Waddington, I., Gonzalez-Solares, E., et al. 2004, ApJS, 154, 30 [NASA ADS] [CrossRef] [Google Scholar]
  59. Peacock, J. A., & Nicholson, D. 1991, MNRAS, 253, 307 [NASA ADS] [CrossRef] [Google Scholar]
  60. Pearson, K. 1900, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science (Taylor& Francis), 50/157 [Google Scholar]
  61. Peebles, P. J. E. 1975, ApJ, 196, 647 [Google Scholar]
  62. Peebles, P. J. E. 1980, The Large-Scale Structure of the Universe (Princeton: Princeton University Press) [Google Scholar]
  63. Peebles, P. J. E., & Wilkinson, D. T. 1968, Phys. Rev., 174, 2168 [NASA ADS] [CrossRef] [Google Scholar]
  64. Pérez, F., & Granger, B. E. 2007, Comput. Sci. Eng., 9, 21 [Google Scholar]
  65. Pettitt, A. N., & Stephens, M. A. 1977, Technometrics, 19, 205 [Google Scholar]
  66. Pollo, A., Takeuchi, T. T., Suzuki, T. L., & Oyabu, S. 2012, PKAS, 27, 343 [Google Scholar]
  67. Ramalingam, S. 2008, J. Mod. Appl. Stat. Meth., 7, 6 [Google Scholar]
  68. Rana, S., & Bagla, J. S. 2019, MNRAS, 485, 5891 [NASA ADS] [CrossRef] [Google Scholar]
  69. Rawlings, J. I., Page, M. J., Symeonidis, M., et al. 2015, MNRAS, 452, 4111 [Google Scholar]
  70. Rengelink, R. 1999, in The Most Distant Radio Galaxies, eds. H. J. A. Röttgering, P. N. Best, & M. D. Lehnert, 399 [Google Scholar]
  71. Ricci, L., Boccardi, B., Nokhrina, E., et al. 2022, A&A, 664, A166 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  72. Roche, N., & Eales, S. A. 1999, MNRAS, 307, 703 [NASA ADS] [CrossRef] [Google Scholar]
  73. Saslaw, W. 2000, The Distribution of the Galaxies: Gravitational Clustering in Cosmology (Cambridge: Cambridge University Press) [Google Scholar]
  74. Saslaw, W. C., & Fang, F. 1996, ApJ, 460, 16 [Google Scholar]
  75. Saxena, A., Röttgering, H. J. A., Duncan, K. J., et al. 2019, MNRAS, 489, 5053 [Google Scholar]
  76. Seldner, M., & Peebles, P. J. E. 1981, MNRAS, 194, 251 [Google Scholar]
  77. Shappee, B. J., & Stanek, K. Z. 2011, ApJ, 733, 124 [NASA ADS] [CrossRef] [Google Scholar]
  78. Sheth, R. K., & Saslaw, W. C. 1994, ApJ, 437, 35 [Google Scholar]
  79. Shimwell, T. W., Röttgering, H. J. A., Best, P. N., et al. 2017, A&A, 598, A104 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  80. Shimwell, T. W., Tasse, C., Hardcastle, M. J., et al. 2019, A&A, 622, A1 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  81. Shimwell, T., Hardcastle, M. J., Tasse, C., et al. 2022, A&A, 659, A1 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  82. Siewert, T. M. 2021, PhD Thesis, Universität Bielefeld, Bielefeld [Google Scholar]
  83. Siewert, T. M., Hale, C., Bhardwaj, N., et al. 2020, A&A, 643, A100 [EDP Sciences] [Google Scholar]
  84. Siewert, T. M., Schmidt-Rubart, M., & Schwarz, D. J. 2021, A&A, 653, A9 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  85. Singh, V., Beelen, A., Wadadekar, Y., et al. 2014, A&A, 569, A52 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  86. Smirnov, N. 1948, Ann. Math. Stat., 19, 279 [CrossRef] [Google Scholar]
  87. Smolčić, V., Delvecchio, I., Zamorani, G., et al. 2017, A&A, 602, A2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  88. Sotnikova, Y., Mikhailov, A., Mufakharov, T., et al. 2021, MNRAS, 508, 2798 [CrossRef] [Google Scholar]
  89. Totsuji, H., & Kihara, T. 1969, PASJ, 21, 221 [NASA ADS] [Google Scholar]
  90. van der Walt, S., Colbert, S. C., & Varoquaux, G. 2011, Comput. Sci. Eng., 13, 22 [Google Scholar]
  91. van Haarlem, M. P., Wise, M. W., Gunst, A. W., et al. 2013, A&A, 556, A2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  92. Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nat. Meth., 17, 261 [Google Scholar]
  93. Wang, Y., Brunner, R. J., & Dolence, J. C. 2013, MNRAS, 432, 1961 [CrossRef] [Google Scholar]
  94. Wen, D., Kemball, A. J., & Saslaw, W. C. 2020, ApJ, 890, 160 [Google Scholar]
  95. Williams, W. L., Hardcastle, M. J., Best, P. N., et al. 2019, A&A, 622, A2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  96. Wilman, R. J., Röttgering, H. J. A., Overzier, R. A., & Jarvis, M. J. 2003, MNRAS, 339, 695 [Google Scholar]
  97. Wilman, R. J., Miller, L., Jarvis, M. J., et al. 2008, MNRAS, 388, 1335 [NASA ADS] [Google Scholar]
  98. Wong, S. 1992, Computational Methods in Physics and Engineering (Allied Publishers) [Google Scholar]
  99. Yang, A., & Saslaw, W. C. 2011, ApJ, 729, 123 [NASA ADS] [CrossRef] [Google Scholar]
  100. Zheng, Z., Berlind, A. A., Weinberg, D. H., et al. 2005, ApJ, 633, 791 [NASA ADS] [CrossRef] [Google Scholar]
  101. Zonca, A., Singer, L., Lenz, D., et al. 2019, J. Open Source Softw., 4, 1298 [Google Scholar]
  102. Zwillinger, D., & Kokoska, S. 2000, CRC Standard Probability and Statistics Tables and Formulae (Boca Raton: Chapman and Hall/CRC) [Google Scholar]

Appendix A: Masking strategies

The sky coverage, observed by the LoTSS-DR2, is available as a multi-order coverage map (MOC; Fernique et al. 2014). We use this MOC to define a HEALPIX mask with a resolution of Nside = 256, which corresponds to a covered area of Ωcell ≈ 1.60 × 10−5 sr ≈189.1 square arcmin for each cell. Initially, the highest resolution of the MOC is Nside = 1024. We then downgrade the resolution to Nside = 256 and reject all cells in the new resolution, which already have one masked cell in the old resolution. This procedure ensures that the new coverage mask not only recovers observed sky patches but also decreases the usable sky fraction. Fig. 1 shows the LoTSS-DR2 radio source catalogue, displayed as a HEALPIX source count map, without applying any mask or flux-density threshold. With the MOC mask, we recover 4 370 692 sources within a fractional sky coverage of fsky ≈ 13.7%, which corresponds to ∼1.72 sr. In addition to the coverage defined by the MOC mask, we exclude cells with less than five sources in it to ensure statistical stability and reject (nonphysical) empty cells (see Siewert et al. 2020 for more details). This mask will be called ‘mask 5s’.

thumbnail Fig. A.1.

Noise masks for LoTSS-DR2. The grey regions correspond to the lowest noise regions included in ‘mask50’, while dark grey is included in ‘mask75’ and black in ‘mask99’. The orange regions are outside of all three masks but non-empty cells in LoTSS-DR2.

Table A.1.

Comparison of the different masks for the LoTSS-DR2. The full sky with a HEALPIX resolution of Nside = 256 has 786 432 cells.

A.1. Default mask

The LoTSS-DR2 source catalogue is based on a mosaic of many individual pointings. Typically, maximal sensitivity is achieved close to the pointing centre. Combining the information from individual overlapping pointings reduces the effects of decreasing sensitivity and beam smearing in the outer regions of the individual observations of a pointing. In mask d (see Fig. 2 and Tab. 1) we remove area close to the boundary of the survey area in which the mosaic cannot use information form more than a single pointing. More details of the mask d are described in Sect. 4 and in Hale et al. (2024).

A.2. Masks based on rms noise

Similar to the set of masks of Siewert et al. (2020), we define a set of masks for the full LoTSS-DR2 based on the local rms noise of the sources averaged per cell. We define three noise masks that keep 50%, 75%, and 99% of all cells ranked by increasing the average cell noise. The masks are named ‘mask 50’, ‘mask 75’, and ‘mask 99’, respectively. Additionally, we reject in all noise masks three cells with a large amount of components arising from large and nearby spiral galaxies and clusters. For more details on these three cells, see the Appendix B. The five masks are presented in Fig. A.1 and the details of the observed number of cells (Ncell) and sources (N) for LoTSS-DR2 are presented in Table A.1. The low noise region of ‘mask 50’ with an averaged local noise of Srms ≤ 0.086 mJy per cell corresponds to the grey region in Fig. A.1. Including medium grey and black regions correspond to the ‘mask 75’ and ‘mask 99’ with Srms ≤ 0.111 mJy and 0.274 mJy, respectively. Including the dark blue regions defines the full coverage of the LoTSS-DR2.

For the comparison of LoTSS-DR1 and DR2, we make use of the masks defined in Siewert et al. (2020), see Fig. C.1. As the first data release of the LoTSS radio source catalogue has incomplete coverage in the HETDEX spring field and under-sampled regions, we have to mask those regions by rejecting cells of the DR2 source count map. In order to compare consistent sky coverages, we apply the default mask of the cosmology analysis of LoTSS-DR1 (‘mask d’; Siewert et al. 2020) also to LoTSS-DR2. This mask rejects five incomplete and under-sampled pointings, based on an averaged pointing radius of r = 1.7 deg, and, additionally, rejects cells with less than five sources. The sky coverage of this mask is fsky = 0.9%, which corresponds to the 7176 observed cells.

A.3. Masks based on pointing geometry

Our third masking strategy is based on the geometry of the pointings and the distance of the HEALPIX cell centres from the pointing centres. As discussed before, the variations in flux scale across the surveyed area are reduced by mosaicing. This means that overlapping reduces the variations of the flux scale. Based on it, we define our strategy as follows. Below a given angular distance between the pointing and cell centre (θ1) we accept all cells. For cells further away, but below a second angular distance (θ2), we also accept all cells that are in the neighbourhood of at least two pointing centres, which means overlapping of at least two pointings. Additionally, following Siewert et al. (2020), we exclude cells with less than five sources to reject unexpectedly empty cells, and we reject the three cells described in the Appendix B.

thumbnail Fig. A.2.

The ’mask 1219’ (top) and ’mask 1017’ (bottom) are based on the survey and pointing geometry. The light grey regions remain after masking. The patterns shown in the bottom panel represent masked regions that lie far from the pointing centre and are not shared by at least two pointings.

For these pointing-based masks, we use two sets of combinations, which are shown in Fig. 3. A more conservative one that leads to a quite patchy accepted survey area and one that minimises the number of holes in the survey area. Again, we used the HEALPIX pixelation in Nside = 256. For the conservative mask, we take θ1 = 1 deg and θ2 = 1.7 deg. It means that until 1 deg from pointing centres we pick all cells and from 1 to 1.7 deg we pick just the cells which are in common in at least two pointings. We call it ‘mask 1017’, which keeps 76% of the survey area. The number 10 in the mask name corresponds to 1.0 deg. and the number 17 corresponds to 1.7 deg. representing θ1 and θ2 in the mask definition. For the less conservative mask, we have θ1 = 1.2 deg and θ2 = 1.9 deg. This mask is called ‘mask 1219’ and maintains 92% of the survey area. As before, the number 12 in the mask name corresponds to 1.2 deg. and 19 corresponds to 1.9 deg., representing θ1 and θ2 in the mask definition. Both of these two combinations maintain an additional cut at RA = 270 deg for the RA-13 h field from the former mask, as the pointings outside this cut are mainly incomplete, which can be seen in Fig. 2 and 3.

Appendix B: Excluded Cells

The cell with the highest source density is cell 208 938 in HEALPIXNside = 256 ring scheme with 231 sources. It is located in the pointing P195+27 and contains the diffuse emission near the galaxy NGC 4889, which is deeply embedded within the Coma cluster. NGC 4889 is a well-known supergiant elliptical galaxy since the 18th century. Fig. B.1 top panel shows NGC 4889 with a red cross detected by the Python Blob Detection and Source Finder algorithm (Mohan & Rafferty 2015, PYBDSF) as separated sources. The small red dots are the detected sources and the green grid shows the HEALPIX cells in Nside = 256 in the nested scheme. The figure is generated by ALADIN3. ALADIN uses the nested scheme of HEALPIX, which is why the labels are different.

The second highest source density is found in cell 195 113 with 155 sources. In this case the source finder picked up the emission of jet gas of an AGN, the galaxy NGC 315 (Fig. B.1 middle panel). It is a giant elliptical galaxy, also well known since the 18th century and was discovered by William Herschel. Its distance is ∼70 Mpc (Ricci et al. 2022).

The third cell that we mask is 73 793 in pointing P209+55 with 141 sources. Here, the source finder picks up emission from gas and stars of the well-known Pinwheel galaxy, NGC5457 or M101 (Fig. B.1 bottom panel), a nearby galaxy of a distance of 6.4 Mpc (Shappee & Stanek 2011). PYBDSF does not detect it as a single source but many sources.

We checked that the fourth and fifth most populated cells sit in the smoothly extrapolated tail of the empirical counts-in-cells distribution and are kept for that reason. For calculation of the one- and two-point functions we should therefore exclude the three most populated cells.

thumbnail Fig. B.1.

NGC4889 galaxy in Coma cluster with diffuse emission around it (top), NGC315 galaxy, an active AGN with its jets (middle), NGC5457 or M101 or Pinwheel galaxy, a nearby galaxy (bottom). The red circles are the sources detected by PYBDSF. The green grid is the HEALPIX cells in Nside = 256.

Appendix C: Comparison to DR1

C.1. Comparison to HETDEX field

Similarly to Siewert et al. (2020), we use empirical moments to examine the region investigated in LoTSS-DR1 to identify the effect that the improved analysis pipeline has on the counts-in-cells statistics. For details on the definition of the sample statistics, see Sect. 2.1 and 2.2.

thumbnail Fig. C.1.

Source counts per HEALPIX cell of the LoTSS-DR1 (top) and LoTSS-DR2 (bottom) for the HETDEX spring field. Both surveys are masked with the default mask of Siewert et al. (2020). The observed difference in the mean source counts is attributed to changes in the pipeline for DR2 (Shimwell et al. 2022).

The distribution of source counts in LoTSS-DR1 revealed that nc > 1, indicating an excess variance compared to a Poisson distribution for all flux-density thresholds considered. This showed good agreement with a compound Poisson distribution (Siewert et al. 2020). While the improvements of the second data release lead to an increase in the source density in the HETDEX field, we calculate empirical moments, such as the clustering coefficient, skewness, and excess kurtosis of DR2 and compare them to the results of DR1 (Siewert et al. 2020). In Fig. C.1, we show the source count map of the HETDEX field covered by ’mask 5s’ of Siewert et al. (2020) (originally named ’mask d’) for both data releases. The sample statistics of both radio source catalogues as a function of flux-density threshold are compared in Fig. C.2.

Additionally, we present sample statistics for both the DR1 value-added catalogue and the DR2 value-added catalogue. It is important to note that the DR2 value-added catalogue only looks to the sources brighter than 8 mJy, rather than all sources in the DR2 catalogue. In order to compare the sample statistics of the same regions in the HETDEX field, we also use the noise masks of Siewert et al. (2020), which reject cells with a per cell averaged local noise above one (‘mask 1’) or two times the median local noise (‘mask 2’) of 71 μJy beam−1.

As already discussed in Siewert et al. (2020), the excess kurtosis (brown boxes) of the DR1 sample statistics depends on the local noise in the field of view and depends on improvements made in the value-added catalogue using a combination of statistical techniques and visual association and identification. Thereby reducing outliers in the distribution and resulting in a smoother kurtosis. Also the clustering coefficient (orange points) comes closer to the expectation for a Poisson distribution of nc = 1 for the value-added catalogue and the lowest noise region. For the HETDEX field of the DR2 radio source catalogue, we find a similar difference between ‘mask 5s’ and ‘mask 1’. The intermediate cut in local noise of ‘mask 2’ still shows a highly varying excess kurtosis and skewness (blue triangles).

The most prominent effect to the sample statistics can be seen by removing one cell with the highest source density. This cell, with the index ‘73793’ in the HEALPIX ring scheme, covers the Messier 101, or NGC5457, spiral galaxy. For more details see Appendix B. By removing this cell, the sample statistics, especially excess kurtosis, become more continuous over all flux-density thresholds. In Fig. C.3 this effect is shown for ‘mask 5s’ of Siewert et al. (2020) for the complete DR2 HETDEX sample (top) and for the sample with the cell removed (bottom). The sample statistics of a randomly drawn sample of same size from a mock catalogue (see Sect. 3.2) are also shown as lines. Large multi-component sources, which increase the cell source density in the catalogue, are not simulated in the mock catalogue. Therefore, the mock catalogue is stable against removing the same most dense cell of the catalogue, as seen in Fig. C.3. In general, the clustering parameter of the mock sample represents for most flux-density thresholds a Poisson distribution with nc = 1. At flux-density thresholds below 1 mJy, the variance of the counts-in-cell distribution of the mock catalogues increases, which corresponds to increasing incompleteness at lower flux densities.

thumbnail Fig. C.2.

Empirical moments of the LoTSS-DR1 radio source catalogue (left), DR1 value-added catalogue (middle left), the LoTSS-DR2 radio source catalogue (middle right), and DR2 value-added catalogue (right). We show the clustering parameter (nc, orange), skewness (g1, blue), and excess kurtosis (g2 − 3, brown) for the HETDEX field of DR1 and DR2 with ‘mask 1’, ‘mask 2’, and ‘mask 5s’ of Siewert et al. (2020).

C.2. Empirical moments of the full LoTSS-DR2

In comparison to the smaller HETDEX field (see Fig. C.2 and Fig. C.3), the sample statistics of the full LoTSS-DR2 field (see Fig. C.4) show a different behaviour while masking regions with higher noise. Skewness and excess kurtosis are for all masks similar and more continuous for flux-density thresholds between 2 and 10 mJy, than seen before in the smaller region of the HETDEX field. The clustering parameter still varies with values nc > 1 from the general assumption of a pure Poisson distribution, for which nc = 1 holds. Two effects seem reasonable for the more continuous skewness and excess kurtosis. Firstly, the noise masks additionally exclude three cells with a high number of sources (see Appendix B). Removing these outliers of the Poisson distribution can smooth the sample statistics over a broader range of flux-density thresholds. The lower statistical moments are defined on the sample mean and are therefore sensitive to outliers. Secondly, in the case of the larger sample size of the full LoTSS-DR2, the underlying distribution is sampled with more statistical accuracy for each subset of flux-density thresholds. For all masks at flux densities above 2 mJy, the skewness is about +1, indicating that the distribution is slightly skewed towards higher counts in cells. It differs from the skewness of the Poisson distribution, whose expectation value is close to zero for the flux threshold of 2 mJy (see Eq. 5 and the values in Table 2). Similar behaviour of the clustering parameter, the excess kurtosis and the skewness can be observed for all spatial masks. The behaviour of the statistical moments of the noise masks (‘mask 50’, ‘mask 75’, and ‘mask 99’) is almost the same and stable for flux densities above 2 mJy. This is another reason why we decided to use mask d for the cosmological analysis of LoTSS-DR2.

thumbnail Fig. C.3.

Empirical moments of the LoTSS-DR2 radio source catalogue (top) for the HETDEX field and the same catalogue, but with the most dense cell removed (bottom), as described in Sect. A.2 and Appendix B. Lines represent the corresponding sample statistics of a mock sample of same size.

thumbnail Fig. C.4.

Sample statistics of the full LoTSS-DR2 radio source catalogue (left) and value-added catalogue (right) as function of flux-density threshold for all masks defined in Table A.1. We show the clustering parameter (nc, orange), skewness (g1, blue), and excess kurtosis (g2 − 3, brown).

Appendix D: Results of distribution and clustering analysis for masks 1219 and 1017

D.1. Counts-in-cells distribution analysis

In this section, we present the results of the counts-in-cells distribution analysis, similar to Sect. 6.1, focussing on masks 1219 and 1017. Table D.1 lists the parameters for the Poisson, compound Poisson, and negative binomial distributions at various flux-density thresholds. The results of the reduced chi-square test for these masks are provided in Table D.2. Compared to mask d (see Table 3), masks 1219 and 1017 yield reduced chi-square values closer to one. Notably, only the 2 mJy flux-density threshold for these masks results in values below two. This indicates that masks 1219 and 1017 may effectively mitigate flux-density fluctuations towards the outer regions of the pointings (see Shimwell et al. 2022), thereby reducing systematic effects.

Table D.1.

Values of the parameters of three different models for the counts-in-cells distribution of LoTSS-DR2 sources.

Table D.2.

χ2/dof for mask 1219 and mask 1017 at different flux-density thresholds for LoTSS-DR2 sources.

Following the methodology outlined in Sect. 6.1, we apply the KS test to masks 1219 and 1017. The results are summarised in Table D.3. Mask 1017, our most conservative mask designed to minimise imaging artefacts, and mask 1219, slightly less conservative but more restrictive than mask d, yield dn values that are notably close to the critical thresholds. This suggests that the counts in cells filtered by these geometric masks are better described by the negative binomial distribution than any other model considered in this study.

D.2. Clustering analysis

In this section, we present the results of the two-point correlation function for three different spatial masks for comparison. As shown in Fig. D.1, the results were obtained using the TREECORR package with the same parameter settings described in Sect. 6.2. The findings indicate that there are no significant differences between the masks, with deviations at the level of 0.01 sigma, further supporting our choice of mask d as the default for our cosmological analysis. The inner plot highlights the differences between mask d and masks 1017 and 1219, respectively, in the range of 0.1 to 1 degree.

Table D.3.

Kolmogorov-Smirnov test statistic for masks 1219 and 1017 at different flux-density thresholds of the LoTSS-DR2 sources.

thumbnail Fig. D.1.

Comparison of the two-point correlation function for mask d, mask 1017 and mask 1219 using the Landy-Szalay estimator.

Appendix E: Numerical calculation of Cγ

We calculate the numerical coefficient Cγ replacing the integral in Eq. 24 by a sum over small and equally sized pixels, which provides the relation

C γ = 1 Ω c 2 ( Θ 0 Θ ) 1 γ i = 1 N i j N ( ϑ ij ϑ 0 ) 1 γ ( Δ Ω ) 2 , $$ \begin{aligned} C_\gamma = \frac{1}{\Omega _c^2} \left(\frac{\Theta _0}{\Theta }\right)^{1-\gamma } \sum _{i=1}^{N} \sum _{i \ne j}^{N} \, \left(\frac{\vartheta _{ij}}{\vartheta _0}\right)^{1 - \gamma }\!\!(\Delta \Omega )^2, \end{aligned} $$(E.1)

where ΔΩ denotes the solid angle of the small pixels. This definition ensures that C1 = 1. We evaluate Cγ at the large cell of Nside = 16 and split it into three different smaller cell sizes of Nside = 1024, 2048, 4096 corresponding to cell sizes of 0.57, 0.28, 0.14 deg2, respectively. The result of Cγ as function of γ is shown in Fig. E.1. It shows that for γ up to ∼2, Cγ converges. With higher values of γ we need to split the cells to smaller solid angles, in order to converge. Since at almost all values of γ ≤ 2, after applying S/N cuts, the Cγ values converge, we do not need to split the cells to smaller solid angles.

thumbnail Fig. E.1.

Cγ as a function of γ. The cell at Nside = 16 is split into three different smaller cell sizes to test the variation of Cγ w.r.t. γ.

All Tables

Table 1.

Definition of mask d.

Table 2.

Values of the parameters of three different models for the counts-in-cells distribution of LoTSS-DR2 sources.

Table 3.

χ2/d.o.f. for three models of the counts in cells for mask d at different flux-density thresholds for LoTSS-DR2 sources.

Table 4.

Kolmogorov-Smirnov test statistic for mask d at different flux-density thresholds of the LoTSS-DR2 sources.

Table 5.

Best-fit results for the amplitude and exponent of the power law.

Table 6.

Best-fit results for amplitude and exponent of a power law.

Table A.1.

Comparison of the different masks for the LoTSS-DR2. The full sky with a HEALPIX resolution of Nside = 256 has 786 432 cells.

Table D.1.

Values of the parameters of three different models for the counts-in-cells distribution of LoTSS-DR2 sources.

Table D.2.

χ2/dof for mask 1219 and mask 1017 at different flux-density thresholds for LoTSS-DR2 sources.

Table D.3.

Kolmogorov-Smirnov test statistic for masks 1219 and 1017 at different flux-density thresholds of the LoTSS-DR2 sources.

All Figures

thumbnail Fig. 1.

Counts in cells of the LoTSS-DR2 radio source catalogue in Mollweide view and equatorial coordinates without a flux-density cut. The counts in cells are based on HEALPIX with a resolution of 13.74 square arcmin.

In the text
thumbnail Fig. 2.

Example of complete pointings in the HETDEX field and incomplete pointings at the boundary of the survey area, and pointings that are not mosaiced with other neighbouring pointings.

In the text
thumbnail Fig. 3.

For ‘mask d’ the light grey regions remain after masking, while dark grey is the survey area.

In the text
thumbnail Fig. 4.

Left: Comparison of the differential source counts from mask d, mask 1017 (explained in Appendix A.3), and mask 50 (explained in Appendix A.2) with the results from the LoTSS Deep Fields (Mandal et al. 2021), and LoTSS-DR1 (Siewert 2021). The ending ‘radio’ refers to the application of the mask to the LoTSS-DR2 radio source catalogue, while the ending ‘value’ refers to the LoTSS-DR2 value-added catalogue. Downward pointing arrows indicate error-bars that extend outside of the plotted y-range. Right: Comparison of the differential source counts from mask d with the results from the LoTSS Deep Fields, TGSS (150 MHz; Intema et al. 2017), RACS (888 MHz; Hale et al. 2021), LoLSS-DR1 (54 MHz; de Gasperin et al. 2021), NVSS (1.4 GHz; Matthews et al. 2021), and SKADS (1400 MHz Wilman et al. 2008).

In the text
thumbnail Fig. 5.

Histograms of counts in cells of mask d for LoTSS-DR2 and the random mock catalogue at the flux-density thresholds 2, 4, and 8 mJy (left, top to bottom), and the CDFs (right, top to bottom) with the best-fit Poisson and compound Poisson and negative binomial distributions.

In the text
thumbnail Fig. 6.

Empirical variance of Ψ2 estimated analytically (blue) and from a bootstrap method (orange) with 500 realisations, each at 2 mJy flux-density threshold at different values of Nside of the HEALPIX scheme.

In the text
thumbnail Fig. 7.

Fitted single power law for variance variations of re-scaling HEALPix maps for Nside = 16 − 512 corresponding to 3.8 deg to 0.11 deg at different flux-density thresholds (left) and for Nside = 16 − 256 corresponding to 3.8 deg to 0.23 deg (right), plotted for Nside = 16 − 2048, which means the fit was not performed for the fainter data points.

In the text
thumbnail Fig. 8.

Fitted single power law for variance variations of re-scaling HEALPix maps for Nside = 16 − 512 (left) and for Nside = 16 − 256 (right) at flux-density threshold 2 mJy and different S/N cuts plotted for Nside = 16 − 2048 (solid points represent the data points for which fits were performed).

In the text
thumbnail Fig. 9.

Fitting results for power laws to different angular ranges at 2 mJy flux-density threshold and S/N 7.5. (dashed lines). The dots represent the results of direct measurements of the two-point correlation function using the Landy-Szalay estimator. The inner plot shows the 1σ and 2σ contours for amplitude and exponent of the power-law fit to w(ϑ).

In the text
thumbnail Fig. A.1.

Noise masks for LoTSS-DR2. The grey regions correspond to the lowest noise regions included in ‘mask50’, while dark grey is included in ‘mask75’ and black in ‘mask99’. The orange regions are outside of all three masks but non-empty cells in LoTSS-DR2.

In the text
thumbnail Fig. A.2.

The ’mask 1219’ (top) and ’mask 1017’ (bottom) are based on the survey and pointing geometry. The light grey regions remain after masking. The patterns shown in the bottom panel represent masked regions that lie far from the pointing centre and are not shared by at least two pointings.

In the text
thumbnail Fig. B.1.

NGC4889 galaxy in Coma cluster with diffuse emission around it (top), NGC315 galaxy, an active AGN with its jets (middle), NGC5457 or M101 or Pinwheel galaxy, a nearby galaxy (bottom). The red circles are the sources detected by PYBDSF. The green grid is the HEALPIX cells in Nside = 256.

In the text
thumbnail Fig. C.1.

Source counts per HEALPIX cell of the LoTSS-DR1 (top) and LoTSS-DR2 (bottom) for the HETDEX spring field. Both surveys are masked with the default mask of Siewert et al. (2020). The observed difference in the mean source counts is attributed to changes in the pipeline for DR2 (Shimwell et al. 2022).

In the text
thumbnail Fig. C.2.

Empirical moments of the LoTSS-DR1 radio source catalogue (left), DR1 value-added catalogue (middle left), the LoTSS-DR2 radio source catalogue (middle right), and DR2 value-added catalogue (right). We show the clustering parameter (nc, orange), skewness (g1, blue), and excess kurtosis (g2 − 3, brown) for the HETDEX field of DR1 and DR2 with ‘mask 1’, ‘mask 2’, and ‘mask 5s’ of Siewert et al. (2020).

In the text
thumbnail Fig. C.3.

Empirical moments of the LoTSS-DR2 radio source catalogue (top) for the HETDEX field and the same catalogue, but with the most dense cell removed (bottom), as described in Sect. A.2 and Appendix B. Lines represent the corresponding sample statistics of a mock sample of same size.

In the text
thumbnail Fig. C.4.

Sample statistics of the full LoTSS-DR2 radio source catalogue (left) and value-added catalogue (right) as function of flux-density threshold for all masks defined in Table A.1. We show the clustering parameter (nc, orange), skewness (g1, blue), and excess kurtosis (g2 − 3, brown).

In the text
thumbnail Fig. D.1.

Comparison of the two-point correlation function for mask d, mask 1017 and mask 1219 using the Landy-Szalay estimator.

In the text
thumbnail Fig. E.1.

Cγ as a function of γ. The cell at Nside = 16 is split into three different smaller cell sizes to test the variation of Cγ w.r.t. γ.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.