The multi-dimensional halo assembly bias can be preserved when enhancing halo properties with HALOSCOPE

Sujatha Ramakrishnan; Violeta Gonzalez-Perez; Gabriele Parimbelli; Gustavo Yepes

doi:10.1051/0004-6361/202453030


Issue: A&A Volume 697, May 2025
Article Number: A70
Number of page(s): 14
Section: Cosmology (including clusters of galaxies)
DOI: https://doi.org/10.1051/0004-6361/202453030
Published online: 08 May 2025

A&A, 697, A70 (2025)


Sujatha Ramakrishnan¹,²,³⋆, Violeta Gonzalez-Perez¹,², Gabriele Parimbelli³ and Gustavo Yepes¹,²

¹ Departamento de Física Teórica, Facultad de Ciencias M-8, Universidad Autónoma de Madrid, 28049 Madrid, Spain
² Centro de Investigación Avanzada en Física Fundamental (CIAFF), Facultad de Ciencias, Universidad Autónoma de Madrid, 28049 Madrid, Spain
³ Institute of Space Sciences (ICE, CSIC), Campus UAB, Carrer de Can Magrans, s/n, 08193 Barcelona, Spain

⋆ Corresponding author.

Received: 15 November 2024
Accepted: 14 March 2025

Abstract

Context. Over 90% of dark matter haloes in cosmological simulations have unresolved properties. This can limit the dynamic range of simulations and result in systematic biases when modelling cosmological tracers. Current methods for enhancing unresolved haloes cannot preserve the multi-dimensional assembly bias found in simulations.

Aims. We aim to more precisely determine unresolved structural and dynamical halo properties while preserving the correlations with environment and halo assembly bias found in simulations.

Methods. We have developed HALOSCOPE, a machine learning technique that uses multi-variate conditional probability distribution functions. This method ensures that correlations among various halo properties, as well as their dependence on the local environment, are preserved. In this work, we trained HALOSCOPE with a high-resolution (HR) simulation and used it to better determine the properties (concentration, spin, and two shape parameters) of unresolved dark matter haloes in an eight times lower resolution simulation.

Results. HALOSCOPE is able to recover the multi-dimensional halo assembly bias, that is, the correlations of different combinations of halo properties with the large-scale environment, measured in the HR simulation. This is achieved by including the linear halo-by-halo bias and tidal anisotropy in the set of input training parameters. HALOSCOPE, by design, also recovers the joint distribution of the halo properties. To study how resolution effects propagate into the clustering of model galaxies, we generated catalogues of central galaxies using two implementations of the assembly bias in a halo occupation distribution model. The clustering of central model galaxies is improved by a factor of three at 0.009<k (h Mpc⁻¹)<0.6 when the unresolved haloes are enhanced with HALOSCOPE.

Conclusions. Our method can preserve the multi-dimensional halo assembly bias when trained using the local environment of haloes. HALOSCOPE can improve the accuracy of cosmological tracer catalogues produced with approximate methods when many realisations are needed.

Key words: methods: numerical / methods: statistical / Galaxy: general / cosmology: theory / dark matter / large-scale structure of Universe

© The Authors 2025

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


1. Introduction

Over 80% of dark matter haloes in numerical simulations have unresolved halo masses, and this number increases to 90% when considering other halo properties (see Fig. A.1). Simulations are required to assess the systematic errors and incompleteness of current and future cosmological surveys (LSST Science Collaboration 2009; Euclid Collaboration: Mellier et al. 2025 and DESI Collaboration 2024). These surveys are probing the large-scale structure of the Universe to understand the nature of dark matter and dark energy. The tailored simulations that support these surveys (e.g. Euclid Flagship, Outer Rim, and Abacus Summit; Potter et al. 2017; Heitmann et al. 2019, and Maksimova et al. 2021) are hindered by their unresolved halo properties (Mansfield & Avestruz 2021).

The most common way to overcome unresolved halo properties is to resample the properties of low-mass haloes with fitting functions that provide the mean and scatter of a given halo property (Knebe & Power 2008; Diemer & Kravtsov 2015; Diemer & Joyce 2019 and Ishiyama et al. 2021). However, correlations between halo properties have only been taken into account in a handful of studies (e.g. Farahi et al. 2022 and Mendoza et al. 2023). None of the current methods for more precisely and accurately determining unresolved halo properties can preserve the multi-dimensional assembly bias (AB) found in simulations.

Halo AB is the residual dependence of large-scale clustering on halo properties other than mass, at fixed halo mass (Sheth & Tormen 2004; Gao et al. 2005; Faltenbacher & White 2010, and references therein). Many studies over the past decades have established that the large-scale halo AB is a consequence of the tidal influence on halo properties at relatively small scales (Hahn et al. 2007; Shi et al. 2015; Borzyszkowski et al. 2017; Salcedo et al. 2018; Musso et al. 2018; Ramakrishnan et al. 2019).

Halo AB also affects the clustering of the galaxies that haloes host. Galaxy AB studies based on both galaxy models and hydrodynamical simulations have determined that the local environment influences galaxy clustering (Gonzalez-Perez et al. 2020; Xu et al. 2021; Balaguera-Antolínez et al. 2024; Paviot et al. 2024; Alam et al. 2024; Yuan et al. 2024). The halo environment at large and intermediate scales has been used to generate galaxy catalogues that can recover the two-point correlation function measured by SDSS-IV/eBOSS, GAMA, and DESI (Paviot et al. 2024; Alam et al. 2024 and Yuan et al. 2024) or the bispectrum (Coloma-Nadal et al. 2024).

Halo occupation distribution (HOD) models have been widely used to explore the effect that uncertainties in the clustering of particular cosmological tracers might have on cosmological inferences (e.g. Alam et al. 2021). The simplest HOD models place galaxies into haloes based only on their mass (Zheng et al. 2005 and Zehavi et al. 2011). However, several studies highlight the requirement to go beyond halo mass to reproduce the observed clustering of cosmological tracers, such as galaxies (Watson et al. 2015; Hearin et al. 2016; Tinker et al. 2018 and Rocher et al. 2023).

Lau et al. (2021) and Zhang et al. (2024) point out the need to take into account the various correlations between halo properties, in addition to the environment, when modelling clustering systematic errors. Besides occupation, models for the galaxy-halo connection have also explored associating galaxy properties such as galaxy spins, bar formation, and disk size with halo properties (Fall & Efstathiou 1980; Mo et al. 1998 and Kataria & Shen 2022).

In this work we used the halo mass and local environment to develop a multi-dimensional prediction for the internal halo properties (concentration, shape, and spin). Previously, we made this prediction for one halo property at a time, first at a given redshift (Ramakrishnan et al. 2021) and then incorporating a dependence on cosmology and redshift (Ramakrishnan & Velmani 2022). Here, we extend our algorithm to incorporate correlations between the halo properties and to recover the multi-dimensional halo AB (i.e. the dependence of the halo bias and clustering on two or more halo properties). We also applied it to a model galaxy population built on a low-resolution (LR) halo catalogue and show the resulting improvements in the predicted galaxy clustering.

The paper is organised as follows. Section 2 describes the set of simulations used in this paper, together with their dark matter haloes and the halo properties. Section 3 introduces the mathematical framework of our algorithm, HALOSCOPE (HALO propertieS having COvariance Preserved with Environment), used to predict the multi-dimensional distribution and AB of halo properties. This code is publicly available¹. In Sect. 4 we show how HALOSCOPE can be applied to poorly resolved haloes from an LR simulation to recover the properties and associated correlations of haloes from a high-resolution (HR) simulation. In Sect. 5 we show that it also recovers the multi-dimensional halo AB. In Sect. 6, using an HOD with AB modelling, we show how poorly resolved haloes can affect the modelling of galaxy clustering and how our algorithm can be used to improve it. We conclude in Sect. 7.

2. Simulations

Our primary N-body simulation suite is UNIT² (Chuang et al. 2019), which has a box size of 1 Gpc h⁻¹. These are dark-matter-only runs with the following cosmological parameters: Ω_m = 0.3089, h₀ = 0.6774, n_s = 0.9667, σ₈ = 0.8147. We used a pair of simulations with two dark matter particle mass resolutions: one with m_p = 1.2×10⁹ M_⊙ h⁻¹, which we call the HR simulation, and another with eight times worse mass resolution (i.e. m_p = 9.6×10⁹ M_⊙ h⁻¹), which we call the LR simulation. We performed our analysis on the final simulation snapshot, at z = 0.

The dark matter haloes in the simulation were identified using ROCKSTAR (Behroozi et al. 2013a) and CONSISTENT TREES (Behroozi et al. 2013b).

2.1. Halo properties

In our analysis we used several of the dark matter halo properties computed by ROCKSTAR. We divided these properties into primary and secondary properties:

– Primary halo property: the primary property of dark matter haloes is their mass. By default, we used M_200b, the mass enclosed within the radius R_200b inside which the mean density is 200 times the background density. We discuss other possible primary halo properties in Appendix C.
– Secondary halo properties:
  1. c_vir: the concentration of the Navarro-Frenk-White (NFW) density profile, i.e. the ratio of the virial radius to the NFW scale radius. It is also a proxy for the merger history of the halo (Wang et al. 2020).
  2. λ: the spin parameter, a dimensionless measure of the angular momentum of the halo.
  3. c/a: the ratio of the smallest ellipsoidal axis to the largest.
  4. b/a: the ratio of the intermediate ellipsoidal axis to the largest.

Before the analysis, we applied cleaning cuts to keep only parent haloes (PID = −1) and virialised haloes (2T/|U| ≤ 2; Bett et al. 2007). We also excluded haloes that had undergone a recent major merger, which could drastically alter their properties, by requiring z_lmm > 0.4 for the redshift of their last major merger.

2.2. Environmental properties

The key idea of our method is to assign internal halo properties using descriptors of the halo's environment. These environmental descriptors need to be computed at sufficiently large scales that they will also be resolved by either lower resolution simulations or fast approximate methods. In this regard we used as input three halo environmental properties: tidal anisotropy (α), overdensity (δ), and the linear bias (b₁). Below we define these properties and we describe the scales at which they are computed.

2.2.1. Tidal anisotropy

We describe the environment of each halo outside its boundary at different scales, primarily using the overdensity and the tidal anisotropy around the halo. Both quantities were constructed from the eigenvalues λ₁, λ₂, λ₃ (not to be confused with the halo spin λ) of the tidal tensor field ∂_i∂_jψ at several smoothing scales, where ψ is the normalised gravitational potential satisfying ∇²ψ = δ, so that the trace of the tensor is the density contrast (Catelan & Theuns 1996; Heavens & Peacock 1988 and Hahn et al. 2007):

$$\delta_S = \lambda_1 + \lambda_2 + \lambda_3, \tag{1}$$

$$\alpha_S = \frac{\sqrt{q^2}}{1 + \delta_S}, \tag{2}$$

where q² = (1/2)[(λ₁−λ₂)² + (λ₂−λ₃)² + (λ₃−λ₁)²] and S denotes the smoothing scale. We refer to Paranjape et al. (2018) for the exact procedure to compute these quantities. Ramakrishnan et al. (2019) statistically established that the tidal anisotropy $\alpha_{R_{\rm 200b}}$³ is the primary indicator of the halo AB. We computed α_S at 20 Gaussian smoothing scales ranging from 0.24 to 4 Mpc h⁻¹ and interpolated between them to assign $\alpha_{4R_{\rm 200b}}$ to each halo. The smallest and largest smoothing scales are proportional to the sizes of the smallest and largest haloes in the UNIT simulation. In Appendix B we provide an independent justification for this choice of smoothing scale: it maximises the correlation with the overdensity compared with all other smoothing scales (see Fig. B.1). A multi-scale treatment of the cosmic web filaments is essential for accurately describing the environment of dark matter haloes (Paranjape 2021 and Dhawalikar & Paranjape 2024).
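As an illustration of Eqs. (1) and (2), the sketch below computes δ_S and α_S on a grid. It is not the procedure of Paranjape et al. (2018). It assumes a density contrast sampled on a periodic cubic grid, a Gaussian smoothing window W(kS) = exp(−k²S²/2), and ∇²ψ = δ, so that the smoothed tidal tensor in Fourier space is k_i k_j δ(k) W(kS)/k². The function name `tidal_environment` is hypothetical.

```python
import numpy as np


def tidal_environment(delta_x, box_size, smoothing_scale):
    """Smoothed overdensity delta_S (Eq. 1) and tidal anisotropy alpha_S (Eq. 2).

    Hypothetical helper: delta_x is an (n, n, n) density contrast on a periodic
    grid of side box_size; smoothing_scale is the Gaussian smoothing radius S.
    """
    n = delta_x.shape[0]
    delta_k = np.fft.fftn(delta_x)
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    window = np.exp(-0.5 * k2 * smoothing_scale**2)   # Gaussian filter W(kS)
    k2[0, 0, 0] = 1.0                                  # avoid 0/0; the k = 0 mode is zeroed below
    kvec = (kx, ky, kz)

    # Tidal tensor d_i d_j psi with laplacian(psi) = delta  ->  k_i k_j delta(k) / k^2
    tensor = np.empty(delta_x.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            t_k = kvec[i] * kvec[j] / k2 * delta_k * window
            t_k[0, 0, 0] = 0.0
            tensor[..., i, j] = tensor[..., j, i] = np.fft.ifftn(t_k).real

    lam = np.linalg.eigvalsh(tensor)                   # lambda_1..3 in every grid cell
    delta_s = lam.sum(axis=-1)                         # Eq. (1): trace of the tensor
    q2 = 0.5 * ((lam[..., 0] - lam[..., 1]) ** 2
                + (lam[..., 1] - lam[..., 2]) ** 2
                + (lam[..., 2] - lam[..., 0]) ** 2)
    alpha_s = np.sqrt(q2) / (1.0 + delta_s)            # Eq. (2)
    return delta_s, alpha_s
```

The values are returned on grid cells. Repeating the call over several smoothing scales and interpolating at each halo's position and 4R_200b would mimic the assignment described above, but that step is left out of this sketch.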

2.2.2. Overdensity

The overdensity, $\delta_{10R_{\rm 200b}}$, has already been defined in terms of the eigenvalues of the tidal tensor in Eq. (1). It is computed at a larger smoothing scale, of about 10 R_200b.

2.2.3. Linear bias

We computed the linear bias, b₁, at large scales, r>60 h⁻¹ Mpc (k_max = 0.1 h Mpc⁻¹). For this calculation, we followed the halo-by-halo bias estimator introduced in Paranjape et al. (2018) (see also Ramakrishnan et al. 2019; Contreras et al. 2021; Balaguera-Antolínez & Montero-Dorta 2024). This estimator provides a single value for the linear bias of each halo. This halo-by-halo bias b₁ is defined as follows,

$$b_1 = \sum_{\text{low-}k} w_k \left[\left\langle e^{i\boldsymbol{k}\cdot\boldsymbol{x}}\,\delta^*(\boldsymbol{k})\right\rangle_k \big/ P_{\rm mm}(k)\right], \tag{3}$$

where x is the position of the halo whose bias is being computed, δ(k) is the density contrast of the dark matter field in Fourier space, P_mm(k) is the matter power spectrum, ⟨…⟩_k denotes an average over the modes in a shell of wavenumber k, and w_k are weights proportional to the number of k modes available in each shell of the simulation.

Averaged over a halo population, the halo-by-halo bias equals the linear bias obtained with a traditional estimator, here the ratio between the halo-matter and the matter-matter power spectra at k < 0.1 h Mpc⁻¹.
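A minimal NumPy sketch of the estimator in Eq. (3) is given below. It is not the implementation of Paranjape et al. (2018). It assumes a density contrast on a periodic grid, spherical shells of width equal to the fundamental mode, w_k proportional to the number of modes per shell, and P_mm measured from the same grid. Because NumPy's forward FFT uses the e^{−ik·x} sign convention, the cross term is written as Re[e^{ik·x}δ(k)], which is Eq. (3) expressed in the opposite convention. The function name is hypothetical.

```python
import numpy as np


def halo_by_halo_bias(delta_x, box_size, positions, k_max):
    """Sketch of the halo-by-halo linear bias b_1 of Eq. (3) on a periodic grid.

    delta_x   : (n, n, n) matter density contrast
    box_size  : side of the periodic box, same length units as positions
    positions : (N_h, 3) halo positions in [0, box_size)
    k_max     : only modes with 0 < |k| <= k_max are used ("low-k")
    """
    n = delta_x.shape[0]
    # NumPy's forward FFT uses exp(-i k.x), so the cross term below is
    # Re[exp(+i k.x) delta(k)]; this matches Eq. (3) in the opposite convention.
    delta_k = np.fft.fftn(delta_x)
    k_f = 2.0 * np.pi / box_size                          # fundamental mode
    k1d = k_f * np.fft.fftfreq(n, d=1.0 / n)              # integer multiples of k_f
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2)
    low_k = (kmag > 0) & (kmag <= k_max)
    kvec = np.stack([kx[low_k], ky[low_k], kz[low_k]], axis=1)   # (M, 3)
    dk = delta_k[low_k]                                           # (M,)

    # Group the modes into spherical shells of width k_f
    shell_id = np.rint(kmag[low_k] / k_f).astype(int)
    _, inverse, counts = np.unique(shell_id, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    p_mm = np.bincount(inverse, weights=np.abs(dk) ** 2) / counts / delta_x.size
    w_k = counts / counts.sum()                 # weights proportional to the number of modes

    # Per-halo cross term for every mode, then averaged within each shell
    cross = np.real(np.exp(1j * (positions @ kvec.T)) * dk[None, :])   # (N_h, M)
    to_shell = np.zeros((dk.size, counts.size))
    to_shell[np.arange(dk.size), inverse] = 1.0 / counts[inverse]
    shell_avg = cross @ to_shell                                       # (N_h, n_shells)
    return (shell_avg / p_mm) @ w_k
```

A simple consistency check follows from the population-average property above: placing one tracer at every grid cell, the unweighted mean of b₁ is zero (an unclustered population), while the mean weighted by 1 + δ(x) (a population that traces the matter) is one.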

3. Methods: HALOSCOPE

Over 90% of haloes have unresolved properties (see Appendix A), and we developed HALOSCOPE to improve them. HALOSCOPE is a machine learning (ML) technique based on multi-variate Gaussian distributions, conditioned on a set of input halo and environmental properties. Our aims are to (i) impose the correlations between halo properties and (ii) choose the training parameters so as to preserve the multi-dimensional AB.

We enhanced the secondary properties of LR haloes by training our algorithm on the properties of HR haloes, whose mass resolution is eight times better (Sect. 2). The method was fitted and applied independently in different halo mass bins.

We consider the primary property of haloes to be their mass, specifically M_200b (see Appendix C for a discussion of different halo mass definitions), and our algorithm does not modify it. Several studies aim to improve the masses of unresolved haloes (e.g. Forero-Sánchez et al. 2022), but the improvement achieved with different techniques is marginal for the simulations under study (Appendix C.1). Thus, we only improved the secondary properties of LR haloes.

Here we refer to the information we want to predict as the target vector. This vector comprises various secondary structural and dynamical properties, c = {c₁, c₂, …, c_r}. The input to HALOSCOPE is a vector containing two types of properties:

– Descriptors of the local density field around the halo, α = {α₁, α₂, …, α_q}. The local density field is defined at scales larger than the halo and is hence sufficiently resolved in the LR simulation.
– The LR halo masses. Other LR halo properties are not important for recovering the HR secondary properties (this is discussed in Sect. 5.2).

We modelled the joint probability of the target vector, c, and the environmental properties, α, as a multi-dimensional Gaussian distribution, 𝒩. This generalises the method described in Ramakrishnan et al. (2021) to the multi-variate case. For r structural and dynamical halo properties, c = {c₁, c₂, …, c_r}, and q environmental variables, α = {α₁, α₂, …, α_q}, the joint vector follows an (r+q)-dimensional Gaussian distribution:

$$\begin{bmatrix}\boldsymbol{c}\\ \boldsymbol{\alpha}\end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix}\boldsymbol{\mu}_{\boldsymbol{c}}\\ \boldsymbol{\mu}_{\boldsymbol{\alpha}}\end{bmatrix}, \begin{bmatrix}\sigma_{\boldsymbol{c}}\,\rho_{\boldsymbol{cc}}\,\sigma_{\boldsymbol{c}} & \sigma_{\boldsymbol{c}}\,\rho_{\boldsymbol{c\alpha}}\,\sigma_{\boldsymbol{\alpha}}\\ \sigma_{\boldsymbol{\alpha}}\,\rho_{\boldsymbol{\alpha c}}\,\sigma_{\boldsymbol{c}} & \sigma_{\boldsymbol{\alpha}}\,\rho_{\boldsymbol{\alpha\alpha}}\,\sigma_{\boldsymbol{\alpha}}\end{bmatrix}\right). \tag{4}$$

The multi-dimensional Gaussian above is characterised by the mean vectors, μ_c and μ_α, and by the standard deviations of each property, collected in the diagonal matrices $\sigma_{\boldsymbol{c}} = {\rm diag}(\sigma_{c_1}, \sigma_{c_2}, \ldots, \sigma_{c_r})$ and $\sigma_{\boldsymbol{\alpha}} = {\rm diag}(\sigma_{\alpha_1}, \sigma_{\alpha_2}, \ldots, \sigma_{\alpha_q})$. The block matrix of correlation coefficients between the internal halo properties is $\rho_{\boldsymbol{cc}} = [\rho_{c_i c_j}]$; the block matrix of correlation coefficients between the environmental variables is $\rho_{\boldsymbol{\alpha\alpha}} = [\rho_{\alpha_i \alpha_j}]$; and the blocks of cross-correlations between the halo properties and the environment are $\rho_{\boldsymbol{c\alpha}} = [\rho_{c_i \alpha_j}]$ and $\rho_{\boldsymbol{\alpha c}} = [\rho_{\alpha_j c_i}] = \rho_{\boldsymbol{c\alpha}}^{\rm T}$.

We visualise the block correlation matrices of Eq. (4) in Fig. 1 for the N-body simulations under study. In this case we considered four halo properties, c = {c_vir, λ, c/a, b/a}, and three environmental properties, α = {b₁, α_4R, δ_10R}. We computed the blocks of Eq. (4) (i.e. ρ_αα, ρ_αc, and ρ_cc) for the well-resolved HR simulation (demarcated by a black border in Fig. 1). Each matrix element in Fig. 1 is colour coded according to its Spearman rank correlation coefficient, from −1 to 1. Such correlations between halo properties have been studied previously (e.g. Shin & Diemer 2023), and we obtain qualitatively consistent values. We also show the corresponding matrix for the LR simulation (top-left, demarcated by a grey border in Fig. 1). The block of correlation coefficients between environmental variables, ρ_αα, is well resolved in both the LR and HR simulations and is the same in both, because the simulations have similar initial conditions. Hence, ρ_αα is shown only once in Fig. 1, as a region common to the LR (grey border) and HR (black border) matrices.
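The blocks shown in Fig. 1 can be obtained from per-halo arrays as Spearman rank correlations, i.e. Pearson correlations of the ranks. The sketch below assumes continuous data without tied values; the function name `spearman_blocks` is hypothetical, and `scipy.stats.spearmanr` applied to the stacked array would give the same full matrix.

```python
import numpy as np


def spearman_blocks(c, alpha):
    """Blocks rho_cc, rho_c_alpha and rho_alpha_alpha of Eq. (4) from Spearman ranks.

    c: (N, r) intrinsic halo properties; alpha: (N, q) environmental properties.
    Assumes continuous data without ties.
    """
    x = np.column_stack([c, alpha])
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)   # rank of each halo per column
    rho = np.corrcoef(ranks, rowvar=False)              # Pearson on ranks = Spearman
    r = np.shape(c)[1]
    return rho[:r, :r], rho[:r, r:], rho[r:, r:]
```

Because only ranks enter, the blocks are unchanged by any monotonic transformation of the properties, which is also why the rank-preserving Gaussianisation of Sect. 3.1 leaves these correlations intact.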

Fig. 1.

Correlation matrices for the properties of haloes with the lowest masses considered here, 5·10¹¹<M_h (M_⊙ h⁻¹)<7·10¹¹. Each matrix element is colour coded with the value of the Spearman rank correlation value, from −1 to 1, as indicated in the colour bar. We distinguish between intrinsic halo properties, c={c_vir, λ, c/a, b/a}, and environmental ones, α={b₁, α_4R, δ_10R}. We can understand these correlation matrices as being composed of four blocks (Eq. (4)): correlations between environmental properties, ρ_αα; correlations between intrinsic and environmental properties, ρ_αc and ρ_cα; and correlations between intrinsic halo properties, ρ_cc. The LR simulation matrix is at the top, outlined in grey, and the HR one is at the bottom, outlined in black. Since the environmental properties are well resolved in both simulations, we indicate a common correlation block matrix, ρ_αα. Correlations between different halo properties are affected by resolution effects, as can be seen by comparing the top LR matrix with the bottom HR one. The key idea of our method is to incorporate the missing correlations, i.e. the shades of blue, which indicate negative correlations present in the HR simulation (bottom matrix) but not in the LR simulation (top matrix).

Several correlations that exist for HR haloes are missing for their LR counterparts (Fig. 1). Notably:

– Negative correlations (blue shades within the black border in Fig. 1) between halo spin and other properties, seen in the HR ρ_cc matrix, are absent in the LR one.
– The smaller Spearman rank correlation values in the LR ρ_cα (more white and less intense red within the grey border in Fig. 1) indicate a loss of correlation with the environment (i.e. of AB) at lower resolution.

HALOSCOPE aims to preserve the correlations present in the training set, here the HR simulation. Our model prediction for the distribution of c given α, c|α, is an r-dimensional Gaussian distribution (Bishop & Nasrabadi 2007, Chapter 2.3):

$$\boldsymbol{c}\,|\,\boldsymbol{\alpha} \sim p(\boldsymbol{c}\,|\,\tilde{\boldsymbol{\alpha}}) = \mathcal{N}(\bar{\boldsymbol{\mu}}, \bar{\Sigma}), \tag{5}$$

where the conditional mean is defined as

$$\bar{\boldsymbol{\mu}} = \boldsymbol{\mu}_{\boldsymbol{c}} + \sigma_{\boldsymbol{c}}\,\rho_{\boldsymbol{c\alpha}}\,\rho_{\boldsymbol{\alpha\alpha}}^{-1}\,\sigma_{\boldsymbol{\alpha}}^{-1}\,(\boldsymbol{\alpha} - \boldsymbol{\mu}_{\boldsymbol{\alpha}}), \tag{6}$$

and the conditional covariance is

$$\bar{\Sigma} = \sigma_{\boldsymbol{c}}\left(\rho_{\boldsymbol{cc}} - \rho_{\boldsymbol{c\alpha}}\,\rho_{\boldsymbol{\alpha\alpha}}^{-1}\,\rho_{\boldsymbol{\alpha c}}\right)\sigma_{\boldsymbol{c}}. \tag{7}$$

In the case where c and α are scalars, Eqs. (6) and (7) simplify because ρ_αα = 1 and ρ_cc = 1. We can then write ρ_cα = ρ_αc = ρ_c and use the standardised α, $\tilde{\alpha} = (\alpha - \mu_\alpha)\,\sigma_\alpha^{-1}$, which reduces the above expressions to

$$\bar{\mu} = \mu_c + \sigma_c\,\rho_c\,\tilde{\alpha}, \tag{8}$$

$$\bar{\Sigma} = \sigma_c^2\,(1 - \rho_c^2), \tag{9}$$

$$p(c\,|\,\tilde{\alpha}) = \frac{1}{\sqrt{2\pi\sigma_c^2(1 - \rho_c^2)}}\exp\left[-\frac{(c - \mu_c - \sigma_c\rho_c\tilde{\alpha})^2}{2\sigma_c^2(1 - \rho_c^2)}\right]. \tag{10}$$

The above equation is similar to Eq. (4) in Ramakrishnan & Velmani (2022). The framework is also similar to that presented in Mendoza et al. (2023, MultiCAM-with scatter) and Farahi et al. (2022, with KLLR). Mendoza et al. (2023) introduced a generalisation of the conditional abundance matching technique (e.g. Hearin & Watson 2013) and used the same formalism for predicting the present-day halo properties given the accretion history of the halo traced back in time. Here we target large-volume simulations and aim to improve the LR halo properties and their AB given the information about the present-day local environment. This highlights the diversity of possible applications of HALOSCOPE to other problems in astronomy and cosmology.
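Equations (6) and (7) translate directly into a few lines of linear algebra. The sketch below is a minimal illustration, not the HALOSCOPE code: the function name is hypothetical, inputs are assumed to be already Gaussianised, and ρ_αα⁻¹ is applied through `np.linalg.solve` rather than an explicit inverse.

```python
import numpy as np


def conditional_gaussian(mu_c, sigma_c, mu_a, sigma_a, rho_cc, rho_ca, rho_aa, alpha):
    """Conditional mean (Eq. 6) and covariance (Eq. 7) of c given alpha.

    mu_c, sigma_c : (r,) means and standard deviations of the target properties
    mu_a, sigma_a : (q,) means and standard deviations of the input properties
    rho_cc (r, r), rho_ca (r, q), rho_aa (q, q): correlation blocks of Eq. (4)
    alpha         : (q,) input vector of one halo
    """
    mu_c, sigma_c = np.asarray(mu_c, float), np.asarray(sigma_c, float)
    mu_a, sigma_a = np.asarray(mu_a, float), np.asarray(sigma_a, float)
    rho_cc, rho_ca, rho_aa = (np.asarray(m, float) for m in (rho_cc, rho_ca, rho_aa))

    s_c = np.diag(sigma_c)
    alpha_tilde = (np.asarray(alpha, float) - mu_a) / sigma_a   # sigma_alpha^{-1} (alpha - mu_alpha)
    # rho_ca rho_aa^{-1} is applied via solve(), with rho_ac = rho_ca^T
    mu_bar = mu_c + s_c @ rho_ca @ np.linalg.solve(rho_aa, alpha_tilde)
    sigma_bar = s_c @ (rho_cc - rho_ca @ np.linalg.solve(rho_aa, rho_ca.T)) @ s_c
    return mu_bar, sigma_bar
```

For scalars with μ_c = 1, σ_c = 2, μ_α = 0.5, σ_α = 3, ρ_c = 0.6 and α = 2 (so α̃ = 0.5), this returns μ̄ = 1.6 and Σ̄ = 2.56, as given by Eqs. (8) and (9). New target values could then be drawn with, for example, NumPy's `Generator.multivariate_normal(mu_bar, sigma_bar)`.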

3.1. Transforming variables into Gaussians

The above formalism relies on the assumption that both the feature and target variables are Gaussian when in reality they have skewed distributions. Hence, all the variables need to be transformed to have a Gaussian distribution before applying HALOSCOPE, and then inverse-transformed to their original distributions.

Different transformations can be tailored to the distribution of each specific halo or environmental property (Ramakrishnan & Velmani 2022). For example, a logarithmic transformation works fairly well to transform halo concentration and halo spin into Gaussian variables. However, this approach requires an analytical description of the distribution of each halo property. To overcome this issue, here we instead used the quantile transformer from SCIKIT-LEARN. This method is based on a multi-variate probability integral transformation (Rosenblatt 1952). It has two advantages: (i) it can transform any arbitrary distribution into a Gaussian distribution, and (ii) it preserves rank correlations between the variables.
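The idea of a quantile mapping to Gaussian scores, and back, can be sketched with NumPy and the standard library. This is a hand-rolled, per-property (marginal) stand-in, not scikit-learn's `QuantileTransformer` (which, with `output_distribution="normal"`, provides a comparable interpolated mapping through `fit_transform` and `inverse_transform`). The function names are hypothetical and continuous data without ties are assumed.

```python
import numpy as np
from statistics import NormalDist

_inv_cdf = np.vectorize(NormalDist().inv_cdf)   # standard normal quantile function


def to_gaussian(x):
    """Map each column of x (N, d) to N(0, 1) scores through its empirical CDF."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)   # 0..N-1 per column
    u = (ranks + 0.5) / x.shape[0]                      # empirical CDF, kept inside (0, 1)
    return _inv_cdf(u)


def from_gaussian(z, x_ref):
    """Inverse map: Gaussian scores -> original units, interpolating reference quantiles."""
    x_sorted = np.sort(np.asarray(x_ref, dtype=float), axis=0)
    z_sorted = np.sort(to_gaussian(x_ref), axis=0)
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    for j in range(z.shape[1]):
        # np.interp clamps values outside the training range to the extremes
        out[:, j] = np.interp(z[:, j], z_sorted[:, j], x_sorted[:, j])
    return out
```

Each column is mapped monotonically, so the ordering of haloes in every property, and hence every rank correlation, is unchanged; `from_gaussian` reproduces the reference values exactly at the training points.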

3.2. Linear constraints

Halo properties have the following physical constraints:

$$\begin{aligned} c_{\rm vir} &> 0,\\ \lambda &> 0,\\ c/a &> 0,\\ b/a - c/a &> 0,\\ 1 - b/a &> 0. \end{aligned} \tag{11}$$

These inequalities are not enforced by the default sampling of a multi-variate Gaussian distribution described above. We solved this issue with a rejection sampling algorithm that discards samples violating these constraints. Alternative methods for sampling truncated Gaussian distributions exist in the literature, since this problem is common in statistical applications.

For the case under study, rejection sampling has an acceptance rate greater than 96% for all the mass ranges considered. The shaded region in the bottom right panel in Fig. 2 shows an example of the application of the rejection sampling algorithm.
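A minimal rejection sampler for the constraints of Eq. (11) is sketched below. It assumes the properties are ordered as (c_vir, λ, c/a, b/a) and accepts an optional, hypothetical `to_physical` callable that maps Gaussianised draws back to physical units before the check (the identity by default). All function names are illustrative.

```python
import numpy as np


def physical(samples):
    """True for rows (c_vir, lambda, c/a, b/a) that satisfy Eq. (11)."""
    c_vir, lam, c_a, b_a = samples.T
    return (c_vir > 0) & (lam > 0) & (c_a > 0) & (b_a > c_a) & (b_a < 1)


def rejection_sample(mean, cov, n, rng, to_physical=lambda z: z, max_rounds=1000):
    """Draw n constrained samples; also return the empirical acceptance rate."""
    kept, n_kept, n_drawn = [], 0, 0
    for _ in range(max_rounds):
        draws = to_physical(rng.multivariate_normal(mean, cov, size=n))
        ok = draws[physical(draws)]          # discard rows that violate any constraint
        kept.append(ok)
        n_kept += len(ok)
        n_drawn += n
        if n_kept >= n:
            return np.concatenate(kept)[:n], n_kept / n_drawn
    raise RuntimeError("acceptance rate too low to collect n samples")
```

Rejection sampling keeps the shape of the Gaussian inside the allowed region and is cheap when, as reported above, nearly all draws are accepted; for low acceptance rates a dedicated truncated-Gaussian sampler would be more efficient.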

Fig. 2.

Confidence intervals (20%, 40%, 68%, and 95%) for the bi-variate distribution of pairs of halo properties in the mass range 2×10¹¹<M_200b (M_⊙ h⁻¹)<10¹², corresponding to haloes with 20 to 100 particles in the LR simulation. Purple contours show the distribution of LR haloes and orange the HR haloes. The dashed black contours represent the new distribution of the LR haloes after applying HALOSCOPE. In the bottom right panel the grey area indicates a forbidden region due to the constraint a>b>c. Since our algorithm is designed to reproduce arbitrary skewed multi-dimensional distributions, applying it to the LR haloes recovers the bivariate distributions of the HR halo properties, as shown here.

4. Recovering correlations between halo properties

We used HALOSCOPE to enhance the properties (concentration, spin, and two shape parameters) of unresolved dark matter haloes in an LR simulation, training it on a simulation with eight times higher resolution (HR). The method is designed to recover the mean relation between each halo property and halo mass, as well as its scatter (see Sect. 3 and Appendix B.1). Beyond this, we also aim to recover the multi-dimensional distribution of halo properties in different mass ranges. This allows us to capture the correlations between halo properties that are present in the HR correlation matrix but absent from the LR one (see Fig. 1). Missing such correlations can affect galaxy-halo connection models. For example, Posti et al. (2020) show how residuals in galaxy scaling laws are sensitive to the anti-correlation between spin and concentration, and Zhang et al. (2024) model the correlations between halo properties to address systematics in galaxy cluster cosmology with weak lensing scaling relations.

We used HALOSCOPE to create a catalogue of enhanced halo properties for the LR simulation. Thus, the LR haloes have two sets of halo properties, one computed with the halo finder (LR) and the other with our algorithm (LR+HALOSCOPE); we compared both with the HR halo properties computed with the halo finder (HR). Our method is applied independently in halo mass bins. In particular, we used the mass bins indicated in Table D.1, namely nine logarithmically spaced bins ranging from 2×10¹¹ h⁻¹ M_⊙ to 3.2×10¹⁴ h⁻¹ M_⊙. These correspond to 20 to 30 000 particles in the LR simulation and 160 to 240 000 particles in the HR simulation.

The bivariate distributions of pairs of halo properties, for haloes with 2·10¹¹<M_200b (M_⊙ h⁻¹)<10¹², are shown using the 20%, 40%, 68%, and 95% confidence contours⁴ in Fig. 2. The HR distributions (orange contours in Fig. 2) are clearly different from the LR ones (purple contours). The centres of the LR contours lie below those of the HR halo properties, except for spin. The same can be seen in the median relations in the lowest halo mass bin considered (see Fig. B.2). The LR contours also have different shapes, being more spherical and less tilted than those of the HR haloes. As the halo mass increases, the LR distributions approach the HR ones (Fig. D.1). This is expected, as the properties of sufficiently massive haloes are also resolved in the LR simulation.

The haloes from the LR simulations improved with HALOSCOPE (grey dashed contours in Fig. 2) agree remarkably well with the HR halo properties. We quantified the goodness of fit for HALOSCOPE using the Kolmogorov–Smirnov test for individual properties (Appendix D). We find a good agreement between the distributions of HR halo properties and the LR+HALOSCOPE ones (Table D.1).
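For reference, the two-sample Kolmogorov–Smirnov statistic is the largest vertical distance between the two empirical cumulative distributions. The short sketch below computes it with NumPy; the function name is hypothetical, and `scipy.stats.ks_2samp` returns the same statistic together with a p-value.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])               # the ECDFs only jump at sample points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

Applied per property within a mass bin, for instance to the HR and LR+HALOSCOPE values of c_vir, a small D indicates that the two marginal distributions agree.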

The non-Gaussian shapes of the HR distributions can be recovered by HALOSCOPE. This is possible thanks to the use of the quantile transformer from SCIKIT-LEARN in the final step of our method (Sect. 3.1).

The median and scatter of each HR internal halo property studied are recovered by the haloes enhanced with HALOSCOPE (Fig. B.2), as a consequence of the good fit to the bi-variate correlations.

5. Recovering the multi-dimensional assembly bias

It is of paramount importance for any model catalogue to preserve the correlations between clustering statistics and the local density environment, since the clustering of different tracers is one of the most widely used tools for inferring cosmological parameters from spectroscopic surveys. For dark matter halo properties, such a dependence on the environment at fixed halo mass is called halo AB and has been widely studied in the literature (e.g. Wechsler et al. 2006; Croton et al. 2007; Dalal et al. 2008; Desjacques 2008; Faltenbacher & White 2010 and Oyarzún et al. 2024). The strength and shape of the halo AB vary with the halo properties and mass range under consideration. Studies of halo AB have mostly focused on one halo property at a time (e.g. Faltenbacher & White 2010); few have studied the halo AB as a function of two halo properties simultaneously (e.g. Lazeyras et al. 2017).

Here we aim to model the multi-dimensional AB, that is, the linear bias of haloes of a given mass classified into different sub-populations demarcated in the multi-dimensional distribution of halo properties. This can only be achieved with an adequate choice of training parameters. As discussed in Sect. 5.2, environmental properties, in particular α_4R and b₁ (defined in Sect. 2.2), are needed in the training of HALOSCOPE to recover the AB measured for HR haloes.

We quantified the halo AB by measuring the linear bias (Sect. 2.2.3) of haloes split by their internal properties within equal mass bins (Fig. 3). To study the multi-dimensional nature of the halo AB, we rank-ordered haloes using two (c_vir, λ; left panel of Fig. 3), three (λ, c/a, b/a; middle panel), and four (c_vir, λ, c/a, b/a; right panel) properties, and selected the upper and lower 25% of haloes in each case. For example, the lower 25% in (c_vir, λ) was chosen by selecting the population of haloes below the p^th percentile in both c_vir and λ, while the upper 25% corresponds to the population above the (100−p)^th percentile in both properties⁵. We also tested other combinations of halo properties. We focus on this set because it shows a range of AB trends with halo mass obtained with different numbers of halo properties.
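The joint selection can be made exact using a simple observation. A halo lies below the p^th percentile in every property if and only if the largest of its percentile ranks is at most p. Selecting the 25% of haloes with the smallest maximum rank therefore gives the lower population and the corresponding value of p. The upper population is obtained in the same way from the largest minimum rank. The sketch below uses hypothetical function names and assumes it is applied to the haloes of a single mass bin. It is our illustration, not the authors' code.

```python
def percentile_ranks(values):
    """Percentile rank of each value in (0, 1]: its 1-based rank divided by N."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank / n
    return ranks


def joint_tails(properties, fraction=0.25):
    """Select haloes in the joint lower/upper tails of several properties.

    properties: one list per halo property, all in the same halo order.
    Returns (lower_indices, upper_indices, p), where p is the percentile
    threshold (as a fraction) needed for the lower selection."""
    per_prop = [percentile_ranks(p) for p in properties]
    n = len(per_prop[0])
    k = int(round(fraction * n))
    if k < 1:
        raise ValueError("fraction too small for this number of haloes")
    # Below p in *all* properties  <=>  the maximum rank is <= p.
    max_rank = [max(r[i] for r in per_prop) for i in range(n)]
    # Above (1 - p) in *all* properties  <=>  the minimum rank is >= 1 - p.
    min_rank = [min(r[i] for r in per_prop) for i in range(n)]
    lower = sorted(range(n), key=max_rank.__getitem__)[:k]
    upper = sorted(range(n), key=min_rank.__getitem__, reverse=True)[:k]
    p = max_rank[lower[-1]]
    return sorted(lower), sorted(upper), p
```

With a single property the threshold is simply p = 0.25. Combining properties pushes p up, as noted in footnote 5, because a halo must now fall in the tail of every property at once.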

Fig. 3.

Top panels: Linear halo bias as a function of halo mass for split populations of haloes, showing the halo AB. In blue (red) is shown the linear bias of the upper (lower) 25% of haloes according to: concentration and spin in the left panel; spin and two shape parameters in the middle panel; and concentration, spin, and two shape parameters in the right panel (see Sect. 5 for a detailed description of how the different halo populations are defined). Triangles show the HR haloes, dotted lines the LR ones, and thick continuous lines the result of applying HALOSCOPE to the LR haloes, LR+HALOSCOPE. Shaded regions correspond to the standard error of the mean. Bottom panels: LR and LR+HALOSCOPE results divided by the HR ones. When applied to the LR haloes, our method decreases the differences between HR and LR from ∼12−15% to less than 5%. Our algorithm is capable of recovering the multi-dimensional halo AB measured in HR simulations.

When we segregate haloes by their internal properties, a gap appears between the linear bias of the two populations. This is clearly seen when comparing the upper and lower 25% HR haloes (blue and red triangles) in the three panels of Fig. 3. When (c_vir, λ) is used to segregate haloes, the gap between the two populations steadily decreases with halo mass. We note that this gap does not invert, as happens when halo concentration alone is used to segregate haloes of the same mass (e.g. Ramakrishnan & Velmani 2022). The trend is the opposite when three parameters (λ, c/a, b/a) are used to segregate haloes of similar mass: the gap between the linear bias of the two populations increases with halo mass. Lastly, when four parameters (c_vir, λ, c/a, b/a) are used, the gap in the linear bias between the two populations remains fairly constant across halo mass.

Depending on which properties are chosen to segregate haloes at a given mass, the AB signal can decrease, increase, or remain fairly constant with halo mass (Fig. 3). We therefore conclude that the clustering signal can behave differently as a function of halo mass, depending on which secondary halo properties, beyond halo mass, are considered.

5.1. Improving the LR assembly bias

We segregated haloes in the LR simulation in the same way as described above for the HR simulation. The linear bias of LR haloes shows an AB signal with trends with halo mass similar to those of the HR haloes (dotted lines in Fig. 3). However, the HR and LR AB signals differ by up to 12% for low-mass haloes.

When we improve the LR haloes with HALOSCOPE (solid lines in Fig. 3), we recover an AB signal within 5% of that measured for HR haloes. The improvement is largest for low-mass haloes with fewer than 500 particles, which constitute over 90% of all haloes (Fig. A.1). It is remarkable that haloes improved with HALOSCOPE recover the HR trends in the halo AB for every combination of halo properties considered.

The ability of HALOSCOPE to recover the multi-dimensional AB is particularly useful for applications using fast approximate simulations or catalogues (Feng et al. 2016 and Balaguera-Antolínez et al. 2019). These cannot provide halo AB themselves, but they do contain sufficient information to obtain it with our method. In particular, these types of approximate methods provide halo masses and tidal environments, which are the only inputs our model needs (see Sect. 5.2 for a study of the importance of the input features).

5.2. Training set and assembly bias

We studied which properties are needed to train HALOSCOPE to recover the HR multi-dimensional AB. By construction, HALOSCOPE reproduces the bivariate distributions of the internal halo properties, c, in the training set. However, this is not the case for the AB signal. Halo catalogues modelled simply with a mean or median concentration–mass relation plus scatter do not carry the halo AB information seen for the HR haloes (triangles in Figs. 3 and 4).

Fig. 4.

Same as Fig. 3, but with HALOSCOPE trained on different input properties. In the top left panel, random uncorrelated inputs are used to train HALOSCOPE; when our algorithm is then applied to the LR haloes (LR+HALOSCOPE), no halo AB is measured. In the top middle panel, HALOSCOPE is trained with properties of LR haloes. For the other panels, HALOSCOPE is trained with the input properties indicated in the legend. In all panels, the blue and red triangles correspond to the upper and lower 25% of (c_vir, λ, c/a, b/a) HR haloes, which are our reference. To recover the multi-dimensional halo AB, HALOSCOPE needs to be trained with the haloes' environmental properties; in particular, b₁, α_4R, and a combination of the two give the best results.

If no environmental input, α, is given in the training, there is no AB. This can be seen in the top left panel of Fig. 4, where we compare the HR AB signals with that of LR+HALOSCOPE, when the algorithm is trained with either no environmental properties or uncorrelated inputs. In this case, the two segregated halo populations, using four parameters (c_vir, λ, c/a, b/a), show the same signal.

Improved haloes do present a reduced AB signal when LR halo properties are used to train HALOSCOPE (top middle panel of Fig. 4). This signal differs by more than 20% from the target HR one at low masses, which is worse than the original LR signal shown in the right panel of Fig. 3. This result improves when the environmental property δ_10R (defined in Sect. 2.2.2) is used to train HALOSCOPE. However, using only this parameter is not enough to recover an AB signal within 5% of the HR one.

Environmental properties are needed during the training of HALOSCOPE to recover the HR AB. In particular, to agree with the HR signal to within 10%, the training needs to include the tidal anisotropy, α_4R (defined in Sect. 2.2.1), or the linear bias, b₁ (defined in Sect. 2.2.3), or, better still, a combination of the two. The tidal anisotropy smoothed on a scale of 4R_200b, α_4R, has been shown to be the primary indicator of halo AB (see Fig. A5 in Ramakrishnan et al. 2019). Nevertheless, we also analysed the density on a 10R_200b scale, δ_10R, to study its performance in this particular context. The density computed on these scales is expected to correlate with the large-scale halo bias, b₁. However, we find that using either b₁ or δ_10R leads to a different AB signal (Fig. 4). This difference might arise because a fixed scale, 60 h⁻¹ Mpc, is used to compute the halo-by-halo bias, b₁, while the 10R_200b smoothing scale used for δ_10R varies from halo to halo.
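For reference, the tidal anisotropy is commonly defined (Paranjape et al. 2018; Ramakrishnan et al. 2019) in terms of the eigenvalues λ₁, λ₂, λ₃ of the tidal tensor of the smoothed density field. It is given by α = √q² / (1 + δ), with δ = λ₁ + λ₂ + λ₃ and q² = ½[(λ₁ − λ₂)² + (λ₂ − λ₃)² + (λ₃ − λ₁)²]. We assume this matches the definition in Sect. 2.2.1, which is not reproduced here. The sketch below evaluates α from given eigenvalues. It also converts the 4R_200b top-hat scale into the equivalent Gaussian scale quoted in footnote 3. Computing the eigenvalues themselves (smoothing the density field and diagonalising the tidal tensor) is outside its scope.

```python
import math


def tidal_anisotropy(l1, l2, l3):
    """alpha = sqrt(q^2) / (1 + delta) from the tidal-tensor eigenvalues.

    Assumes the eigenvalues are normalised so that their sum equals the
    smoothed overdensity delta, and that delta > -1."""
    delta = l1 + l2 + l3  # trace of the tidal tensor
    q2 = 0.5 * ((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)  # shear
    return math.sqrt(q2) / (1.0 + delta)


def gaussian_scale(r200b, tophat_multiple=4.0):
    """Gaussian radius equivalent to a top-hat of radius tophat_multiple * R_200b."""
    return tophat_multiple * r200b / math.sqrt(5.0)
```

An isotropic tidal field gives α = 0. Strongly anisotropic environments in low-density regions, such as thin filaments, give large α. This is the regime in which the tidal field is expected to affect halo assembly most.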

In this study, our default HALOSCOPE algorithm is trained using the three environmental properties discussed here: b₁, α_4R and δ_10R.

6. Effect of halo properties on galaxy clustering

In this section we show the effect that improving halo properties with HALOSCOPE has on the clustering of central galaxies. We focus solely on central galaxies to reduce the number of free parameters in the HOD model used to populate the UNIT simulation with galaxies (Sect. 6.1). This choice gives a cleaner view of the effect of HALOSCOPE on galaxy clustering at large scales (Sect. 6.2). We defer a detailed study of satellite galaxies to future work.

6.1. Model galaxy catalogues

We used an HOD model to populate the dark matter haloes of both the HR and LR simulations with central galaxies, and then studied the clustering of these central galaxies for the same set of HOD parameters. The continuous expansion in the capabilities of cosmological surveys, such as DESI (DESI Collaboration 2024) and Euclid (Euclid Collaboration: Castander et al. 2025), has created a demand for simulations that are larger in volume and more accurate on small scales. Given the computational cost of such simulations, incorporating cosmological tracers into them often requires connecting galaxies to haloes with computationally efficient tools, such as HOD models.

For the average occupancy of central galaxies in haloes of a given mass, 〈N_cen〉, we used the following standard form (Zheng et al. 2005 and Reyes-Peraza et al. 2024):

$\langle N_{\rm cen} \rangle = \frac{1}{2}\left[1 + {\rm erf}\left(\dfrac{\log M_{h} - \log M_{\rm min}}{\sigma_{\log M}}\right)\right].$ (12)

We chose log M_min = 11.95 and σ_{log M} = 0.65 for our baseline model. This implies that 〈N_cen〉 transitions from 0 to 1 over a mass range centred on log M_min = 11.95, where 〈N_cen〉 = 1/2, as can be seen in Fig. 5. In our LR simulation, haloes in this mass range are resolved with a few tens to hundreds of particles, which makes this choice of parameters well suited to this specific study.
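Equation (12) can be sketched as below, using the baseline values log M_min = 11.95 and σ_log M = 0.65. The sketch also includes the Bernoulli draw commonly used to turn 〈N_cen〉 into a discrete central occupation. The function names and the seeded random generator are illustrative assumptions, since the text does not specify how occupations were sampled.

```python
import math
import random

LOG_M_MIN = 11.95    # baseline log M_min
SIGMA_LOG_M = 0.65   # baseline sigma_log M


def mean_n_cen(log_m, log_m_min=LOG_M_MIN, sigma_log_m=SIGMA_LOG_M):
    """Average central occupation <N_cen> of Eq. (12)."""
    return 0.5 * (1.0 + math.erf((log_m - log_m_min) / sigma_log_m))


def draw_centrals(log_masses, rng, log_m_min=LOG_M_MIN, sigma_log_m=SIGMA_LOG_M):
    """Hypothetical helper: 0/1 central occupation per halo via a Bernoulli draw."""
    return [1 if rng.random() < mean_n_cen(lm, log_m_min, sigma_log_m) else 0
            for lm in log_masses]


centrals = draw_centrals([11.5, 11.95, 12.5], random.Random(42))
```

At log M_h = log M_min the occupation is exactly one half. Haloes about 2 dex above log M_min host a central almost surely.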

Fig. 5.

Average number of central galaxies as a function of halo mass for model galaxies generated using an HOD model with two different implementations of galaxy AB (Eqs. (12) and (13)). We have produced two catalogues of central model galaxies hosted by haloes with similar mass distributions but different concentrations, spins, and shapes.

In addition, we also modelled the AB signal by altering log M_min (Xu et al. 2021; Hadzhiyska et al. 2023 and Paviot et al. 2024) in the following way,

$\log M_{\rm min}^{\prime} = \log M_{\rm min} + A_{\rm cen} f_{a} + B_{\rm cen} f_{b} + C_{\rm cen} f_{c} + D_{\rm cen} f_{d},$ (13)

where f_a, f_b, f_c, and f_d are the ranks of a halo, within a given mass range, in each of the four properties under consideration (a, b, c, and d)⁶. The ranks are computed by first binning haloes in narrow bins of halo mass and then rank ordering them by increasing value of each halo property. The ranks are then rescaled to range between −0.5 and 0.5. A_cen, B_cen, C_cen, and D_cen are the AB parameters. The resulting log M′_min replaces log M_min in Eq. (12).
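The rank computation and Eq. (13) can be sketched as follows. Haloes are grouped into narrow bins of log halo mass and ranked by increasing value of each property within their bin. The ranks are then rescaled to [−0.5, 0.5], and the weighted sum is added to log M_min. The bin width of 0.05 dex, the linear rescaling of ranks, and the function names are our assumptions for illustration.

```python
import math
from collections import defaultdict


def rescaled_ranks(values):
    """Ranks by increasing value, rescaled linearly to the interval [-0.5, 0.5]."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0.0] * n
    for r, idx in enumerate(order):
        ranks[idx] = r / (n - 1) - 0.5 if n > 1 else 0.0
    return ranks


def effective_log_mmin(log_m, props, coeffs, log_mmin=11.95, bin_width=0.05):
    """log M'_min of Eq. (13) for every halo.

    log_m:  log10 halo masses.
    props:  one list per halo property (a, b, c, d), each in the same halo order.
    coeffs: the AB parameters, e.g. (A_cen, B_cen, C_cen, D_cen)."""
    if len(props) != len(coeffs):
        raise ValueError("need one AB coefficient per halo property")
    # Group halo indices into narrow bins of log halo mass.
    bins = defaultdict(list)
    for i, lm in enumerate(log_m):
        bins[math.floor(lm / bin_width)].append(i)
    out = [log_mmin] * len(log_m)
    for members in bins.values():
        for coeff, prop in zip(coeffs, props):
            f = rescaled_ranks([prop[i] for i in members])
            for i, f_i in zip(members, f):
                out[i] += coeff * f_i
    return out
```

A halo at the median of its mass bin in every property (f = 0 for all four) keeps log M′_min = log M_min. Haloes in the tails are shifted by up to ±0.5 |A_cen| per property.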

To explore two extreme scenarios, we chose the following sets of AB parameters: the negative set, A_cen = B_cen = C_cen = D_cen = −1, and the positive one, A_cen = B_cen = C_cen = D_cen = 1. This choice also gives us a conservative estimate of the systematic errors that can be introduced by resolution effects in a simulation.

We have created three pairs of galaxy catalogues, based on the HR, LR, and LR+HALOSCOPE haloes. Each pair consists of one catalogue with the negative set of AB parameters (–ve AB) and another with the positive set (+ve AB). Independently of the halo catalogue used, the two model galaxy catalogues in each pair have similar average HODs (Fig. 5). In other words, catalogues produced with either the –ve AB or the +ve AB model have similar distributions of central galaxies as a function of halo mass.

The difference between the –ve AB and +ve AB catalogues lies in the distribution of galaxies as a function of secondary halo properties. Central galaxies in the –ve AB catalogue are preferentially hosted by haloes that are less concentrated, have smaller spins, and have more spherical shapes than those in the +ve AB catalogue. We expect these differences to propagate into the clustering of model galaxies.

6.2. Clustering of model galaxies

We measured the power spectra of the six galaxy catalogues, generated in +ve AB and –ve AB pairs. The two catalogues in each pair have similar average HODs (see Fig. 5), independently of the underlying catalogue of dark matter haloes. We used the three available halo catalogues: the HR and LR simulations directly, and the LR haloes enhanced with our algorithm, LR+HALOSCOPE.

The power spectra of the +ve AB (solid red) and –ve AB (solid blue) catalogues in each pair differ by 20–50%, as can be seen in the top panel of Fig. 6. This result is independent of the input host haloes, and the differences are found at all the k modes considered. This shows that the halo AB propagates directly into the galaxy AB: despite having similar average HODs (Fig. 5), model central galaxies show ∼40% differences in their power spectra between the +ve AB and –ve AB catalogues. Similar variations have previously been measured in studies of the galaxy AB signal as a function of multiple halo properties (Montero-Dorta & Rodriguez 2024).

The power spectrum from the LR galaxy catalogues differs by 15% from the HR version of the same HOD model (solid lines in the bottom panel of Fig. 6). Such differences could possibly be absorbed into a different set of HOD parameters. However, our analysis indicates that caution is needed when using the same HOD calibration for simulations with different mass resolutions, in particular when including the effect of AB. It should be noted that the differences between the LR and HR model catalogues have reduced cosmic variance because of the fixed-and-paired technique used to generate the initial conditions of the UNIT simulations (Angulo & Pontzen 2016). Moreover, the LR and HR UNIT simulations have identical initial phases; that is, they trace the same large-scale density field, so taking ratios largely cancels the statistical fluctuations (McDonald & Seljak 2009).

Fig. 6.

Top panel: Power spectra of model central galaxies. In red are shown the power spectra measured in catalogues produced with a positive AB (+ve AB), with A_cen=B_cen=C_cen=D_cen = 1 in Eq. (13); in blue, those from catalogues with a negative AB (–ve AB), with A_cen=B_cen=C_cen=D_cen=−1. Brighter continuous lines show the results for galaxies generated on the HR simulation, while darker continuous lines correspond to the LR simulation, as indicated in the legend. The dashed lines show the clustering of catalogues generated on LR haloes corrected with HALOSCOPE, LR+HALOSCOPE. Owing to galaxy AB, the power spectra measured from galaxies in the +ve AB and –ve AB catalogues differ by 20–50% over the whole range of k modes. Bottom panel: Ratios of the LR (solid lines) and LR+HALOSCOPE (dashed lines) galaxy power spectra to those from the HR simulation. The 0, 5, and 15 percent levels are indicated by horizontal grey lines. Taking ratios of power spectra from two tracers of the same density field cancels the statistical noise in the bottom panel, and the fixed-and-paired method used in the UNIT simulations (see Sect. 6.2) also mitigates noise from cosmic variance. The power spectra of the HR and LR galaxy catalogues differ by 15%; our algorithm reduces this difference to 5%.

The power spectrum of the galaxy catalogues generated from LR haloes corrected by HALOSCOPE, LR+HALOSCOPE, is within 5% of the HR one: our method reduces the discrepancy of the LR catalogues by a factor of three (Fig. 6). This agreement starts to deteriorate on small scales, k > 0.6 h Mpc⁻¹, and by k = 2 h Mpc⁻¹ the power spectra of the LR+HALOSCOPE catalogues match the LR ones. This appears to be a limitation of our method.

Many HOD models address resolution issues by sampling a fitting function, such as a concentration–mass relation, for poorly resolved haloes. The samples are then assigned to the existing poorly resolved haloes by rank ordering, so that their abundances match (e.g. Paranjape et al. 2021 and Euclid Collaboration: Castander et al. 2025). Such an approach corrects the values of the halo properties, but it only incorporates AB at the level present in the LR simulation. We emphasise that our approach is more accurate, since it incorporates AB at the level seen in well-resolved simulations.
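A minimal sketch of the sampling-plus-rank-ordering approach described above helps show why it cannot add AB beyond that of the LR simulation. The sketch is meant for the haloes of a single narrow mass bin. The new concentrations are assigned in exactly the same order as the LR ones, so any correlation with the environment is inherited from the LR catalogue. The power-law concentration–mass relation, its parameters a and b, and the 0.1 dex log-normal scatter are hypothetical placeholders, not values from the cited works.

```python
import random


def resample_concentrations(log_m, c_lr, rng, a=0.9, b=-0.1, scatter_dex=0.1):
    """Hypothetical resampling for haloes in one narrow mass bin.

    Draws log10 c from an assumed relation log10 c = a + b (log10 M - 12) with
    Gaussian scatter in dex, then assigns the draws by rank: the halo with
    the k-th smallest LR concentration receives the k-th smallest draw."""
    n = len(c_lr)
    draws = [10 ** rng.gauss(a + b * (lm - 12.0), scatter_dex) for lm in log_m]
    order_lr = sorted(range(n), key=c_lr.__getitem__)
    new_c = [0.0] * n
    for idx, value in zip(order_lr, sorted(draws)):
        new_c[idx] = value
    return new_c


rng = random.Random(1)
c_new = resample_concentrations([11.5, 11.52, 11.55], [4.0, 6.5, 3.2], rng)
```

The values of the concentrations change, but their ordering within the bin, and hence their correlation with any environmental property, is exactly that of the LR haloes.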

The improvement in the clustering after applying HALOSCOPE demonstrates the utility of our method to enhance summary statistics for model galaxies based on LR simulations.

7. Summary and conclusions

We have developed an ML method, HALOSCOPE (Sect. 3), to improve unresolved properties of dark matter haloes in simulations while preserving the multi-dimensional AB. The code associated with this method is publicly available⁷.

We have demonstrated the capabilities of HALOSCOPE by taking low-mass, poorly resolved haloes in a large-volume simulation and correcting their properties, producing results similar to those of haloes that have achieved numerical convergence. In particular, we tested HALOSCOPE using the original UNIT simulation (Chuang et al. 2019), referred to here as the HR simulation, and a simulation with eight times lower resolution, the LR simulation (Sect. 2). In the application presented here, we aimed to recover HR-quality properties for the LR haloes. Our method was trained using halo properties in bins of halo mass.

After training HALOSCOPE with HR halo properties and then applying it to LR haloes, LR+HALOSCOPE, we can conclude the following:

HALOSCOPE recovers the multi-variate distributions and correlations of halo properties in halo mass bins (Figs. 2 and D.1). It also recovers the median and scatter expected for the distribution of halo properties as a function of halo mass (Fig. B.2).
HALOSCOPE can recover, for the first time, the multi-dimensional halo AB (Fig. 3), that is to say, the simultaneous dependence of halo bias on arbitrary combinations of the halo properties at a fixed mass. This is only possible when environmental properties, such as the linear bias and the tidal anisotropy (Fig. 4), are used to train HALOSCOPE. LR+HALOSCOPE reduces the differences in multi-dimensional AB between HR and LR from 12% to 5%.

This new method is particularly useful in the lowest mass range, where there are significant differences between the LR and HR simulations owing to the large fraction of low-mass LR haloes with unresolved properties. We have verified that our method and the conclusions above are robust against different definitions of halo mass (Appendix C). Our algorithm can also be applied after improving the halo mass function with other ML algorithms, such as random forests. However, at least in the case under study, this yields only very small changes to the halo masses and does not improve the unresolved properties of haloes (Appendix C.1).

The clustering of model galaxies can be affected by that of their host haloes. We studied this by generating catalogues of central galaxies based on the HR, LR, and LR+HALOSCOPE haloes, using an HOD model that includes AB. We focused on central galaxies to limit the number of free parameters in the HOD model. We used two extreme HOD models, one with a positive AB (+ve AB) and another with a negative one (–ve AB; see Eq. (13)). After studying the power spectra of the six galaxy catalogues, we conclude the following:

The galaxy AB is directly affected by the halo AB. The power spectrum differs by 20–50% (depending on the k mode) between the +ve AB and –ve AB galaxy catalogues (Fig. 6). This difference in clustering occurs despite the similar average occupancies with halo mass (Fig. 5).
HALOSCOPE brings the central galaxy power spectrum closer to that of the HR catalogue (for k < 0.6 h Mpc⁻¹): from a 15% difference for the LR catalogue to a 5% difference for LR+HALOSCOPE (Fig. 6).

We expect our results to hold for satellites hosted by well-resolved haloes. The average HOD determines the number of satellites per halo, which is typically expected to grow with the mass of the host halo. For tracers such as luminous red galaxies, satellite galaxies are expected to be hosted by haloes with masses above 10¹³ h⁻¹ M_⊙ (e.g. Yuan et al. 2024). Such haloes are well resolved by most current cosmological simulations and thus should not be affected by resolution effects at lower masses. In contrast, other tracers, such as emission line galaxies, are hosted by less massive dark matter haloes and typically have one or two satellite galaxies per halo (Gonzalez-Perez et al. 2020; Reyes-Peraza et al. 2024 and Yu et al. 2024). From a modelling perspective, satellites residing in unresolved parent haloes can be influenced by resolution effects. In particular, unresolved internal properties of host haloes will impact the modelling of the radial distribution, velocities, and AB of satellite galaxies. Future studies should incorporate satellite galaxies to test these assumptions further.

HALOSCOPE can serve as a unified framework for simulations with varying initial conditions and cosmologies if it is used in bins of peak height instead of halo mass. This is because the correlation between the internal properties of dark matter haloes and their environment is universal for variations around the standard cosmological model (Ramakrishnan & Velmani 2022). One important aspect that remains to be tested is whether the correlations among the internal halo properties themselves are also universal across cosmologies and redshifts. A confirmation of this universality would allow us to establish HALOSCOPE as a fully cosmology-independent framework for modelling halo properties in simulations of varying resolution. We plan to address this in future work.

In this work we have focused on enhancing the LR halo properties. However, other applications are possible. Fast approaches to numerical simulations (Scoccimarro & Sheth 2002; Monaco et al. 2002; Avila et al. 2015 and Balaguera-Antolínez et al. 2019) can now be improved with HALOSCOPE to preserve the multi-dimensional AB found in full numerical simulations. This is possible because HALOSCOPE only requires the large-scale structure and tidal environment as input. This is a very promising application, since internal halo properties are not available over the entire dynamic range of approximate gravity solvers, as demonstrated by Balaguera-Antolínez & Montero-Dorta (2024). HALOSCOPE could also be used to determine the effect of baryons on dark matter haloes and, possibly, to connect galaxies directly with haloes. The framework is also adaptable to forward-modelling approaches to galaxy clustering. This work highlights the potential of ML methods to recover missing small-scale physics in simulations that are limited by resolution.

Acknowledgments

We thank Aseem Paranjape, Ravi Sheth, Martín de los Ríos, Ángeles Moliné, Sergio Contreras, Francisco-Shu Kitaura, Adrián Gutiérrez, Bernhard Vos Ginés, Santiago Avila and Shadab Alam for the fruitful discussions we have had with them. SR and VGP have been supported by the Atracción de Talento Contract no. 2019-T1/TIC-12702 granted by the Comunidad de Madrid in Spain. VGP is also supported by the Atracción de Talento Contract no. 2023-5A/TIC-28943 granted by the Comunidad de Madrid in Spain. This work has also been supported by Ministerio de Ciencia e Innovación (MICINN) under the following research grant: PID2021-122603NB-C21 (VGP, GY). The UNIT simulations have been run in the MareNostrum Supercomputer, hosted by the Barcelona Supercomputing Center, Spain, under the PRACE project number 2016163937. The analysis in this work has been carried out in the computing cluster at UAM (TAURUS).

¹

https://github.com/computationalAstroUAM/haloscope

²

http://www.unitsims.org/

³

The above smoothing scale is for a top-hat smoothing filter. This is equivalent to $4\times R_{\rm 200b}/\sqrt{5}$ for a Gaussian filter, which is what we used in practice. See Appendix A2 of Paranjape et al. (2018) for a discussion.

⁴

These confidence intervals roughly correspond to 0.25σ, 0.5σ, 1σ, and 2σ contours, respectively, for a 1D Gaussian distribution.

⁵

Note that the value of p needed to encompass 25% of the total population varies with the properties used to rank order the haloes. This value becomes larger as more halo properties are combined.

⁶

In practice it is preferable to model AB directly using halo environmental properties (e.g. Alam et al. 2024), instead of internal ones. This reduces resolution effects. However, rank ordering by internal halo properties has been widely used in the literature (e.g. Rocher et al. 2023).

⁷

https://github.com/computationalAstroUAM/haloscope

References

Alam, S., Aubert, M., Avila, S., et al. 2021, Phys. Rev. D, 103
Alam, S., Paranjape, A., & Peacock, J. A. 2024, MNRAS, 527, 3771
Angulo, R. E., & Pontzen, A. 2016, MNRAS, 462, L1
Angulo, R. E., Baugh, C. M., Frenk, C. S., & Lacey, C. G. 2014, MNRAS, 442, 3256
Armijo, J., Baugh, C. M., Padilla, N. D., Norberg, P., & Arnold, C. 2022, MNRAS, 510, 29
Avila, S., Murray, S. G., Knebe, A., et al. 2015, MNRAS, 450, 1856
Balaguera-Antolínez, A., & Montero-Dorta, A. D. 2024, A&A, 692, A32
Balaguera-Antolínez, A., Kitaura, F.-S., Pellejero-Ibáñez, M., Zhao, C., & Abel, T. 2019, MNRAS, 483, L58
Balaguera-Antolínez, A., Montero-Dorta, A. D., & Favole, G. 2024, A&A, 685, A61
Behroozi, P. S., Wechsler, R. H., & Wu, H.-Y. 2013a, ApJ, 762, 109
Behroozi, P. S., Wechsler, R. H., Wu, H.-Y., et al. 2013b, ApJ, 763, 18
Bett, P., Eke, V., Frenk, C. S., et al. 2007, MNRAS, 376, 215
Bishop, C. M., & Nasrabadi, N. M. 2007, J. Electron. Imaging, 16, 049901
Borzyszkowski, M., Porciani, C., Romano-Díaz, E., & Garaldi, E. 2017, MNRAS, 469, 594
Catelan, P., & Theuns, T. 1996, MNRAS, 282, 436
Chuang, C.-H., Yepes, G., Kitaura, F.-S., et al. 2019, MNRAS, 487, 48
Coloma-Nadal, J. M., Kitaura, F. S., García-Farieta, J. E., et al. 2024, JCAP, 2024, 083
Contreras, S., Angulo, R. E., & Zennaro, M. 2021, MNRAS, 504, 5205
Croton, D. J., Gao, L., & White, S. D. M. 2007, MNRAS, 374, 1303
Dalal, N., White, M., Bond, J. R., & Shirokov, A. 2008, ApJ, 687, 12
DESI Collaboration (Adame, A. G., et al.) 2024, AJ, 167, 62
Desjacques, V. 2008, MNRAS, 388, 638
Dhawalikar, S., & Paranjape, A. 2024, JCAP, 2024, 041
Diemer, B., & Joyce, M. 2019, ApJ, 871, 168
Diemer, B., & Kravtsov, A. V. 2015, ApJ, 799, 108
Euclid Collaboration (Mellier, Y., et al.) 2025, A&A, 697, A1
Euclid Collaboration (Castander, F. J., et al.) 2025, A&A, 697, A5
Fall, S. M., & Efstathiou, G. 1980, MNRAS, 193, 189
Faltenbacher, A., & White, S. D. M. 2010, ApJ, 708, 469
Farahi, A., Anbajagane, D., & Evrard, A. E. 2022, ApJ, 931, 166
Feng, Y., Chu, M.-Y., Seljak, U., & McDonald, P. 2016, MNRAS, 463, 2273
Forero-Sánchez, D., Chuang, C.-H., Rodríguez-Torres, S., et al. 2022, MNRAS, 513, 4318
Gao, L., Springel, V., & White, S. D. M. 2005, MNRAS, 363, L66
Gonzalez-Perez, V., Cui, W., Contreras, S., et al. 2020, MNRAS, 498, 1852
Hadzhiyska, B., Eisenstein, D., Hernquist, L., et al. 2023, MNRAS, 524, 2507
Hahn, O., Porciani, C., Carollo, C. M., & Dekel, A. 2007, MNRAS, 375, 489
Hearin, A. P., & Watson, D. F. 2013, MNRAS, 435, 1313
Hearin, A. P., Zentner, A. R., van den Bosch, F. C., Campbell, D., & Tollerud, E. 2016, MNRAS, 460, 2552
Heavens, A., & Peacock, J. 1988, MNRAS, 232, 339
Heitmann, K., Finkel, H., Pope, A., et al. 2019, ApJS, 245, 16
Ishiyama, T., Prada, F., Klypin, A. A., et al. 2021, MNRAS, 506, 4210
Kataria, S. K., & Shen, J. 2022, ApJ, 940, 175
Knebe, A., & Power, C. 2008, ApJ, 678, 621
Lau, E. T., Hearin, A. P., Nagai, D., & Cappelluti, N. 2021, MNRAS, 500, 1029
Lazeyras, T., Musso, M., & Schmidt, F. 2017, JCAP, 2017, 059
Lin, M., Lucas, H. C., & Shmueli, G. 2013, Inf. Syst. Res., 24, 906
LSST Science Collaboration (Abell, P. A., et al.) 2009, arXiv e-prints, arXiv:0912.0201
Maksimova, N. A., Garrison, L. H., Eisenstein, D. J., et al. 2021, MNRAS, 508, 4017
Mansfield, P., & Avestruz, C. 2021, MNRAS, 500, 3309
McDonald, P., & Seljak, U. 2009, JCAP, 2009, 007
Mendoza, I., Mansfield, P., Wang, K., & Avestruz, C. 2023, MNRAS, 523, 6386
Mo, H. J., Mao, S., & White, S. D. M. 1998, MNRAS, 295, 319
Monaco, P., Theuns, T., & Taffoni, G. 2002, MNRAS, 331, 587
Montero-Dorta, A. D., & Rodriguez, F. 2024, MNRAS, 531, 290
Musso, M., Cadiou, C., Pichon, C., et al. 2018, MNRAS, 476, 4877
Oyarzún, G. A., Tinker, J. L., Bundy, K., Xhakaj, E., & Wyithe, J. S. B. 2024, ApJ, 974, 29
Paranjape, A. 2021, MNRAS, 502, 5210
Paranjape, A., Hahn, O., & Sheth, R. K. 2018, MNRAS, 476, 3631
Paranjape, A., Choudhury, T. R., & Sheth, R. K. 2021, MNRAS, 503, 4147
Paviot, R., Rocher, A., Codis, S., et al. 2024, A&A, 690, A221
Planck Collaboration VI. 2020, A&A, 641, A6
Posti, L., Famaey, B., Pezzulli, G., et al. 2020, A&A, 644, A76
Potter, D., Stadel, J., & Teyssier, R. 2017, Comput. Astrophys. Cosmol., 4, 2
Ramakrishnan, S., & Velmani, P. 2022, MNRAS, 516, 5849
Ramakrishnan, S., Paranjape, A., Hahn, O., & Sheth, R. K. 2019, MNRAS, 489, 2977
Ramakrishnan, S., Paranjape, A., & Sheth, R. K. 2021, MNRAS, 503, 2053
Reyes-Peraza, G., Avila, S., Gonzalez-Perez, V., et al. 2024, MNRAS, 529, 3877
Rocher, A., Ruhlmann-Kleider, V., Burtin, E., et al. 2023, JCAP, 2023, 016
Rosenblatt, M. 1952, Ann. Math. Stat., 23, 470
Salcedo, A. N., Maller, A. H., Berlind, A. A., et al. 2018, MNRAS, 475, 4411
Scoccimarro, R., & Sheth, R. K. 2002, MNRAS, 329, 629
Sheth, R. K., & Tormen, G. 2004, MNRAS, 350, 1385
Shi, J., Wang, H., & Mo, H. J. 2015, ApJ, 807, 37
Shin, T.-H., & Diemer, B. 2023, MNRAS, 521, 5570
Tinker, J., Kravtsov, A. V., Klypin, A., et al. 2008, ApJ, 688, 709
Tinker, J. L., Hahn, C., Mao, Y.-Y., & Wetzel, A. R. 2018, MNRAS, 478, 4487
Wang, K., Mao, Y.-Y., Zentner, A. R., et al. 2020, MNRAS, 498, 4450
Watson, D. F., Hearin, A. P., Berlind, A. A., et al. 2015, MNRAS, 446, 651
Wechsler, R. H., Zentner, A. R., Bullock, J. S., Kravtsov, A. V., & Allgood, B. 2006, ApJ, 652, 71
Xu, X., Zehavi, I., & Contreras, S. 2021, MNRAS, 502, 3242
Yu, J., Zhao, C., Gonzalez-Perez, V., et al. 2024, MNRAS, 527, 6950
Yuan, S., Zhang, H., Ross, A. J., et al. 2024, MNRAS, 530, 947
Zehavi, I., Zheng, Z., Weinberg, D. H., et al. 2011, ApJ, 736, 59
Zhang, Z., Farahi, A., Nagai, D., et al. 2024, MNRAS, 530, 3127
Zheng, Z., Berlind, A. A., Weinberg, D. H., et al. 2005, ApJ, 633, 791

Appendix A: Percentage of unresolved haloes

Over 90% of dark matter haloes in simulations are unresolved, that is, they do not have converged halo properties (see Fig. A.1). The numerical resolution of an N-body simulation sets the smallest dark matter halo mass that can be resolved. Typically, a halo requires a few tens to hundreds of particles for its mass to be well resolved. However, most secondary (beyond-mass) halo properties require at least a few hundred particles, and potentially thousands, to be fully resolved (Mansfield & Avestruz 2021). Because the halo mass function falls steeply with mass, most of the haloes in a simulation have low masses and are therefore unresolved. Figure A.1 shows the fraction of low-mass haloes comprising 20–500 particles as a function of the particle resolution of a simulation. To place the problem in a realistic context, the same figure also marks the particle resolution of state-of-the-art simulations from which galaxy catalogues have been produced.
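The dominance of poorly resolved haloes follows from the steep low-mass slope of the mass function. Below is a minimal sketch of the kind of calculation behind Fig. A.1, assuming a toy mass function dn/dlnM ∝ M^−0.9 exp(−M/M_*) in place of the Tinker et al. (2008) fit used for the figure. The slope, the cut-off mass M_*, the particle cap standing in for infinity, and all function names are illustrative assumptions, not values or code from this work.

```python
import numpy as np


def toy_dndlnm(m, slope=-0.9, m_star=1e14):
    """Toy mass function dn/dlnM ∝ M**slope * exp(-M/m_star), arbitrary normalisation.

    Hypothetical stand-in for Tinker et al. (2008): slope=-0.9 in dn/dlnM is
    dn/dM ∝ M**-1.9, and m_star (Msun/h) mimics the exponential cut-off.
    """
    return m**slope * np.exp(-m / m_star)


def _integrate_lnm(ln_a, ln_b, n=20001, **kw):
    """Trapezoidal integral of the toy mass function over ln M."""
    lnm = np.linspace(ln_a, ln_b, n)
    y = toy_dndlnm(np.exp(lnm), **kw)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(lnm))


def unresolved_fraction(m_part, n_min=20, n_max=500, n_cap=1e8, **kw):
    """Fraction of haloes with >= n_min particles that have fewer than n_max.

    n_cap is a hypothetical upper particle number standing in for infinity.
    """
    ln_lo = np.log(n_min * m_part)
    ln_mid = np.log(n_max * m_part)
    ln_hi = np.log(n_cap * m_part)
    n_low = _integrate_lnm(ln_lo, ln_mid, **kw)
    n_high = _integrate_lnm(ln_mid, ln_hi, **kw)
    return n_low / (n_low + n_high)


# Particle masses in Msun/h, chosen only for illustration
for m_part in (1e9, 1e10, 1e11):
    print(f"M_part = {m_part:.0e} Msun/h: {unresolved_fraction(m_part):.3f}")
```

For a pure power law (`m_star=np.inf`) the fraction reduces to 1 − 25^−0.9 ≈ 0.945, independent of M_part; it is the exponential cut-off that makes the fraction grow with M_part, and lowering `m_star`, as happens at higher redshift, raises it further.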

Fig. A.1.

Fraction of the total haloes in a simulation formed by 20 to 500 particles, as a function of the particle resolution, M_part, of a simulation. This fraction has been calculated using the Tinker et al. (2008) halo mass function with Planck-18 cosmological parameters (Planck Collaboration VI 2020) and varies with redshift (see labels in the figure). The vertical dashed lines of different colours mark the particle resolution of state-of-the-art simulations, as indicated in the legend. The vertical solid green line corresponds to UNIT, the simulation used in this work (Sect. 2). Over 90% of the haloes in these simulations have unresolved halo properties, and this fraction increases at higher redshifts.

Appendix B: Smoothing scale for the environment

In this section we provide a self-contained justification for choosing 4×R_200b as the scale used to describe the tidal anisotropy around a halo. Figure B.1 shows the Spearman rank correlation between the tidal anisotropy, α, and the overdensity, δ, as a function of halo mass for different fixed smoothing scales: the correlations at smaller smoothing scales peak at smaller masses, and those at larger smoothing scales peak at higher masses. The black line corresponds to the rank correlation ρ_α−δ at a variable smoothing scale equal to four times the halo radius. This black line forms an envelope around the other coloured lines: the peak of the rank correlation at any fixed smoothing scale coincides with the value obtained at four times the halo radius for that halo mass bin. Since we want to recover as much as possible of the correlation with the large-scale clustering environment, α_4R200b, rather than α measured at a fixed smoothing scale, is the natural input parameter for our algorithm.
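To make the quantities in Fig. B.1 concrete, here is a minimal sketch of how the tidal anisotropy can be measured on a periodic overdensity grid. It assumes the definition of Paranjape et al. (2018), α = √(q²)/(1+δ), with q² = ½[(λ₁−λ₂)² + (λ₂−λ₃)² + (λ₁−λ₃)²] built from the eigenvalues of the Gaussian-smoothed tidal tensor. The function name and grid set-up are illustrative; this is not the pipeline used in this work, and a per-halo scale such as 4·R_200b/√5 would in practice require evaluating several smoothing scales and interpolating at each halo position.

```python
import numpy as np


def tidal_anisotropy(delta, box_size, r_gauss):
    """Return (alpha, delta_smoothed) on a periodic grid.

    delta    : (n, n, n) overdensity field
    box_size : box side length, same units as r_gauss
    r_gauss  : Gaussian smoothing radius, e.g. 4 * R_200b / sqrt(5)
    """
    n = delta.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0  # avoid 0/0; the k = 0 mode is zeroed below

    # Gaussian-smoothed overdensity in Fourier space
    delta_k = np.fft.fftn(delta) * np.exp(-0.5 * k2 * r_gauss**2)
    delta_k[0, 0, 0] = 0.0

    # Tidal tensor T_ij = d_i d_j phi with laplacian(phi) = delta,
    # i.e. T_ij(k) = k_i k_j / k^2 * delta(k)
    ks = (kx, ky, kz)
    tensor = np.empty(delta.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            t_ij = np.fft.ifftn(ks[i] * ks[j] / k2 * delta_k).real
            tensor[..., i, j] = t_ij
            tensor[..., j, i] = t_ij

    lam = np.linalg.eigvalsh(tensor)  # eigenvalues per cell, ascending
    l1, l2, l3 = lam[..., 0], lam[..., 1], lam[..., 2]
    q2 = 0.5 * ((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l1 - l3) ** 2)
    delta_s = l1 + l2 + l3  # trace of T_ij = smoothed overdensity
    return np.sqrt(q2) / (1.0 + delta_s), delta_s
```

A rank correlation like the one plotted in Fig. B.1 could then be estimated with `scipy.stats.spearmanr` on α and δ evaluated at the halo positions within each mass bin.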

B.1. Mean and scatter of halo properties

Numerical convergence problems in LR simulations can offset both the mean and the scatter of the properties of low-mass haloes. Most methods in the literature that address convergence problems aim to reproduce these two statistics. As shown in Fig. B.2, applying HALOSCOPE to the LR haloes recovers both the median and the 1σ scatter measured for the HR haloes.
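Fig. B.2 compares binned medians and central 68 per cent ranges. A minimal sketch of that summary statistic follows, assuming halo masses (as log10 M) and a single halo property are given as arrays; the function name, the minimum bin occupancy, and the bin edges are illustrative choices.

```python
import numpy as np


def binned_median_and_scatter(log_mass, prop, edges, min_count=10):
    """16th, 50th and 84th percentiles of `prop` in bins of log10 halo mass.

    Bins with fewer than `min_count` haloes are returned as NaN.
    """
    nbins = len(edges) - 1
    stats = np.full((nbins, 3), np.nan)
    idx = np.digitize(log_mass, edges) - 1  # bin index for each halo
    for b in range(nbins):
        sel = prop[idx == b]
        if sel.size >= min_count:
            stats[b] = np.percentile(sel, [16, 50, 84])
    return stats  # columns: p16, median, p84
```

Applying the same function to the HR, LR, and LR+HALOSCOPE catalogues and overplotting the three curves gives the type of comparison shown in Fig. B.2.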

Fig. B.1.

Spearman rank correlation between the tidal anisotropy and the overdensity as a function of halo mass. Different fixed Gaussian smoothing scales, S, have been used, as indicated by the legend. The solid black line indicates the Spearman rank correlation at a variable Gaussian smoothing scale set by the halo radius, $4 \cdot R_{200b}/\sqrt{5}$. Our chosen smoothing scale for the tidal anisotropy, 4·R_200b, corresponds to the maximum correlation for any mass range considered, and the correlation at this scale forms an envelope around the correlations computed at other smoothing scales.
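The factor √5 in the Gaussian scale is consistent with the standard small-k matching of a Gaussian filter to a top-hat filter of radius R_TH = 4·R_200b, obtained by equating the leading-order expansions of the two window functions (stated here as our reading of where the factor comes from):

$$
W_{\rm TH}(kR_{\rm TH}) = \frac{3\left[\sin(kR_{\rm TH}) - kR_{\rm TH}\cos(kR_{\rm TH})\right]}{(kR_{\rm TH})^{3}} \simeq 1 - \frac{(kR_{\rm TH})^{2}}{10},
\qquad
W_{\rm G}(kR_{\rm G}) = \exp\!\left(-\frac{k^{2}R_{\rm G}^{2}}{2}\right) \simeq 1 - \frac{(kR_{\rm G})^{2}}{2},
$$

$$
\frac{R_{\rm TH}^{2}}{10} = \frac{R_{\rm G}^{2}}{2} \;\Rightarrow\; R_{\rm G} = \frac{R_{\rm TH}}{\sqrt{5}} = \frac{4\,R_{200b}}{\sqrt{5}} \quad \text{for } R_{\rm TH} = 4\,R_{200b}.
$$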

Fig. B.2.

Median halo property (from top to bottom: concentration, spin, and shape) as a function of halo mass for HR (orange), LR (purple), and LR+HALOSCOPE (grey) haloes. The shaded regions enclose the central 68 per cent of haloes. Our method recovers the HR median and 1σ scatter.

Appendix C: Halo masses

In this work we used M_200b as our default definition of halo mass: M_200b is the mass enclosed within a sphere whose mean density is 200 times the mean background density. Another widely used definition is M_200c, the mass enclosed within a sphere whose mean density is 200 times the critical density. The peak circular velocity over the accretion history of a halo, V_peak, has also been widely used in the literature as a proxy for halo mass.
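The two spherical-overdensity definitions differ only in the reference density. A minimal sketch converting a mass into the corresponding radius at z = 0 follows, assuming Ω_m = 0.3089 (an illustrative value; substitute the cosmology of the simulation at hand) and ρ_crit,0 ≈ 2.775 × 10¹¹ h² M_⊙ Mpc⁻³. The function name `radius_from_mass` is hypothetical.

```python
import numpy as np

RHO_CRIT0 = 2.775e11  # critical density today, (Msun/h) / (Mpc/h)^3


def radius_from_mass(mass, delta=200.0, reference="background", omega_m=0.3089):
    """Radius (Mpc/h) enclosing a mean density of delta * rho_ref at z = 0.

    mass in Msun/h. reference="background" gives R_200b (rho_ref = omega_m * rho_crit);
    reference="critical" gives R_200c (rho_ref = rho_crit).
    """
    if reference == "background":
        rho_ref = omega_m * RHO_CRIT0
    elif reference == "critical":
        rho_ref = RHO_CRIT0
    else:
        raise ValueError("reference must be 'background' or 'critical'")
    return (3.0 * mass / (4.0 * np.pi * delta * rho_ref)) ** (1.0 / 3.0)
```

At fixed mass, R_200c = Ω_m^(1/3) R_200b at z = 0, so the critical-density sphere is smaller; for a halo with a density profile that decreases outwards, this is why M_200c < M_200b. The same function gives the 4·R_200b smoothing scale used in Appendix B.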

These three quantities, M_200b, M_200c, and V_peak, are correlated (Fig. C.1). The correlation between M_200b and M_200c is tighter than that between M_200b and V_peak. In both cases, the scatter increases towards smaller values of M_200b.

Fig. C.1.

Correlation between M_200b and M_200c (top panel) and V_peak (bottom panel). Black lines show the median and standard deviation. The number of haloes is shown as coloured regions, following the colour bar. The dashed grey line shows the one-to-one relation. All definitions of the halo mass are highly correlated, although there is substantial scatter in the case of V_peak.

C.1. Improving the low-mass halo mass function

The primary property of a dark matter halo is its mass. Simulations use several definitions of, and proxies for, halo mass because there is no consensus on a well-defined halo boundary. Halo mass converges much more readily than secondary properties: mass convergence can be achieved with a few tens of particles, whereas secondary properties require a few hundred. Further improvements in the halo mass function are nevertheless desirable, since model catalogues for large-scale surveys seek to maximise the dynamic range offered by a simulation (Angulo et al. 2014; Armijo et al. 2022). Here we improved the mass function by applying the mass correction method of Forero-Sánchez et al. (2022), which is based on the random forest (RF) technique. We adopted the training hyperparameters prescribed in their work, except that the number of estimators was allowed to range between 1 and 10. The other difference is that they used SUBFIND haloes, whereas we used ROCKSTAR haloes.
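A minimal sketch of this kind of mass correction with scikit-learn, assuming a matched LR–HR catalogue is available. The single feature used here (the LR log-mass), the synthetic training data, and the cross-validated search over 1–10 estimators are illustrative assumptions; this is not the Forero-Sánchez et al. (2022) code, whose feature set and remaining hyperparameters should be taken from that work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def fit_mass_correction(log_m_lr, log_m_hr, seed=0):
    """Regress matched HR log-masses on LR features; n_estimators searched in 1..10."""
    features = np.column_stack([log_m_lr])  # hypothetical feature set
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid={"n_estimators": list(range(1, 11))},
        cv=3,
    )
    search.fit(features, log_m_hr)
    return search.best_estimator_


# Synthetic matched catalogue: LR masses biased low for poorly resolved haloes
rng = np.random.default_rng(1)
log_m_hr = rng.uniform(10.5, 13.0, 3000)
log_m_lr = log_m_hr - 0.1 * np.exp(-(log_m_hr - 10.5)) + rng.normal(0, 0.02, 3000)

model = fit_mass_correction(log_m_lr, log_m_hr)
log_m_corr = model.predict(np.column_stack([log_m_lr]))
```

Note that a correction of this kind only remaps the masses of haloes that exist in the LR catalogue; it cannot restore haloes that the LR halo finder missed altogether.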

The LR mass function converges to the HR one with sub-percent accuracy at high masses (Fig. C.2). At lower masses, the LR halo number density falls below the HR one: for haloes with 40 particles, the number of LR haloes is 10% below the HR number.

Fig. C.2.

Halo mass function for the LR (solid purple lines) and the HR simulation (solid orange lines). As the solid purple line in the bottom panel shows, the mass function of the LR simulation is within 10% of the HR one for LR haloes with more than 40 particles. The black dashed line corresponds to the LR haloes after correcting their masses with their HR counterparts using a simple one-to-one matching. After the one-to-one matching and training with RF, the LR halo number density still falls below the HR one for haloes with fewer than 40 particles.

We corrected the masses of LR haloes using one-to-one matching and an RF algorithm. This correction increases the completeness of the halo catalogue at higher masses (Fig. C.2). However, haloes with fewer than 40 particles are still lost. We find similar results when we perform the one-to-one matching with the alternative mass definitions introduced in Appendix C. We also experimented with alternative methods, such as XGBoost and NGBoost, and conclude that they bring only a marginal improvement over the RF algorithm.

Appendix D: Mass bin ranges

HALOSCOPE is applied in different mass bins (first two columns of Table D.1). This implies that $\bar{\boldsymbol{\Sigma}}$ and $\bar{\boldsymbol{\mu}}$ in Eqs. (6) and (7) are recomputed for each mass range.
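Eqs. (6) and (7) are not reproduced in this appendix. Purely as an illustration, and assuming they take the textbook form of the conditional mean and covariance of a multivariate Gaussian, μ̄(α) = μ_c + Σ_cα Σ_αα⁻¹ (α − μ_α) and Σ̄ = Σ_cc − Σ_cα Σ_αα⁻¹ Σ_αc, in properties first rank-mapped to normal scores, recomputing them per mass bin amounts to the sketch below. The Gaussianisation step, the function names, and the bin handling are assumptions, not the HALOSCOPE code.

```python
import numpy as np
from scipy.stats import norm, rankdata


def gaussianize(x):
    """Map each column of x to standard-normal scores through its empirical ranks."""
    u = rankdata(x, axis=0) / (x.shape[0] + 1)
    return norm.ppf(u)


def conditional_gaussian(c, a):
    """Conditional mean (as a function of a) and covariance of c given a."""
    z = np.hstack([c, a])
    mu = z.mean(axis=0)
    cov = np.cov(z, rowvar=False)
    nc = c.shape[1]
    s_cc, s_ca, s_aa = cov[:nc, :nc], cov[:nc, nc:], cov[nc:, nc:]
    gain = s_ca @ np.linalg.inv(s_aa)  # Sigma_ca Sigma_aa^{-1}
    sigma_bar = s_cc - gain @ s_ca.T   # conditional covariance

    def mu_bar(a_new):
        # a_new must be expressed in the same normal-score units as the training a
        return mu[:nc] + (a_new - mu[nc:]) @ gain.T

    return mu_bar, sigma_bar


def fit_per_mass_bin(log_m, c, a, edges):
    """Recompute (mu_bar, sigma_bar) separately in each mass bin."""
    fits = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (log_m >= lo) & (log_m < hi)
        fits.append(conditional_gaussian(gaussianize(c[sel]), gaussianize(a[sel])))
    return fits
```

Fitting per bin lets the correlations between intrinsic and environmental properties change with mass, which a single global fit would average over.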

Table D.1.

KS statistic comparing 1D distributions of the LR+HALOSCOPE halo properties with respect to the HR ones in different mass bins, as indicated in the first two columns.

The differences between the LR and HR bivariate distributions decrease with increasing halo mass. This is expected, as halo properties start to be resolved in the LR simulation above a certain mass limit, where haloes contain enough particles. Fig. D.1 shows this for halo concentration versus halo spin; similar results are obtained for the other halo properties under study.

Fig. D.1.

Same as the top panel of Fig. 2 but for different mass bins. The right panel corresponds to massive haloes, with at least 6000 particles. The orange contours are for the HR haloes and the purple ones for the LR haloes. Dashed grey contours show the result of applying HALOSCOPE to the LR haloes. The differences between HR and LR decrease with halo mass, as halo properties can be resolved when haloes contain a large enough number of particles.

To quantify how well HALOSCOPE recovers the HR halo property distributions, we applied the two-sample Kolmogorov–Smirnov (KS) test, as implemented in the SCIPY module, to the distribution of each individual halo property. The null hypothesis is that the two samples are drawn from the same distribution.

We computed the KS statistic in different mass bins (Table D.1). We limited the sample size to 10000 randomly selected haloes in each mass bin because, for very large samples, statistical tests become sensitive to negligible differences and reject the null hypothesis too frequently (Lin et al. 2013). Since the p-value quantifies the strength of the evidence against the null hypothesis, we chose to reject the null hypothesis for p-values ≤ 0.001, which corresponds to a KS statistic ≥ 0.0274.
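The threshold can be approximated with the asymptotic two-sample KS critical value, D_crit ≈ c(α)·√((n+m)/(nm)) with c(α) = √(−ln(α/2)/2); for n = m = 10000 and α = 0.001 this gives about 0.0276, close to the value quoted above. A minimal sketch combining random subsampling with `scipy.stats.ks_2samp` follows; the function names and the use of the asymptotic formula are our assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_critical_value(n, m, alpha=1e-3):
    """Asymptotic two-sample KS critical value for significance level alpha."""
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2.0))
    return c_alpha * np.sqrt((n + m) / (n * m))


def compare_property(prop_hr, prop_lr_corr, n_sub=10000, alpha=1e-3, seed=0):
    """Two-sample KS test on random subsamples of size n_sub.

    Returns (statistic, p-value, rejected) for the null hypothesis that both
    samples are drawn from the same distribution.
    """
    rng = np.random.default_rng(seed)
    x = rng.choice(prop_hr, size=min(n_sub, prop_hr.size), replace=False)
    y = rng.choice(prop_lr_corr, size=min(n_sub, prop_lr_corr.size), replace=False)
    result = ks_2samp(x, y)
    return result.statistic, result.pvalue, result.pvalue <= alpha
```

Fixing `n_sub` keeps the power of the test comparable across mass bins, which is the motivation given above for subsampling.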

All the studied cases are consistent with the null hypothesis (Table D.1); thus, the HR and LR+HALOSCOPE halo properties can be considered to follow the same distributions.


All Figures

Fig. 1.

Correlation matrices for the properties of haloes with the lowest masses considered here, 5·10¹¹<M_h (M_⊙ h⁻¹)<7·10¹¹. Each matrix element is colour-coded by the value of the Spearman rank correlation, from −1 to 1, as indicated in the colour bar. We distinguish between intrinsic halo properties, c={c_vir, λ, c/a, b/a}, and environmental ones, α={b₁, α_4R, δ_10R}. These correlation matrices can be understood as being composed of four blocks (Eq. (4)): correlations between environmental properties, ρ_αα; correlations between intrinsic and environmental properties, ρ_αc and ρ_cα; and correlations between intrinsic halo properties, ρ_cc. The LR simulation matrix is at the top, outlined in grey, and the HR one is at the bottom, outlined in black. Since the environmental properties are well resolved in both simulations, we indicate a common correlation block matrix, ρ_αα. Correlations between different halo properties are affected by resolution effects, as can be seen by comparing the top LR matrix with the bottom HR one. The key idea of our method is to incorporate the missing correlations, i.e. the shades of blue, which indicate negative correlations present in the HR simulation (bottom matrix) but not in the LR simulation (top matrix).


Fig. 2.

Confidence intervals (20%, 40%, 68%, and 95%) for the bi-variate distribution of pairs of halo properties in the mass range 2×10¹¹<M_200b (M_⊙ h⁻¹)<10¹², corresponding to haloes having from 20 to 100 particles in the LR simulation. Purple contours show the distribution of LR haloes and orange the HR haloes. The dashed black contours represent the new distribution of the LR haloes after applying HALOSCOPE. In the bottom right panel, the grey area indicates the region forbidden by the constraint a>b>c. Since our algorithm is designed to reproduce skewed multi-dimensional distributions, it also recovers the HR bivariate distributions when applied to the LR haloes, as shown here.


Fig. 3.

Top panels: Linear halo bias as a function of halo mass for haloes split into subpopulations, showing the halo AB. Blue (red) shows the linear bias of the upper (lower) 25% of haloes according to: concentration and spin in the left panel; spin and the two shape parameters in the middle panel; and concentration, spin, and the two shape parameters in the right panel (see Sect. 5 for a detailed description of how the different halo populations are defined). Triangles show the HR haloes, dotted lines the LR ones, and the thick continuous lines the result of applying HALOSCOPE to the LR haloes, LR+HALOSCOPE. Shaded regions correspond to the standard error of the mean. Bottom panels: LR and LR+HALOSCOPE results divided by the HR ones. When applied to the LR haloes, our method decreases the differences between HR and LR from ∼12−15% to less than 5%. Our algorithm is capable of recovering the multi-dimensional halo AB measured in HR simulations.


Fig. 4.

Same as Fig. 3, but HALOSCOPE has been trained with different input properties. In the top left panel, random uncorrelated inputs are used for training HALOSCOPE and when we apply our algorithm to the LR haloes, LR+HALOSCOPE, no halo AB is measured. In the top middle panel, we train HALOSCOPE with properties from LR haloes. For the other panels, HALOSCOPE is trained with the input properties indicated in the legend. In all the panels, the blue and red triangles correspond to the upper and lower 25% of (c_vir, λ, c/a, b/a) HR haloes. This is our reference. To recover the multi-dimensional halo AB, HALOSCOPE needs to be trained with haloes’ environmental properties; in particular b₁, α_4R, and a combination of the two give the best results.


Fig. 5.

Average number of central galaxies as a function of halo mass for model galaxies generated using an HOD model with two different implementations of galaxy AB (Eqs. (12) and (13)). We have produced two catalogues of central model galaxies hosted by haloes with similar mass distributions but different concentrations, spins, and shapes.

Fig. 6.

Top panel: Power spectrum for model central galaxies. Red lines show the power spectra measured in catalogues produced with a positive AB (+ve AB), with A_cen=B_cen=C_cen=D_cen=1 in Eq. (13); blue lines show those from catalogues with a negative AB (–ve AB), with A_cen=B_cen=C_cen=D_cen=−1. Continuous brighter lines show the results for galaxies generated on the HR simulation, while continuous darker lines correspond to the LR simulation, as indicated in the legend. The dashed lines show the clustering from catalogues generated on LR haloes corrected with HALOSCOPE, LR+HALOSCOPE. Because of galaxy AB, the power spectra measured from galaxies in the +ve AB and –ve AB catalogues differ; these differences range from 20 to 50% and are present over the whole range of k modes. Bottom panel: Ratios of the power spectra of galaxy catalogues built on the LR (solid lines) and LR+HALOSCOPE (dashed lines) dark matter haloes with respect to those built on the HR simulation. The 0, 5, and 15 percent levels are indicated by horizontal grey lines. Taking ratios of power spectra from two tracers of the same density field cancels the statistical noise in the bottom panel, and the fixed-pair method used in the UNIT simulations (more details in Sect. 6.2) also mitigates noise from cosmic variance. The power spectra of the HR and LR galaxy catalogues differ by 15%; our algorithm reduces this difference to 5%.

