Open Access
Issue
A&A
Volume 690, October 2024
Article Number A186
Number of page(s) 13
Section Numerical methods and codes
DOI https://doi.org/10.1051/0004-6361/202450409
Published online 07 October 2024

© The Authors 2024

Licence Creative CommonsOpen Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1 Introduction

The formation of massive objects via gravitational collapse is an important subject of investigation to explain the origin of supermassive black holes and in the context of star formation in general. In the early Universe, more than 200 supermassive black holes are currently known at ɀ ≳ 6 (Bañados et al. 2016; Inayoshi et al. 2020). Several different origins have been proposed for the formation of their seeds, typically considering different variations of the pathways proposed by Rees (1984). All of these pathways involve gravitational collapse and gravitational instability, and understanding the outcome of gravitational collapse is thus very likely at the heart of resolving this problem. One of the proposed pathways is the so-called direct collapse, leading to the formation of a single massive object from a massive cloud (e.g., Bromm & Loeb 2003; Koushiappas et al. 2004; Begelman et al. 2006; Wise et al. 2008; Schleicher et al. 2010; Latif et al. 2013; Inayoshi et al. 2014; Regan et al. 2017; Woods et al. 2019). Recent observations with the James Webb Space Telescope (JWST) indeed reveal an object with a ~106 M that may have formed either due to direct collapse or from smaller seeds experiencing significant super-Eddington accretion (Larson et al. 2023).

We note here, however, that even under strongly idealized conditions, at least some fragmentation was still found, suggesting that the formation of the central massive object at least must be accompanied by the formation of additional clumps (e.g., Latif et al. 2013, 2021; Suazo et al. 2019; Prole et al. 2024).

Fragmentation in the context of the direct collapse scenario has meanwhile been studied under a wider range of conditions and was found to be difficult to avoid. The presence of H2 cooling, for example, was shown to decrease the mass of the most massive object and favor fragmentation (Latif et al. 2014). The presence of small amounts of dust grains can trigger fragmentation at densities above 1010 cm−3 and lead to the formation of rather low-mass clumps (Omukai et al. 2008; Latif et al. 2016). On the other hand, simulations also have shown that fragmentation does not necessarily impede the formation of a very massive object, as the clumps may also merge again (e.g., Suazo et al. 2019; Grete et al. 2019; Prole et al. 2024). This behavior can be understood as a result of clump migration in self-gravitating accretion disks (Latif & Schleicher 2015).

As a result of these investigations, it has become more common to consider models that allow both fragmentation and mergers to occur, considering growth via accretion as well as through mergers of fragments. A toy model for the dynamical evolution of a primordial protostellar cluster embedded in gas has been presented by Boekholt et al. (2018), showing that it could lead to the formation of massive objects of 105 M. Tagawa et al. (2020) presented similar results employing semi-analytic calculations. Chon & Omukai (2020) presented numerical simulations starting from cosmological initial conditions, showing that a central massive object can still accrete efficiently even in the presence of fragmentation. An analytic assessment of the formation of a massive object via collisions and accretion, including the effect of feedback, has been provided by Schleicher et al. (2022). Reinoso et al. (2023) presented numerical simulations following the dynamics of gas and protostars for more than 100 000 years, showing that objects of ~104 M can form.

Similar simulations including the formation of fragments and subsequent mergers have been recently pursued for example by Prole et al. (2022a, 2023) in the context of Population III (Pop. III) star formation (for a review, see Klessen & Glover 2023). Riaz et al. (2020), on the other hand, employed a parameterized equation of state to explore how variations in the equation of state, in particular the initial temperature, affect the initial mass function, including the formation of massive stars via fragmentation and accretion. Mass segregation can further play an additional role in changing the distribution of stars within the cluster, with the heavier ones dynamically sinking towards the center of the distribution (e.g., Bonnell & Davies 1998; Allison et al. 2009; Olczak et al. 2011).

The relevance of many of these phenomena is also known from the context of massive star formation (e.g., Bonnell et al. 2004; Peters et al. 2010b; Seifried et al. 2011; Myers et al. 2013). Investigating the relation between envelope masses and final stellar mass, Bonnell et al. (2004) found no correlation between the clump masses and the final masses. However, the masses of the less-than-solar-mass stars were closely related to the envelope masses at the time of protostar formation. In the case of stars with more than one solar mass, on the other hand, a significant fraction of their final mass was obtained from accretion outside their original envelope.

Other formation channels of massive objects, particularly black holes, are based on collisions only. These could occur in dense star clusters at low metallicities (Omukai et al. 2008; Devecchi & Volonteri 2009; Katz et al. 2015; Sakurai et al. 2017; Reinoso et al. 2018, 2020; Vergara et al. 2021) or Nuclear Star Clusters (NSCs, e.g., Escala 2021; Vergara et al. 2023). A relevant source could be even the mergers in black hole clusters, as initially investigated by Quinlan & Shapiro (1987, 1990). In those simulations, it was found that the black hole clusters would need to be extremely dense so that mergers via gravitational wave emission dominate over the ejection through three-body interactions. However, Davies et al. (2011) have shown that sufficient steepening of the black hole distribution may occur in the case that a gas inflow into the center acts as an external potential, thereby decreasing the timescale for gravitational wave emission. Lupi et al. (2014) have found the black hole population resulting from this channel to be relevant, and similarly, Kroupa et al. (2020) have shown that the latter provides an efficient mechanism to explain the observed supermassive black holes.

In this paper, our interest concerns particularly the formation of massive central objects from gas, including the processes of fragmentation and subsequent mergers. We focus here particularly on situations with small virial parameters α, thath is, low ratios of initial kinetic turbulent to gravitational energy and only low to moderate (M~1) Mach numbers. The focus of this investigation further concerns the phase before feedback is very efficient. The relations derived here can thus be expected to hold as long as these assumptions are being fulfilled; thereby allowing to obtain at least an upper limit for the mass of the most massive object that is obtained before strong feedback sets in. In particular, we analyze how much the formation of massive objects differs between different simulations, including different types of environments. For this purpose, we consider the mass of the most massive object normalized both via the total stellar mass in the cluster as well as the total mass in the cluster, considering both gas and stars, as a function of the star formation efficiency that we defined as the stellar mass divided by the initial gas mass, thus describing how much of the initial gas mass has already been converted into stars. The suite of simulations employed in this work is mostly employing piecewise-polytropic equations of state and not explicitly modeling the cooling. We may therefore miss some effects that could occur in the limit when the cooling time is comparable to or larger than the free-fall time of the simulations.

The data we consider here is summarized in Section 2. A visual comparison and analysis of the results is given in Section 3. In Section 4, we present a machine learning analysis to determine the correlation within the quantities that we analyzed here. A final summary and discussion is given in Section 5.

2 Simulation data

In the following, we describe the different simulation data that we employ here in this comparison. A summary of these is also provided in Table 1. The simulations of Reinoso et al. (2023) have been performed with the AMUSE framework1 (Pelupessy et al. 2013), modeling embedded primordial protostellar clusters employing a simplified equation of state, where hydrodynamics is treated using the smoothed particle hydrodynamics (SPH) code FI (Hernquist & Katz 1989), the stellar dynamics with the N-body code PH4 (McMillan & Hut 1996), which are coupled using the BRIDGE method (Fujii et al. 2007). Protostars are treated as sink particles and modeled following Hubber et al. (2013). Both gas and the protostellar distribution follow a Plum-mer sphere (Plummer 1911) with a Plummer radius of 0.077 pc. They explore total gas masses of 104 M and 3 × 104 M, employing 218 SPH particles. The initial number of protostars is 256, with each of them having an initial mass of 0.1 M. The simulations include an initial spectrum of non-compressive Kolmogorov-type turbulence corresponding to a Mach number of ℳ = 1, and the virial parameter α ranges between 0.33 and 1 depending on the mass of the cluster. For the mass-radius relation of the protostars, a fit to the relations calculated by Hosokawa & Omukai (2009); Hosokawa et al. (2012, 2013) is being employed. Supermassive protostars are known to contract when their average accretion rate falls below a value of ~0.04 M yr−1 (Hosokawa et al. 2013; Schleicher et al. 2013; Haemmerlé et al. 2018). In the simulations of Reinoso et al. (2023), this is assumed to happen when the accretion rate falls below the critical value of X Kelvin-Helmholtz timescales, where they explore X = 10 and X = 100. The simulations have been evolved for 200 000 years. For each parameter combination, they perform six simulations varying the initial random seed for the turbulence. For clouds of 104 M, they show the formation of massive objects in the range of ~2000–5000 M. For clouds with 3 × 104 M, the masses of the most massive object are in the range of ~21 000–27 000 M. The accretion radius of the sink particles is calculated iteratively via the interaction zone as defined by Hubber et al. (2013), with a lower limit of 10 AU and an upper limit of 500 AU.

The simulations by Solar et al. (2022) are the first explorations employing the same methodology as Reinoso et al. (2023) but focus on clouds with 3 × 103 M. Now in Solar et al. (in prep.), they pursue a systematic exploration of the effect of the initial gas temperature on the evolution of the cloud with a mass of 3 × 104 M, exploring initial temperatures of 500, 1000, 3000, 5000, and 8000 K. While the equation of state is initially assumed to be isothermal, it becomes adiabatic for densities larger than 1015 cm−3. They find the mass of the most massive object to increase with decreasing temperature, due to the increasing instability of the cloud, with masses in the range of ~19000–28 000 M after 30000 years. The initial turbulent Mach number in the simulations is equal to 1, and the virial parameter ranges between 0.03 and 0.33.

Chon & Omukai (2020) pursued a suite of hydrodynamical simulations employing the SPH code Gadget-22 (Springel 2005). They extract a massive compact gas cloud from the cosmological simulation of Chon et al. (2016, 2018), containing gas particles inside a radius of 7 × 106 AU from the center of the cloud with a total gas mass of 2 × 104 M. The equation of state is precalcu-lated following the one-zone models of Omukai et al. (2008) for the chemical evolution, exploring metallicities of Z/Z = 10−3, 10−4, 10−5, 5 × 10−6 and 10−6. Sink particles are formed when the density reaches 2 × 1016 cm−3. The simulations have been evolved for ~ 10 000 years. For the mass of the most massive object, they obtained ~300 M in case of Z/Z = 10−3 where the gas is affected by metal line cooling, while it reaches ~8000 M in simulations with lower metallicities. The adopted accretion radius of the sink particles is equal to the protostellar radius. The cloud shows transsonic turbulence with a virial parameter α ≤ 1.

Prole et al. (2022b) have performed two-stage zoom-in magneto-hydrodynamics (MHD) simulations of primordial star formation with the moving mesh code Arepo3 (Springel 2010). The simulations start from Bonnor-Ebert spheres (Ebert 1955; Bonnor 1956) with a central density of 2 × 10−20 g cm−3 and a radius RBE = 1.87 pc. The initial condition included a turbulent velocity field with a turbulent power spectrum ∝ k−2, which is normalized to obtain a ratio of turbulent to gravitational energy of 0.05. The turbulence is subsonic and the turbulent Mach number of order 0.3. The chemistry and thermodynamics have been solved employing an updated version of the chemical model from Clark et al. (2011). The authors present both purely hydrodynamical simulations, as well as simulations including tangled magnetic fields, following a Kazantsev spectrum of k3/2 (Kazantsev 1968), assuming a saturated magnetic field in equipartition with the turbulent velocity. The MHD equations are solved following Pakmor et al. (2011). The authors have explored critical densities for sink particle formation of 10−8 g cm−3, 10−9 g cm−3 and 10−10 g cm−3, After reaching the critical density for sink formation, the simulations are evolved for about ~1500 years. Typically, 10–20 fragments form in each simulation, with the total protostellar masses varying between 40−80 M. The most massive protostars in the simulations reach about ~10 M. The sink accretion radius was adopted as the Jeans length at the sink creation density.

Prole et al. (2023) presented cosmological simulations with the Arepo code (Springel 2010) including hydrodynamics and primordial chemistry, modeling gravitational collapse in dark matter halos with masses of 3 – 9 × 106 M within a cosmo-logical box of 1 Mpc/h, were h = 0.6774 is the reduced Hubble constant. The chemical network was based on an updated version of the appendix from Clark et al. (2011). They further considered the photo-dissociation of molecular hydrogen through Lyman-Werner (LW) photons with 11.2–13.6 eV can dissociate H2 via the two-step Solomon process (Field et al. 1966; Stecher & Williams 1967). They explore Lyman-Werner backgrounds of J21 = 0, 0.1 and 0.01, where J21 = 1 implies a Lyman-Werner band intensity of 10−21 erg s−1 cm−2 Hz−1 sr−1. From the cosmological initial conditions, turbulence formed dynamically during the gravitational collapse, producing typical turbulent Mach numbers of order 0.3 and typical virial parameters of order 0.05. They adopt a critical density for sink particle formation of 1018 cm−3. The simulations have been evolved for 300 years after the formation of the first sink particle. The number of sink particles formed in the different halos varies between 5 and 30. The total mass in sinks reaches 10–30 M, and the mass of the most massive sink is comparable, suggesting that few sink particles contain a significant fraction of the total stellar mass. The sink accretion radius was taken to be the protostellar radius estimated through recipes of Stahler et al. (1986).

Peters et al. (2010a,b) presented 3D radiation-hydrodynamical simulations of massive star formation including heating by ionizing and non-ionizing radiation using the adaptive-mesh code FLASH4 (Fryxell et al. 2000). The simulation starts from a 1000 M molecular cloud with a core density of 1.27 × 10−20 g cm−3 within a radius of 0.5 pc, falling of as r−3/2 until 1.6 pc. The temperature of the cloud is T = 30 K. It is assumed to be in solid body rotation with an angular velocity ω = 1.5 × 10−14 s−1, corresponding to a ratio of rotational to gravitational energy of 0.05. Turbulence was not explicitly included in the initial condition. Sink particles are created at a density of 7 × 10−16 g cm−3. We focus here on their run B, which includes multiple sinks and radiative feedback. In total, it leads to the formation of 25 sink particles, a total stellar mass of 125.56 M, where the most massive star reaches 23.39 M after 0.74 Myr.

Finally, we consider the simulations by Riaz et al. (2020) exploring how fragmentation and accretion affect the stellar initial mass function for varying temperatures of the gas, employing the SPH code GRADSPH5 (Vanaverbeke et al. 2009). They consider the gravitational collapse of a gas cloud with 30 M, a radius of 0.168 pc and a cloud density of 10−19 g cm−3. They assume rigid rotation with an angular velocity of 2.912 × 10−14 s−1, corresponding to a ratio of rotational to gravitational energy of 1%. The initial temperature is varied between 10 and 50 K, corresponding to ratios of the thermal to gravitational energy between 0.115 and 0.578. The simulations included initial turbulence with a Mach number of 1 , implying virial parameters in the range of 0.115–0.578. They employed a barotropic equation of state that was initially isothermal but became adiabatic at the density at which the gas is expected to become optically thick to dust emission following Omukai et al. (2005). Sink particle formation is modeled following Hubber et al. (2013), employing a constant sink radius of 1 AU that is always larger than the Jeans length. The simulations are evolved until 15% of the gas mass is converted into stellar mass. The number of sink particles at the end of the simulation varies between 7 and 30, and the mean mass per sink particle is between 0.15 and 0.64 M. The mass of the most massive sink varies between 0.7 and 1.3 M.

Table 1

Summary of the simulations employed here.

3 Visual comparison

As it is clear from the previous section and the summary in Table 1, the data are obtained from a range of different situations, with varying cloud masses and metallicities, differences in the physics and the numerical treatment such as the employed integration time after sink particle formation, among others. As we aim to identify similarities between the rather different simulations, a comparison of dimensional numbers is then clearly not meaningful, so it is important to work with normalized dimen-sionless quantities to determine how far they govern the behavior that was found in the simulations. While the simulations provide a time evolution, a simple comparison in terms of the time coordinate is also not meaningful, as the characteristic timescales may be different in the different simulations. To characterize the evolutionary stage, we employ the star formation efficiency ϵ(t)=M(t)Mgas,0,${_ * }(t) = {{{M_ * }(t)} \over {{M_{{\rm{gas}},0}}}},$(1)

where M*(t) is the total stellar mass at time t and Mgas,0 is the initial gas mass available to form stars. We further consider the fraction of the total stellar mass ƒ* (t) that corresponds to the mass of the most massive object MMMO(t) at time t, defined via f(t)=MMMO(t)M(t),${f_ * }(t) = {{{M_{{\rm{MMO}}}}(t)} \over {{M_ * }(t)}},$(2)

as well as the fraction of the total mass ƒtot that corresponds to the most massive object, ftot(t)=MMMO(t)Mtot,${f_{{\rm{tot}}}}(t) = {{{M_{{\rm{MMO}}}}(t)} \over {{M_{{\rm{tot}}}}}},$(3)

where Mtot is the sum of the initial mass of the gas and the stellar mass available in the cloud, Mtot =Mgas,0+M(t=0).${M_{{\rm{tot}}}} = {M_{{\rm{gas}},0}} + {M_ * }(t = 0).$(4)

In Fig. 1, we plot ƒ* as a function of є* for the set of different simulations in Table 1. From this plot, we note that the simulations by Reinoso et al. (2023), Solar et al. (in prep.) and Chon & Omukai (2020) reach star formation efficiencies of 40–90%. The simulations by Peters et al. (2010b) and Riaz et al. (2020) reach star formation efficiencies of the order 10%, while Prole et al. (2022a, 2023) reach efficiencies of the order of a few percent. We note here that these differences are often determined by the available runtime of the simulation, the numerical methodology, and the general objective of the simulation. However, also physical processes can play a relevant role; for example, towards the end of the simulations by Peters et al. (2010b), the HII region starts to expand, shutting off accretion and thus limiting the further growth of the star formation efficiency. Other simulations have not yet reached the stage, where feedback is relevant and their efficiencies are rather limited by the runtime of the simulations. For the functional dependence of ƒ* on є*, we can identify two main types of behaviors in Fig. 1: in the simulations by Reinoso et al. (2023), Solar et al. (in prep.) and Chon & Omukai (2020), ƒ* is initially low and subsequently increases, implying that the time evolution corresponds to a shift towards higher star formation efficiencies and higher fractions ƒ*. In the simulations by Reinoso et al. (2023) and Solar et al. (in prep.), this is a direct result of the initial conditions which include 256 protostars of equal mass, implying a low initial value of ƒ* that subsequently increases as a result of accretion and mergers. Similarly, in the simulations of Chon & Omukai (2020), a significant number of sink particles forms very early, leading to a low initial value of ƒ*.

On the other hand, in the simulations by Prole et al. (2022b, 2023), Peters et al. (2010b) and Riaz et al. (2020), sink particle formation occurs more gradually, starting initially with one sink particle and thus implying an initial value f* = 1, which subsequently decreases as additional sink particles form and acquire mass. We further see that the simulations terminate in rather different regimes; while in the simulations of Reinoso et al. (2023), Solar et al. (in prep.) and Chon & Omukai (2020), a very large fraction of the stellar mass is in the most massive particle (35–99.7%), the latter is more moderate in the simulations by Prole et al. (2022b, 2023), Peters et al. (2010b) and Riaz et al. (2020) with more typical values in the range of 20–30%.

The difference in this behavior in principle can be explained as a result of the initial conditions as well as the time evolution covered in the simulations. For example, as the simulations of Reinoso et al. (2023) and Solar et al. (in prep.) already start with initial protostars, the initial star formation efficiency in these simulations is already ~0.25%, while most of the evolution in the models by Prole et al. (2022b, 2023) covers lower efficiencies and different phases of the star formation process. Indeed, the simulation by Chon & Omukai (2020) suggests that the situation may be more complex, including phases where f* increases with ϵ* and others where f* decreases with increasing ϵ*. Nonetheless, some of the differences observed here could be a result of the rather different environments, an aspect we will consider in more detail below.

Now, we consider the evolution of ftot as a function of ϵ* in Fig. 2. While there are evident differences in the range of the star formation efficiencies covered by the different simulations as discussed in the context of Fig. 1, the evolution overall appears here more similar, as ftot is generally found to increase as a function of ϵ* with at most minor and temporary deviations.

Differences are however visible in the slope of the increase, with the simulations by Reinoso et al. (2023), Solar et al. (in prep.) and Chon & Omukai (2020) increasing more steeply, and a more gradual increase in the simulations by Prole et al. (2022b), Prole et al. (2023), Peters et al. (2010b) and Riaz et al. (2020). In the simulations by Reinoso et al. (2023), Solar et al. (in prep.) and Chon & Omukai (2020), fractions ftot of 30–100% are being reached, while this value remains in the percent or sub-percent range in the simulation of Prole et al. (2022b, 2023), Peters et al. (2010b) and Riaz et al. (2020), also reflecting the difference in the evolutionary stage as defined by the star formation efficiency ϵ*.

While most of the simulations we compare in this study (except Peters et al. 2010b) allow for mergers to happen, we note that these are particularly frequent in the simulations of Reinoso et al. (2023), Solar et al. (in prep.) and Chon & Omukai (2020). It is thus conceivable that some of the major differences we find in the simulations, particularly concerning the obtainable values of f*, are driven by the possibility to have collisions within the systems. In the following, we derive a simple criterion to estimate whether collisions can be expected to be relevant in a given simulation. Given an initial cloud mass Mgas,0 and an average mass per protostar m*, the number of protostars can be estimated to be N=ϵMgas ,0m.$N = {{{_ * }{M_{{\rm{gas}},0}}} \over {{m_ * }}}.$(5)

The number density of the protostars follows as n = N/V, where the volume V is calculated from the radius R0 assuming spherical symmetry, that is V=4π3R03$V = {{4\pi } \over 3}R_0^3$. Estimating the collision length λ via the mean free path, we have λ=1nσ,$\lambda = {1 \over {n\sigma }},$(6)

where σ is the effective cross section for collisions. Comparing the collision length to the radius of the cloud and inserting the above expressions, we find fcoll 1=λR0R02ϵσ(Mgas,0/m)R02σN.$f_{{\rm{coll}}}^{ - 1} = {\lambda \over {{R_0}}} \sim {{R_0^2} \over {{_ * }\sigma \left( {{M_{{\rm{gas}},0}}/{m_ * }} \right)}} \sim {{R_0^2} \over {\sigma N}}.$(7)

In our estimate, we aim not to anticipate any particular properties of the protostars that form but rather to assess very conservatively if collisions might be occurring within the system. Therefore, we conservatively estimate σ~R2π$\sigma \~R_ \odot ^2\pi $, with R the solar radius, noting that of course this likely represents an underestimate for simulations in which very massive objects are forming. While the information incorporated in this parameter is not complete, a decreasing value of fcoll1$f_{{\rm{coll}}}^{ - 1}$ may then suggest more frequent collisions.

We now present the fraction f* of the mass of the most massive object normalized by the total stellar mass M* as a function of the collision parameter fcoll1$f_{{\rm{coll}}}^{ - 1}$ in Fig. 3. We consider both parameters as a function of time, particularly accounting for the time evolution of the number of protostars N that have formed at time t. We find that the simulations of Reinoso et al. (2023), Solar et al. (in prep.) and Chon & Omukai (2020) have systematically lower values of fcoll1$f_{{\rm{coll}}}^{ - 1}$ compared to the other runs. Particularly, Reinoso et al. (2023) and Solar et al. (in prep.) report initial values fcoll1~108$f_{{\rm{coll}}}^{ - 1}\~{10^8}$, shifting towards fcoll1~109$f_{{\rm{coll}}}^{ - 1}\~{10^9}$ when a significant fraction of the mass concentrates in the most massive object, indicating efficient collisions to occur. For the simulations of Chon & Omukai (2020), we have a range of fcoll1~107109.5$f_{{\rm{coll }}}^{ - 1}\~{10^7} - {10^{9.5}}$, where the collision parameter increases over time as new protostars continue to form very efficiently. In the simulations by Prole et al. (2022b), the collision parameter starts out at fcoll1~1010$f_{{\rm{coll}}}^{ - 1}\~{10^10}$ and shifts towards higher values of fcoll 1~3×1011$f_{{\rm{coll }}}^{ - 1}\~3 \times {10^{11}}$ as a result of some collisions occurring, but more gradually over time. The simulations by Prole et al. (2023) represent even less evolution in this parameter and for most of it remain close to fcoll1~1012$f_{{\rm{coll}}}^{ - 1}\~{10^{12}}$. The Peters et al. (2010b) simulations start out similarly as Prole et al. (2022b) from fcoll1~1010$f_{{\rm{coll}}}^{ - 1}\~{10^10}$ subsequently and gradually evolving towards higher values. The simulations by Riaz et al. (2020) start out at fcoll1~108$f_{{\rm{coll}}}^{ - 1}\~{10^8}$ but move towards higher values already when f* is still low, indicating some mergers rather in the earlier phase of the evolution, and more distributed in time compared to the situation modeled by Reinoso et al. (2023) or Solar et al. (in prep.). It is conceivable that the impact after the first mergers in the simulations of Riaz et al. (2020) on the collision parameter is more significant, considering the overall lower number of protostars in the respective runs.

thumbnail Fig. 1

Fraction f* of the mass of most massive object MMMO normalized by the total stellar mass M* shown as a function of the star formation efficiency for the simulations summarized in Table 1. The gray arrow indicates the direction of the time evolution. For those authors with more than 8 simulations, we display some representative examples.

thumbnail Fig. 2

Fraction ftot of the mass of the most massive object MMMO normalized by the total mass Mtot as a function of the star formation efficiency ϵ* for the simulations summarized in Table 1. The black dotted line corresponds to the identity function, indicating the scenario where all the stellar mass is concentrated within the central object, representing the maximum possible value for ftot. The gray arrow indicates the direction of the time evolution. For those authors with more than 8 simulations, we display some representative examples.

thumbnail Fig. 3

Fraction f* of the mass of the most massive object MMMO normalized by the total stellar mass M* as a function of the collision parameter fcoll1$f_{{\rm{coll}}}^{ - 1}$ for the simulations summarized in Table 1. The gray arrow indicates the direction of the time evolution. For those authors with more than 8 simulations, we display some representative examples.

4 Machine learning analysis

After visually comparing the simulations employed in this study, we can conclude that different environments in the simulations lead to different relationships between ϵ* and f* (t). In this section, we aim to explore whether we can extract relevant information from these simulations using machine learning (ML) models to arrive at accurate parameter estimations based on the input from these simulations. In principle, these models may account for highly nonlinear behaviors (e.g., Bishop 2006). One of the objectives of these simulations is to study the formation and evolution of the most massive object. It is our interest to investigate whether an ML model (fθ) of parameters θ is able to estimate f*(ti) given n attributes or features xi = (xi,1, xi,2,xi,n) related to the specific time ti of snapshot i since the formation of the first sink particle in the simulation. Considering our dataset 𝒟 = {(x1, (x2, y2),(xq, yq)}, comprised of q pairs of attributes xi ∈ ℝn and labels yi ∈ ℝ obtained from all the simulations, the regression problem can be defined as: fθ(xi)=y^iy^iyi,${f_\theta }\left( {{x_i}} \right) = {\hat y_i} \to {\hat y_i} \approx {y_i},$(8)

where y^i${\hat y_i}$ corresponds to the prediction of the ML model, and yi = f*(ti). We seek to adjust θ so that fθ, given a cost function (y^i,yi)${\cal L}\left( {{{\hat y}_i},{y_i}} \right)$, can make accurate predictions about the mass of the most massive object, independent of the type of regime that dominates the gravitational collapse of the cloud.

We proceed to explain the preprocessing performed on our dataset, define the attributes that are used as input for the model, discuss the model selection process, and finally, we present the results and predictions obtained.

4.1 Data preprocessing

Although we successfully assembled a sufficiently large dataset covering various environments, the simulations span different ranges of ϵ*(t), and the time step for calculating each snapshot varies. These characteristics of the data may lead to overfitting during training or introduce bias towards runs with a higher number of points. To mitigate these potential issues, we initially limit the range of ϵ*(t) to values between 0 and 0.4. Additionally, we employ linear interpolation for each run, ranging from its minimum to its maximum value, resulting in new curves that are 500 points in length.

Now, concerning the variables supplied to the model for estimating f* (ti ), we regard ϵ* (ti) as a key parameter, offering insights into the amount of gas mass utilized in protostar formation. Similarly, as previously discussed, mergers play a crucial role in comprehensively characterizing the simulation environment. Therefore, we incorporate fcoll1(ti)$f_{{\rm{coll}}}^{ - 1}\left( {{t_i}} \right)$ as another significant factor.

To analyze if the dimensionless parameters we have chosen can effectively describe f* (t), we calculated their correlation coefficients. In Fig. 4, we use the Spearman correlation coefficient to analyze the relationship between the dimensionless parameters ϵ*, f*, ftot, and fcoll1$f_{{\rm{coll}}}^{ - 1}$ for the data in our sample. We find particularly strong correlations between ftot and ϵ* (0.929), ftot and f* (0.748), and a moderate correlation between ϵ* and f* (0.515). Of course, this is not fully unexpected, as we have ftot ≈ ϵ* f* ; however, the analysis interestingly shows that the correlation is particularly strong between ftot and ϵ*, while still strong but to a lesser degree between f* and ϵ*. Additionally, a moderate anti-correlation is present between ftot and fcoll1(0.411)$f_{{\rm{coll}}}^{ - 1}\left( { - 0.411} \right)$ and between ϵ* and fcoll1(0.388)$f_{{\rm{coll}}}^{ - 1}\left( { - 0.388} \right)$. From this calculation, we can conclude that the combination of fcoll1$f_{{\rm{coll}}}^{ - 1}$ and ϵ* forms pairs with different types of correlations with f* and with each other, making them suitable candidates as input values for the ML model. We don’t take into consideration the ftot parameter as an attribute for the training of the models since, as can be seen in Eq. (3), it is calculated directly with the value of the mass of the most massive object.

Finally, to mitigate overfitting and enhance confidence in predictions, we randomly divided our dataset into three subsets: training (16640 values), validation (4160 values), and testing (5200 values). The training of the best candidates obtained were repeated in a three-fold manner, with each iteration employing a distinct random data partitioning, ensuring the robustness of the predictions.

thumbnail Fig. 4

Correlogram for the dimensionless parameters calculated in this work. The values shown correspond to Spearman’s correlation coefficient ρs, which assesses the monotonic relationship between the ranks of the paired data points.

4.2 Model selection

The selection of the most suitable ML model is a computationally expensive task, often influenced by the nature and dimensionality of the data. To ensure that we identify the best model within a range of possibilities, we execute an AutoML workflow from the Modulos-AI6 platform, on the dataset for training and validation. The AutoML platform (version: 1.1.2) is designed for automated model selection and supervised training in a given task. This process involves detecting the uploaded data type, defining the task’s objective, and optimizing the combination of feature extractors, machine learning algorithms, and hyperparameters using a Bayesian optimizer (Srinivas et al. 2012). The platform identifies a candidate, trains it, validates it, and delivers the solution to the user. This cycle repeats until reaching a user-defined endpoint.

AutoML can also evaluate the efficiency of different preprocessing techniques or feature extractors before training the models, or it can leave the data as it is (referred to as the identity transformation on the platform). The preprocessing options that we study include standard scaling, which transforms the features of a dataset to have a mean of 0 and a standard deviation of 1, and t-test feature selection. The latter involves dividing the data into groups, applying the t-test to each feature, and selecting features with statistically significant differences in their means (Zhou & Wang 2007).

Regarding the possible learning algorithms in the platform, we select:

  • k-Nearest Neighbors (kNN): operating on the principle of proximity in feature space, kNN identifies the k-nearest data points to a given instance based on a predefined metric. In regression, the algorithm predicts the target value by averaging or aggregating the values of these nearest neighbors. The parameter k delineates the number of neighbors considered, acting as a decision boundary for data distribution (Fix & Hodges 1989; Altman 1992).

  • Support Vector Machine (SVM): the main idea consists of finding a hyperplane that maximizes the margin between data points and a regression line. For the non-linear case, a kernel function is used to map the data. SVMs minimize prediction error within a loss function, allowing for a tolerance region around the regression line (Cortes & Vapnik 1995).

  • Random Forest (RF): a robust algorithm that combines multiple decision trees for enhanced predictive power. Utilizing bootstrap aggregating, random subspace, and decision trees (Ho 1998), it draws bootstrap samples, constructs decision trees, and employs majority voting for final predictions (Breiman 2001).

  • Extreme Gradient Boosting (XGBoost): an efficient and high-performance supervised learning method, rooted in the gradient boosting algorithm (Friedman 2001). It operates by sequentially incorporating decision trees into its model, selecting trees that minimize the gradients between the target variable and the model’s predictions at each step. This process continues until a predetermined number of iterations or a specified error threshold is reached (Chen & Guestrin 2016).

  • LightGBM: also with a gradient boosting framework, Light-GBM employs a tree-based learning strategy, similar to XGBoost, and distinguishes itself through its focus on leaf-wise tree growth, where each added node corresponds to the leaf with the maximum delta loss, resulting in a more complex but narrower tree structure (Ke et al. 2017).

To evaluate the confidence of the models, we take the root mean square error (RMSE) as the cost function: RMSE=1ni=1n(ytrue ,iypred ,i)2,${\rm{RMSE}} = \sqrt {{1 \over n}\mathop {{{\mathop \sum \nolimits^ }^}}\limits_{i = 1}^n {{\left( {{y_{{\rm{true}},i}} - {y_{{\rm{pred}},i}}} \right)}^2}}, $(9)

which is widely utilized in machine learning algorithms, particularly regression models, to quantify the accuracy of predictions. It gauges the square root of the average squared differences between predicted and actual values, providing a comprehensive measure of the model’s predictive performance.

4.3 Results

We execute the workflow described above with the intention that if there are no improvements in the RMSE score using the validation set after 350 solution candidates, it is paused. We select the best candidate for each algorithm, retrain it in a 3-fold manner from scratch, and utilize the test set to evaluate its performance (for more information, see Appendix A). These results can be seen in Table 2, where we summarize different metrics besides RMSE for completeness. Simple algorithms such as kNN and SVM fail to predict with confidence, yielding lower results on each different metric. This can be attributed to a variety of reasons, but looking at the interactions between the dimensionless parameters in Figs. 13, they show a high scatter for different authors, even after preprocessing, which strongly affects kNN and SVM, as they are sensitive to high variances in the data.

In contrast, ensemble models, such as RF, LightGBM, and XGBoost, exhibit robustness against outliers, thereby enabling them to generate predictions with heightened reliability. Notably, RF distinguishes itself by demonstrating superior performance across all evaluation metrics. This efficacy is principally attributable to the robustness that RF has against noisy data, which lets it handle irrelevant features without significantly impacting its performance. Consequently, RF emerges as the preferred model for our subsequent analysis.

A comparison between original and predicted values of ƒ* (t) is given in Fig. 5 for the different simulations in our test sample. We generally find that most of the predictions are close to the true values so the majority of the data points are close to the diagonal lines of the diagram. Particularly good predictions are obtained for the simulations of Peters et al. (2010b), Riaz et al. (2020), and Prole et al. (2022b). Some of the predicted values for the simulations of Chon & Omukai (2020), Solar et al. (in prep.), and Reinoso et al. (2023) lie further away from the simulated data points. The latter can be attributed to the high dispersion of ƒ* in simulations sharing the same initial conditions, as it can be misinterpreted by the models as noisy data. An example of this can be seen in Fig. 6, where we calculate the average and standard deviation of ƒ* for two different authors. For illustration, we provide typical examples of the reconstruction of the relation between ƒ* and є* in Fig. 7, showing that the predictions of ƒ* (t) obtained via Random Forest are relatively close to the original simulations.

Table 2

Summary of test stage for the different models.

thumbnail Fig. 5

Correlation plot for the average of the ƒ* (t) predictions of the test set by RF. The color palette represents the different authors of the simulations, and we include the RMSE for each case.

thumbnail Fig. 6

Examples of the dispersion of ƒ* in simulations of Solar et al. (in prep.) (2 runs, top panel) and Reinoso et al. (2023) (6 runs, bottom panel) with the same initial conditions but different seeds. The dashed line corresponds to the average between the simulations and the shading to the standard deviation at each point of є*.

thumbnail Fig. 7

Predicted values for ƒ* (t) by RF are depicted. The dashed black line represents the values of the simulation, the red dots indicate the average of the predictions, and the shaded area represents the standard deviation of predictions. We present random examples for all authors and for different ranges of є* (t).

5 Summary

The formation of massive objects via gravitational collapse is important to explain the origin of massive black holes and stars. Here we have analyzed and compared numerical simulations of gravitational collapse in different environments, focusing on the regime of low initial ratios of turbulent to gravitational energy, low to moderate Mach numbers and the absence of strong feedback. We consider the formation of massive black holes in the early Universe (Chon & Omukai 2020; Solar et al. 2022; Reinoso et al. 2023), the formation of primordial massive stars (Prole et al. 2022b, 2023), star formation at low metallicity (Riaz et al. 2020) and formation of massive stars in molecular cores (Peters et al. 2010b). As most of these simulations employ a piecewise-polytropic equation of state, our dataset does not allow to analyse the effect of the cooling time on the dynamics or the evolution of the most massive object. For a meaningful comparison of the simulations of different environments, we have introduced dimensionless parameters, particularly the star formation efficiency є* describing the fraction of mass that was turned into stars, the ratio ƒ* corresponding to the mass of the most massive star normalized by the total stellar mass, the ratio ƒtot corresponding to the mass of the most massive star normalized by the total mass, as well as a collision parameter ƒcoll that describes the probability to have collisions within the system. In a visual comparison of the simulation results, we find that the relation and dynamical evolution of these quantities at least is partly related to the initial configuration. Particularly, in simulations that from the start have a significant number of protostars or sink particles already, ƒ* tends to increase when the efficiency є* increases, as one of the protostars becomes more massive, leading to a higher value of є* (e.g., Solar et al. 2022; Reinoso et al. 2023). In other simulations that start with the formation of one sink particle, on the other hand, and other sinks then form progressively, ƒ* has an initial value of 1 but is found to decrease over time while є* increases as collapse continues and more gas may be converted into stars (Prole et al. 2022b, 2023; Peters et al. 2010b; Riaz et al. 2020). Some simulations such as Chon & Omukai (2020) further show an intermediate behavior, where in part of the simulation ƒ* increases with є* and in other parts it decreases, as different dynamics may take over at different stages of the evolution.

Considering the relation between ƒtot and є*, on the other hand, the overall evolution in the simulations appears more similar, and ƒtot generally is found to increase with є*. The most notable difference here is that this relation is steeper in simulations corresponding to very strong gravitational instability where very massive objects form, as in the cases of Chon & Omukai (2020), Solar et al. (in prep.) and Reinoso et al. (2023). We further analyze how ƒ* relates to the collision parameter ƒcoll, finding that indeed this parameter is adequate to characterize and distinguish between different regimes, where the simulations of Chon & Omukai (2020), Riaz et al. (2020), Solar et al. (in prep.) and Reinoso et al. (2023) correspond to particularly high collision parameters, while Peters et al. (2010b) and Prole et al. (2022b, 2023) to more moderate or low collision parameters, allowing to estimate how important collisions are during the evolution of the simulation.

To quantitatively analyze the relation between these dimen-sionless parameters, we used the Spearman correlation coefficient for assessing the monotonic relationship between the ranks of paired data points. From this analysis, we found a very strong correlation between ƒtot and є*, a moderately strong correlation between ƒ* and є*. At the same time, ƒcoll has a moderate correlation with ƒtot, a very weak correlation with ƒ* and again a moderate one with є*.

We finally employed different machine learning models that have been trained with the data provided from these simulations, where the best results have been obtained with LightGBM, XGBoost and Random Forest. After training with a subset of the provided simulations, the machine learning models are able to make a good prediction of the parameter ƒ* when є* and ƒcoll have been provided, suggesting that they provide a good enough indicator for estimating the mass of the most massive object relative to the total stellar mass. We provide various examples where ƒ* is reconstructed as a function of є* via the trained models, showing a good agreement with the original simulation data.

The results obtained here suggest that the outcome of gravitational collapse is not universal, but rather could be divided into different regimes. An approximate characterization of these regimes can be achieved via the collision parameter introduced here, which is roughly defined as the length scale of the system divided by the typical collision length. While in principle in all simulations ƒtot increases with є*, the steepness of the relation depends specifically on the level of instability characterized through this parameter. We do not claim here that it is the only parameter that affects the formation of massive objects during gravitational collapse, but at least it provides a good description for the data that we analyzed here.

Beyond establishing the relation between ƒ* and є*, the potential to form very massive objects further depends on the maximum star formation efficiency that can be reached in a given environment (Girichidis et al. 2020). The latter is likely limited by feedback, for example in terms of ionizing radiation (e.g., Haid et al. 2018; Latif et al. 2021), feedback from stellar winds (Das et al. 2021) or supernova explosions (Wise & Abel 2008; Latif & Schleicher 2020). The strength of the feedback may even depend on the evolution of the supermassive star itself, particularly if it remains in a bloated state with moderate atmospheric temperatures where radiation feedback remains inefficient for a longer period of time (e.g., Hosokawa et al. 2012, 2013; Schleicher et al. 2013; Woods et al. 2019), as well as the transition point when the gravitational instability occurs and the star is turned into a massive black hole (Umeda et al. 2016; Haemmerlé et al. 2018), thereby changing the feedback dynamics. As in most of the simulations analyzed here, feedback has not been explicitly included, we expect that our results will only hold until feedback significantly affects the dynamics of the collapse. While the results obtained here do not allow us to determine the maximum star formation efficiency that can be achieved, they at least tell us the expected value of ƒ* for a given value of є*. Even once all gas is expelled as a result of feedback, the latter does not necessarily imply the end of the evolution, but the central massive object may potentially still grow as a result of collisions, as shown in a variety of different studies (e.g., Sakurai et al. 2017; Reinoso et al. 2018, 2020; Vergara et al. 2021, 2023).

Another limitation concerns the assumption of low turbulent to gravitational energies as well as low to moderate Mach numbers in the initial conditions of the analyzed simulations. This assumption may not always be appropriate, and particularly in cores or clouds with very high turbulent Mach numbers (e.g. McKee & Tan 2003; Klessen 2003; Tan et al. 2014), the resulting dynamics could be different. Comparing the dynamics in such environments will certainly be important and interesting, but is also clearly beyond the scope of the current work.

Data availability

The data underlying this article will be shared on reasonable request to the corresponding author.

Acknowledgements

We gratefully acknowledge support by the ANID BASAL project FB21003, as well as via Fondecyt Regular (project code 1201280) and ANID QUIMAL220002. D.R.G.S. thanks for funding via the Alexander von Humboldt - Foundation, Bonn, Germany. R.S.K. acknowledges financial support from the European Research Council via the ERC Synergy Grant “ECOGAL” (project ID 855130), from the German Excellence Strategy via the Heidelberg Cluster of Excellence (EXC 2181 - 390900948) “STRUCTURES”, and from the German Ministry for Economic Affairs and Climate Action in project “MAINN” (funding ID 50OO2206). R.S.K. also thanks for computing resources provided by the Ministry of Science, Research and the Arts (MWK) of the State of Baden-Württemberg through bwHPC and the German Science Foundation (DFG) through grants INST 35/1134-1 FUGG and 35/1597-1 FUGG, and also for data storage at SDS@hd funded through grants INST 35/1314-1 FUGG and INST 35/1503-1 FUGG. K.O. thanks for funding by the Grants-in-Aid for Basic Research by the Ministry of Education, Science and Culture of Japan (KO:22H00149). B.R. acknowledges funding through ANID (CONICYT-PFCHA/Doctorado acuerdo bilateral DAAD/62180013), DAAD (funding program number 57451854), the International Max Planck Research School for Astronomy and Cosmic Physics at the University of Heidelberg (IMPRS-HD), and the European Research Council via ERC Consolidator Grant KETJU (no. 818930). L.P. acknowledges support from the Irish Research Council Laureate programme under grant number IRCLA/2022/1165, as well as the DiRAC@Durham facility managed by the Institute for Computational Cosmology on behalf of the STFC DiRAC HPC Facility (www.dirac.ac.uk). The equipment was funded by BEIS capital funding via STFC capital grants ST/P002293/1, ST/R002371/1 and ST/S002502/1, Durham University and STFC operations grant ST/R000832/1. DiRAC is part of the National e-Infrastructure. L.P. also acknowledges the support of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government. R.R. thanks for funding through Agencia Nacional de Investigación y Desarrollo (ANID) (project code SA77210037). P.S. acknowledges support through ANID/Doctorado en el Extranjero convocatoria 2022 (funding number 72220198), the Federal Ministry of Education and Research and the state governments for supporting this project as part of the joint funding of National High Performance Computing (NHR) and the Kultrun cluster hosted at the Departamento de Astronomía, Universidad de Concepción.

Appendix A ML models and hyperparameters

In this section we detail each solution obtained at the end of the workflow presented in Sec. 4, including hyperparameters and preprocessing. All solutions were trained from scratch with 3 different splits of data for K-fold cross-validation:

  • k-Nearest Neighbours:

    • Feature extraction: Identity transformation.

    • Model specifications: k = 30 and distance defined as the Euclidean distance (l2 norm).

    • Training Score: RMSE= 0.159 ± 0.0004.

    • Validation Score: RMSE= 0.160 ± 0.0033.

  • SVM:

    • Feature extraction: Identity transformation.

    • Model specifications: Radial basis function (RBF) kernel with γ = 10−5, and a regularization parameter C = 1000.

    • Training Score: RMSE= 0.196 ± 0.0022.

    • Validation Score: RMSE= 0.212 ± 0.0014.

  • Random Forest:

    • Feature extraction: Standard Scaling of numerical values.

    • Model specifications: The number of estimator is set to n = 999 and the minimum number of samples required to be a leaf node is defined as 2.

    • Training Score: RMSE= 0.054 ± 0.0005.

    • Validation Score: RMSE= 0.067 ± 0.0008.

  • XGBoost:

    • Feature extraction: t-test feature selection, the fraction of total number of features selected from the training dataset is 0.2.

    • Model specifications: Learning rate α = 0.246, the minimum loss reduction required to make a further partition γ = 0.113, and a maximum depth for each decision tree of 16.

    • Training Score: RMSE= 0.064 ± 0.0002.

    • Validation Score: RMSE= 0.076 ± 0.0023.

  • LightGBM:

    • Feature extraction: t-test feature selection, the fraction of total number of features selected from the training dataset is 0.2.

    • Model specifications: The maximum number of leaves for each decision tree is set to 185, with a maximum depth of 180.

    • Training Score: RMSE= 0. 067 ± 0. 0004.

    • Validation Score: RMSE= 0.076 ± 0.0056.

Finally, in Fig. A.1, we provide the correlation plots for the predictions of ƒ*(t) for all models. The summary of the metrics relative to these predictions with the test set can be found in Table 2.

All the solutions are programmed in Python 3 (version: 3.12.2) under the framework of AutoML, which is executed automatically. However, the implementations of the models come from open-source Python packages, including scikit-learn7, Faiss8, LightGBM9, and XGBoost10. The trained models and respective statistical weights shown in this article will be shared upon reasonable request to the corresponding author.

thumbnail Fig. A.1

Correlation plots for the test set predictions. Results are shown for all candidates, we also perform a Gaussian kernel density estimation to highlight the regions with more data points.

References

  1. Allison, R. J., Goodwin, S. P., Parker, R. J., et al. 2009, ApJ, 700, L99 [Google Scholar]
  2. Altman, N. S. 1992, Am. Stat., 46, 175 [Google Scholar]
  3. Bañados, E., Venemans, B. P., Decarli, R., et al. 2016, ApJS, 227, 11 [Google Scholar]
  4. Begelman, M. C., Volonteri, M., & Rees, M. J. 2006, MNRAS, 370, 289 [NASA ADS] [CrossRef] [Google Scholar]
  5. Bishop, C. M. 2006, Pattern Recognition and Machine Learning (Information Science and Statistics) (Berlin, Heidelberg: Springer-Verlag) [Google Scholar]
  6. Boekholt, T. C. N., Schleicher, D. R. G., Fellhauer, M., et al. 2018, MNRAS, 476, 366 [Google Scholar]
  7. Bonnell, I. A., & Davies, M. B. 1998, MNRAS, 295, 691 [CrossRef] [Google Scholar]
  8. Bonnell, I. A., Vine, S. G., & Bate, M. R. 2004, MNRAS, 349, 735 [Google Scholar]
  9. Bonnor, W. B. 1956, MNRAS, 116, 351 [NASA ADS] [CrossRef] [Google Scholar]
  10. Breiman, L. 2001, Mach. Learn., 45, 5 [Google Scholar]
  11. Bromm, V., & Loeb, A. 2003, Nature, 425, 812 [NASA ADS] [CrossRef] [Google Scholar]
  12. Chen, T., & Guestrin, C. 2016, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ‘16 (New York, NY, USA: Association for Computing Machinery), 785 [CrossRef] [Google Scholar]
  13. Chon, S., & Omukai, K. 2020, MNRAS, 494, 2851 [Google Scholar]
  14. Chon, S., Hirano, S., Hosokawa, T., & Yoshida, N. 2016, ApJ, 832, 134 [NASA ADS] [CrossRef] [Google Scholar]
  15. Chon, S., Hosokawa, T., & Yoshida, N. 2018, MNRAS, 475, 4104 [CrossRef] [Google Scholar]
  16. Clark, P. C., Glover, S. C. O., Smith, R. J., et al. 2011, Science, 331, 1040 [Google Scholar]
  17. Cortes, C., & Vapnik, V. 1995, Mach. Learn., 20, 273 [Google Scholar]
  18. Das, A., Schleicher, D. R. G., Basu, S., & Boekholt, T. C. N. 2021, MNRAS, 505, 2186 [NASA ADS] [CrossRef] [Google Scholar]
  19. Davies, M. B., Miller, M. C., & Bellovary, J. M. 2011, ApJ, 740, L42 [NASA ADS] [CrossRef] [Google Scholar]
  20. Devecchi, B., & Volonteri, M. 2009, ApJ, 694, 302 [NASA ADS] [CrossRef] [Google Scholar]
  21. Ebert, R. 1955, ZAp, 37, 217 [Google Scholar]
  22. Escala, A. 2021, ApJ, 908, 57 [NASA ADS] [CrossRef] [Google Scholar]
  23. Field, G. B., Somerville, W. B., & Dressler, K. 1966, ARA&A, 4, 207 [NASA ADS] [CrossRef] [Google Scholar]
  24. Fix, E., & Hodges, J. L. 1989, Int. Stat. Rev., 57, 238 [CrossRef] [Google Scholar]
  25. Friedman, J. H. 2001, Annal. Stat., 29, 1189 [CrossRef] [Google Scholar]
  26. Fryxell, B., Olson, K., Ricker, P., et al. 2000, ApJS, 131, 273 [Google Scholar]
  27. Fujii, M., Iwasawa, M., Funato, Y., & Makino, J. 2007, PASJ, 59, 1095 [NASA ADS] [Google Scholar]
  28. Girichidis, P., Offner, S. S. R., Kritsuk, A. G., et al. 2020, Space Sci. Rev., 216, 68 [NASA ADS] [CrossRef] [Google Scholar]
  29. Grete, P., Latif, M. A., Schleicher, D. R. G., & Schmidt, W. 2019, MNRAS, 487, 4525 [NASA ADS] [CrossRef] [Google Scholar]
  30. Haemmerlé, L., Woods, T. E., Klessen, R. S., Heger, A., & Whalen, D. J. 2018, MNRAS, 474, 2757 [Google Scholar]
  31. Haid, S., Walch, S., Seifried, D., et al. 2018, MNRAS, 478, 4799 [Google Scholar]
  32. Hernquist, L., & Katz, N. 1989, ApJS, 70, 419 [NASA ADS] [CrossRef] [Google Scholar]
  33. Ho, T. K. 1998, IEEE Trans. Pattern Anal. Mach. Intell., 20, 832 [Google Scholar]
  34. Hosokawa, T., & Omukai, K. 2009, ApJ, 703, 1810 [NASA ADS] [CrossRef] [Google Scholar]
  35. Hosokawa, T., Omukai, K., & Yorke, H. W. 2012, ApJ, 756, 93 [Google Scholar]
  36. Hosokawa, T., Yorke, H. W., Inayoshi, K., Omukai, K., & Yoshida, N. 2013, ApJ, 778, 178 [Google Scholar]
  37. Hubber, D. A., Walch, S., & Whitworth, A. P. 2013, MNRAS, 430, 3261 [NASA ADS] [CrossRef] [Google Scholar]
  38. Inayoshi, K., Omukai, K., & Tasker, E. 2014, MNRAS, 445, L109 [NASA ADS] [CrossRef] [Google Scholar]
  39. Inayoshi, K., Visbal, E., & Haiman, Z. 2020, ARA&A, 58, 27 [NASA ADS] [CrossRef] [Google Scholar]
  40. Katz, H., Sijacki, D., & Haehnelt, M. G. 2015, MNRAS, 451, 2352 [Google Scholar]
  41. Kazantsev, A. P. 1968, Sov. Phys. JETP, 26, 1031 [Google Scholar]
  42. Ke, G., Meng, Q., Finley, T., et al. 2017, in Advances in Neural Information Processing Systems, eds. I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (USA: Curran Associates, Inc.), 30 [Google Scholar]
  43. Klessen, R. S. 2003, in Reviews in Modern Astronomy, ed. R. E. Schielicke (Berlin: Springer), 16, 23 [NASA ADS] [Google Scholar]
  44. Klessen, R. S., & Glover, S. C. O. 2023, ARA&A, 61, 65 [NASA ADS] [CrossRef] [Google Scholar]
  45. Koushiappas, S. M., Bullock, J. S., & Dekel, A. 2004, MNRAS, 354, 292 [CrossRef] [Google Scholar]
  46. Kroupa, P., Subr, L., Jerabkova, T., & Wang, L. 2020, MNRAS, 498, 5652 [Google Scholar]
  47. Larson, R. L., Finkelstein, S. L., Kocevski, D. D., et al. 2023, ApJ, 953, L29 [NASA ADS] [CrossRef] [Google Scholar]
  48. Latif, M. A., & Schleicher, D. R. G. 2015, A&A, 578, A118 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  49. Latif, M. A., & Schleicher, D. 2020, ApJ, 902, L31 [NASA ADS] [CrossRef] [Google Scholar]
  50. Latif, M. A., Schleicher, D. R. G., Schmidt, W., & Niemeyer, J. 2013, MNRAS, 433, 1607 [NASA ADS] [CrossRef] [Google Scholar]
  51. Latif, M. A., Schleicher, D. R. G., Bovino, S., Grassi, T., & Spaans, M. 2014, ApJ, 792, 78 [NASA ADS] [CrossRef] [Google Scholar]
  52. Latif, M. A., Omukai, K., Habouzit, M., Schleicher, D. R. G., & Volonteri, M. 2016, ApJ, 823, 40 [NASA ADS] [CrossRef] [Google Scholar]
  53. Latif, M. A., Khochfar, S., Schleicher, D., & Whalen, D. J. 2021, MNRAS, 508, 1756 [NASA ADS] [CrossRef] [Google Scholar]
  54. Lupi, A., Colpi, M., Devecchi, B., Galanti, G., & Volonteri, M. 2014, MNRAS, 442, 3616 [NASA ADS] [CrossRef] [Google Scholar]
  55. McKee, C. F., & Tan, J. C. 2003, ApJ, 585, 850 [Google Scholar]
  56. McMillan, S. L. W., & Hut, P. 1996, ApJ, 467, 348 [CrossRef] [Google Scholar]
  57. Myers, A. T., McKee, C. F., Cunningham, A. J., Klein, R. I., & Krumholz, M. R. 2013, ApJ, 766, 97 [Google Scholar]
  58. Olczak, C., Spurzem, R., & Henning, T. 2011, A&A, 532, A119 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  59. Omukai, K., Tsuribe, T., Schneider, R., & Ferrara, A. 2005, ApJ, 626, 627 [NASA ADS] [CrossRef] [Google Scholar]
  60. Omukai, K., Schneider, R., & Haiman, Z. 2008, ApJ, 686, 801 [Google Scholar]
  61. Pakmor, R., Bauer, A., & Springel, V. 2011, MNRAS, 418, 1392 [NASA ADS] [CrossRef] [Google Scholar]
  62. Pelupessy, F. I., van Elteren, A., de Vries, N., et al. 2013, A&A, 557, A84 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  63. Peters, T., Banerjee, R., Klessen, R. S., et al. 2010a, ApJ, 711, 1017 [NASA ADS] [CrossRef] [Google Scholar]
  64. Peters, T., Klessen, R. S., Mac Low, M.-M., & Banerjee, R. 2010b, ApJ, 725, 134 [Google Scholar]
  65. Plummer, H. C. 1911, MNRAS, 71, 460 [Google Scholar]
  66. Prole, L. R., Clark, P. C., Klessen, R. S., & Glover, S. C. O. 2022a, MNRAS, 510, 4019 [NASA ADS] [CrossRef] [Google Scholar]
  67. Prole, L. R., Clark, P. C., Klessen, R. S., Glover, S. C. O., & Pakmor, R. 2022b, MNRAS, 516, 2223 [NASA ADS] [CrossRef] [Google Scholar]
  68. Prole, L. R., Schauer, A. T. P., Clark, P. C., et al. 2023, MNRAS, 520, 2081 [NASA ADS] [CrossRef] [Google Scholar]
  69. Prole, L. R., Regan, J. A., Glover, S. C. O., et al. 2024, A&A, 685, A31 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  70. Quinlan, G. D., & Shapiro, S. L. 1987, ApJ, 321, 199 [NASA ADS] [CrossRef] [Google Scholar]
  71. Quinlan, G. D., & Shapiro, S. L. 1990, ApJ, 356, 483 [NASA ADS] [CrossRef] [Google Scholar]
  72. Rees, M. J. 1984, ARA&A, 22, 471 [Google Scholar]
  73. Regan, J. A., Visbal, E., Wise, J. H., et al. 2017, Nat. Astron., 1, 0075 [Google Scholar]
  74. Reinoso, B., Schleicher, D. R. G., Fellhauer, M., Klessen, R. S., & Boekholt, T. C. N. 2018, A&A, 614, A14 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  75. Reinoso, B., Schleicher, D. R. G., Fellhauer, M., Leigh, N. W. C., & Klessen, R. S. 2020, A&A, 639, A92 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  76. Reinoso, B., Klessen, R. S., Schleicher, D., Glover, S. C. O., & Solar, P. 2023, MNRAS, 521, 3553 [CrossRef] [Google Scholar]
  77. Riaz, R., Schleicher, D. R. G., Vanaverbeke, S., & Klessen, R. S. 2020, MNRAS, 494, 1647 [NASA ADS] [CrossRef] [Google Scholar]
  78. Sakurai, Y., Yoshida, N., Fujii, M. S., & Hirano, S. 2017, MNRAS, 472, 1677 [NASA ADS] [CrossRef] [Google Scholar]
  79. Schleicher, D. R. G., Spaans, M., & Glover, S. C. O. 2010, arXiv e-prints [arXiv: 1002.2850] [Google Scholar]
  80. Schleicher, D. R. G., Palla, F., Ferrara, A., Galli, D., & Latif, M. 2013, A&A, 558, A59 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  81. Schleicher, D. R. G., Reinoso, B., Latif, M., et al. 2022, MNRAS, 512, 6192 [NASA ADS] [CrossRef] [Google Scholar]
  82. Seifried, D., Banerjee, R., Klessen, R. S., Duffin, D., & Pudritz, R. E. 2011, MNRAS, 417, 1054 [NASA ADS] [CrossRef] [Google Scholar]
  83. Solar, P. A., Schleicher, D. R. G., Reinoso, B., & Klessen, R. S. 2022, Boletin de la Asociacion Argentina de Astronomia La Plata Argentina, 63, 277 [NASA ADS] [Google Scholar]
  84. Springel, V. 2005, MNRAS, 364, 1105 [Google Scholar]
  85. Springel, V. 2010, MNRAS, 401, 791 [Google Scholar]
  86. Srinivas, N., Krause, A., Kakade, S. M., & Seeger, M. W. 2012, IEEE Trans. Inform. Theor., 58, 3250 [CrossRef] [Google Scholar]
  87. Stahler, S. W., Palla, F., & Salpeter, E. E. 1986, ApJ, 302, 590 [NASA ADS] [CrossRef] [Google Scholar]
  88. Stecher, T. P., & Williams, D. A. 1967, ApJ, 149, L29+ [Google Scholar]
  89. Suazo, M., Prieto, J., Escala, A., & Schleicher, D. R. G. 2019, ApJ, 885, 127 [NASA ADS] [CrossRef] [Google Scholar]
  90. Tagawa, H., Haiman, Z., & Kocsis, B. 2020, ApJ, 892, 36 [Google Scholar]
  91. Tan, J. C., Beltrán, M. T., Caselli, P., et al. 2014, in Protostars and Planets VI, eds. H. Beuther, R. S. Klessen, C. P. Dullemond, & T. Henning (Tucson: University of Arizona Press), 149 [Google Scholar]
  92. Umeda, H., Hosokawa, T., Omukai, K., & Yoshida, N. 2016, ApJ, 830, L34 [Google Scholar]
  93. Vanaverbeke, S., Keppens, R., Poedts, S., & Boffin, H. 2009, Comp. Phys. Commun., 180, 1164 [NASA ADS] [CrossRef] [Google Scholar]
  94. Vergara, M. Z. C., Schleicher, D. R. G., Boekholt, T. C. N., et al. 2021, A&A, 649, A160 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  95. Vergara, M. C., Escala, A., Schleicher, D. R. G., & Reinoso, B. 2023, MNRAS, 522, 4224 [NASA ADS] [CrossRef] [Google Scholar]
  96. Wise, J. H., & Abel, T. 2008, ApJ, 685, 40 [CrossRef] [Google Scholar]
  97. Wise, J. H., Turk, M. J., & Abel, T. 2008, ApJ, 682, 745 [NASA ADS] [CrossRef] [Google Scholar]
  98. Woods, T. E., Agarwal, B., Bromm, V., et al. 2019, PASA, 36, e027 [Google Scholar]
  99. Zhou, N., & Wang, L. 2007, Genom. Proteom. Bioinform., 5, 242 [CrossRef] [Google Scholar]

All Tables

Table 1

Summary of the simulations employed here.

Table 2

Summary of test stage for the different models.

All Figures

thumbnail Fig. 1

Fraction f* of the mass of most massive object MMMO normalized by the total stellar mass M* shown as a function of the star formation efficiency for the simulations summarized in Table 1. The gray arrow indicates the direction of the time evolution. For those authors with more than 8 simulations, we display some representative examples.

In the text
thumbnail Fig. 2

Fraction ftot of the mass of the most massive object MMMO normalized by the total mass Mtot as a function of the star formation efficiency ϵ* for the simulations summarized in Table 1. The black dotted line corresponds to the identity function, indicating the scenario where all the stellar mass is concentrated within the central object, representing the maximum possible value for ftot. The gray arrow indicates the direction of the time evolution. For those authors with more than 8 simulations, we display some representative examples.

In the text
thumbnail Fig. 3

Fraction f* of the mass of the most massive object MMMO normalized by the total stellar mass M* as a function of the collision parameter fcoll1$f_{{\rm{coll}}}^{ - 1}$ for the simulations summarized in Table 1. The gray arrow indicates the direction of the time evolution. For those authors with more than 8 simulations, we display some representative examples.

In the text
thumbnail Fig. 4

Correlogram for the dimensionless parameters calculated in this work. The values shown correspond to Spearman’s correlation coefficient ρs, which assesses the monotonic relationship between the ranks of the paired data points.

In the text
thumbnail Fig. 5

Correlation plot for the average of the ƒ* (t) predictions of the test set by RF. The color palette represents the different authors of the simulations, and we include the RMSE for each case.

In the text
thumbnail Fig. 6

Examples of the dispersion of ƒ* in simulations of Solar et al. (in prep.) (2 runs, top panel) and Reinoso et al. (2023) (6 runs, bottom panel) with the same initial conditions but different seeds. The dashed line corresponds to the average between the simulations and the shading to the standard deviation at each point of є*.

In the text
thumbnail Fig. 7

Predicted values for ƒ* (t) by RF are depicted. The dashed black line represents the values of the simulation, the red dots indicate the average of the predictions, and the shaded area represents the standard deviation of predictions. We present random examples for all authors and for different ranges of є* (t).

In the text
thumbnail Fig. A.1

Correlation plots for the test set predictions. Results are shown for all candidates, we also perform a Gaussian kernel density estimation to highlight the regions with more data points.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.