The most massive stars in very young star clusters with a limited mass: Evidence favours significant self-regulation in the star formation processes

The stellar initial mass function (IMF) is commonly interpreted to be a scale-invariant probability density distribution function (PDF) such that many small clusters yield the same IMF as one massive cluster of the same combined number of stars. Observations of the galaxy-wide IMF challenge this as dwarf galaxies do not form as many massive stars as expected. This indicates a highly self-regulated star formation process in which stellar masses are not stochastically sampled from the IMF and are instead related to the environment of star formation. Here, the nature of star formation is studied using the relation between the most massive star born in a star cluster and its parental stellar cluster mass (the $m_{\rm max}$--$M_{\rm ecl}$ relation). This relation has been argued to be a statistical effect if stars are sampled randomly from the IMF. By comparing the tightness of the observed $m_{\rm max}$--$M_{\rm ecl}$ distribution with synthetic star clusters with stochastically sampled stellar masses, we find that the expected dispersion of the mock observations is much larger than the observed dispersion. Assuming that $m_{\rm max}$ and $M_{\rm ecl}$ uncertainties from the literature are correct, our test rejects the hypothesis that the IMF is a PDF at a more than $4.5\sigma$ confidence level. Alternatively, we provide a deterministic stellar mass sampling tool which reproduces the observed $m_{\rm max}$--$M_{\rm ecl}$ distribution and compares well with the luminosities of star-forming molecular clumps. In addition, we find that there is a significant flattening of the $m_{\rm max}$--$M_{\rm ecl}$ relation near $m_{\rm max}=13~M_\odot$. This may suggest strong feedback of stars more massive than about $13~M_\odot$ and/or the ejections of the most massive stars from young clusters in the mass range 63 to $400~M_\odot$ to be likely important physical processes in forming clusters.


Introduction
The nature of the discreteness of the stellar initial mass distribution function (IMF) is important to consider (Applebaum et al. 2020;Smith 2021).Different methods to sample stellar masses from an IMF have been discussed (Sormani et al. 2017;Hu et al. 2017;Hirai et al. 2021) and applied in an increasing number of recent studies of stellar systems with a low mass or a low star formation rate (SFR) and a high resolution of time or mass, starting to account for the evolution of individual massive stars (Gatto et al. 2017;Emerick et al. 2019;Lahén et al. 2020;Gutcke et al. 2021;Hislop et al. 2022) and even low-mass stars down to 0.08 M (where M is the solar mass; Hu 2019; Smith et al. 2021;Lahén et al. 2022).
It has been long known that simple stellar population (SSP) modelling codes, such as GALEV (Anders & Fritze-v. Alvensleben 2003) and Starburst99 (Vázquez & Leitherer 2005), neglect the discrete nature of the stellar mass distribution.This introduces a systematic offset in the colour prediction for star clusters that are young (<80 Myr) or have a low mass (<10 6 M ; Cerviño & Luridiana 2004).Ignoring the discreteness of the IMF significantly affects the inferred properties of low-mass systems, such as the evolution of star clusters (Wang & Jerabkova 2021;Dinnbier et al. 2022) or determinations of their age and mass, introducing errors in cluster ages as high as an order of magnitude (Piskunov et al. 2009).
In dynamical simulations of galaxies, an ensemble of stars -referred to as a 'star particle' -with a mass resolution of for example 10 5 M is often tracked instead of individual stars.In the simulation of single dwarf galaxies (and in recent years, the cosmological galaxy simulations, which are pushing to higher and higher resolutions to study ultra-faint galaxies), the number of stars in a galaxy or star particle becomes low enough that the integrated number of stars between two mass limits of the IMF (given by the considered time step and the stellar mass-lifetime relation, cf.Yan et al. 2019 their Fig. 3) can be less than one.If the IMF is still given as a continuous function, a fraction of a supernova explosion would result in such time steps in a simulation (Revaz et al. 2016;Applebaum et al. 2020).It has been demonstrated that a more realistic description of a stellar population, with a list of sampled stellar masses, alters the property and the evolution of the simulated galaxy compared to a model with a continuous energy injection due to the smooth IMF assumption, affecting the galactic SFR (Su et al. 2018), the strength of photoionisation (Smith 2021), galactic outflow (Smith et al. 2021), and chemical composition (Carigi & Hernandez 2008).
However, an important question that has not been resolved is how to sample stellar masses from a given IMF.There are two extreme sampling methods: stochastic sampling (also known as random sampling) and optimal sampling.Stochastic sampling is based on the hypothesis that the formation of every single star is purely an independent event and the mass of the star is randomly sampled from a probability density distribution function (PDF); that is, the IMF is considered to be a PDF.On the other end, optimal sampling (Kroupa et al. 2013;Schulz et al. 2015) assumes that the formation of all the stars (or at least all the massive stars that can be more accurately counted) in a star cluster is strongly correlated because of self-regulation of the star formation process, and therefore that a deterministic relationship exists between the mass of a star cluster and the mass of every single (massive) star within that cluster.
Theoretically, the sampling method relates to the fundamental astrophysics knowledge of the formation process of massive stars, which is not well understood.If the star formation process is sensitive to the initial conditions of the molecular cloud, such as angular momentum, shock wave behaviour, and so on, then stochastic sampling will be a reasonable approximation.For example, in the competitive accretion model (Bonnell et al. 1997(Bonnell et al. , 1998(Bonnell et al. , 2001)), the stellar mass is gathered during the starformation process as the parent cloud is randomly influenced by dynamical interactions.On the other hand, it is also possible that massive star formation is strongly regulated by feedback processes such as winds, outflows, radiation, accretion shocks, and the magnetic field of the parent cloud (Shu et al. 1987;Silk 1995;Adams & Fatuzzo 1996;Li & Nakamura 2006;Crutcher & Troland 2007) such that the random initial conditions are not as important.The stars formed in a single star formation event may then follow the IMF tightly with much more certainty than the random scenario.Another possibility is that a significant self-regulated formation of star clusters does not even require feedback regulation, as nature can automatically form the power-law distribution through scale-invariance, self-similarity, and iterative processes.For example, in preferential attachment phenomena and diffusion-limited aggregation, a large number of random processes (instead of a single stochastic event) determine the outcome, which becomes highly deterministic.The state-of-art star cluster formation simulation, STAR-FORGE, appears to support this view (Guszejnov et al. 2022).Despite the chaotic and complex physical processes involved, the simulated maximum stellar masses in embedded star clusters1 presented by these latter authors seem to have too little scatter to be explained by stochastic sampling schemes and to show strong evidence of a correlation between the embedded star cluster mass and the mass of the most massive star (m max −M ecl relation), which is consistent with optimal sampling.
A variety of observational clues suggesting a highly selfregulated star formation process instead of a statistical process were discussed by Kroupa (2015, Table 1 therein).A non-trivial correlation between m max and M ecl is evident through empirical (Larson 1982(Larson , 2003)), analytical (Elmegreen 2000), and numerical (Bonnell et al. 2003(Bonnell et al. , 2004) ) studies.The tight rela-tion is particularly emphasised by Weidner & Kroupa (2006), Weidner et al. (2010Weidner et al. ( , 2013) ) and reinforced by Stephens et al. (2017) and Yan et al. (2017), suggesting a significant selfregulation in the star formation process.André et al. (2016) find that stars form in molecular-cloud overdensities from inflows along filaments with remarkably similar cross sections of about 0.1 pc in radius, suggesting that star formation may be governed by relatively simple universal laws.A direct comparison of the fibre networks in B213 in Taurus and the NGC 1333 region in Perseus suggests that the differences between the classically distinguished isolated and clustered star-formation scenarios might originate from the same type of fibre with different densities.Furthermore, interactions between fibres may be responsible for the formation of massive cores and stars (Hacar et al. 2017).Recent observations also reveal regions with numerous low-mass correlated star formation events (CSFEs, also known as embedded clusters) that do not possess any massive star.For example, the whole southern part of the Orion southern cloud, L1641, despite being the origin of the formation of thousands of stars, has not formed a single massive star (see e.g.Hsu et al. 2012Hsu et al. , 2013;;Megeath et al. 2016).This is consistent with the m max −M ecl relation because the southern cloud has only been forming low-mass embedded clusters.These CSFEs have radii consistent with the filament cross sections (Marks & Kroupa 2012).In addition, González-Lópezlira et al. (2012) and Pflamm-Altenburg et al. ( 2013) study the cluster mass distribution at different galactic radii of the flocculent galaxy M 33 and conclude that their mass is not randomly distributed.
Motivated by the above observations, Kroupa et al. (2013, their Sect. 2.2) suggest that the optimal sampling hypothesis better describes nature.In particular, optimal sampling is crucial to understand the top-light IMF (lack of massive stars compared to canonical IMF) of dwarf galaxies, which is evident from their Hα-to-FUV luminosity ratio (Pflamm-Altenburg et al. 2007, 2009;Lee et al. 2009), the mass-metallicity relation of galaxies (Köppen et al. 2007), and their α-element enhancement in comparison to the Milky Way (MW, Recchi et al. 2009;Yan et al. 2020;Minelli et al. 2021;Mucciarelli et al. 2021).With optimal sampling, the dwarf galaxies with a low SFR do not form massive stars (Yan et al. 2017, their Fig. 7).This is not the case in stochastic sampling, where the formation of massive stars happens randomly throughout the formation history of a galaxy such that the ratio between all the low-mass and massive stars ever formed (and therefore the chemical evolution) does not depend on the mass of the galaxy.
Until now, the most common way of generating stellar masses for a star system in astrophysical simulations has been stochastic generation (Elmegreen 2006;da Silva et al. 2012).After all, the observed star clusters with similar masses do not appear to repeat themselves but appear to have random differences in their stellar mass distributions.However, it is unclear as to whether these differences are intrinsically due to the random nature of star formation or if they come from hidden parameters such as the metallicity (Marks et al. 2012) and/or the angular momentum of the parental molecular cloud through the ejection of massive stars from their birth clusters (Oh et al. 2015;Oh & Kroupa 2018).The observation that most massive stars have an unresolved companion star with an apparently random mass ratio of the two stars (Moe & Di Stefano 2017) also introduces the complication of binary stellar evolution.Furthermore, the total stellar mass of an embedded star cluster is uncertain, because of extinction and other measurement uncertainties.Therefore, apparent randomness does not necessarily suggest a A151, page 2 of 16 random star formation process, and a more detailed investigation is required.
The present work provides a new and IMF-independent statistical test as to the degree of self-regulation in the star formation process.The observational dataset that we use to generate the m max −M ecl relation is given in Sect. 2. Our statistical test method and IMF assumptions are described in Sect.3. Statistical tests for random and optimal sampling methods are performed in Sects.4 and 5, respectively.In addition, we find a large M ecl variance around m max = 13 M for the observed star clusters, and this is discussed in Sect.6.We further discuss the uncertainties of this study in Sect.7, discuss previous studies and the implementation of different sampling methods in Sect.8, and give conclusions in Sect.9.

Data selection
We consider the observed values of 'embedded' star cluster masses, M ecl , and the mass of the most massive star in that star cluster, m max , for star clusters with an age of less than 5 Myr.
Compared to the dataset in WKP13 and Yan et al. (2017), we apply an additional data selection on the age of a star cluster and the number of stars in a cluster.The age limit reduces the possibility of having an old cluster in the sample that is highly evolved dynamically or for which the most massive star is already dead in a supernova explosion.The most massive stars will disappear within the first few million years of cluster evolution through dynamical ejections (Oh & Kroupa 2016) and stellar evolution, while massive stars also merge (Oh & Kroupa 2018).The shortest lifetime of a star is no less than about two to three million years (Yusof et al. 2013;Yan et al. 2019, their Fig. 3, Higgins et al. 2022, their Fig. 6).Although WKP13 culled data only for clusters with an age estimation smaller than 5 Myr, some of the age estimations have a large uncertainty and may have age as old as 9 Myr within the standard deviation of the age estimation.To reduce uncertainties introduced by older star clusters, we exclude the tail of the age distribution from the above data reservoir, excluding all the star clusters with an upper limit on the age estimation of above 5 Myr.The young ages of the clusters mean that their expansion through gas expulsion is minimised (to 5 pc for a velocity dispersion of 1 km s −1 ) such that the loss of stars is small (Kroupa et al. 2001;Kroupa & Bouvier 2003).However, ejections and mergers of massive stars are likely (Oh & Kroupa 2012, 2018;Kroupa et al. 2018).In addition, clusters with less than 20 detected members (or if no information is provided) are excluded.Clusters with fewer members are strongly affected by dynamical evaporation (cf.Lada & Lada 2003, their Sect. 2.1) and can bias our sample (see Sect. 7.2).In addition, the m max −M ecl relation for low-mass clusters is not sensitive enough to test the sampling method given the large mass measurement uncertainties.For massive clusters, a low number of single star detections indicates different mass-estimation methods (e.g.use of the dynamical cluster mass instead of summing up the masses of individual stars; see Sect.7.2) and therefore different mass-estimation uncertainties compared to most star clusters.Therefore, we uniformly exclude such clusters from our dataset.
After the selection, a total of 100 clusters remain; a list of them is provided in Table A.1 and plotted in Fig. 1.The average observational uncertainties in the logarithmic scale for M ecl and m max are +0.289,−0.313 dex and +0.143, −0.180 dex, respectively.The accuracy of these uncertainties is discussed in Sect.7.4.Scaling relations are applied to calculate the precluster molecular cloud mass and core density as well as the bolometric luminosity of the most massive star, which can be more easily compared with observations (e.g.Urquhart et al. 2022).The results are shown in Sect.8.3.
The errors and biases of the compiled dataset affect our results as discussed in Sect.7. In addition, we acknowledge that the stellar mass upper limit is highly uncertain.There are studies suggesting that stars with mass >300 M (Crowther et al. 2010;Schneider et al. 2018;Higgins et al. 2022) might be the result of mergers of massive stars (Banerjee et al. 2012a;Oh & Kroupa 2018).However, the mass estimation of massive stars is not the topic of the present study and our conclusion is not affected if we adopt a higher stellar mass estimation.For the present tests, we apply the compiled data along with the measurement errors given by the literature sources.A homogeneous set of observations with higher completeness (the current dataset includes less than about 2% of all the star clusters in the MW; cf.Weidner et al. 2010) and a more reliable mass uncertainty analysis will, in the future, greatly improve the robustness of our study.

Statistical test
As pointed out in WKP13, the scatter of the observed m max −M ecl relation appears to be consistent with no intrinsic scatter, which suggests a high degree of self-regulation.To test if nature is indeed significantly more regulated than the stochastic sampling hypothesis, we compare the scatter of the observed relation with that of the synthesised m max −M ecl relation generated by stochastic sampling.
With forward modelling, a large number of star clusters and their mock observational errors are stochastically generated to form the expected m max −M ecl density distribution.The mock clusters are set to have a cluster mass distribution similar to the observational dataset, which has a relatively uniform distribution of mass on the logarithmic scale.Therefore, we generate 10 6 star clusters containing N stars, where N is randomly selected from 20 to 10 6 with a uniform distribution on the logarithmic scale (we note that the star number and mass distribution are similar; see Sect.2.2 of WKP13).The observed and synthesised cluster mass distribution are different from the embedded cluster mass distribution in nature with a power index of −2 (Lada & Lada 2003) due to an observational bias towards more massive star clusters.
Given the number of stars in each star cluster, the masses of stars are sampled stochastically (e.g.Kroupa et al. 2013) from a PDF, that is, the assumed IMF.The model results depend on both the sampling method and the IMF.To test the sampling method in such a way that the conclusion does not depend on the assumed shape of the IMF, in the present work we apply two free parameters in the IMF: the high-mass IMF slope, α 3 , and the stellar mass upper limit, m up .We search for the two parameters that best reproduce the observed m max −M ecl relation (Sect.3.2) and compare the scatter of the mock star clusters and the data on the m max −M ecl plot for the first time.A different IMF prescription, such as a lognormal IMF for low-mass stars (Chabrier 2003), would introduce a systematic change in the m max −M ecl relation similar to the effect of varying the IMF parameters (α 3 and m up ) and would not affect our conclusions.

A151, page 3 of 16
To compare the synthetic star clusters with observations, mock measurement errors of m max and M ecl are added to the sampled clusters for the first time.The two mass errors are correlated (see discussion in Sect.7.4) but, as a first step, we assume that they are independent and that they follow one-sided normal distributions with standard deviations being the average observational uncertainties from our complied dataset given in Sect. 2. Doing so results in rare cases with m max > M ecl in mock observations.Follow-up studies with a uniformly analysed observational dataset allowing accurate forward modelling of the errors are required to resolve this issue.
As the observation and mock observation of synthetic star clusters are now directly comparable, a two-sample test can be performed on the two-dimensional m max −M ecl relation.To do this, we first measure the variance of m max values for each M ecl bin for the simulated star clusters.As there is no limit on the number of mock star clusters, the bin size can be arbitrarily small.
An observed star cluster with a known M ecl can then be compared with the mean and the variance of all the mock star clusters in the mass bin that includes the value M ecl .After doing this for all the observed star clusters, we can calculate the likelihood of observing a dataset as extreme as the one we compile from the literature.Finally, since the synthetic m max −M ecl relation depends on the sampling method, the confidence level with which we can reject the applied sampling hypothesis can be given.

Initial mass function models
Different IMF assumptions are considered in order to avoid the influence of the IMF model when testing the validity of the sampling methods.The multi-part power-law IMF can be written in general as: (1) The canonical Kroupa (2001) IMF is a two-part power law with, respectively, m low , m turn , m turn 2 = 0.08, 0.5, 3, and m up = 150 M .Possible variations of α 1 and α 2 are discussed in Sect.7.2.The normalisation parameters, k 1 , k 2 , and k 3 are adjusted such that the IMF is a continuous function.
The slope of the high-mass IMF, α 3 , and the physical stellar mass upper limit, m up (=m max * in Weidner & Kroupa 2006;Weidner et al. 2010), affect the m max −M ecl relation significantly and both have large uncertainties.Different choices of these parameters are listed in Table 1.We note that α 3 = 2.68 is similar to the solar-neighbourhood IMF measured in Scalo (1986) and Kroupa et al. (1993).This is steeper than the IMF in star clusters (cf.Bastian et al. 2010), which is expected for a galaxy-wide IMF made by many star clusters (Kroupa & Weidner 2003).The high-mass IMF slope for the field stars is not accurate due to the large uncertainty on the MW star formation history.A recent study shows that the solar-neighbourhood IMF for massive stars can be consistent with the IMF in star clusters (Mor et al. 2019).For initial stellar populations in star clusters, a flatter slope is favoured and α 3 = 2.3 ± 0.7 as is shown by Kroupa (2001Kroupa ( , 2002) ) and Bastian et al. (2010).On the other hand, the estimation of m up depends on the stellar modelling of massive stars and is still Notes.The fiducial a23m150 model adopts the canonical IMF parameters.a23m100 is the best fit IMF model working with stochastic sampling with m up being a free parameter.a268m130 is the best fit IMF model working with stochastic sampling with both α 3 and m up being free parameters.We note that the best-fit m up value listed here is only a reflection of adopted literature mass estimations while higher mass estimations of extremely massive stars have been suggested (Crowther et al. 2010;Schneider et al. 2018;Higgins et al. 2022) but may be mergers (Banerjee et al. 2012a;Oh & Kroupa 2018).
under debate (see Sect. 7.4.4).We vary m up to find the best fit with the data without a prior probability distribution.

Tests of stochastic sampling
4.1.The mock m max -M ecl relation from stochastic sampling To discuss the degree of self-regulation in the star formation processes independent of the IMF shape, we not only test with a fiducial IMF model (a23m150 model in Table 1) but also allow freely variable IMF parameters and test parameter values that can best reproduce the observed m max −M ecl relation (a23m100 and a268m130 models in Table 1 shown in Sects.4.2.2 and 4.2.3).
Figure 1 shows the distribution of 10 6 randomly-sampled embedded clusters on the m max −M ecl relation after applying our fiducial IMF model parameters.The darkness of each region is proportional to the number of clusters in a m max −M ecl bin divided by the total number of clusters in the same M ecl bin; that is, the darkness denotes the distribution of m max for a given M ecl .The mean and the standard deviation of m max for each M ecl bin are also given with the red solid and the blue dashed curves, respectively.Here, the single-sided standard deviations are calculated and shown, assuming that data above and below the mean m max have different variances, but the upper and lower variances are nearly identical.
The observed star clusters more massive than about 100 M are systematically below the expected distribution, as is pointed out in Weidner et al. (2010, their Sect. 3.2).A top-light IMF (model a268m130) can result in better agreement, which we discuss in Sect.4.2.The mass uncertainties of the observations shown in Fig. 1 are large while the distribution of the measurement values is tight, indicating a small intrinsic uncertainty on the star formation process (disfavouring stochastic sampling), an overestimation of the mass uncertainty of observed clusters, and/or a selection bias of the observational dataset.
Figure 2 shows the result after a random observational error is added to each synthetic star cluster.The density distribution shows a larger scatter than the data.The amplitude of this scatter is determined by the sampling method and observational uncertainty.Changing the IMF parameters does not have a significant influence on the scatter but only influences the position and the orientation of the density distribution which is shown in the following section.We note that there are mock clusters with M ecl < m max (points above the solid line in Fig. 2) simply because we assume for simplicity that the observational uncertainty of M ecl and m max are independent, which is not accurate,  1), resulting in the density map.The darkness of the region is proportional to the number of the synthetic clusters in that region divided by the total number of synthetic clusters in the same M ecl bin.The average m max and standard deviation of m max for each M ecl bin are plotted as the red solid and the blue dashed curves, respectively.The solid line indicates the M ecl = m max limit and the horizontal dashed line indicates the m up limit of 150 M (see Sect. 7.4.4).
especially for low-mass clusters.However, given the statistical test method described in the following section, the error for the low-mass mock clusters does not significantly affect our conclusions as there are only three clusters in the dataset with M ecl < 10 M .

The p-value
Assuming the null hypothesis that nature behaves in exactly the same way as described by stochastic sampling and that the IMF applied here is correct, here we calculate the likelihood of observing a dataset as extreme as the one we compiled from the literature (i.e. the 100 star clusters listed in Table A.1).We choose a test statistic, n, to be the number of data points in the region above the upper blue dashed curve (hereafter the upper region) or the number of points below the lower blue dashed curve (hereafter the lower region).The expected value of n is 100 × 0.159 = 15.9 points, where 0.159 is the single-sided probability above the standard deviation.
If there are n ≤ 15 points in the upper or lower region, we calculate the likelihood of having ≤n points in that region by applying: where N = 100 is the total number of data points.The summed terms are the probabilities for each data point and their combination formula while the summation from 0 to n includes all cases as extreme as or more extreme than the observed case (a summation from 0 to N results in 1).On the other hand, if there are n ≥ 16 points in the upper or lower region, we calculate the likelihood of having ≥n points in that region Then, the joint likelihood or the p-value for a result as equally extreme as or more extreme than the compiled dataset is given by In Fig. 2 there are zero data points in the upper region and 34 in the lower region.The resulting p-value is P = 2 × 10 −13 , that is, it is extremely unlikely that the real distribution is given by the synthetic clusters.The corresponding rejection confidence of the stochastic sampling hypothesis is 7σ (calculated in Sect.4.3).

Model a23m100: the canonical IMF slope with
As mentioned in Sect.3, the uncertainty on the assumed IMF shape needs to be taken into account.We first search for the stellar upper mass limit, m up , in Eq. ( 1), which improves the fit for M ecl > 10 4 M as shown in Fig. 3.The best-fit value (with mass changing in steps of 10 M ) is m up = 100 M .There are 3 and 25 data points in the upper and lower regions, respectively, and the p-value increases to about 5 × 10 −7 .This corresponds to a 4.7σ confidence level to reject the stochastic sampling hypothesis (see Sect. 4.3).

Model a268m130: non-canonical IMF
We also tested steeper high-mass IMF slopes.Here we allow m up and α 3 to vary using a grid with a 10 M step for different m up and a 0. Fig. 3. Same as Fig. 2 but for the a23m100 IMF model (Table 1).The additional horizontal dotted line denotes m up = 100 M .
The p-value given by Eq. ( 4) for this model is still less than 5.7 × 10 −6 .As the model fits the data nicely and already achieves the maximum equal number (six) of data points in the upper and lower regions, we can safely conclude that there is no room to further increase the p-value by adjusting IMF parameters for the applied dataset and test statistics.The remaining factor impeding a higher p-value is that the intrinsic variation of the stochastically sampled distribution is too large.

Confidence level of rejecting stochastic sampling
Here we estimate the confidence level of rejecting the stochastic sampling hypothesis.According to the Bayesian theory 2 , two other estimations are needed: the prior probability of the null hypothesis, Pr, and the probability of observing a dataset less extreme than the real observation if the null hypothesis is not true, R. Together with the p-value estimated above, that is, the likelihood of observing a dataset as extreme as the real observation if the null hypothesis is true, P, we can obtain the probability of the null hypothesis being true given the observations, or in other words, the probability of wrongly rejecting the null hypothesis, which is also known as the type I error, A: with all parameters ranging from 0 to 1.It is hard to estimate whether nature is more influenced by random processes or feedback self-regulation and the issue is still highly debated.We therefore assign both cases an equal  prior probability (i.e.Pr = 0.5).We also assume that half of the observation would be as extreme as the compiled dataset if some other sampling method instead of the stochastic sampling is true (i.e.R = 0.5), because we do not have any information on whether or not the compiled dataset is more extreme than usual.
With these values and P 1, we have with the p-values given by the a268m130 model in Sect.4.2 that best reproduces the observed m max −M ecl relation.The confidence level of rejecting the hypothesis is 1 − A. Given the cumulative normal distribution, this corresponds to a 4.5σ confidence level to reject the stochastic sampling hypothesis.If, on the other hand, the IMF is forced to have the canonical shape with m up ≥ 150 M , then the rejection is at more than 7σ confidence (Sect.4.2.1, cf.Sect.7.4.4 for the mass estimation of very massive stars).

Optimal sampling
Optimal sampling is a deterministic sampling method that generates stellar masses that populate the IMF with no Poisson noise.The method was introduced by Kroupa et al. (2013) to account for the small scatter of the observations.We refer the readers to Schulz et al. (2015) and Yan et al. (2017) for a further description of optimal sampling and the publicly available Gal-IMF code3 that optimally populates clusters and galaxies with stars.
Optimal sampling assumes that the formation of stellar populations is perfectly self-regulated and, therefore that a deterministic relationship exists between the mass of a star cluster and the mass of every single star within that cluster.To sample the stellar masses, the IMF is divided into N + 1 sections at mass limits of x 1 to x N (where x i > x i+1 ).These sections fulfil the mass conservation of the embedded star cluster: where x low is the lowest possible stellar mass, 0.08 M .Each section gives exactly one star (therefore, the total number of stars is N): In other words, if the integrated number of stars between two mass limits equals one, then the cluster must form exactly one star between these two mass limits with the optimally sampled stellar mass given by: In addition, the value of where the theoretical mass limit of the most massive star, x up , is set to 150 M .This set of equations can be solved analytically assuming a two-part power-law IMF (Kroupa 2001) as is done in Yan et al. (2017).
Applying the publicly available GalIMF code, here we sample stars from the canonical IMF 4 , that is, Eq. ( 1) with α 1 = 1.3 and α 2 = α 3 = 2.3 (see Sect. 7.2 for IMF variation).The optimally sampled result is shown in Fig. 5, where the m max −M ecl relation is deterministic.
We perform the same test as we applied for stochastic sampling in Sect. 4.There are 12 and 24 data points in the upper and lower regions, respectively.The resulting p-value is 0.004, which is three orders of magnitude higher than the result of the stochastic sampling scenario.The total number of observations outside the standard deviation region (12 + 24 = 36) is only 13% larger than the expected number of data in these regions for 100 data points (100 × 31.8%= 32).Adjusting the assumed IMF parameters (as is done for stochastic sampling in Sect.4.2) would allow us to shift the synthetic m max −M ecl relation and reach an equal expected number of points in the upper and lower regions and improve the fit.From this, we conclude that optimal sampling is consistent with the observed m max −M ecl relation and the star formation process appears to have a low level of randomness.

The trend of the data
We visualise the trend of the observational data distribution (Fig. 1) by measuring the mean values of nearby points in both 4 See 'example_star_cluster_IMF.py' in the GalIMF package.The code is also capable of optimally sampling stars from all the star clusters in a galaxy formed within a correlated 10 Myr star formation epoch (Yan et al. 2017) by implementing an empirical embedded cluster mass distribution, as is demonstrated by 'example_galaxy_wide_IMF.py' in the GalIMF package. .Same as Fig. 2 but with the optimally sampled clusters (the smooth black solid curve) with no intrinsic m max −M ecl scatter.By assuming a stellar population in an embedded cluster of stellar mass M ecl in an optimal draw from the canonical IMF, the observed distribution is obtained naturally once the observational uncertainties are taken into account.This is not the case for a randomly drawn canonical IMF (Fig. 2). Figure 6 shows a feature above the grey line, that may be related to the stellar and star cluster formation physics.The flatter part of the trend around m max = 13 M indicates that the formation of more massive stars may be inhibited at this mass, requiring a more massive star cluster to make the formation possible.

The scatter of the data
The standard deviation, σ 10 , of the nearest ten data points for m max and M ecl (see Sect. 6.1) is plotted for a given m max in Fig. 7.In addition, the average variances of 10 4 realisations of mock observations of 100 star clusters generated with the optimal sampling hypothesis (as described in Sect.5) are shown as solid curves.
The M ecl values of the grouped data points show increased variance at m max ≈ 13 M and 100 M .The increase at about 100 M indicates a possible physical upper limit on stellar mass, m up .The residual of the observation and the optimal sampling model is shown in the lower panel of Fig. 7.The signal-to-noise ratio (S/N 5 ) of the M ecl residual peak at m max ≈ 13 M is 13.
The large m max scatter above 10 M may be due to the photodissociation radiation produced by these massive stars, such that accretion and growth to larger masses become difficult (Yorke & Sonnhalter 2002;Zinnecker & Yorke 2007).For example, Bonnell et al. (1998) suggest a collisional build-up mechanism for forming massive stars (>10 M ) that requires dense initial conditions that only exist in sufficiently massive star clusters.Thus, these authors predict that the different mechanisms of forming stars may display some structure at ≈10 M .It is also possible that stellar-dynamical encounters eject the most massive stars for embedded clusters with masses in the range between 63 M and 400 M (cf.Wang et al. 2019).Indeed, Kroupa et al. (2018) find that the star clusters that are still building up and reach this mass range start forming massive stars but also eject them out of the cluster on a timescale of less than 1 Myr, resulting in a zigzag evolution track towards the upper right in the m max −M ecl relation, possibly similar to Fig. 6.Both scenarios suggest significant self-regulation in the star formation process. 5 Here defined as and ABS[x] calculate the maximum, the average, and the absolute value of x, respectively.

Uncertainty analyses
Although the test in Sect. 4 rejects the stochastic sampling hypothesis with a 4.5σ confidence level (allowing the IMF to deviate from its canonical shape) and at more than 7σ for the canonical IMF shape, any conclusion we can make from these findings depends on the observational uncertainty estimations and data selection bias.This section discusses the effects that may compromise the above rejection.
It is worth noting that any uncertainty or hidden variable that is not accounted for (thus resulting in an underestimated mass measurement uncertainty) increases the scatter of the data points, mimicking stochasticity, which in truth is not present.Only an overestimated uncertainty may compromise the rejection of the stochastic sampling hypothesis.

Selection bias
It is possible that star clusters with extreme properties, if they do exist, are pre-excluded from the sample; for example, single isolated stars or clusters consisting of only low-mass stars.This potential observational bias would make the resulting m max −M ecl relation tighter than it really is, which mimics a more regulated star formation process.Even if such clusters are not intentionally excluded, the freak star clusters can be more challenging to observe.A star cluster consisting of only low-mass stars would have a lower luminosity than a normal cluster with the same mass.On the other hand, if a star cluster has an m max much higher than the value given by the average m max −M ecl relation, by chance, it is unlikely that the second most massive star in this cluster has a similar mass as its most massive star.This means that, if the most massive star is missed or incorrectly measured in the observation due to, for example, dynamical ejection, stellar evolution, or extinction, the extreme m max value will also be missed.For normal star clusters on the other hand, even if the most massive star is incorrectly estimated or missed, both sampling methods suggests a few stars with lower but similar masses, reducing the measurement error of m max .
However, it is likely that freak star clusters do not exist.Stephens et al. (2017) select, in particular, the seven most likely isolated massive stars in the entire Large Magellanic Cloud, each being distant from any known astronomical sources, and followed them up with deeper Hubble Space Telescope observations.In all these seven cases, a large young stellar population is detected in the immediate environment that is distributed in a clustered fashion such that the m max −M ecl relation is obeyed.In addition, dynamical ejection (Gvaramadze et al. 2011a(Gvaramadze et al. ,b, 2012;;Banerjee et al. 2012a,b;Oh & Kroupa 2016), including the twostep ejection process (Pflamm-Altenburg & Kroupa 2010), is known to create apparently isolated massive stars.Therefore, it is likely that all allegedly isolated massive stars are associated with normal star clusters with insufficient observations and should be excluded from our analyses.The stellar associations observed so far with a small number of stars are all in agreement with the non-trivial m max −M ecl relation (Kirk & Myers 2012).

Bias from the IMF assumption
For many clusters in our sample, M ecl is estimated by extrapolating the mass of the observed stars assuming a certain IMF.Considering the extreme case where only the mass of the most massive star is measured for star clusters, then the cluster mass would be calculated based only on m max which results in a deterministic m max −M ecl relation without any scatter.We try to A151, page 8 of 16 minimise this effect by excluding clusters with a small number of star detections, but the effect cannot be fully mitigated.For example, the mass of NGC 6611 is estimated based on stars between 6 and 12 M , assuming they account for a certain fraction of the cluster mass derived with a given IMF (Wolff et al. 2007, their Sect. 5).The bias introduced by the IMF assumption depends on the completeness and the mass range of the observed stars, the applied IMF uncertainty, and whether additional M ecl estimations with independent methods (dynamical estimation or from total luminosity) are taken into account.A more homogeneous study would help to better constrain and model this bias.
On the other hand, the shape of the subsolar IMF depends on the metallicity of the stars (Kroupa 2002;Geha et al. 2013;Gennaro et al. 2018;Martín-Navarro et al. 2019;Smith 2020;Yan et al. 2020;Poci et al. 2022;Li et al. 2023).As the observed star clusters do not have the same metallicity, this introduces additional scatter to the expected m max −M ecl relation, favouring the optimal sampling hypothesis.

Contamination
Field star contamination is inevitable.Studies employing dynamical and chemical tagging will further reduce the contamination in upcoming surveys.

Uncertainties from Kirk & Myers
Star clusters with more than 20 detected members (Kirk & Myers 2011, their Table 1) are adopted.The uncertainty on M ecl from Kirk & Myers (2011) is set to be the same as the uncertainty on m max (which is roughly estimated to be ±50% by Kirk and Myers).Kirk & Myers (2012) find that massive stars can be better described if they assume an older cluster age (2 Myr instead of 1 Myr).This suggests that stars with different masses in the same star cluster may have different average ages and a single age assumption may introduce bias to the mass estimation of these pre-main sequence stars; see also STARFORGE simulations where more massive stars in a star cluster form at a later time (Guszejnov et al. 2022).In addition, the statistical uncertainty on individual stars due to different extinction estimation errors and multiplicity are neglected.An extinction map based on stellar reddening is provided in Kirk & Myers (2011) but the related uncertainty is unknown.A potential multiplicity of the stars could lead to a 20% lower m max and a 60% higher M ecl than estimated if every star in a cluster is an unresolved equal-mass binary.Furthermore, it is possible that identified star cluster members are either contaminated by field stars or incomplete because of dynamical evolution and that some stars have not formed by the time of observation.Stephens et al. (2017, their Table 6) are included.We adopt the mean cluster mass calculated with cluster ages of 1 and 2.5 Myr.The uncertainties on M ecl consider only the rough age uncertainty on the star cluster when fitting the cluster isochrone.The statistical and/or systematic uncertainties on extinction, distance, the grid of stellar models, and so on are not taken into account.Therefore, this uncertainty can be considered as a lower limit.The adopted m max uncertainty of about 40% is estimated due to a large number of uncertain effects from the spectral type model, multiplicity, extinction, and stellar age, none of which have been well estimated.

Uncertainties from WKP13
A large number of star clusters are compiled in WKP13 (their Table A.1).We select clusters with younger ages and a large number of detected stars, as described in Sect. 2. According to WKP13, the uncertainty on m max is estimated assuming a half spectral subclass uncertainty on the most massive star, neglecting potential multiplicity.A physical upper stellar mass limit of 150 M is implemented in WKP13.More massive stars might be unresolved multiple stars or stars formed through dynamical mergers (Banerjee et al. 2012a;Oh & Kroupa 2018).This upper mass limit would not affect our conclusion, as discussed in the following section.
The uncertainty on M ecl is mostly constituted by the assumption made in WKP13 that all stars could be unresolved binaries and that 50% of the stars may be misidentified as cluster members, suggesting a mass uncertainty of ≈±0.3 dex.This leads to a much higher average uncertainty on M ecl than m max .In reality, binaries would only lead to a 64% to 68% increase in the estimated system mass for stars between about 0.4 and 55 M due to the non-linear relation between stellar mass and luminosity, with no increase for extremely massive stars due to the linear relation between stellar mass and luminosity (according to Eq. ( 12) below); that is, a less than 68% increase in M ecl instead of 100% if all observed stars (mostly massive stars are observed) are unresolved equal-mass binaries.The effect of field star contamination on M ecl may also be lower than previously estimated.Massive stars contribute a large fraction of the cluster mass and are less likely to be missed or contaminated than low-mass stars.
On the other hand, other sources of uncertainty, such as IMF variation, variable extinction, stellar variability, gas expulsion, dynamical evolution, and stellar models, are not fully taken into account and the uncertainties in WKP13 are given as a lower limit.Therefore, it is possible that the adopted uncertainty is realistic.

The masses of very massive stars
The possible mass of the most massive star is still under debate, ranging from 100 to about 300 M .Weidner & Kroupa (2004), Figer (2005), Oey & Clarke (2005), and Koen (2006) suggest an upper limit to the initial masses of stars of about 150 M (the canonical value), while Crowther et al. (2010Crowther et al. ( , 2016) ) argue that the limit may be higher given the large systematic uncertainties in the stellar models of the massive stars.Other than the error from extinction and distance measurements, massive stars show strong mass loss and a thick atmosphere, which are difficult to model.Stellar simulations give very different evolution tracks when different amounts of overshooting, rotation, mass loss rate, metallicity, and binary evolution are adopted.The uncertainties are larger for more massive stars (Martins 2015).For example, the evolution tracks of Ekström et al. (2012) failed to reproduce the luminosity and temperature of the stars in the Arches cluster.Yusof et al. (2013) note that, during the advanced phases of their evolution, the very massive stars may have similar luminosities to stars with initial masses of below 120 M , making these stars difficult to distinguish.This leads to the possibility of underestimating the m max of the Arches cluster.Crowther et al. (2010Crowther et al. ( , 2016) ) revisited younger star clusters considering the line-blanketed model, and give a higher mass estimation for A151, page 9 of 16 the stars than WKP13, doubling the m max value in the case of R136 in 30 Doradus.Applying different stellar evolution models, the mass of R136a1 (i.e. the most massive star in R136) estimated by Brands et al. (2022) greatly exceeds the estimation by Massey & Hunter (1998).On the other hand, the crowded cluster R136 suffers from limited photometric resolution and the very massive stars may turn out to be unresolved multi-stellar systems, as is the case in Pismis 24-1 (Maíz Apellániz et al. 2007).Severe crowding within the core of R136 leads to larger measurement uncertainties and reduces the mass estimations (Rubio-Díez et al. 2017;Kalari et al. 2022).Indeed, stars less massive than about 20 M are not detected in the central 0.15 pc radius of R136 due to crowding.It is likely that more Wolf-Rayet stars -for example, with 50 M or 100 M -exist than those that have been identified and that these stars contribute to the luminosity of unresolved stellar systems.With a linear correlation between mass and luminosity for the most massive stars, unresolved systems can significantly increase the stellar mass measurements.Even if the stars are indeed single stars, the strong stellar wind and highly uncertain mass-loss rate of very massive stars (Higgins et al. 2022) make it impossible to distinguish initial stellar masses above 200 M on the Hertzsprung-Russell diagram.The wind also reduces the mass of very massive stars to about 150 M within 1.6 Myr.Therefore, it is likely that a few stars with current masses exceeding the limit of 150 M but with ages similar to or older than 1.6 Myr (e.g.R136a1 as discussed in Crowther et al. 2010) form through mergers of massive binaries (Banerjee et al. 2012a,b;Oh & Kroupa 2018).Given these considerations, the initial masses of very massive stars are still uncertain.
Nevertheless, a different mass estimation of very massive stars would not significantly affect the present study because the scatter of the m max −M ecl relation is mainly determined by star clusters with masses below 8000 M (including 90% of the clusters).The discussion of whether or not the birth masses of the most massive stars can be larger than the canonical 150 M value is relevant for massive star-burst clusters, such as R136, within which stellar mergers are common, but such clusters are very rare.Therefore, excluding these few data points would affect the best-fit m up value (e.g.Sect.4.2.3) but not our conclusions.

Correlations between mass estimation errors
The two mass errors are not independent and there are three types of error correlations in general.
-Systematic stellar mass error: The cluster mass is often calculated based on the observed stellar mass, which is extrapolated to the full stellar mass range with an assumed IMF.In this case, stellar model, cluster age, distance, and average dust extinction estimation errors would affect the estimation of both stellar and star-cluster masses with a positive correlation.
-Statistic stellar mass error: random measurement error for m max contributes to the M ecl error.For example, very massive stars have large uncertainties on their evolution and observational parameters due to strong mass loss and probable binary evolution (Zhang et al. 2005).However, the error correlation is insignificant when the m max /M ecl ratio is low, which is the case for the majority of the stochastically sampled star clusters.Only about 10% of the star clusters (stochastically sampled from the canonical IMF) have m max /M ecl > 0.3 and about 4% have m max /M ecl > 0.5.
-Unresolved multi-stellar system: If a multi-stellar system is unresolved, the mass estimation of the primary star is higher than the correct value while the mass estimation of the star cluster is lower than the correct value, and therefore the errors are anticorrelated.
The values of different types of errors relate differently and non-linearly to the mass estimations of stars and clusters.Ideally, one simulates realistic statistics and systematic uncertainties originating from the stellar model, age estimation, distance, extinction, multiplicity, and so on with detailed forward modelling.However, there are practical difficulties in doing so.Firstly, the uncertainty analyses in the literature are not comprehensive, that is, we do not have detailed information about different types of errors.Usually, only a roughly estimated total error is given, if at all.Secondly, the measurements are adopted from different studies applying different methods of mass estimation with different uncertainties, such that there is no uniform way to simulate mock uncertainties that can represent the dataset.Thirdly, the observational uncertainty on freak star clusters (Sect.7.1) cannot be estimated because these latter are not observed and only exist in the mock dataset.

Non-Gaussian mass error distribution
We test the additional scenario whereby mock mass measurement errors follow a truncated normal distribution clipped to the range 0 to 2σ.Again, the upper and lower variances are different (see Sect. 2), and therefore the upper and lower clip values are calculated separately.The change reduces the scatter of the synthetic m max −M ecl relation and increases the p-value of our two-sample test.An equal number of data points in the upper and lower regions can increase to seven at most, with slightly adjusted IMF parameters.The resulting confidence level for rejecting the stochastic sampling hypothesis is reduced to 4σ for the case allowing m up and α 3 to deviate from the canonical IMF, which is still significant.Considering that the M ecl uncertainties are not well-estimated in much of the adopted literature, we explore what would be the correct mean M ecl uncertainty if the stochastic sampling hypothesis were correct.We find that for the stochastically generated star clusters to reproduce the observed tight m max −M ecl relation, there needs to be a high probability (>50%) of omitting the most massive stars in our mock observation and that the applied cluster mass uncertainties must be much smaller than the average value of the observational dataset.Specifically, if the average uncertainty on M ecl is the same as the average uncertainty on m max on the logarithmic scale (i.e. about 0.15 dex instead of 0.3 dex), the resulting mock observations would have a similar scatter to the observed points.However, Sects.7.4.1-7.4.3 demonstrate that the adopted mass uncertainties are often given as a lower limit.Therefore, these requirements are not likely to be fulfilled.

Previous studies
Reproducing the mean trend of the m max −M ecl relation (Cerviño et al. 2013;Hirai et al. 2021) is possible with both sampling hypotheses but the small scatter on the m max −M ecl relation favours optimal sampling.A possibly larger measurement uncertainty of the adopted dataset, as is suggested by Krumholz (2014) and Sects.7.4.1-7.4.3 above, would suggest an even smaller A151, page 10 of 16 intrinsic scatter of the m max −M ecl relation and strengthen the argument favouring optimal sampling.
We note that a test between deterministic sampling and stochastic sampling (e.g. the present work) is different from a test between stochastic sampling and stochastic sampling with a cluster-mass-dependent truncation for the mass of the most massive star (e.g.Hermanowicz et al. 2013).Earlier works studying the m max −M ecl relation assume the truncated stochastic sampling scenario (e.g.Weidner & Kroupa 2006, their sorted random sampling), which is claimed to be ruled out by many studies (e.g.Corbelli et al. 2009;Fumagalli et al. 2011;Andrews et al. 2013Andrews et al. , 2014;;Dib et al. 2017).However, these latter are not falsifications of optimal sampling, which is a pure deterministic sampling (Eqs.( 8) and ( 9), see also Kroupa et al. 2013, their Eq.(4.9), Schulz et al. 2015, their Eqs. ( 6) and ( 7)) and is distinct from stochastic sampling under a deterministic truncation.For example, Weidner et al. (2010) discuss the observational m max −M ecl relation, and therefore their m max corresponds to 'the mass' of the most-massive star and the best-fit relation is consistent with the observation by definition.However, if the observational m max is taken as 'the mass limit' of the most-massive star for stochastic sampling, the stochastically sampled most massive star would be systematically less massive than m max , and therefore inconsistent with the observations.Similarly, according to the optimal sampling hypothesis, star clusters of 100, 400, and 1000 M , and with (one-tenth of) the solar metallicity must have a most-massive star of approximately 8 (9), 20 ( 22), and 36 (40) M , respectively (calculated by the open-source GalIMF code, Yan et al. 2017), which are in good agreement with the observations presented by Koda et al. (2012).
In summary, truncated statistical sampling is not directly related to the discussion of whether the star formation process is highly self-regulated.A significantly tight m max −M ecl relation can and can only exist because of optimal sampling instead of truncated random sampling, which is in agreement with previous studies.

Practical considerations
The deterministic optimal sampling method is useful when one aims to study the evolution of an 'average' stellar population closely resembling the IMF.The optimally sampled stellar population is not only a fair sample of the IMF but also has a mass error smaller than the mass of the lowest-mass star, 0.08 M .In comparison, the stochastic sampling hypothesis contradicts mass conservation.Given the total mass of a molecular cloud and a star formation efficiency (which may be universal), what is being constrained is the mass instead of the number of the stars that can form in that cloud.Macroscopic physics, as we know it, is not stochastic.There are a variety of treatments regarding when to stop the random drawing of stars in order to approximate mass conservation to some level, but mass conservation and reproduction of the IMF cannot be simultaneously fulfilled with stochastic sampling (Smith 2021, their Sect. 2).Therefore, various mass-transfer schemes (transfer of mass up to the mass of the assumed stellar-mass upper limit) between nearby gas or star particles are applied to enforce the mass conservation in simulations applying random sampling (e.g.references listed in Sect.1).
Optimal sampling is also much more efficient in situations when only the masses of the most massive stars are of interest (Applebaum et al. 2020;Hislop et al. 2022) because the numbers are deterministic and can be given analytically (Eqs.( 7)-( 10)).However, in the case of stochastic sampling, sampling  11).The thick curve is given by the optimal sampling hypothesis (Sect.5).
only the massive stars, such as in Gatto et al. (2017), does not follow the probabilistic interpretation of the IMF where the formation of stars is an independent process.Therefore, one should always finish sampling all the stars in a star cluster before the most massive stars can be identified without bias (e.g. the method described in Geen et al. 2018, their Sect. 2.4).This leads to a significant cost difference between the two sampling methods.
On the other hand, the optimal sampling hypothesis faces a fundamental problem: the total mass of a real single formation event of a star cluster cannot be well defined because stars do not form in one instance and there may be multiple populations of stars formed (or captured) in the same star cluster (Pflamm-Altenburg & Kroupa 2007, 2008, 2009;Kroupa et al. 2018;Wang et al. 2020;Wirth et al. 2022).In addition, optimal sampling cannot be directly applied to star particles that represent multiple star clusters or a fraction of a star cluster.This problem can be bypassed by stochastic sampling from a list of optimally sampled stellar masses of a galaxy (i.e.OSGIMF, Yan et al. 2017) and may eventually be resolved by galaxy simulation models where the stellar particle represents the true cluster mass.

Properties of the star forming clouds
Assuming a star formation efficiency of 33%, the total mass of the pre-cluster molecular cloud, M cl , can be estimated.The scaling relation between M ecl and the pre-cluster molecular cloud core density, ρ cl , from Yan et al. (2017, their Eq.( 7)) is applied to estimate ρ cl , that is where m is the stellar initial mass.Salaris & Cassisi (2006) argue that, for more massive stars (>20 M ), L ∝ M because radiation pressure is dominating the total pressure instead of gas pressure.However, the exact mass boundary within which we can apply a linear L−M relation is not provided.Observations for stars up to 50 M suggest a L−M relation with a power-law index of about 2.8 instead of 1 (Vitrichenko et al. 2007 and up to 64 M according to Eker et al. 2018, their Table 7).For simplicity, here we assume the widely applied power-law index of 3.5 for stars below 55.41 M and a linear L−M relation for more massive stars following Yan et al. (2019).The derived L bol values and the ratios between M cl and L bol are shown in Fig. 9. Finally, the total combined bolometric luminosity of all the optimally sampled stars in a young star cluster is calculated, assuming that the pre-main sequence stars have a bolometric .Total stellar bolometric luminosity-pre-cluster molecular cloud mass relation.The solid curve shows the prediction from the optimal sampling method, where the cluster mass relates to the mass of each star formed in that cluster deterministically, according to Eqs. ( 7)-( 10).The total stellar luminosity is calculated with Eq. ( 12).The red circles are the observational data adopted from Urquhart et al. (2022); specifically their clumps related to HII regions.
luminosity equal to the zero age main sequence (ZAMS) stars given by Eq. ( 12).The relation predicted by the optimal sampling hypothesis is compared with the observed mass, M fwhm , and bolometric luminosity within the FWHM source sizes of the star-forming dense molecular clumps (Urquhart et al. 2022, their HII region related clumps) in Fig. 10, assuming M cl = M fwhm .
The optimal sampling model is in qualitative agreement with the data (Fig. 10), which include complex radiation processes and a contribution from accretion luminosity.The model, in contrast, counts only the ZAMS luminosities of all stars in the embedded cluster, assuming them to appear instantly at the same time, while star-formation might proceed over 1 Myr (Sect.7.4.1).In addition, massive stars can already be ejected from the cloud within 1 Myr (Kroupa et al. 2018).These effects result in a lower observed luminosity than our model for massive clusters.On the other hand, the observational data systematically become more luminous than the optimal sampling model towards smaller cloud mass M cl .This is likely a consequence of the following physics: at the massive end, most of the bolometric luminosity comes from the massive stars, which are already on the main sequence when they become detectable in HII regions.The model therefore well represents these data.Lowmass embedded clusters, on the other had, are predominantly populated by low-mass stars.These are, when they become visible, still contracting onto the main sequence with significantly larger luminosities when very young.It will be an interesting problem to include pre-main sequence evolutionary tracks into the optimal sampling model to test, in the future, whether the observational data shown in Fig. 10 better agree with such a model.At the low-mass cluster end, the optimal sampling model does not expect the formation of massive stars in these clusters, while some observed low-mass clumps appear to have HII region features.The ionising radiation responsible for creating the HII region may be due to low-mass pre-main sequence stars producing a higher ionising flux than their ZAMS counterparts due to their flaring activity (e.g.Raetz et al. 2020).

Conclusions
In this study, we demonstrate that the stochastic sampling hypothesis fails to reproduce the small scatter of the observed m max −M ecl relation such that the hypothesis is rejected at a 4.5σ significance level if: the measurement uncertainties of m max and M ecl are not overestimated; the compiled dataset is unbiased; and if the IMF can be adjusted.If, on the other hand, the IMF is fixed to the canonical form and m up ≥ 150 M , then the data are inconsistent with random sampling with > 7σ confidence.If the adopted mass uncertainties are underestimated, which is the case when certain uncertainties are omitted, the conclusion becomes stronger.In comparison, optimal sampling from an invariant canonical IMF and without an intrinsic dispersion of m max values for a given M ecl describes the observed m max −M ecl relation naturally.
The optimal sampling model predicts a strong correlation between the bolometric luminosity of a young stellar population and the mass of the embedded cluster.We calculate this relation assuming a freshly hatched optimally sampled stellar population on the ZAMS.The correlation agrees qualitatively well (Fig. 10) with the data from the ATLASGAL survey (Urquhart et al. 2022).Future modelling will likely improve the agreement once pre-main sequence stellar evolution tracks are incorporated into the evolution of the bolometric luminosity of low-mass stars.
In addition, we discover a large number of clusters with m max ≈ 13 M .This may indicate that the formation of more massive stars is inhibited beyond 13 M and/or that there is a strong dynamical ejection for the most massive stars for embedded clusters with stellar masses in the range of about 63 to 400 M , supporting significant self-regulation in the star formation process.
A homogeneous and consistent observational survey of embedded clusters and their stellar content would help to improve the reliability of our statistical test by providing a more reliable uncertainty analysis.Studies examining the properties of dwarf galaxies (as mentioned in Sect. 1) will continue to provide us with constraints on the self-regulation in the star formation process.For example, if stars were stochastically sampled, then the integrated galaxy-wide IMF of many dwarf galaxies would be the same as the canonical IMF (cf. Lee et al. 2009), meaning that they would have the same supernova formation efficiency as massive galaxies (cf.Zhang et al. 2021) and that their chemical evolution would be understandable with an invariant canonical IMF (cf.Yan et al. 2020;Mucciarelli et al. 2021).See also Kroupa & Jerabkova (2021, their Sect. 1.6.3)for a further discussion of these tests.
In conclusion, we still do not have a comprehensive theory of how stars form.Nature is deterministic, and so star formation cannot be stochastic.However, as we do not know all the variables (e.g.rotation, temperature) of the embedded-clusterforming molecular cloud clump, this lack of knowledge would transpire as an apparent stochasticity on top of the optimal sampling process, such that the practical sampling method could be in between the statistical and optimal sampling extremes.The Python GalIMF code, which performs optimal sampling for star clusters and entire galaxies, is publicly available (see Sect. 5).

Fig. 2 .
Fig. 2. Same as Fig. 1, with the same set of synthetic star clusters (model a23m150), but with mock observational uncertainties added.
) 4.2.1.Model a23m150: the canonical IMF with m up = 150 M 01 step for different α 3 .The best-fit combination for the observation is with m up = 130 M and α 3 = 2.68 (model a268m130), in which case the data have six points in both upper and lower regions as shown in Fig. 4.

Fig. 4 .
Fig. 4. Same as Fig. 2 but for the a268m130 IMF model (Table 1) which best fits the observed m max −M ecl relation.The additional horizontal dotted line denotes m up = 130 M .
Fig.5.Same as Fig.2but with the optimally sampled clusters (the smooth black solid curve) with no intrinsic m max −M ecl scatter.By assuming a stellar population in an embedded cluster of stellar mass M ecl in an optimal draw from the canonical IMF, the observed distribution is obtained naturally once the observational uncertainties are taken into account.This is not the case for a randomly drawn canonical IMF (Fig.2).

Fig. 6 .
Fig. 6.Average position of the observational data for the groups of ten nearest points in the m max −M ecl relation.The linear grey line highlights that the clusters with a mass of between 10 1.8 M = 63 M and 10 2.6 M = 400 M depart from the linear relation.

Fig. 7 .
Fig. 7. Standard deviation of M ecl (blue triangles) and m max (red squares) for the groups of ten nearest points in the m max −M ecl relation.The orange and the blue curves in the upper panel are the synthetic relations assuming optimal sampling of stars.The lower panel shows the difference between the model and the observations.

7. 5 .
Conditions for stochastic sampling to reproduce the observed m max -M ecl relation

Fig. 8 .
Fig.8.Same as Fig.5but for the pre-cluster molecular cloud mass, M cl , and cloud core density, ρ cl , assuming a star formation efficiency of 33% and Eq.(11).The thick curve is given by the optimal sampling hypothesis (Sect.5).

Fig. 9 .
Fig. 9. Same as Figs. 5 and 8 but for the bolometric luminosity (upper panel) and the mass-to-light ratio (lower panel).The horizontal coordinates of the upper and lower panel are identical to Figs. 5 and 8 (M cl = 3 M ecl ).
Fig.10.Total stellar bolometric luminosity-pre-cluster molecular cloud mass relation.The solid curve shows the prediction from the optimal sampling method, where the cluster mass relates to the mass of each star formed in that cluster deterministically, according to Eqs. (7)-(10).The total stellar luminosity is calculated with Eq. (12).The red circles are the observational data adopted fromUrquhart et al. (2022); specifically their clumps related to HII regions.
Most massive stellar mass to embedded cluster mass (m max −M ecl ) relation.Red stars denote the observational data listed in Table A.1.Here, 10 6 synthetic clusters are stochastically sampled from the canonical IMF (the a23m150 IMF model in Table Fig. 1. Table 1) which best fits the observed m max −M ecl relation.The additional horizontal dotted line denotes m up = 130 M .