Stellar population of the Rosette Nebula and NGC 2244 (cid:63) Application of the probabilistic random forest

Context. Measurements of internal dynamics of young clusters and star-forming regions are crucial to fully understand the process of their formation. A basic prerequisite for this is a well-established and robust list of probable members. Aims. In this work, we study the 2 . 8 ◦ × 2 . 6 ◦ region in the emblematic Rosette Nebula, centred in the young cluster NGC2244, with the aim of constructing the most reliable candidate member list to date. Using the obtained catalogue, we can determine various structural and kinematic parameters, which can help to draw conclusions about the past and the future of the region. Methods. We constructed a catalogue containing optical to mid-infrared photometry, as well as accurate positions and proper motions from Gaia EDR3 for the sources in the ﬁeld of the Rosette Nebula. We applied the probabilistic random forest algorithm to derive the membership probability for each source within our ﬁeld of view. Based on the list of almost 3000 probable members, of which about a third are concentrated within the radius of 20 (cid:48) from the centre of NGC2244, we identiﬁed various clustered sources and stellar concentrations in the region, and estimated the average distance to the entire region at 1489 ± 37pc, 1440 ± 32pc to NGC2244, and 1525 ± 36pc to NGC2237. The masses, extinction, and ages were derived by ﬁtting the spectral energy distribution to the atmosphere and evolutionary models, and the internal dynamic was assessed via proper motions relative to the mean proper motion of NGC2244. Results. NGC2244 is showing a clear expansion pattern, with an expansion velocity that increases with radius. Its initial mass function (IMF) is well represented by two power laws ( dN / dM ∝ M − α ), with slopes α = 1 . 05 ± 0 . 02 for the mass range 0.2 - 1.5M (cid:12) and α = 2 . 3 ± 0 . 3 for the mass range 1.5 - 20M (cid:12) , and it is in agreement with slopes detected in other star-forming regions. The mean age of the region, derived from the HR diagram, is ∼ 2Myr. We ﬁnd evidence for the di ﬀ erence in ages between NGC2244 and the region associated with the molecular cloud, which appears slightly younger. The velocity dispersion of NGC2244 is well above the virial velocity dispersion derived from the total mass (1000 ± 70M (cid:12) ) and half-mass radius (3 . 4 ± 0 . 2pc). From the comparison to other clusters and to numerical simulations, we conclude that NGC2244 may be unbound and that it possibly may have even formed in a super-virial state.


Introduction
For the last three decades, it has generally been accepted that most (70-90%) of star formation in the Milky Way occurs in embedded clusters, with loose OB associations emerging as remnants of their dissolution (Lada & Lada 1991, 2003).In this view, often referred to as the 'monolithic scenario', embedded clusters emerge as initially bound structures, whose expansion leads to the unbound associations we observe today.This expansion is often thought to be driven by the expulsion of residual gas because of stellar feedback (Goodwin & Bastian 2006;Baumgardt & Kroupa 2007;Banerjee & Kroupa 2015); although, other processes may contribute as well (Gieles et al. 2012).
Full Table A.1 is only available at the CDS via anonymous ftp to cdsarc.cds.unistra.fr(130.79.128.5) or via https:// cdsarc.cds.unistra.fr/viz-bin/cat/J/A+A/668/A19However, a contemporary alternative view of star formation has regarded it as a hierarchical process operating over a large range of scales (Efremov 1979;Elmegreen & Elmegreen 1983;Larson 1994;Elmegreen & Efremov 1996).In this 'hierarchical picture', star formation proceeds on multiple scales, from kiloparsec-sized super-complexes, to OB associations and star clusters on smaller scales, down to multiple stellar groupings and single stars.With stars forming across a continuous distribution of gas densities, unbound associations may form in situ, with similar low densities as observed today.In the last years, access to wide-field observations, along with recent theoretical results have contributed to consolidate this view of the in situ formation of unbound associations in the Milky Way and beyond (Bressert et al. 2010;Gouliermis 2018;Grudić et al. 2018;Ward et al. 2020).
Historically, the two views of star formation proceeded more or less separately on different scales, which was motivated by observational circumstances: the works discussing the hierarchical picture (e.g.Elmegreen & Elmegreen 1983;Elmegreen & Efremov 1996) looked typically at large, galacticscale fields, whereas the monolithic scenario emerged from studies over small field-of-view observations of star-forming regions possible with infrared arrays available in the 1990s.Although the two views of star formation are sometimes presented as being opposed to each other, they should better be viewed as being complementary and operating on different scales.It may as well be that there exist both types of OB associations, those that formed in situ as well as the expanded ones.After all, having 70−90% of stars forming in embedded clusters, as claimed by Lada & Lada (2003), is still compatible with a fair number forming 'on the loose'; although, the exact percentages may be revisited in the future.
Detailed studies of young stellar object (YSO) populations in both embedded and non-embedded regions have revealed that most star-forming regions appear to be clumpy, with their populations distributed in several subclusters (Kuhn et al. 2014(Kuhn et al. , 2021;;Sills et al. 2018).Measurements of internal kinematics of these subclusters are needed to understand how they relate to each other, that is to say whether the tendency is towards merging into larger clusters, or a dispersal of individual subclusters once the molecular gas is gone from the system.However, detailed studies of internal kinematics and structural properties of young star clusters have been challenging due to instrumental issues (the need for high-precision astrometry or, alternatively, radial velocities for a large number of stars), as well as due to those related to the determination of membership.Recently, both issues have seen major improvements thanks to Gaia (Gaia Collaboration 2016), which led to precise measurements of velocity dispersion and the detection of expansion patterns in a substantial number of clusters younger than 5 Myr (Kuhn et al. 2019).As of the membership selections, those based on optical and near-infrared (NIR) photometry typically suffer from significant contamination by field objects, while X-ray and mid-infrared-excess selections, though quite robust, come with a bias towards objects with a particular set of properties, and they are unable to uncover entire populations.Spectroscopy, on the other hand, becomes unfeasible for regions spanning large areas, and consequently containing large numbers of objects.Precise proper motions, especially when combined with parallaxes for systems relatively close to the Sun, provide a crucial piece of information for membership determination, and can be combined with photometric criteria to uncover reliable sets of pre-main sequence (PMS) candidate members.
Recently, various machine learning (ML) techniques, both supervised and unsupervised, have been used to aid in this process, and have provided membership lists in a large number of regions, from the youngest ones with ages of ∼1 Myr to the oldest known open clusters (e.g.Gao 2018;Marton et al. 2019;Miret-Roig et al. 2019;Melton 2020;Cantat-Gaudin et al. 2018;Cantat-Gaudin & Anders 2020;Galli et al. 2021;Peña Ramírez et al. 2021).While the unsupervised techniques can efficiently deal with a large number of clusters and thus help to improve the statistics on the overall properties of young Galactic populations, they may still come with some non-negligible amount of contamination.On the other hand, supervised ML techniques allow us to use all the prior knowledge on the properties of cluster population, potentially resulting in cleaner candidate member lists.This, of course, brings the challenge of properly selecting a training set, since all biases and the contamination of this training set will be propagated towards the final classification and thus to the sample that will be used for further astrophysical studies.
Most traditional ML methods are often not designed to deal with measurement uncertainties, which can influence their overall performance on a typical low-or moderate-signal-to-noise ratio astronomical dataset (Baron 2019).Several approaches to this problem have been explored and presented in astronomical literature (e.g.Krone-Martins & Moitinho 2014;Castro et al. 2018;Naul et al. 2018;Shy et al. 2022), but there are in general very few existing tools that take dataset uncertainties into account during the model construction.One of these tools is the probabilistic random forest (PRF) classifier (Reis et al. 2019), a modified version of the original random forest algorithm, in which both the features and the labels are treated as probability distribution functions (PDFs), rather than deterministic values.Each random variable is represented by a PDF, with a mean corresponding to a measurement and a variance to the measurement uncertainty.In this work, we apply the PRF algorithm to uncover the stellar population in a well-known young region of star formation, the Rosette Nebula.
The Rosette Nebula is the most prominent and active starforming region in the Mon OB2 association.In its centre, it hosts a young star cluster NGC 2244, whose massive stars are thought to be responsible for the evacuation of much of the interstellar material from the centre of the HII bubble (Román-Zúñiga & Lada 2008).Numerous studies of the NGC 2244 population have revealed a presence of several tens of OB stars, along with a large population of low-mass stars and substellar candidates (Park & Sung 2002;Balog et al. 2007;Román-Zúñiga & Lada 2008;Wang et al. 2008;Bonatto & Bica 2009;Mužić et al. 2019;Lim et al. 2021).Studies of the wider region of the Rosette nebula witness that the star-forming activity is by no means limited to its most prominent cluster, revealing several smaller groups associated with the molecular cloud, as well as more extended regions containing young stars (Phelps & Lada 1997;Román-Zúñiga et al. 2008;Poulton et al. 2008;Cambrésy et al. 2013;Ybarra et al. 2013;Kuhn et al. 2014).
Previous estimates of the distance to NGC 2244 vary between 1400 and 1700 pc (Ogura & Ishida 1981;Perez et al. 1987;Hensberge et al. 2000;Park & Sung 2002;Lombardi et al. 2011;Martins et al. 2012;Bell et al. 2013;Kharchenko et al. 2013;Kuhn et al. 2019;Mužić et al. 2019;Lim et al. 2021), with the most recent estimates from Gaia data yielding distances around 1500 pc.Most authors agree on the age of ∼2 Myr (Perez et al. 1987;Lim et al. 2021).In this paper, we reassess both the age and the distance to the region, using the Gaia EDR3 parallaxes for the latter (Gaia Collaboration 2021).
Previous member selections in the Rosette Nebula have been made based on X-ray (Wang et al. 2008), NIR (Román-Zúñiga et al. 2008), and mid-infrared (MIR; Balog et al. 2007) properties, as well as proper motions from Gaia (alone or in combination with photometric selection; Mužić et al. 2019;Kuhn et al. 2019;Cantat-Gaudin & Anders 2020;Lim et al. 2021).Some of these studies deal with the wide region spanned by the nebula, while others only concentrate on the central cluster NGC 2244.In this work, we apply the PRF algorithm to the sources detected in the 2.8 • × 2.6 • region in the direction of the Rosette Nebula, accompanied with a carefully selected training set, with a goal to obtain the most reliable and unbiased view of the stellar population in the region to date, and study its structure, mass, age, and kinematics.
This paper is structured as follows.In Sect.2, we present the details of the dataset used in this work, including photometric A19, page 2 of 29 and astrometric catalogues, as well as the spectroscopic data.The membership selection method using the PRF algorithm is given in Sect.3, followed by the analysis of spatial and kinematic properties of the selected members (Sect.4), as well as their further physical characterisation (Sect.5).Section 6 is centred around the properties of NGC 2244, including its mass function, mass segregation, and a discussion on its dynamical state.Finally, Sect.7 summarises the work, and lists the main conclusions.

Photometric and astrometric catalogue
We used the methodology developed as part of the Dynamical Analysis of Nearby Clusters (DANCe) project (Bouy et al. 2013) to gather optical and near-infrared photometry as well as accurate positions and proper motions for the sources in the field of the Rosette Nebula, centred at the cluster NGC 2244.
The field of view in this study satisfies the following conditions: 96.6 • ≤ α ≤ 99.4 • and 3.6 • ≤ δ ≤ 6.2 • .This field has been chosen such to include the region containing the main bulk of CO emission of the Rosette cloud (Heyer et al. 2006;see Fig. 8 in Román-Zúñiga et al. 2008).It also fully encompasses several structures identified and studied in previous works on the stellar populations of NGC 2244 and the rest of the nebula (e.g.Wang et al. 2008;Kuhn et al. 2014;Meng et al. 2017).We searched for wide field images covering this field in the following archives: the European Southern Observatory (ESO), the National Optical-Infrared Astronomy Research Laboratory Archive (NOIRLab), the Canada France Hawaii Telescope (CFHT) archive hosted at the Canadian Astronomy Data Centre (CADC), the Isaac Newton Group (ING), and the WFCAM Science (WSA).The data found in these public archives were complemented with our own observations using the ESO Very Large Telescope and its near-infrared camera HAWK-I (Pirard et al. 2004), under the programme ID 0104.C-0369.Table 1 gives an overview of the various instruments used for this study.
In all cases except for CFHT/MegaCam, DECam, WIRCam and UKIRT images, the raw data and associated calibration frames were downloaded and processed following standard procedures using an updated version of Alambic (Vandame 2002), a software suite developed and optimised for the processing of large multi-CCD imagers.In the case of MegaCam and WIRCam at CFHT, the images processed and calibrated with the Elixir (Magnier & Cuillandre 2004) and the 'I'iwi 1 pipelines, respectively, were retrieved from the CADC archive.In the case of DECam, the images were retrieved from the NOIRLab public archive and processed with the community pipeline (Valdes et al. 2014).UKIRT images processed by the Cambridge Astronomical Survey Unit were retrieved from the WFCAM Science Archive.After a rejection of problematic images (mostly due to loss of guiding, tracking or read-out problems) and of problematic pixels using MaxiTrack and MaxiMask (Paillassa et al. 2020), the dataset included 2187 individual images originating from ten cameras.

Astrometric calibration
The astrometric calibration was performed as described in Olivares et al. (2019) using Gaia EDR3 (Gaia Collaboration 1 https://www.cfht.hawaii.edu/Instruments/Imaging/WIRCam/IiwiVersion2Doc.html 2021) as astrometric reference.The final average internal and external 1-σ residuals were estimated to be ∼15 mas for high signal-to-noise (photon noise limited) sources.The most important steps are briefly outlined here.
-PSF modelling: a non-parametric spatially variable model of the PSF was computed using PSFex (Bertin 2013).
-Source detection: sources with more than three pixels above 1.5 standard deviations of the local background were detected and their instrumental fluxes and positions measured using the empirical PSF with SExtractor (Bertin & Arnouts 1996).We note that the PSF modelling has been used for astrometry only, while for the photometry measurements using an automatic adaptive aperture were preferred.
-The global astrometric solution is computed iteratively by Scamp (Bertin et al. 2006) by minimising the quadratic sum of differences in position between overlapping detections from pairs of catalogues.Positions are tied to the Gaia EDR3 release.
As explained in Bouy et al. (2013), the computed proper motions are relative to one another and display an offset with respect to the geocentric celestial reference system.We estimate the offsets by computing the median of the difference between our proper motions and those of the Gaia EDR3 after rejecting outliers using the modified Z-score.The typical proper motion uncertainties of our catalogue are shown in Fig. 1.

Photometric calibration
The photometric calibration was performed only for the Hα, u, g, r, i, z, y and J, H, K S images, and was not attempted for the other various Strömgren or narrow band images (except Hα).The photometric zero-point of all individual images obtained in the u, g, r, i, z, y, Hα and J, H, K S filters was computed by direct comparison of the instrumental SExtractor (Bertin & Arnouts 1996) MAG_AUTO magnitudes with an external catalogue: u images were tied to the SDSS catalogue (Blanton et al. 2017); -g, r, i, z, y images were tied to the Pan-STARRS PS1 catalogue (Chambers et al. 2016); -Hα images were tied to the IPHAS DR2 photometry (Barentsen et al. 2014); -B, V images were tied to APASS (Henden et al. 2015); -R, I (VIMOS/VLT; Le Fèvre et al. 2003) images were calibrated on the Johnson-Kron-Cousins system using standard fields obtained on the same night; -Y images were tied to the UKIDSS photometric system (Lawrence et al. 2007); -J, H, K S images were tied to the 2MASS catalogue (Skrutskie et al. 2006).
The individual zero-points were computed as follows.First, clean point-like sources were selected using SExtractor |SPREAD_MODEL| ≤ 0.02.The SPREAD_MODEL is a morphometric linear discriminant parameter obtained when fitting a Sérsic model, which is useful to classify point-sources from extended sources (see e.g.Bouy et al. 2013).Second, a difference between the Kron and PSF instrumental magnitudes smaller than 0.02 mag (|MAG_AUTO-MAG_PSF| ≤ 0.02 mag), used as an additional test to reject extended or problematic sources.Third, SExtractor FLAG = 0, to avoid any problem related to blending, saturation, or truncation.The closest match within 1 to these clean point-like sources was then found in the reference catalogue.The zero-point was computed as the median of the difference between the reference and instrumental (MAG_AUTO) magnitudes.The median absolute deviation are typically of the order of 0.01 ∼ 0.09 mag depending on the filter.Figure 2 shows A19, page 3 of 29  a comparison between the final i-band magnitudes of our catalogue and that reported in the Pan-STARRS catalogue.
MAG_AUTO magnitudes correspond to the total flux obtained by integrating pixel values within an adaptively scaled aperture computed using Kron's first moment algorithm (Kron 1980).They are preferred over MAG_PSF magnitudes obtained using PSF fitting because the latter give very poor measurements for extended sources (galaxies), leading to biased zeropoints when comparing to the above-mentioned reference catalogues.

Deep stacks
We median-combined all the photometrically and astrometrically registered images obtained with the same camera and in the same filter to obtain a deep stack and extract the corresponding photometry.These stacks made of many epochs were not used for the proper motion measurements but allowed us to significantly improve the detection sensitivity in all filters, and recover or improve the photometry of faint sources obtained in the individual images.These measurements have a higher signal-to-noise and are therefore preferred over the measurements obtained from the individual images, whenever available.

External astrometry and photometry
As in Bouy et al. (2013), the photometry and astrometry extracted from the images are complemented by that reported in external catalogues.The recent Gaia EDR3 catalogue provides accurate parallaxes and proper motions for sources down to G ∼ 20 mag.We retrieved all the sources reported in the Gaia EDR3 catalogue over the area covered by our images and cross-matched with our catalogue using the closest match within a 1 radius.Because Gaia's proper motion measurements are much more accurate and reliable than our groundbased measurements, we always prefer them over our own measurements, and keep our proper motion measurements only for sources without any counterpart in the Gaia EDR3 catalogue.As our catalogue was constructed using Gaia EDR3 as the primary astrometric reference, it is already tied to the Gaia Celestial Reference Frame.
Given the heterogeneous spatial coverage of the various datasets, the DANCe catalogue described above (and including Hα, u, g, r, i, z, y, J, H, and K S ) is complemented with Gaia EDR3 astrometry and photometry (G, BP, and RP), as well as Pan-STARRS (g, r, i, z, and y; Chambers et al. 2016), 2MASS (J, H, and K S ; Skrutskie et al. 2006), and AllWISE (W1−W4, Cutri et al. 2021) photometry.These additional photometric values were directly included in our catalogue when no measurement was available for the corresponding source in our data; otherwise, the weighted average of all -of our and external catalogues' -measurements is adopted.In some cases, significant differences exist among the filters of the same photometric band.These systematic differences contribute to the noise of the photometric dispersion in the colour-magnitude and A19, page 4 of 29 colour-colour diagrams used for our analysis.Moreover, recent variability studies of young clusters found typical amplitudes of 0.03 mag (e.g.Rebull et al. 2016;Stauffer et al. 2016).Since our aim is to identify cluster members, and to take into account both the systematic differences among filters of the same band and the intrinsic photometric variability, we conservatively add, in quadrature, 0.05 mag to all the photometric uncertainties in our catalogue.

Completeness of our catalogue
The goal of this work is to obtain a detailed study of the stellar population of the Rosette Nebula and the cluster NGC 2244.
To have a uniform data quality over the entire field, we limit ourselves to the photometric bands extending over the entire field of view (listed in Table 2), together with the astrometry from Gaia EDR3.This selection includes ∼200 000 sources, or ∼40% of the original catalogue.The remaining parts of our compiled catalogue described in previous sections are left for a future publication.
Given the heterogeneous origin of our dataset, its photometric completeness may be a complex distribution.To calculate the completeness for each photometric band that will be used in the rest of the work, we first restrict the catalogue to the objects that have proper motion measurement from Gaia EDR3.We define the completeness limit of each photometric band as the magnitude at which the density of sources reaches a maximum value (Fig. 3).The completeness intervals are reported in Table 2.The faint end for most of the filters is equivalent to masses of 0.1−0.2M for the age of 2 Myr, distance of 1500 pc and the average extinction of NGC 2244 (A V = 1.5 mag; Mužić et al. 2019), according to BT-Settl models (Allard et al. 2011).We therefore adopt 0.2 M as the low-mass completeness limit later in the paper.For the bright end completeness limits, there is no simple way to establish a mass equivalence, since the iso-mass contours in Hertzsprung-Russell (HR) diagrams run nearly parallel to the temperature axis (see Sect. 5.3).

Observations and data reduction
Optical spectra have been obtained using VIsible MultiObject Spectrograph (VIMOS; Le Fèvre et al. 2003) at the ESO's Very Large Telescope, in the MOS mode.They have been obtained as a part of the programme 080.C-0697 (PI King) and retrieved from the ESO archive.VIMOS contains four CCDs arranged in a 2 × 2 array, each covering a field-of-view of 7 × 8 with a pixel resolution of 0 .205.The four quadrants are separated by gaps of approximately 2 .The spectra were obtained using the mediumresolution grism (MR), covering the wavelength range of 5000-10 000 Å, with a dispersion of 2.3Å pixel −1 .This, in combination with the slit of the 1 width, results in a spectral resolution of R ∼ 580.The observations cover 16 different pointings, with a total on-source time per pointing between 40 s and 9200 s.Since the data have been taken prior to the detector upgrade in 2009, they suffer from strong fringing at wavelengths redder than ∼8000 Å.We cut the spectra at 8500 Å to reduce fringing effects while trying to maintain a wavelength range as large as possible, since this is important for spectral type determination.
Data reduction was performed using the VIMOS pipeline provided by ESO, and includes bias subtraction, flat-field and bad-pixel correction, wavelength and flux calibration, as well as the final extraction of the spectra.The extraction is done by applying the optimal extraction from Horne (1986).In total we extracted spectra of 501 objects, distributed within ∼15 around the centre of NGC 2244.

Spectral analysis
The VIMOS data have been retrieved from the ESO archive, thus we are not aware of the exact selection procedure for individual objects, however, the vast majority of the selected sources are red in optical-to-infrared colour-magnitude diagrams.Because of the fringing, the gravity-sensitive alkali lines typically used for youth (i.e.membership) assessment (Na II doublet at 8183/8195 Å and Ca II triplet at 8498/8542/8662 Å; Kirkpatrick et al. 1991;Martín et al. 1999;Schlieder et al. 2012) are not available.We therefore rely only on the enhanced Hα emission as a signature of youth.
The vast majority of sources with Hα in emission show spectral features characteristic for cool dwarfs (late K and M spectral types), which consist of an overall spectral shape which is flat or with a positive slope towards red, as well as various molecular (TiO, VO, CaH) and metallic transitions (Kirkpatrick et al. 1991).By visual inspection, we identify 327 late-type objects, of which 251 show a discernible Hα in emission.Among the remaining 174 spectra of earlier type stars, only 3 show Hα in emission.Pronounced H α emission in stars with SpT K0 can be taken as a clear sign of accretion, since the chromospheric activity in these stars tends to first fill-in the photospheric absorption lines, unlike in the lower mass stars where the chromospheric activity manifests through Hα in emission (Stauffer et al. 1997 and Fang et al. (2009), respectively.The sources with the SpT K7 and M0 have been randomly displaced within (SpT-0.5,SpT+0.5), for clarity.The error bars are shown only when they are larger than the plotting symbols.Hou et al. 2016).Therefore, the 3 early-type objects are directly included in our list of accretors and potential members of the cluster.We note that these objects are also listed as YSO candidates in previous works (Bell et al. 2013;Broos et al. 2013;Meng et al. 2017).For the late-type Hα emitters, additional analysis related to the strength of the Hα emission is necessary to identify potential cluster members.
To simultaneously derive the spectral type and extinction, we perform a χ-squared minimisation identical to that used in our previous works (e.g.Kubiak et al. 2021), using a grid of optical spectra of young objects (1-3 Myr) in Cha I, Taurus, η Cha (Luhman et al. 2003;Luhman 2004a,b,c), and Collinder 69 (Bayo et al. 2011), with spectral types M1 to M9, with a step of 0.25-0.5 spectral subtypes.To this, we add K5, K7, and M0 field templates, created by averaging a number of available spectra at these sub-types2 .The extinction has been varied between 0 and 5 mag, with the step of 0.2 mag, and we applied the extinction law by Cardelli et al. (1989), with R V = 3.1 (Fernandes et al. 2012).The derived spectral types and extinction for the analysed objects are given in Table A.1.
The equivalent widths (EWs) of the Hα line in sources experiencing accretion are higher when compared to non-accreting pre-main sequence and main-sequence stars, where Hα can still be present due to chromospheric activity (White & Basri 2003;Barrado y Navascués & Martín 2003;Fang et al. 2009;Alonso-Floriano et al. 2015).The measurements of the pseudo-EW of Hα line in emission is shown as a function of the spectral type in Fig. 4. The error bars are calculated as standard deviations from repeated measurements slightly changing the wavelength range for the pseudo-continuum subtraction, and the range over which the EW is measured, taking into account the spectrum measurement errors.We also overplot the oftenapplied criteria to distinguish between accretion and chromospheric activity from Barrado y Navascués & Martín (2003) and Fang et al. (2009), which will be used to select the objects for the training set in the next section.

Membership using the probabilistic random
Forest algorithm

Construction of the training set
The Rosette Nebula and NGC 2244 have been extensively studied in the past, and probable members have been selected based on spectroscopy, infrared excess, as well as the X-ray emission.
Additionally, with the availability of the astrometric measurements from Gaia EDR3, these member lists can be further constrained.The cluster NGC 2244 is by far the best studied region in the area of interest.Previous works yield a cluster radius in the range 20 −30 (Li 2005;Wang et al. 2008), and several other detailed studies that will be used in constructing the training set were limited to r < 20 , where the concentration of the candidate members is highest.For this reason, we limit the area for the training set construction to within the radius of 20 of the brightest cluster star HD 46150.

Member list
To construct the a list of members to be used for the training, we first compile a catalogue using the following sources in the literature: 1. X-ray and infrared excess candidates.Both Bell et al. (2013) and Meng et al. (2017) provide candidate member lists based on X-ray data (Wang et al. 2008) and infrared excess from Spitzer (Balog et al. 2007).The two lists have a large overlap, and contain 688 unique sources with counterparts in our catalogue.We use this list as the basis for the construction of the member training set, applying further cuts based on the proper motion information from Gaia EDR3.In Fig. 5 we show the proper motions of all the sources in our catalogue (within the plotting limits), along with the 688 candidate members.The 3-sigma-clipped mean proper motions are µ α cosδ = −1.75 ± 0.41 mas yr −1 , and µ δ = 0.25 ± 0.44 mas yr −1 .For our member training set, we chose to keep the sources within the 1.5σ proper motion ellipse.We note that this selection is somewhat arbitrary, with an intention to be conservative, rather than complete.Given the size of the uncertainties (indicated by the mean uncertainty in Fig. 5), we expect that several of the black dots outside of the orange ellipse are genuine members, which may later be recovered by the algorithm.449 out of 688 sources pass this proper motion criterion.
2. OB stars.Since the brightest members of the cluster will largely be missed by the selections of Bell et al. (2013) and Meng et al. (2017), we compile a list of candidate OB stars from Román-Zúñiga & Lada (2008), complemented by a list obtained by querying the Simbad database (Wenger et al. 2000).A total of 25 candidate OB stars are found in our catalogue, of which 20 pass the proper motion criterion defined in point 1.
3. Hα emitters from spectroscopy.We select the sources with pseudo-EWs of Hα consistent with accretion (see Fig. 4), where we use the criterion by Fang et al. (2009).For the earliest spectral types present in our dataset, we extrapolate the Fang et al. (2009) criterion horizontally.This yields 62 sources, of which 45 pass the proper motion criterion.
4. Hα emitters from photometry.We search for additional young sources with accretion signatures, with the help of Hα photometry.In Fig. 6, we show the r − i, r − Hα diagram of all the sources in the field (grey dots).The blue line and associated error bars show the mean value and standard deviation of r−Hα colour for a set of r − i values, respectively.Coloured points mark the objects with VIMOS spectra, with the colour scale indicating the measured pseudo-EWs of Hα.Majority of A19, page 6 of 29 pale red points (small absolute pseudo-EWs) follow the blue mean line, while the darker points tend to be located above the mean sequence, as expected for the objects brighter in Hα.This behaviour, however, presents some scatter, with some of the sources with large pseudo-EWs from the VIMOS spectra falling close to the main bulk of sources.This is probably caused by variability in Hα emission in young stars, due to variable accretion rates, or clumpy disk geometry, as previously reported in multiple works (e.g.Jayawardhana et al. 2006;Sousa et al. 2016;Manara et al. 2021).We select the stars with enhanced Hα values by drawing a line (black dashed line in Fig. 6) running roughly parallel to the mean sequence, above which we find a significant fraction of spectroscopic Hα emitters.This line is also roughly parallel to the extinction vector, making the selection extinction-independent.According to proper motions (lower right panel in Fig. 5), this sample is significantly contaminated at this stage, but it does show a concentration of sources in the area where cluster members tend to reside.The photometric Hα selection yields 1281 sources, of which 180 pass the proper motion criterion and are retained for the next step in the training set construction.
Joining the four lists of proper motion candidates presented above, and removing the overlapping objects, we obtain a list containing 700 stars, which can further be constrained by their position in a colour-magnitude diagram (CMD; Fig. 7).As expected, this sample still contains some contamination by the A19, page 7 of 29 field sources, which is particularly evident in the photometric Hα sample.Young stars are expected to be located to the right of the main bulk of the field sources due to their pre-main-sequence (PMS) evolutionary status, and the extinction.The solid lines in Fig. 7 show the PARSEC (Bressan et al. 2012;Pastorelli et al. 2020) isocrone for the age of 2 Myr and a distance of 1500 pc.
We use the isochrones only as a guide for defining the selection lines in the CMD.The OB stars are largely saturated in the optical wavelengths, and therefore only the PMS sources can be selected in the top left and middle panels of Fig. 7, in contrast to the near-infrared CMD (top right panel), where we can appreciate the transition from the PMS to the main sequence, extending to masses in excess of 20 M .The selection in the (i, i − J) CMD is therefore a single line excluding the candidates that are too blue to be really young, while the selection criterion in the (J, J − K s) diagram is defined so as to keep most of the bright sources (roughly 1 M ) and red PMS stars.Below J ≈ 15.5 mag, the sequence of PMS stars merges with that of the field stars (see Mužić et al. 2019), which motivates the vertical selection line in this magnitude range.The selection criterion in (z, z − H) has been defined with two lines of slightly different slopes, in order to accommodate the included OB stars, as well as to follow the shape of the PMS sequence.The final selection of the member training set consists of the sources that pass the selection in all the three CMDs, in addition to those that pass the selection in any of the diagrams, but are not present in the remaining ones.The latter corresponds exclusively to the bright stars saturated in the i− and z-bands.After this selection step, we are left with 506 candidate members.The final filtering is then done by looking at the parallaxes (Fig. 8), and excluding all the sources with the score ζ = | − |/ σ 2 + σ 2 > 3, where and σ represent the parallax measurement and uncertainty for each star, while and σ are the weighted mean and standard deviation of the entire sample, correspondingly.The six excluded stars are marked with red crosses in Fig. 8.The final member list for the training set consists of 500 stars, marked with black points in the lower panels of Fig. 7. Of the 500 members in our training set, 420 overlap with the member candidates reported in Bell et al. (2013) and Meng et al. (2017), while the remaining 80 have been added in this work.
A match of our training set to the Spitzer catalogue of the region (Balog et al. 2007) returns 287 common sources, which, according to the YSO classification rules from Fang et al. (2009) belong either to Class II and transition disk class, or show colours consistent with a naked photosphere.Our training set therefore does not include the more embedded YSOs (Class0/I), which is to be expected anyways given the main condition of the existence of the Gaia proper motion measurement.

Non-member list
The non-members are selected from the annulus with the inner and the outer radii of 15 and 25 , respectively.This way we avoid the densest part of NGC 2244 and minimise inclusion of potential cluster members, while sampling from a region that has similar extinction properties as the member sample.To construct the list that would represent the non-member class for the PRF classifier, we use two criteria: 1. Sources with inconsistent proper motions.This includes all the sources whose individual proper motion within 1σ do not intersect the ellipse centred at the mean proper motion of the member list, and with the semi-major axes equal to 3 standard deviations in both directions (Fig. 5).
2. Blue sources.This includes the sources located to the left of the selection lines in the CMDs shown in Fig. 7. Similar criteria have been constructed in other CMDs: (i, i − K s), and (z, z − H).
If a source is found in any of the above-mentioned lists, it is marked as a non-member.The non-member lists contain in total 9679 unique sources, that will be used as a pool for random selection of the sources for the training set.

PRF classifiers
There are several aspects that need to be considered when building a PRF model.These are the following: -Features.As we have shown in Sect.3.1, the main features that help distinguish between members and non-members are the position of the stars in various CMDs, as well as their proper motions.Therefore, the main features are the magnitudes (r, i, z, J, H, K s ), colours (constructed from the differences of pairs of the mentioned magnitudes), and proper motions from Gaia.Due to the relatively large distance to NGC 2244, the Gaia parallax tends to have large uncertainties, and is expected to be of limited use for the selection, especially for the fainter objects.We run the PRF algorithm both with and without the parallax as one of the features.In the next section, we discuss the feature importance, i.e. the relevance that each feature has for successful splitting of the two class categories.
-Hyperparameters.There are several parameters to adjust in the PRF algorithm, although we find that none have a major impact on the results.To test this, we change one of the parameters, while keeping others at a fixed (default) value, and compare the resulting accuracies.In case of the parameters n_estimators (number of trees), and the parameter max_features (maximum number of features to be considered at each node split), we find that the accuracies vary within 1% from the mean value over the tested range (n_estimators = 20-300, max_features = 3-19).We therefore use the default values for these two hyperparameters (n_estimators = 100, max_features = √ N, where N is the number of features; N = 20 when considering parallax as one of the features, N = 19 otherwise).
A19, page 8 of 29   -Missing data.Although we do require all the objects our catalogue to have a proper motion measurement, about 5% lack one or more photometric measurements.Typically, the objects with missing values are either disregarded, or the missing values may be replaced via some of available imputation methods.One of the advantages of the PRF is that no special treatment of the missing data is necessary, a consequence of representing the feature measurements as PDFs.Both at the training and the prediction stages, an object with a missing value at a given feature will have a probability of 0.5 to propagate to either of the two tree nodes (member or non-member).This way the model is constructed only from the measured features, but at the same time allowing us to predict the class of objects with missing features without making strong assumptions about the dataset (Reis et al. 2019).
-Imbalance compensation.To construct the training set, the member list (500 sources) is paired to a sub-sample of the non-member catalogue (9679).In order to have a more balanced training set, we randomly resample the two classes, which can be done either by over-sampling the member list, undersampling the non-member list, or a combination of the two.For this task, we use the Python package imbalanced − learn (Lemaître et al. 2017).The keyword sampling_strategy may be used to control the amount of random over-or under-sampling.For example, sampling_strategy = 0.5 will either under-sample the majority class, or over-sample the minority class so that the majority class contains twice as many objects as the minority class.In Table B.1 we show different combinations of the sam-pling_strategy parameters that we used to train the PRF algorithm.We first apply the oversampling to the minority class, A19, page 9 of 29 followed by the undersampling of the majority class.The choice of the parameters was such to keep the two classes roughly in the same order of magnitude range.
The selected combinations of imbalance compensation parameters, and feature options (parallax included or not) result in a total of 16 different PRF classifiers, whose details are listed in Table B.1.

PRF evaluation and comparison
Cross-validation is used to validate the different PRF classifiers, and also to compare their performance.Here, the main training sample (marked A1 to H2) for each classifier is shuffled, and then sub-divided into the test and the train sample, with a proportion 25%:75%, respectively.We employ stratified splitting, which ensures the relative class frequencies to remain approximately preserved.For each training sample (A1 to H2), we generate 50 random test and train (25 and 75 division) split samples, and re-run the PRF algorithm for each of them.
To compare the performance of different PRF classifiers, four different metrics are used.These are the F1 score, the area under the receiver operating characteristic curve (ROC_AUC), the area under the precision recall curve (PR_AUC), and the Matthews correlation coefficient (MCC; Yule 1912;Matthews 1975).The uncertainties for each score are calculated as the standard deviation of the 50 values obtained through splitting.The results are given in Table B.1 and shown in Fig. B.1.Evidently, several classifiers perform similarly well; for the remainder of the paper we show the results from the run F1, stressing that this choice does not affect the scientific results of the paper when comparing with those obtain by using, for example, the run F2 or C1/C2.
In Fig. C.1 we show the relative importance of each feature as returned by the classifier in run F1.We note that the feature importance looks similar for all the runs.The uncertainties shown in the plot correspond to the standard deviation of the 50 split values.As may have been expected, the features that are most informative for the classifier are proper motions, in particular the one in right ascension, whereas parallax provides very little information.NIR photometry and colours appear to be more significant than the optical ones.This may partially be due to the fact that the more massive stars do not appear in the optical catalogues (see Fig. 7).

Spatial and kinematic properties of candidate members
To study different properties of the candidate members of NGC 2244 and other structures in its vicinity, we focus on the high-probability members from one of our top-ranked runs, F1.
The classifier F1 has been trained on the entire main training sample and then used to evaluate membership probabilities for unlabelled part of the catalogue.Figure 9 shows membership probabilities of all the objects in the area within which the training set was selected (circle of 20 radius, centred at HD 46150).The histogram shown in the inset contains the probabilities derived for the objects labelled as members in the training set, with 99% of the objects having probabilities ≥0.83 .We therefore decide to apply the same cut to select the probable This selection contains 2974 probable members over the entire field.Inside the r = 20 circle (black dashed line), we find 1121 candidate members, of which 626 are new, i.e. not found in the training set member list.These numbers will naturally differ for various runs, but we do see a significant overlap, especially for the runs with the best scores (B, C, E, F), where the overlap is on a level of >95%.

Identification of different structures on the sky
Along with the candidates selected here (orange dots), in Fig. 10, we show the locations of the previously identified young clusters and groups, marked by the green crosses (Phelps & Lada 1997;Roman-Zuniga 2006;Poulton et al. 2008;Román-Zúñiga & Lada 2008;Cambrésy et al. 2013).The black dots show the YSOs selected from mid-infared and X-ray data (Broos et al. 2013;Cambrésy et al. 2013).The central part is clearly dominated by the cluster NGC 2244, with about 30% of candidate members being concentrated in a circular region with r = 15 around its centre.Slightly to the west of NGC 2244, we find the young cluster NGC 2237, whose central position as reported by Román-Zúñiga & Lada (2008) appears slightly offset from the highest concentration of candidates seen in our selection.To the south-east of NGC 2244, we find a more extended region of star formation containing various groups identified in the aforementioned works.These clusterings appeared as distinct structures in the early selections (Phelps & Lada 1997;Roman-Zuniga 2006;Poulton et al. 2008;Román-Zúñiga & Lada 2008), but here their borders are blurred, and it seems that they constitute a rather extended region of star formation, associated with the molecular cloud filaments.This is probably due to the limited depth of the earlier surveys, corroborated by the extended MIR and X-ray population from Broos et al. (2013) which also appears fairly continuous.We note that their selection does not extend over the entire region studied here (e.g.PL01, PouD/PL03, PL07, REFL09 were not included in their study). To

Distance to the region
To estimate the distance to the Rosette Nebula complex, we employ the maximum-likelihood procedure as in Mužić et al. (2019) and Cantat-Gaudin et al. (2018), using the list of likely members (probability ≥0.8) from the run F1.We calculate the distances to the entire region, to clusters NGC 2244 and NGC 2237 separately, as well as to the south-east concentration of members.The NGC 2237 members were selected from a circle centred at the position given in Román-Zúñiga & Lada (2008) with a radius of 10 and having the Φ angle values between 60 • and 120 • (see Sect. 4.5).The members of NGC 2244 were selected from a circle centred at (α, δ) = (06:31:57.52,+04:54:37.5)and with a radius of 18 (see Sect. 4.3), excluding the likely NGC 2237 members.The likely members in the south-east region were selected within a rectangular region with 06:33:00 < α < 06:35:00 and 04:15:00 < δ < 04:40:00.
Based on the observations of quasars, it has been shown that Gaia EDR3 parallaxes may contain systematic offsets from the expected distribution around zero, which depend in a nontrivial way on the magnitude, colour, and ecliptic latitude of the source (Lindegren et al. 2021).To estimate the parallax bias, we use the publicly available Python package 4 based on the prescription given in Lindegren et al. (2021).The values of the bias 4 https://gitlab.com/icc-ub/public/gaiadr3_zeropointA19, page 11 of 29 Notes. (a) From Gaia EDR3 parallaxes. (b) From Gaia EDR3 parallaxes, including the correction of the parallax bias.
show a distribution ranging between −0.05 and 0.02 mas, with a peak at −0.03 mas.By sampling from the obtained parallax bias distribution, we re-run the maximum-likelihood procedure 100 times, each time subtracting a different value of the bias.The distance and the corresponding uncertainty are then calculated as a weighted mean and standard deviations of the 100 iterations.
The results are given in Table 3.The first column is showing the distance obtained using the maximum likelihood method directly on the Gaia EDR3 parallaxes, while the second column shows distances after applying the parallax bias correction.It is interesting to note that, while NGC 2244 and the south-east region appear to be at a similar distance, the cluster NGC 2237 seems to be located some 90 pc behind.This is significantly larger than the extent of NGC 2244 in the plane of the sky (diameter ∼16 pc; see Sect.4.3).
All the distances obtained here are in agreement with those reported in Kuhn et al. (2019, 1550 +100  −90 pc) and Lim et al. (2021, 1400 ± 100 pc).For the remainder of this paper, we assume the distance of 1500 pc.

Radial distribution and the extent of NGC 2244
To determine the radial profile of NGC 2244, we first calculate a stellar surface density distribution using a two-dimensional Gaussian kernel density estimator (KDE).The maximum of this distribution is taken as the centre of the cluster, and is located at the position (α, δ) = (06:31:57.52,+04:54:37.5).The radial profile, shown in Fig. 11, is calculated from the number of stars within the concentric annuli around the central position, divided by the area of the respective annulus.At radii r 12 , stars belonging to NGC 2237 and the south-east overdensity start to affect the radial distribution, and should be excluded from the calculation.Filled black circles in Fig. 11 show the radial profile with these two regions masked, while the blue diamonds consider all the stars.Outwards from r = 18 the density distribution flattens out, which we take as an indication of reaching the cluster outer radius.This radius corresponds to ∼8 pc at a distance of 1500 pc, and is marked with the vertical dotted line.To fit the radial distribution shown in Fig. 11, we take a generalised profile known as Elson-Fall-Freeman (EFF) profile (Elson et al. 1987) in the following form: where r stands for the projected distance from the centre of the cluster, Σ 0 is the surface density at its centre, and a is a scale parameter.The core radius, r c of the King profile (King 1966) is then given by The best fit profile, obtained from the black points excluding the rightmost three, is shown as an orange dashed line in  (Elson et al. 1987, see text for details).The last three points, where the profile starts to flatten out have not been included in the fit.The vertical dashed line represents the core radius derived from the profile fitting, and the dotted one marks the approximate outer radius of the cluster (r = 18 ).

Stellar kinematics
In this section we study the two-dimensional internal motions of the stars, relative to the mean motion of the cluster NGC 2244.The error-weighted mean proper motion was calculated using the sources within the 12 radius from the centre of NGC 2244, a conservative radius that ensures most of the selected sources would indeed be members of the cluster.The mean proper motion was subtracted from all the stars in the region.In this way we can study the internal kinematics of NGC 2244, as well as relative motions of other young stellar groups with respect to NGC 2244.We then correct for the so-called perspective expansion (or contraction), an effect that is caused by the projection of the cluster's radial motion at the different positions of cluster members (i.e. the further away a star is from the centre A19, page 12 of 29 For the centre of the stellar distribution, we take the position (α, δ) = (06:31:57.52,+04:54:37.5),obtained from the maximum of the two-dimensional stellar density distribution, and calculated using a two-dimensional KDE.We find that the effect is generally small, ∼0.03 mas yr −1 at the distance of 1 • from the centre, and ∼0.05 mas yr −1 at the largest relative distances present in our dataset.In Fig. 12, we show the relative proper motions of all the selected sources, with respect to the mean motion of NGC 2244.The positions of stars are marked by black dots, with a bar extending from each dot showing the direction of the relative motion.The bars are also colour-coded according to the direction on the sky.In the region occupied by NGC 2244 (see Fig.   10), which do seem to have a common preferred motion towards the south-east.

Expansion of NGC 2244
To study in more details the internal kinematics of NGC 2244 and its immediate surroundings, we calculate the angle Φ, defined as an angle between the star's relative proper motion vector and the line that connects it with the centre of the cluster (see Lim et al. 2021).Φ = 0 • corresponds to proper motion vectors pointing away from the cluster centre, and Φ = 180 • towards it.Fig. 13 shows the relative proper motion vectors on top of an optical image of the region.The vectors are colour-coded according to the value of the angle Φ, with purple and green hues marking the stars moving radially away from the centre, or towards it, respectively.In this representation, the outward radial motion of a large number of members of NGC 2244 can easily be appreciated.In Fig. 14, we show the density distribution of the angle Φ, for several angular distance bins from the cluster centre, as indicated by black circles in Fig. 13.The density has been calculated with the KDE, using a Gaussian kernel with bandwidth = 10.The shaded areas show the 1σ confidence interval, obtained through a Monte Carlo (MC) simulation with 1000 iterations, in which the angle Φ was varied within the uncertainties for each object.At all radii, except for the innermost and the outermost part which are fairly flat, we see a pronounced peak at low Φ values, corresponding to the purple-hued sources in Fig. 13.
Here we must note that in the colour representation of Fig. 13, the white colour represents the motion perpendicular to the line that is connecting a star to the centre of NGC 2244, however, not in a vectorial form, meaning that diametrically opposite motions may be represented by similar colours.This is not the case for NGC 2237, where we see close-to-white shading only for the sources moving southwards.The other apparent group within the 20 -30 annulus, contains sources moving in both directions (see also  In Fig. 15 we show the radial component of the relative proper motion vector (µ r ) as a function of the distance from the cluster centre.µ r has been calculated as a projection of the relative proper motion vector onto the vector connecting the centre of the cluster with the star in question.
Positive value of µ r means that the radial component of the proper motion is directed away from the cluster centre, while the negative value signals the opposite behaviour.Stars were binned by radial distance with the bin size of 2 , and the weighted mean and the corresponding standard deviations of µ r for each bin are shown in red.We can clearly see that µ r steadily increases as we move further out from the cluster centre up to the projected distance of ∼14 , providing evidence of radially dependent expansion velocity.The trend, however does not hold at the edges of the cluster.We note that this is not caused by the members of NGC 2237, which have been masked (see Sect. 4.3) for this part of the analysis.We perform two weighted linear regressions to the red points in Fig. 15, one including only the stars within r = 14 (6.1 pc at 1500 pc), and the other over the entire extent of the cluster (r < 18 ; 7.8 pc).For the two fits, we obtain the following slopes: r < 14 : 0.024 ± 0.003 mas yr −1 arcmin −1 = 0.39 ± 0.05 km s −1 pc −1 , r < 18 : 0.008 ± 0.009 mas yr −1 arcmin −1 = 0.13 ± 0.15 km s −1 pc −1 .Previously, Kuhn et al. (2019) have also detected a similar trend of increase of the expansion velocity with radius, out to the radius of 11 , deriving the slope of 0.6 ± 0.1 km s −1 pc −1 , A19, page 14 of 29 1, but the relative proper motions coloured according to the angle between star's relative proper motion vector and the line that connects it with the centre of the cluster.The purple hues mark the stars with a predominant motion away from the cluster centre, and the green towards it.We note that while near-white sources move close to perpendicularly to the line that connects them with the centre of the cluster, they may be moving in two opposite directions with respect to each other.The concentric circles correspond to the radii of (2.5, 5, 10, 20, 30) from the cluster centre (white star), used to plot the distribution of angle Φ in Fig. 14.
which is somewhat larger than the one derived here, and only marginally consistent.We note that, when repeating the fit using the same radius as in Kuhn et al. (2019), the value remains essentially unchanged with respect to the one obtained for the radius of 14 .
The median of the distribution of µ r out to r = 18 (Fig. 16) peaks at 0.138±0.015mas yr −1 , which is equivalent to v r = 1.0± 0.1 km s −1 at a distance of 1500 pc.This value is in agreement with the result 1.23 ± 0.17 km s −1 reported by Kuhn et al. (2019) for a distance of 1550 pc (or 1.19 ± 0.16 km s −1 when scaled to the distance considered here).

SED fitting
The spectral energy distributions (SEDs) were constructed using the optical to mid-infrared photometry, and the SED fitting was performed with the help of the Virtual Observatory SED Analyzer (VOSA; Bayo et al. 2008).VOSA compares the observed fluxes with the synthetic photometry at some assumed distance for each object (fixed to 1500 pc), looking for the best fit effective temperature (T eff ), extinction (A V ), and surface grav- Solar value.We use the BT-Settl models (Allard et al. 2011), and probe T eff over the range offered in VOSA (400-70 000 K), with a step of 100 K, and A V between 0 and 5 mag, with a step of 0.25 mag (automatically determined by VOSA).The log(g) was varied between 3.5 and 5.0, which is suitable for young cool very low-mass stars, and the field stars.We note, however, that the SED fitting procedure is largely insensitive to log(g), resulting in flat probability density distributions over the inspected range (see e.g.Bayo et al. 2017).In Fig. E.1, we show a subset of SED fits, for a wide range of fitted temperatures, with (bottom panels) or without (top panels) infrared excess.

Luminosity function
Figure 17 exhibits the luminosity function in form of a histogram of absolute magnitudes in Gaia G-band, and a one-dimensional Gaussian KDE with Silverman bandwidth (Silverman 1986).The lower histogram shows high probability members over the entire region, while the upper histogram contains only those belonging to NGC 2244.Here we include only the sources within our estimated cluster radius of r = 18 (Sect.4.3), while at the same time excluding possible members of NGC 2237 (as described in Sect.4.2).To obtain the absolute magnitudes, we correct the apparent one by the distance modulus corresponding to the distance of 1500 pc, and the extinction derived in Sect.5.1, assuming G = 0.789A V (Wang & Chen 2019).The luminosity function of NGC 2244 shows a pronounced dip at M G ≈ 5 mag, probably corresponding to the Wielen dip, a feature often observed in luminosity functions of open clusters, and possibly caused by the changes in the atmospheric physics with mass (Wielen 1974;Kroupa et al. 1990;Jeffries et al. 2001;Naylor et al. 2002).As discussed in Guo et al. (2021), the Wielen dip appears at M G = 5 mag for an age of 1 Myr, but by 10 Myr it shifts to M G = 7 mag, where it stabilises.Observations of the Wielen dip at M G ≈ 5 mag speaks in favour of the young age of the Rosette population, which will be further discussed in the following sections.
On the other hand, the age dating using the so-called Hpeak (Guo et al. 2021) yields a somewhat older age of ∼5 Myr, assuming that the smaller peak seen at M G ≈ 1.3 mag corresponds to this feature.The H-peak was first reported in Piskunov & Belikov (1996), and is thought to be caused by the change of the mass-luminosity relation at the transition from the pre-main sequence to the main sequence, which creates a local A19, page 16 of 29 peak in the stellar number density at a given luminosity.As the mass of the stars arriving at the main sequence gets lower with time, so the H-peak also evolves towards fainter magnitudes.

Mass and age derivation from the HR diagram
Using the T eff and A V obtained in Sect.5.1, and a distance of 1500 pc, we can construct a HR diagram of the region (Fig. 18), for the objects with membership probabilities ≥80% (run F1).We overplot the PARSEC evolutionary tracks (Bressan et al. 2012;Pastorelli et al. 2020) for the ages between 0.1 and 30 Myrs.The majority of the selected objects is consistent with ages below 10 Myrs, speaking in favour of the robustness of our selection method.The hotter stars appear, on average, older than the cooler stars, an issue that has been repeatedly reported in the past (e.g.Pecaut et al. 2012;Prisinzano et al. 2019).Thus, stars with T eff > 6000 K are excluded from the calculation of the mean age of the region.
To derive the age and the mass of the objects shown in the HR diagram, we use interpolation on the PARSEC isochrones with ages between 0.1 Myr and 100 Myr, with a step of 0.05 dex.To assess the uncertainties of the derived parameters, we perform a MC simulation by varying the T eff and M J within their respective uncertainties, assumed to be 100 K for the T eff (SED fitting step).The uncertainties for M J include the photometric errors, as well as the uncertainty of 0.25 mag in A V , added in quadrature.The distance uncertainty is not included in this calculation.The derived mass and age distributions for each object typically do not follow a normal distribution, and may be highly asymmetric.In case of the mass, we save the derived distributions (see an example in the Appendix Fig. E.2), which are later used to draw random samples for the derivation of the Initial Mass Function (IMF).In case of the age, we save the medians of 100 MC values for each object.

Ages of various structures in the region
As is commonly seen in HR diagrams of star-forming regions, the objects in the Rosette Nebula show a significant span in ages, from below 0.5 Myr to about 10 Myr.To search for potential structure in the age distribution of the studied region, in Fig. 19 we plot the ages of all the probable members with T eff < 6000 K in the form of a two-dimensional, binned and Gaussian-kernelsmoothed on-sky map, colour-coded according to the mean age in each bin, and with the bin size of 6 × 6 .The mean age of the entire region is 1.6 ± 0.5 Myr, derived as the average, and the standard deviations of the ages of all the probable members with T eff < 6000 K. Here, it is important to keep in mind that the quoted uncertainty is statistical, and does not include potential systematics related to the use of a particular set of models, nor the uncertainties inherent to models at these young ages.The mean age of NGC 2244 (within the radius of 18 ) is 1.3 ± 0.4 Myr, which is close to the typically assumed age of the region of ∼2 Myr (Hensberge et al. 2000;Park & Sung 2002;Dias et al. 2002;Bell et al. 2013;Wareing et al. 2018), although slightly A19, page 17 of 29 younger, and at odds with estimates by Kharchenko et al. (2013, ∼5 Myr), andCantat-Gaudin et al. (2020, ∼12 Myr).As discussed in Appendix F, a comparison of our selection with that used in the latter work shows that their member list may suffer a significant degree of contamination, which in turn can make the cluster appear older than it really is.
From Fig. 19, it appears that the region to the south-east of NGC 2244 contains on average the youngest stars in the entire Rosette Nebula.For example, the region centred at REFL08 with a radius of 10 has the mean age of 1.0±0.4Myr.This is in agreement to findings of Poulton et al. (2008) and Cambrésy et al. (2013), who report a lower concentration of Class I sources associated with the cluster NGC 2244 than in the regions associated with the molecular cloud (south-east of NGC 2244), which should therefore be younger.
The mean ages of our high probability candidates potentially associated with the clusters Collinder 106 and 107, calculated for the sources within the dotted circles in Fig. 19 are 3.3 ± 0.7 Myr and 4.0±0.9Myr, respectively.However, the derived ages should not be taken as representative for these two clusters, because our algorithm picks up only a fraction of their reddest members, which can be clearly seen when compared to the selection of Cantat-Gaudin et al. (2020), who report ages of 18 Myr and 24 Myr for Collinder 107 and 106, respectively.According to Kharchenko et al. (2013), Collinder 106 has an age of ∼3-5 Myr, while the Collinder 107 appears slightly older (∼15 Myr).On the other hand, Dias et al. (2002) report an age of 10 Myr for Collinder 107, and an old age inconsistent with the open cluster nature for Collinder 106.

Properties of the cluster NGC 2244
In Sects.4.3 and 4.5, we determined the radial extent of NGC 2244, and studied its internal kinematics, indicating expansion.In Sect.5.3, we derived masses and ages for all the probable members across the Rosette Nebula complex.This section aims at further characterisation of NGC 2244, by the derivation of its mass function, assessment of mass segregation, and finally discussion on its dynamical state, including comparison to numerical simulations.

Initial mass function
In this section we derive the IMF of NGC 2244 (within a circle of 18 from the cluster centre), for masses in the range from 0.2 M to 20 M .NGC 2237 members have been removed as described in Sect.4.2.The mass of each star is represented by a distribution derived in Sect.5.3.Each of these distributions is smoothed by a Gaussian KDE (using the Silverman rule to determine the kernel), and used to extract a random sample of N 1 values per object, constructing thus N 1 realisations of the mass distributions for the entire sample.For each of these realisations, we perform N 2 bootstraps (random samplings with replacement), resulting in N 1 × N 2 mass distributions.Each distribution is then binned onto the same grid, and the final value and its uncertainty for each bin is calculated as mean and standard deviation of N 1 × N 2 values.The bin sizes have been selected as to contain a high number of elements (typically 70-120), except for the last three high mass bins which are more sparsely populated (10-20 stars).We set Figure 20 shows the derived IMF in two commonly used forms, dN/dM (upper panel), and dN/d(logM) (lower panel), along with the standard IMFs from Kroupa (2001) and Chabrier (2005).We fitted the IMF shown in the upper panel with two The solid orange lines in the top panel show a power law fit with a break ∼1.5 M , with slopes α = 1.05 ± 0.02 at the low-mass side, and α = 2.26 ± 0.25 for the massive part.Two standard mass functions are shown as dotted (Kroupa 2001), and dashed (Chabrier 2005) lines, normalised to match the total number of objects that entered the IMF derivation.The vertical blue dashed line marks the completeness limit.power-laws in the form dN/dM ∝ M −α , with a break at ∼1.5 M (solid orange lines), and obtain the slopes α = 1.05 ± 0.02, and α = 2.26 ± 0.25 for the low-mass and the massive part, respectively.The high-mass slope is very close to the standard Salpeter slope (α = 2.35; Salpeter 1955), and in agreement with the slope α ∼ 2.1 derived for X-ray-selected members with masses >0.5 M (Wang et al. 2008), and α = 2.4 ± 0.1 derived by Lim et al. (2021) for the masses above 1 M .At the low-mass side, we can compare the slope α with the results in nearby star-forming regions and other young clusters that have been derived below 1 M : NGC 1333 (0.9-1) and IC 348 (0.7-0.8; Scholz et al. 2013), Chamaeleon-I and Lupus 3 (0.8 ± 0.1; Mužić et al. 2015), RCW 38 (0.8 ± 0.1; Mužić et al. 2017).The slope derived here for NGC 2244 is in agreement with these different regions, especially considering that all these results include systematics that are difficult to take into account, such as the age uncertainties, choice of models and the extinction law, or mass range to calculate the fit.Finally, both slopes of the IMF are in agreement with our earlier results which focussed only on a narrow region close to the centre of NGC 2244 (Mužić et al. 2019).
In the dN/d(logM) representation, the IMF of NGC 2244 is fairly flat over the mass range 0.2-1.5 M , which has already be noted in Mužić et al. (2019), and also reported in A19, page 18 of 29 Damian et al. (2021), over the mass range 0.1-1 M .A similar behaviour has also been observed in the cluster IC 4665 over the range 0.1-1 M (Miret-Roig et al. 2019), and NGC 2362 for the masses ∼0.06−1 M (Damian et al. 2021).

Mass segregation
To inspect potential mass segregation in NGC 2244, we perform an MC analysis similar to that in Sect.6.1, sampling the masses from individual mass distributions.At each step, a cumulative distribution of mass as a function of the projected distance from the cluster centre is saved, and finally a mean value and standard deviations are calculated for each value of projected distance.Fig. 21 (left panel) contains the results of this procedure for two mass bins: stars above 1 M are represented by the orange line, and those below by the black one.We see clearly that the more massive stars are preferentially found closer to the cluster centre, indicating mass segregation.The Anderson-Darling twosample test (Anderson & Darling 1952) returns p-values lower than 0.001 for the means of the two distributions, suggesting that they cannot be drawn from the same parent population.
Another way to probe the mass segregation in a cluster is by using the Λ MSR parameter (Allison et al. 2009;Maschberger & Clarke 2011).It is determined with the help of the minimum spanning tree (MST), a graph constructed by drawing lines between pairs of points in a distribution in a way to connect all the points with the shortest possible total path length and no closed loops.Λ MSR is obtained by dividing the median edge length of the MST of randomly chosen N MST stars, with the median edge length of N MST most massive stars.A region without any mass segregation would have Λ MSR ∼ 1, as opposed to Λ MSR > 1 which indicates mass segregation, or Λ MSR < 1, signalling an inverse effect with respect to mass.We run an MC simulation similar to that from Sect.6.1, including resampling and bootstrapping to obtain 10 4 mass distributions.For each mass distribution, we vary N MST between 20 and 250 (the latter roughly corresponds to 1 M ), which allows us to calculate the mean Λ MSR and the corresponding standard deviation.The MSTs were constructed using the Python package MISTree (Naidoo 2019).In the right panel of Fig. 21 we see that Λ MSR has values consistent with, or just slightly higher than unity.Since it has been shown that even some simulated random distributions of stars can yield |Λ MSR | > 1 (Parker & Goodwin 2015), values of |Λ MSR | > 1.5 − 2 are typically required in the literature to unambiguously detect mass segregation (e.g.Dib et al. 2018;Parker 2018).At face value, this would indicate that no significant mass segregation is present in NGC 2244, contrary to what is seen in the left panel of Fig. 21.The reason for this may lie in the fact that, even if the more massive stars are centrally concentrated, so are the other stars in our cluster, and therefore a random choice of stars will favour short branch distributions.It has in fact been demonstrated by Parker & Goodwin (2015) that when the location of the most massive stars coincides with the regions of the highest surface density, Λ MSR tends to only marginally signal mass segregation.Furthermore, although the massive stars may be more centrally concentrated, some of them are still present in the outer parts of the cluster, leading to longer branches, increasing the average value for the massive stars, and thus pushing the ratio closer to unity.
Mass segregation in NGC 2244 has previously been reported by Chen et al. (2007), who observe a similar effect separating the stars according to their apparent magnitude.They divide the sample at m B = 13 mag, which roughly corresponds to 2-3 M at the distance of 1500 pc and the age of 2 Myr (and additionally depends on the extinction).At the same time, they do not observe any velocity -mass correlation, speaking in favour of primordial mass segregation.On the other hand, Wang et al. (2008), report no evidence of mass segregation, by comparing the cumulative distributions of stars with estimated masses <2 M with those with masses >8 M .Clearly, the three works use different mass intervals to determine whether stellar masses may be segregated.Changing the limit mass to 2 or 3 M as in Chen et al. (2007), we still observe the same behaviour in the cumulative mass distributions.However, the comparison with Wang et al. (2008) is more difficult as the low number of high mass stars, along with the uncertainties in their mass determination result in very wide confidence intervals.
To summarise, the two indicators of mass segregation applied in this work return somewhat contradictory results.The same is found by comparing different sources in the literature.We conclude that, while there may be some degree of mass segregation in NGC 2244, it does not seem to be very pronounced.

Velocity dispersion
One-dimensional velocity dispersion σ 1D can be obtained from where σ µα and σ µδ mark the velocity dispersion in right ascension and declination, respectively.The latter can be calculated as weighted standard deviations of the measured proper motions.
In order to obtain a robust measurement of the velocity dispersion, the proper motions are resampled 10 4 times, by taking into account the proper motion uncertainties and their correlations, as well as bootstrapping.For NGC 2244 (r < 18 ) we obtain σ 1D = 0.201 ± 0.016 mas yr −1 , equivalent to 1.43 ± 0.11 km s −1 .This estimate is in between two previous determinations of onedimensional velocity dispersion from the Gaia DR2 data, σ 1D = 1.94±0.10km s −1 (Kuhn et al. 2019), and σ 1D = 1.1±0.9km s −1 (obtained from proper motions) or σ 1D = 1.4 ± 1.2 km s −1 (with radial velocity included) from (Lim et al. 2021).In both cases, the cited results have been re-scaled to account for the difference in assumed distances.To facilitate the comparison with the numerical simulations, which will be presented in Sect.6.3.4,we also calculate the inter-quartile range of the velocity dispersion (σ IQR ; Eq. ( 8) in Parker et al. 2014).The obtained value is almost identical to the value of σ 1D (see Table 4).The total stellar mass of NGC 2244, estimated from an MC simulation equivalent to the one used to derive the IMF, is 1000± 70 M5 , and the half-mass radius r h = 7.9 ± 0.5 , equivalent to 3.4 ± 0.2 pc at a distance of 1500 pc.The velocity dispersion σ vir of a virialised cluster with the mass M and half-mass radius r h can be obtained from where η is a constant that depends on the radial profile of the cluster.For η = 10, corresponding to a Plummer model, this yields σ vir = 0.36 ± 0.02 kms −1 .Taking η = 9, which may be more appropriate for a cluster with γ = 3 (Portegies Zwart et al. 2010), we obtain σ vir = 0.37 ± 0.02 kms −1 .Even for very small values (η = 5), the virial velocity dispersion remains significantly smaller than σ 1D .Placing it in the context of other young clusters in Fig. 17 of Kuhn et al. (2019), which shows the dependence of the virial velocity dispersion vs. the observed one for several Milky Way's young clusters, NGC 2244 appears well above the line indicating zero total energy, suggesting that it may be unbound.The pattern of expansion seen within the half-mass radius (Fig. 15), in principle supports this conclusion.However, numerical simulations following dynamical evolution of young star-forming regions by Parker & Wright (2016), show that the measured velocity dispersion may not be a good diagnostic of the current virial state of these systems, as most clusters in their simulations appear supervirial after a short time, independently of the initial conditions.On the other hand, regions that are significantly supervirial from the start display a velocity dispersion that is well in excess of the virial one (σ 1D /σ vir ∼ 4 in their simulations) at very early times (<2 Myr).This may be the case of NGC 2244, and will be further discussed below.The number of sources in the simulation (1500) by Parker & Wright (2016) is comparable to NGC 2244 (∼1000).The current density of NGC 2244 is significantly lower than that of the simulated clusters, which have a radius of only 1 pc.However, given that the cluster crossing time seems to be comparable to its age, it may as well have expanded to its current size.

Relaxation timescale
A relaxation timescale of a cluster due to two-body interactions can be estimated as where N is the total number of stars in the cluster, and t cross ≈ r h /σ 1D is the cluster crossing time (Binney & Tremaine 2008).
Using the half-mass radius and one-dimensional velocity dispersion calculated above, along with N = 1055 (high probability members within the 18 radius), we obtain t cross ≈ 2.3 ± 0.2 Myr, and t relax ≈ 35 ± 3 Myr.The relaxation time is an order of magnitude longer than the estimated age of the cluster, signalling that the potential mass segregation (see Sect. 6.2) cannot be a consequence of dynamical relaxation.

Tidal stability
A static tidal field enforces an upper limit on the size of a stellar system, by stripping the stars outside of its Jacobi (or Hill) radius (Binney & Tremaine 2008).If a cluster is to survive the Milky Way's tidal field, it should be smaller than its Jacobi radius: where M is the total mass of the system, and A and B are the Oort's constants at its Galactic position.Using the relations given in Piskunov et al. (2007), we obtain A ≈ 12 km s −1 kpc −1 and B ≈ −10.5 km s −1 kpc −1 .For the total mass calculated above, we obtain the Jacobi radius r J ≈ 15.9 pc, which is approximately twice the outer cluster radius determined in Sect.4.3, indicating that the cluster is not significantly influenced by the Galactic potential.

Structural analysis
In Sect.6.3.1, we compared the velocity dispersion in NGC 2244 to the simulations by Parker & Wright (2016), and in Sect.6.2 we derived the mass segregation ratio Λ MSR .In this section, we derive several additional structural parameters that can be used for a comparison to the set of simulations presented in various papers of the same group (Parker et al. 2014;Parker & Wright 2016;Wright & Parker 2019), and thus try to set some constraints on the formation and early evolution of NGC 2244.For this analysis, we limit the cluster extent to 2 halfmass radii (∼15 ), in order to avoid including the members of NGC 2237.
-Q-parameter.First introduced by Cartwright & Whitworth (2004), the scale-free Q-parameter distinguishes between a smooth large-scale radial density gradient and multiscale fractal subclustering.Q-parameter is determined as ratio of the normalised mean length of the MST (m) and the normalised mean edge length of the complete graph (s).For NGC 2244, we obtain m = 0.57, s = 0.66, and Q = 0.86.This rather A19, page 20 of 29 moderate value of Q indicates neither a substructured nor a very centrally concentrated distribution.It may correspond to a dynamically young structure (Q roughly constant from the beginning), or may have started off with a smaller value of Q, followed by an episode of violent relaxation (Parker et al. 2014;Parker & Schoettler 2022).However, as will be shown below, the latter can probably be excluded as it presumes low to moderate values of the virial ratio.
-The local density ratio Σ LDR is the ratio between the median surface density of the N most massive stars, and the median surface density of all the stars in the region (Parker et al. 2014;Küpper et al. 2011).The surface density of each star is defined as Σ = (N − 1)/(πr 2 N ), where r N is the projected separation from the Nth closest neighbour.The value of Σ is fairly insensitive to the choice of N; we set N = 10.We obtain Σ LDR = 1.06 ± 0.40.This result indicates that no mass segregation is present, which is consistent with the result using Λ MSR (Sect.6.2).
-Velocity dispersion ratio σ VDR is defined as the ratio between the velocity dispersion of the N most massive stars and the velocity dispersion of all stars in the sample (Wright & Parker 2019).One-dimensional velocity dispersion have been determined as described in Sect.6.3.1.We measure σ VDR for values of N between 10 and 50, in steps of 5. We observe that σ VDR generally increases with N, but the values are always maintained below 1.For N = 10, the value of the ratio is σ VDR = 0.50 ± 0.18, whereas for N = 50 we obtain σ VDR = 0.85 ± 0.12.
We can now try to compare the above obtained parameters with the N-body simulations of star clusters from Parker et al. (2014), Parker & Wright (2016), Wright & Parker (2019).In the series of papers, the authors use the same simulations of clusters with 1500 members distributed within a sphere with an initial radius of 1 pc, starting with several values of the initial virial ratio (subvirial, virial, supervirial), and the degree of substructure (low, moderate, high).We can compare the calculated parameters for NGC 2244, ∼2 Myr old cluster, with predictions of numerical simulations.
-Evolution of Q with time.Subvirial initial conditions are clearly excluded as they result in much larger values of Q at 2 Myr.Both virial and supervirial initial conditions may be compatible with the derived value of Q, with the real value of the virial ratio lying somewhere between the extremes used in the simulations.For the virial initial condition, there is a preference towards higher fractal dimensions, i.e. no presence of significant substructure.
-Evolution of Λ MSR with time.Similar to the previous item, the subvirial initial conditions are clearly excluded, while the supervirial ones are preferred, independently of the degree of substructure.
-Evolution of Σ LDR with time.Both a high degree of substructure, and subvirial initial conditions can be excluded.The calculated values are best represented by the case of high virial ratios and high fractal dimensions (no substructure).
-Q−Λ MSR and Q−Σ LDR .As expected from the behaviour of the three mentioned parameters with time, the two relations show preference towards high virial ratios, together with moderate-tohigh fractal dimensions.
-Q − σ IQR and Σ LDR − σ IQR .Comparison in the planes containing velocity dispersion is slightly complicated by the fact that the initial velocity dispersion in simulations increases with the total number of stars in the simulation (Parker & Wright 2016).However, the value of σ IQR obtained for NGC 2244 is generally higher than that of the simulations (but still consistent within the errors), although the number of stars in the cluster is ∼30% lower than that in the simulations.Thus, we can basically make similar observations as when looking at the evolution of Q and Σ LDR with time, and conclude that the best match seems to be provided by the supervirial case, with large fractal dimension.
σ VDR .Independently of the value of N, we obtain σ VDR < 1, i.e. the most massive stars move on average slower than the low-mass stars.This excludes a region that was initially highly substructured, independently of the virial state.
In conclusion, the numerical simulations by Parker et al. indicate that NGC 2244 may have undergone a hot collapse, i.e. formation under supervirial conditions, with a low level of substructure which has not been significantly altered since the formation phase.

Summary and conclusions
In this work, we study the 2.8 • × 2.6 • region in the Rosette Nebula, centred in the young cluster NGC 2244.
Starting from a catalogue containing optical to mid-infrared photometry from multiple sources, as well as accurate positions and proper motions from Gaia EDR3, we applied the PRF algorithm to derive the membership probability for each source within our field of view.Based on the list of almost 3000 probable members, of which about a third are concentrated within the radius of 20 from the centre of NGC 2244, we identified various clusters and concentrations of young stars.The masses, extinction, and ages were derived by fitting the SED to the atmosphere and evolutionary models, and the internal dynamics was assessed via proper motions relative to the mean proper motion of NGC 2244.Here we summarise the main results of the paper: -Our distance estimate to the overall Rosette Nebula stellar population is 1489 ± 37 pc, and to NGC 2244 1440 ± 32 pc, derived from Gaia EDR3 parallaxes.While the region of active star formation associated with the Rosette molecular cloud, located to the south-east of NGC 2244, appears to be located at a similar distance as the cluster, another, smaller cluster in the region, NGC 2237, seems to be located ∼90 pc behind it.-The region is clearly young, with the mean age derived from the HR diagram (1.6 ± 0.5 Myr) being consistent with previous estimates of ∼2 Myr.The young age is further supported by the position of the Wielen dip in the luminosity function at M G ≈ 5 mag.A region to the south-east of NGC 2244, associated with the molecular cloud, contains -on averagethe youngest stars in the Rosette Nebula.-The radial profile of NGC 2244 was fitted with the EFF profile (Elson et al. 1987), which returns the peak stellar surface density Σ 0 = 6.4 ± 0.7 stars arcmin −2 , and the (King profile) core radius r c = 2.0 ± 0.4 pc.We estimate that the cluster extends out to a radius of ∼18 arcmin, which is equivalent to ∼8 pc at a distance of 1500 pc.-NGC 2244 is showing a clear expansion pattern, with an expansion velocity that increases with radius, in agreement with previous studies (Kuhn et al. 2019).The median of the distribution of the radial component of the relative proper motion vector (µ r ) out to r = 18 is at 0.138 ± 0.015 mas yr −1 , which is equivalent to v r = 1.0 ± 0.1 km s −1 at a distance of 1500 pc.-The IMF is well represented by two power laws (dN/dM ∝ M −α ), with slopes α = 1.05 ± 0.02 for the mass range 0.2-1.5 M , and α = 2.3 ± 0.3 for the mass range 1.5-20 M , and it is in agreement with slopes detected in other star-forming regions.
A19, page 21 of 29 A&A 668, A19 (2022) -The velocity dispersion of NGC 2244 is well above the virial velocity dispersion derived from the total mass (1000 ± 70 M ) and half-mass radius (3.4 ± 0.2 pc).-The relaxation timescale is an order of magnitude larger than the estimated age of NGC 2244, suggesting that the cluster is still far from being dynamically relaxed.The numerical simulations by Parker et al. (2014), Parker &Wright (2016), andWright &Parker (2019) make predictions about the evolution of various structural and kinematic parameters with time, from which one can try to deduce the initial conditions of cluster formation.This comparison suggests that NGC 2244 may be unbound and that it may have possibly even formed in a super-virial state.Recently, Bonilla-Barroso et al. ( 2022) have compared numerical simulations of clusters forming in turbulence-dominated environments over multiple free-fall timescales, with those forming by a global hierarchical collapse over a much shorter timescale.They predict that stars in clusters that formed via the former mechanism should exhibit an inverse mass segregated velocity dispersion (massive stars have a larger velocity dispersion), as opposed to a flat distribution of velocity dispersion with mass in the latter scenario.Bonilla-Barroso et al. (2022) use Gaia EDR3 to show that the stars in the Orion Nebula Cluster exhibit a constant velocity dispersion as a function of mass, similar to what we found for NGC 2244.This suggests that the two clusters may have been formed by hierarchical collapse within one free-fall time of its parental cloud, rather than in a turbulence-dominated environment.

Fig. 1 .
Fig. 1.Proper motion uncertainty as a function of i-band magnitude and the maximum time difference between epoch used in proper motion calculation for each source.

Fig. 2 .
Fig. 2. Comparison between our final i-band magnitude and the i-band magnitude reported in Pan-STARRS.

Fig. 5 .
Fig. 5. Gaia EDR3 proper motions of bulk of the objects in the direction of NGC 2244 (grey dots).Black dots mark the proper motions of 688 candidate members from Bell et al. (2013, B13) and Meng et al. (2017, M17) (upper left), OB stars (upper right), strong Hα emitters from VIMOS spectroscopy (lower left), and objects bright in Hα from photometry (lower right).The orange ellipse marks the proper motion selection criterion for the construction of the member training set.The mean proper motion uncertainty is shown in the lower right corner of each panel.

Fig. 6 .
Fig. 6. r −i, r − Hα CMD showing all the sources in the field (grey dots), along with their mean colours and corresponding standard deviations (blue).The red-scale points show the colours of stars with measured pEWs from spectroscopy, coloured by its value.The sources above the black dashed line are selected as potential member candidates due to an enhanced Hα emission.This selection is further refined using the proper motions and position in CMDs.

Fig. 7 .
Fig. 7. Colour-magnitude diagrams used for the selection of the member training set.The grey dots in all panels show the full catalogue.Top panels: the proper-motion-selected candidates belonging to various categories (X-ray and IR excess, OB stars, Hα spectroscopy and Hα photometry; see text for details) are overplotted.The solid line shows the 2 Myr PARSEC isochrone, and the dashed lines the selection criteria.Bottom panels: the black points show the 500 finally selected members for the training set, based on the proper motion, colour, and parallax arguments.

Fig. 8 .
Fig.8.Parallax measurements for the member training set candidates, selected based on proper motions and colours (black dots).The dotted orange line represents the weighted average of the parallaxes, while the shaded area marks ±1σ range.The red crosses mark the sources that have been excluded from the final member training set (see text for the details of the applied criterion).

Fig. 9 .
Fig. 9. Membership probabilities from the run F1, limited to the objects within 20 radius from the stars HD 46150 for clarity.The inset histogram is showing the probabilities derived for the objects labelled as members in the training set.In both histograms, the values of the edge bins marked with an arrow extend outside of the plotting range; the corresponding bin values are shown on the plot.

Fig. 10 .
Fig.10.Planck 857 GHz image of the studied region, along with the candidates with membership probabilities ≥80% from the run F1 (orange dots), and the mid-infrared and X-ray selected YSOs fromBroos et al. (2013) andCambrésy et al. (2013, black dots).The black circle marks the 20 (∼8.7 pc) radius from the brightest star in the field where the training sample has been selected.Green crosses mark the positions of the young clusters and groups identified inPhelps & Lada (1997),Roman-Zuniga (2006),Poulton et al. (2008),Cambrésy et al. (2013).The dotted white circles are centred at the central positions of the clusters Collinder 106 and 107, with the radii containing half the members, as given in Cantat-Gaudin & Anders (2020).

Fig. 11 .
Fig. 11.Radial profile of the cluster NGC 2244 in logarithmic scale.The profile (black filled circles) has been calculated by averaging the number of objects contained inside annuli defined around the NGC 2244 centre, where both the region occupied by members of NGC 2237, and the overdensity of young stars to the south-east of NGC 2244 have been masked.The blue diamonds at larger radii show the values without any masking.The dashed orange line shows the best fit EFF profile(Elson et al. 1987, see text for details).The last three points, where the profile starts to flatten out have not been included in the fit.The vertical dashed line represents the core radius derived from the profile fitting, and the dotted one marks the approximate outer radius of the cluster (r = 18 ).
D.1 for a zoomed-in version), one can clearly notice a circular colour segregated pattern, caused by a large number of source moving away from the cluster centre, indicating expansion.A similar behaviour has previously been reported byKuhn et al. (2019) andLim et al. (2021).To the west of NGC 2244, a large number of sources shows a bulk motion in the southward direction.These stars are likely the members of NGC 2237, and separate from NGC 2244.The zoomed-in version of the plot concentrated on NGC 2237 is shown in the right panel of Fig. D.2.Towards the east edges of Fig. 12, we note a region in the lower left corner where relative motions appear to have random orientations, as well as the region occupied by members of Collinder 106 and 107, whose members seem to have a largely coherent motion pointing north-east.Taking a closer look at the kinematics of the south-east concentration connecting the mid-infrared clusters (Fig. D.2 left), we see that the relative proper motions appear randomly distributed, except for the stars around the position 06:34:35, 04:40:21 (corresponding roughly to the group PL05 shown in

Fig.
Fig. 10), which do seem to have a common preferred motion towards the south-east.
Fig. D.1), and therefore may be just a chance projection.

Fig. 12 .
Fig. 12. Proper motion vectors relative to mean proper motion of NGC 2244, on top of a DSS red image shown in grey scale.Stellar positions marked by black points are the point of origin for the bars showing proper motions.The orientations of the bars and their colours indicate their direction of motion.The colour segregation in the central part of the image indicated that many stars in the cluster NGC 2244 move away from its centre.The white star indicates the position with the highest surface density of stars, which we adopt as the centre of NGC 2244.For a zoom into some particular regions of this plot, see Figs.D.1 and D.2.

Fig. 13 .
Fig. 13.Same as Fig. D.1, but the relative proper motions coloured according to the angle between star's relative proper motion vector and the line that connects it with the centre of the cluster.The purple hues mark the stars with a predominant motion away from the cluster centre, and the green towards it.We note that while near-white sources move close to perpendicularly to the line that connects them with the centre of the cluster, they may be moving in two opposite directions with respect to each other.The concentric circles correspond to the radii of (2.5, 5, 10, 20, 30) from the cluster centre (white star), used to plot the distribution of angle Φ in Fig.14.

Fig. 17 .
Fig. 17.Luminosity function in Gaia G-band for the probable members in the entire studied region (run F1; bottom histogram), and only for NGC 2244 (within the cluster radius of r = 18 , excluding the probable members of NGC 2237; top histogram).The orange lines represent KDE distributions calculated with Silverman rule bandwidth.The upper distribution is shifted vertically by 0.1 for clarity.The completeness limit of the Gaia G-band with the distance modulus subtracted is shown as a vertical dashed line.

Fig
Fig. 18.Hertzsprung-Russell diagram showing the candidates with membership probabilities ≥80% from the run F1 (black dots).The right panel is the zoomed-in version of the one to the left.The isochrones (grey solid lines) and the lines of constant mass (dashed orange lines) are from the PARSEC series.A typical errorbar is shown in the lower left corner.

Fig. 19 .
Fig. 19.Map of mean ages as a function of the position on the sky, with he bin size of 6 × 6 .To facilitate a comparison with other figures in the paper, we mark various known structures, identical to those in Fig. 10.

Fig. 20 .
Fig. 20.Initial mass function of the cluster NGC 2244, in two different representations: dN/dM (top panel) and dN/d(logM) (bottom panel).The solid orange lines in the top panel show a power law fit with a break ∼1.5 M , with slopes α = 1.05 ± 0.02 at the low-mass side, and α = 2.26 ± 0.25 for the massive part.Two standard mass functions are shown as dotted(Kroupa 2001), and dashed(Chabrier 2005) lines, normalised to match the total number of objects that entered the IMF derivation.The vertical blue dashed line marks the completeness limit.

Fig. 21 .
Fig. 21.Assessment of mass segregation in NGC 2244.Left: cumulative distributions of mass as a function of the projected distance from the NGC 2244 centre.The masses are divided at 1 M .The confidence intervals correspond to 1σ.Right: mass segregation ratio, Λ MSR as a function of the N MST stars.The horizontal dashed line indicates Λ MSR = 1, corresponding to no mass segregation.

Table 1 .
Instruments used in this study.

Table 3 .
Distances to various structures in the Rosette Nebula.

Table 4 .
Summary of various kinematic and structural parameters of NGC 2244.
Lim et al. (2021) -in this case the centre of the clusterthe larger is the fraction of the radial velocity of the cluster that is projected into the observed proper motion components; van Leeuwen 2009; Gaia Collaboration 2018;Kuhn et al. 2019).The adopted radial velocity is v r = 12.8 km s −1 , derived byLim et al. (2021)from high-resolution spectroscopy of a large number of YSOs in NGC 2244.

Table B .
1. Parameters for the construction of the training set and the corresponding PRF scores.Fig. B.1.Performance metrics for each of the classifier runs (see Table B.1 for IDs, run details and the exact score values).Points are offset on the x-axis for clarity.