Issue 
A&A
Volume 617, September 2018



Article Number  A15  
Number of page(s)  29  
Section  Numerical methods and codes  
DOI  https://doi.org/10.1051/00046361/201730972  
Published online  18 September 2018 
The seven sisters DANCe
IV. Bayesian hierarchical model^{⋆}
^{1}
Departamento de Inteligencia Artificial, UNED, Juan del Rosal 16, 28040 Madrid, Spain
email: lsb@dia.uned.es
^{2}
Université GrenobleAlpes, CNRS, IPAG, 38000 Grenoble, France
^{3}
Laboratoire d’astrophysique de Bordeaux, Univ. Bordeaux, CNRS, B18N, Allée Geoffroy SaintHilaire, 33615 Pessac, France
^{4}
Dpt. Statistics and Operations Research, University of Cádiz, Campus Universitario Río San Pedro s/n., 11510 Puerto Real, Cádiz, Spain
^{5}
Departamento Astrofísica, Centro de Astrobiología (INTACSIC), ESAC campus, PO Box 78, 28691 Villanueva de la Cañada, Spain
^{6}
Institut d’Astrophysique de Paris, CNRS UMR 7095 and UPMC, 98bis Bd Arago, 75014 Paris, France
^{7}
Instituto de Astronomia, Geofísica e Ciências Atmosféricas, Universidade de São Paulo, Rua do Matão, 1226, Cidade Universitária, 05508900, São Paulo, Brazil
Received:
11
April
2017
Accepted:
23
April
2018
Context. The photometric and astrometric measurements of the Pleiades DANCe DR2 survey provide an excellent test case for the benchmarking of statistical tools aiming at the disentanglement and characterisation of nearby young open cluster (NYOC) stellar populations.
Aims. We aim to develop, test, and characterise of a new statistical tool (intelligent system) for the sifting and analysis of NYOC populations.
Methods. Using a Bayesian formalism, with this statistical tool we were able to obtain the posterior distributions of parameters governing the cluster model. It also used hierarchical bayesian models to establish weakly informative priors, and incorporates the treatment of missing values and nonhomogeneous (heteroscedastic) observational uncertainties.
Results. From simulations, we estimated that this statistical tool renders kinematic (proper motion) and photometric (luminosity) distributions of the cluster population with a contamination rate of 5.8 ± 0.2%. The luminosity distributions and present day mass function agree with the ones found in a recent study, on the completeness interval of the survey. At the probability threshold of maximum accuracy, the classifier recovers ≈90% of the recently published candidate members and finds 10% of new ones.
Conclusions. A new statistical tool for the analysis of NYOC is introduced, tested, and characterised. Its comprehensive modelling of the data properties allows it to get rid of the biases present in previous works. In particular, those resulting from the use of only completely observed (nonmissing) data and the assumption of homoskedastic uncertainties. Also, its Bayesian framework allows it to properly propagate observational uncertainties into membership probabilities and cluster velocity and luminosity distributions. Our results are in a general agreement with those from the literature, although we provide the most uptodate and extended list of candidate members of the Pleiades cluster.
Key words: methods: data analysis / methods: statistical / proper motions / stars: luminosity function, mass function / open clusters and associations: individual: M 45
Full Table 1 is only available at the CDS via anonymous ftp to cdsarc.ustrasbg.fr (130.79.128.5) or via http://cdsarc.ustrasbg.fr/vizbin/qcat?J/A+A/617/A15
© ESO 2018
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
The Pleiades is one of the most studied clusters in history^{1}. Its popularity comes from its unique combination of properties. It is young (125 ± 8 Myr, Stauffer et al. 1998), close to the sun ( pc, Galli et al. 2017), massive (870 ± 35 M_{⊙}, Converse & Stahler 2008), has low extinction (Av = 0.12, Guthrie 1987), and an almost solar metallicity ([Fe/H] ≈ 0, Takeda et al. 2017). From Trumpler (1921) to date, it continues to yield new and fascinating results. Recently, Bouy et al. (2015b) found 812 new candidate members for a total of 2109 down to ≈0.025 M_{⊙}. The discovery of these new candidate members is rooted in their excellent multiarchive data (Bouy et al. 2013) and their leadingedge multidimensional statistical tool (Sarro et al. 2014).
In the past, candidate members were selected using proper motions (e.g. Moraux et al. 2001), or a combination of proper motions and cuts in the photometric space (e.g. Lodieu et al. 2012). Only very recently with the works of Malo et al. (2013; and later Gagné et al. 2014), KroneMartins & Moitinho (2014) and Sarro et al. (2014) the astrometric and photometric data started to be treated simultaneously and consistently to infer cluster (or moving groups) membership probabilities. Since the early work of Lodieu et al. (2012), photometric bands have proven to be crucial in the identification of new open cluster members. Therefore we leave aside the discussion of the recent works of Sampedro & Alfaro (2016) and Riedel et al. (2017) in which photometric information is not included in their methodologies.
Briefly, Malo et al. (2013) establish membership probabilities to nearby young moving groups using a naive^{2} Bayesian classifier (BANYAN). Their classification uses kinematic and photometric models of moving groups and field. They create these models with the previously known bona fide members and field objects. Their photometric model in the low mass range is extended using evolutionary models. Later, Gagné et al. (2014) developed BANYAN II, and improved Malo et al. (2013) methodology to identify lowmass stars and brown dwarfs by using redder photometric colours. They also included observational uncertainties, and correlations in the XYZ, and in UVW spaces, separately (via freelyrotated ellipsoid Gaussian models which are mathematically equivalent to two separate 3D multivariate Gaussian models), which reduces the naivety of the classifier, amongst other improvements. KroneMartins & Moitinho (2014) establish cluster membership probabilities in a frequentist approach using an unsupervised and data driven iterative algorithm (UPMASK). Their methodology relies on clustering algorithms and the principal component analysis. Although untested, the authors mention that their methodology is able to incorporate proper motions and deal with missing data and different uncertainty distributions. Sarro et al. (2014) infer posterior membership probabilities using a Bayesian classifier (refered to here as SBB). Their cluster model is data driven and, since they model some parametric correlations, it is also less naive than that of Malo et al. (2013) and Gagné et al. (2014). Sarro et al. (2014) construct the field and cluster proper motions models with clustering algorithms and the photometric one with principal curve analysis. Their treatment of uncertainties and missing values is consistent across observed features. Finally, they infer the best parameter values of their cluster model using a maximumlikelihoodestimator (MLE) algorithm.
The previous methodologies perform well on their designated tasks and successfully led to the identification of many new high probability members of nearby clusters and associations. However, key aspects still need to be tackled or improved.
The probability of an object being a member of a class (e.g. the cluster or field populations) depends on how the class is defined. In parametric classifiers, the classes are defined by the relations amongst their parameters, and the value of these later. Therefore, the classification is sensitive to the parameters defining the class, with different parameter values resulting in different membership probabilities. The majority of the parametric classifiers from the literature fail to report the sensitivity of the recovered membership probabilities to the chosen value of their parametric models.
Missing values can strongly affect any result based on data that contains them. If the pattern of missing values is completely random, then results based only on the complete (nonmissing) data are unbiased. However, the less random the missing pattern is, the most biased the results are. See chapter 8 of Gelman et al. (2013) for a discussion on missing at random and ignorability of the missing pattern. In astronomy, measurements are strongly affected by the brightness of the source in question. The physical limits of detectors lead to nonrandom distributions of missing measurements. At the bright end, saturation indeed makes measurements useless, while at the faint end, sources beyond the limit of sensitivity are not detected. In a multiwavelength data set, the intrinsic stellar colours add another level of correlation between sensitivity limits in different filters. Thus, missing values are not completely random, although they could be in some specific and controlled situations. Commonly, they occur on measurements of bright and faint sources and depend on the source colours. For these reasons, their treatment is of paramount importance. Gagné et al. (2014) and Sarro et al. (2014) construct their cluster (or kinematic moving group) and field model using only complete data, and then estimate membership probabilities for objects with missing values (only in parallaxes and radial velocities in the case of Gagné et al. 2014). UPMASK does not include any treatment of missing values, although KroneMartins & Moitinho (2014) mention that, in the future, their methodology will be able to incorporate them.
In general, models consist of relations among their variables (or parameters if they are parametric models) and the data. We call these the underlying relations between modelled and observed data. Traditionally, it is assumed that the underlying relations correspond to those seen in the data, which we call observed relations. For example, Sarro et al. (2014) and KroneMartins & Moitinho (2014) establish as underlying relations in their models, the observed ones that they found after applying the principal curve analysis and the PCA to their data. Another possibility is to assume that the underlying relations come from other models. However, using other models may inherit their possible biases, while the observed relations are strongly affected by observational uncertainties, especially when these are not homogeneous (heteroscedastic). For example, the principal curve and principal components analyses (present in UPMASK and SBB) are biased by individual observational uncertainties (Hong et al. 2016). Some remedies include the popular σclipping procedure and the variance scaling.
Inspired by the previous works and moved to address these three major limitations, we now present a new Bayesian methodology aiming at statistically modelling the distributions of the open cluster population. It obtains the cluster membership probabilities as a byproduct. Our methodology treats parametric and observational uncertainties consistently in a bayesian framework. In this framework, observational uncertainties propagate into the posterior distributions of the parameters. Objects with missing values are consistently included in all elements of our methodology. Particularly, it allows us to construct our field and cluster models with all objects in the data set, in spite of their missing values. As mentioned in the previous paragraph, the observed relation between modelled and observed data is subject of bias. Instead, we aim at the “true”^{3} underlying parametric relations which render the observed data after being convolved with the observational uncertainties. In other words, we aim at deconvolving the true cluster distributions (see Bovy et al. 2011 for another example of deconvolution).
In a Bayesian framework, priors must be established. To avoid the subjectivity of choosing priors as much as possible, we used the Bayesian hierarchical model formalism (see the works of Jefferys et al. 2007; Shkedy et al. 2007; Hogg et al. 2010; Sale 2012; Feeney et al. 2013, for applications of HM in astrophysics). On it, the parameters of the prior distributions are given by other distributions in a hierarchical fashion. In the words of Gelman (2006), Bayesian Hierarchical Models “allow a more ‘objective’ approach to inference by estimating the parameters of prior distributions from data rather than requiring them to be specified using subjective information”. However, it comes at a price: these models are computationally more expensive because they require far more parameters than standard approaches.
This paper proceeds as follows: in Sect. 2 we briefly describe the Pleiades data set and explain our methodology. Section 3 contains the results of applying our methodology to both synthetic and the real Pleiades data. In Sect. 4 we discuss our results and compare them with the literature. Finally, Sect. 5 contains our conclusion and discussion of future perspectives.
2. Methodology
The data set used in this work is the second release (DR2) of the Pleiades catalogue (see Appendix A of Bouy et al. 2015b) from the DANCe survey (Bouy et al. 2013). Data from this survey have been successfully used to characterise the Pleiades (Sarro et al. 2014; Bouy et al. 2015b; Barrado et al. 2016) and M35 (Bouy et al. 2015a) clusters. Although this catalogue contains astrometric (stellar positions and proper motions) and photometric (ugrizYJHK_{s}) measurements for 1 972 245 objects, we used only proper motions and the iYJHK_{s} bands. Our selection aims at comparing our results to those of Bouy et al. (2015b) who used the reduced RF2 representation space (µ_{α}, µ_{δ}, J, H, K_{s}, i−K_{s}, Y−J) of Sarro et al. (2014). Thus, our representation space comprises the proper motions in right ascension and declination, µ_{α}, µ_{δ}, and the photometric colours and magnitudes, i−K_{s},Y, J, H, K_{s}. We modelled the photometry by means of a set of parametric relations in which the colour index i−K_{s} (CI hereafter) is the independent parameter. We selected the CI from among the possible colour indices because of its discriminant properties. Goldman et al. (2013) remarked the importance of using colour indices with the largest difference in wavelength in order to discriminate Hyades members from the field population. They used the colour indices g−K_{s}, r−K_{s} and i−K_{s} to perform their photometric selection of members. This result has been confirmed by Sarro et al. (2014). Using a random forest classifier, these authors determine that the colour indices r−K_{s}, i−K_{s} and Y−J where amongst the most discriminant features with mean decrease of node impurity of 156.0, 102.0, and 77.9, respectively (see their Table 2).
As will be explained later in this section, our parametric model yields the photometric bands as functions (injective by definition) of a true colour index. Thus, we proceeded to select one colour index from the set of the most discriminant ones. On the one hand, the r band is missing in 1 222 853 sources of the DANCe DR2 catalogue, which is more than ∼50% missing entries than in the i band. On the other hand, we attempted to model the magnitudes of the cluster members as a function of the Y−J colour index, but this resulted in large discrepancies with the observed photometry. This is due to the high and almost vertical slope in cluster CMDs resulting from the Y−J colour index, which prevents our injective functions to correctly reproduce the data. On the contrary, the cluster CMDs slope is less pronounced when using the i−K_{s} colour index. Therefore, in the following we work with the i−K_{s} colour index.
Since both photometry and proper motions carry crucial information for the disentanglement of the cluster population, we restrict the data set to objects with proper motions and at least two observed values in any of our four CMDs: Y, J, H, K_{s} vs. CI. This restriction excludes 22 candidate members of Bouy et al. (2015b), which have only one observed value in the photometry. Furthermore, we restrict the lower limit (CI = 0.8) of the colour index to the value of the brightest cluster member. We do not expect to find new bluer members in the bright part of the CMDs. We set the upper limit (CI = 8) of the colour index at one magnitude above the colour index of the reddest known cluster member, thus allowing for new discoveries. Due to the sensitivity limits of the DR2 survey in i and K_{s} bands, objects with CI > 8 have K_{s} magnitudes ≥16 mag. These objects are incompatible with the cluster sequence and therefore we discard them a priori as cluster members.
Our current computational constraints and the costly computations associated to our methodology (described throughout this section) prevent its application to the entire data set. However, since the precision of our methodology, as that of any statistical analysis, increases with the number of independent observations, we find that 10^{5} sources is a reasonable compromise for our data. Although a smaller data set produces faster results, it also renders a less precise model of the field (in the area around the cluster) and therefore, a more contaminated model of the cluster. For these reasons, we restrict our data set to the 10^{5} objects with highest membership probabilities according to Bouy et al. (2015b). Of this resulting data set, the majority (≈98%) are field objects with cluster membership probabilities around zero. Thus, the probability of leaving out a cluster member is negligible. For the remaining of the objects in the Pleiades DANCe DR2, we assigned membership probabilities a posteriori, once the cluster model is constructed (see Sect. 2.4).
To disentangle the cluster and field population we create parametric and independent models for both populations. These models aim at reproducing the observed astrometric and photometric properties of both populations. We infer the set of model parameters, θ, based on the data, (with N the number of sources and d_{n} = {µ_{α,n}, µ_{δ,n}, CI_{n}, Y_{n}, J_{n}, H_{n}, K_{sn}}), and the probabilistic framework established by the Bayes theorem:
In this equation, p(θD) represents the posterior probability of the parameters given the data, this is what we aim to infer. In the right side, p(Dθ) stands for the probability of the data given the parameters, also called the likelihood^{4} of the data, p(θ) represents the prior beliefs about the relative probabilities of different parameter values, and p(D), also known as evidence or marginal likelihood (since the parameters have been marginalised over), works as a normalisation constant. Since this last one can be computed by integrating the numerator of Eq. (1), we only focus on the likelihood and the priors. We describe these terms in more detail in the remainder of this section.
Assuming that data are independent^{5}, given the parameters, its likelihood can be expressed as
We call this the generative model to the likelihood of one datum, p(d_{n}θ), because synthetic data can be drawn from it. Formally, this term must be p(d_{n}θ, M) with M standing for all other information on which the probability distribution depends. This information includes the standard uncertainty of each datum,
ϵ_{n}(e.g. ϵ_{µα}, ϵ_{µδ}, ϵ_{CI}, associated with the common ±σ Gaussian uncertainties), and the assumptions we make in the construction of the model. We assumed that each datum uncertainty is fixed, which means that the differences between datum d_{n} and new measurements of the same source will be normally distributed with mean zero and standard deviation ϵ_{n}. The following section explains the rest of the information M, that we used to construct the generative model.
2.1. Generative model
Since we aim to separate the cluster and the field, we modelled these two overlapping populations separately. Their explicit disentanglement would demand a set of N binary integers to account for the two possible states of each object in our data set: either it belongs to the cluster or to the field population. Since this would be prohibitive for the inference process (in computational terms), we marginalised^{6} them using a binomial prior. This marginalisation renders only one parameter, π, which represents the fraction^{7} of field objects in the data set.
Thus, the generative model can be expressed as
where p_{f}(d_{n}θ_{f}, ϵ_{n}) and p_{c}(d_{n}θ_{c}, ϵ_{n}) are the field and cluster likelihoods of the datum d_{n}, given its standard uncertainty ϵ_{n}, and the values of the field and cluster parameters θ_{f} and θ_{c}, respectively. The next two sections explain briefly the generative models of the field and cluster. We refer the reader to the Appendix A.1 for a more detailed explanation of both models and the relations among their parameters.
In the following, we assume that the observed quantities are drawn from a probability distribution centred in the true values. These are then convolved with the probability distribution of the observational uncertainties, which we assume to be a multivariate Gaussian.
In the DANCe DR2 data set, proper motions and photometric bands are computed independently from each other. Thus, it does not report correlations amongst the observable uncertainties. For this reason, in the multivariate Gaussian describing the observational uncertainties we set the offdiagonal elements to zero, except those corresponding to the i−K_{s} colour index, which by construction^{8} contain the correlation of the i and K_{s} bands.
Our methodology aims to deconvolve the observational uncertainties to obtain the intrinsic dispersion of the true values. This intrinsic dispersion is the convolution of several processes (e.g. unresolved binaries, extinction, variability), which we attempt to separate in future versions of our model.
Due to its heterogeneous origin, the DANCe Pleiades DR2 has a high fraction of photometric missing values (see Table 1 of Sarro et al. 2014). In our data set, less than 1% of the objects have values in all photometric bands (and it should be remembered that only these complete sources are used in Bouy et al. (2015b) to construct the cluster model which is eventually applied to infer the membership probabilities of the incomplete sources). Therefore, the treatment of objects including missing values is of paramount importance to our methodology. In the following, we deal with the missing values in these objects by setting them as parameters, which we marginalise over all their possible values with the aid of priors. In general we use uniform priors, otherwise, we give specific details.
Cluster and equalmass binaries membership probabilities.
2.1.1. Field population model
We have assumed that the field distributions of proper motion and relative photometry are independent, and thus can be factorised. This assumption is not entirely correct since the relative photometry is affected by distance, and the later is correlated with proper motions. Nevertheless, we assumed independence amongst proper motion and photometric bands based on the following points: (i) the entire DANCe DR2 renders small (<0.1) correlations amongst these observables, and (ii) assuming independence reduces the number of free parameters of the field model from 728 to 366. Thus, although this assumption renders a less accurate model, it reduces the complexity of the later by ∼50%.
We also assume that the distribution of proper motions and relative photometry are described by Gaussian mixture models (GMM). The flexibility of GMM to fit a variety of probability distributions geometries make them a suitable model to describe the density of the heterogeneous data from the DANCe DR2. We fitted these GMM to field objects in our data set. We selected as field objects those having cluster membership probability lower than 0.75 according to Bouy et al. (2015b), approximately 98 000 objects. We verified that the number of hypothetically misclassified objects is negligible compared to the size of our data set (100 000 objects). With the contamination and true positive rates reported by Sarro et al. (2014): ≈8% and ≈96% respectively (at probability threshold p = 0.75), and the number of candidate cluster members reported by Bouy et al. (2015b), 2109, the number of misclassified objects would be ≈258, which represents a negligible fraction (≲0.26%) of our data set. Furthermore, assuming that these misclassified objects would be in the fieldcluster “boundary”, we can safely assume that they would be spread all over the cluster photometric sequence and over a halo around the proper motion of the cluster. This assumption, together with their negligible fraction, leads us to assume that their contribution to the field parameters is negligible. Thus, we keep the field parameters fixed throughout the inference process. Due to our current computing power, this decision is essential since it diminishes considerably (by 336) the number of free parameters, and therefore the computing time. However, it also leads to posterior distributions of cluster parameters that do not reflect the uncertainties associated to the field model.
We determined the number of Gaussians in each GMM using the Bayesian Information criterion (BIC, Schwarz 1978). This is a figureofmerit that combines the likelihood and the number of parameters in the model such that it penalises complex models. Due to the presence of missing values in the photometry, we estimated the parameters of this photometric GMM using the algorithm of McMichael (1996). This is a generalisation of the expectation maximisation (EM) algorithm for GMM in which data with missing values also contribute to estimate the parameters. The number of Gaussians suggested by the BIC for this mixture is 14 (amounting to 293 free parameters). The right panel of Fig. 1 depicts a projection of this multidimensional (five dimensions) GMM in the subspace of K_{s} vs. CI. We notice that, due to the high amount of missing values in the photometry, most of the plotted Gaussians in the right panel are empty in this particular projection space.
Furthermore, using our entire data set to construct the parameters of the field, allows us to remove biases associated with the use of only the completely observed objects. To illustrate these biases we proceed as follows. First, we took the GMM fitted to the ≈98 000 field objects, as described in the previous paragraphs. Since this model takes into account the missing values we call it the incomplete data model. Then, we selected only the complete sources in the ≈98 000 field data set (which amount up to 1%) and fit a GMM with the same number of Gaussians, 14, as the incomplete data model. We call this the complete data model. Afterwards, for each model, we draw 10^{5} synthetic data points, we call them complete and incomplete, depending on the parent model. In Fig. 2, we show the associated density of these two synthetic data sets, complete (solid line) and incomplete (dashed line), in the projected K_{s} vs. i−K_{s} (left) and K_{s} vs. J−K_{s} (right) CMDs. As this figure shows, the complete data model underestimates the density in the faintest regions (where the missing values happen the most), over estimate it in the middle ones (11 < K_{s} < 15), and shift it at the brightest ones (K_{s} ≈ 10 in the K_{s} vs. J − K_{s} CMD). As this figure illustrates, when missing values do not happen at random, the density landscapes of completely observed objects and that of all objects (missing values comprised) differ.
In the case of the proper motions, the BIC favours a model with a large number of Gaussians with small weights and large variances distributed all over the observed data space. In order to circumvent this overcomplex model, we decided to add a uniform distribution to the GMM. When we apply the BIC to this new mixture of distributions, the modification improves the likelihood and reduces the number of Gaussians. The number of Gaussians suggested by the BIC for this mixture is seven, plus the uniform distribution (amounting to 42 free parameters). The left panel of Fig. 1 shows the Gaussians of this mixture. As can be seen in this figure, one of the Gaussians in the mixture is centred near the proper motions of the cluster ({µ_{α}, µ_{δ}} ∼ {16, −39} mas yr^{−1}). The weight of this Gaussian is small, 0.07, and only marginally larger than the weight 0.03 of the Gaussian at the upper right corner. Since there is no apparent reason for this Gaussian to be coincident with the cluster population, it suggests that within the objects that Bouy et al. (2015b) classified as field population, there are some falsenegatives with proper motions compatible with those of the cluster population. In future works, we will improve this classification to characterise and minimise possible falsenegatives.
Fig. 1. Proper motion (left panel) and K_{s} vs. i−K_{s} CMD (right panel) projections of field models (ellipses and crosses depicting the covariance matrices and means of the GMM) and the density of objects in our dataset (grey pixels in logarithmic scale, shown only to guide the eyes). The colour scale shows the weight of each Gaussian in the GMM. 
Fig. 2. Densities in K_{s} vs. i−K_{s} (left panel) and K_{s} vs. J − K_{s} (right panel) CMDs of 10^{5} synthetic sources. We drawn these from two different models: complete (solid line) and incomplete (dashed line). We construct the incomplete model using all objects in our data set, even those with missing values, while for the complete one we use only those object without missing values. The contour values (at 10^{−3}, 10^{−2}, 10^{−1}, 4 × 10^{−1}) are the same for both complete and incomplete models. 
2.1.2. Cluster population model
To model the cluster population, we assume independence between proper motions and photometry. This assumption is not entirely correct since the cluster has a spread in distance, which may introduce a correlation amongst these variables. However, due to the distance to the cluster ( pc according to Galli et al. 2017) we can assume that this spread has a negligible impact in the photometry and proper motions of the cluster members. This correlation and its possible inclusion in the model will be explored in future works. Thus, similarly to the field model, we factorise these two components. Sarro et al. (2014) show evidence of an equalmass binaries sequence in the Pleiades and model it with a proportion fixed to 20%. We now model this sequence as a parallel cluster sequence displaced 0.75 magnitudes to the brighter side. Furthermore, since binarity could affect the proper motion of the system, we couple this photometric information to the proper motions by constructing a separate proper motions model for these equalmass binaries. Additionally, we set the fraction of equalmass binaries as a free parameter of our model. This allows us to investigate potential kinematical differences between equalmass binaries and the rest of the stars, singles and non equalmass binaries.
Photometric model of equalmass binaries and single stars.
To model the cluster sequence in the CMDs we used one truncated series of cubic splines for each of the YJHK_{s} vs. CI CMDs. We chose splines because of their better fitting properties. We tried several polynomial bases (Laguerre, Hermite, Chebyshev) but regardless of their order, they lack the flexibility shown by the splines, particularly in the high slope region around CI ≈ 3. However, this flexibility comes at a price. Splines require us to set, in addition to the coeﬃcients of the series, a number of points known as knots. Knots are the starting and ending points of each spline section.
The simultaneous inference of spline coeﬃcients and knots, a problem known as freeknot splines, introduces multimodality in the parametric space (Lindstrom 1999). To avoid this multimodality, we kept the knots fixed throughout inference. Nevertheless, we applied the Spiriti et al. (2013) methodology^{9} to the Bouy et al. (2015b) members. Doing so, we obtain the best number and position for the knots. These are CI = {0.8, 3.22, 3.22, 5.17, 8.0}. We tested different number of knots, ranging from two to nine, with five the best configuration given by the BIC.
In Sarro et al. (2014) the cluster sequence was modelled nonparametrically with a principal curve. It had a natural coordinate (λ) which was not directly related to any physical parameter. This coordinate no longer holds for the splines model in which now the true CI is the independent parameter. Furthermore, as explained in Sect. 1, the principal curve analysis returns the observed relation in the data, not the underlying relation that generates the observations. Instead, splines allowed us to model the true underlying relation in a parametric way.
Here, we have assumed that the observed photometric quantities are drawn from a probability distribution resulting from the convolution of the observed uncertainties, with an intrinsic distribution centred at the true photometric quantities. We model this intrinsic distribution as a multivariate Gaussian, whose covariance matrix is the intrinsic dispersion, the same all along the cluster sequence. This intrinsic dispersion could arise from different astrophysical processes such as age, metallicity and distance dispersions, unresolved binaries, transits, and variability, amongst. Without this dispersion, we would have an oversimplistic model in which the cluster sequence will be an infinitely narrow line, and departures from it would only be explained by the observational uncertainties. In practice, this model would underestimate the posterior membership probabilities of hypothetical good candidates that depart from the ideal cluster sequence. We can have access to this intrinsic dispersion only after deconvolving the observational uncertainties.
The true CI of each object is unknown, even if its observed value is not missing. This means that the true CI of each object is a nuisance parameter which must be marginalised. We show this marginalisation in Eq. (4). To marginalise these CI we need a measure. We established this measure as a truncated (0.8 ≤ CI ≤ 8) univariate GMM with five components whose parameters are also inferred from the data.
In the previous equation, d_{ph}, ϵ_{ph,} and θ_{c} correspond to the photometric measurements, standard photometric uncertainties, and the cluster parameters, respectively. The term p(d_{ph}CI, θ_{c}, ϵ_{ph}) corresponds to the multivariate Gaussian associated with the intrinsic dispersion of the cluster. The CI dictates the true photometric quantities by means of the splines. The term p(CIθ_{c}, ϵ_{ph}) correspond to the truncated GMM which we use as a measure for the true CI. Appendix A.1 contains more details on this marginalisation and the probability distribution involved on it.
We used the observed CI and magnitudes to reduce the computing time of the marginalisation integral by avoiding regions in which the argument is almost zero (i.e. far away from the measured values). The process is the following: first, we compared the observed photometry to the true one (i.e. the cluster sequence given by the splines) and find the closest point, p, using the Mahalanobis metric. This metric uses the sum of the observational uncertainty with the intrinsic dispersion of the cluster sequence as covariance matrix. To define the limits of the marginalisation integral, we used a ball of 3.5 Mahalanobis distances around point p. Contributions outside this ball are negligible (<4 × 10^{−4}).
Since we modelled the true photometric quantities of the equalmass binaries with a parallel sequence displaced 0.75 magnitudes into the bright side (twice the luminosity implies an increase of 0.75 in magnitudes), the only extra parameter needed is the fraction of equalmass binaries to the total of cluster members.
Proper motion model of equalmass binaries and single stars.
We modelled the proper motions of equalmass binaries and single stars with a GMM whose parameters are inferred as part of the hierarchical model. The number of Gaussians, however, remains fixed throughout inference. Following the BIC criterion, we select four and two Gaussians for single and equalmass binaries, respectively. Furthermore, we also assumed that the Gaussians in the proper motions GMM share the same mean, one for single stars and one for equalmass binaries (which need not be equal).
The number of free parameters in our cluster and field models are 84 and 335, respectively. In addition, we used one free parameter, π (Eq. (3)), to model the fraction of the field in the clusterfield mixture. Thus, our generative model has 420 free parameters. As explained in Sect. 2, due to computational constraints, we used maximumlikelihood techniques to obtain the value of the 335 field parameters. For the remaining ones, we used MCMC to infer their full posterior distribution. In the following section we described the priors used for the inference of these 85 parameters.
2.2. Priors
In a Bayesian framework, each parameter in the generative model has a prior, even if it is uniform or improper. The priors we assumed are intended to fall in the category of weakly informative priors. A weakly informative prior, following Gelman (2006), is that in which “the information it does provide is intentionally weaker than whatever actual prior knowledge is available”. Although there is no general method for specifying them, a weakly informative prior can be constructed by diminishing the current available information (see for example Gelman et al. 2008; Chung et al. 2015). In practice, we constructed a weakly informative prior as follows. First, we chose the family distribution and its hyperparameters such that it resembles the actual prior information. Then, we tuned the hyperparameters such that the statistical variance of the distribution increases with respect to the value found in the first step. In this way, the resulting prior provides less restrictive information than the original one. We chose this kind of priors due to their better properties regarding the regularisation and stability of the posterior computation when compared to reference priors (Simpson et al. 2017), and other noninformative priors (Gelman 2006). We grouped priors into three main categories, those for fractions, and those for parameters in the proper motion and in the photometrical models. In the following, we explain the kind of distributions we use for the priors. In Appendix A.2, we give details on the particular parameter values we chose for these distributions.
Fractions are defined for mixtures, which can be GMM or the clusterfield mixture (Eq. (3)), and quantify the contribution of each element to the mixture. Thus, they must add to one and be bounded by the [0, 1] interval. For priors of fractions we use the multivariate generalisation of the beta distribution: the Dirichlet distribution. This distribution is parametrised by the vector α (where , and K is the number of categories) and its support is the set of Kdimensional vectors x defined in the interval (0, 1) and with the property: x = 1 (the sum of their entries equals one). We chose the Dirichlet distribution because it fits perfectly our needs, in addition its variance^{10} can be diminished to tune it as a weakly informative prior.
We set the priors of means and covariance matrices in the proper motions GMM as bivariate normal and Half–t distributions, respectively. According to Huang & Wand (2013), setting arbitrarily large values of the A parameters in the later distribution leads to arbitrarily weakly informative priors on the corresponding standard deviation terms. Thus, we obtained weakly informative priors by allowing large values of the standard deviations σ and A parameters, in the bivariate normal and Half–t distributions, respectively. See Appendix A.2 for more details.
Photometric priors include three categories, those concerning the true CI, the splines coeﬃcients, and the cluster sequence intrinsic dispersion. For the priors of the means and variances of the true CI GMM, we used the normal and Half–Cauchy distributions, respectively. The latter is the recommended choice for a weakly informative prior according to Gelman (2006). In both distributions we use large values for the variance and η parameters (see Appendix A.2). Thus, both are weakly informative priors.
For the coeﬃcients in the spline series we set the priors as univariate normal distributions. Finally, we use the multivariate Half–t distribution (Huang & Wand 2013) as a prior for the covariance matrix modelling the intrinsic dispersion of the cluster sequence. Appendix A.2 shows the details on how we tuned these distributions to obtain weakly informative priors.
2.3. Sampling the posterior distribution
There are three possible approaches to obtain the posterior distributions of the parameters in our model: an analytical solution, a grid in parameter space, and the Markov chain Monte Carlo (MCMC) methods. Given the size of our data set (10^{5} objects) and the dimension of our inferred model (85 parameters), the analytical solution and the grid approach are discarded a priori.
The MCMC methods offer a feasible alternative to this problem. Briefly, they consist of a particle (or particles) which iteratively moves in the parameter space. Among the many MCMC methods that exist, we selected the “stretch” move which is an aﬃne invariant scheme developed by Goodman & Weare (2010). It is implemented to work on parallel in the Python routine emcee (Flegal et al. 2016). We chose emcee due to the following properties: (i) the aﬃne invariance allows a faster convergence over common and skewed distributions (see Goodman & Weare 2010; Flegal et al. 2016, for details), (ii) the parallel computation distributes particles over nodes of a computer cluster and thus reduces considerably the computing time, and (iii) it requires the handtuning of only two constants: the number of particles, and a, the parameter of “stretch” distribution (see Eq. (9) of Goodman & Weare 2010). We used 170 particles (twice the number of parameters) and a value of a = 1.3. These keep the acceptance fraction in the range 0.2−0.5, as suggested by Flegal et al. (2016).
We used CosmoHammer (Akeret et al. 2013), a frontend of emcee, to control the input and output of data and parameters, as well as the hybrid parallel computing. We ran it on a 80 CPUs (cores) computer cluster with 3.5 GHz processors. However, instead of using OpenMP as Akeret et al. (2013) did, we use the multiprocessing package of Python to distribute the computing of the likelihood among cores in each cluster node. Since the evaluation of the likelihood is computationally expensive (it takes approximately 30 days to run in the previously described computer cluster^{11}), we proceeded similarly to Akeret et al. (2013). We provided emcee with an optimised set of values of the posterior distribution. These values can be thought of as a ball around the maximumaposteriori (MAP) solution. We found them with a modified version of the Charged Particle Swarm Optimiser (PSO) of Blackwell & Bentley (2002). It avoids the overcrowding of particles around local best values. The charged version retains the PSO exploratory property by repelling particles that come closer than a certain user specified distance to each other. The repelling force mimics an electrostatic force, thus the name charged PSO.
The modification that we introduce to the charged PSO relates only to the measuring of distance between particles. The algorithm of Blackwell & Bentley (2002) computes these distances in the entire parametric space. We find this approach unsuitable for our problem. In it, parameters have different length scales (for example, fractions and proper motions). Therefore, we measured distance between particles and apply the electrostatic force independently in each parameter. Thus, the electrostatic force comes into action only when the relative distance between particles is smaller than 10^{−10}. We chose this value heuristically.
The PSO does not warrant the finding of the global maximum of the score function (see Blackwell & Bentley 2002; Clerc & Kennedy 2002 and references therein). Therefore, we iteratively ran PSO and 50 iterations of emcee (with the same number of particles as the PSO) until the relative difference between means of consecutive iterations is lower than 10^{−7}. The iterations of emcee guarantee the spreading of the PSO solution without losing the information gained. After convergence of the PSOemcee scheme, we ran emcee with 175 walkers, until convergence. Neither scheme, PSO alone or PSOemcee, guarantees to find the global maximum and their solution could be biased. However, we used them to obtain a fast estimate of the global maximum, or at least, of points in its vicinity. Nevertheless, the final emcee run, during the burning phase, erases any dependance on these initial solutions.
Convergence to the target distribution occurs when each parameter enters into the stationary equilibrium, or normal state. The central limit theorem ensures that this state exists. See Roberts & Rosenthal (2004) for guaranteeing conditions and Goodman & Weare (2010) for irreducibility of the emcee stretch move. The stationary or normal state is reached when, in at least 95% of the iterations, the sample mean is bounded by two standard deviations of the sample, and the variance by the two standard deviation of the variance^{12}; see Fig. 3.
Once all parameters have entered the equilibrium state, we stop emcee by using the criterion of Gong & Flegal (2016)^{13}. We chose this criterion because it was developed for highdimensional problems and tested on hierarchical Bayesian models. In this criterion, the MCMC chain stops once its “effective sample size” (ESS, the size that an independent and identically distributed sample must have to provide the same inference) is larger than a minimum sample size computed using the required accuracy, ϵ, for each parameter confidence interval (1 − δ)100%. Our emcee run stops once the ESS of the ensemble of walkers is greater than the minimum sample size needed for the required accuracy ϵ = 0.05 on the 68% confidence interval (δ = 0.32) of each parameter.
Fig. 3. Normalised mean (left panel) and variance (right panel) of each parameter in our model, given the DANCE DR2 data set as functions of iterations in the MCMC. Each parameter is scaled using the mean and variance of its corresponding ensemble of particles positions at the last iteration. Red lines show one and two sigma levels of these normalisation values. These figures depict the evolution of the Markov Chains from the original values provided by the PSO to the convergence. This later shown by the last ∼200 iterations in which the mean and variances are within the twosigma levels. We notice that some parameters evolve (within the MCMC) in groups, which is related to their correlation. 
2.4. Membership probabilities
The methodology detailed in the previous sections renders the posterior distributions of the parameters in the models of cluster and field populations. Cluster membership probabilities are then computed from these distributions by means of Bayes’ theorem, (Eq. (1)). Applying it to our classification problem, we obtain that the probability of an object with measurement d_{n}, to belong to the cluster population, C, is,
where F denotes the field population and, p(d_{n}C) and p(d_{n}F) are the cluster and field likelihoods, respectively. Probabilities p(C) and p(F) are the prior probabilities of the object to belong to the cluster and field, respectively. For these prior probabilities we used the fraction of field and cluster stars (i.e. the values of π and 1 − π in Eq. (3), respectively), which the model infers at each MCMC iteration. The same reasoning is then applied to the probability of an object to be an equalmass binary. In this case, the two populations are the equalmass binaries and the stars in the main cluster sequence^{14}.
All terms in Eq. (6) depend on the model parameters, even the prior probabilities as mentioned before. Thus, each realisation from the joint posterior distribution of the model parameters (i.e. each iteration of the MCMC) results in a value for both cluster and equalmass binaries membership probabilities. Therefore, upon convergence of the MCMC, sampling the joint posterior distribution of the model parameters results also in the sampling of the cluster and equalmass binaries membership probabilities of each object.
Once the generative model has been learned from the 10^{5} sample (i.e. the MCMC has converged), we obtained the cluster and equalmass binaries membership probabilities of all the objects in the DANCe catalogue. Computing 1700 samples of the membership probabilities for each of the approximately one and a half million stars in the DANCe DR2 takes 4.11 h. In Table 1 (available entirely at the CDS) we summarise the cluster and equalmass membership probabilities of the DANCe DR2 objects marginalised over the posterior distribution of the cluster parameters. We also report the sensitivity of these membership probabilities to the cluster parameters by means of the standard deviation of the 1700 samples obtained for each object in the data set.
3. Results
In this section we analyse the results obtained by applying our methodology on synthetic and real data. The synthetic data enable us to quantify the reliability of the methodology and evaluate the impact that missing values have on it. This synthetic analysis requires at least three runs: one on the real data (to obtain the best values from which we generate the synthetic data), and two on the synthetic one. These last two runs correspond to data sets with and without missing values. As mentioned before, our methodology is computationally expensive. Therefore, for these three runs we use 10^{4} objects samples. The real data sample contains the objects with the highest membership probabilities as given by Bouy et al. (2015b). These objects are closer to the cluster, in the sense of membership probability, than the remaining 9 × 10^{4} objects. Therefore, the field probability density in the region occupied by the cluster is higher and more concentrated (around the cluster) than the field density estimated using the larger and distant to the cluster 10^{5} objects sample. Thus, we assume that results obtained on the smaller sample have higher contamination, and lower recovery rates than those obtained on the larger sample, the more distant 10^{5} objects. The higher contamination and lower recovery rates arise from the concentration and higher values of the field probability density around the cluster, respectively. Therefore, results in the next subsection are upper and lower limits to the contamination and recovery rates, respectively.
3.1. Reliability and impact of missing values
We measure the performance of our methodology as a classifier (member vs. non member) by means of synthetic data on which members and nonmembers are known. To generate the synthetic data we draw 10^{4} random samples of the generative model (see, Sect. 2.1), whose parameters were found using the 10^{4} sample of real data.
As explained in Sect. 2, our data set has a high fraction of missing values. The pattern of missing values is not random and depends on the magnitudes and colours of the objects. Therefore, we reproduced in each synthetic datum the pattern of missing values of one of its closer neighbours in the real data, closer in the euclidean sense. We found the following closer neighbours in each of the CMDs: {K_{s}, J−K_{s}}, {J, J−H}, {K_{s}, H−K_{s}}, {J, Y−J}, {K_{s}, i−K_{s}}. These are, in decreasing order, the bands and colours with the fewer missing values. Assigning the missing pattern of the nearest real neighbour results in a biased sample in which objects with complete (non missing) values are underestimated. This bias roots in the fact that euclidean distances are smaller, or at most equal, when measured in subspaces (missing values) compared to those measured in the entire space (nonmissing values). To avoid this, for each of the previous CMDs we: (i) find the real objects with nonmissing values and calculate their fraction, f_{r}, from the total real data, (ii) take a sample, from the synthetic data, whose fraction, f_{s}, from the total synthetic data, is equal to f_{r}, (iii) assign to the objects in this synthetic sample the pattern of missing values of the nearest neighbour from among the real objects found in (i). In this way, the synthetic data has similar fractions of missing and nonmissing values to those of the real data.
Uncertainties are assigned as follows. We set the proper motions uncertainties to those of the nearest neighbour in the real data. This scheme, however, cannot be applied in the case of photometry. In the photometric space and due to the presence of missing values, the nearest neighbour scheme returns uncertainties that are biased towards the less precise measurements. Again, the euclidean metric results in the preferential choosing of objects with missing values. Since these missing values occur mostly at the faint end, where uncertainties are larger, it results in a bias towards larger uncertainties. To avoid this, we first fitted polynomials (8th degree) to the uncertainties as a function of the magnitudes. Then, we used these polynomials to assign uncertainties to the synthetic photometric data.
To estimate the performance of our classifier to recover cluster members, we applied our methodology to synthetic data sets with and without missing values. In these results, we count the positives (cluster members, TP), negatives (field members, TN), false positives (field members classified as cluster members, FP) and false negatives (cluster members classified as field members, FN). With them we calculated the true positive rate (TPR), contamination rate (CR), accuracy (ACC) and precision (or positive predictive value, PPV), which are defined as follows
We used the mode to summarise membership probability distributions. To quantify the uncertainties of the previous quantities, we draw five realisations of the synthetic data set with missing values. Since we used the results of the nonmissing values data set only for comparisons, we draw it only once.
The left panel of Fig. 4 shows the TPR (solid lines) and CR (dashed lines) in the presence (red lines) and absence (blue lines) of missing values. We measured both quantities as functions of the probability threshold used to define members and nonmembers. In the missing value case, the lines and the shaded grey regions depict the mean and deviations, respectively, of the results from the five synthetic data sets. As it is shown, the missing values have a negative impact in our classification process by diminishing the TPR and increasing the CR. Nevertheless, our methodology delivers low (≲8%) contamination rates above the probability threshold p ≈ 0.75. In this figure and for the sake of comparison, we also show the CR and TPR (as black dots) reported in Table 4 of Sarro et al. (2014). This figure shows that, the TPR of our methodology measured on data without missing values is similar to that of Sarro et al. (2014). This is expected since those authors use only completely observed objects to construct their model. However, we measured the TPR on missing values data, at p_{t} = 0.84, is ≈4% lower than that of Sarro et al. (2014) and the one we measure on nonmissing values data. On the other hand, the CR of our methodology above p = 0.8 outperforms the CR reported by Sarro et al. (2014) in spite of the missing values in our data sets. Nonetheless, we stress the fact that this comparison is not straight forwards because of the following reasons. First, Sarro et al. (2014) infer their cluster model using only nonmissingvalue objects, later they apply it over objects with and without missing values. Second, their synthetic data set and ours are essentially different. They are constructed with different generative models, different number of elements, and different missing value patterns.
The right panel of Fig. 4 shows the ACC and the PPV of our classifier when applied on synthetic data with missing values. The lines and the grey regions depict the mean and the maximum deviations of the results on the five synthetic data set. As this panel shows, the probability threshold with higher accuracy is p_{t} = 0.84. In what follows, and only for classification purposes, we use it as our cluster membership probability threshold. At this threshold the CR is 4.3 ± 0.2%, the TPR is 90.0 ± 0.05%, the ACC is 96.5 ± 0.1%, and the PPV is 95.6 ± 0.2%. The quoted uncertainties correspond to the maximal deviations from the mean of results in the five missingvalues synthetic data sets.
We investigated further the impact of missing values. In Fig. 5 we compare the cluster membership probabilities we recover in the presence of missing values (vertical axis) to those without missing values (horizontal axis). As can be seen in this figure, the missing values impact our results by spreading the membership probabilities. This spread is expected since in general, decisions are compromised by the loss of information. The box (region above p_{t}) contains the objects which can be considered as the contaminants (at p = p_{t}) resulting from missing values. These objects have a small impact, representing only 1.8% of the contamination (indicated by the difference between the CR for missing and complete cases in left panel of Fig. 4 at p_{t}). We note that objects lacking just one observable appear to have larger biases than those lacking two or three. However, this is an artefact of the relative frequencies of their numbers. The most striking difference between both probabilities comes from objects lacking the CI (enclosed in black). Our methodology uses the true CI to prescribe the true photometry, and the observed CI to constrain the marginalisation integral of the true CI. Thus, it is expected that a missing CI will produce a probability spread. These missing CI objects show two different behaviours. In one case, there are sources with membership probabilities p_{complete} ≈ 0 which have overestimated probabilities in the incomplete case (vertical axis). In the other case, the sources in the combed area below the line of unit slope have underestimated probabilities in the incomplete case. While the first case contributes to the CR the second one diminishes the TPR. The first case reaches the maximum difference at p ≈ 0 (difference between red and blue dashed lines in Fig. 4), thus its impact in our results is marginal. The second case, however, represents the unavoidable (in our model) loss of members due to the missing values (4% at p_{t} = 0.84). In a future version we will try to reduce this breach. In spite of the mentioned behaviours, the rootmeansquare (rms) of the difference between membership probabilities of both data sets (with and without missing values) is 0.12, which we consider an small price given the gained improvements due to the inclusion of missing values. This rms drops to only 0.02 for objects with completely observed values (red squares) in both data sets. The previous effects show an overall agreement between results on data sets with and without the missing values, nonetheless, care must be taken when dealing individually with objects lacking this colour index.
Finally, as explained in Sect. 1, our methodology aims to determine the statistical distributions of the cluster population. Our model returns these distributions without any threshold in cluster membership probabilities. In our methodology, each object contributes to the cluster distributions proportionally to its cluster membership probability. In this sense our results are free of any possible bias introduced by hard cuts in the membership probability. Nevertheless, contamination is still present and must be quantified. To quantify it, we compute the expected value of the CR.^{15} It is ⟨CR⟩ = 5.8 ± 0.2%. In it, each CR contributes proportionally to the probability threshold at which it is measured.
Fig. 4. Left: TPR (solid line) and CR (dashed line) of our methodology when applied on synthetic data sets with and without missing values (red and blue lines, respectively). Black dots show the TPR and CR reported by Sarro et al. (2014) for their nonmissing values model. Right: accuracy and precision as a function of probability threshold for our classifier when applied on synthetic data with missing values. The highest accuracy is obtained at p_{t} = 0.84 (red dot). In both panels, the grey areas show the maximum deviations from the mean of the results of the five missingvalues synthetic data sets. 
Fig. 5. Comparison between the cluster membership probabilities recovered from the synthetic data with missing values (incomplete) and without them (complete). The colour and shape indicate the amount of missing values. The symbols enclosed in black indicate a missing CI. The top left box contains objects considered as contaminants due to missing values, at the probability threshold p_{t} = 0.84. 
3.2. Pleiades results
In the previous section we characterised the effectiveness of our methodology, quantified its contamination and found an objective probability threshold based on synthetic data. In this section, we present the results of applying this methodology to the real data set of Sect. 2. First, we give the cluster and equalmass binaries membership probabilities together with a summary of the probability distributions describing the cluster population. Afterwards, we derive the luminosity functions in the J, H and K_{s} bands.
The high dimensionality of our results prevents their direct graphical representation. Nevertheless, in what follows we present them projected onto the subspaces of proper motions and the K_{s} vs. i−K_{s} CMD.
Once the MCMC converged (see Sect. 2.3), we used the last ten iterations (1700 samples of the parameters) to compute the cluster and equalmass binaries membership probabilities (Eq. (6)) for the one and a half million objects in the DANCe DR2. These membership probabilities are summarised in Table 1, which also contains a flag indicating if the object has a missing CI (see Sect. 3.1 for a discussion on the impact of the missing CI). In addition, Figs. 6 and 7 show the cluster and equalmass binaries membership probabilities for those objects considered as cluster candidate members. The figures are projected into the subspaces of proper motions and K_{s} vs. i−K_{s} CMD, and also show the modes of the posterior distributions for parameters in the cluster and equalmass binaries models (with dashed and dotted lines, respectively). We considered an object to be a candidate member if its membership probability plus its sensitivity to the cluster parameters (P_{c}+σ_{Pc}) is larger than the probability threshold p_{t} = 0.84. In the DANCE DR2 data set there are 1973 objects fulfilling this criterion, in the following we refer to them as the High Membership Probability Sample (HMPS). We considered that an object is an equalmass binary if its equalmass binary probability is greater than 0.5. Figure 8 gives the fraction of candidate members, in bins of CI, classified as equalmass binaries by our methodology. Uncertainties are Poissonian.
We summarise the posterior distributions of cluster parameters in Table B.1. It contains the mode and uncertainty of each parameter in our model. Uncertainty is expressed by the 16 and 84 percentiles of the parameter marginal posterior distribution. In Appendix A we give details of these parameters and their definition. Briefly, the first six correspond to the fractions of field, cluster sequence (Cs) and to the weights in the proper motions GMMs of single stars and equalmass binaries. The next 14 describe the true colour index distribution, (fractions, means and variances). The following 14, from Mean PM Cs[1,1] to Variance Cs[4,3], and eight, from Mean PM Bs[1,1] to Variance Bs[2,3], describe, respectively, the proper motions GMM of cluster and equalmass binaries. The next 28 correspond to the coeﬃcients of the cubic splines, with seven coeﬃcients for each band (Y, J, H and K_{s}). The final 15 correspond to the entries of the Cholesky decomposition of the covariance matrix which represents the intrinsic dispersion of the cluster sequence, Σ_{clus}.
In Fig. 9 we show some of these distributions. It depicts objects and models in the subspaces of proper motions an K_{s} vs. i−K_{s} CMD. The grey ellipses delineate the GMM of the field model. We notice that, due to the high amount of missing values, most of the plotted ellipses are empty. The orange lines portray a sample of 100 realisation of the posterior distributions of the cluster parameters. We plot, in black triangles and grey dots respectively, those objects that we classify as candidate members and as field population. We draw the mode of the posterior distributions of parameters modelling single stars and equalmass binaries with dashed blue and dotdashed maroon, respectively. Although the number of Gaussians describing the cluster proper motions for the single stars is four, one of them collapses to fractions and covariances near zero.
For the sake of clarity, the right panel of Fig. 9 does not show the parameters related to the width of the cluster sequence. Thus, this last one appears as a narrow line. Also, and as explained in Sect. 2.1.1, we built the field photometric model using the five photometric dimensions of our data set and, more importantly, we took into account the missing values. For these reasons the grey ellipses of the right panel apparently lack objects inside and in their vicinity.
Fig. 6. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) of our candidate members (HMPS, see text). Grey dots depict candidate members whose cluster membership probability is below the probability threshold p_{t} but are only included because it sensitivity to the cluster parameters reaches the p_{t}. The lines represent the MAP of the parameters in the equalmass binaries (dotdashed line labelled as MAP EMB) and single stars (dashed line labeled as MAP Singles) models. Standard uncertainties in photometry are in general smaller than symbols. 
Fig. 7. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) of our candidate members classified as equalmass binaries. Captions as in Fig. 6. 
Fig. 8. Fraction of candidate members classified as equalmass binaries as a function of the CI (binned intervals). Uncertainties are Poissonian. 
Fig. 9. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) projections of cluster and field models and objects in our dataset. Our candidate members in the HMPS are in black triangles and the field objects in grey dots. Grey ellipses delineate the covariance matrices of the field GMM. The cluster model is represented by the MAP (dotdashed and dashed lines labelled as MAP EMB and MAP Singles corresponding to equalmass binaries and single stars, respectively) and 100 samples (orange lines) of the posterior distributions. 
3.3. Luminosity functions
We derived the distributions of the apparent magnitudes J, H, and K_{s} using the posterior distributions of the parameters in our photometric model. Briefly, we do this by transforming the true CI distribution into the J, H, K_{s} distributions using the splines series and the intrinsic dispersion of the cluster sequence. Appendix A.3 describes in detail how we do this transformation.
Then, we obtained the luminosity distributions using the magnitude distributions, the parallax and extinction of the cluster. We assumed that the parallax is normally distributed with mean, 7.44 mas, and standard deviation 0.42 mas (Galli et al. 2017). This parallax distribution is convolved with the magnitude distributions to obtain the absolute magnitude distributions. We notice that this scheme results in a smoother distribution than the hypothetical one resulting from the transformation of relative to absolute magnitudes by means of individual parallaxes; the smoothness results from a lack of information. However, we do so because we do not have individual parallaxes. Finally, we deredenned them employing the canonical value, A_{v} = 0.12 (Guthrie 1987), which we transform to the J, H, K_{s} values using the extinction law of Cardelli et al. (1989).
Since our methodology prescribes the true photometric quantities based on the true colour index CI, therefore, the completeness limits of this CI dictate those of the photometric bands. The upper completeness limits that Bouy et al. (2015b) estimate for i and K_{s} are i ≈ 23 mag and K_{s} ≈ 18 mag (see their Appendix A). As they also mention, due to the heterogeneous origins of the DANCe DR2 survey, its completeness is not homogeneous over its entire area. To overcome this issue, they identified a region, the inner three degrees of the cluster, with homogeneous spatial and depth coverage and restricted their sample to it. Here, instead of restricting the sample, we assumed that the UKIDSS survey provides the homogeneous spatial coverage at the faint magnitudes, and quote more conservative completeness limits at the bright end. Figure 10 shows the K_{s} and i density for all sources in the Pleiades DANCe DR2. The upper completeness limits correspond to the point with maximum density, i = 21.4 mag, K_{s} = 18.1 mag. For the lower completeness limits we choose i = 13.2 mag and K_{s} = 11.0 mag because the density of brighter objects shows a sharp decline, probably due to saturation. Thus, we defined the CI completeness interval as that of all the points, along the cluster sequence in the K_{s} vs. i−K_{s} CMD, for which i and K_{s} are bounded by their upper and lower completeness limits, respectively. This results on 2.7 < i−K_{s} < 5.6 mag. With it and the cluster sequence, we derived the completeness intervals for the J, H, K_{s}. Finally, we transform these intervals to absolute magnitudes and deredden them.
The luminosity distributions in J, H, K_{s} together with their completeness limits are depicted (orange lines, hereafter continuous BHMBayesian Hierarchical Model) in Fig. 11. For the sake of comparison we also show the luminosity distributions of: (i) our candidate members (HMPS, as a black dashed line), and, (ii) the candidate members of Bouy et al. (2015b, blue dotdashed line). We impute the missing values of the discrete cases using the nearest euclidean neighbour. The difference between the continuous BHM function and the HMPS comes from the imputed missing values and the objects used to obtain them. The BHM uses all objects proportionally to their cluster membership probability while the HMPS uses only the high probability candidate members. We expect differences since the HMPS is not a random sample of the continuous BHM, therefore their distributions are not exactly alike. The differences between the HMPS and that of Bouy et al. (2015b) arise mainly at the bright and faint end (K_{s} ≈ 4 mag and K_{s} ≈ 11 mag). We argue that the origin of these differences lay in our new candidate members and the rejected ones of Bouy et al. (2015b) (as discussed in Sect. 4).
Fig. 10. Density of all DANCe DR2 sources in K_{s} vs i magnitudes. Lines show our completeness limits, 13.2 < i < 21.4 mag and 11 < K_{s} < 18.1 mag. The grey area is considered incomplete. 
Fig. 11. Luminosity functions from J, H, K_{s} derived from the model (orange lines labelled BHM). Also shown the regions of incompleteness and the luminosity functions computed from: the candidate members of Bouy et al. (2015b; dotdashed blue line), and our candidate members (HMPS, dashed black line). 
4. Discussion
In this section we focus on the differences between our results and those found by Bouy et al. (2015b) on the DANCe DR2 data set. First, we discuss the differences in the cluster membership probabilities, particularly on the new candidate members and the rejected ones. Later, we obtain the present day mass function, compare it with theoretical and empirical ones, and elaborate on the statistical differences that we found.
4.1. Comparison with previous results
Our work, and that of Bouy et al. (2015b), although essentially different, have common elements which allow their comparison. Despite the differences, the two works agree on ≈90% of the recovered candidate members (the upper right corner of Fig. 12). In what follows, we detail the differences for individual objects.
In Fig. 12 we directly compare, for objects in our data set, the cluster membership probabilities recovered by both works. Although our results on the posterior distributions of the cluster population do not depend on this probability threshold, we use it here only to illustrate differences in the classification processes. As shown in this figure, there is an overall outstanding, 99.6% agreement between both methodologies, which is shown by the upper right and lower left boxes of Fig. 12. Nonetheless, the differences are worthy of discussion.
The rejected candidates of Bouy et al. (2015b, at the lower right box of Fig. 12) amount to 12% of their candidate members. This value is higher than the contamination rate reported by Sarro et al. (2014), 7.3 ± 1.4%. Also, the fraction of our new candidates (upper left box), 10%, is higher than the 4.3 ± 0.2% CR reported on Sect. 3.1. We plot the new candidates and the rejected ones of Bouy et al. (2015b) in Figs. 13 and 14, respectively. In what follows we address these differences.
The new candidate members have proper motions uncertainties (median ) two times larger than those of the candidate members in common (median ). Also, as shown by Fig. 13, the majority of them (171) have probabilities lower than 0.95, are located in a halo around the locus of the cluster proper motions and on top of the cluster sequence in the K_{s} vs. i−K_{s} CMD. On the contrary, the new candidates with probabilities higher than 0.95 (37), lay in the centre of the cluster proper motions and fall above the cluster sequence in the K_{s} vs. i−K_{s} CMD. Thus, we hypothesise that, (i) objects with photometry compatible with the cluster sequence but in the proper motions halo, have higher membership probabilities in our methodology due to the increased flexibility of the cluster proper motions model (four Gaussians instead of the two of Bouy et al. 2015b), and (ii) objects at the centre of the cluster proper motions but above the cluster sequence are multiple systems (probably triple systems which can amount to 4% of the population Duquennoy and Mayor 1991) with an increased probability of membership due to our more flexible photometric model of the cluster and equalmass binaries sequences.
The rejected candidates of Bouy et al. (2015b), as it is shown in Figs. 14 and 15, have proper motions uncertainties (median ) more than four times larger than those of the candidates in common and are distributed along the cluster sequence. The relatively high membership probability among these objects occurs at the middle of the cluster sequence (green squares of Fig. 15) while the lowest probabilities occur at the bright and faint ends (blue and red triangles of Fig. 15, respectively), where the missing values happen the most. We stress that Bouy et al. (2015b) construct their field model using a sample of ≈20, 000 objects without missing values. Proceeding in that way underestimates the photometric field density in the regions where missing values happen (see Fig. 2). Underestimating the photometric field likelihood leads to an increase in the cluster field likelihood ratio, and therefore it increases the cluster membership probabilities. Furthermore, the proper motions uncertainties of objects at the bright end (median and depicted as blue triangles), faint end (median depicted as red triangles), and at the middle magnitudes (median depicted as green squares) are approximately six, five, and four times larger than those of the candidates in common. Thus, we hypothesise that higher proper motion uncertainties and field likelihoods are responsible for our lower membership probabilities of Bouy et al. (2015b) rejected candidates. However, we stress the fact that, although the probability threshold p_{t} = 0.84 returns the maximum accuracy of our methodology, at this value the TPR is just 90.0 ± 0.05%. Thus, the rejected candidate members of Bouy et al. (2015b) cannot be discarded as potential members. To solve this discrepancy it is necessary to have lower proper motion uncertainties and fewer missing values. Future steps will be taken to try to solve this issue.
Finally, the discrepancies in the individual membership probabilities of both works, Bouy et al. (2015b) and ours, arise from the subtle but important differences between them. The inclusion of missing values in our methodology have two main consequences. First, the use of missing values in the field photometric model leads to lower membership probabilities than those of Bouy et al. (2015b) in the regions where missing values happen the most. Second, the use of missing values in the construction of the cluster model allow us to include the information of good candidate members that were otherwise discarded a priori. This last point, together with the higher flexibility of our cluster model allow us to rise the membership probability of the previously discarded candidates. Furthermore, as shown by the red squares in the upper left corner of Fig. 12, the higher flexibility of our cluster model allow us to include as new candidate members previously rejected objects with complete (nonmissing) values.
Fig. 12. Recovered membership probabilities compared to those of Bouy et al. (2015b). Lines show the 0.75 and p_{t} = 0.84 probability thresholds used in both works. The numbers indicate the new candidate members (top left), rejected candidate members (bottom right), and common candidate members (top right). 
Fig. 13. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) showing the new candidate members found in this work. Captions as in Fig. 6. 
Fig. 14. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) showing the rejected candidate members of Bouy et al. (2015b). Captions as in Fig. 6. 
Fig. 15. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) showing the rejected candidate members of Bouy et al. (2015b). The colours and shapes are a proxy for their K_{s} magnitude. The dotdashed and dashed lines labelled MAP EMB and MAP Singles correspond to the MAP of the equalmass binaries and single star models, respectively. 
4.2. Present day mass function
Now, we proceed to compare the photometric distributions of the cluster population to those present in the literature. First, we computed the present day system mass function (PDSMF) and compare it to the initial mass functions (IMF) of Chabrier et al. (2005) and Thies & Kroupa (2007). Then, we analyse and discuss the differences between the Pleiades PDSMF and those of the Trapezium and Hyades clusters.
We obtained the PDSMF, independently in the J, H, K_{s} bands, by transforming the continuous luminosity distributions obtained in Sect. 3.3. These are transformed into system mass functions using the massluminosity relations given by the BTSettl models of Allard et al. (2012, the grid CIFIST2011bc for the 2MASS Vega photometric system, and i band from the SDSS AB sytem), whit exactly the same grid used by Bouy et al. (2015b, priv. comm.). Since the luminosity functions of Sect. 3.3 correspond to the luminosity of systems (single and binary stars) and the massluminosity relation is not a linear one, we derive the PDSMF by adding the mass distributions of single stars to those of the equalmass binaries taking into account the proportion of equalmass binaries inferred by the model. We notice that: (i) working with only two mass ratios, 0 and 1, is an oversimplistic assumption that we plan to remedy in future versions of our methodology, and, (ii) the PDSMF is derived from the luminosity functions, which in turn are derived from the posterior distributions of the cluster parameters. Thus, the uncertainties in the PDSMFs result from the propagation of those of the cluster parameters. In the literature it is customary to derive the mass distribution from individual masses of stars or systems, and then assign Poisson uncertainties accordingly. The methodology introduced in this work is conceptually different. The uncertainties in our mass distribution result from those of the posterior distribution of the cluster parameters, which in turn are propagated from those of the data and the model itself.
We assumed an age of 120 Myr for the Pleiades together with solar metallicity. We notice that, due to the uncertainty in the age (125 ± 8 Myr; Stauffer et al. 1998) and metallicity of the Pleiades, the previous assumptions are oversimplistic. The massluminosity relation must incorporate all sources of uncertainty (e.g. models, metallicities, ages). However, the analysis of these uncertainties and their impact in the PDMF is outside the scope of the present work. Here, we make these simplistic assumptions to directly compare our results with those of Bouy et al. (2015b). The transformation from luminosities to masses is proportional to the derivative of the massluminosity relation, and indeed very sensitive to it (see D’Antona 1998 for some words of caution). Therefore, we decided to fit the BTSettl grid with splines, and obtain the derivatives from this fit.
Figure 16 shows the logarithmic PDSMF (ξ_{L}) for the J, H, K_{s} bands normalised on the completeness limits of the survey (see Sect. 3.3). Figure 17 shows the PDSMF in the K_{s} band, superimposed to the PDSMF proposed by Bouy et al. (2015b) and, the IMFs of Thies & Kroupa (2007) and Chabrier et al. (2005). For this last one, we show its standard uncertainty (taken from Chabrier 2003) as a sample of blue lines. As shown in this figure, the PDSMFs of this work compare well with each others, and, in the overlap interval, with that proposed by Bouy et al. (2015b). However, the difference that the PDSMFs show above 0.3 M_{⊙} (−0.5 < log M/M_{⊙}) may have its origin on the new and rejected candidate members, which are preferentially M stars (with masses in the range 0.075−0.6 M_{⊙} or −1.12 < log M/M_{⊙} < −0.22). Also, similarly to what Bouy et al. (2015b) pointed out, there is a possible flattening in the PDSMF below 50 M_{Jup} (log M/M_{⊙} < −1.3). However, due to the level of uncertainty in this region we have not enough evidence to claim it.
Using PyMultiNest^{16} (Buchner et al. 2014), we fit three models to synthetic samples of our K_{s} band PDSMF in the completeness interval: a lognormal distribution and two powerlaw distributions, m^{−α}, with two and three segments. Table 2 show the parameters of these models together with their evidence. In addition and for comparison, this table also includes the BIC value for the powerlaw models of Bouy et al. (2015b) and Kroupa (2001), and the lognormal model of Chabrier et al. (2005). Judging by the evidence in our models the best fits are the two and three segment powerlaws (black lines in Fig. 17, labelled as 2Spls and 3Spls, respectively). Judging by the BIC values the best model is the powerlaw, particularly the two and three slopes models of this work and the one of Bouy et al. (2015b). However, given the uncertainties of both the evidence and BIC, each of the three previous models is equally good at describing our data set. Under this similarity of evidences, the prejudice of simplicity can be used to choose the twoslopes powerlaw over the threeslopes powerlaw. This distributions is similar to that found by Bouy et al. (2015b), except for the flat part in the lowmass range and the less step slope in the high mass range. The best models found here are in clear discrepancy with the IMFs of Chabrier et al. (2005) and of Thies & Kroupa (2007). We notice that this apparent discrepancy may have its origin on the not yet established uncertainties in the massluminosity relationships, on dynamical effects associated with age, or most probably in a combination of both of them.
Our PDSMF allows us to give a lower limit to the mass of the cluster. The average and mode mass of the cluster members (computed within the completeness limits) are 0.24 ± 0.01 M_{⊙} and 0.26 ± 0.09 M_{⊙}, respectively. We computed the expected number of cluster members as the integral, over the whole range of membership probabilities, of number of objects at each membership probability, and its value is 3290 ± 140 objects. The product of the mean mass times the expected number of members is M_{⊙}. Since we still lack the high mass range of the PDSMF, this value is a lower limit to the mass of the cluster. However, we cannot make any further claim based on our results because the quoted uncertainties are probably underestimated. They do not take into account the uncertainties in the massluminosity relations, which are yet to be established.
Fig. 16. Normalised PDSMF in J, H, K_{s} bands. 
Fig. 17. Normalised PDSMF in K_{s} band together with the IMFs of Chabrier et al. (2005); Thies & Kroupa (2007) and fits to the PDSMF found by us and Bouy et al. (2015b). 
Parameters, evidences and BIC values of models fitted to the ten synthetic samples of the PDSMF.
4.3. Comparison with empirical mass functions.
Dynamical effects may have an impact on the cluster mass function. Figure 18 (left panel) compares the PDSMF from the Pleiades (120±8 Myr, Stauffer et al. 1998) derived here, to those of the younger (0.2 to 1.4 Myr, Muench et al. 2002) and farther (414 ± 7 pc, Menten et al. 2007) Trapezium, and to the older (648±45 Myr, De Gennaro et al. 2009) and closer (47.5±3.6 pc, McArthur et al. 2011) Hyades clusters. These PDSMFs correspond to those of Fig. 11 of Bouy et al. (2015b, priv. comm.). As mentioned by Bouy et al. (2015b), the abundance of lowmass stars and brown dwarfs in the range 0.03−0.1 M_{⊙} (log M/M_{⊙} ≈ {−1, −1.4}) seems to diminish with time (since the PDSMF is normalised, this produces a relative increase of lowmass stars in the range −0.4 < log M/M_{⊙} < −0.2). This effect is consistent with the classical scenario in which lowmass stars and brown dwarfs are ejected as the cluster relaxes (see for example Moraux et al. 2004; Terlevich 1987). To test the validity of this scenario, at least the statistical significance of the observed differences among the PDSMF of this three clusters, we tested the null hypothesis that the Trapezium and the Hyades have the same PDSMF as the Pleiades. Since we just have the cluster model of the Pleiades, we were not able to perform model comparison in a Bayesian fashion. Thus, to do the statistical comparison of these three clusters PDSMF we use the KolmogorovSmirnov and AndersonDarling tests.
The right panel of Fig. 18 shows the cumulative distribution functions (CDFs) of the Trapezium, Pleiades (only in K_{s} band) and Hyades PDSMFs. Also and for comparison, we show the CDFs of Chabrier et al. (2005) and Thies & Kroupa (2007) IMFs. The grey area around the Pleiades CDF shows the hypothesis test in which we compare each CDF with that of the Pleiades. The null hypothesis is that each compared CDF is exactly that of the Pleiades. We used the KolmogorovSmirnov statistic (Pearson & Hartley 1954) with the alpha value α = 0.01, to compute the maximum vertical distance d_{α} from the Pleiades CDF (shown as the grey region in Fig. 18). The null hypothesis is rejected only if the tested CDF lies entirely outside the grey region around the Pleiades CDF. As can be seen, neither the IMFs nor the PDSMF of the Trapezium and Hyades lay entirely within the grey area, thus we can reject the null hypothesis that they share the same PDSMF of the Pleiades. Furthermore, since the Kolmogorov–Smirnov test uses only the maximum distance between CDFs, we also applied the more robust Anderson–Darling test (Anderson & Darling 1952). In this test, the distance is computed placing more weight in the observations at the tails of the distribution. To transform this distance into a probability we use the statistic and critical values given by Scholz & Stephens (1987)^{17}. This test also rejects the null hypotheses, with probabilities of p < 0.004, that the Trapezium and Hyades PDSMFs, and the Chabrier et al. (2005) and Thies & Kroupa (2007) IMFs have the same CDF as the Pleiades.
Furthermore, we performed the AndersonDarling test but only in the lowmass regime (M/M_{⊙} < 0.1). This test rejects the null hypothesis that the PDSMF of the Pleiades, in this lowmass regime, was drawn from the IMFs of Thies & Kroupa (2007) and Chabrier et al. (2005) with p < 0.05 and p < 0.004, respectively. However, this test does not entirely reject the null hypothesis that the PDSMFs of Hyades, Trapezium and Pleiades are drawn from the same distribution in the lowmass regime. The maximum probabilities render by the uncertainties in the Pleiades PDSMF are p < 0.13 and p < 0.1 for the Hyades and the Trapezium, respectively. Nevertheless, these results must be analysed in the light of better constrained uncertainties. In particular those concerning the Hyades and the Trapezium PDSMFs.
The previous tests show that there is mild evidence to claim for differences among the PDSMFs of these three clusters and from IMFs and Pleiades PDSMF. Thus suggesting that these differences may have an origin on dynamical effects associated with age and relaxation. Nevertheless, to claim for reliable evidence supporting these differences the census of the Trapezium and Hyades must be done using the same methods. Also, the uncertainties must be properly established both for the other PDSMFs and for the massluminosity relation from which all these PDSMF are derived.
Fig. 18. Left: PDSMFs of the Pleiades (derived here for J, H, K_{s} bands), Trapezium, and Hyades, from Bouy et al. (2015b). They are normalised in the interval of completeness. Right: cumulative distribution functions (CDF) of the PDSMFs from left panel and that of Chabrier et al. (2005) and Thies & Kroupa (2007) system initial mass function (normalised also in the interval of completeness). The Pleiades CDF shown is just from K_{s} band. The grey area depicts the area in which the null hypothesis of same PDSMF as that of the Pleiades can not be rejected (at α = 0.01). 
5. Conclusions and perspectives
In this work we have created a methodology which models the photometric and astrometric data of the heterogeneous multiarchive DANCe survey (Bouy et al. 2013). We modelled these data with most of its inherent characteristics: missing values, nonhomogeneous observational uncertainties (heteroscedasticity) and correlations (whenever available). This enables us not just to dramatically increase the number of objects used to construct the cluster and field models (10^{5} compared to the 2 × 10^{4} and 1662 of Sarro et al. 2014, for their field and cluster model, respectively), but also to obtain results that minimise the biases associated to the lack of treatment for missing values and nonuniform uncertainties. The Bayesian framework used here, together with the MCMC techniques, enables us to accomplish our first objective: sample the posterior distribution of the parameters in the cluster model. This sampling also resulted in the accomplishment of our second objective: the cluster and equalmass binaries membership probabilities. Finally, we compared our results to those of previous works and found a general agreement. Since our luminosity functions and PDSMF do not use any probability threshold, they are free of any possible bias associated to it. We also provide the list of candidate members which is the most complete up to date. We estimate that at the probability threshold of p_{t} = 0.84 (at which our methodology performs the best as a classifier) the contamination and true positive rates are 4.3 ± 0.2% and 90.0 ± 0.05%, respectively. We stress the fact that at this probability threshold, up to 10% of true cluster members may still have membership probabilities below it.
The main limitations of our methodology are:
The computing power it demands.
The accuracy of our PDSMF and mass dependent results. The later are now limited by the accuracy of the massluminosity relationship, which still needs to be confirmed and calibrated at low masses and very young ages.
The lack of uncertainty in the field model, which remains fixed at the MLE value.
The lack of treatment of correlations amongst photometric and proper motions observables, which assumes that cluster members are located in a tight distribution of distances.
The methodology presented here represents the ground upon which we will continue improving our data and cluster modelling. In terms of data modelling, future steps will aim to include the spatial, radial velocities, and parallaxes distributions, together with their correlations. This will allows us to deal with more complicated configurations, like two (or more) supper imposed clusters, and to apply our methodology to data from other surveys. Regarding the cluster modelling, in future works we will include: extended photometrical and kinematical treatments of binaries (regardless their mass ratio), multiple systems, white dwarfs, and the treatment of extinction. In particular, this last one will enable us to apply our methodology to even younger and embedded star forming regions.
This assumption means that the probability of measuring certain value of an object is independent of the measured value of another object. The DANCe DR2 sample shows no significant correlation amongst the observables it reports, for more details see Bouy et al. (2013).
For comparison, the methodology of Sarro et al. (2014) would take approximately two days in the same computer cluster.
Implemented in the R package mcmcse (Flegal et al. 2016).
PyMultiNest is a Python implementation of the MultiNest code (Feroz et al. 2009). MultiNest is a multimodal nested sampling algorithm that computes the evidence, with its uncertainty, and posterior samples of possible multimodal distributions.
In the AD test, the distance between two distributions F(x) and G(x) is computed as
with n, m and N the samples of F, G and the total of them. The A^{2} distance is transformed to the T_{2N} statistic following the formula
where the σ_{N} and the critical values of T_{2N} are given in Eq. (4) and Table 1 of Scholz & Stephens (1987), respectively.
Acknowledgments
We are grateful to the anonymous referee for his/her kind, precise and important comments, which considerably improved the quality of this work. J. Olivares acknowledges founding from the Ministère de l’Enseignement Supérieur et de la Recherche, and thanks Sergio Suárez for his kind support in the management of the computer cluster at CAB. Some of the computations presented in this paper were performed using the Froggy platform of the CIMENT infrastructure (https://ciment.ujfgrenoble.fr), which is supported by the RhôneAlpes region (GRANT CPER0713 CIRA), the OSUG@2020 labex (reference ANR10 LABX56) and the Equip@Meso project (reference ANR10EQPX2901) of the programme Investissements d’Avenir supervised by the Agence Nationale pour la Recherche. We are grateful to S. Spiriti, P. Smith and P. Lecuyer for their R package freeknotsplines. This study has received financial support from the French State in the framework of the Investments for the Future Programme, IdEx Bordeaux, reference ANR10IDEX0302. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 682903 COSMICDANCE). We are grateful to I. Thies and P. Kroupa for providing the electronic table of their Pleiades mass function predictions. We are grateful to L. Hillebrand and G. Muench for providing an electronic version of the mass and luminosity functions of the Trapezium cluster presented in their respective articles. P.A.B. Galli acknowledges financial support from FAPESP. This research has been funded by Spanish grants AYA201238897C0201 and AYA201021161C0202. E. Moraux acknowledges funding from the Agence Nationale pour la Recherche programme ANR 2010 JCJC 0501 1 “DESC (Dynamical Evolution of Stellar Clusters)”. J. Bouvier acknowledges funding form the Agence Nationale pour la Recherche programme ANR 2011 Blanc SIMI 56 020 01 (“Toupies”). This publication makes use of VOSA, developed under the Spanish Virtual Observatory project supported from the Spanish MICINN through grant AyA200802156. Based on observations obtained with MegaPrime/MegaCam, a joint project of CFHT and CEA/DAPNIA, at the CanadaFranceHawaii Telescope (CFHT), which is operated by the National Research Council (NRC) of Canada, the Institut National des Science de l’Univers of the Centre National de la Recherche Scientifique (CNRS) of France, and the University of Hawaii. Based on observations obtained with WIRCam, a joint project of CFHT, Taiwan, Korea, Canada, and France, at the CanadaFranceHawaii Telescope (CFHT), which is operated by the National Research Council (NRC) of Canada, the Institute National des Sciences de l’Univers of the Centre National de la Recherche Scientifique of France, and the University of Hawaii. This paper makes use of data obtained from the Isaac Newton Group Archive, which is maintained as part of the CASU Astronomical Data Centre at the Institute of Astronomy, Cambridge. The data was made publicly available through the Isaac Newton Group’s Wide Field Camera Survey Programme. The Isaac Newton Telescope is operated on the island of La Palma by the Isaac Newton Group in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofísica de Canarias. This research used the facilities of the Canadian Astronomy Data Centre operated by the National Research Council of Canada with the support of the Canadian Space Agency. This research draws upon data provided by C. Briceño as distributed by the NOAO Science Archive. NOAO is operated by the Association of Universities for Research in Astronomy (AURA) under cooperative agreement with the National Science Foundation. This publication makes use of data products from the Two Micron All Sky Survey, which is a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology, funded by the National Aeronautics and Space Administration and the National Science Foundation. This work is based in part on data obtained as part of the UKIRT Infrared Deep Sky Survey. This research has made use of the VizieR and Aladin images and catalogue access tools and of the SIMBAD database, operated at the CDS, Strasbourg, France. We are grateful to the Isaac Newton Group for the service observations with LIRIS at the WHT. The William Herschel Telescope and its service programme are operated on the island of La Palma by the Isaac Newton Group in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofísica de Canarias. This publication makes use of data products from the Widefield Infrared Survey Explorer, which is a joint project of the University of California, Los Angeles, and the Jet Propulsion Laboratory/California Institute of Technology, funded by the National Aeronautics and Space Administration. This work is based [in part] on archival data obtained with the Spitzer Space Telescope, which is operated by the Jet Propulsion Laboratory, California Institute of Technology under a contract with NASA. Support for this work was provided by NASA.
References
 Akeret, J., Seehars, S., Amara, A., Refregier, A., & Csillaghy, A. 2013, Astron. Comput., 2, 27 [NASA ADS] [CrossRef] [Google Scholar]
 Allard, F., Homeier, D., Freytag, B., & Sharp, C. M. 2012, EAS Pub. Ser., 57, 3 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Anderson, T. W., & Darling, D. A. 1952, Ann. Math. Statist., 23, 193 [CrossRef] [Google Scholar]
 Barrado, D., Bouy, H., Bouvier, J., et al. 2016, A&A, 596, A113 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Blackwell, T., & Bentley, P. 2002, in Proc. of the 4th Annual Conference on Genetic and Evolutionary Computations, eds. W. Langdon, E. CantuPaz, K. Mathias, R. Roy, & D. Davis (San Francisco: Morgan kaufmann Publishers Inc.), 19 [Google Scholar]
 Bouy, H., Bertin, E., Moraux, E., et al. 2013, A&A, 554, A101 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Bouy, H., Bertin, E., Barrado, D., et al. 2015a, A&A, 575, A120 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Bouy, H., Bertin, E., Sarro, L. M., et al. 2015b, A&A, 577, A148 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Bovy, J., Hogg, D. W., & Roweis, S. T. 2011, Ann. Appl. Stat., 5, 1657 [NASA ADS] [CrossRef] [Google Scholar]
 Buchner, J., Georgakakis, A., Nandra, K., et al. 2014, A&A, 564, A125 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Cardelli, J. A., Clayton, G. C., & Mathis, J. S. 1989, ApJ, 345, 245 [NASA ADS] [CrossRef] [Google Scholar]
 Chabrier, G. 2003, PASP, 115, 763 [Google Scholar]
 Chabrier, G. 2005, in The Initial Mass Function 50 Years Later, eds. E. Corbelli, & F. Palla, Astrophys. Space Sci. Lib., 327, 41 [NASA ADS] [CrossRef] [Google Scholar]
 Chung, Y., Gelman, A., RabeHesketh, S., Liu, J., & Dorie, V. 2015, J. Educ. Behav. Stat., 40, 136 [CrossRef] [Google Scholar]
 Clerc, M., & Kennedy, J. 2002, Trans. Evol. Comp, 6, 58 [CrossRef] [Google Scholar]
 Converse, J. M., & Stahler, S. W. 2008, ApJ, 678, 431 [NASA ADS] [CrossRef] [Google Scholar]
 D’Antona, F. 1998, ASP Conf. Ser., 142, 157 [NASA ADS] [Google Scholar]
 De Gennaro, S., von Hippel, T., Jefferys, W. H., et al. 2009, ApJ, 696, 12 [NASA ADS] [CrossRef] [Google Scholar]
 Duquennoy, A., & Mayor, M. 1991, A&A, 248, 485 [NASA ADS] [Google Scholar]
 Faherty, J. K., Burgasser, A. J., Walter, F. M., et al. 2012, ApJ, 752, 56 [NASA ADS] [CrossRef] [Google Scholar]
 Feeney, S. M., Johnson, M. C., McEwen, J. D., Mortlock, D. J., & Peiris, H. V. 2013, Phys. Rev. D, 88, 043012 [NASA ADS] [CrossRef] [Google Scholar]
 Feroz, F., Hobson, M. P., & Bridges, M. 2009, MNRAS, 398, 1601 [NASA ADS] [CrossRef] [Google Scholar]
 Flegal, J. M., Hughes, J., & Vats, D. 2016, PASP, 125, 306 [Google Scholar]
 Gagné, J., Lafrenière, D., Doyon, R., Malo, L., & Artigau, É. 2014, ApJ, 783, 121 [NASA ADS] [CrossRef] [Google Scholar]
 Galli, P. A. B., Moraux, E., Bouy, H., et al. 2017, A&A, 598, A48 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Gelman, A. 2006, Bayesian Anal., 1, 515 [CrossRef] [Google Scholar]
 Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y.S. 2008, Ann. Appl. Stat., 2, 1360 [CrossRef] [Google Scholar]
 Gelman, A., Carlin, J., Stern, H., et al. 2013, Bayesian Data Analysis (Chapman and Hall/CRC) [Google Scholar]
 Goldman, B., Röser, S., Schilbach, E., et al. 2013, A&A, 559, A43 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Gong, L., & Flegal, J. M. 2016, J. Comput. Graph. Stat., 25, 684 [CrossRef] [Google Scholar]
 Goodman, J., & Weare, J. 2010, Comm. Appl. Math. Com. Sci., 5, 65 [NASA ADS] [CrossRef] [Google Scholar]
 Guthrie, B. N. G. 1987, QJRAS, 28, 289 [NASA ADS] [Google Scholar]
 Hogg, D. W., Myers, A. D., & Bovy, J. 2010, ApJ, 725, 2166 [NASA ADS] [CrossRef] [Google Scholar]
 Hong, D., Balzano, L., & Fessler, J. A. 2016, ArXiv eprints [arXiv:1610.03595] [Google Scholar]
 Huang, A., & Wand, M. P. 2013, Bayesian Anal., 8, 439 [CrossRef] [Google Scholar]
 Jefferys, T. R., Jefferys, W. H., Barnes, T. G., & Dambis, A. 2007, ASP Conf. Ser., 371, 433 [NASA ADS] [Google Scholar]
 KroneMartins, A., & Moitinho, A. 2014, A&A, 561, A57 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Kroupa, P. 2001, MNRAS, 322, 231 [NASA ADS] [CrossRef] [Google Scholar]
 Lindstrom, M. 1999, J. Comput. Graph. Stat., 8, 333 [Google Scholar]
 Lodieu, N., Deacon, N. R., & Hambly, N. C. 2012, MNRAS, 422, 1495 [NASA ADS] [CrossRef] [Google Scholar]
 Malo, L., Doyon, R., Lafrenière, D., et al. 2013, ApJ, 762, 88 [NASA ADS] [CrossRef] [Google Scholar]
 McArthur, B. E., Benedict, G. F., Harrison, T. E., & van Altena, W. 2011, AJ, 141, 172 [NASA ADS] [CrossRef] [Google Scholar]
 McMichael, D. W. 1996, in Proc. Fourth International Symposium on Signal Processing and its Applications (ISSPA), 377 [Google Scholar]
 Menten, K. M., Reid, M. J., Forbrich, J., & Brunthaler, A. 2007, A&A, 474, 515 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Moraux, E., Bouvier, J., & Stauffer, J. R. 2001, A&A, 367, 211 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Moraux, E., Kroupa, P., & Bouvier, J. 2004, A&A, 426, 75 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Muench, A. A., Lada, E. A., Lada, C. J., & Alves, J. 2002, ApJ, 573, 366 [NASA ADS] [CrossRef] [Google Scholar]
 Pearson, E., & Hartley, H. 1954, Biometrika Tables for Statisticians, 2 [Google Scholar]
 Riedel, A. R., Blunt, S. C., Lambrides, E. L., et al. 2017, AJ, 153, 95 [NASA ADS] [CrossRef] [Google Scholar]
 Roberts, G. O., & Rosenthal, J. S. 2004, Probability Surveys, 1, 20 [CrossRef] [Google Scholar]
 Sale, S. E. 2012, MNRAS, 427, 2119 [NASA ADS] [CrossRef] [Google Scholar]
 Sampedro, L., & Alfaro, E. J. 2016, MNRAS, 457, 3949 [NASA ADS] [CrossRef] [Google Scholar]
 Sarro, L. M., Bouy, H., Berihuete, A., et al. 2014, A&A, 14 [Google Scholar]
 Scholz, F. W., & Stephens, M. A. 1987, J. Am. Stat. Assoc., 82, 918 [Google Scholar]
 Schwarz, G. 1978, Ann. Statist., 6, 461 [Google Scholar]
 Shkedy, Z., Decin, L., Molenberghs, G., & Aerts, C. 2007, MNRAS, 377, 120 [NASA ADS] [CrossRef] [Google Scholar]
 Simpson, D. P., Rue, H., Martins, T. G., Riebler, A., & Sørbye, S. H. 2017, Stat. Sci., 32, 1 [CrossRef] [Google Scholar]
 Spiriti, S., Eubank, R., Smith, P. W., & Young, D. 2013, J. Stat. Comput. Simul., 83, 1020 [CrossRef] [Google Scholar]
 Stauffer, J. R., Schultz, G., & Kirkpatrick, J. D. 1998, ApJ, 499, L199 [NASA ADS] [CrossRef] [Google Scholar]
 Takeda, Y., Hashimoto, O., & Honda, S. 2017, PASJ, 69, 1 [NASA ADS] [CrossRef] [Google Scholar]
 Terlevich, E. 1987, MNRAS, 224, 193 [NASA ADS] [CrossRef] [Google Scholar]
 Thies, I., & Kroupa, P. 2007, ApJ, 671, 767 [NASA ADS] [CrossRef] [Google Scholar]
 Trumpler, R. J. 1921, Lick Observatory Bull., 10, 110 [NASA ADS] [CrossRef] [Google Scholar]
Appendix A: Methodology details
This appendix gives specific details of the field and cluster generative models explained in Sect. 2.1, the priors introduced in Sect. 2.2, and the transformations of colour into magnitude distributions mentioned in Sect. 3.3. Also, in Table A.2 we summarise the parameters in our model, their symbols and the priors we use for them. Additionally, at the end of this appendix, we schematically represent the relations among parameters in our model by means of probabilistic graphical models.
Hyperparameters for different blocks of the Hierarchical Bayesian Model.
Parameters names, symbols, and priors.
A.1. Details of the generative model
In what follows, the subscripts pm and ph stand for proper motion and photometry, respectively. Thus, an object with measurements d has proper motions d_{pm} and photometry d_{ph}. Also, we represent the standard uncertainties as the associated covariance matrix ϵ. In it, almost all offdiagonal elements are zero, with the exception of those of the colour index i−K_{s} and K_{s}. Thus, ϵ_{pm} and ϵ_{ph} refer to the covariance matrices of proper motions and photometric standard uncertainties.
A.1.1. Field population model
As explained in Sect. 2.1.1, we model the field photometry and proper motions as independent distributions. We used mixtures of distributions for both these models. A GMM for the photometric model and a mixture of Gaussians and uniform distributions for the proper motions model. Thus, the field likelihood of an object with measurements d and standard uncertainties ϵ is
In this equation, θ_{f} refers to the set of field parameters, with π_{f}, µ_{f}, Σ_{f} standing for the fractions, means and covariance matrices, respectively. The first and second brackets contain the models of proper motions and photometry, respectively. The first term of the proper motion model corresponds to the uniform distribution. In it, c is a constant determined by the inverse of the product of the proper motions ranges, and π_{f,pm,0} is the fraction of this uniform distribution. The second term in the same bracket is the mixture of M_{pm} Gaussians with means µ_{f,pm} and covariance matrices Σ_{f,pm} + ϵ_{pm}. The parameters of the photometric GMM, in the second bracket, are similar to those in the proper motion model, except for the uniform distribution.
A.1.2. Cluster population model
In the cluster, we also assumed independence between the photometric and proper motions models. The photometric model is a mixture of cluster (subindex Cs) and the equalmass binaries (subindex Bs). We modelled each element of this mixture with multivariate normal distributions, where the means are given by the true photometric quantities both of cluster, t_{ph;Cs}, and equalmass binaries, t_{ph;Bs}. The covariance matrices of these multivariate normal distributions result from the addition of the covariance matrix of the standard photometric uncertainties, ϵ_{ph}, with the modelled covariance matrix Σ_{clus}. We assumed that this last one describes the intrinsic dispersion of both cluster and equalmass binaries sequences. Since by definition, covariance matrices are symmetric and positive semidefinite, then the Cholesky decomposition allows us to describe Σ_{clus} with only 15 independent parameters which we infer from the data.
The photometric model of the cluster.
In the photometric model, we prescribe the true photometric quantities both for the cluster sequence, t_{ph;Cs} = {CI, Y, J, H, K_{s}}, and the equalmass binaries, t_{ph;Bs} = {CI, Y − 0.75, J − 0.75, H − 0.75, K_{s} − 0.75}, by means of the cubic spline series, S. These series specify the true photometric quantities by means of the colour index CI, the knots and seven coeﬃcients for each magnitude, β_{Y, J, H, Ks}. Thus, Y = S_{Y}(CI, β_{Y}), J = S_{J}(CI, β_{J}), H = S_{H}(CI, β_{H}), and K_{s} = S_{Ks}(CI, β_{Ks}). We denote the coeﬃcients of all the splines as the 4 × 7 matrix, β. Since the true photometry of the equalmass binaries is a linear transformation, T_{Bs}, of the true photometry of cluster sequence, no extra parameters are required. Therefore,
Thus, the cluster and equalmass binaries likelihoods of an object with photometric measurements d_{ph} and standard uncertainties ϵ_{ph} are
where t_{ph;Cs} and t_{ph;Bs} are given by Eqs. (A.2) and (A.3), respectively.
Modelling the true CI for each object in our data set demands a computing power that we currently lack. Instead, we marginalised them with the aid of a truncated GMM whose fractions (π_{CI}), means (µ_{CI}) and variances (σ_{CI}) we also infer from the data. This GMM is
In Eq. (A.5) above, the symbol N_{t} stands for the truncated (0.8 < CI < 8) univariate normal distribution.
Then, the marginalisation of CI runs as follows:
In these equations, θ_{c} stands for all cluster parameters, and the first and second terms of the integrals in the last equalities correspond to Eqs. (A.4) and (A.5), respectively. Since CI depends only on π_{CI}, µ_{CI}, σ_{CI}, thus, the cluster and equalmass binaries likelihoods of datum d_{ph} are
The proper motions model of the cluster.
For the proper motions models of both cluster and equalmass binaries we use GMM with four and two Gaussians, respectively. Each with its own fractions, π, means, µ and covariance matrices, Σ. However, Gaussians within each GMM share the mean. Since covariance matrices are symmetric, only three independent parameters are needed to describe them. Thus, the cluster and equalmass binaries likelihoods of object with measurements d_{pm} and uncertainties ϵ_{pm} are
Finally, the total cluster likelihood of an object with measurement d and uncertainties ϵ is
where π_{CB} is the parameter representing the fraction of single cluster sequence stars Cs (the non equal mass binaries in the cluster). The photometric and proper motions likelihoods are given by Eqs. (A.7) and (A.8), respectively.
A.2. Details of the priors
In Sect. 2.2 we give the kind of distributions we use for setting our prior beliefs. Here, we give details on the parameter values we chose for these distributions. In the hierarchical Bayesian model formalism, the parameters of the distributions governing the priors are called hyperparameters, here, we stick to that convention. We classified the priors of our parameters into three categories: fractions, means, and covariance matrices.
As mentioned in Sect.2.2, we use the Dirichlet distribution to set the priors of the fractions. For the fieldcluster mixture we set the hyperparameters to α = {98, 2}. The means of the field and cluster fractions distributions resulting from these hyperparameters, correspond to the fraction of objects in our data set, that Bouy et al. (2015b) classified as field an candidate members, respectively. For the clusterequalmass binaries mixtures, we used as hyperparameter values, α_{CB} = {8, 2}, this induce a distribution for the fraction of equalmass binaries whose mean is at 20%, as suggested by Bouy et al. (2015b). For fractions in the cluster and equalmass binaries proper motions we set their hyperparameters to α_{Cs} = {1, 1, 1, 1} and α_{Bs} = {1.2, 8.8}. The first values result in equal priors to all components while the second one induce similar means to those recovered after fitting a GMM to the Bouy et al. (2015b) candidate members. Since the Gaussians in proper motion model of single stars could be interchanged and we observe that a posteriori the fraction of one of them goes to zero, we adopt an even less informative prior for these fractions and set all α_{Cs} to one. Despite this less informative prior, one of the Gaussians still has a negligible contribution in the posterior solution (see Table B.1).
Although the means of the distributions correspond to what Bouy et al. (2015b) found, the variances of these priors allow us to explore wide ranges of values (except for the fraction of field objects), as it is shown in Fig. A.1. However, the narrow variance in the clusterfield mixture correspond to our prior belief about the fraction candidate members within our large data set which we expect to be very small ≤2%. For the fraction in the GMM of the CI distribution, we set all the hyperparameter values to 1, (α_{CI} = {1, 1, 1, 1, 1}), which results in equal means and large variances for all of them alike.
Fig. A.1. Prior distribution of fraction parameters. From top left to bottom right, the distributions of field fraction (π), equalmass binaries fraction (1 − π_{CB}), and the cluster (π_{Cs}) and equalmass binaries (π_{Bs}) fractions in their proper motion GMM, respectively. 
We selected the bivariate normal distribution as prior for the parameters representing the means of the GMM, both for cluster and equalmass binaries. We set the hyperparameters of this bivariate normal as those found after fitting a bivariate normal to the candidate members of Bouy et al. (2015b), both cluster and equalmass binaries together. These values are µ_{µpm} = {16.30, −39.62} and Σ_{µpm} = {{36.84, 1.18}, {1.18, 40.71}}.
We used the Half–t (ν, A) distribution as prior for the covariance matrices in the proper motions GMM. We set the hyperparameter to ν = 3 and A_{pm} = {10^{5}, 10^{5}}. According to Huang & Wand (2013), a value of ν = 3 leads to marginal squared root distributions for all correlation terms of the covariance matrix. And also, arbitrarily large values of A lead to arbitrarily weakly informative priors on the corresponding standard deviation terms.
For the means and variances in the GMM of the CI, we use uniform and Half–Cauchy (0, η) distributions, respectively. The uniform distribution is defined over the CI span (0.8 < CI < 8) while for the scale of the Half–Cauchy we used a value of η = 100.
Again, we make use of the Half–t (ν, A) distribution to establish the prior for the intrinsic dispersion of the cluster sequence, Σ_{clus}. ν is again 3. However, we use A_{ph} = {10, 10, 10, 10, 10}, which are large values compared to those of the photometric uncertainties.
We used univariate normal distributions to establish the priors for the coeﬃcients in the spline series. To find the values of the hyperparameters, we proceeded as follows. First, we discarded equalmass binaries from Bouy et al. (2015b) candidate members. To do this, we iteratively fit the cluster sequence and remove objects above 0.75 magnitudes. Then, to obtain an empirical prior in the region were no members have been found, we complement our list with the browndwarfs from the Faherty et al. (2012) sample that have the same bands as our data set. Finally, we fit the splines, and used the coeﬃcients of this fit as means, µ_{β} of the univariate normal distributions. We set the standard deviation to σ_{β} = {1, 1, 1, 1, 1, 0.5, 0.1}. These values provide a reasonable compromise between cluster sequences compatible with the previously known candidates and those far away or with exotic shapes. We showed a sample of this priors in Fig. A.2. This figure also shows the browndwarfs from Faherty et al. (2012) and the sequence (dashed line) we use to provide the means of the univariate normal distributions. Finally, in Table A.1, we summarise all the hyperparameter values of our Bayesian Hierarchical Model.
Fig. A.2. CMD K_{s} vs. i−K_{s} showing a sample of the prior for the coeﬃcients in the splines series. Also shown the browndwarfs we add from Faherty et al. (2012) sample, and the cluster sequence (dashed line) found after fitting the splines to the browndwarfs and candidate members below the equalmass binaries sequence. 
A.3. Derivation of the magnitude distributions
To derive the J, H, K_{s} magnitude distributions, we used the distribution of the colour index, CI, and the cluster and equalmass binaries photometric sequences (the spline series). We exemplify this derivation on the K_{s} band, but similar transformations apply to the rest of the bands. To obtain the distribution of K_{s} for the cluster objects, we introduced the colour index CI, as a nuisance parameter and then we marginalised it. Thus,
The term p(K_{s}CI, θ_{c}) corresponds to the GMM modelling the distribution of CI (Eq. (A.5)), while p(K_{s}CI, θ_{c}) is the probability of K_{s} given the CI and the cluster parameters θ_{c}. Since our photometric model takes into account the equalmass binaries, we included them proportionally to their fraction, (1−π_{CB}). Thus,
In this equation, Cs and Bs stand for cluster and equalmass binaries sequences, respectively. The terms inside the integrals correspond to Eqs. (A.4) and (A.5). However, since here we focus only on the distribution of K_{s}, we marginalised the rest of the bands. Also, we changed the integration limits to those of the truncated colour distribution (CI_{min} = 0.8, CI_{max} = 8). Finally, we obtain
The derivation of the J and H magnitude distributions is similar to the procedure described for K_{s}. We notice that, the derivation of these magnitude distributions takes into account the equalmass binaries and the systems which could have different mass ratios. Therefore, these distribution are the system magnitude distributions.
A.4. Probabilistic graphical model
A probabilistic graphical model is a graph which expresses the relationships, either deterministic or stochastic, among random variables in a model: parameters and observations. Figure A.3 shows the probabilistic graphical model of our hierarchical Bayesian model. In this figure, the following characteristics apply: (i) conditional relations are depicted with arrows, solid when the condition is stochastic (i.e. given by a probability distribution function) and dashed when it is deterministic, (ii) random variables are surrounded by circles (also known as nodes) while constants by rectangles; the marginalised parameter (CI) is drawn as a square inside a circle, (iii) black dots indicate that categorical variables have been marginalised, (iv) the dimension of constants or independent parameters is written in brackets inside the nodes, (v) figures filled with grey indicate that their value is known (e.g. data), and (vi) plates join variables which repeat together, the number of repetitions is indicated in one corner.
Fig. A.3. Probabilistic graphical model. The left grey plates show the field model. The middle yellow plate shows the node where the likelihood is computed for each datum, d. The right plates describe the relations among parameters in the cluster model. The photometric cluster model (red) is on top, while the proper motions cluster (blue) and equalmass binaries (green) are at the bottom left and right, respectively. See description in the text for more details. 
The left panels of Fig. A.3 represents the set of model parameters that we use to describe the field population. Since these parameters remain fixed throughout the inference (see Sect. 2.1.1), we consider them constants, thus we depict them with grey squares. The right panels show the parameters of the cluster population. The top right and bottom right panels describe the photometric and kinematic models, respectively. The top inner plate inside the photometric panel shows the GMM that we use to describe the CI distribution. The bottom left and right panels show the cluster and equalmass binaries proper motion models, respectively. The plates inside them designate the covariance matrices of each GMM. Because each Gaussian in the mixture shares the mean, it lies outside the plate. Finally, the middle (yellow) plate depicts the comparison between the true quantities and the measured ones. Therefore, it is here where we compute the likelihood of each elements in the data set.
Appendix B: Additional table
Mode, 16 and 84 percentiles of each parameter posterior distribution.
All Tables
Parameters, evidences and BIC values of models fitted to the ten synthetic samples of the PDSMF.
All Figures
Fig. 1. Proper motion (left panel) and K_{s} vs. i−K_{s} CMD (right panel) projections of field models (ellipses and crosses depicting the covariance matrices and means of the GMM) and the density of objects in our dataset (grey pixels in logarithmic scale, shown only to guide the eyes). The colour scale shows the weight of each Gaussian in the GMM. 

In the text 
Fig. 2. Densities in K_{s} vs. i−K_{s} (left panel) and K_{s} vs. J − K_{s} (right panel) CMDs of 10^{5} synthetic sources. We drawn these from two different models: complete (solid line) and incomplete (dashed line). We construct the incomplete model using all objects in our data set, even those with missing values, while for the complete one we use only those object without missing values. The contour values (at 10^{−3}, 10^{−2}, 10^{−1}, 4 × 10^{−1}) are the same for both complete and incomplete models. 

In the text 
Fig. 3. Normalised mean (left panel) and variance (right panel) of each parameter in our model, given the DANCE DR2 data set as functions of iterations in the MCMC. Each parameter is scaled using the mean and variance of its corresponding ensemble of particles positions at the last iteration. Red lines show one and two sigma levels of these normalisation values. These figures depict the evolution of the Markov Chains from the original values provided by the PSO to the convergence. This later shown by the last ∼200 iterations in which the mean and variances are within the twosigma levels. We notice that some parameters evolve (within the MCMC) in groups, which is related to their correlation. 

In the text 
Fig. 4. Left: TPR (solid line) and CR (dashed line) of our methodology when applied on synthetic data sets with and without missing values (red and blue lines, respectively). Black dots show the TPR and CR reported by Sarro et al. (2014) for their nonmissing values model. Right: accuracy and precision as a function of probability threshold for our classifier when applied on synthetic data with missing values. The highest accuracy is obtained at p_{t} = 0.84 (red dot). In both panels, the grey areas show the maximum deviations from the mean of the results of the five missingvalues synthetic data sets. 

In the text 
Fig. 5. Comparison between the cluster membership probabilities recovered from the synthetic data with missing values (incomplete) and without them (complete). The colour and shape indicate the amount of missing values. The symbols enclosed in black indicate a missing CI. The top left box contains objects considered as contaminants due to missing values, at the probability threshold p_{t} = 0.84. 

In the text 
Fig. 6. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) of our candidate members (HMPS, see text). Grey dots depict candidate members whose cluster membership probability is below the probability threshold p_{t} but are only included because it sensitivity to the cluster parameters reaches the p_{t}. The lines represent the MAP of the parameters in the equalmass binaries (dotdashed line labelled as MAP EMB) and single stars (dashed line labeled as MAP Singles) models. Standard uncertainties in photometry are in general smaller than symbols. 

In the text 
Fig. 7. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) of our candidate members classified as equalmass binaries. Captions as in Fig. 6. 

In the text 
Fig. 8. Fraction of candidate members classified as equalmass binaries as a function of the CI (binned intervals). Uncertainties are Poissonian. 

In the text 
Fig. 9. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) projections of cluster and field models and objects in our dataset. Our candidate members in the HMPS are in black triangles and the field objects in grey dots. Grey ellipses delineate the covariance matrices of the field GMM. The cluster model is represented by the MAP (dotdashed and dashed lines labelled as MAP EMB and MAP Singles corresponding to equalmass binaries and single stars, respectively) and 100 samples (orange lines) of the posterior distributions. 

In the text 
Fig. 10. Density of all DANCe DR2 sources in K_{s} vs i magnitudes. Lines show our completeness limits, 13.2 < i < 21.4 mag and 11 < K_{s} < 18.1 mag. The grey area is considered incomplete. 

In the text 
Fig. 11. Luminosity functions from J, H, K_{s} derived from the model (orange lines labelled BHM). Also shown the regions of incompleteness and the luminosity functions computed from: the candidate members of Bouy et al. (2015b; dotdashed blue line), and our candidate members (HMPS, dashed black line). 

In the text 
Fig. 12. Recovered membership probabilities compared to those of Bouy et al. (2015b). Lines show the 0.75 and p_{t} = 0.84 probability thresholds used in both works. The numbers indicate the new candidate members (top left), rejected candidate members (bottom right), and common candidate members (top right). 

In the text 
Fig. 13. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) showing the new candidate members found in this work. Captions as in Fig. 6. 

In the text 
Fig. 14. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) showing the rejected candidate members of Bouy et al. (2015b). Captions as in Fig. 6. 

In the text 
Fig. 15. Proper motion (left) and K_{s} vs. i−K_{s} CMD (right) showing the rejected candidate members of Bouy et al. (2015b). The colours and shapes are a proxy for their K_{s} magnitude. The dotdashed and dashed lines labelled MAP EMB and MAP Singles correspond to the MAP of the equalmass binaries and single star models, respectively. 

In the text 
Fig. 16. Normalised PDSMF in J, H, K_{s} bands. 

In the text 
Fig. 17. Normalised PDSMF in K_{s} band together with the IMFs of Chabrier et al. (2005); Thies & Kroupa (2007) and fits to the PDSMF found by us and Bouy et al. (2015b). 

In the text 
Fig. 18. Left: PDSMFs of the Pleiades (derived here for J, H, K_{s} bands), Trapezium, and Hyades, from Bouy et al. (2015b). They are normalised in the interval of completeness. Right: cumulative distribution functions (CDF) of the PDSMFs from left panel and that of Chabrier et al. (2005) and Thies & Kroupa (2007) system initial mass function (normalised also in the interval of completeness). The Pleiades CDF shown is just from K_{s} band. The grey area depicts the area in which the null hypothesis of same PDSMF as that of the Pleiades can not be rejected (at α = 0.01). 

In the text 
Fig. A.1. Prior distribution of fraction parameters. From top left to bottom right, the distributions of field fraction (π), equalmass binaries fraction (1 − π_{CB}), and the cluster (π_{Cs}) and equalmass binaries (π_{Bs}) fractions in their proper motion GMM, respectively. 

In the text 
Fig. A.2. CMD K_{s} vs. i−K_{s} showing a sample of the prior for the coeﬃcients in the splines series. Also shown the browndwarfs we add from Faherty et al. (2012) sample, and the cluster sequence (dashed line) found after fitting the splines to the browndwarfs and candidate members below the equalmass binaries sequence. 

In the text 
Fig. A.3. Probabilistic graphical model. The left grey plates show the field model. The middle yellow plate shows the node where the likelihood is computed for each datum, d. The right plates describe the relations among parameters in the cluster model. The photometric cluster model (red) is on top, while the proper motions cluster (blue) and equalmass binaries (green) are at the bottom left and right, respectively. See description in the text for more details. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.