Issue 
A&A
Volume 645, January 2021



Article Number  A82  
Number of page(s)  11  
Section  Numerical methods and codes  
DOI  https://doi.org/10.1051/00046361/202039332  
Published online  14 January 2021 
Remarks on generating realistic synthetic meteoroid orbits
Astronomical Observatory Institute, Faculty of Physics, A. Mickiewicz University, ul. Sloneczna 36, 60286 Poznań, Poland
email: jopek@amu.edu.pl
Received:
3
September
2020
Accepted:
4
November
2020
Context. To identify the real associations of small bodies, we can use synthetic sets of orbits generated by various methods. These are not perfect methods, therefore the assessment of their quality is an essential task.
Aims. In this study, we compared five methods for generating synthetic meteoroid orbits. Three of them (ME0, KD10, and KDns) had already been proposed in the literature, while two additional ones (ME1 and ME4) are new methods.
Methods. As far as possible, the synthetic orbits were compared with the orbits of the observed meteoroids. For quantitative comparison, we applied a few tests: the χ^{2}distance and the nearest neighbor NN_{N} tests used in previous works, and onedimensional χ^{2} and KolmogorovSmirnov (KS) tests, as well as a twodimensional KS test implemented in this study. To estimate a general property of the orbital sample, we proposed the use of the entropy H_{N} of the data set based on the nearest neighbor distances. Finally, we did a cluster analysis of the synthetic orbits. We calculated and compared the values of the orbital similarity thresholds.
Results. We showed that generating “realistic” meteoroid orbits and testing their quality is a complex issue. An assessment of the quality of the generated orbits depends on the type of test applied, and it refers to the sample of the observed orbits used. Different tests give different assessments. However, in practice, the investigated methods produced similar results if they were applied correspondingly.
Key words: methods: data analysis / methods: numerical / methods: statistical / meteorites / meteors, meteoroids
© ESO 2021
1. Introduction
Searching for streams among meteoroid orbits, including an assessment of their reliability, is based heavily on the artificial orbital samples derived by means of the statistical properties of the observed orbital elements. One can derive the synthetic orbital samples, free from clusters, by different methods (see, for example, Jopek & Froeschlé 1997; Jopek et al. 2003; Pauls & Gladman 2005; Koten et al. 2014; Guennoun et al. 2019). Recently, Jopek & Bronikowska (2017) have shown that the choice of the method used for generating synthetic orbits impacts the assessment of the probability of spurious pairings. Jopek & Bronikowska (2017) tested a few methods and, in the case of meteoroid orbits, they recommend their method as the most reliable. However, Vida et al. (2017) found some drawbacks of this method, and concurrently they proposed an interesting approach based on the Kernel Density Estimation method (KD method).
In this study, we describe an improvement of method E, by eliminating the drawbacks pointed out by Vida et al. (2017), and by introducing into the algorithm a few additional steps that more precisely allow for the correlations between the meteoroid orbital elements. The modified methods have been compared with the KD methods proposed by Vida et al. (2017).
2. Orbital data used in this study
In this study, the same meteoroid data have been used as in Vida et al. (2017), namely, the file CAMSv22013.csv (see Jenniskens et al. 2016) downloaded from the GitHub site^{1}. We used the heliocentric ecliptical osculating orbital elements q, e, ω, Ω, i and the ecliptic latitude β_{G} of the meteor geocentric radiants. Additionally, we use the heliocentric distances of the Earth R_{E}, calculated at the moment of meteor observations by means of the formula taken from Bell & Urban (2012). Using the software posted by Vida et al. (2017) at the GitHub site, from the original CAMS file, we extracted 58 029 orbits of sporadic meteors, which we considered to be an orbital sample free from clusters.
3. Synthetic meteoroid orbits generation methods
In this study, we compared a few methods: ME0 (method E in Jopek & Bronikowska 2017) and its two variations ME1 and ME4. The ME methods use the CPD (cumulative probability distribution) inversion technique parametrized by the choice of the bin width of the histogram used for each orbital element. In the ME0 method, it is assumed that the Earth moves in a circular orbit around the Sun. The ME1 method is based on the ME0 method, in which the assumption of the elliptical orbit of Earth was introduced. In turn, the ME4 method is an extended version of the ME1 method. The values of the argument of perihelion ω were generated by the CPD technique using not one marginal distribution but two, corresponding to the positive and negative ecliptic latitudes of the meteor radiants.
Additionally, we made use of three KD methods: KD0.1, KD10, and KDns (see, Vida et al. 2017). The KD methods use the nonparametric technique; they are based on the multivariate kernel density estimation. The methods depend on the choice of the bandwidth matrix H. This matrix has a decisive influence on the shape (smoothness) of the obtained multivariate probability distribution function. In the work of Vida et al. (2017), three diagonal H matrices were used, two scalar matrices with elements equal to 0.1 and 10 (methods KD0.1 and KD10 respectively) and one nonscalar matrix (KDns method) with elements selected separately for each orbital element. More details about the applied ME and KD methods are provided in the Appendix.
4. Valuation of the synthetic orbital sample
4.1. Comparison of the observed and synthetic samples
The ME method uses the marginal densities represented by the marginal histograms of the observed orbital elements (see Fig. 1). The application of the CPD inversion technique guarantees that the marginal distributions of the synthetic and observed orbital elements are very similar. However, this technique does not conserve the correlations between the meteoroid orbital elements q, e, and ω (see Fig. 2). To address these correlations, Jopek & Bronikowska (2017) added steps 2 and 3 to the ME0 method (see Appendix A.1). However, in their paper no comparisons between the obtained synthetic and observed samples were accomplished, except for a visual assessment.
Fig. 1. Marginal histograms of the osculating orbital elements of the observed subsamples CAMSA and CAMSB (in gray). Each sample consists of 29 014 orbits, the number of bins is 62. The distributions display a clear statistical similarity. Except for one small surplus seen in the ω histogram, the only visible differences are the statistical fluctuations. 
Fig. 2. Distributions of 29 014 sporadic video meteoroids (top – CAMSA and bottom – CAMSB subsample) on the q–e, (1/a)–e, and ω–q planes. Due to the observation selection effect, not all combinations of [q, e], [(1/a),e], or [ω, q] are observable from the surface of the Earth. 
In Vida et al. (2017), to ascertain the differences or similarities between the observed and synthetic orbital sets, a few comparisons were accomplished, namely, a visual comparison of 1D or 2D histograms of the observed and synthetic orbital elements, and a limited quantitative assessment of the similarity between observed and synthetic samples. For this purpose, Vida et al. (2017) used the 2D histogram of ω versus q, for which the socalled χ^{2} histogram distance was calculated (see Pele & Werman 2010) using
where P_{ij} and Q_{ij} are the contents of the ij bins of the synthetic and observed data histograms, respectively. The number of bins are represented by k_{ω} and k_{q}.
In Vida et al. (2017), as an additional statistical property of the sample, the arithmetic mean NN_{N} of the nearest neighbor distances was calculated, which is the averaged minimum distance from each orbit to any other for a given sample. Let D_{N, i} be the nearest neighbor distance of the orbit O_{i} and the other O_{j}: D_{N, i} = min_{j ≠ i, j ≤ N} D(O_{i}, O_{j}), then
where N – is the sample size. To determine the distance D(O_{i}, O_{j}) between two orbits, the D_{SH}function was applied (see, Southworth & Hawkins 1963).
As a statistical property between two orbital samples, Vida et al. (2017) calculated the arithmetic mean OS_{N} of the smallest Dvalues between each synthetic orbit paired with each observed orbit.
To compare the two orbital samples, in the present work we have used ideas from Vida et al. (2017), expanding them slightly, as well as additional new ideas, namely we compared the marginal distributions of q, e, ω, Ω, i as well as the 2D distributions of all pairs of orbital elements using the χ^{2} and KolmogorovSmirnov (KS) tests taken from Press et al. (2002). The advantage of this approach is the ability to estimate whether the compared samples could come from the same population.
To assess the general similarity of orbital samples, we used the concept of entropy. As far as we know, this concept has never been used for this purpose. The entropy H_{N} was calculated on the basis of the nearest neighborhood values D_{N, i}. For each point in the sample, one has to find the distance to its nearest neighbor and apply the formula (Beirlant et al. 1997)
where C_{E} = 0.577215664901532860… is the Euler constant. An analogy between NN_{N} and H_{N} is clearly seen.
Finally, we propose a purely practical approach. For the synthetic orbits, we compared the orbital similarity thresholds used in the cluster analysis. We determined the thresholds by a single linkage cluster analysis and three Dfunctions. More details of this procedure are given in Jopek (2020).
In the next section, we present the results of our calculation of the χ^{2} distances and OS_{N}, NN_{N} statistics, as well as the results of the χ^{2} test and KS test, using the observed and synthetic data sets. Moreover, we present the values of the entropy and orbital similarity thresholds calculated for the synthetic orbital samples.
4.2. Results of comparison and discussion
In this section, we present the results of quality tests on synthetic sets of 58 029 and 29 014 orbits. First, tests carried out by Vida et al. (2017) were repeated. The following sections present the results of the extended comparison of reduced orbital samples. The last section describes the results of the cluster analysis among the generated orbits.
4.2.1. Samples comprising 58 029 orbits
At this stage, we only recalculated the results already obtained by Vida et al. (2017) that are provided in their Table 1. Exploiting the software placed by Vida et al.^{2} on the GitHub website, we generated four synthetic samples and calculated their χ^{2} distances and OS_{N}, NN_{N} statistics (see our Table 1).
χ^{2} distances and SO_{N}, NN_{N} statistics calculated for 58 029 orbits, observed and generated by the ME method and three variations of the KD method, see text.
The χ^{2} distances given in the second column of Table 1 were obtained using q versus ω histograms with 158 × 103 bins, as used in Vida et al. (2017). Our results differ slightly from those found by Vida et al. (2017) due to two reasons: we used corrected software, and we used a different sequence of random values starting with the seed point 2003.
As in Vida et al. (2017), in Table 1 the largest χ^{2} value was obtained for the ME sample, the smallest value for the KD0.1 sample. However, we must point out that the χ^{2} distances given in Table 1 have no clear interpretation. The high value χ^{2} = 19 053 does not necessarily mean that in some applications the ME synthetic sample cannot effectively represent the observed orbits. On the other hand, one may ask, does the smallest value χ^{2} = 1858 indicate that the synthetic sample generated by the KD0.1 method is too similar to the observed one?
Furthermore, the χ^{2} distance considerably depends on the number of applied bins. This is shown in the third column of Table 1 where we placed the χ^{2} distances calculated using histograms with 80 × 80 bins. The number of bins was calculated according to the Rice Rule: k = ⌈2N^{1/3}⌉, where ⌈…⌉ is the ceiling function and N is the sample size. The obtained χ^{2} distances differ significantly from those given in the second column of Table 1. Hence, the question may arise, which χ^{2} value should be used to valuate the quality of a synthetic sample? Thus, we believe that the χ^{2} value determined by Eq. (1) should be used with care for the adjudication of a synthetic orbits generator.
In Vida et al. (2017), the authors used another statistics for a quantitative comparison of the observed and synthetic samples. As was mentioned in Sect. 4.1, the authors proposed two statistics: the arithmetic means NN_{N} and OS_{N}. We also calculated NN_{N} and OS_{N} and the obtained results are given in Table 1. Our results differ slightly from those given in Table 1 in Vida et al. (2017) for the same reasons as those we mentioned earlier. The highest values of NN_{N} and OS_{N} are for the ME sample. Vida et al. (2017) considered the OS_{N} = 0.0995 as an upper bound for an acceptable synthetic orbit data set. As the lower bound, Vida et al. considered a value of OS_{N} = 0.0408. Certainly, a low value of OS_{N} may imply a strong similarity between the two samples. However, we are interested in a statistical similarity. Therefore, as in the case of the χ^{2} distance, one may ask, which limit values of OS_{N}, NN_{N} may be relevant to the issue of the comparison between synthetic and observed samples? We address this question in the next section.
4.2.2. Samples comprising 29 014 orbits
To find the expected values of the χ^{2} distances, OS_{N} and NN_{N}, and other statistics, we split the original CAMS sample into two subsamples, CAMSA and CAMSB, each containing 29 014 orbits. The CAMS sample is sorted by increasing values of the ecliptic longitude of the Sun at a meteor instant, hence to split the data sample we just used the catalog ordinal numbers of orbits: the evennumbered orbit entered sample CAMSA, the oddnumbered entered sample CAMSB.
The CAMSA and CAMSB subsamples were taken from the same “population” of the observed orbits, hence their statistical properties should be very similar. In Fig. 1 we see that their marginal distributions of the orbital elements display a clear statistical similarity. This similarity is also visible in the 2D distributions in Fig. 2. This figure shows the effects of observational selection that correlate some elements of the meteoroid orbit. Because a typical meteoroid can be observed only in the form of a meteor phenomenon in the Earth’s atmosphere, an approximate relationship between its orbital elements q, e, and ω is often used (see, e.g., Jopek & Froeschlé 1997; Babadzhanov et al. 2012), namely
where R_{E} in astronomical units is the heliocentric distance of a meteoroid at its moment of collision with Earth. The sign at the ecosω is positive for the negative geocentric ecliptic latitude of a meteor radiant, β_{G} < 0. Assuming a circular orbit of the Earth (R_{E} = 1), the relationship q = f(ω) has been drawn in Fig. 2 in the form of continuous lines. Due to the nonzero eccentricity of the Earth’s orbit, R_{E} belongs to the interval
and as a result, the points corresponding to the observed meteoroids surround the curve for R_{E} = 1.
If we ignore the size of the Earth, that is, if we assume that meteors are observed at the center of the Earth, then Eq. (4) is exact. Violations of Eq. (4) and Condition (5) do not happen often (see Jopek et al. 2010), but in 58 029 sporadic meteoroid orbits from the CAMS catalog they occurred ∼2000 times.
Another consequence of the observational selection concerns the values of the meteoroid’s orbital elements e, q, and 1/a. As we see in Fig. 2, certain combinations of the values of these elements are unacceptable for meteoroids observed from the Earth’s surface. For the given values of e and R_{E}, the limit values for q and 1/a are
where R_{E} belongs to interval (5). Equations (6) describe the boundary curve q(e) and two limits for (1/a) seen in Fig. 2.
The similarity of the 1D and 2D distributions, shown in Figs. 1 and 2, confirms our assumption that both CAMSA and CAMSB samples represent statistical properties of the same population. Therefore, for these samples we calculated the values of the statistics used, and they were applied as the reference values when the synthetic data were evaluated. We repeated the calculations of the χ^{2} distance, and OS_{N} and NN_{N} statistics for reduced synthetic samples of 29 014 orbits (see Table 2). To find the χ^{2} distances, we used ω–q histograms with 62 × 62 bins, and the number of bins was calculated according to the earlier mentioned Rice Rule.
Results of the quantitative comparison of 29 014 observed and synthetic meteoroid orbits.
In Table 2, the χ^{2} values are two to three times smaller than those provided in Table 1. In Table 2, the largest χ^{2} value was calculated for the ME sample, the smallest one corresponds to the KD0.1 sample. Moreover for the NN_{N} statistics, the ME sample shows the large difference compared to the observed orbits. However, while it is not the largest one, the largest difference was produced by the KD0.1 method. Vida et al. (2017) interpret the OS_{N} and NN_{N} as the indicators of statistical similarity between the synthetic and observed samples. However, in comparison with the reference values, the small values OS_{N} = 0.048 and NN_{N} = 0.051 given in Table 2 do not mean that the KD0.1 sample is statistically very similar to the observed one. This is obvious when we look at Fig. 3. The NN histograms for the observed and KD0.1 samples should look similar, but here the discrepancy is considerable, and we have to admit that we do not know what the reason is. In addition, in Table 2, for the KD0.1 method the value of the χ^{2} distance is too small. The χ^{2} = 473 is smaller than χ^{2} = 760 calculated for the CAMSA and CAMSB, two samples drawn from the same ‘population’.
Fig. 3. Histograms of the values used to calculate the OS_{N} (top) and NN_{N} (bottom) statistics. The dashed red lines correspond to the OS_{N} and NN_{N} mean values, the continuous lines correspond to the median values. The OS histograms were produced using the observed CAMSA and CAMSB orbits (left), the CAMSA and KD0.1 (middle), and the CAMSA and KDns samples (right). The NN histograms correspond to CAMSA, KD0.1, and KDns samples, respectively. For the KD0.1 synthetic data, the histograms are considerably different from the remaining ones. 
The OS_{N} and NN_{N} statistics do not give an unambiguous assessment of the quality of the synthetic orbit generator. As we can see in Table 2, if we use OS_{N} statistics, then method KD10 turns out to be the best; if we choose NN_{N} statistics, then method KDns is the best.
In conclusion, we believe that the application of the χ^{2} distance, and OS_{N} and NN_{N} statistics only may not be sufficient to decide which synthetic sample more adequately represents the observed data.
4.3. Extended comparison and discussion
To assess whether the synthetic orbital data sets are of acceptable quality, in Vida et al. (2017) the authors used the χ^{2} distance calculated for ω and q only, so using only a part of the orbital sample, and as we have shown such a limited approach may not always be applicable. We think that to evaluate the properties of a synthetic orbit generator, one should perform a more comprehensive test that seeks to establish the statistical similarity of two data sets using all five orbital elements. Moreover, from a practical point of view it is essential to compare the results obtained with different synthetic samples used in a specific issue, for example, in a cluster analysis.
In the next section, we present the results of the extended tests of the observed and synthetic data. We also present the results of calculations of the thresholds of orbital similarity, which are key parameters of each cluster analysis using the meteoroid data.
4.3.1. χ^{2} and KS statistical tests
Can we prove that two orbital data sets are consistent with a single distribution function? As far as we know, there is no affirmative answer to this question. Hence, we do not have a readyforuse solution. However, using statistical tests, we can compare the 1D and 2D distributions of the orbital element samples. Hence, in this study, we wanted to find which available statistical tests can be helpful in assessing the quality of the orbital sample generators. We used the χ^{2} test (not to be confused with the χ^{2}distance given by Eq. (1)) and two KS tests. All tests were performed using the subroutines taken from the ‘Numerical Recipes’ package (Press et al. 2002), namely: chstwo, kstwo, and ks2d2s. Each subroutine returns an estimate of the significance prob parameter, which determines the level of consistency between compared distributions. The prob takes values in the range [0, 1], however, concerning the critical value of this parameter, in Press et al. (2002) the authors only note that a small value of prob indicates a significant difference between the compared distributions. Hence, to estimate what values of prob can be expected, we performed the χ^{2} and KS tests using the observed CAMSA and CAMSB samples.
4.3.2. Onedimensional χ^{2} tests
Making use of the 1D χ^{2} test, we compared the marginal distributions of e, q, ω, Ω, i orbital elements of the CAMSA sample with CAMSB and synthetic samples. The results of the tests are collected in Table 3. We tested five methods, and their algorithms are described in the Appendix.
Results of the 1D χ^{2} test (first section) and 1D KS test (second section) for marginal distributions of the orbital elements.
As can be seen in Table 3, the comparison of the marginal distributions of the perihelion distances q, gave negative results for all synthetic samples. This is not surprising in the case of ME methods, where q is not generated using its marginal distribution of the observed sample. In each of the ME methods the perihelion distance was calculated using Eq. (4), substituting the generated values of e, ω as well as the heliocentric distance R_{E} of Earth at the moment of time of the meteor observation.
For the remaining orbital elements, the results of the χ^{2} test are usually more favorable for ME methods. This is due to the fact that KD orbits are not generated on the basis of the marginal distributions. Similarly, one can explain why the prob values for e, i, and partly for ω tests are lower for the CAMSB sample than for synthetic orbits generated by ME methods. The CAMSA and CAMSB samples were obtained based on the sorted values of the ecliptic longitudes of the Sun at the meteor instant (see Sect. 4.2.2). This also explains the highest prob value (100%) obtained for the test of Ω distributions of the CAMSA and CAMSB samples.
In Fig. 4 we plotted the differences of frequencies corresponding to the same bins of the histograms of marginal distributions of the orbital elements. The left column illustrates the differences between the CAMSA and CAMSB subsamples, the remaining columns show the differences between the CAMSA and KDns and CAMSA and ME4 samples. Except for a few cases, the differences are essentially equivalent to each other. However, the KDns method does not cope with generating orbits with large eccentricities. Similar problems arise for small and large perihelion distances, and for small inclinations. In case of the ME4 method, the same problems arise for small and large perihelion distances. The reference differences between CAMSA and CAMSB data are clearly random. This cannot be said for the differences in eccentricity, perihelion distance, and partly in inclination for the KDns sample. However, also for the ME4 method, the systematic trend is evident for the perihelion distance. Taking into account the results of the χ^{2}tests from Table 3 and the course of differences in Fig. 4, in our opinion, the ME4 method gives more consistent results than the KDns method.
Fig. 4. Plot of the differences of the absolute frequencies of the marginal distributions of orbital elements of the observed and synthetic data. From the left, in the columns we have the differences between CAMSA and CAMSB data, between CAMSA and KDns data, and between CAMSA and ME4 data. Positive values indicate the prevalence of frequencies in a given bin for CAMSA data. 
4.3.3. Onedimensional KS tests
Use of binned data involves a loss of information, however. Additionally, there is often considerable arbitrariness as to how the bins should be chosen. The results given in the first section of Table 3 were obtained assuming the following ranges for the observed and synthetic orbital elements: e ∈ [0, 1), q ∈ [0, 1.068), i ∈ [0.180), ω ∈ [0, 360) and Ω ∈ [0, 360). For each orbital element, these intervals were divided by 62, and the number of bins was calculated by the Rice Rule.
To avoid the problem of the influence of the number of bins, we applied the KS test to the unbinned data of e, q, ω, Ω, i orbital elements. We used the 1D KS test, and more particularly, we used the kstwo subroutine taken from the ‘Numerical Recipes’ package (Press et al. 2002). The results of the KS tests are given in the second section of Table 3. Like the earlier χ^{2} tests, the 1D KS test for q gave negative results for all synthetic samples. Furthermore, like the previous test the KS test shows that the ME methods give results closer to the observed sample than the KD methods.
4.3.4. Twodimensional tests
We describe the results of a comparison of the observed and synthetic samples applying the χ^{2}distance (Eq. (1)) and a twodimensional KS test for all pairs of orbital elements. The 2D KS test was performed using the ks2d2s subroutine taken from the ‘Numerical Recipes’ package (Press et al. 2002). Results are provided in Table 4. The first section of Table 4 provides the χ^{2} distances between the observed CAMSA and the synthetic KD10, KDns, ME0, ME1, and ME4 samples. The results for the observed CAMSB data are for reference. As we can see, on the basis of this table, it is impossible to establish which method gives the best orbits. The result depends on what pair of orbital elements we choose. Choosing the pair q–ω, the ME4 method gives the result closest to the observed sample. Choosing q–e, KDns is the best. Choosing the pair q–i we see that for the KDns the χ^{2} distance is smaller than for the reference CAMSB sample, and this indicates that KDns gives orbits that are too similar to the observed sample. Hence, for the pair q–i, the KD10 method takes precedence. The same is true for the pair q–Ω, therefore the ME4 method takes precedence. In the case of KDns method, for five orbital element pairs, the χ^{2} distances prove to be smaller than for the CAMSB data, for the KD10 method three times, respectively. On the other hand, this was not the case with all ME methods.
Distances of χ^{2} (first section) and results of 2D KS tests (second section) for the pairs of orbital elements.
The results of the 2D KS tests for all pairs of orbital elements are given in the second section of Table 4. As one can see, in general the results of these tests are negative for all methods. Some exceptions are the results obtained for the pair e–Ω and ME0 and ME1 methods. To sum up, it follows that assessing the quality of the synthetic orbit generator only on the basis of comparing boundary distributions can be problematic. Therefore, in the next section we describe the results of further tests based on NN_{N} and H_{N} statistics.
4.3.5. Application of NN_{N} and H_{N} statistics
To compare the synthetic orbit generators, we also set the values of statistics NN_{N} and H_{N}. We used formulas (2) and (3) and three Dfunctions, D_{SH}, D_{H}, and ρ_{1} described in Southworth & Hawkins (1963), Jopek (1993), and Kholshevnikov et al. (2016) respectively. In the previous tests, we used the marginal 1D or 2D distributions of the observed and synthetic samples. Using NN_{N} and H_{N} statistics is a different approach, because to calculate them we have to use all orbital elements simultaneously. The results obtained are given in Table 5.
Values of the NN_{N} and H_{N} statistics calculated for the observed CAMSA and CAMSB samples, and for the orbits generated by the KD10, KDns, ME0, ME1, and ME4 methods.
This time, the values of NN_{N} and H_{N} for the KD methods proved to be closer to the values for the observed samples than for the ME methods. The KDns gave the closest values for all Dfunctions, the ME4 method gave the closest values among the ME methods. The values of NN_{N} and H_{N} depend on the Dfunction used in the calculations, which is understandable because these functions are not mutually equivalent. For the ρ_{1} function, the results were always greater than for the remaining ones. For the D_{SH} and D_{H} functions, the results are similar to each other, but in general for the former they are slightly smaller.
It might seem that both statistics are equivalent to each other and differ only by a scaling factor. In general, however, this is not the case, and this can be seen when comparing the NN_{N} and H_{N} values calculated for the CAMSB and KDns samples. For the D_{SH} function, the NN_{N} values for both these samples equal 0.097, while for the H_{N} they are 9.076 and 9.098, respectively. A similar situation occurs when we compare the results obtained using the D_{H} function for the same CAMSB and KDns samples. The NN_{N} statistics are 0.098 and 0.096 and H_{N} are 9.086 and 9.090. The mean neighbor distance NN_{N} for the CAMSB sample may be smaller or greater than for a given KDns sample, while the entropy of CAMSB sample may be smaller or greater than for the same KDns sample.
It should be recalled that these statistics differ in their interpretation; only the H_{N} is a measure of the entropy of a data set, which in our opinion gives the H_{N} statistics an important advantage over NN_{N}.
4.3.6. Comparison in practical application
Both NN_{N} and H_{N} statistics are the point estimators, hence by using them the assessment of the quality of the generators may only be very general. We believe that a more valuable estimate of the usefulness of synthetic orbit generators can be obtained by comparing the results obtained in a specific practical application. In this section, we present the results of calculations of the thresholds of the orbital similarity, which are key parameters of each cluster analysis. For the orbits generated by KD and ME methods, the thresholds were determined by the numerical experiment as described in Jopek (2020). Applying a single linkage cluster analysis algorithm, the thresholds were found for each meteoroid group of M = 2, 3, …20 members, and for each of the D_{SH}, D_{H}, and ρ_{1} distance functions. All obtained results corresponded to a low probability (less than 1%) of chance clustering. The final individual thresholds D_{M}, M = 2, 3, …20 were calculated as the arithmetic means of the D_{M} values determined in each of the one hundred orbital samples. The final values are given in Table 6 and their graphical illustration is shown in Fig. 5.
Values of the orbital similarity thresholds calculated for the synthetic samples obtained by KD and ME methods.
For obvious reasons, we could not compare the results obtained for the synthetic orbits with the thresholds calculated for the observed orbits, hence, our comparisons relate only to synthetic orbits. As expected, the threshold values proved to depend on the applied Dfunctions and the synthetic data set. Although in all cases the thresholds increased monotonically with the size M of the identified group, this dependency is quite diverse. In Fig. 5 we see that in the case of the D_{SH} function the highest thresholds are for the KD10 orbits, the smallest one for the ME4 orbits. This means that if we use the D_{SH} function and the KD10 orbits in the cluster analysis, then by using the obtained thresholds in the sample of observed orbits, some of the results may turn out to be artifacts, both for the number of identified streams and the number of their members. Using the ME4 orbits, we can face the opposite situation, we cannot identify certain streams, and the number of members of the identified streams may be significantly smaller.
Fig. 5. Orbital similarity thresholds calculated for synthetic samples ME0, ME1, ME4, KD10, and KDns, using the D_{SH}, D_{H}, and ρ_{1} functions. The thresholds were determined for groups of M = 2, 3, …, 20 members, identified among 29 014 orbits. 
When the D_{H} function is applied, the smallest thresholds are for the KDns orbits, the highest for the ME1 orbits. Finally, in the case of the ρ_{1} function, again, the smallest values are for the ME4 orbits, the highest for the KD10 orbits.
However, we cannot claim that any of the calculated threshold values are too large or too small. That is because we cannot compare the results obtained for the synthetic and observed orbits here. We can only claim that in the case of cluster analysis, among the CAMS orbits (using a single linkage algorithm and D_{SH} or ρ_{1} function), the values of the orbital similarity thresholds will be the smallest if we generate synthetic orbits using the ME4 method. If we apply the D_{H} function, the smallest thresholds will be obtained using the KDns method. Moreover, we cannot be sure that what has been said above would also be true for the observed orbits taken from another source or if another cluster analysis were applied.
5. Conclusions
We compared five methods for generating synthetic meteoroid orbits. Three of them (ME0, KD10, and KDns) had already been proposed in the literature, two additional methods (ME1 and ME4) are new. As far as possible, the synthetic orbits were compared with the orbits obtained as a result of observations, otherwise, the comparison is of a relative nature. For quantitative comparison, we applied a few tests: the χ^{2}distance and NN_{N} tests used in Vida et al. (2017), and, implemented in this study, onedimensional χ^{2} and KS tests, as well as a twodimensional KS test. In addition to the NN_{N} statistics, to estimate a general property of the orbital sample, we used the entropy H_{N} of the data set based on nearest neighbor distances. Finally, we did a cluster analysis among the synthetic orbits. We calculated and compared the values of the orbital similarity thresholds.
As a result of our research, we can make a few conclusions summarized by the following points:

1.
Using the χ^{2} distance (see Eq. (1)) for one pair of orbital elements only, for example, ω–q, it is not sufficient to decide which synthetic sample more adequately represents the observed data. It may be risky, as we see in Table 4. The result of the comparison clearly depends on the pair used in the test.

2.
The same can be said about the χ^{2} and KS tests performed for the marginal distributions of each orbital element. You cannot judge the quality of the orbit generator by only testing the data for a single orbit element (see Table 3). Nevertheless, in our study, the ME methods performed significantly better in the 1D tests than the KD methods.

3.
The 2D KolmogorovSmirnov test gave negative results for all compared ME and KD methods. In our opinion, it follows that the algorithms of the tested methods are not capable of generating ‘realistic’ meteoroid orbits.

4.
The NN_{N} and H_{N} statistics gave a clear result. The values of NN_{N} and H_{N} for the orbits obtained by KD10 and KDns are closer to NN_{N} and H_{N} values calculated for the observed orbits. Both statistics are based on nearest neighbor distances, but only the H_{N} statistics is the entropy of a data set.

5.
The results of the cluster analysis among synthetic orbits turned out to be dependent on the applied Dfunction of orbital similarity. For D_{SH} and ρ_{1} functions, the smallest threshold values were obtained for the ME4 method. In the case of D_{H} function, the smallest values were obtained for KDns orbits. By the relative comparison, we cannot decide which orbits give the thresholds as too large or which as too small.
The results showed us that the generation of “realistic” meteoroid orbits is a complex issue. All methods are based on the observed data, which are heavily biased by observational selection, responsible for a number of correlations between the orbital elements. Including these correlations in an algorithm of a given method is not an easy matter. As far as we know, these correlations are not yet well understood. Perhaps an improvement in the situation can be achieved by using the geocentric parameters of the observed meteoroids (see Froeschlé et al. 1999; Valsecchi et al. 1999).
The same can be said about the tests and comparisons made in our study. This is also a complex issue. They give very limited results depending on the type of the test, the orbital elements used, the Dfunction applied, and other factors.
However, we were pleasantly surprised that, in the case of comparisons based on the determination of the thresholds of the orbital similarity, the results obtained, albeit different, were quite similar. Thus, at least in the case of the single linkage cluster analysis, we think that all ME and KD methods can be applied successfully.
Acknowledgments
The author acknowledges the anonymous referee for the comments and suggestions which improved the manuscript. We acknowledge Peter Jenniskens for his kind decision to make the CAMS video data available to the meteor astronomers community. Tadeusz J. Jopek was supported by the National Science Center in Poland (project No 2016/21/B/ST9/01479). This research has made use of NASA’s Astrophysics Data System Bibliographic Services.
References
 Babadzhanov, P. B., Williams, I. P., & Kokhirova, G. I. 2012, MNRAS, 420, 2546 [NASA ADS] [CrossRef] [Google Scholar]
 Beirlant, J., Dudewicz, E. J., Gyorfi, L., & van der Meulen, E. 1997, Int. J. Math. Stat. Sci., 6, 17 [Google Scholar]
 Bell, S. A., & Urban, S. E. 2012, The Astronomical Almanac for the Year 2013 (Dept. of the Navy; HAR/PSC edition (2012)), 612 [Google Scholar]
 Froeschlé, C., Jopek, T. J., & Valsecchi, G. B. 1999, Celest. Mech. Dyn. Astron., 73, 55 [CrossRef] [Google Scholar]
 Guennoun, M., Vaubaillon, J., Čapek, D., Koten, P., & Benkhaldoun, Z. 2019, A&A, 622, A84 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Jenniskens, P., Nénon, Q., Albers, J., et al. 2016, Icarus, 266, 331 [NASA ADS] [CrossRef] [Google Scholar]
 Jopek, T. J. 1993, Icarus, 106, 603 [NASA ADS] [CrossRef] [Google Scholar]
 Jopek, T. J. 2020, MNRAS, 494, 680 [CrossRef] [Google Scholar]
 Jopek, T. J., & Bronikowska, M. 2017, Planet. Space Sci., 143, 43 [NASA ADS] [CrossRef] [Google Scholar]
 Jopek, T. J., & Froeschlé, C. 1997, A&A, 320, 631 [Google Scholar]
 Jopek, T. J., Valsecchi, G. B., & Froeschlé, C. 2003, MNRAS, 344, 665 [NASA ADS] [CrossRef] [Google Scholar]
 Jopek, T. T., Rudawska, R., & ZiomekPretka, H. 2010, Proceedings of the International Meteor Conference, 27th IMC, Sachticka, Slovakia, 2008, 91 [Google Scholar]
 Kholshevnikov, K. V., Kokhirova, G. I., Babadzhanov, P. B., & Khamroev, U. H. 2016, MNRAS, 462, 2275 [CrossRef] [Google Scholar]
 Koten, P., Vaubaillon, J., Čapek, D., et al. 2014, Icarus, 239, 244 [NASA ADS] [CrossRef] [Google Scholar]
 Pauls, A., & Gladman, B. 2005, Meteorit. Planet. Sci., 40, 1241 [NASA ADS] [CrossRef] [Google Scholar]
 Pele, O., & Werman, M. 2010, European Conference on Computer Vision, 749 [Google Scholar]
 Press, W. H., Teukolsky, S. A., Vetterling, W. I., & Flannery, B. P. 2002, Numerical Recipes in C, 2nd edn. (Cambridge University Press) [Google Scholar]
 Southworth, R. B., & Hawkins, G. S. 1963, Smithsonian Contrib. Astrophys., 7, 261 [Google Scholar]
 Valsecchi, G. B., Jopek, T. J., & Froeschlé, C. 1999, MNRAS, 304, 743 [NASA ADS] [CrossRef] [Google Scholar]
 Vida, D., Brown, P. G., & CampbellBrown, M. 2017, Icarus, 296, 197 [CrossRef] [Google Scholar]
 Zieliński, R. 1979, Random Number Generators (Warszawa: Wydawnictwo Naukowo Techniczne), 1 [Google Scholar]
Appendix A: Generation of the synthetic orbits
A.1. ME methods
ME0. The algorithm of the ME0 method (in Jopek & Bronikowska 2017, it was named ME) involves the following steps:

1.
For a given observed sample, obtain the histograms of the orbital elements and additionally find the fraction of the orbits β_{fr} for which the geocentric ecliptic latitudes of the meteor radiant β_{G} > 0.

2.
Generate e, ω, Ω, i separately by the CPD inversion method, (Zieliński 1979).

3.
Using the Earth crossing condition (4) with R_{E} = 1 and the obtained e, ω calculate q_{c}; choose the sign at the ecosω by generating the number b distributed uniformly, U(0, 1); if b > β_{fr} , choose sign “−” otherwise choose sign “+”.

4.
Using q_{c} and the e generated earlier, calculate the inverse of the semimajor axis 1/a_{c}, if 1/a_{c} > (1/a)_{max}, repeat steps (2), (3), (4). ( The inverse (1/a)_{max} is the maximal value in the observed orbital sample).
ME1. This method is based on the ME0, however we incorporated a suggestion given in Vida et al. (2017), which was to set in formula (4), the distance R_{E} depending on the ecliptic longitude λ_{E} of the Earth at the moment of time of the meteor appearance. Both λ_{E} and R_{E} values were calculated by means of the formulas taken from Bell & Urban (2012). Figure A.1 shows that dependence R_{E} = f(λ_{E}) is not random, therefore we applied its approximation
Fig. A.1. Dependence of the heliocentric distance R_{E} of Earth on the ecliptic longitude λ_{E} of Earth at the time of meteor observation. Both values correspond to the meteoroids taken from the CAMSA orbital sample. 
Also we have modified the condition for q_{c} and (1/a)_{c}. The calculated values had to fulfill the condition
where q_{min}, q_{max}, (1/a)_{min}, (1/a)_{max} are given by formula (6).
The algorithm of the ME1 method involves the following steps:

1.
For a given observed sample, obtain the histograms of q, e, ω, Ω, i, λ_{E}, β_{G}.

2.
Generate β_{G}, λ_{E}, e, ω, Ω, i separately by the CPD inversion method.

3.
Calculate R_{E} using formula (A.1).

4.
Using the Earth crossing condition (4) and obtained R_{E}, e, ω calculate q_{c}; choose the sign at the ecosω, if β_{G} > 0 choose sign “−” , otherwise choose sign “+”,

5.
Using q_{c} and the e generated earlier, calculate the inverse of the semimajor axis 1/a_{c}.

6.
If 1/a_{c} is outside the interval [(1/a)_{min}, (1/a)_{max}] or if q_{c} is outside the interval [q_{min}, q_{max}], repeat steps 2, 3, 4.
ME4. The ME4 method is based on the ME1, however with one modification. We made use of the asymmetry of the β_{G}–ω distribution (see Fig. A.2). Choosing the value β_{G} = 0, the ω values were divided into two subsamples for which β_{G} ≥ 0 and β_{G} < 0. In the method ME4, instead of the ωhistogram for the whole CAMSA sample we used two histograms corresponding to β_{G} ≥ 0 and β_{G} < 0, respectively. This solution significantly improved the test results obtained with the ME4 method. We can see this if, for example, we compare the values for ME1 and ME4 in the first row of Table 4.
Fig. A.2. Distribution of 29 014 sporadic CAMSA video meteoroids on the β_{G} − ω plane. As can be seen, this plot is asymmetric due to the β_{G} coordinate. 
Thus, the algorithm of the ME4 method involves the same steps as for the method ME1, except that the argument of perihelion ω was generated by the CPD inversion method using two ωhistograms, depending on the sign of the generated β_{G} value.
A.2. KD methods
The KD methods proposed in Vida et al. (2017) for the generation of synthetic meteoroid orbits make use of the nonparametric technique. It is based on the multivariate kernel density estimation, a function which for the five dimensional problem is given as
where O = (q, e, ω, Ω, i)^{T}, O_{k} = (q_{k}, e_{k}, ω_{k}, Ω_{k}, i_{k})^{T}, k = 1, 2, …, N, and N is the size the orbital sample. The letter H represents the bandwidth (or smoothing) 5 × 5 matrix; K_{5} is the kernel function, symmetric multivariate density function. In Vida et al. (2017) the authors used a Gaussian kernel. The choice of the kernel function K is not critical to the accuracy of kernel density estimators. The most important factor affecting results obtained by Eq. (A.3) is the choice of the bandwidth matrix H. Vida et al. (2017) presented the results of three KD variations. In the KD10 or KD0.1 method the bandwidth H was a scalar matrix with all diagonal elements equal to 10 or 0.1, respectively. In the KD nonscalar method (KDns) the bandwidth matrix is defined as
where the values on the diagonal correspond to q, Ω, e, ω, i respectively. Having the multivariate estimate (A.3), it can be used to draw a synthetic sample of the orbits, for details see Vida et al. (2017).
All Tables
χ^{2} distances and SO_{N}, NN_{N} statistics calculated for 58 029 orbits, observed and generated by the ME method and three variations of the KD method, see text.
Results of the quantitative comparison of 29 014 observed and synthetic meteoroid orbits.
Results of the 1D χ^{2} test (first section) and 1D KS test (second section) for marginal distributions of the orbital elements.
Distances of χ^{2} (first section) and results of 2D KS tests (second section) for the pairs of orbital elements.
Values of the NN_{N} and H_{N} statistics calculated for the observed CAMSA and CAMSB samples, and for the orbits generated by the KD10, KDns, ME0, ME1, and ME4 methods.
Values of the orbital similarity thresholds calculated for the synthetic samples obtained by KD and ME methods.
All Figures
Fig. 1. Marginal histograms of the osculating orbital elements of the observed subsamples CAMSA and CAMSB (in gray). Each sample consists of 29 014 orbits, the number of bins is 62. The distributions display a clear statistical similarity. Except for one small surplus seen in the ω histogram, the only visible differences are the statistical fluctuations. 

In the text 
Fig. 2. Distributions of 29 014 sporadic video meteoroids (top – CAMSA and bottom – CAMSB subsample) on the q–e, (1/a)–e, and ω–q planes. Due to the observation selection effect, not all combinations of [q, e], [(1/a),e], or [ω, q] are observable from the surface of the Earth. 

In the text 
Fig. 3. Histograms of the values used to calculate the OS_{N} (top) and NN_{N} (bottom) statistics. The dashed red lines correspond to the OS_{N} and NN_{N} mean values, the continuous lines correspond to the median values. The OS histograms were produced using the observed CAMSA and CAMSB orbits (left), the CAMSA and KD0.1 (middle), and the CAMSA and KDns samples (right). The NN histograms correspond to CAMSA, KD0.1, and KDns samples, respectively. For the KD0.1 synthetic data, the histograms are considerably different from the remaining ones. 

In the text 
Fig. 4. Plot of the differences of the absolute frequencies of the marginal distributions of orbital elements of the observed and synthetic data. From the left, in the columns we have the differences between CAMSA and CAMSB data, between CAMSA and KDns data, and between CAMSA and ME4 data. Positive values indicate the prevalence of frequencies in a given bin for CAMSA data. 

In the text 
Fig. 5. Orbital similarity thresholds calculated for synthetic samples ME0, ME1, ME4, KD10, and KDns, using the D_{SH}, D_{H}, and ρ_{1} functions. The thresholds were determined for groups of M = 2, 3, …, 20 members, identified among 29 014 orbits. 

In the text 
Fig. A.1. Dependence of the heliocentric distance R_{E} of Earth on the ecliptic longitude λ_{E} of Earth at the time of meteor observation. Both values correspond to the meteoroids taken from the CAMSA orbital sample. 

In the text 
Fig. A.2. Distribution of 29 014 sporadic CAMSA video meteoroids on the β_{G} − ω plane. As can be seen, this plot is asymmetric due to the β_{G} coordinate. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.