Free Access
Issue
A&A
Volume 645, January 2021
Article Number A82
Number of page(s) 11
Section Numerical methods and codes
DOI https://doi.org/10.1051/0004-6361/202039332
Published online 14 January 2021

© ESO 2021

1. Introduction

Searching for streams among meteoroid orbits, including an assessment of their reliability, is based heavily on the artificial orbital samples derived by means of the statistical properties of the observed orbital elements. One can derive the synthetic orbital samples, free from clusters, by different methods (see, for example, Jopek & Froeschlé 1997; Jopek et al. 2003; Pauls & Gladman 2005; Koten et al. 2014; Guennoun et al. 2019). Recently, Jopek & Bronikowska (2017) have shown that the choice of the method used for generating synthetic orbits impacts the assessment of the probability of spurious pairings. Jopek & Bronikowska (2017) tested a few methods and, in the case of meteoroid orbits, they recommend their method as the most reliable. However, Vida et al. (2017) found some drawbacks of this method, and concurrently they proposed an interesting approach based on the Kernel Density Estimation method (KD method).

In this study, we describe an improvement of method E, by eliminating the drawbacks pointed out by Vida et al. (2017), and by introducing into the algorithm a few additional steps that more precisely allow for the correlations between the meteoroid orbital elements. The modified methods have been compared with the KD methods proposed by Vida et al. (2017).

2. Orbital data used in this study

In this study, the same meteoroid data have been used as in Vida et al. (2017), namely, the file CAMS-v2-2013.csv (see Jenniskens et al. 2016) downloaded from the GitHub site1. We used the heliocentric ecliptical osculating orbital elements q, e, ω, Ω, i and the ecliptic latitude βG of the meteor geocentric radiants. Additionally, we use the heliocentric distances of the Earth RE, calculated at the moment of meteor observations by means of the formula taken from Bell & Urban (2012). Using the software posted by Vida et al. (2017) at the GitHub site, from the original CAMS file, we extracted 58 029 orbits of sporadic meteors, which we considered to be an orbital sample free from clusters.

3. Synthetic meteoroid orbits generation methods

In this study, we compared a few methods: ME0 (method E in Jopek & Bronikowska 2017) and its two variations ME1 and ME4. The ME methods use the CPD (cumulative probability distribution) inversion technique parametrized by the choice of the bin width of the histogram used for each orbital element. In the ME0 method, it is assumed that the Earth moves in a circular orbit around the Sun. The ME1 method is based on the ME0 method, in which the assumption of the elliptical orbit of Earth was introduced. In turn, the ME4 method is an extended version of the ME1 method. The values of the argument of perihelion ω were generated by the CPD technique using not one marginal distribution but two, corresponding to the positive and negative ecliptic latitudes of the meteor radiants.

Additionally, we made use of three KD methods: KD0.1, KD10, and KDns (see, Vida et al. 2017). The KD methods use the nonparametric technique; they are based on the multivariate kernel density estimation. The methods depend on the choice of the bandwidth matrix H. This matrix has a decisive influence on the shape (smoothness) of the obtained multivariate probability distribution function. In the work of Vida et al. (2017), three diagonal H matrices were used, two scalar matrices with elements equal to 0.1 and 10 (methods KD0.1 and KD10 respectively) and one non-scalar matrix (KDns method) with elements selected separately for each orbital element. More details about the applied ME and KD methods are provided in the Appendix.

4. Valuation of the synthetic orbital sample

4.1. Comparison of the observed and synthetic samples

The ME method uses the marginal densities represented by the marginal histograms of the observed orbital elements (see Fig. 1). The application of the CPD inversion technique guarantees that the marginal distributions of the synthetic and observed orbital elements are very similar. However, this technique does not conserve the correlations between the meteoroid orbital elements q, e, and ω (see Fig. 2). To address these correlations, Jopek & Bronikowska (2017) added steps 2 and 3 to the ME0 method (see Appendix A.1). However, in their paper no comparisons between the obtained synthetic and observed samples were accomplished, except for a visual assessment.

thumbnail Fig. 1.

Marginal histograms of the osculating orbital elements of the observed subsamples CAMSA and CAMSB (in gray). Each sample consists of 29 014 orbits, the number of bins is 62. The distributions display a clear statistical similarity. Except for one small surplus seen in the ω histogram, the only visible differences are the statistical fluctuations.

thumbnail Fig. 2.

Distributions of 29 014 sporadic video meteoroids (top – CAMSA and bottom – CAMSB subsample) on the qe, (1/a)–e, and ωq planes. Due to the observation selection effect, not all combinations of [q, e], [(1/a),e], or [ω, q] are observable from the surface of the Earth.

In Vida et al. (2017), to ascertain the differences or similarities between the observed and synthetic orbital sets, a few comparisons were accomplished, namely, a visual comparison of 1D or 2D histograms of the observed and synthetic orbital elements, and a limited quantitative assessment of the similarity between observed and synthetic samples. For this purpose, Vida et al. (2017) used the 2D histogram of ω versus q, for which the so-called χ2 histogram distance was calculated (see Pele & Werman 2010) using

(1)

where Pij and Qij are the contents of the ij bins of the synthetic and observed data histograms, respectively. The number of bins are represented by kω and kq.

In Vida et al. (2017), as an additional statistical property of the sample, the arithmetic mean NNN of the nearest neighbor distances was calculated, which is the averaged minimum distance from each orbit to any other for a given sample. Let DN, i be the nearest neighbor distance of the orbit Oi and the other Oj: DN, i = minj ≠ i, j ≤ ND(Oi, Oj), then

(2)

where N – is the sample size. To determine the distance D(Oi, Oj) between two orbits, the DSH-function was applied (see, Southworth & Hawkins 1963).

As a statistical property between two orbital samples, Vida et al. (2017) calculated the arithmetic mean OSN of the smallest D-values between each synthetic orbit paired with each observed orbit.

To compare the two orbital samples, in the present work we have used ideas from Vida et al. (2017), expanding them slightly, as well as additional new ideas, namely we compared the marginal distributions of q, e, ω, Ω, i as well as the 2D distributions of all pairs of orbital elements using the χ2 and Kolmogorov-Smirnov (K-S) tests taken from Press et al. (2002). The advantage of this approach is the ability to estimate whether the compared samples could come from the same population.

To assess the general similarity of orbital samples, we used the concept of entropy. As far as we know, this concept has never been used for this purpose. The entropy HN was calculated on the basis of the nearest neighborhood values DN, i. For each point in the sample, one has to find the distance to its nearest neighbor and apply the formula (Beirlant et al. 1997)

(3)

where CE = 0.577215664901532860… is the Euler constant. An analogy between NNN and HN is clearly seen.

Finally, we propose a purely practical approach. For the synthetic orbits, we compared the orbital similarity thresholds used in the cluster analysis. We determined the thresholds by a single linkage cluster analysis and three D-functions. More details of this procedure are given in Jopek (2020).

In the next section, we present the results of our calculation of the χ2 distances and OSN, NNN statistics, as well as the results of the χ2 test and K-S test, using the observed and synthetic data sets. Moreover, we present the values of the entropy and orbital similarity thresholds calculated for the synthetic orbital samples.

4.2. Results of comparison and discussion

In this section, we present the results of quality tests on synthetic sets of 58 029 and 29 014 orbits. First, tests carried out by Vida et al. (2017) were repeated. The following sections present the results of the extended comparison of reduced orbital samples. The last section describes the results of the cluster analysis among the generated orbits.

4.2.1. Samples comprising 58 029 orbits

At this stage, we only recalculated the results already obtained by Vida et al. (2017) that are provided in their Table 1. Exploiting the software placed by Vida et al.2 on the GitHub website, we generated four synthetic samples and calculated their χ2 distances and OSN, NNN statistics (see our Table 1).

Table 1.

χ2 distances and SON, NNN statistics calculated for 58 029 orbits, observed and generated by the ME method and three variations of the KD method, see text.

The χ2 distances given in the second column of Table 1 were obtained using q versus ω histograms with 158 × 103 bins, as used in Vida et al. (2017). Our results differ slightly from those found by Vida et al. (2017) due to two reasons: we used corrected software, and we used a different sequence of random values starting with the seed point 2003.

As in Vida et al. (2017), in Table 1 the largest χ2 value was obtained for the ME sample, the smallest value for the KD0.1 sample. However, we must point out that the χ2 distances given in Table 1 have no clear interpretation. The high value χ2 = 19 053 does not necessarily mean that in some applications the ME synthetic sample cannot effectively represent the observed orbits. On the other hand, one may ask, does the smallest value χ2 = 1858 indicate that the synthetic sample generated by the KD0.1 method is too similar to the observed one?

Furthermore, the χ2 distance considerably depends on the number of applied bins. This is shown in the third column of Table 1 where we placed the χ2 distances calculated using histograms with 80 × 80 bins. The number of bins was calculated according to the Rice Rule: k = ⌈2N1/3⌉, where ⌈…⌉ is the ceiling function and N is the sample size. The obtained χ2 distances differ significantly from those given in the second column of Table 1. Hence, the question may arise, which χ2 value should be used to valuate the quality of a synthetic sample? Thus, we believe that the χ2 value determined by Eq. (1) should be used with care for the adjudication of a synthetic orbits generator.

In Vida et al. (2017), the authors used another statistics for a quantitative comparison of the observed and synthetic samples. As was mentioned in Sect. 4.1, the authors proposed two statistics: the arithmetic means NNN and OSN. We also calculated NNN and OSN and the obtained results are given in Table 1. Our results differ slightly from those given in Table 1 in Vida et al. (2017) for the same reasons as those we mentioned earlier. The highest values of NNN and OSN are for the ME sample. Vida et al. (2017) considered the OSN = 0.0995 as an upper bound for an acceptable synthetic orbit data set. As the lower bound, Vida et al. considered a value of OSN = 0.0408. Certainly, a low value of OSN may imply a strong similarity between the two samples. However, we are interested in a statistical similarity. Therefore, as in the case of the χ2 distance, one may ask, which limit values of OSN, NNN may be relevant to the issue of the comparison between synthetic and observed samples? We address this question in the next section.

4.2.2. Samples comprising 29 014 orbits

To find the expected values of the χ2 distances, OSN and NNN, and other statistics, we split the original CAMS sample into two subsamples, CAMSA and CAMSB, each containing 29 014 orbits. The CAMS sample is sorted by increasing values of the ecliptic longitude of the Sun at a meteor instant, hence to split the data sample we just used the catalog ordinal numbers of orbits: the even-numbered orbit entered sample CAMSA, the odd-numbered entered sample CAMSB.

The CAMSA and CAMSB subsamples were taken from the same “population” of the observed orbits, hence their statistical properties should be very similar. In Fig. 1 we see that their marginal distributions of the orbital elements display a clear statistical similarity. This similarity is also visible in the 2D distributions in Fig. 2. This figure shows the effects of observational selection that correlate some elements of the meteoroid orbit. Because a typical meteoroid can be observed only in the form of a meteor phenomenon in the Earth’s atmosphere, an approximate relationship between its orbital elements q, e, and ω is often used (see, e.g., Jopek & Froeschlé 1997; Babadzhanov et al. 2012), namely

(4)

where RE in astronomical units is the heliocentric distance of a meteoroid at its moment of collision with Earth. The sign at the ecosω is positive for the negative geocentric ecliptic latitude of a meteor radiant, βG <  0. Assuming a circular orbit of the Earth (RE = 1), the relationship q = f(ω) has been drawn in Fig. 2 in the form of continuous lines. Due to the nonzero eccentricity of the Earth’s orbit, RE belongs to the interval

(5)

and as a result, the points corresponding to the observed meteoroids surround the curve for RE = 1.

If we ignore the size of the Earth, that is, if we assume that meteors are observed at the center of the Earth, then Eq. (4) is exact. Violations of Eq. (4) and Condition (5) do not happen often (see Jopek et al. 2010), but in 58 029 sporadic meteoroid orbits from the CAMS catalog they occurred ∼2000 times.

Another consequence of the observational selection concerns the values of the meteoroid’s orbital elements e, q, and 1/a. As we see in Fig. 2, certain combinations of the values of these elements are unacceptable for meteoroids observed from the Earth’s surface. For the given values of e and RE, the limit values for q and 1/a are

(6)

where RE belongs to interval (5). Equations (6) describe the boundary curve q(e) and two limits for (1/a) seen in Fig. 2.

The similarity of the 1D and 2D distributions, shown in Figs. 1 and 2, confirms our assumption that both CAMSA and CAMSB samples represent statistical properties of the same population. Therefore, for these samples we calculated the values of the statistics used, and they were applied as the reference values when the synthetic data were evaluated. We repeated the calculations of the χ2 distance, and OSN and NNN statistics for reduced synthetic samples of 29 014 orbits (see Table 2). To find the χ2 distances, we used ωq histograms with 62 × 62 bins, and the number of bins was calculated according to the earlier mentioned Rice Rule.

Table 2.

Results of the quantitative comparison of 29 014 observed and synthetic meteoroid orbits.

In Table 2, the χ2 values are two to three times smaller than those provided in Table 1. In Table 2, the largest χ2 value was calculated for the ME sample, the smallest one corresponds to the KD0.1 sample. Moreover for the NNN statistics, the ME sample shows the large difference compared to the observed orbits. However, while it is not the largest one, the largest difference was produced by the KD0.1 method. Vida et al. (2017) interpret the OSN and NNN as the indicators of statistical similarity between the synthetic and observed samples. However, in comparison with the reference values, the small values OSN = 0.048 and NNN = 0.051 given in Table 2 do not mean that the KD0.1 sample is statistically very similar to the observed one. This is obvious when we look at Fig. 3. The NN histograms for the observed and KD0.1 samples should look similar, but here the discrepancy is considerable, and we have to admit that we do not know what the reason is. In addition, in Table 2, for the KD0.1 method the value of the χ2 distance is too small. The χ2 = 473 is smaller than χ2 = 760 calculated for the CAMSA and CAMSB, two samples drawn from the same ‘population’.

thumbnail Fig. 3.

Histograms of the values used to calculate the OSN (top) and NNN (bottom) statistics. The dashed red lines correspond to the OSN and NNN mean values, the continuous lines correspond to the median values. The OS histograms were produced using the observed CAMSA and CAMSB orbits (left), the CAMSA and KD0.1 (middle), and the CAMSA and KDns samples (right). The NN histograms correspond to CAMSA, KD0.1, and KDns samples, respectively. For the KD0.1 synthetic data, the histograms are considerably different from the remaining ones.

The OSN and NNN statistics do not give an unambiguous assessment of the quality of the synthetic orbit generator. As we can see in Table 2, if we use OSN statistics, then method KD10 turns out to be the best; if we choose NNN statistics, then method KDns is the best.

In conclusion, we believe that the application of the χ2 distance, and OSN and NNN statistics only may not be sufficient to decide which synthetic sample more adequately represents the observed data.

4.3. Extended comparison and discussion

To assess whether the synthetic orbital data sets are of acceptable quality, in Vida et al. (2017) the authors used the χ2 distance calculated for ω and q only, so using only a part of the orbital sample, and as we have shown such a limited approach may not always be applicable. We think that to evaluate the properties of a synthetic orbit generator, one should perform a more comprehensive test that seeks to establish the statistical similarity of two data sets using all five orbital elements. Moreover, from a practical point of view it is essential to compare the results obtained with different synthetic samples used in a specific issue, for example, in a cluster analysis.

In the next section, we present the results of the extended tests of the observed and synthetic data. We also present the results of calculations of the thresholds of orbital similarity, which are key parameters of each cluster analysis using the meteoroid data.

4.3.1. χ2 and K-S statistical tests

Can we prove that two orbital data sets are consistent with a single distribution function? As far as we know, there is no affirmative answer to this question. Hence, we do not have a ready-for-use solution. However, using statistical tests, we can compare the 1D and 2D distributions of the orbital element samples. Hence, in this study, we wanted to find which available statistical tests can be helpful in assessing the quality of the orbital sample generators. We used the χ2 test (not to be confused with the χ2-distance given by Eq. (1)) and two K-S tests. All tests were performed using the subroutines taken from the ‘Numerical Recipes’ package (Press et al. 2002), namely: chstwo, kstwo, and ks2d2s. Each subroutine returns an estimate of the significance prob parameter, which determines the level of consistency between compared distributions. The prob takes values in the range [0, 1], however, concerning the critical value of this parameter, in Press et al. (2002) the authors only note that a small value of prob indicates a significant difference between the compared distributions. Hence, to estimate what values of prob can be expected, we performed the χ2 and K-S tests using the observed CAMSA and CAMSB samples.

4.3.2. One-dimensional χ2 tests

Making use of the 1D χ2 test, we compared the marginal distributions of e, q, ω, Ω, i orbital elements of the CAMSA sample with CAMSB and synthetic samples. The results of the tests are collected in Table 3. We tested five methods, and their algorithms are described in the Appendix.

Table 3.

Results of the 1D χ2 test (first section) and 1D K-S test (second section) for marginal distributions of the orbital elements.

As can be seen in Table 3, the comparison of the marginal distributions of the perihelion distances q, gave negative results for all synthetic samples. This is not surprising in the case of ME methods, where q is not generated using its marginal distribution of the observed sample. In each of the ME methods the perihelion distance was calculated using Eq. (4), substituting the generated values of e, ω as well as the heliocentric distance RE of Earth at the moment of time of the meteor observation.

For the remaining orbital elements, the results of the χ2 test are usually more favorable for ME methods. This is due to the fact that KD orbits are not generated on the basis of the marginal distributions. Similarly, one can explain why the prob values for e, i, and partly for ω tests are lower for the CAMSB sample than for synthetic orbits generated by ME methods. The CAMSA and CAMSB samples were obtained based on the sorted values of the ecliptic longitudes of the Sun at the meteor instant (see Sect. 4.2.2). This also explains the highest prob value (100%) obtained for the test of Ω distributions of the CAMSA and CAMSB samples.

In Fig. 4 we plotted the differences of frequencies corresponding to the same bins of the histograms of marginal distributions of the orbital elements. The left column illustrates the differences between the CAMSA and CAMSB subsamples, the remaining columns show the differences between the CAMSA and KDns and CAMSA and ME4 samples. Except for a few cases, the differences are essentially equivalent to each other. However, the KDns method does not cope with generating orbits with large eccentricities. Similar problems arise for small and large perihelion distances, and for small inclinations. In case of the ME4 method, the same problems arise for small and large perihelion distances. The reference differences between CAMSA and CAMSB data are clearly random. This cannot be said for the differences in eccentricity, perihelion distance, and partly in inclination for the KDns sample. However, also for the ME4 method, the systematic trend is evident for the perihelion distance. Taking into account the results of the χ2-tests from Table 3 and the course of differences in Fig. 4, in our opinion, the ME4 method gives more consistent results than the KDns method.

thumbnail Fig. 4.

Plot of the differences of the absolute frequencies of the marginal distributions of orbital elements of the observed and synthetic data. From the left, in the columns we have the differences between CAMSA and CAMSB data, between CAMSA and KDns data, and between CAMSA and ME4 data. Positive values indicate the prevalence of frequencies in a given bin for CAMSA data.

4.3.3. One-dimensional K-S tests

Use of binned data involves a loss of information, however. Additionally, there is often considerable arbitrariness as to how the bins should be chosen. The results given in the first section of Table 3 were obtained assuming the following ranges for the observed and synthetic orbital elements: e ∈ [0, 1), q ∈ [0, 1.068), i ∈ [0.180), ω ∈ [0, 360) and Ω ∈ [0, 360). For each orbital element, these intervals were divided by 62, and the number of bins was calculated by the Rice Rule.

To avoid the problem of the influence of the number of bins, we applied the K-S test to the unbinned data of e, q, ω, Ω, i orbital elements. We used the 1D K-S test, and more particularly, we used the kstwo subroutine taken from the ‘Numerical Recipes’ package (Press et al. 2002). The results of the K-S tests are given in the second section of Table 3. Like the earlier χ2 tests, the 1D K-S test for q gave negative results for all synthetic samples. Furthermore, like the previous test the K-S test shows that the ME methods give results closer to the observed sample than the KD methods.

4.3.4. Two-dimensional tests

We describe the results of a comparison of the observed and synthetic samples applying the χ2-distance (Eq. (1)) and a two-dimensional K-S test for all pairs of orbital elements. The 2D K-S test was performed using the ks2d2s subroutine taken from the ‘Numerical Recipes’ package (Press et al. 2002). Results are provided in Table 4. The first section of Table 4 provides the χ2 distances between the observed CAMSA and the synthetic KD10, KDns, ME0, ME1, and ME4 samples. The results for the observed CAMSB data are for reference. As we can see, on the basis of this table, it is impossible to establish which method gives the best orbits. The result depends on what pair of orbital elements we choose. Choosing the pair qω, the ME4 method gives the result closest to the observed sample. Choosing qe, KDns is the best. Choosing the pair qi we see that for the KDns the χ2 distance is smaller than for the reference CAMSB sample, and this indicates that KDns gives orbits that are too similar to the observed sample. Hence, for the pair qi, the KD10 method takes precedence. The same is true for the pair q–Ω, therefore the ME4 method takes precedence. In the case of KDns method, for five orbital element pairs, the χ2 distances prove to be smaller than for the CAMSB data, for the KD10 method three times, respectively. On the other hand, this was not the case with all ME methods.

Table 4.

Distances of χ2 (first section) and results of 2D K-S tests (second section) for the pairs of orbital elements.

The results of the 2D K-S tests for all pairs of orbital elements are given in the second section of Table 4. As one can see, in general the results of these tests are negative for all methods. Some exceptions are the results obtained for the pair e–Ω and ME0 and ME1 methods. To sum up, it follows that assessing the quality of the synthetic orbit generator only on the basis of comparing boundary distributions can be problematic. Therefore, in the next section we describe the results of further tests based on NNN and HN statistics.

4.3.5. Application of NNN and HN statistics

To compare the synthetic orbit generators, we also set the values of statistics NNN and HN. We used formulas (2) and (3) and three D-functions, DSH, DH, and ρ1 described in Southworth & Hawkins (1963), Jopek (1993), and Kholshevnikov et al. (2016) respectively. In the previous tests, we used the marginal 1D or 2D distributions of the observed and synthetic samples. Using NNN and HN statistics is a different approach, because to calculate them we have to use all orbital elements simultaneously. The results obtained are given in Table 5.

Table 5.

Values of the NNN and HN statistics calculated for the observed CAMSA and CAMSB samples, and for the orbits generated by the KD10, KDns, ME0, ME1, and ME4 methods.

This time, the values of NNN and HN for the KD methods proved to be closer to the values for the observed samples than for the ME methods. The KDns gave the closest values for all D-functions, the ME4 method gave the closest values among the ME methods. The values of NNN and HN depend on the D-function used in the calculations, which is understandable because these functions are not mutually equivalent. For the ρ1 function, the results were always greater than for the remaining ones. For the DSH and DH functions, the results are similar to each other, but in general for the former they are slightly smaller.

It might seem that both statistics are equivalent to each other and differ only by a scaling factor. In general, however, this is not the case, and this can be seen when comparing the NNN and HN values calculated for the CAMSB and KDns samples. For the DSH function, the NNN values for both these samples equal 0.097, while for the HN they are 9.076 and 9.098, respectively. A similar situation occurs when we compare the results obtained using the DH function for the same CAMSB and KDns samples. The NNN statistics are 0.098 and 0.096 and HN are 9.086 and 9.090. The mean neighbor distance NNN for the CAMSB sample may be smaller or greater than for a given KDns sample, while the entropy of CAMSB sample may be smaller or greater than for the same KDns sample.

It should be recalled that these statistics differ in their interpretation; only the HN is a measure of the entropy of a data set, which in our opinion gives the HN statistics an important advantage over NNN.

4.3.6. Comparison in practical application

Both NNN and HN statistics are the point estimators, hence by using them the assessment of the quality of the generators may only be very general. We believe that a more valuable estimate of the usefulness of synthetic orbit generators can be obtained by comparing the results obtained in a specific practical application. In this section, we present the results of calculations of the thresholds of the orbital similarity, which are key parameters of each cluster analysis. For the orbits generated by KD and ME methods, the thresholds were determined by the numerical experiment as described in Jopek (2020). Applying a single linkage cluster analysis algorithm, the thresholds were found for each meteoroid group of M = 2, 3, …20 members, and for each of the DSH, DH, and ρ1 distance functions. All obtained results corresponded to a low probability (less than 1%) of chance clustering. The final individual thresholds DM, M = 2, 3, …20 were calculated as the arithmetic means of the DM values determined in each of the one hundred orbital samples. The final values are given in Table 6 and their graphical illustration is shown in Fig. 5.

Table 6.

Values of the orbital similarity thresholds calculated for the synthetic samples obtained by KD and ME methods.

For obvious reasons, we could not compare the results obtained for the synthetic orbits with the thresholds calculated for the observed orbits, hence, our comparisons relate only to synthetic orbits. As expected, the threshold values proved to depend on the applied D-functions and the synthetic data set. Although in all cases the thresholds increased monotonically with the size M of the identified group, this dependency is quite diverse. In Fig. 5 we see that in the case of the DSH function the highest thresholds are for the KD10 orbits, the smallest one for the ME4 orbits. This means that if we use the DSH function and the KD10 orbits in the cluster analysis, then by using the obtained thresholds in the sample of observed orbits, some of the results may turn out to be artifacts, both for the number of identified streams and the number of their members. Using the ME4 orbits, we can face the opposite situation, we cannot identify certain streams, and the number of members of the identified streams may be significantly smaller.

thumbnail Fig. 5.

Orbital similarity thresholds calculated for synthetic samples ME0, ME1, ME4, KD10, and KDns, using the DSH, DH, and ρ1 functions. The thresholds were determined for groups of M = 2, 3, …, 20 members, identified among 29 014 orbits.

When the DH function is applied, the smallest thresholds are for the KDns orbits, the highest for the ME1 orbits. Finally, in the case of the ρ1 function, again, the smallest values are for the ME4 orbits, the highest for the KD10 orbits.

However, we cannot claim that any of the calculated threshold values are too large or too small. That is because we cannot compare the results obtained for the synthetic and observed orbits here. We can only claim that in the case of cluster analysis, among the CAMS orbits (using a single linkage algorithm and DSH or ρ1 function), the values of the orbital similarity thresholds will be the smallest if we generate synthetic orbits using the ME4 method. If we apply the DH function, the smallest thresholds will be obtained using the KDns method. Moreover, we cannot be sure that what has been said above would also be true for the observed orbits taken from another source or if another cluster analysis were applied.

5. Conclusions

We compared five methods for generating synthetic meteoroid orbits. Three of them (ME0, KD10, and KDns) had already been proposed in the literature, two additional methods (ME1 and ME4) are new. As far as possible, the synthetic orbits were compared with the orbits obtained as a result of observations, otherwise, the comparison is of a relative nature. For quantitative comparison, we applied a few tests: the χ2-distance and NNN tests used in Vida et al. (2017), and, implemented in this study, one-dimensional χ2 and K-S tests, as well as a two-dimensional K-S test. In addition to the NNN statistics, to estimate a general property of the orbital sample, we used the entropy HN of the data set based on nearest neighbor distances. Finally, we did a cluster analysis among the synthetic orbits. We calculated and compared the values of the orbital similarity thresholds.

As a result of our research, we can make a few conclusions summarized by the following points:

  • 1.

    Using the χ2 distance (see Eq. (1)) for one pair of orbital elements only, for example, ωq, it is not sufficient to decide which synthetic sample more adequately represents the observed data. It may be risky, as we see in Table 4. The result of the comparison clearly depends on the pair used in the test.

  • 2.

    The same can be said about the χ2 and K-S tests performed for the marginal distributions of each orbital element. You cannot judge the quality of the orbit generator by only testing the data for a single orbit element (see Table 3). Nevertheless, in our study, the ME methods performed significantly better in the 1D tests than the KD methods.

  • 3.

    The 2D Kolmogorov-Smirnov test gave negative results for all compared ME and KD methods. In our opinion, it follows that the algorithms of the tested methods are not capable of generating ‘realistic’ meteoroid orbits.

  • 4.

    The NNN and HN statistics gave a clear result. The values of NNN and HN for the orbits obtained by KD10 and KDns are closer to NNN and HN values calculated for the observed orbits. Both statistics are based on nearest neighbor distances, but only the HN statistics is the entropy of a data set.

  • 5.

    The results of the cluster analysis among synthetic orbits turned out to be dependent on the applied D-function of orbital similarity. For DSH and ρ1 functions, the smallest threshold values were obtained for the ME4 method. In the case of DH function, the smallest values were obtained for KDns orbits. By the relative comparison, we cannot decide which orbits give the thresholds as too large or which as too small.

The results showed us that the generation of “realistic” meteoroid orbits is a complex issue. All methods are based on the observed data, which are heavily biased by observational selection, responsible for a number of correlations between the orbital elements. Including these correlations in an algorithm of a given method is not an easy matter. As far as we know, these correlations are not yet well understood. Perhaps an improvement in the situation can be achieved by using the geocentric parameters of the observed meteoroids (see Froeschlé et al. 1999; Valsecchi et al. 1999).

The same can be said about the tests and comparisons made in our study. This is also a complex issue. They give very limited results depending on the type of the test, the orbital elements used, the D-function applied, and other factors.

However, we were pleasantly surprised that, in the case of comparisons based on the determination of the thresholds of the orbital similarity, the results obtained, albeit different, were quite similar. Thus, at least in the case of the single linkage cluster analysis, we think that all ME and KD methods can be applied successfully.


2

Actually we used the corrected version of this software. A few corrections and a small supplementation were implemented by Denis Vida.

Acknowledgments

The author acknowledges the anonymous referee for the comments and suggestions which improved the manuscript. We acknowledge Peter Jenniskens for his kind decision to make the CAMS video data available to the meteor astronomers community. Tadeusz J. Jopek was supported by the National Science Center in Poland (project No 2016/21/B/ST9/01479). This research has made use of NASA’s Astrophysics Data System Bibliographic Services.

References

  1. Babadzhanov, P. B., Williams, I. P., & Kokhirova, G. I. 2012, MNRAS, 420, 2546 [NASA ADS] [CrossRef] [Google Scholar]
  2. Beirlant, J., Dudewicz, E. J., Gyorfi, L., & van der Meulen, E. 1997, Int. J. Math. Stat. Sci., 6, 17 [Google Scholar]
  3. Bell, S. A., & Urban, S. E. 2012, The Astronomical Almanac for the Year 2013 (Dept. of the Navy; HAR/PSC edition (2012)), 612 [Google Scholar]
  4. Froeschlé, C., Jopek, T. J., & Valsecchi, G. B. 1999, Celest. Mech. Dyn. Astron., 73, 55 [CrossRef] [Google Scholar]
  5. Guennoun, M., Vaubaillon, J., Čapek, D., Koten, P., & Benkhaldoun, Z. 2019, A&A, 622, A84 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  6. Jenniskens, P., Nénon, Q., Albers, J., et al. 2016, Icarus, 266, 331 [NASA ADS] [CrossRef] [Google Scholar]
  7. Jopek, T. J. 1993, Icarus, 106, 603 [NASA ADS] [CrossRef] [Google Scholar]
  8. Jopek, T. J. 2020, MNRAS, 494, 680 [CrossRef] [Google Scholar]
  9. Jopek, T. J., & Bronikowska, M. 2017, Planet. Space Sci., 143, 43 [NASA ADS] [CrossRef] [Google Scholar]
  10. Jopek, T. J., & Froeschlé, C. 1997, A&A, 320, 631 [Google Scholar]
  11. Jopek, T. J., Valsecchi, G. B., & Froeschlé, C. 2003, MNRAS, 344, 665 [NASA ADS] [CrossRef] [Google Scholar]
  12. Jopek, T. T., Rudawska, R., & Ziomek-Pretka, H. 2010, Proceedings of the International Meteor Conference, 27th IMC, Sachticka, Slovakia, 2008, 91 [Google Scholar]
  13. Kholshevnikov, K. V., Kokhirova, G. I., Babadzhanov, P. B., & Khamroev, U. H. 2016, MNRAS, 462, 2275 [CrossRef] [Google Scholar]
  14. Koten, P., Vaubaillon, J., Čapek, D., et al. 2014, Icarus, 239, 244 [NASA ADS] [CrossRef] [Google Scholar]
  15. Pauls, A., & Gladman, B. 2005, Meteorit. Planet. Sci., 40, 1241 [NASA ADS] [CrossRef] [Google Scholar]
  16. Pele, O., & Werman, M. 2010, European Conference on Computer Vision, 749 [Google Scholar]
  17. Press, W. H., Teukolsky, S. A., Vetterling, W. I., & Flannery, B. P. 2002, Numerical Recipes in C, 2nd edn. (Cambridge University Press) [Google Scholar]
  18. Southworth, R. B., & Hawkins, G. S. 1963, Smithsonian Contrib. Astrophys., 7, 261 [Google Scholar]
  19. Valsecchi, G. B., Jopek, T. J., & Froeschlé, C. 1999, MNRAS, 304, 743 [NASA ADS] [CrossRef] [Google Scholar]
  20. Vida, D., Brown, P. G., & Campbell-Brown, M. 2017, Icarus, 296, 197 [CrossRef] [Google Scholar]
  21. Zieliński, R. 1979, Random Number Generators (Warszawa: Wydawnictwo Naukowo Techniczne), 1 [Google Scholar]

Appendix A: Generation of the synthetic orbits

A.1. ME methods

ME0. The algorithm of the ME0 method (in Jopek & Bronikowska 2017, it was named ME) involves the following steps:

  • 1.

    For a given observed sample, obtain the histograms of the orbital elements and additionally find the fraction of the orbits βfr for which the geocentric ecliptic latitudes of the meteor radiant βG >  0.

  • 2.

    Generate e, ω, Ω, i separately by the CPD inversion method, (Zieliński 1979).

  • 3.

    Using the Earth crossing condition (4) with RE = 1 and the obtained e, ω calculate qc; choose the sign at the ecosω by generating the number b distributed uniformly, U(0, 1); if b >  βfr , choose sign “−” otherwise choose sign “+”.

  • 4.

    Using qc and the e generated earlier, calculate the inverse of the semi-major axis 1/ac, if 1/ac >  (1/a)max, repeat steps (2), (3), (4). ( The inverse (1/a)max is the maximal value in the observed orbital sample).

ME1. This method is based on the ME0, however we incorporated a suggestion given in Vida et al. (2017), which was to set in formula (4), the distance RE depending on the ecliptic longitude λE of the Earth at the moment of time of the meteor appearance. Both λE and RE values were calculated by means of the formulas taken from Bell & Urban (2012). Figure A.1 shows that dependence RE = f(λE) is not random, therefore we applied its approximation

(A.1)

thumbnail Fig. A.1.

Dependence of the heliocentric distance RE of Earth on the ecliptic longitude λE of Earth at the time of meteor observation. Both values correspond to the meteoroids taken from the CAMSA orbital sample.

Also we have modified the condition for qc and (1/a)c. The calculated values had to fulfill the condition

(A.2)

where qmin, qmax, (1/a)min, (1/a)max are given by formula (6).

The algorithm of the ME1 method involves the following steps:

  • 1.

    For a given observed sample, obtain the histograms of q, e, ω, Ω, i, λE, βG.

  • 2.

    Generate βG, λE, e, ω, Ω, i separately by the CPD inversion method.

  • 3.

    Calculate RE using formula (A.1).

  • 4.

    Using the Earth crossing condition (4) and obtained RE, e, ω calculate qc; choose the sign at the ecosω, if βG >  0 choose sign “−” , otherwise choose sign “+”,

  • 5.

    Using qc and the e generated earlier, calculate the inverse of the semi-major axis 1/ac.

  • 6.

    If 1/ac is outside the interval [(1/a)min, (1/a)max] or if qc is outside the interval [qmin, qmax], repeat steps 2, 3, 4.

ME4. The ME4 method is based on the ME1, however with one modification. We made use of the asymmetry of the βGω distribution (see Fig. A.2). Choosing the value βG = 0, the ω values were divided into two subsamples for which βG ≥ 0 and βG <  0. In the method ME4, instead of the ω-histogram for the whole CAMSA sample we used two histograms corresponding to βG ≥ 0 and βG <  0, respectively. This solution significantly improved the test results obtained with the ME4 method. We can see this if, for example, we compare the values for ME1 and ME4 in the first row of Table 4.

thumbnail Fig. A.2.

Distribution of 29 014 sporadic CAMSA video meteoroids on the βG − ω plane. As can be seen, this plot is asymmetric due to the βG coordinate.

Thus, the algorithm of the ME4 method involves the same steps as for the method ME1, except that the argument of perihelion ω was generated by the CPD inversion method using two ω-histograms, depending on the sign of the generated βG value.

A.2. KD methods

The KD methods proposed in Vida et al. (2017) for the generation of synthetic meteoroid orbits make use of the nonparametric technique. It is based on the multivariate kernel density estimation, a function which for the five dimensional problem is given as

(A.3)

where O = (q, e, ω, Ω, i)T, Ok = (qk, ek, ωk, Ωk, ik)T, k = 1, 2, …, N, and N is the size the orbital sample. The letter H represents the bandwidth (or smoothing) 5 × 5 matrix; K5 is the kernel function, symmetric multivariate density function. In Vida et al. (2017) the authors used a Gaussian kernel. The choice of the kernel function K is not critical to the accuracy of kernel density estimators. The most important factor affecting results obtained by Eq. (A.3) is the choice of the bandwidth matrix H. Vida et al. (2017) presented the results of three KD variations. In the KD10 or KD0.1 method the bandwidth H was a scalar matrix with all diagonal elements equal to 10 or 0.1, respectively. In the KD non-scalar method (KDns) the bandwidth matrix is defined as

(A.4)

where the values on the diagonal correspond to q, Ω, e, ω, i respectively. Having the multivariate estimate (A.3), it can be used to draw a synthetic sample of the orbits, for details see Vida et al. (2017).

All Tables

Table 1.

χ2 distances and SON, NNN statistics calculated for 58 029 orbits, observed and generated by the ME method and three variations of the KD method, see text.

Table 2.

Results of the quantitative comparison of 29 014 observed and synthetic meteoroid orbits.

Table 3.

Results of the 1D χ2 test (first section) and 1D K-S test (second section) for marginal distributions of the orbital elements.

Table 4.

Distances of χ2 (first section) and results of 2D K-S tests (second section) for the pairs of orbital elements.

Table 5.

Values of the NNN and HN statistics calculated for the observed CAMSA and CAMSB samples, and for the orbits generated by the KD10, KDns, ME0, ME1, and ME4 methods.

Table 6.

Values of the orbital similarity thresholds calculated for the synthetic samples obtained by KD and ME methods.

All Figures

thumbnail Fig. 1.

Marginal histograms of the osculating orbital elements of the observed subsamples CAMSA and CAMSB (in gray). Each sample consists of 29 014 orbits, the number of bins is 62. The distributions display a clear statistical similarity. Except for one small surplus seen in the ω histogram, the only visible differences are the statistical fluctuations.

In the text
thumbnail Fig. 2.

Distributions of 29 014 sporadic video meteoroids (top – CAMSA and bottom – CAMSB subsample) on the qe, (1/a)–e, and ωq planes. Due to the observation selection effect, not all combinations of [q, e], [(1/a),e], or [ω, q] are observable from the surface of the Earth.

In the text
thumbnail Fig. 3.

Histograms of the values used to calculate the OSN (top) and NNN (bottom) statistics. The dashed red lines correspond to the OSN and NNN mean values, the continuous lines correspond to the median values. The OS histograms were produced using the observed CAMSA and CAMSB orbits (left), the CAMSA and KD0.1 (middle), and the CAMSA and KDns samples (right). The NN histograms correspond to CAMSA, KD0.1, and KDns samples, respectively. For the KD0.1 synthetic data, the histograms are considerably different from the remaining ones.

In the text
thumbnail Fig. 4.

Plot of the differences of the absolute frequencies of the marginal distributions of orbital elements of the observed and synthetic data. From the left, in the columns we have the differences between CAMSA and CAMSB data, between CAMSA and KDns data, and between CAMSA and ME4 data. Positive values indicate the prevalence of frequencies in a given bin for CAMSA data.

In the text
thumbnail Fig. 5.

Orbital similarity thresholds calculated for synthetic samples ME0, ME1, ME4, KD10, and KDns, using the DSH, DH, and ρ1 functions. The thresholds were determined for groups of M = 2, 3, …, 20 members, identified among 29 014 orbits.

In the text
thumbnail Fig. A.1.

Dependence of the heliocentric distance RE of Earth on the ecliptic longitude λE of Earth at the time of meteor observation. Both values correspond to the meteoroids taken from the CAMSA orbital sample.

In the text
thumbnail Fig. A.2.

Distribution of 29 014 sporadic CAMSA video meteoroids on the βG − ω plane. As can be seen, this plot is asymmetric due to the βG coordinate.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.