Table 2.
Feature importances using various machine learning models and statistics.
Feature | OLS | Lasso | Stability | RanFor | RFEOLS | RFErf | MIC | Corr | Mean |
---|---|---|---|---|---|---|---|---|---|
B2 | 0.0076 | 0.0048 | 0.0 | 0.0049 | 0.2 | 0.6 | 0.0000 | 0.1019 | 0.1149 |
B2Bx | 0.6345 | 0.2577 | 0.0 | 1.0000 | 0.8 | 1.0 | 1.0000 | 0.9972 | 0.8362 |
B2By | 0.0871 | 0.0240 | 0.0 | 0.0000 | 0.6 | 0.0 | 0.3979 | 0.1210 | 0.1538 |
Bx | 1.0000 | 1.0000 | 1.0 | 0.4404 | 1.0 | 0.8 | 1.0000 | 0.9982 | 0.7793 |
By | 0.0818 | 0.0000 | 0.0 | 0.0000 | 0.4 | 0.4 | 0.4048 | 0.1179 | 0.1756 |
BxBy | 0.0000 | 0.0145 | 0.0 | 0.0001 | 0.0 | 0.2 | 0.2089 | 0.1366 | 0.0700 |
Notes.
All values are normalized to be positive and to lie between 0 and 1 for the sake of comparison. The Stability column refers to randomized LASSO, in which the LASSO shrinkage coefficient is randomly varied for different features and the features most robust to this variation survive. RanFor denotes random forest. RFE stands for Recursive Feature Elimination, in which a model is trained with all features and the features are ranked by importance: the top-ranking feature is given the most significance, while the bottom-ranking one gets the smallest score. We applied RFE to both OLS (RFEOLS) and random forests (RFErf), and reverse-sorted the ranks so that the top-ranking feature receives the highest score. MIC stands for Maximal Information Coefficient, a normalized measure of the mutual information between two variables, scaled between 0 and 1. It quantifies how much information about a variable Y can be obtained through a variable X. Unlike Pearson correlation (Corr), MIC can capture non-linear relationships. The last column is the mean over all methods, a synthesis of the linear and non-linear models, and shows that B2Bx and Bx are the two strongest features.
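The normalization and the reverse-sorted RFE scoring described in the notes can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `minmax_abs` and `rank_to_score` are hypothetical, min-max scaling of absolute values is an assumed choice of normalization, and the raw coefficients are made-up numbers. Only the RFE ranks are taken from the table, since the RFEOLS column is consistent with a linear mapping of ranks 1 to 6 onto scores 1.0 to 0.0.

```python
import numpy as np


def minmax_abs(raw):
    """Scale absolute values to [0, 1] (assumed normalization, not confirmed by the source)."""
    a = np.abs(np.asarray(raw, dtype=float))
    span = a.max() - a.min()
    if span == 0:
        return np.zeros_like(a)
    return (a - a.min()) / span


def rank_to_score(ranks):
    """Map RFE ranks (1 = best) linearly onto [0, 1], so that rank 1 scores 1.0."""
    r = np.asarray(ranks, dtype=float)
    n = r.size
    return (n - r) / (n - 1)


features = ["B2", "B2Bx", "B2By", "Bx", "By", "BxBy"]

# Hypothetical raw model coefficients, for illustration only.
raw_coef = [0.3, -2.5, 0.4, 4.0, 0.35, 0.1]

# RFE ranks implied by the RFEOLS column of the table (1 = top-ranking feature).
rfe_ranks = [5, 2, 3, 1, 4, 6]

for name, coef_score, rfe_score in zip(
    features, minmax_abs(raw_coef), rank_to_score(rfe_ranks)
):
    print(f"{name:5s} coef={coef_score:.4f} rfe={rfe_score:.1f}")
```

With six features, `rank_to_score` yields 0.2, 0.8, 0.6, 1.0, 0.4, 0.0 for the ranks above, which matches the RFEOLS column. In practice the ranks would come from a fitted RFE object rather than being typed in by hand.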