Spread Metrics#
Alpha Importance Score#
The Alpha Importance Score (AIS) metric evaluates the proportion of features that have an importance value greater than or equal to a specified threshold (alpha). This metric helps in understanding the concentration of importance among the most significant features, providing insights into model interpretability and feature relevance.
Methodology#
Filter Feature Importance: Identify features with importance values greater than or equal to a specified threshold, alpha.
Calculate Alpha Importance: Compute the ratio of the number of features with importance values greater than or equal to alpha to the total number of features.
Mathematical Representation#
Let \(I\) denote the list of feature importance values, and \(\alpha\) be the specified threshold. The filtered feature importance list \(I_\alpha\) is given by:
\[I_\alpha = \{\, i \in I : i \geq \alpha \,\}\]
The Alpha Importance Score (\(AIS\)) is then calculated as:
\[AIS = \frac{|I_\alpha|}{|I|}\]
where \(|I_\alpha|\) is the number of features with importance greater than or equal to \(\alpha\), and \(|I|\) is the total number of features.
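The ratio described above can be sketched in a few lines of Python. This is a minimal illustration rather than the library's implementation: the function name `alpha_importance_score` and the sample importance values are hypothetical, and the input is assumed to be a plain list of importance values.

```python
from typing import Sequence


def alpha_importance_score(importances: Sequence[float], alpha: float) -> float:
    """Fraction of features whose importance is at or above alpha.

    Hypothetical helper for illustration; not taken from any library.
    """
    if len(importances) == 0:
        raise ValueError("importances must not be empty")
    # I_alpha: keep only the importance values that reach the threshold.
    selected = [value for value in importances if value >= alpha]
    # AIS = |I_alpha| / |I|
    return len(selected) / len(importances)


# Illustrative values only (assumed, not from the source).
importances = [0.5, 0.25, 0.125, 0.125]
score = alpha_importance_score(importances, alpha=0.25)
print(score)  # two of the four features reach alpha
```

Because the score counts features rather than summing their importance, its value depends strongly on how \(\alpha\) is chosen relative to the scale of the importance values.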
Interpretation#
High score: Indicates that a large proportion of features have importance values at or above \(\alpha\), suggesting that the model draws on many features rather than a few.
Low score: Indicates that only a small proportion of features reach \(\alpha\), suggesting that importance at or above the threshold is concentrated in a few key features.
Feature Spread Metric#
Methodology#
Normalize Feature Importance: For a given set of features \(F = [F_1, F_2, \ldots, F_n]\), normalize the feature importance using a normalization function \(N\) such that the sum of the normalized feature importance values equals 1. Mathematically, for each feature \(F_i\), the normalized feature importance \(P(F_i)\) is given by:
\[P(F_i) = \frac{F_i}{\sum_{j=1}^{n} F_j}\]
Calculate Jensen–Shannon Divergence: Jensen–Shannon divergence (JSD) measures the similarity between the normalized feature importance distribution and a uniform distribution. It is calculated using the formula:
\[JSD(P \| Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M)\]
where \(D_{KL}\) is the Kullback-Leibler divergence, \(P\) is the normalized feature importance distribution, \(Q\) is the uniform distribution, and \(M = \frac{1}{2}(P + Q)\).
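The two steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, importance values are assumed to be non-negative with a positive sum, and the logarithm is taken in base 2 (an assumption, since the source does not state the base) so that the divergence falls between 0 and 1. The sketch returns the raw divergence only and does not apply any inversion.

```python
import math
from typing import List, Sequence


def normalize(importances: Sequence[float]) -> List[float]:
    """Scale importance values so that they sum to 1."""
    total = sum(importances)
    if total <= 0:
        raise ValueError("importance values must have a positive sum")
    return [value / total for value in importances]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D_KL(p || q), base 2 (assumed).

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi, 2) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD(p || q) = 0.5 * D_KL(p || m) + 0.5 * D_KL(q || m), m = (p + q) / 2."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def divergence_from_uniform(importances: Sequence[float]) -> float:
    """JSD between the normalized importances and the uniform distribution."""
    p = normalize(importances)
    n = len(p)
    q = [1.0 / n] * n  # uniform distribution over the n features
    return jensen_shannon_divergence(p, q)


# Illustrative values only (assumed, not from the source).
even = divergence_from_uniform([1.0, 1.0, 1.0, 1.0])
skewed = divergence_from_uniform([10.0, 1.0, 1.0, 1.0])
print(even, skewed)
```

Evenly spread importances give a divergence of zero, and the divergence grows as importance concentrates in fewer features.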
Mathematical Representation#
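Let \(P\) denote the normalized feature importance vector and \(Q\) denote the uniform distribution vector. The Jensen–Shannon Divergence (\(JSD\)) is given by:
\[JSD(P \| Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M)\]
where:
\[M = \frac{1}{2}(P + Q)\]
and
\[D_{KL}(P \| M) = \sum_{i=1}^{n} P_i \log \frac{P_i}{M_i}\]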
Interpretation#
We use the inverse of the Jensen–Shannon Divergence as the Spread Divergence Metric.
High Score: Indicates that the distribution of feature importance is close to uniform, suggesting that the model relies on a broader set of features. This implies lower interpretability.
Low Score: Indicates that the distribution of feature importance is far from uniform, suggesting that the model relies on fewer, more significant features. This implies higher interpretability.