Recommender Systems
Equality of Outcome Metrics
Mean Absolute Deviation: The difference between the average predicted scores of group a and group b.
A large absolute value of MAD indicates differential treatment of group a and group b. A positive value indicates that group a received higher scores on average, while a negative value indicates higher scores for group b.
Formally, \(MAD = \bar{\hat{y}}_{a} - \bar{\hat{y}}_{b}\), where \(\bar{\hat{y}}_{g}\) is the average predicted score of group \(g\).
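The description above can be sketched in a few lines of Python. This is only an illustration, assuming the metric is the signed difference of group means; the function name `mean_score_difference` and the sample scores are hypothetical and not taken from any library.

```python
from statistics import mean


def mean_score_difference(scores_a, scores_b):
    """Signed difference between the mean predicted scores of two groups.

    Assumption: the metric is mean(group a) - mean(group b), so a positive
    result means group a received higher scores on average.
    """
    return mean(scores_a) - mean(scores_b)


# Hypothetical predicted scores for each group.
scores_a = [4.0, 3.0, 5.0]  # mean 4.0
scores_b = [3.0, 3.0, 3.0]  # mean 3.0

mad = mean_score_difference(scores_a, scores_b)
print(mad)  # 1.0: group a is scored higher on average
```

Swapping the arguments flips the sign, which is why the magnitude, not the raw value, signals differential treatment.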
Exposure Total Variation: This metric computes the total variation norm between the group a exposure distribution and the group b exposure distribution.
A total variation of 0 is desired, which occurs when the distributions are equal. The maximum value is 1, indicating that the two distributions are as far apart as possible.
where \(item_{dist_{g}}\) is the distribution of items for group \(g\).
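A minimal sketch of this computation follows. It assumes the common convention of half the L1 distance between the two distributions, which is consistent with the stated maximum of 1, and it assumes exposure is estimated from how often each item appears in a group's recommendation lists. Both helper names are hypothetical.

```python
def exposure_distribution(recommendations, n_items):
    """Normalised item exposure from per-user lists of recommended item indices."""
    counts = [0] * n_items
    for user_items in recommendations:
        for item in user_items:
            counts[item] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no recommendations given")
    return [c / total for c in counts]


def total_variation(p, q):
    """Total variation distance, assuming the 0.5 * sum(|p_i - q_i|) convention."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same items")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical recommendation lists (item indices) for users in each group.
dist_a = exposure_distribution([[0, 1], [0, 2]], n_items=3)  # [0.5, 0.25, 0.25]
dist_b = exposure_distribution([[2, 1], [2, 0]], n_items=3)  # [0.25, 0.25, 0.5]

tv = total_variation(dist_a, dist_b)
print(tv)  # 0.25
```

Under this convention the distance reaches 1 only when the two groups are shown entirely different items.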
Exposure KL Divergence: This metric computes the KL divergence from the group a exposure distribution to the group b exposure distribution.
A KL divergence of 0 is desired, which occurs when the distributions are equal. Higher values of the KL divergence indicate a larger difference between the exposure distributions of group a and group b.
where \(item_{dist_{g}}\) is the distribution of items for group \(g\).
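As an illustration, the divergence can be sketched as below. The direction (summing \(p_a \log(p_a / p_b)\)), the natural logarithm, and the small `eps` floor that avoids division by zero are all assumptions made for this sketch; the function name is hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL divergence sum(p_i * log(p_i / q_i)) between discrete distributions.

    Assumptions: natural logarithm; terms with p_i == 0 contribute nothing;
    q_i is floored at `eps` so items never shown to group b do not divide by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same items")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            total += pi * math.log(pi / max(qi, eps))
    return total


# Hypothetical exposure distributions over two items.
dist_a = [0.5, 0.5]
dist_b = [0.9, 0.1]

kl = kl_divergence(dist_a, dist_b)
print(round(kl, 4))  # 0.5108
```

Unlike the total variation, this quantity is not symmetric, so swapping the groups generally changes the value.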
Equality of Opportunity Metrics
Average Precision Ratio: This metric computes the ratio of the average precision (over users) of group b to that of group a.
A value of 1 is desired. Lower values show bias against group b. Higher values show bias against group a.
Formally, the ratio is \(AP_{b} / AP_{a}\), where \(AP_{g}\) is the average precision of group \(g\).
Average Recall Ratio: This metric computes the ratio of the average recall (over users) of group b to that of group a.
A value of 1 is desired. Lower values show bias against group b. Higher values show bias against group a.
Formally, the ratio is \(AR_{b} / AR_{a}\), where \(AR_{g}\) is the average recall of group \(g\).
Average F1 Ratio: This metric computes the ratio of the average F1 score (over users) of group b to that of group a.
A value of 1 is desired. Lower values show bias against group b. Higher values show bias against group a.
Formally, the ratio is \(F1_{b} / F1_{a}\), where \(F1_{g}\) is the average F1 score of group \(g\).
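The three ratio metrics share the same shape, so one sketch can cover them. It assumes that precision, recall and F1 are computed per user from a set of recommended items and a set of relevant items, then averaged within each group before taking the ratio of group b over group a. The function names and sample data are hypothetical.

```python
from statistics import mean


def user_prf(recommended, relevant):
    """Precision, recall and F1 for one user's recommended and relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def group_ratios(users_a, users_b):
    """Return (precision, recall, F1) ratios of group b over group a.

    Each argument is a list of (recommended, relevant) pairs, one per user.
    Note: this sketch raises ZeroDivisionError if a group a average is 0.
    """
    avg_a = [mean(col) for col in zip(*(user_prf(r, t) for r, t in users_a))]
    avg_b = [mean(col) for col in zip(*(user_prf(r, t) for r, t in users_b))]
    return tuple(b / a for a, b in zip(avg_a, avg_b))


# Hypothetical (recommended, relevant) item lists per user.
users_a = [([1, 2], [1, 2]), ([3, 4], [3, 5])]  # averages: 0.75, 0.75, 0.75
users_b = [([1, 2], [1, 3]), ([3, 4], [5, 6])]  # averages: 0.25, 0.25, 0.25

precision_ratio, recall_ratio, f1_ratio = group_ratios(users_a, users_b)
```

In this example every ratio is 1/3, well below 1, which would point to recommendations serving group b worse than group a.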
Item Metrics
Aggregate Diversity: Given a matrix of scores, this metric computes the recommended items for each user, selecting either the highest-scored items or those above an input threshold. It then returns the aggregate diversity: the proportion of recommended items out of all possible items.
A value of 1 is desired: a high proportion of items should be shown to avoid the ‘rich get richer’ effect.
Formally, aggregate diversity is \(\frac{Items\; shown}{Items}\), where \(Items\; shown\) is the number of distinct items shown to users and \(Items\) is the total number of items.
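The procedure can be sketched as follows, assuming the score matrix is a list of per-user score lists. The parameter names `k` and `threshold` are hypothetical stand-ins for the two selection modes described above, and treating "above" as strictly greater is an assumption.

```python
def recommended_items(user_scores, k=None, threshold=None):
    """Item indices recommended to one user.

    If `threshold` is given, items scoring strictly above it are selected;
    otherwise the `k` highest-scored items are selected (all items if k is None).
    """
    if threshold is not None:
        return {i for i, s in enumerate(user_scores) if s > threshold}
    ranked = sorted(range(len(user_scores)), key=lambda i: user_scores[i], reverse=True)
    return set(ranked[:k])


def aggregate_diversity(scores, k=None, threshold=None):
    """Proportion of all items that are recommended to at least one user."""
    shown = set()
    for user_scores in scores:
        shown |= recommended_items(user_scores, k, threshold)
    return len(shown) / len(scores[0])


# Hypothetical score matrix: 3 users (rows) by 4 items (columns).
scores = [
    [0.9, 0.1, 0.3, 0.2],
    [0.8, 0.7, 0.1, 0.2],
    [0.6, 0.2, 0.1, 0.5],
]

print(aggregate_diversity(scores, k=1))            # 0.25: only item 0 is ever shown
print(aggregate_diversity(scores, k=2))            # 1.0: every item is shown to someone
print(aggregate_diversity(scores, threshold=0.65)) # 0.5: items 0 and 1 pass the threshold
```

With `k=1` every user receives the same top item, which is the ‘rich get richer’ pattern the metric is meant to expose.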
Gini index: Measures the inequality across the frequency distribution of the recommended items.
An algorithm that recommends each item the same number of times (uniform distribution) will have a Gini index of 0, while one with extreme inequality will have a Gini index close to 1.
where \(item_{dist_{i}}\) is the distribution of items.
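One common way to compute a Gini index from item recommendation counts is sketched below. The sorted-rank formulation used here is an assumption for illustration and may differ from the exact formula intended above; with it, the value for a single dominant item is \(1 - 1/n\) for \(n\) items, approaching 1 as the catalogue grows.

```python
def gini_index(counts):
    """Gini index of item recommendation counts (sorted-rank formulation).

    Assumption: G = sum_i (2*i - n - 1) * x_i / (n * sum(x)), with counts x
    sorted in ascending order and i running from 1 to n.
    """
    x = sorted(counts)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / (n * total)


print(gini_index([5, 5, 5, 5]))   # 0.0: every item recommended equally often
print(gini_index([0, 0, 0, 12]))  # 0.75: one item receives every recommendation
```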
Exposure Distribution Entropy: This metric measures the entropy of the item exposure distribution.
A low entropy (close to 0) indicates high certainty as to which item will be shown. Higher entropy indicates a more uniform exposure distribution. The scale depends on the number of items: the maximum, reached by a uniform distribution, is the logarithm of the number of items.
where \(item_{dist_{i}}\) is the distribution of items.
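The behaviour described above can be checked with a small sketch of Shannon entropy over item exposure counts. The natural logarithm and the function name are assumptions made for this illustration.

```python
import math


def exposure_entropy(counts):
    """Shannon entropy (natural log) of the item exposure distribution.

    `counts` holds how many times each item was recommended; items with a
    zero count contribute nothing to the sum.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no recommendations given")
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


concentrated = exposure_entropy([10, 0, 0, 0])  # a single item gets all exposure
uniform = exposure_entropy([3, 3, 3, 3])        # equals log(4) for 4 items
```

Because the maximum grows with the catalogue size, comparing entropies across systems with different numbers of items requires normalising, for example by dividing by the logarithm of the number of items.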
Average Recommendation Popularity: This metric computes the average recommendation popularity of items over users. We define the recommendation popularity as the average number of times an item is recommended.
A low value is desired and suggests that items have been recommended equally across the population.
where \(item_{dist_{i}}\) is the distribution of items.
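A possible reading of this definition is sketched below: count how often each item is recommended overall, average those counts over each user's list, then average over users. This interpretation and the function name are assumptions; the exact formula may differ.

```python
from collections import Counter
from statistics import mean


def average_recommendation_popularity(recommendations):
    """Mean over users of the mean recommendation count of their items.

    `recommendations` is a list of per-user lists of recommended item ids.
    Assumption: an item's popularity is the number of lists it appears in.
    Users with an empty list are skipped.
    """
    counts = Counter(item for recs in recommendations for item in recs)
    per_user = [mean(counts[item] for item in recs) for recs in recommendations if recs]
    return mean(per_user)


# Hypothetical recommendation lists: item 0 is pushed to everyone.
popular = average_recommendation_popularity([[0, 1], [0, 2], [0, 3]])
# Hypothetical recommendation lists: every item is shown exactly once.
spread = average_recommendation_popularity([[0, 1], [2, 3], [4, 5]])
```

Here `popular` is 2 and `spread` is 1, so the list set that repeats one item for every user scores higher, matching the preference for low values.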