holisticai.bias.metrics.cluster_dist_l1
- holisticai.bias.metrics.cluster_dist_l1(group_a, group_b, y_pred)
Cluster Distribution Total Variation
This function computes, for group_a and for group_b separately, the proportion of group members assigned to each cluster. It then returns the total variation distance between these two distributions.
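As a reading aid, the quantity can be written out as follows. The symbols are assumptions introduced here and do not appear in the library documentation: $K$ is the set of cluster labels in y_pred, and $p_a(k)$, $p_b(k)$ are the fractions of group_a and group_b members assigned to cluster $k$. This is the standard definition of total variation distance between two discrete distributions, and it agrees with the value shown in the Examples section.

```latex
\[
  \mathrm{TV}(p_a, p_b) \;=\; \frac{1}{2} \sum_{k \in K} \bigl| p_a(k) - p_b(k) \bigr|
\]
```

The factor of one half is what keeps the result between 0 and 1; without it the sum is the plain L1 distance, which ranges up to 2.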
Interpretation
A value of 0 is desired: it indicates that both groups are distributed identically across the clusters. The metric ranges from 0 to 1, with higher values indicating that the groups are distributed more differently. A value of 1 means the two groups share no cluster.
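The two ends of the range can be checked with a minimal NumPy sketch. The helper `tv_between_groups` is a hypothetical name used only for illustration; it is not the holisticai implementation, and it assumes the membership vectors are binary and that each group has at least one member.

```python
import numpy as np


def tv_between_groups(group_a, group_b, y_pred):
    """Hypothetical reference helper, NOT the holisticai implementation.

    Returns half the L1 distance between the per-group cluster
    distributions, i.e. the total variation distance.
    """
    mask_a = np.asarray(group_a).astype(bool)
    mask_b = np.asarray(group_b).astype(bool)
    y_pred = np.asarray(y_pred)

    # Every cluster label that appears in the predictions.
    clusters = np.unique(y_pred)

    # Fraction of each group's members that falls in each cluster.
    p_a = np.array([(y_pred[mask_a] == c).mean() for c in clusters])
    p_b = np.array([(y_pred[mask_b] == c).mean() for c in clusters])

    return 0.5 * np.abs(p_a - p_b).sum()


group_a = np.array([1, 1, 0, 0])
group_b = np.array([0, 0, 1, 1])

# Both groups split evenly over clusters 0 and 1: identical distributions.
same = tv_between_groups(group_a, group_b, np.array([0, 1, 0, 1]))

# group_a sits entirely in cluster 0, group_b entirely in cluster 1.
disjoint = tv_between_groups(group_a, group_b, np.array([0, 0, 1, 1]))

print(same)      # 0.0
print(disjoint)  # 1.0
```

Intermediate values arise whenever the groups overlap in some clusters but in different proportions.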
Parameters
- group_a : array-like
Group membership vector (binary)
- group_b : array-like
Group membership vector (binary)
- y_pred : array-like
Cluster predictions (categorical)
Returns
- float
Cluster Distribution Total Variation
Examples
>>> import numpy as np
>>> from holisticai.bias.metrics import cluster_dist_l1
>>> group_a = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
>>> group_b = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
>>> y_pred_cluster = np.array([0, 1, 1, 2, 0, 0, 0, 0, 1, 2])
>>> cluster_dist_l1(group_a, group_b, y_pred_cluster)
0.4166666666666667