holisticai.bias.metrics.cluster_dist_kl
- holisticai.bias.metrics.cluster_dist_kl(group_a, group_b, y_pred)
Cluster Distribution KL
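This function computes how the members of group_a and of group_b are distributed across the clusters. It then returns the Kullback-Leibler (KL) divergence from the distribution of group_a to the distribution of group_b. The KL divergence is not symmetric, so swapping group_a and group_b generally changes the result.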
Interpretation
A value of 0 is desired: it indicates that both groups are distributed identically across the clusters. The value is never negative, and higher values indicate that the two distributions differ more.
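To make the interpretation concrete, here is a minimal sketch that evaluates the KL divergence directly on small, made-up cluster distributions. The helper `kl_divergence` is hypothetical and not part of holisticai; it assumes the natural logarithm and that every entry of `q` is positive.

```python
import math


def kl_divergence(p, q):
    """KL(p, q) with natural log; hypothetical helper, not part of holisticai."""
    # Zero-probability entries of p contribute nothing to the sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up distributions over two clusters, for illustration only.
same = kl_divergence([0.5, 0.5], [0.5, 0.5])
skewed = kl_divergence([0.9, 0.1], [0.5, 0.5])
reverse = kl_divergence([0.5, 0.5], [0.9, 0.1])

print(same)               # 0.0: identical distributions
print(skewed > same)      # True: the distributions differ
print(skewed != reverse)  # True: KL divergence is not symmetric
```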
Parameters
- group_a : array-like
Group membership vector (binary)
- group_b : array-like
Group membership vector (binary)
- y_pred : array-like
Cluster predictions (categorical)
Returns
- float
Cluster Distribution KL
Notes
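\(KL(P_a, P_b) = \sum_{k} P_a(k) \log \frac{P_a(k)}{P_b(k)}\)
where \(P_a(k)\) and \(P_b(k)\) are the fractions of group_a and group_b members assigned to cluster \(k\), and \(\log\) is the natural logarithm.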
Examples
>>> import numpy as np
>>> from holisticai.bias.metrics import cluster_dist_kl
>>> group_a = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
>>> group_b = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
>>> y_pred_cluster = np.array([0, 1, 1, 2, 0, 0, 0, 0, 1, 2])
>>> cluster_dist_kl(group_a, group_b, y_pred_cluster)
0.4054651081081642
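The result in the example can be reproduced by hand, which may help when checking inputs. The following is a dependency-free sketch, not the library's implementation. The names `cluster_distribution` and `cluster_dist_kl_sketch` are hypothetical, and the sketch assumes non-empty groups, membership encoded as 1, and the natural logarithm. Returning infinity when a cluster contains group_a members but no group_b members follows the usual mathematical convention, which may differ from what holisticai does.

```python
import math
from collections import Counter


def cluster_distribution(group, y_pred):
    """Fraction of the group's members that fall in each cluster."""
    labels = [c for g, c in zip(group, y_pred) if g == 1]
    counts = Counter(labels)
    return {cluster: n / len(labels) for cluster, n in counts.items()}


def cluster_dist_kl_sketch(group_a, group_b, y_pred):
    """KL(P_a, P_b) over clusters; an illustrative sketch only."""
    p_a = cluster_distribution(group_a, y_pred)
    p_b = cluster_distribution(group_b, y_pred)
    total = 0.0
    for cluster, p in p_a.items():
        q = p_b.get(cluster, 0.0)
        if q == 0.0:
            # Assumed convention: divergence is infinite when P_b(k) = 0 < P_a(k).
            return math.inf
        total += p * math.log(p / q)
    return total


group_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
group_b = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred_cluster = [0, 1, 1, 2, 0, 0, 0, 0, 1, 2]

# P_a = (1/4, 1/2, 1/4) and P_b = (2/3, 1/6, 1/6) over clusters 0, 1, 2.
result = cluster_dist_kl_sketch(group_a, group_b, y_pred_cluster)
print(result)  # approximately 0.405465, i.e. ln(1.5)
```

Here the sum simplifies to ln(3) - ln(2) = ln(1.5), which matches the documented output up to floating-point rounding.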