holisticai.bias.metrics.cluster_dist_kl

holisticai.bias.metrics.cluster_dist_kl(group_a, group_b, y_pred)

Cluster Distribution KL

This function computes how the members of group_a and the members of group_b are distributed across the clusters. It then returns the KL divergence from the distribution of group_a to the distribution of group_b. The KL divergence is not symmetric, so the order of the two groups matters.

Interpretation

A value of 0 is desired: it indicates that both groups are distributed identically across the clusters. Higher values indicate that the two groups' cluster distributions differ more.

Parameters

group_a : array-like

Group membership vector (binary)

group_b : array-like

Group membership vector (binary)

y_pred : array-like

Cluster predictions (categorical)

Returns

float

Cluster Distribution KL

Notes

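\(KL(P_a,P_b) = \sum_k P_a(k) \log \frac{P_a(k)}{P_b(k)}\)

where \(P_a(k)\) and \(P_b(k)\) are the fractions of group_a and group_b members assigned to cluster \(k\).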
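To make the definition concrete, the following is a minimal sketch of the same calculation using only the Python standard library. It is not the library's implementation: the helper names `cluster_distribution` and `cluster_dist_kl_sketch` are hypothetical, the natural logarithm is assumed, and the sketch assumes that both groups are non-empty and that every cluster containing a group_a member also contains at least one group_b member.

```python
import math
from collections import Counter


def cluster_distribution(group, y_pred):
    """Return {cluster: fraction of the group's members in that cluster}."""
    # Keep the cluster label of every sample flagged as a group member.
    member_clusters = [c for g, c in zip(group, y_pred) if g == 1]
    counts = Counter(member_clusters)
    total = len(member_clusters)  # assumed > 0 (non-empty group)
    return {cluster: n / total for cluster, n in counts.items()}


def cluster_dist_kl_sketch(group_a, group_b, y_pred):
    """Hypothetical re-implementation of KL(P_a, P_b) with the natural log."""
    p_a = cluster_distribution(group_a, y_pred)
    p_b = cluster_distribution(group_b, y_pred)
    # Clusters with P_a(k) = 0 contribute nothing and never appear in p_a.
    # Assumption: p_b[k] exists (P_b(k) > 0) for every cluster k in p_a.
    return sum(p * math.log(p / p_b[k]) for k, p in p_a.items())


group_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
group_b = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred_cluster = [0, 1, 1, 2, 0, 0, 0, 0, 1, 2]

# P_a = (1/4, 1/2, 1/4) and P_b = (2/3, 1/6, 1/6) over clusters 0, 1, 2.
result = cluster_dist_kl_sketch(group_a, group_b, y_pred_cluster)
print(result)  # about 0.405, i.e. log(1.5)

# Identical cluster distributions give exactly 0.
same = cluster_dist_kl_sketch([1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1])
print(same)
```

With the inputs from the Examples section, the sketch agrees with the documented output up to floating-point rounding. Swapping the two groups generally gives a different value, because the divergence is taken from group_a to group_b.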

Examples

>>> import numpy as np
>>> from holisticai.bias.metrics import cluster_dist_kl
>>> group_a = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
>>> group_b = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
>>> y_pred_cluster = np.array([0, 1, 1, 2, 0, 0, 0, 0, 1, 2])
>>> cluster_dist_kl(group_a, group_b, y_pred_cluster)
0.4054651081081642