holisticai.bias.metrics.cluster_dist_l1

holisticai.bias.metrics.cluster_dist_l1(group_a, group_b, y_pred)

Cluster Distribution Total Variation

This function computes the distribution of group_a and group_b across clusters, i.e. the proportion of each group's members assigned to each cluster. It then returns the total variation distance (half the L1 distance) between these two distributions.
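The computation described above can be sketched by hand in NumPy. This is a minimal illustration, not the library's implementation: the helper name total_variation_sketch is hypothetical, and the sketch assumes both groups are non-empty and that the membership vectors are binary masks.

```python
import numpy as np

def total_variation_sketch(group_a, group_b, y_pred):
    """Hypothetical helper for illustration; not part of holisticai."""
    mask_a = np.asarray(group_a).astype(bool)
    mask_b = np.asarray(group_b).astype(bool)
    y_pred = np.asarray(y_pred)
    clusters = np.unique(y_pred)
    # Proportion of each group's members assigned to each cluster
    dist_a = np.array([(y_pred[mask_a] == c).mean() for c in clusters])
    dist_b = np.array([(y_pred[mask_b] == c).mean() for c in clusters])
    # Total variation distance is half the L1 distance between the distributions
    return 0.5 * np.abs(dist_a - dist_b).sum()

group_a = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
group_b = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 1, 1, 2, 0, 0, 0, 0, 1, 2])

# group_a is spread as [1/4, 1/2, 1/4], group_b as [4/6, 1/6, 1/6]
result = total_variation_sketch(group_a, group_b, y_pred)
print(result)
```

With these inputs the sketch gives 5/12, up to floating-point rounding.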

Interpretation

A value of 0 is desired: it indicates that both groups are distributed identically across the clusters. The metric ranges from 0 to 1, with higher values indicating that the two groups' distributions across clusters differ more.
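The two ends of the range can be illustrated with a small sketch. The helper tv_distance below is hypothetical (it is not a holisticai function) and assumes cluster labels are non-negative integers and both groups are non-empty.

```python
import numpy as np

def tv_distance(group_a, group_b, y_pred):
    """Hypothetical helper for illustration; assumes non-negative integer labels."""
    y_pred = np.asarray(y_pred)
    n_clusters = y_pred.max() + 1
    dists = []
    for group in (group_a, group_b):
        mask = np.asarray(group).astype(bool)
        # Count members per cluster, then normalise to proportions
        counts = np.bincount(y_pred[mask], minlength=n_clusters)
        dists.append(counts / counts.sum())
    return 0.5 * np.abs(dists[0] - dists[1]).sum()

group_a = np.array([1, 1, 0, 0])
group_b = np.array([0, 0, 1, 1])

# Each group is split evenly over clusters 0 and 1: identical distributions
same = tv_distance(group_a, group_b, np.array([0, 1, 0, 1]))

# group_a is entirely in cluster 0, group_b entirely in cluster 1: no overlap
disjoint = tv_distance(group_a, group_b, np.array([0, 0, 1, 1]))

print(same, disjoint)
```

The first call returns 0.0 and the second returns 1.0, matching the lower and upper bounds of the metric.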

Parameters

group_a : array-like

Group membership vector (binary)

group_b : array-like

Group membership vector (binary)

y_pred : array-like

Cluster predictions (categorical)

Returns

float

Cluster Distribution Total Variation

Examples

>>> import numpy as np
>>> from holisticai.bias.metrics import cluster_dist_l1
>>> group_a = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
>>> group_b = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
>>> y_pred_cluster = np.array([0, 1, 1, 2, 0, 0, 0, 0, 1, 2])
>>> cluster_dist_l1(group_a, group_b, y_pred_cluster)
0.4166666666666667