Variational Fair Clustering#

Note

Learning tasks: Clustering.

Introduction#

The Variational Fair Clustering method integrates a Kullback-Leibler (KL) fairness term with a variety of clustering objectives, including prototype-based and graph-based methods. This approach allows for controlling the trade-off between fairness and clustering quality, providing a scalable solution with convergence guarantees. Unlike existing combinatorial and spectral methods, this variational framework does not require eigenvalue decomposition, making it suitable for large-scale datasets.

Description#

The problem involves assigning \(N\) data points to \(K\) clusters while ensuring that the clusters are balanced with respect to \(J\) different demographic groups. The method introduces a fairness term based on the Kullback-Leibler (KL) divergence to measure the deviation from a target demographic distribution within each cluster.

  • Problem Definition: Given a set of data points \(X = \{ x_p \in \mathbb{R}^M, p = 1, \ldots, N \}\) and \(J\) demographic groups, the goal is to assign these points to \(K\) clusters such that the clusters are fair with respect to the demographic groups.

  • Main Characteristics: - Integrates a KL fairness term with clustering objectives. - Provides a tight upper bound based on concave-convex decomposition. - Scalable and suitable for large datasets. - Allows independent updates for each assignment variable, facilitating distributed computation.

  • Step-by-Step Description:

    1. Initialization:

      • Initialize cluster labels from initial seeds.

      • Initialize the soft cluster-assignment vector \(S\) from the labels.

    2. Iterative Optimization:

      • Compute the auxiliary variable \(a_p\) from \(S\).

      • Initialize the assignment probabilities \(s_p\).

      • Update \(s_p\) using the derived bound optimization.

      • Repeat until the objective function converges.

    3. Final Assignment:

      • Assign each data point to the cluster with the highest probability.

Basic Usage#

You can find an example of using the Variational Fair Clustering method in the following demo.

Read more about the class attributes and methods in the API reference: VariationalFairClustering.

References#

  1. Ziko, Imtiaz Masud, et al. “Variational fair clustering.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 12. 2021.