Minimal Cluster Modification for Fairness (MCMF)#
Note
Learning tasks: Clustering.
Introduction#
The Minimal Cluster Modification for Fairness (MCMF) method addresses the challenge of ensuring fairness in clustering results. Unlike existing approaches that modify clustering algorithms to produce fair results, MCMF focuses on post-processing the output of any partitional clustering method to make it fairer without significantly affecting clustering quality. This approach is particularly useful when dealing with already deployed clustering results or when fair versions of all clustering algorithms are not available.
Description#
Problem definition
The MCMF problem is defined as follows: Given a partitional clustering of a dataset, the goal is to minimally modify the clustering to improve its fairness with respect to a set of protected variables, while ensuring that the clustering quality is not significantly degraded. The problem is formulated as an Integer Linear Programming (ILP) problem, where the objective is to minimize the modifications required to achieve fairness.
Main features
Minimal Modification Problem: The MCMF problem is an approach to producing fair clustering by post-processing existing clustering results.
ILP Formulation with Total Unimodularity: The MCMF problem is formulated as an ILP, and it is shown that the constraint matrix introduced by the fairness requirements is totally unimodular (TU). This property ensures that any polynomial-time linear programming (LP) algorithm can obtain integer solutions to the MCMF problem.
Scalability: The method can handle large datasets with millions of instances due to the efficiency of well-known LP solvers that run in polynomial time. Variations of the problem, such as allowing overlapping clusters, also maintain the TU property.
Step-by-step description of the approach
Input: The method takes as input Cluster assignments generated by an existing clustering algorithm.
Formulation of the MCMF Problem: The MCMF problem is formulated as an ILP where the objective is to minimize the number of modifications to the clustering. The constraints ensure that the modified clustering meets the fairness requirements.
Solving the ILP: An LP solver is used to efficiently solve the ILP model, finding optimal cluster assignments that minimize distance while satisfying fairness criteria.
Basic Usage#
You can find an example of using the MCMF method in the following demo.
Read more about the class attributes and methods in the API reference: MCMF.
References#
Davidson, Ian, and S. S. Ravi. “Making existing clusterings fairer: Algorithms, complexity results and insights.”Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020.