Bias mitigation with “Popularity propensity” and “Two-sided fairness”#
This demo shows how to apply the “Popularity propensity” and “Two-sided fairness” methods to improve fairness in recommender systems.
Popularity propensity
Two-sided fairness
First, install the holisticai package if you haven’t already:
!pip install holisticai[all]
Then, import the necessary libraries.
[2]:
import numpy as np
import pandas as pd
from holisticai.datasets import load_dataset
from holisticai.bias.metrics import recommender_bias_metrics
from holisticai.bias.mitigation import PopularityPropensityMF
np.random.seed(0)
import warnings
warnings.filterwarnings("ignore")
Load the preprocessed “LastFM” dataset.
[3]:
dataset = load_dataset('lastfm')
df_pivot, p_attr = dataset['data_pivot'], dataset['p_attr']
[4]:
def explode(arr, num_items):
    out = np.zeros(num_items)
    out[arr] = 1
    return out
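To make the role of the `explode` helper concrete, here is a minimal sketch on a hypothetical toy input (the indices and catalog size are illustrative and not taken from the LastFM data). It turns a list of recommended item indices into a 0/1 indicator vector over the whole catalog.

```python
import numpy as np

def explode(arr, num_items):
    """Return a length-`num_items` vector with 1 at the indices in `arr`, 0 elsewhere."""
    out = np.zeros(num_items)
    out[arr] = 1
    return out

# Hypothetical example: a user is recommended items 0, 3 and 4 out of a 6-item catalog.
toy_vector = explode(np.array([0, 3, 4]), num_items=6)
print(toy_vector)  # [1. 0. 0. 1. 1. 0.]
```

Stacking one such vector per user gives the user-by-item matrix that the metric functions are fed later in the notebook.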
Bias mitigation#
Method: Popularity propensity#
Traditional implementation#
First, we will show the traditional implementation of the “Popularity Propensity” method.
[5]:
mf = PopularityPropensityMF(K=40, beta=0.02, steps=100, verbose=1)
data_matrix = df_pivot.fillna(0).to_numpy()
mf.fit(data_matrix)
[5]:
def recommended_items(model, data_matrix, k):
    recommended_items_mask = data_matrix > 0
    candidate_index = ~recommended_items_mask
    candidate_rating = model.pred * candidate_index
    return np.argsort(-candidate_rating, axis=1)[:, :k]
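The masking and top-k step in `recommended_items` can be illustrated on a small example. The sketch below uses a stand-in object with a `pred` attribute in place of the fitted model; the toy ratings and predictions are hypothetical values chosen only for illustration.

```python
import numpy as np
from types import SimpleNamespace

def recommended_items(model, data_matrix, k):
    # Items the user has already interacted with are zeroed out,
    # then the k highest remaining predictions are kept per user.
    seen_mask = data_matrix > 0
    candidate_rating = model.pred * ~seen_mask
    return np.argsort(-candidate_rating, axis=1)[:, :k]

# Hypothetical interactions: 2 users, 4 items (0 means "not seen").
toy_data = np.array([[5.0, 0.0, 0.0, 3.0],
                     [0.0, 4.0, 0.0, 0.0]])

# Stand-in for a fitted model: only the `pred` attribute is needed here.
toy_model = SimpleNamespace(pred=np.array([[4.8, 0.9, 2.1, 3.2],
                                           [1.5, 3.9, 0.4, 2.7]]))

top2 = recommended_items(toy_model, toy_data, k=2)
print(top2)  # user 0 -> items 2, 1; user 1 -> items 3, 0
```

Because seen items are set to a score of zero rather than removed, this approach assumes that unseen items have positive predicted scores; an unseen item with a negative prediction would rank below already-seen ones.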
[6]:
new_items = recommended_items(mf, data_matrix, 10)
new_recs = [explode(new_items[u], len(df_pivot.columns)) for u in range(df_pivot.shape[0])]
new_df_pivot_db = pd.DataFrame(new_recs, columns = df_pivot.columns)
mat = new_df_pivot_db.replace(0,np.nan).to_numpy()
df_popularity = recommender_bias_metrics(mat_pred=mat, metric_type='item_based')
df_popularity
[6]:
| Metric | Value | Reference |
|---|---|---|
| Aggregate Diversity | 0.999004 | 1 |
| GINI index | 0.440891 | 0 |
| Exposure Distribution Entropy | 6.579432 | - |
| Average Recommendation Popularity | 278.321600 | - |
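To give some intuition for the first two rows of the table, the sketch below computes an aggregate diversity and a Gini index from a toy binary recommendation matrix. These are common textbook definitions written for illustration; they are an assumption and not necessarily the exact formulas used inside `recommender_bias_metrics`.

```python
import numpy as np

# Hypothetical binary matrix: rows are users, columns are items,
# 1 means the item was recommended to that user.
toy_recs = np.array([[1, 1, 0, 0],
                     [1, 0, 1, 0],
                     [1, 1, 0, 0]])

# Exposure of each item: how many users it was recommended to.
exposure = toy_recs.sum(axis=0)

# Aggregate diversity: share of catalog items recommended at least once (ideal: 1).
agg_div = np.mean(exposure > 0)

# Gini index of the exposure counts (ideal: 0, i.e. equal exposure).
sorted_exp = np.sort(exposure)
n = sorted_exp.size
ranks = np.arange(1, n + 1)
gini = 2 * np.sum(ranks * sorted_exp) / (n * sorted_exp.sum()) - (n + 1) / n

print(agg_div, gini)
```

Here one of the four items is never shown, so the aggregate diversity falls below 1, and the uneven exposure counts push the Gini index above 0.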
Pipeline implementation#
[7]:
from holisticai.pipeline import Pipeline
inprocessing_model = PopularityPropensityMF(K=40, beta=0.02, steps=100, verbose=1)
pipeline = Pipeline(
    steps=[
        ("bm_inprocessing", inprocessing_model),
    ]
)
pipeline.fit(data_matrix)
rankings = pipeline.predict(data_matrix, top_n=10)
mat = rankings.pivot(columns='Y',index='X',values='score').replace(np.nan,0).to_numpy()
df = recommender_bias_metrics(mat_pred=mat>0, metric_type='item_based')
df_pop_pipeline = df.copy()
df_pop_pipeline
[7]:
| Metric | Value | Reference |
|---|---|---|
| Aggregate Diversity | 1.000000 | 1 |
| GINI index | 0.441953 | 0 |
| Exposure Distribution Entropy | 6.578349 | - |
| Average Recommendation Popularity | 275.996493 | - |
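The pipeline cell above reshapes the rankings with `pivot` before computing metrics. The sketch below reproduces that reshaping on a hypothetical toy table, assuming (as the column names in the cell suggest) that the rankings come in long format with a user column `X`, an item column `Y`, and a `score` column.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format rankings: one row per (user, recommended item).
toy_rankings = pd.DataFrame({
    "X": [0, 0, 1, 1],        # user index
    "Y": [2, 3, 0, 3],        # recommended item index
    "score": [0.9, 0.7, 0.8, 0.6],
})

# Long -> wide: one row per user, one column per item that appears in the rankings.
wide = toy_rankings.pivot(columns="Y", index="X", values="score")

# Missing (user, item) pairs become 0; `> 0` then gives a boolean recommendation mask.
toy_mat = wide.replace(np.nan, 0).to_numpy()
toy_mask = toy_mat > 0

print(list(wide.columns))  # [0, 2, 3]
print(toy_mask)
```

Note that item 1 is never recommended in this toy table, so it gets no column in the pivoted matrix. With this reshaping, the matrix passed to the metrics may therefore have fewer columns than the full catalog.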
Method: Two-sided fairness#
Traditional implementation for FairRec#
Now, we will show the traditional implementation of the “Two-sided fairness” method.
[8]:
from holisticai.bias.mitigation import FairRec
fr = FairRec(rec_size=10, MMS_fraction=0.5)
fr.fit(data_matrix)
[8]:
[FairRec]
Type: Bias Mitigation Inprocessing
[9]:
recommendations = fr.recommendation
new_recs = [explode(recommendations[key], len(df_pivot.columns)) for key in recommendations.keys()]
new_df_pivot_db = pd.DataFrame(new_recs, columns = df_pivot.columns)
mat = new_df_pivot_db.replace(0,np.nan).to_numpy()
df_tsf = recommender_bias_metrics(mat_pred=mat, metric_type='item_based')
df_tsf
[9]:
| Metric | Value | Reference |
|---|---|---|
| Aggregate Diversity | 1.000000 | 1 |
| GINI index | 0.421428 | 0 |
| Exposure Distribution Entropy | 6.567894 | - |
| Average Recommendation Popularity | 317.154227 | - |
Pipeline implementation for FairRec#
[10]:
from holisticai.pipeline import Pipeline
inprocessing_model = FairRec(rec_size=10, MMS_fraction=0.5)
pipeline = Pipeline(
    steps=[
        ("bm_inprocessing", inprocessing_model),
    ]
)
pipeline.fit(data_matrix)
rankings = pipeline.predict(data_matrix, top_n=10)
mat = rankings.pivot(columns='Y',index='X',values='score').replace(np.nan,0).to_numpy()
df_tsf_pipeline = recommender_bias_metrics(mat_pred=mat>0, metric_type='item_based')
df_tsf_pipeline
[10]:
| Metric | Value | Reference |
|---|---|---|
| Aggregate Diversity | 1.000000 | 1 |
| GINI index | 0.421428 | 0 |
| Exposure Distribution Entropy | 6.567894 | - |
| Average Recommendation Popularity | 317.154227 | - |
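The four result tables are easier to compare side by side. The sketch below is self-contained and simply copies the values reported in the outputs above into one DataFrame; the column labels are names chosen here for readability.

```python
import pandas as pd

metrics = [
    "Aggregate Diversity",
    "GINI index",
    "Exposure Distribution Entropy",
    "Average Recommendation Popularity",
]

# Values copied from the four output tables in this notebook.
comparison = pd.DataFrame(
    {
        "Popularity propensity": [0.999004, 0.440891, 6.579432, 278.321600],
        "Popularity propensity (pipeline)": [1.000000, 0.441953, 6.578349, 275.996493],
        "FairRec": [1.000000, 0.421428, 6.567894, 317.154227],
        "FairRec (pipeline)": [1.000000, 0.421428, 6.567894, 317.154227],
    },
    index=metrics,
)
print(comparison)
```

Inside the notebook itself, the same table could presumably be built from the existing results, for example with `pd.concat([df_popularity["Value"], df_pop_pipeline["Value"], df_tsf["Value"], df_tsf_pipeline["Value"]], axis=1)`. In these runs, FairRec gives the lower GINI index, while popularity propensity gives the lower average recommendation popularity.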