Bias mitigation with “Popularity propensity” and “Two-sided fairness”#

This demo shows how to apply the “Popularity propensity” and “Two-sided fairness” methods to improve fairness in recommender systems.

Popularity propensity
- Traditional implementation
- Pipeline implementation
Two-sided fairness
- Traditional implementation
- Pipeline implementation

First, install the holisticai package if you haven’t already:

!pip install holisticai[all]

Then, import the necessary libraries.

[2]:

import warnings

import numpy as np
import pandas as pd
from holisticai.datasets import load_dataset
from holisticai.bias.metrics import recommender_bias_metrics
from holisticai.bias.mitigation import PopularityPropensityMF

# Fix the random seed for reproducibility and silence library warnings.
np.random.seed(0)
warnings.filterwarnings("ignore")

Load the preprocessed “LastFM” dataset.

[3]:

dataset = load_dataset('lastfm')
df_pivot, p_attr = dataset['data_pivot'], dataset['p_attr']

[4]:

def explode(arr, num_items):
    # Turn a list of recommended item indices into a 0/1 vector of length num_items.
    out = np.zeros(num_items)
    out[arr] = 1
    return out
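
As a quick check of what `explode` does, the sketch below applies it to a made-up list of three item indices in a six-item catalogue. The indices, the catalogue size and the name `toy_mask` are illustrative only and do not come from the LastFM data.

```python
import numpy as np

def explode(arr, num_items):
    # Same helper as in the notebook: mark the given item indices with 1.
    out = np.zeros(num_items)
    out[arr] = 1
    return out

# Hypothetical user whose recommended items are 0, 2 and 5.
toy_mask = explode(np.array([0, 2, 5]), 6)
print(toy_mask)  # [1. 0. 1. 0. 0. 1.]
```

Stacking one such vector per user gives the binary user-item recommendation matrix that the metric functions are fed later on.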

Bias mitigation#

Method: Popularity propensity#

Traditional implementation#

First, we show the traditional implementation of the “Popularity propensity” method, in which the model is fitted directly on the user-item matrix.

[5]:

mf = PopularityPropensityMF(K=40, beta=0.02, steps=100, verbose=1)
data_matrix = df_pivot.fillna(0).to_numpy()
mf.fit(data_matrix)

[5]:

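def recommended_items(model, data_matrix, k):
    # Items the user has already interacted with are excluded from the candidates.
    seen_mask = data_matrix > 0
    candidate_mask = ~seen_mask
    # Zero out the predicted scores of seen items, then keep the k highest-scoring items per user.
    candidate_rating = model.pred * candidate_mask
    return np.argsort(-candidate_rating, axis=1)[:, :k]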
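
To see how the top-k selection behaves, here is a minimal sketch on made-up data. The `toy_model` object is a stand-in (a `SimpleNamespace` with a `pred` attribute), not a fitted `PopularityPropensityMF`, and all numbers are invented for illustration.

```python
import numpy as np
from types import SimpleNamespace

def recommended_items(model, data_matrix, k):
    # Same logic as in the notebook: zero out seen items, rank the rest.
    seen_mask = data_matrix > 0
    candidate_rating = model.pred * ~seen_mask
    return np.argsort(-candidate_rating, axis=1)[:, :k]

# Two hypothetical users, four items; non-zero entries are observed interactions.
toy_ratings = np.array([[5, 0, 0, 0],
                        [0, 3, 0, 0]])
# Stand-in for a fitted model exposing predicted scores in `.pred`.
toy_model = SimpleNamespace(pred=np.array([[4.9, 0.2, 0.7, 0.5],
                                           [0.9, 2.8, 0.1, 0.4]]))

top2 = recommended_items(toy_model, toy_ratings, 2)
print(top2)  # [[2 3]
             #  [0 3]]
```

Because seen items are set to a score of 0 rather than removed, they could outrank unseen items whose predicted score is negative. If that matters for a given model, masking with `-np.inf` instead of 0 is one possible alternative.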

[6]:

# Top-10 unseen items per user, converted to a binary user-item recommendation matrix.
new_items = recommended_items(mf, data_matrix, 10)
new_recs = [explode(new_items[u], len(df_pivot.columns)) for u in range(df_pivot.shape[0])]
new_df_pivot_db = pd.DataFrame(new_recs, columns=df_pivot.columns)

# Mark non-recommended items as NaN before computing the item-based metrics.
mat = new_df_pivot_db.replace(0, np.nan).to_numpy()
df_popularity = recommender_bias_metrics(mat_pred=mat, metric_type='item_based')
df_popularity

[6]:

	Value	Reference
Metric
Aggregate Diversity	0.999004	1
GINI index	0.440891	0
Exposure Distribution Entropy	6.579432	-
Average Recommendation Popularity	278.321600	-
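
For intuition about the first two rows of this table, the sketch below computes a catalogue-coverage score and a Gini coefficient on a tiny made-up recommendation matrix. The function names `aggregate_diversity` and `gini_index` and the formulas are assumptions based on the standard definitions; the exact computation inside `recommender_bias_metrics` may differ.

```python
import numpy as np

def aggregate_diversity(rec_matrix):
    # Share of catalogue items that appear in at least one user's list.
    return (rec_matrix.sum(axis=0) > 0).mean()

def gini_index(rec_matrix):
    # Standard Gini coefficient of item exposure counts:
    # 0 means every item is recommended equally often.
    counts = np.sort(rec_matrix.sum(axis=0).astype(float))
    n = counts.size
    ranks = np.arange(1, n + 1)
    return (2 * ranks - n - 1) @ counts / (n * counts.sum())

# Three hypothetical users, four items; 1 = item was recommended.
toy_recs = np.array([[1, 1, 0, 0],
                     [1, 0, 1, 0],
                     [1, 1, 0, 0]])

agg = aggregate_diversity(toy_recs)  # 3 of 4 items are ever recommended
gini = gini_index(toy_recs)          # exposure counts are 3, 2, 1, 0
print(agg, round(gini, 4))
```

Read this way, an Aggregate Diversity close to 1 means almost every item is recommended to someone, while a GINI index further from 0 means exposure is concentrated on fewer items.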

Pipeline implementation#

[7]:

from holisticai.pipeline import Pipeline

inprocessing_model = PopularityPropensityMF(K=40, beta=0.02, steps=100, verbose=1)

pipeline = Pipeline(
    steps=[
        ("bm_inprocessing", inprocessing_model),
    ]
)

pipeline.fit(data_matrix)

rankings = pipeline.predict(data_matrix, top_n=10)
mat = rankings.pivot(columns='Y', index='X', values='score').replace(np.nan, 0).to_numpy()
df = recommender_bias_metrics(mat_pred=mat > 0, metric_type='item_based')
df_pop_pipeline = df.copy()
df_pop_pipeline

[7]:

	Value	Reference
Metric
Aggregate Diversity	1.000000	1
GINI index	0.441953	0
Exposure Distribution Entropy	6.578349	-
Average Recommendation Popularity	275.996493	-
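
The `pivot` step in the pipeline cell reshapes the rankings into a matrix before the metrics are computed. The sketch below reproduces that reshaping on a hand-written table. It assumes, as the column names in the call suggest, that `X` identifies the user, `Y` the item and `score` the predicted score; the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format rankings: one row per (user, item) recommendation.
toy_rankings = pd.DataFrame({
    "X": [0, 0, 1, 1],
    "Y": [2, 3, 0, 3],
    "score": [0.7, 0.5, 0.9, 0.4],
})

# One row per user, one column per recommended item; missing pairs become 0.
wide = toy_rankings.pivot(columns="Y", index="X", values="score")
toy_mat = wide.replace(np.nan, 0).to_numpy()

# Boolean mask of recommended items, as passed to the metrics via `mat > 0`.
toy_mask = toy_mat > 0
print(wide.columns.tolist())  # [0, 2, 3]
print(toy_mask)
```

In this toy case item 1 is never recommended, so it gets no column at all in the pivoted matrix. Metrics that depend on catalogue size, such as Aggregate Diversity, may therefore be computed over recommended items only when the matrix is built this way.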

Method: Two-sided fairness#

Traditional implementation for FairRec#

Now, we show the traditional implementation of the “Two-sided fairness” method.

[8]:

from holisticai.bias.mitigation import FairRec

fr = FairRec(rec_size=10, MMS_fraction=0.5)
fr.fit(data_matrix)

[8]:

[FairRec]

FairRec(rec_size=10, MMS_fraction=0.5)

Type: Bias Mitigation Inprocessing

[9]:

# Recommended item indices per user, converted to a binary user-item matrix.
recommendations = fr.recommendation
new_recs = [explode(recommendations[key], len(df_pivot.columns)) for key in recommendations.keys()]
new_df_pivot_db = pd.DataFrame(new_recs, columns=df_pivot.columns)

# Mark non-recommended items as NaN before computing the item-based metrics.
mat = new_df_pivot_db.replace(0, np.nan).to_numpy()
df_tsf = recommender_bias_metrics(mat_pred=mat, metric_type='item_based')
df_tsf

[9]:

	Value	Reference
Metric
Aggregate Diversity	1.000000	1
GINI index	0.421428	0
Exposure Distribution Entropy	6.567894	-
Average Recommendation Popularity	317.154227	-

Pipeline implementation for FairRec#

[10]:

from holisticai.pipeline import Pipeline

inprocessing_model = FairRec(rec_size=10, MMS_fraction=0.5)

pipeline = Pipeline(
    steps=[
        ("bm_inprocessing", inprocessing_model),
    ]
)

pipeline.fit(data_matrix)

rankings = pipeline.predict(data_matrix, top_n=10)
mat = rankings.pivot(columns='Y', index='X', values='score').replace(np.nan, 0).to_numpy()
df_tsf_pipeline = recommender_bias_metrics(mat_pred=mat > 0, metric_type='item_based')
df_tsf_pipeline

[10]:

	Value	Reference
Metric
Aggregate Diversity	1.000000	1
GINI index	0.421428	0
Exposure Distribution Entropy	6.567894	-
Average Recommendation Popularity	317.154227	-
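
To compare the four runs side by side, the reported values can be collected in a single table. The sketch below hard-codes the numbers from the outputs above so that it runs on its own. In the notebook itself, the same table could likely be built by concatenating the `Value` columns of `df_popularity`, `df_pop_pipeline`, `df_tsf` and `df_tsf_pipeline`. The column labels are illustrative.

```python
import pandas as pd

metrics = [
    "Aggregate Diversity",
    "GINI index",
    "Exposure Distribution Entropy",
    "Average Recommendation Popularity",
]

# Values copied from the four metric tables shown in this demo.
comparison = pd.DataFrame(
    {
        "Popularity propensity": [0.999004, 0.440891, 6.579432, 278.321600],
        "Popularity propensity (pipeline)": [1.000000, 0.441953, 6.578349, 275.996493],
        "Two-sided fairness": [1.000000, 0.421428, 6.567894, 317.154227],
        "Two-sided fairness (pipeline)": [1.000000, 0.421428, 6.567894, 317.154227],
    },
    index=metrics,
)
print(comparison)
```

On these numbers, the two-sided fairness runs have the lower GINI index, the popularity propensity runs have the lower Average Recommendation Popularity, and the traditional and pipeline variants of each method agree closely.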