Quickstart

This quickstart walks through a first bias assessment with the holisticai library: loading an example dataset, training a classifier, and measuring bias in its predictions.

Install holisticai using pip:

[1]:

#!pip install -q holisticai

[2]:

# skip warnings
import warnings
warnings.filterwarnings("ignore")

Load an example dataset and split it into train and test sets

[4]:

from holisticai.datasets import load_dataset

dataset = load_dataset('law_school')
dataset_split = dataset.train_test_split(test_size=0.3)

# some stats about the data
dataset['x'].describe()

[4]:

	age	decile1	decile3	fam_inc	lsat	ugpa
count	20800.000000	20800.000000	20800.000000	20800.000000	20800.000000	20800.000000
mean	59.112115	5.746779	5.565096	3.465625	36.762808	3.226500
std	5.271906	2.780279	2.857598	0.853842	5.386819	0.412616
min	3.000000	1.000000	1.000000	1.000000	11.000000	0.000000
25%	58.000000	3.000000	3.000000	3.000000	33.000000	3.000000
50%	61.000000	6.000000	6.000000	4.000000	37.000000	3.300000
75%	62.000000	8.000000	8.000000	4.000000	41.000000	3.500000
max	69.000000	10.000000	10.000000	5.000000	48.000000	4.000000

[5]:

dataset['y'].value_counts()

[5]:

bar
1    18507
0     2293
Name: count, dtype: int64
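The counts above show a strong class imbalance, which is worth keeping in mind when reading accuracy figures later. As a rough sketch in plain Python, using only the two counts printed above, the share of the majority class on the full dataset is also the accuracy a model would reach by always predicting that class:

```python
# Class counts copied from the value_counts() output above.
counts = {1: 18507, 0: 2293}

total = sum(counts.values())
majority_label = max(counts, key=counts.get)
majority_share = counts[majority_label] / total

print(f"total rows: {total}")
print(f"majority class: {majority_label} ({majority_share:.1%} of rows)")
```

The exact share in the test split may differ slightly, since the split is random.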

Dataset clustering and correlation analysis

[6]:

import seaborn as sns
g = sns.clustermap(dataset['x'].corr(), center=0, cmap="seismic", dendrogram_ratio=(.1, .2), cbar_pos=(.02, .32, .03, .2),
                   linewidths=.75, figsize=(5, 5))

g.ax_row_dendrogram.remove()

../_images/getting_started_quickstart_9_0.png
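The heatmap is built from `dataset['x'].corr()`, which by default computes the Pearson correlation between each pair of columns. The following is a minimal sketch of that coefficient in plain Python; the `pearson` helper and the sample values are illustrative assumptions, not part of holisticai or taken from the dataset:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values for illustration only.
lsat = [30.0, 35.0, 38.0, 41.0, 44.0]
ugpa = [2.8, 3.1, 3.0, 3.5, 3.7]

r = pearson(lsat, ugpa)
print(round(r, 3))
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship. The clustermap then reorders the columns so that similarly correlated features sit next to each other.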

Extract the train and test sets from the split

[7]:

train_data = dataset_split['train']
test_data = dataset_split['test']

print(train_data['x'].shape)
print(test_data['x'].shape)

(14560, 8)
(6240, 8)

Rescale the training and testing data

[8]:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_t = scaler.fit_transform(train_data['x'])
X_test_t = scaler.transform(test_data['x'])
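To make the rescaling step concrete, here is a small sketch of what standardization does to a single column: the mean and the population standard deviation are estimated on the training data only, and the same two numbers are then reused for the test data. The helper names and toy values are assumptions for illustration:

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Return (mean, std) of a training column, using the population std."""
    return fmean(column), pstdev(column)

def transform(column, mean, std):
    """Shift and scale a column with statistics learned elsewhere."""
    return [(x - mean) / std for x in column]

# Toy values for illustration only.
train_col = [30.0, 36.0, 42.0]
test_col = [33.0, 48.0]

mean, std = fit_scaler(train_col)              # learned from train only
train_scaled = transform(train_col, mean, std)
test_scaled = transform(test_col, mean, std)   # reuses the train statistics

print(train_scaled)
print(test_scaled)
```

Fitting on the training set alone keeps information from the test set out of the preprocessing step, which is why the notebook calls `fit_transform` on the training data and only `transform` on the test data.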

Train a logistic regression model and compute predictions

[9]:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(random_state=42, max_iter=500)
model.fit(X_train_t, train_data['y'])

# make predictions
y_pred = model.predict(X_test_t)
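For intuition, a fitted binary logistic regression turns each scaled row into a probability by passing a weighted sum through the sigmoid function, and predicts class 1 when that probability exceeds 0.5. The sketch below uses hypothetical weights and an intercept chosen for illustration; in the notebook the fitted values would be available as `model.coef_` and `model.intercept_`:

```python
from math import exp

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def predict_one(features, weights, intercept):
    """Return (probability of class 1, predicted label) for one row."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    proba = sigmoid(score)
    return proba, int(proba > 0.5)

# Hypothetical coefficients, not taken from the fitted model.
weights = [0.8, -0.3]
intercept = 1.5

proba_hi, label_hi = predict_one([0.5, 1.0], weights, intercept)
proba_lo, label_lo = predict_one([-3.0, 1.0], weights, intercept)

print(round(proba_hi, 3), label_hi)
print(round(proba_lo, 3), label_lo)
```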

A simple classification report for the model’s predictions

[10]:

from sklearn.metrics import classification_report

print(classification_report(test_data['y'], y_pred))

              precision    recall  f1-score   support

           0       0.60      0.22      0.33       707
           1       0.91      0.98      0.94      5533

    accuracy                           0.90      6240
   macro avg       0.75      0.60      0.64      6240
weighted avg       0.87      0.90      0.87      6240
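In the report above, overall accuracy is 0.90 while recall for class 0 is only 0.22, so most class-0 cases are missed. The per-class columns can be reproduced by hand from true and predicted labels. This is a minimal sketch on made-up labels; `class_report` is a hypothetical helper, not an sklearn function:

```python
def class_report(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1, tp + fn

# Made-up labels for illustration only.
y_true_toy = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
y_pred_toy = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]

for label in (0, 1):
    p, r, f1, support = class_report(y_true_toy, y_pred_toy, label)
    print(f"class {label}: precision={p:.2f} recall={r:.2f} "
          f"f1={f1:.2f} support={support}")
```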

Classification bias metrics for the model’s predictions

[11]:

from holisticai.metrics.bias import classification_bias_metrics

metrics = classification_bias_metrics(
group_a = test_data['group_a'],
group_b = test_data['group_b'],
y_true = test_data['y'],
y_pred = y_pred
)

metrics

[11]:

Metric	Value	Reference
Statistical Parity	0.172941	0
Disparate Impact	1.213069	1
Four Fifths Rule	0.824356	1
Cohen D	0.902750	0
2SD Rule	24.618566	0
Equality of Opportunity Difference	0.085044	0
False Positive Rate Difference	0.313732	0
Average Odds Difference	0.199388	0
Accuracy Difference	0.164560	0
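To help read the table, the sketch below computes three of the rate-based metrics on toy data, assuming the conventional definitions: statistical parity as the positive-prediction rate of group_a minus that of group_b, disparate impact as the ratio of the two rates, and the four-fifths rule as the smaller of that ratio and its inverse. These helpers are illustrative assumptions; the library's own implementation may differ in details such as sign convention or edge-case handling. The last lines check that the values printed in the table are consistent with these relationships:

```python
def rate(values, mask):
    """Mean of `values` over the rows where `mask` is truthy."""
    selected = [v for v, m in zip(values, mask) if m]
    return sum(selected) / len(selected)

def statistical_parity(group_a, group_b, y_pred):
    return rate(y_pred, group_a) - rate(y_pred, group_b)

def disparate_impact(group_a, group_b, y_pred):
    return rate(y_pred, group_a) / rate(y_pred, group_b)

def four_fifths(group_a, group_b, y_pred):
    di = disparate_impact(group_a, group_b, y_pred)
    return min(di, 1 / di)

# Toy group masks and predictions for illustration only.
toy_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_b = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
toy_pred = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]

print(statistical_parity(toy_a, toy_b, toy_pred))
print(disparate_impact(toy_a, toy_b, toy_pred))
print(four_fifths(toy_a, toy_b, toy_pred))

# Values copied from the table above.
di_table = 1.213069
fpr_diff, tpr_diff = 0.313732, 0.085044
inverse_di = 1 / di_table                # compare with Four Fifths Rule
mean_odds = (fpr_diff + tpr_diff) / 2    # compare with Average Odds Difference
print(round(inverse_di, 4), round(mean_odds, 6))
```

Under these definitions, a disparate impact above 1 together with a positive statistical parity means group_a receives positive predictions more often than group_b, and the Reference column gives the value each metric would take for an unbiased model.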

Plot the bias metrics

[12]:

from holisticai.plots.bias import bias_metrics_report

bias_metrics_report(model_type='binary_classification', table_metrics=metrics)

../_images/getting_started_quickstart_21_0.png