Robustness of a Binary Classification Model

In this notebook, we will evaluate the robustness of a binary classification model trained on the Adult dataset. We will use the binary classification attacker module to generate adversarial examples and measure the model’s robustness using adversarial accuracy and empirical robustness metrics.

[1]:

import warnings

from holisticai.datasets import load_dataset
from holisticai.robustness.attackers import HopSkipJump, ZooAttack
from holisticai.robustness.metrics import adversarial_accuracy, empirical_robustness
from holisticai.utils import BinaryClassificationProxy
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore")

Loading the dataset

We will use the Adult dataset, which contains census data. The target variable is whether a person’s income exceeds $50K/year, and the protected attribute we will consider is ‘sex’. To keep the running time short, we hold out only a very small test subset (0.1% of the data) for evaluating the model and running the attackers.

[2]:

dataset = load_dataset('adult', protected_attribute='sex')
train_test = dataset.train_test_split(test_size=0.001, random_state=42)

train = train_test['train']
test = train_test['test']
train, test

[2]:

(<holisticai.datasets._dataset.Dataset at 0x7f882f620190>,
 <holisticai.datasets._dataset.Dataset at 0x7f882d89c050>)

Preprocessing the data

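For this tutorial, we will select the 10 features with the highest correlation with the target variable. The ranking uses the signed correlation, so features with a strong negative correlation are not selected.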

[3]:

correlations = train['X'].corrwith(train['y']).sort_values(ascending=False)

# Select the names of the top 10 most correlated features
top_10_features = correlations.head(10).index.tolist()

train['X'] = train['X'][top_10_features]
test['X'] = test['X'][top_10_features]

feature_names = list(train['X'].columns)
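
The cell above sorts signed correlations in descending order, so a strongly negative correlation ranks last. As a minimal sketch of the difference, the standard-library example below contrasts signed and absolute ranking. The feature names and values are made-up toy data, not taken from the Adult dataset, and `pearson` is a hypothetical helper written for this illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data: three features and a binary target.
y = [0, 0, 1, 1, 0, 1]
features = {
    "weak_pos": [1, 0, 1, 0, 0, 1],
    "strong_neg": [5, 6, 1, 2, 5, 1],
    "strong_pos": [1, 2, 8, 9, 1, 7],
}

corr = {name: pearson(col, y) for name, col in features.items()}

# Signed ranking puts the strongly negative feature last ...
signed_rank = sorted(corr, key=lambda k: corr[k], reverse=True)
# ... while absolute ranking places it second.
abs_rank = sorted(corr, key=lambda k: abs(corr[k]), reverse=True)

print(signed_rank)
print(abs_rank)
```

With pandas, ranking by strength regardless of sign could be written as `correlations.abs().sort_values(ascending=False)`. Whether that is preferable depends on the goal; this tutorial keeps the signed ranking.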

Training the classification model

We will train a logistic regression model on the training data.

[4]:

from sklearn.pipeline import Pipeline

pipe = Pipeline([('scaler', StandardScaler()), ('lr', LogisticRegression())])
pipe.fit(train['X'], train['y'])

[4]:

Pipeline(steps=[('scaler', StandardScaler()), ('lr', LogisticRegression())])


Let’s evaluate the accuracy of our trained model on the test data.

[5]:

y_pred = pipe.predict(test['X'])
accuracy_score(test['y'], y_pred)

[5]:

0.7608695652173914

Adversarial Attacks

We will now generate adversarial examples using the binary classification attacker module and evaluate the robustness of our model against these attacks. This module allows us to generate adversarial examples using different attack strategies, such as the HopSkipJump attack and the Zeroth Order Optimization (ZOO) attack.

First, we wrap the model in a proxy object that exposes its prediction functions to the attacker module.

[6]:

proxy = BinaryClassificationProxy(predict=pipe.predict, predict_proba=pipe.predict_proba, classes=[0, 1])
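
Conceptually, a proxy of this kind bundles the model's prediction callables behind one interface, so an attacker does not need to know which estimator sits underneath. The class below is a hypothetical stand-in written only for illustration: `SimpleProxy`, `toy_predict`, and `toy_predict_proba` are made-up names and do not reflect how `BinaryClassificationProxy` is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class SimpleProxy:
    """Hypothetical stand-in: holds prediction callables and class labels."""
    predict: Callable
    predict_proba: Callable
    classes: Sequence[int]

def toy_predict_proba(rows):
    # Toy model: P(class 1) equals the first feature, clipped to [0, 1].
    probs = (min(max(r[0], 0.0), 1.0) for r in rows)
    return [[1 - p, p] for p in probs]

def toy_predict(rows):
    # Threshold P(class 1) at 0.5.
    return [int(p1 >= 0.5) for _, p1 in toy_predict_proba(rows)]

toy_proxy = SimpleProxy(
    predict=toy_predict,
    predict_proba=toy_predict_proba,
    classes=[0, 1],
)

# An attacker only ever calls the proxy, never the model directly.
labels = toy_proxy.predict([[0.2], [0.9]])
print(labels)  # [0, 1]
```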

HopSkipJump Attack

The HopSkipJump attack is a black-box, decision-based attack: it generates adversarial examples by querying the model repeatedly, using only the predicted labels. It is an advanced version of the Boundary attack that can generate adversarial examples with high success rates.

We will generate adversarial examples with the HopSkipJump attack and evaluate the model’s robustness against them.

[7]:

hsj_attacker = HopSkipJump(name="HSJ", predictor=proxy.predict)

hsj_adv_x = hsj_attacker.generate(test['X'])

y_adv_pred = proxy.predict(hsj_adv_x)
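
One building block of HopSkipJump is a binary search between the original point and a misclassified point that uses only predicted labels. The sketch below illustrates just that step on a toy linear classifier. It is not the full algorithm (there is no gradient-direction estimate or step-size search) and it is not the holisticai implementation; `toy_label` and `boundary_search` are hypothetical names.

```python
def toy_label(x):
    # Toy decision rule: class 1 when the features sum to more than 1.
    return int(sum(x) > 1.0)

def boundary_search(predict, x_orig, x_adv, tol=1e-6):
    """Move x_adv toward x_orig while its label still differs from x_orig's."""
    orig_label = predict(x_orig)
    lo, hi = 0.0, 1.0  # fraction of the way from x_orig (0) to x_adv (1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        candidate = [o + mid * (a - o) for o, a in zip(x_orig, x_adv)]
        if predict(candidate) != orig_label:
            hi = mid  # still adversarial: try closer to the original
        else:
            lo = mid  # label reverted: back off
    return [o + hi * (a - o) for o, a in zip(x_orig, x_adv)]

x_orig = [0.2, 0.3]   # labelled 0 by the toy rule
x_start = [2.0, 2.0]  # labelled 1, far from x_orig
x_near = boundary_search(toy_label, x_orig, x_start)

# x_near keeps the adversarial label but lies just past the decision boundary.
print(x_near, toy_label(x_near))
```

Only label queries are used, which is why this kind of search works in a black-box setting where probabilities and gradients are unavailable.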

Evaluating Adversarial Robustness

We will evaluate the adversarial accuracy and empirical robustness of our model.

[8]:

hsj_accuracy = adversarial_accuracy(test['y'], y_pred, y_adv_pred)
hsj_robustness = empirical_robustness(test['X'], hsj_adv_x, y_pred, y_adv_pred, norm=2)

print("HSJ accuracy: ", hsj_accuracy)
print("HSJ robustness: ", hsj_robustness)

HSJ accuracy:  0.22857142857142856
HSJ robustness:  0.016330717265190788
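
The printed scores are consistent with simple ratios of sample counts. As a sketch, `fractions.Fraction` can recover small-denominator fractions from the floats shown in this notebook. The interpretation given afterwards (adversarial accuracy measured over the test samples that the model originally classified correctly) is an inference from these numbers, not a statement taken from the library documentation.

```python
from fractions import Fraction

clean_accuracy = 0.7608695652173914  # output of cell [5]
hsj_accuracy = 0.22857142857142856   # HSJ accuracy printed above

# Closest fractions with a denominator of at most 1000.
clean_frac = Fraction(clean_accuracy).limit_denominator(1000)
hsj_frac = Fraction(hsj_accuracy).limit_denominator(1000)

print(clean_frac)  # 35/46
print(hsj_frac)    # 8/35
```

Read this way, 35 of 46 test samples were classified correctly before the attack, and 8 of those 35 kept their correct label after it. Other count pairs with the same ratios (for example 70 of 92) would produce the same floats, so the counts themselves remain an assumption.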

Zeroth Order Optimization Attack

The Zeroth Order Optimization (ZOO) attack is another black-box attack. It is a variant of the Carlini and Wagner attack that estimates gradients numerically from model queries and optimizes with ADAM coordinate descent.

[9]:

zoo_attacker = ZooAttack(name="Zoo", proxy=proxy)

zoo_adv_x = zoo_attacker.generate(test['X'])

y_adv_pred = proxy.predict(zoo_adv_x)
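
The numerical gradient estimation behind ZOO can be illustrated with a symmetric finite difference on one coordinate at a time, which needs only function evaluations. This is a minimal sketch on a toy loss: `toy_loss` and `estimate_gradient` are hypothetical names, and the actual attack additionally uses ADAM updates and an attack-specific loss.

```python
def toy_loss(x):
    # Toy smooth function standing in for the attack loss.
    return x[0] ** 2 + 3.0 * x[1]

def estimate_gradient(f, x, h=1e-4):
    """Approximate df/dx_i by (f(x + h*e_i) - f(x - h*e_i)) / (2h) for each i."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

grad = estimate_gradient(toy_loss, [2.0, -1.0])
# The analytic gradient at this point is [4.0, 3.0].
print(grad)
```

Each coordinate costs two queries, so the number of model calls grows with the number of features; this is one reason such attacks are slow on large test sets.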

Evaluating Adversarial Robustness

We will evaluate the adversarial accuracy and empirical robustness of our model.

[10]:

zoo_accuracy = adversarial_accuracy(test['y'], y_pred, y_adv_pred)
zoo_robustness = empirical_robustness(test['X'], zoo_adv_x, y_pred, y_adv_pred, norm=2)

print("Zoo accuracy: ", zoo_accuracy)
print("Zoo robustness: ", zoo_robustness)

Zoo accuracy:  0.7428571428571429
Zoo robustness:  0.006486298867347964
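
To make the robustness score more concrete, the sketch below implements one common definition of empirical robustness: the mean perturbation norm, relative to the norm of the original sample, over the samples whose prediction the attack changed. This definition is an assumption made for illustration and may differ from what `empirical_robustness` computes; `empirical_robustness_sketch` is a hypothetical name and the data are toy values.

```python
import math

def empirical_robustness_sketch(x, x_adv, y_pred, y_adv_pred):
    """Mean relative L2 perturbation over samples whose prediction changed."""
    ratios = []
    for row, adv_row, p, p_adv in zip(x, x_adv, y_pred, y_adv_pred):
        if p == p_adv:
            continue  # the attack did not change this prediction: skip it
        pert = math.sqrt(sum((a - b) ** 2 for a, b in zip(adv_row, row)))
        ratios.append(pert / math.sqrt(sum(b ** 2 for b in row)))
    return sum(ratios) / len(ratios) if ratios else 0.0

# Toy data: the attack flips the first and third predictions.
x = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
x_adv = [[3.0, 4.5], [1.0, 0.0], [0.2, 2.0]]
score = empirical_robustness_sketch(x, x_adv, [0, 1, 1], [1, 1, 0])
print(score)
```

Under this reading, a smaller score means that smaller relative perturbations were enough to change the predictions that did flip. It says nothing about how many predictions flipped, which is what adversarial accuracy captures.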