
Learn AI Series (#35) - Data Ethics and Bias in ML

What will I learn
- You will learn how bias enters ML systems -- from data collection through deployment;
- the major types of bias: selection, measurement, confirmation, survivorship, historical;
- fairness metrics and why "fair" doesn't have a single mathematical definition;
- practical debiasing techniques: resampling, reweighting, threshold adjustment;
- differential privacy concepts -- protecting individual data in trained models;
- building fairness auditing tools from scratch so you can measure what matters;
- when NOT to use ML -- because sometimes a simple rule is both safer and fairer.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- A Python 3.11+ installation;
- The ambition to learn AI and machine learning.
Difficulty
- Beginner
Curriculum (of the Learn AI Series):
- Learn AI Series (#1) - What Machine Learning Actually Is (@scipio/learn-ai-series-1-what-machine-learning-actually-is)
- Learn AI Series (#2) - Setting Up Your AI Workbench - Python and NumPy (@scipio/learn-ai-series-2-setting-up-your-ai-workbench-python-and-numpy)
- Learn AI Series (#3) - Your Data Is Just Numbers - How Machines See the World (@scipio/learn-ai-series-3-your-data-is-just-numbers-how-machines-see-the-world)
- Learn AI Series (#4) - Your First Prediction - No Math, Just Intuition (@scipio/learn-ai-series-4-your-first-prediction-no-math-just-intuition)
- Learn AI Series (#5) - Patterns in Data - What "Learning" Actually Looks Like (@scipio/learn-ai-series-5-patterns-in-data-what-learning-actually-looks-like)
- Learn AI Series (#6) - From Intuition to Math - Why We Need Formulas (@scipio/learn-ai-series-6-from-intuition-to-math-why-we-need-formulas)
- Learn AI Series (#7) - The Training Loop - See It Work Step by Step (@scipio/learn-ai-series-7-the-training-loop-see-it-work-step-by-step)
- Learn AI Series (#8) - The Math You Actually Need (Part 1) - Linear Algebra (@scipio/learn-ai-series-8-the-math-you-actually-need-part-1-linear-algebra)
- Learn AI Series (#9) - The Math You Actually Need (Part 2) - Calculus and Probability (@scipio/learn-ai-series-9-the-math-you-actually-need-part-2-calculus-and-probability)
- Learn AI Series (#10) - Your First ML Model - Linear Regression From Scratch (@scipio/learn-ai-series-10-your-first-ml-model-linear-regression-from-scratch)
- Learn AI Series (#11) - Making Linear Regression Real (@scipio/learn-ai-series-11-making-linear-regression-real)
- Learn AI Series (#12) - Classification - Logistic Regression From Scratch (@scipio/learn-ai-series-12-classification-logistic-regression-from-scratch)
- Learn AI Series (#13) - Evaluation - How to Know If Your Model Actually Works (@scipio/learn-ai-series-13-evaluation-how-to-know-if-your-model-actually-works)
- Learn AI Series (#14) - Data Preparation - The 80% Nobody Talks About (@scipio/learn-ai-series-14-data-preparation-the-80-nobody-talks-about)
- Learn AI Series (#15) - Feature Engineering and Selection (@scipio/learn-ai-series-15-feature-engineering-and-selection)
- Learn AI Series (#16) - Scikit-Learn - The Standard Library of ML (@scipio/learn-ai-series-16-scikit-learn-the-standard-library-of-ml)
- Learn AI Series (#17) - Decision Trees - How Machines Make Decisions (@scipio/learn-ai-series-17-decision-trees-how-machines-make-decisions)
- Learn AI Series (#18) - Random Forests - Wisdom of Crowds (@scipio/learn-ai-series-18-random-forests-wisdom-of-crowds)
- Learn AI Series (#19) - Gradient Boosting - The Kaggle Champion (@scipio/learn-ai-series-19-gradient-boosting-the-kaggle-champion)
- Learn AI Series (#20) - Support Vector Machines - Drawing the Perfect Boundary (@scipio/learn-ai-series-20-support-vector-machines-drawing-the-perfect-boundary)
- Learn AI Series (#21) - Mini Project - Predicting Crypto Market Regimes (@scipio/learn-ai-series-21-mini-project-predicting-crypto-market-regimes)
- Learn AI Series (#22) - K-Means Clustering - Finding Groups (@scipio/learn-ai-series-22-k-means-clustering-finding-groups)
- Learn AI Series (#23) - Advanced Clustering - Beyond K-Means (@scipio/learn-ai-series-23-advanced-clustering-beyond-k-means)
- Learn AI Series (#24) - Dimensionality Reduction - PCA (@scipio/learn-ai-series-24-dimensionality-reduction-pca)
- Learn AI Series (#25) - Advanced Dimensionality Reduction - t-SNE and UMAP (@scipio/learn-ai-series-25-advanced-dimensionality-reduction-t-sne-and-umap)
- Learn AI Series (#26) - Anomaly Detection - Finding What Doesn't Belong (@scipio/learn-ai-series-26-anomaly-detection-finding-what-doesnt-belong)
- Learn AI Series (#27) - Recommendation Systems - "Users Like You Also Liked..." (@scipio/learn-ai-series-27-recommendation-systems-users-like-you-also-liked)
- Learn AI Series (#28) - Time Series Fundamentals - When Order Matters (@scipio/learn-ai-series-28-time-series-fundamentals-when-order-matters)
- Learn AI Series (#29) - Time Series Forecasting - Predicting What Comes Next (@scipio/learn-ai-series-29-time-series-forecasting-predicting-what-comes-next)
- Learn AI Series (#30) - Natural Language Processing - Text as Data (@scipio/learn-ai-series-30-natural-language-processing-text-as-data)
- Learn AI Series (#31) - Word Embeddings - Meaning in Numbers (@scipio/learn-ai-series-31-word-embeddings-meaning-in-numbers)
- Learn AI Series (#32) - Bayesian Methods - Thinking in Probabilities (@scipio/learn-ai-series-32-bayesian-methods-thinking-in-probabilities)
- Learn AI Series (#33) - Ensemble Methods Deep Dive - Stacking and Blending (@scipio/learn-ai-series-33-ensemble-methods-deep-dive-stacking-and-blending)
- Learn AI Series (#34) - ML Engineering - From Notebook to Production (@scipio/learn-ai-series-34-ml-engineering-from-notebook-to-production)
- Learn AI Series (#35) - Data Ethics and Bias in ML (@scipio/learn-ai-series-35-data-ethics-and-bias-in-ml) (this post)
Learn AI Series (#35) - Data Ethics and Bias in ML
This episode is different from what we've been doing. There's no new algorithm to implement from scratch, no loss function to derive, no hyperparameter to tune. Instead, we're confronting a reality that every ML practitioner must face eventually: the models we build are not neutral. They absorb the biases in our data, amplify patterns we might not endorse, and make decisions that affect real people. Understanding this isn't optional -- it's a professional responsibility.
I'm not going to lecture you about being a good person. You already know right from wrong. What I will do is show you the mechanics of how bias enters ML systems, how to measure it, and what tools exist to mitigate it. Because the most dangerous bias isn't malicious -- it's accidental, invisible, and baked into data you assumed was objective. And as we saw in episode #14 (data preparation) and episode #13 (evaluation), how you handle your data and what you measure determines everything about your model's real-world behavior.
Here we go!
How bias enters the pipeline
Bias doesn't start at the algorithm. It starts with data, and data is generated by humans operating within imperfect systems. Remember the data preparation pipeline from episode #14? Every single step of that pipeline is a potential injection point for bias:
Collection bias: who's in the dataset and who's missing. A facial recognition system trained mostly on lighter-skinned faces performs poorly on darker-skinned faces -- not because the algorithm is racist, but because the training data didn't represent the full population. A medical diagnosis model trained on data from a single hospital serves that hospital's patient demographics, not the world's. The model can only learn what it sees, and if it never sees certain groups, it can't serve them.
Labeling bias: who decided the ground truth and what assumptions they brought. If historical hiring data shows that Company X hired mostly men for engineering roles, a model trained on that data learns that being male is a positive signal for engineering hiring. The model didn't create the bias -- it learned it from biased human decisions. This connects directly to what we discussed in episode #7 about the training loop: the model optimizes whatever objective you give it, including objectives that encode human prejudice.
Measurement bias: how features are collected and what they proxy for. Zip code, in the United States, correlates strongly with race and income. A model that uses zip code for credit scoring is -- whether it intends to or not -- partially making decisions based on race and socioeconomic status. The feature is a proxy for protected characteristics. We covered feature engineering in episode #15, and this is the dark side: features can encode information you didn't intend to include.
Aggregation bias: treating all groups as one when they have different underlying patterns. A single diabetes prediction model might perform well on average but poorly for specific ethnic groups whose physiological markers differ from the majority population in the training data. The evaluation metrics from episode #13 would show you a nice 85% accuracy -- hiding the fact that it's 92% for group A and 63% for group B.
Let me show you exactly how this plays out with code:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Simulate a biased hiring dataset
# Historical data: company hired mostly from group A
np.random.seed(42)
n = 1000

# Group membership (A or B)
group = np.random.choice(['A', 'B'], n, p=[0.7, 0.3])

# Skill scores (equal between groups in reality)
skill = np.random.randn(n) * 10 + 50

# Historical hiring decisions (BIASED)
# Group A: hired if skill > 45
# Group B: hired if skill > 55 (unfairly higher bar)
hired = np.where(
    group == 'A',
    (skill > 45).astype(int),
    (skill > 55).astype(int)
)

# Train a model on this biased history
X = np.column_stack([
    skill,
    (group == 'A').astype(int)  # group as binary feature
])
X_train, X_test, y_train, y_test = train_test_split(
    X, hired, test_size=0.3, random_state=42
)
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)

# Check performance by group
test_groups = np.where(X_test[:, 1] == 1, 'A', 'B')
for g in ['A', 'B']:
    mask = test_groups == g
    acc = model.score(X_test[mask], y_test[mask])
    pred_rate = model.predict(X_test[mask]).mean()
    print(f"Group {g}: accuracy={acc:.3f}, "
          f"approval rate={pred_rate:.1%}")

print("\nModel coefficients:")
print(f"  Skill weight: {model.coef_[0][0]:.4f}")
print(f"  Group weight: {model.coef_[0][1]:.4f}")
print("  (Positive group weight = being in group A "
      "helps your chances)")
The model learned exactly what the data told it: group A gets approved more easily than group B, even at identical skill levels. The group coefficient is positive, meaning membership in group A is a predictive feature for getting hired -- because it WAS predictive in the historical data. The model isn't wrong about what happened in the past. It's just reproducing a past we shouldn't repeat.
Types of bias, named and explained
Let me catalog the main types so you can recognize them when you encounter them (and you will -- I guarantee it):
Selection bias: the training data isn't representative of the population you'll deploy to. Survivorship bias is a subset -- you only see data from entities that survived a selection process. If you study successful startups to predict startup success, you're missing all the failed startups with identical characteristics that just happened to fail. Your model learns "having a ping-pong table predicts success" when in reality it predicts nothing -- the failed startups with ping-pong tables just aren't in your dataset.
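The survivorship effect is easy to simulate. In the sketch below (all numbers invented for illustration), ping-pong tables and success are independent in the full population, but failed companies leave fewer records -- and quirky failures vanish fastest. The "correlation" appears purely from who survives into the dataset:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
ping_pong = rng.binomial(1, 0.5, n)  # has a ping-pong table?
success = rng.binomial(1, 0.3, n)    # independent of ping_pong

# Probability a company ends up in your dataset:
# successes are well documented; quiet failures vanish,
# and quirky failures vanish fastest
observe_prob = np.where(success == 1, 0.9,
                        np.where(ping_pong == 1, 0.1, 0.5))
observed = rng.random(n) < observe_prob

# Full population: no relationship between table and success
print(f"Full population:  ping-pong {success[ping_pong == 1].mean():.1%}, "
      f"no table {success[ping_pong == 0].mean():.1%}")

# Observed (surviving) records: ping-pong 'predicts' success
pp = observed & (ping_pong == 1)
no_pp = observed & (ping_pong == 0)
print(f"Observed dataset: ping-pong {success[pp].mean():.1%}, "
      f"no table {success[no_pp].mean():.1%}")
```

Any model trained on the observed subset would learn the ping-pong feature as a genuine signal, even though it predicts nothing in the real world.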
Confirmation bias: you (the engineer) look for patterns that confirm your existing beliefs and ignore contradicting evidence. You expect feature X to be important, so you engineer it aggressively and dismiss the cross-validation fold where it hurt performance. We talked about proper evaluation in episode #13 -- this is why you never look at the test set until the very end.
Historical bias: the data faithfully represents a biased world. This is the trickiest form because the data is correct -- it accurately reflects past decisions and outcomes. But those past decisions were made in a world with systemic inequality, and a model that replicates them perpetuates that inequality. The hiring example above is exactly this: the data is accurate, the model is well-trained, and the result is discrimination.
Representation bias: certain groups are under- or over-represented relative to the target population. Even if each individual data point is unbiased, the distribution of who appears in the dataset determines what patterns the model learns. Remember from episode #14 -- class imbalance affects model behavior, and group imbalance does too.
# Demonstrate representation bias
np.random.seed(42)

# Dataset: 900 from group A, 100 from group B
n_a, n_b = 900, 100
X_a = np.random.randn(n_a, 5) + 0.5
X_b = np.random.randn(n_b, 5) - 0.5
y_a = (X_a[:, 0] + X_a[:, 1] > 1.0).astype(int)
y_b = (X_b[:, 0] + X_b[:, 1] > -0.5).astype(int)

X_all = np.vstack([X_a, X_b])
y_all = np.concatenate([y_a, y_b])
groups = np.array(['A'] * n_a + ['B'] * n_b)

# Train on the imbalanced dataset
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X_all, y_all, groups, test_size=0.3, random_state=42
)
clf = LogisticRegression(random_state=42, max_iter=1000)
clf.fit(X_tr, y_tr)

# Evaluate per group
print("Representation bias -- unbalanced groups:\n")
print(f"Training set: {(g_tr == 'A').sum()} from A, "
      f"{(g_tr == 'B').sum()} from B")
print(f"  (Group B is {(g_tr == 'B').mean():.0%} of "
      f"training data)\n")
for g in ['A', 'B']:
    mask = g_te == g
    acc = clf.score(X_te[mask], y_te[mask])
    n = mask.sum()
    print(f"Group {g}: accuracy = {acc:.3f} "
          f"(n={n} test samples)")
print("\n--> Group B performance is likely worse "
      "because the model optimized for Group A")
The model optimizes for overall accuracy, which means it cares much more about getting group A right (900 samples) than group B (100 samples). This is the same class imbalance problem from episode #14, just wearing a different hat. The solution approaches are similar too -- resampling, reweighting, stratified evaluation.
Measuring fairness
"Fair" doesn't have a single mathematical definition. Seriously. Researchers have proposed dozens of fairness metrics, and it's been proven that most of them are mutually incompatible -- you can't satisfy all of them simultaneously. Here are the three most important ones you need to know:
Demographic parity: the model's positive prediction rate should be equal across groups. If 60% of group A gets approved and only 30% of group B, demographic parity is violated. Simple to measure, but it ignores whether the groups genuinely differ in the target variable.
Equalized odds: the model's true positive rate and false positive rate should be equal across groups. This is stronger than demographic parity because it conditions on the actual outcome. A credit model should catch deadbeat borrowers at the same rate regardless of group, and should approve creditworthy borrowers at the same rate regardless of group.
Individual fairness: similar people should get similar predictions. If two applicants differ only in a protected attribute (race, gender), they should get the same decision. The challenge: defining "similar" mathematically.
import numpy as np

def demographic_parity(predictions, group_labels):
    """Measure the selection-rate difference between groups."""
    groups = np.unique(group_labels)
    rates = {}
    for g in groups:
        mask = group_labels == g
        rates[g] = predictions[mask].mean()
    disparity = max(rates.values()) - min(rates.values())
    return rates, disparity

def equalized_odds_diff(predictions, true_labels,
                        group_labels):
    """Measure the TPR and FPR difference between groups."""
    groups = np.unique(group_labels)
    tpr, fpr = {}, {}
    for g in groups:
        mask = group_labels == g
        positives = true_labels[mask] == 1
        negatives = true_labels[mask] == 0
        if positives.any():
            tpr[g] = predictions[mask][positives].mean()
        else:
            tpr[g] = 0
        if negatives.any():
            fpr[g] = predictions[mask][negatives].mean()
        else:
            fpr[g] = 0
    tpr_diff = max(tpr.values()) - min(tpr.values())
    fpr_diff = max(fpr.values()) - min(fpr.values())
    return tpr, fpr, tpr_diff, fpr_diff

# Example: synthetic lending decisions
np.random.seed(42)
n = 1000
groups = np.random.choice(['A', 'B'], n)
true_labels = np.random.binomial(1, 0.5, n)

# Biased predictions: group B gets lower approval
preds = np.where(
    groups == 'A',
    np.random.binomial(1, 0.7, n),
    np.random.binomial(1, 0.4, n)
)

rates, gap = demographic_parity(preds, groups)
print("Demographic parity check:")
print(f"  Selection rates: {rates}")
print(f"  Disparity: {gap:.3f}")
print("  (0 = perfectly fair, higher = more biased)\n")

tpr, fpr, tpr_d, fpr_d = equalized_odds_diff(
    preds, true_labels, groups
)
print("Equalized odds check:")
print(f"  True positive rates: {tpr}")
print(f"  False positive rates: {fpr}")
print(f"  TPR difference: {tpr_d:.3f}")
print(f"  FPR difference: {fpr_d:.3f}")
The impossibility result
This is the part that makes ethics in ML genuinely hard -- not just hand-wringing hard, but mathematically hard. In most real-world cases, you cannot achieve demographic parity AND equalized odds simultaneously. If group A is genuinely twice as likely to repay a loan (due to systemic economic factors), a model with equalized odds will approve more of group A, violating demographic parity. And a model with demographic parity will approve less-qualified group A members and more-qualified group B members, violating equalized odds.
This isn't a technical problem -- it's a values problem. Which kind of fairness matters depends on the context, the stakeholders, and the consequences of errors. A hiring model, a criminal sentencing model, and a medical triage model might all need different fairness definitions even though they're all classification tasks. The math can't tell you which one to pick. That's a human decision.
# Demonstrate the impossibility: you can't have both
np.random.seed(42)
n = 2000

# Two groups with genuinely different base rates
group = np.random.choice(['A', 'B'], n)
# Group A: 70% positive base rate
# Group B: 40% positive base rate
# (This reflects real-world inequality, not model bias)
true = np.where(
    group == 'A',
    np.random.binomial(1, 0.70, n),
    np.random.binomial(1, 0.40, n)
)

# Option 1: Equal accuracy (equalized odds focus)
# Use the same threshold for both groups
scores = true + np.random.randn(n) * 0.3
preds_equal_thresh = (scores > 0.5).astype(int)

# Option 2: Equal selection rates (demographic parity focus)
# Different thresholds per group to equalize approval rates
target_rate = 0.55  # target 55% approval for both
preds_equal_rate = np.zeros(n, dtype=int)
for g in ['A', 'B']:
    mask = group == g
    g_scores = scores[mask]
    # Find the threshold that gives the target rate
    threshold = np.percentile(
        g_scores, (1 - target_rate) * 100
    )
    preds_equal_rate[mask] = (
        g_scores > threshold
    ).astype(int)

print("=== Option 1: Same threshold (equalized odds) ===")
rates1, gap1 = demographic_parity(
    preds_equal_thresh, group)
print(f"Selection rates: {rates1}")
print(f"Demographic parity gap: {gap1:.3f}")
tpr1, fpr1, _, _ = equalized_odds_diff(
    preds_equal_thresh, true, group)
print(f"TPR by group: {tpr1}")
print(f"FPR by group: {fpr1}")

print("\n=== Option 2: Adjusted thresholds "
      "(demographic parity) ===")
rates2, gap2 = demographic_parity(
    preds_equal_rate, group)
print(f"Selection rates: {rates2}")
print(f"Demographic parity gap: {gap2:.3f}")
tpr2, fpr2, _, _ = equalized_odds_diff(
    preds_equal_rate, true, group)
print(f"TPR by group: {tpr2}")
print(f"FPR by group: {fpr2}")

print("\n--> Option 1 has equal error rates but "
      "unequal selection")
print("--> Option 2 has equal selection but "
      "unequal error rates")
print("--> You CANNOT have both when base rates differ")
Debiasing techniques
Once you've measured bias, how do you reduce it? The approaches fall into three categories, and each connects to concepts we've already covered:
Pre-processing: fix the data
Resampling to balance group representation (same idea as the class imbalance techniques from episode #14). Reweighting samples so underrepresented groups have more influence. Removing or transforming features that serve as proxies for protected attributes (the feature selection from episode #15, but with a fairness lens). These are the simplest approaches and often the most effective:
from sklearn.utils import resample
from sklearn.linear_model import LogisticRegression

# Back to our biased hiring dataset
np.random.seed(42)
n = 1000
group = np.random.choice(['A', 'B'], n, p=[0.7, 0.3])
skill = np.random.randn(n) * 10 + 50
hired = np.where(
    group == 'A',
    (skill > 45).astype(int),
    (skill > 55).astype(int)
)
X = skill.reshape(-1, 1)  # skill only, DROP the group feature

# Technique 1: Resampling to balance groups
mask_a = group == 'A'
mask_b = group == 'B'
n_target = min(mask_a.sum(), mask_b.sum())
idx_a = resample(
    np.where(mask_a)[0],
    n_samples=n_target, random_state=42
)
idx_b = resample(
    np.where(mask_b)[0],
    n_samples=n_target, random_state=42
)
balanced_idx = np.concatenate([idx_a, idx_b])
print(f"Original: {mask_a.sum()} A, {mask_b.sum()} B")
print(f"Balanced: {n_target} A, {n_target} B")

# Technique 2: Reweighting
weights = np.ones(n)
# Give the underrepresented group more weight
weight_ratio = mask_a.sum() / mask_b.sum()
weights[mask_b] *= weight_ratio
print("\nSample weights:")
print("  Group A weight: 1.00")
print(f"  Group B weight: {weight_ratio:.2f}")

# Train with a fairness-aware approach
# Step 1: remove the group feature entirely
# Step 2: use balanced sampling
clf_balanced = LogisticRegression(random_state=42)
clf_balanced.fit(X[balanced_idx], hired[balanced_idx])

# Step 3: compare predictions
preds_balanced = clf_balanced.predict(X)
for g_label, g_mask in [('A', mask_a), ('B', mask_b)]:
    rate = preds_balanced[g_mask].mean()
    print(f"\nGroup {g_label} approval rate "
          f"(balanced model): {rate:.1%}")
The first and most impactful step is often the simplest: remove the protected attribute as a feature. If group membership shouldn't influence the decision, don't give it to the model. But (and this is a big but) removing the feature doesn't guarantee fairness -- other features might correlate with group membership and act as proxies. Zip code proxies for race. Name proxies for gender and ethnicity. Height proxies for sex. You can remove the explicit feature and still have implicit bias leaking through correlated features.
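Here is that leakage in miniature. The protected attribute is removed from the features, but a correlated stand-in ("neighborhood_score" -- a feature invented for this illustration) smuggles the group signal back in:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 2000
group_a = rng.binomial(1, 0.5, n)                     # 1 = group A (protected)
neighborhood_score = group_a + rng.normal(0, 0.5, n)  # correlates with group
skill = rng.normal(50, 10, n)

# Biased historical decisions: lower bar for group A
hired = np.where(group_a == 1, skill > 45, skill > 55).astype(int)

# Train WITHOUT the group feature -- only skill and the proxy
X = np.column_stack([skill, neighborhood_score])
clf = LogisticRegression(max_iter=1000).fit(X, hired)
preds = clf.predict(X)

for label, mask in [('A', group_a == 1), ('B', group_a == 0)]:
    print(f"Group {label} approval rate: {preds[mask].mean():.1%}")
print(f"Proxy coefficient: {clf.coef_[0][1]:+.2f} "
      f"(positive = the proxy still carries the group signal)")
```

The approval gap survives even though group membership was never given to the model -- which is why measuring outcomes per group matters more than inspecting the feature list.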
In-processing: constrain the algorithm
Adding fairness constraints to the optimization objective. The idea is elegant: train your model to predict well WHILE also ensuring that its predictions don't discriminate. Adversarial debiasing trains a model to predict the target variable while simultaneously fooling a discriminator that tries to predict the protected attribute from the model's predictions. If the discriminator can't figure out which group a prediction belongs to, the model's output is independent of group membership.
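Adversarial debiasing needs a full two-model training loop, but a simpler in-processing flavor fits in a few lines: add a demographic-parity penalty to logistic regression's loss and optimize by gradient descent. This is a toy NumPy sketch under that assumption -- not a production algorithm, and the penalty weight is arbitrary:

```python
import numpy as np

def fair_logistic_regression(X, y, group, lam=0.0,
                             lr=0.2, epochs=3000):
    """Logistic regression trained by gradient descent with a
    demographic-parity penalty added to the loss:
    loss = log-loss + lam * (mean_prob_A - mean_prob_B)^2"""
    w = np.zeros(X.shape[1])
    b = 0.0
    in_a, in_b = group == 1, group == 0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        # gradient of the log-loss part
        err = p - y
        gw = X.T @ err / len(y)
        gb = err.mean()
        # gradient of the fairness penalty (chain rule through sigmoid)
        gap = p[in_a].mean() - p[in_b].mean()
        s = p * (1 - p)
        d_gap_w = ((X[in_a] * s[in_a, None]).mean(axis=0)
                   - (X[in_b] * s[in_b, None]).mean(axis=0))
        d_gap_b = s[in_a].mean() - s[in_b].mean()
        gw += lam * 2 * gap * d_gap_w
        gb += lam * 2 * gap * d_gap_b
        w -= lr * gw
        b -= lr * gb
    return w, b

# Biased hiring data, same shape as before
rng = np.random.default_rng(42)
n = 1000
group = rng.binomial(1, 0.5, n)  # 1 = group A
skill = rng.normal(50, 10, n)
hired = np.where(group == 1, skill > 45, skill > 55).astype(int)
X = np.column_stack([(skill - skill.mean()) / skill.std(),
                     group.astype(float)])

gaps = {}
for lam in [0.0, 5.0]:
    w, b = fair_logistic_regression(X, hired, group, lam=lam)
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    gaps[lam] = p[group == 1].mean() - p[group == 0].mean()
    print(f"lambda={lam}: approval-probability gap = {gaps[lam]:.3f}")
```

Raising the penalty weight trades predictive fit for a smaller gap between the groups' mean approval probabilities -- the same accuracy-fairness tension as everywhere else in this episode, just made explicit in the objective.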
Post-processing: fix the predictions
Apply different decision thresholds per group to equalize error rates. If group B has a higher false positive rate, raise the threshold for group B. Simple, effective, and doesn't require retraining:
# Train a standard model
np.random.seed(42)
n = 2000
group = np.random.choice(['A', 'B'], n)
features = np.random.randn(n, 5)

# True labels with different base rates
true = np.where(
    group == 'A',
    np.random.binomial(1, 0.6, n),
    np.random.binomial(1, 0.4, n)
)
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    features, true, group,
    test_size=0.3, random_state=42
)
clf = LogisticRegression(random_state=42, max_iter=1000)
clf.fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

# Default threshold: 0.5 for everyone
default_preds = (probs > 0.5).astype(int)
print("Default threshold (0.5 for all):")
for g in ['A', 'B']:
    mask = g_te == g
    rate = default_preds[mask].mean()
    acc = (default_preds[mask] == y_te[mask]).mean()
    print(f"  Group {g}: approval={rate:.1%}, "
          f"accuracy={acc:.3f}")

# Post-processing: adjust thresholds per group
# Goal: equalize approval rates
target_rate = 0.50
thresholds = {}
for g in ['A', 'B']:
    mask = g_te == g
    g_probs = probs[mask]
    # Find the threshold that gives the target approval rate
    thresholds[g] = np.percentile(
        g_probs, (1 - target_rate) * 100
    )
print(f"\nAdjusted thresholds: {thresholds}")

adjusted_preds = np.zeros(len(X_te), dtype=int)
for g in ['A', 'B']:
    mask = g_te == g
    adjusted_preds[mask] = (
        probs[mask] > thresholds[g]
    ).astype(int)

print("\nWith adjusted thresholds:")
for g in ['A', 'B']:
    mask = g_te == g
    rate = adjusted_preds[mask].mean()
    acc = (adjusted_preds[mask] == y_te[mask]).mean()
    print(f"  Group {g}: approval={rate:.1%}, "
          f"accuracy={acc:.3f}")
The practical approach for most projects: start with measurement. Before debiasing anything, quantify how much bias actually exists. Often you'll find that a well-built model with representative data has less bias than you feared. When bias is present, start with pre-processing (it's cheapest and most transparent), and escalate to in-processing or post-processing only if needed.
Building a complete fairness audit
Let's put this all together into a reusable fairness audit function -- something you can drop into any ML project. This connects the evaluation mindset from episode #13 (measure what matters) with the fairness concepts from today:
from sklearn.metrics import accuracy_score

def fairness_audit(y_true, y_pred, y_prob,
                   group_labels, group_names=None):
    """Complete fairness audit for a binary classifier.
    Returns per-group metrics and fairness gaps."""
    groups = np.unique(group_labels)
    if group_names is None:
        group_names = {g: str(g) for g in groups}
    report = {'groups': {}, 'gaps': {}}
    for g in groups:
        mask = group_labels == g
        n_g = mask.sum()
        pos = y_true[mask] == 1
        neg = y_true[mask] == 0
        # Selection rate
        sel_rate = y_pred[mask].mean()
        # Accuracy
        acc = accuracy_score(y_true[mask], y_pred[mask])
        # TPR (sensitivity / recall)
        tpr = (y_pred[mask][pos].mean()
               if pos.any() else 0)
        # FPR
        fpr = (y_pred[mask][neg].mean()
               if neg.any() else 0)
        # Average predicted probability
        avg_prob = y_prob[mask].mean()
        report['groups'][group_names[g]] = {
            'n': n_g,
            'selection_rate': round(sel_rate, 4),
            'accuracy': round(acc, 4),
            'tpr': round(tpr, 4),
            'fpr': round(fpr, 4),
            'avg_probability': round(avg_prob, 4),
        }
    # Compute gaps
    sel_rates = [r['selection_rate']
                 for r in report['groups'].values()]
    tprs = [r['tpr'] for r in report['groups'].values()]
    fprs = [r['fpr'] for r in report['groups'].values()]
    report['gaps'] = {
        'demographic_parity': round(
            max(sel_rates) - min(sel_rates), 4),
        'tpr_gap': round(max(tprs) - min(tprs), 4),
        'fpr_gap': round(max(fprs) - min(fprs), 4),
    }
    return report

# Run the audit on our lending model
audit = fairness_audit(
    y_te, adjusted_preds, probs, g_te,
    group_names={'A': 'Group A', 'B': 'Group B'}
)
print("=== Fairness Audit Report ===\n")
for name, metrics in audit['groups'].items():
    print(f"{name} (n={metrics['n']}):")
    print(f"  Selection rate:  {metrics['selection_rate']:.1%}")
    print(f"  Accuracy:        {metrics['accuracy']:.3f}")
    print(f"  TPR:             {metrics['tpr']:.3f}")
    print(f"  FPR:             {metrics['fpr']:.3f}")
    print(f"  Avg probability: {metrics['avg_probability']:.3f}")
    print()
print("Fairness gaps:")
for metric, value in audit['gaps'].items():
    status = "OK" if value < 0.1 else "ALERT"
    print(f"  {metric}: {value:.3f} [{status}]")
This audit function gives you a standardized way to check any classifier for fairness issues. Run it alongside your regular evaluation metrics from episode #13 -- accuracy, precision, recall, F1 are necessary but not sufficient. A model with 95% accuracy that's 98% accurate for one group and 75% for another is NOT a good model, even though the aggregate number looks excellent ;-)
Differential privacy: protecting individuals
Even when a model isn't biased, it can leak private information. A model trained on medical records might memorize specific patients' data. An adversary who queries the model carefully could extract whether a specific person was in the training set -- revealing their medical condition without ever seeing the raw data.
Differential privacy adds calibrated noise during training so that no single individual's data can be identified from the model's outputs. The formal guarantee: the model's behavior changes by at most a bounded amount when any single person's data is added or removed from the training set. This means an adversary gains negligible information about any individual, even with unlimited queries.
# Demonstrate the privacy-accuracy tradeoff
from sklearn.linear_model import LinearRegression

np.random.seed(42)

# Sensitive dataset: salary predictions
n = 500
experience = np.random.uniform(0, 20, n)
salary = 30000 + 3000 * experience + np.random.randn(n) * 5000

# Standard linear regression (no privacy)
X_sal = experience.reshape(-1, 1)
model_std = LinearRegression().fit(X_sal, salary)

# Simulated differentially private regression:
# add noise proportional to sensitivity / epsilon
def dp_linear_regression(X, y, epsilon=1.0):
    """Simple DP regression via output perturbation.
    Adds Laplace noise to the coefficients."""
    model = LinearRegression().fit(X, y)
    # Sensitivity depends on the data range and model
    # (simplified for illustration)
    sensitivity = (y.max() - y.min()) / len(y)
    noise_scale = sensitivity / epsilon
    # Add Laplace noise to coefficients
    noisy_coef = model.coef_ + np.random.laplace(
        0, noise_scale, size=model.coef_.shape
    )
    noisy_intercept = model.intercept_ + np.random.laplace(
        0, noise_scale
    )
    model.coef_ = noisy_coef
    model.intercept_ = noisy_intercept
    return model

# Compare at different privacy levels
print(f"Standard model: slope={model_std.coef_[0]:.1f}, "
      f"intercept={model_std.intercept_:.0f}\n")
print(f"{'Epsilon':>10s} {'Privacy':>12s} "
      f"{'Slope':>8s} {'Intercept':>10s}")
print("-" * 44)
for eps in [0.1, 0.5, 1.0, 5.0, 10.0, 100.0]:
    dp_model = dp_linear_regression(X_sal, salary, eps)
    privacy = ("Strong" if eps <= 1
               else "Moderate" if eps <= 10
               else "Weak")
    print(f"{eps:>10.1f} {privacy:>12s} "
          f"{dp_model.coef_[0]:>8.1f} "
          f"{dp_model.intercept_:>10.0f}")

print("\n--> Lower epsilon = stronger privacy = more noise")
print("    Higher epsilon = weaker privacy = less noise")
print("    True slope is ~3000, true intercept is ~30000")
The privacy budget (epsilon) controls the tradeoff -- lower epsilon means stronger privacy guarantees but more noise. In practice, epsilon values between 1 and 10 are common, offering meaningful privacy protection with acceptable accuracy loss.
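To make the epsilon knob concrete, here's a minimal sketch of the classic Laplace mechanism applied to a counting query. The salary data, threshold, and epsilon values are all invented for the demo. A count changes by at most 1 when one person is added or removed, so its sensitivity is 1 and the noise scale is simply 1/epsilon:

```python
import numpy as np

rng = np.random.default_rng(0)

def private_count(data, threshold, epsilon):
    """Laplace mechanism for a counting query.
    Sensitivity of a count is 1, so the noise
    scale is 1/epsilon."""
    true_count = int(np.sum(data > threshold))
    noise = rng.laplace(0, 1.0 / epsilon)
    return true_count + noise

# Hypothetical sensitive dataset: 1000 salaries
salaries = rng.uniform(30000, 90000, 1000)

# Ask the same question at different privacy budgets
for eps in [0.1, 1.0, 10.0]:
    answers = [private_count(salaries, 60000, eps) for _ in range(5)]
    spread = max(answers) - min(answers)
    print(f"epsilon={eps:>5}: 5 answers spread over ~{spread:.0f}")
```

At epsilon=0.1 repeated answers scatter widely (strong privacy, noisy statistics); at epsilon=10 they cluster tightly around the true count. Same mechanism, same query, different position on the tradeoff curve.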
For most ML practitioners, full differential privacy is overkill. The practical minimum: don't memorize outliers (regularize your model -- as we covered in episode #11), don't expose training data through model APIs, and anonymize data before training. These are table-stakes hygiene practices that cost almost nothing.
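That regularization point can be sketched directly. Assuming some hypothetical one-feature data with a single anomalous record, a high-capacity polynomial model has enough flexibility to chase the outlier, while a ridge-regularized fit (episode #11) keeps its coefficients -- and therefore its flexibility -- small:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
x = np.linspace(0, 1, 30)
y = 2 * x + rng.normal(0, 0.1, 30)
y[15] += 5  # one person's anomalous record

# High-degree polynomial features give the model enough
# capacity to memorize individual points
X = PolynomialFeatures(degree=10, include_bias=False).fit_transform(
    x.reshape(-1, 1)
)

flexible = LinearRegression().fit(X, y)
regularized = Ridge(alpha=1.0).fit(X, y)

# Smaller coefficients = a smoother function = less capacity
# to bend toward any single record
print(f"coef norm, unregularized: {np.linalg.norm(flexible.coef_):.1f}")
print(f"coef norm, ridge:         {np.linalg.norm(regularized.coef_):.1f}")
print(f"residual at outlier, unregularized: "
      f"{abs(flexible.predict(X[15:16])[0] - y[15]):.2f}")
print(f"residual at outlier, ridge:         "
      f"{abs(regularized.predict(X[15:16])[0] - y[15]):.2f}")
```

The ridge model's coefficient norm is provably no larger than the unregularized one's, which is exactly the property that limits how much any single training record can be memorized.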
When NOT to use ML
The most important ethical skill in this entire episode -- and maybe the most underrated skill in all of ML: knowing when a rule-based system is better than a model.
High-stakes binary decisions with clear rules. If the law says "applicants under 18 cannot be approved," write an if-statement. Don't train a model that might learn this rule implicitly from the data -- or might not. A rule is transparent, auditable, and guaranteed to work. A model is a black box that might get it right 99.8% of the time and catastrophically wrong for the other 0.2%.
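As a sketch of that point, the entire "system" can be a few lines whose correctness is verifiable by inspection (`MIN_AGE` and `eligible` are hypothetical names for the demo):

```python
MIN_AGE = 18  # the legal rule, stated once, auditable

def eligible(age: int) -> bool:
    """Transparent, deterministic, and correct for every
    input -- properties no trained classifier can guarantee."""
    return age >= MIN_AGE

# The rule is exact over its whole domain, not 99.8% of it:
assert all(eligible(a) == (a >= 18) for a in range(0, 120))
print("Rule verified for all ages 0-119")
```

There is nothing to train, nothing to drift, and nothing to audit beyond one comparison.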
When the model can't explain its decision. Some domains -- credit, healthcare, criminal justice -- require explainability by law in many jurisdictions. A gradient boosting model with 500 trees (episode #19) can't explain why it denied a loan application. A logistic regression with 10 features (episode #12) can -- each coefficient tells you how much each feature contributed. Sometimes the simpler, more explainable model is the ethically correct choice, even if the black box is 2% more accurate.
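To illustrate why logistic regression is explainable, here's a sketch on synthetic data -- the feature names are invented for the demo. Each feature's contribution to the decision's log-odds is just coefficient times feature value, and those contributions sum exactly to the model's decision score:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical (already standardized) loan features
feature_names = ["income", "debt_ratio", "years_employed"]
X = rng.normal(size=(200, 3))
y = (X[:, 0] - X[:, 1] + 0.5 * X[:, 2]
     + rng.normal(0, 0.5, 200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# For one applicant, the complete explanation of the decision:
# contribution of each feature = coefficient * feature value
applicant = X[0]
contributions = clf.coef_[0] * applicant
for name, c in zip(feature_names, contributions):
    print(f"{name:>15s}: {c:+.2f} to the log-odds")
print(f"{'intercept':>15s}: {clf.intercept_[0]:+.2f}")
```

Those few numbers are the whole decision -- no approximation, no post-hoc explainer. A 500-tree ensemble has no equivalent summary.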
When the training data reflects injustice you don't want to perpetuate. If historical data encodes discrimination, and you can't adequately debias it, don't build a model on it. A human reviewer with clear guidelines might be fairer than an algorithm trained on biased history. This is uncomfortable for ML practitioners because our entire discipline is built on the assumption that data contains useful patterns -- but sometimes the patterns in data are ones we should actively reject.
When the cost of errors is asymmetric and catastrophic. A false positive in cancer screening leads to further tests -- inconvenient but manageable. A false positive in an autonomous weapon system is irreversible. For catastrophic-cost decisions, human oversight is non-negotiable regardless of model performance.
# Decision framework: ML vs rules
scenarios = [
    ("Age verification (must be 18+)",
     "Rule-based",
     "Legal requirement, binary, deterministic"),
    ("Spam filtering",
     "ML",
     "Fuzzy boundary, evolving patterns, low cost of error"),
    ("Loan approval",
     "ML + human review",
     "High stakes, legal explainability requirements"),
    ("Medical diagnosis",
     "ML as assistant (human decides)",
     "Catastrophic error cost, requires expert judgment"),
    ("Product recommendation",
     "ML",
     "Low stakes, huge data, personalization needed"),
    ("Criminal sentencing",
     "Rules (sentencing guidelines)",
     "Highest stakes, historical bias in data"),
    ("Fraud detection",
     "ML + rules + human review",
     "Mixed: ML catches patterns, rules catch known fraud"),
    ("Content moderation",
     "ML + human review",
     "Scale requires ML, nuance requires humans"),
]
print(f"{'Scenario':>35s} {'Approach':>25s}")
print("=" * 62)
for scenario, approach, reason in scenarios:
    print(f"{scenario:>35s} {approach:>25s}")
    print(f"{'':>35s} ({reason})")
    print()
The uncomfortable reality: deploying an ML model is a decision with moral implications. "The algorithm decided" is not an excuse -- someone built the algorithm, chose the data, set the thresholds, and decided to deploy it. That someone is responsible. If you're building ML systems (and after 34 episodes, you certainly can), you're also responsible for understanding what those systems do to people.
(Having said that, the flip side is also true: NOT using ML when it could help is also a choice with consequences. Manual processes have their own biases -- humans are inconsistent, tired, prejudiced in ways they don't recognize. A carefully built and audited ML system might actually be fairer than the human process it replaces. The goal isn't to avoid ML -- it's to deploy it thoughtfully.)
So, what have we learned?
We've covered ground that's different from our usual episodes -- less math, more judgment -- but no less important. Here's the full picture:
- Bias enters ML at every pipeline stage -- collection, labeling, measurement, and aggregation -- not just at the algorithm level. The data preparation principles from episode #14 apply here with higher stakes;
- Historical bias is the trickiest form: the data is correct, it just reflects an unjust world. A model trained on biased history perpetuates that history;
- Fairness has multiple competing definitions (demographic parity, equalized odds, individual fairness) that can't all be satisfied simultaneously. Choosing which definition matters is a values decision, not a technical one;
- Measure bias before fixing it -- quantify selection rates and error rates across groups using the evaluation toolkit from episode #13. You can't fix what you can't measure;
- Debiasing approaches: fix the data (resampling, reweighting -- pre-processing), constrain the algorithm (adversarial debiasing -- in-processing), or adjust predictions (threshold tuning -- post-processing). Start with the simplest approach that works;
- Differential privacy protects individuals by adding noise during training -- stronger privacy means lower accuracy. For most projects, basic data hygiene and regularization (episode #11) are sufficient;
- Know when NOT to use ML: when rules are clearer, when explainability is required, when data encodes injustice you shouldn't perpetuate, when error costs are catastrophic and asymmetric;
- "The algorithm decided" is never an excuse -- the people who built, trained, and deployed it are responsible.
This episode and the previous one (#34, ML Engineering) complete the classical ML toolkit in a practical sense. We've gone from "what is ML?" all the way through building models, evaluating them, deploying them, and now understanding our responsibility as the people who build them. The path ahead leads to a comprehensive mini-project that ties all of these individual techniques together into one coherent ML pipeline -- from raw data to deployed, monitored, and fair predictions. And after that, we enter the deep learning world where neural networks, backpropagation, and the architectures that power modern AI systems await. Everything we've built so far is the foundation for what comes next.
Thanks for reading! Until next time ;-)