Fair Localization and Refined Per-Group Generalization Bounds

Project Overview

This work views fair machine learning across groups through the lens of multi-task learning: we seek bounds not only on the total risk, but also on the risk of each group. To that end, we employ a localization-based analysis to better understand the regularizing effects of fair learning over multiple groups. This yields stronger guarantees for the generalization error of the cardinal fairness objective, as well as for the generalization error of each group. The analysis rigorously quantifies the tradeoff between the variance-reducing and bias-increasing effects of joint training. Crucially, it lets us explore how pooling data across groups affects minority groups, which benefit most from additional data (because models overfit to their small samples), but are most susceptible to being ignored by common machine learning objectives (because they contribute little to the objective function).

To Pool or Not To Pool:
Analyzing the Regularizing Effects of Group-Fair Training on Shared Models

Cyrus Cousins, Indra Elizabeth Kumar, and Suresh Venkatasubramanian

Abstract

In fair machine learning, one source of performance disparities between groups is overfitting to groups with relatively few training samples. We derive group-specific bounds on the generalization error of welfare-centric fair machine learning that benefit from the larger sample size of the majority group. We do this by considering group-specific Rademacher averages over a restricted hypothesis class, which contains the family of models likely to perform well with respect to a fair learning objective (e.g., a power-mean). Our simulations demonstrate these bounds improve over a naive method, as expected by theory, with particularly significant improvement for smaller group sizes.
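
As a rough illustration of the kind of fair learning objective mentioned in the abstract, the sketch below computes a weighted power mean of per-group risks. It is not taken from the paper's code: the function name `power_mean`, the choice of group weights, and the example risk values are all assumptions made for illustration only.

```python
import math


def power_mean(values, weights, p):
    """Weighted power mean of nonnegative values (hypothetical helper).

    p = 1 gives the weighted arithmetic mean, p = 0 the weighted
    geometric mean (as a limit), and p = +inf the maximum.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalize weights to sum to 1
    if math.isinf(p):
        return max(values) if p > 0 else min(values)
    if p <= 0 and any(v == 0 for v in values):
        return 0.0  # limiting value when some entry is zero
    if p == 0:
        return math.exp(sum(wi * math.log(v) for wi, v in zip(w, values)))
    return sum(wi * v ** p for wi, v in zip(w, values)) ** (1.0 / p)


# Illustrative (made-up) per-group risks: a majority group with low risk
# and a minority group with high risk, weighted by population share.
group_risks = [0.10, 0.30]
group_weights = [0.9, 0.1]

for p in (1, 2, 8, math.inf):
    print(p, power_mean(group_risks, group_weights, p))
```

With `p = 1` the objective is the population-weighted average risk, in which the minority group has little influence. As `p` grows, the value moves toward the worst group's risk, so minimizing it places more emphasis on the group that is doing worst.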

Keywords

Fair Machine Learning ♦ Cardinal Welfare Theory ♣ Statistical Learning Theory ♥ Localization ♠ Multi-Task Learning

Read the full paper on arXiv

Other Materials

Poster

Short slide deck

Code

All code for running experiments and generating plots is available here.